Summary Statistics Overview
Summary statistics condense a dataset into a few numbers that capture its key characteristics. These include measures of central tendency, dispersion, position, and shape.
Essential Summary Statistics Components
1. Measures of Central Tendency
- Mean: Average value
- Median: Middle value (50th percentile)
- Mode: Most frequently occurring value
2. Measures of Dispersion
- Range: Max - Min
- Variance: Average squared deviation from mean
- Standard Deviation: Square root of variance
- Interquartile Range (IQR): Q₃ - Q₁
- Coefficient of Variation: (σ/μ) × 100%
3. Position Measures
- Minimum: Smallest value
- Q₁: First quartile (25th percentile)
- Median: Middle value
- Q₃: Third quartile (75th percentile)
- Maximum: Largest value
4. Additional Measures
- Skewness: Measure of asymmetry
- Kurtosis: Measure of tail heaviness
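Skewness and kurtosis are the only measures in this list that are not worked through by hand later, so a small sketch may help. It assumes one common convention: population central moments, with 3 subtracted from kurtosis so that a normal distribution scores 0. The function name `shape_measures` and both sample lists are illustrative, and libraries that apply bias corrections can return slightly different values.

```python
from statistics import fmean

def shape_measures(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = fmean(values)
    # k-th central moment: average of (x - mean) ** k
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]       # hypothetical data, mirror-symmetric around 3
right_skewed = [1, 1, 1, 1, 10]   # hypothetical data with one large value

print(shape_measures(symmetric))
print(shape_measures(right_skewed))
```

The symmetric list gives a skewness of zero, while the single large value in the second list pulls the skewness positive.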
Summary Statistics for Ungrouped Data
Example 1: Complete Analysis
Daily website traffic (visitors) for 12 days:
125, 142, 138, 155, 148, 161, 135, 170, 152, 165, 140, 158
Calculate comprehensive summary statistics.
Solution:
Step 1: Arrange in ascending order
125, 135, 138, 140, 142, 148, 152, 155, 158, 161, 165, 170
n = 12
Step 2: Calculate Central Tendency
Mean: $$\bar{x} = \frac{125 + 135 + 138 + 140 + 142 + 148 + 152 + 155 + 158 + 161 + 165 + 170}{12} = \frac{1789}{12} = 149.08$$
Median (Q₂):
For even n: Average of 6th and 7th values
$$Q_2 = \frac{148 + 152}{2} = 150$$
Mode: No value repeats (no mode)
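Step 2 can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and treating "every distinct value tied for the highest count" as "no mode" is an assumption that matches the convention used in this example.

```python
from statistics import mean, median, multimode

visitors = [125, 142, 138, 155, 148, 161, 135, 170, 152, 165, 140, 158]

avg = mean(visitors)      # sum of the values divided by n
mid = median(visitors)    # average of the two middle values when n is even

# multimode returns every value tied for the highest count. If all distinct
# values are tied, nothing stands out, so we report no mode.
modes = multimode(visitors)
mode = None if len(modes) == len(set(visitors)) else modes

print(round(avg, 2), mid, mode)
```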
Step 3: Calculate Position Measures
Minimum: 125
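Q₁: Position = 1 × (n + 1)/4 = 13/4 = 3.25, a quarter of the way from the 3rd value (138) to the 4th (140) $$Q_1 = 138 + 0.25(140 - 138) = 138.5$$
Q₃: Position = 3 × (n + 1)/4 = 39/4 = 9.75, three quarters of the way from the 9th value (158) to the 10th (161) $$Q_3 = 158 + 0.75(161 - 158) = 160.25$$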
Maximum: 170
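The positional rule used in Step 3 can be sketched as a short function. The helper name `quartile` is hypothetical. For comparison, the standard library's `statistics.quantiles` (Python 3.8+) with its default `"exclusive"` method also works from n + 1 positions, so it should agree with the hand calculation here; other tools often default to different quartile conventions and may give slightly different values.

```python
from statistics import quantiles

def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k * (n + 1) / 4."""
    n = len(sorted_values)
    position = k * (n + 1) / 4
    lower_rank = int(position)             # rank of the value just below the position
    if not 1 <= lower_rank <= n:
        raise ValueError("too few values for this positional rule")
    fraction = position - lower_rank       # how far to move toward the next value
    lower = sorted_values[lower_rank - 1]  # 1-based rank -> 0-based index
    upper = sorted_values[min(lower_rank, n - 1)]
    return lower + fraction * (upper - lower)

visitors = [125, 142, 138, 155, 148, 161, 135, 170, 152, 165, 140, 158]
ordered = sorted(visitors)

by_hand = [quartile(ordered, k) for k in (1, 2, 3)]
by_library = quantiles(visitors, n=4)      # default method="exclusive"

five_number_summary = (ordered[0], *by_hand, ordered[-1])
print(five_number_summary)
```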
Step 4: Calculate Dispersion
Range: 170 - 125 = 45
Variance: $$s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1} = \frac{1990.92}{11} = 180.99$$
Standard Deviation: $$s = \sqrt{180.99} = 13.45$$
IQR: 160.25 - 138.5 = 21.75
Coefficient of Variation: $$CV = \frac{13.45}{149.08} \times 100\% = 9.02\%$$
Complete Summary Statistics Report:
| Statistic | Value |
|---|---|
| Count | 12 |
| Mean | 149.08 |
| Median | 150.00 |
| Mode | None |
| Std Dev | 13.45 |
| Variance | 180.99 |
| Min | 125 |
| Q₁ | 138.50 |
| Q₂ | 150.00 |
| Q₃ | 160.25 |
| Max | 170 |
| Range | 45 |
| IQR | 21.75 |
| CV | 9.02% |
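As a cross-check on hand arithmetic, the dispersion measures from Step 4 can be recomputed directly from the raw data. This is a sketch using the standard `statistics` module, whose `variance` and `stdev` use the sample (n − 1) divisor; the variable names are illustrative.

```python
from statistics import mean, quantiles, stdev, variance

visitors = [125, 142, 138, 155, 148, 161, 135, 170, 152, 165, 140, 158]

data_range = max(visitors) - min(visitors)
sample_var = variance(visitors)         # sum of squared deviations / (n - 1)
sample_sd = stdev(visitors)             # square root of the sample variance
q1, _, q3 = quantiles(visitors, n=4)    # default method works from n + 1 positions
iqr = q3 - q1
cv_percent = sample_sd / mean(visitors) * 100

print(f"Range={data_range}, Var={sample_var:.2f}, SD={sample_sd:.2f}, "
      f"IQR={iqr:.2f}, CV={cv_percent:.2f}%")
```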
Summary Statistics for Grouped Data
Example: Grouped Data Analysis
| Class Interval | Frequency |
|---|---|
| 20-30 | 5 |
| 30-40 | 12 |
| 40-50 | 18 |
| 50-60 | 10 |
| 60-70 | 5 |
Calculate summary statistics.
Solution:
Cumulative Frequency Table:
| Class | Midpoint | f | CF | f×m |
|---|---|---|---|---|
| 20-30 | 25 | 5 | 5 | 125 |
| 30-40 | 35 | 12 | 17 | 420 |
| 40-50 | 45 | 18 | 35 | 810 |
| 50-60 | 55 | 10 | 45 | 550 |
| 60-70 | 65 | 5 | 50 | 325 |
| Total | | N = 50 | | 2230 |
Mean: $$\bar{x} = \frac{\sum f_i m_i}{N} = \frac{2230}{50} = 44.6$$
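The midpoint, cumulative frequency, and f×m columns, and the mean that follows from them, can be built in a few lines. This is a minimal sketch; representing each class as a `(lower, upper, frequency)` tuple is an assumption made for illustration.

```python
# Each class is (lower limit, upper limit, frequency).
classes = [(20, 30, 5), (30, 40, 12), (40, 50, 18), (50, 60, 10), (60, 70, 5)]

rows = []
cumulative = 0
for lower, upper, freq in classes:
    midpoint = (lower + upper) / 2      # class midpoint stands in for every value in the class
    cumulative += freq                  # running total of frequencies (CF)
    rows.append((f"{lower}-{upper}", midpoint, freq, cumulative, freq * midpoint))

total_n = cumulative
total_fm = sum(row[4] for row in rows)
grouped_mean = total_fm / total_n

for row in rows:
    print(row)
print(total_n, total_fm, grouped_mean)
```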
Median (Q₂):
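Position = N/2 = 25, Median class = 40-50 (the first class whose cumulative frequency reaches 25). Each grouped quantile is interpolated as L + ((position - CF) / f) × h, where L is the lower limit of the class, CF the cumulative frequency before it, f its frequency, and h the class width.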
$$Q_2 = 40 + \left(\frac{25 - 17}{18}\right) \times 10 = 44.44$$
Q₁:
Position = N/4 = 12.5, Q₁ class = 30-40
$$Q_1 = 30 + \left(\frac{12.5 - 5}{12}\right) \times 10 = 36.25$$
Q₃:
Position = 3N/4 = 37.5, Q₃ class = 50-60
$$Q_3 = 50 + \left(\frac{37.5 - 35}{10}\right) \times 10 = 52.5$$
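The three quartile calculations follow the same interpolation pattern, which can be sketched as one function. The name `grouped_quantile` and the `(lower, upper, frequency)` tuple layout are assumptions for illustration; the approach also assumes values are spread evenly within each class.

```python
def grouped_quantile(classes, p):
    """Interpolated p-quantile (0 < p < 1) for (lower, upper, frequency) classes."""
    total = sum(freq for _, _, freq in classes)
    target = p * total                  # position N/4, N/2 or 3N/4 for the quartiles
    cumulative_before = 0
    for lower, upper, freq in classes:
        # The quantile class is the first one whose cumulative frequency reaches the target.
        if cumulative_before + freq >= target:
            width = upper - lower
            return lower + (target - cumulative_before) / freq * width
        cumulative_before += freq
    raise ValueError("p must be between 0 and 1")

classes = [(20, 30, 5), (30, 40, 12), (40, 50, 18), (50, 60, 10), (60, 70, 5)]

q1 = grouped_quantile(classes, 0.25)
q2 = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)
print(q1, round(q2, 2), q3)
```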
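Range: 70 - 20 = 50, taking the upper limit of the last class minus the lower limit of the first (or 60 - 20 = 40 if measured between the lower limits of those two classes). Either figure is an estimate, since the exact minimum and maximum are unknown for grouped data.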
IQR: 52.5 - 36.25 = 16.25
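Standard Deviation: (calculated from deviations of the class midpoints from the mean) $$s = \sqrt{\frac{\sum f_i (m_i - 44.6)^2}{N - 1}} = \sqrt{\frac{6192}{49}} \approx 11.2$$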
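The grouped standard deviation is the one value in this example given without its arithmetic, so a sketch of one way to obtain it may be useful. It assumes deviations are taken from the class midpoints and shows both the sample (N − 1) and population (N) divisors, since the text does not say which was used; the variable names are illustrative.

```python
from math import sqrt

# Each class is (lower limit, upper limit, frequency).
classes = [(20, 30, 5), (30, 40, 12), (40, 50, 18), (50, 60, 10), (60, 70, 5)]

total = sum(freq for _, _, freq in classes)
grouped_mean = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes) / total

# Frequency-weighted sum of squared deviations of the midpoints from the mean.
squared_dev = sum(
    freq * ((lower + upper) / 2 - grouped_mean) ** 2
    for lower, upper, freq in classes
)

sample_sd = sqrt(squared_dev / (total - 1))   # divisor N - 1
population_sd = sqrt(squared_dev / total)     # divisor N

print(round(sample_sd, 2), round(population_sd, 2))
```

With 50 observations the two divisors differ only in the second decimal place.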
Summary for Grouped Data:
| Statistic | Value |
|---|---|
| Count | 50 |
| Mean | 44.6 |
| Median | 44.44 |
| Min | 20 (lower limit of first class) |
| Q₁ | 36.25 |
| Q₃ | 52.5 |
| Max | 70 (upper limit of last class) |
| IQR | 16.25 |
| Std Dev | 11.2 |
Interpretation Guidelines
| Statistic | What It Tells You |
|---|---|
| Mean | Central location; affected by outliers |
| Median | Middle value; robust to outliers |
| Std Dev | Typical spread from the mean |
| IQR | Spread of middle 50% of data |
| Skewness | Asymmetry: positive (longer right tail), negative (longer left tail), or zero (symmetric) |
| Range | Total spread (Max - Min) |
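The contrast between the mean and the median in the first two rows is easy to see numerically. The readings below are hypothetical and chosen only to illustrate the effect of a single extreme value.

```python
from statistics import mean, median

typical = [48, 50, 51, 52, 49]      # hypothetical readings
with_outlier = typical + [500]      # same readings plus one extreme value

print(mean(typical), median(typical))            # both sit at the centre of the data
print(mean(with_outlier), median(with_outlier))  # the mean jumps, the median barely moves
```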
When to Use Summary Statistics
- Data Reporting: Providing comprehensive dataset overview
- Comparative Analysis: Comparing multiple datasets
- Quality Assessment: Evaluating data characteristics
- Statistical Testing: Prerequisites for many statistical tests
- Dashboard Development: Key metrics for business intelligence
- Research Papers: Standard practice in academic publications
Summary Statistics vs. Descriptive Statistics
- Summary Statistics: Specific, calculated numbers (e.g., mean = 45.2)
- Descriptive Statistics: Broader category including tables, charts, summaries
Key Insights from Summary Statistics
- Central Tendency: Where data tends to cluster
- Spread: How dispersed the data is
- Symmetry: If data is skewed or symmetric
- Outliers: Extreme values beyond typical range
- Data Quality: Completeness and consistency
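One way to turn "beyond typical range" into a concrete check is the 1.5 × IQR fence rule. That rule is a widely used convention and an assumption here, not something this text prescribes; the function name `iqr_outliers` and the added value of 400 are likewise illustrative.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

visitors = [125, 142, 138, 155, 148, 161, 135, 170, 152, 165, 140, 158]

print(iqr_outliers(visitors))           # traffic data from Example 1
print(iqr_outliers(visitors + [400]))   # same data with one hypothetical spike
```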
Best Practices
- Always calculate multiple summary statistics (not just mean)
- Report standard deviation alongside the mean
- Include the five-number summary (Min, Q₁, Median, Q₃, Max), which maps directly onto a box plot
- Use median for skewed distributions
- Document which measures are most relevant for your analysis
- Consider creating a summary statistics table for reports