How to Calculate Descriptive Statistics in R

Sep 30, 2024 by Vaibhav Choudhari

To calculate descriptive statistics in R, you can use the built-in summary() function or the describeBy() function from the psych package.

Descriptive statistics provide a summary of the main features of a dataset, giving you insights into the data’s central tendency, dispersion, and shape.

In this article, we will explore how to calculate descriptive statistics in R using these two methods.

Method 1: Use summary() Function

The summary() function in R provides a quick overview of the main descriptive statistics for each column in a data frame. Here’s the syntax:

summary(df)

The following example shows how to use the summary() function to calculate descriptive statistics in R.

Use summary() Function to Calculate Descriptive Statistics

Let’s see how we can use the summary() function to calculate descriptive statistics for a data frame:

# Create data frame
df <- data.frame(
  Value = c(108, 99, 135, 95, 98, 105),
  Reading = c(110, 97, 91, 89, 80, 85)
)

# Calculate descriptive statistics
summary_stats <- summary(df)

# Display descriptive statistics
print(summary_stats)

Output: 👇️

     Value           Reading     
 Min.   : 95.00   Min.   : 80.0  
 1st Qu.: 98.25   1st Qu.: 86.0  
 Median :102.00   Median : 90.0  
 Mean   :106.67   Mean   : 92.0  
 3rd Qu.:107.25   3rd Qu.: 95.5  
 Max.   :135.00   Max.   :110.0  

In this example, the summary() function calculates the descriptive statistics for the Value and Reading columns of the data frame.

The output includes the minimum, first quartile, median, mean, third quartile, and maximum values for each column.
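Note that summary() reports location and quartiles but no direct measure of spread such as the standard deviation. As a minimal sketch, the base R functions sd(), IQR(), max(), and min() can fill that gap. The helper name dispersion is hypothetical and used here only for illustration:

```r
# Same data frame as in the example above
df <- data.frame(
  Value = c(108, 99, 135, 95, 98, 105),
  Reading = c(110, 97, 91, 89, 80, 85)
)

# Hypothetical helper: dispersion measures for one numeric vector
dispersion <- function(x) {
  c(SD = sd(x), IQR = IQR(x), Range = max(x) - min(x))
}

# Apply the helper to every column; the result is a matrix with
# one row per statistic and one column per variable
dispersion_stats <- sapply(df, dispersion)

# Display the dispersion measures rounded to two decimals
print(round(dispersion_stats, 2))
```

Because sapply() returns a matrix here, a single value can be read with, for example, dispersion_stats["SD", "Value"].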

Method 2: Use describeBy() Function

The describeBy() function from the psych package provides detailed descriptive statistics for each group in a data frame. If the package is not installed yet, install it once with install.packages("psych"). Here’s the syntax:

library(psych)

describeBy(df, group)

The following example shows how to use the describeBy() function to calculate descriptive statistics in R.

Use describeBy() Function to Calculate Descriptive Statistics

Let’s see how we can use the describeBy() function to calculate descriptive statistics for a data frame grouped by a specific column:

# Load necessary library
library(psych)

# Create data frame
df <- data.frame(
  Machine_name = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Pressure = c(78.2, 80.21, 78.2, 82.56, 71.7, 72.12, 73.85, 80.21),
  Temperature = c(35, 32, 36, 32, 36, 31, 34, 38)
)

# Calculate descriptive statistics by group
d <- describeBy(df, df$Machine_name)

# Show descriptive statistics
print(d)

Output: 👇️

 Descriptive statistics by group 
group: A
              vars n mean   sd median trimmed  mad  min   max range skew kurtosis  se
Machine_name*    1 2  1.0 0.00    1.0     1.0 0.00  1.0  1.00  0.00  NaN      NaN 0.0
Pressure         2 2 79.2 1.42   79.2    79.2 1.49 78.2 80.21  2.01    0    -2.75 1.0
Temperature      3 2 33.5 2.12   33.5    33.5 2.22 32.0 35.00  3.00    0    -2.75 1.5
----------------------------------------------------------------------- 
group: B
              vars n  mean   sd median trimmed  mad  min   max range skew kurtosis   se
Machine_name*    1 2  2.00 0.00   2.00    2.00 0.00  2.0  2.00  0.00  NaN      NaN 0.00
Pressure         2 2 80.38 3.08  80.38   80.38 3.23 78.2 82.56  4.36    0    -2.75 2.18
Temperature      3 2 34.00 2.83  34.00   34.00 2.97 32.0 36.00  4.00    0    -2.75 2.00
----------------------------------------------------------------------- 
group: C
              vars n  mean   sd median trimmed  mad  min   max range skew kurtosis   se
Machine_name*    1 2  3.00 0.00   3.00    3.00 0.00  3.0  3.00  0.00  NaN      NaN 0.00
Pressure         2 2 71.91 0.30  71.91   71.91 0.31 71.7 72.12  0.42    0    -2.75 0.21
Temperature      3 2 33.50 3.54  33.50   33.50 3.71 31.0 36.00  5.00    0    -2.75 2.50
----------------------------------------------------------------------- 
group: D
              vars n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
Machine_name*    1 2  4.00 0.00   4.00    4.00 0.00  4.00  4.00  0.00  NaN      NaN 0.00
Pressure         2 2 77.03 4.50  77.03   77.03 4.71 73.85 80.21  6.36    0    -2.75 3.18
Temperature      3 2 36.00 2.83  36.00   36.00 2.97 34.00 38.00  4.00    0    -2.75 2.00

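In this example, the describeBy() function calculates the descriptive statistics for the Pressure and Temperature columns, grouped by the Machine_name column. The grouping column itself also appears in the output as Machine_name*; the asterisk marks a non-numeric variable that was converted to numbers, so its statistics are not meaningful.

For each variable in each group, the output lists the column index (vars), number of observations (n), mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skewness, kurtosis, and standard error. With only two observations per group, as here, the skewness and kurtosis values carry no real information.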
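If you only need a few grouped statistics and would rather not load an extra package, base R can produce them as well. The following is a minimal sketch using tapply() on the Pressure column from the example above; the object names pressure_mean, pressure_sd, and grouped_stats are chosen here for illustration:

```r
# Pressure readings per machine, as in the example above
df <- data.frame(
  Machine_name = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Pressure = c(78.2, 80.21, 78.2, 82.56, 71.7, 72.12, 73.85, 80.21)
)

# Mean and standard deviation of Pressure for each machine
pressure_mean <- tapply(df$Pressure, df$Machine_name, mean)
pressure_sd <- tapply(df$Pressure, df$Machine_name, sd)

# Combine the results into one data frame, one row per group
grouped_stats <- data.frame(
  Machine_name = names(pressure_mean),
  mean = as.numeric(pressure_mean),
  sd = as.numeric(pressure_sd)
)

# Display the grouped statistics
print(grouped_stats, digits = 4)
```

The group means and standard deviations should agree with the Pressure rows of the describeBy() output, up to rounding.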
