To calculate descriptive statistics in R, you can use the summary() function or the describeBy() function from the psych package.
Descriptive statistics provide a summary of the main features of a dataset, giving you insights into the data’s central tendency, dispersion, and shape.
In this article, we will explore how to calculate descriptive statistics in R using these two methods.
Method 1: Use summary() Function
The summary() function in R provides a quick overview of the main descriptive statistics for each column in a data frame. Here’s the syntax:
summary(df)
The following example shows how to use the summary() function to calculate descriptive statistics in R.
Use summary() Function to Calculate Descriptive Statistics
Let’s see how we can use the summary() function to calculate descriptive statistics for a data frame:
# Create data frame
df <- data.frame(
  Value = c(108, 99, 135, 95, 98, 105),
  Reading = c(110, 97, 91, 89, 80, 85)
)

# Calculate descriptive statistics
summary_stats <- summary(df)

# Display descriptive statistics
print(summary_stats)
Output: 👇️
     Value           Reading     
 Min.   : 95.00   Min.   : 80.0  
 1st Qu.: 98.25   1st Qu.: 86.0  
 Median :102.00   Median : 90.0  
 Mean   :106.67   Mean   : 92.0  
 3rd Qu.:107.25   3rd Qu.: 95.5  
 Max.   :135.00   Max.   :110.0  
In this example, the summary() function calculates the descriptive statistics for the Value and Reading columns of the data frame.
The output includes the minimum, first quartile, median, mean, third quartile, and maximum values for each column.
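Each of these values can also be computed on its own with base R, which is handy when you need a single number or a statistic that summary() does not report, such as the standard deviation. The following is a minimal sketch that reuses the Value column from the example; the variable names (value, five_num, avg, spread, iqr) are chosen here for illustration only.

```r
# Same Value column as in the example above
value <- c(108, 99, 135, 95, 98, 105)

# Minimum, quartiles, and maximum (quantile() uses the same default
# quartile method as summary())
five_num <- quantile(value, probs = c(0, 0.25, 0.5, 0.75, 1))

# Arithmetic mean
avg <- mean(value)

# Dispersion measures that summary() leaves out
spread <- sd(value)   # sample standard deviation
iqr <- IQR(value)     # third quartile minus first quartile

print(five_num)
print(c(mean = avg, sd = spread, IQR = iqr))
```

Calling quantile() with explicit probs keeps the result in a named numeric vector, so a single value can be pulled out by name, for example five_num["50%"] for the median.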
Method 2: Use describeBy() Function
The describeBy() function from the psych package provides detailed descriptive statistics for each group in a data frame. If the package is not installed yet, install it once with install.packages("psych"). Here’s the syntax:
library(psych)
describeBy(df, group)
The following example shows how to use the describeBy() function to calculate descriptive statistics in R.
Use describeBy() Function to Calculate Descriptive Statistics
Let’s see how we can use the describeBy() function to calculate descriptive statistics for a data frame grouped by a specific column:
# Load necessary library
library(psych)

# Create data frame
df <- data.frame(
  Machine_name = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Pressure = c(78.2, 80.21, 78.2, 82.56, 71.7, 72.12, 73.85, 80.21),
  Temperature = c(35, 36, 36, 38, 32, 32, 31, 34)
)

# Calculate descriptive statistics by group
d <- describeBy(df, df$Machine_name)

# Show descriptive statistics
print(d)
Output: 👇️
 Descriptive statistics by group
group: A
              vars n mean   sd median trimmed  mad  min   max range skew kurtosis  se
Machine_name*    1 2  1.0 0.00    1.0     1.0 0.00  1.0  1.00  0.00  NaN      NaN 0.0
Pressure         2 2 79.2 1.42   79.2    79.2 1.49 78.2 80.21  2.01    0    -2.75 1.0
Temperature      3 2 35.5 0.71   35.5    35.5 0.74 35.0 36.00  1.00    0    -2.75 0.5
-----------------------------------------------------------------------
group: B
              vars n  mean   sd median trimmed  mad  min   max range skew kurtosis   se
Machine_name*    1 2  2.00 0.00   2.00    2.00 0.00  2.0  2.00  0.00  NaN      NaN 0.00
Pressure         2 2 80.38 3.08  80.38   80.38 3.23 78.2 82.56  4.36    0    -2.75 2.18
Temperature      3 2 37.00 1.41  37.00   37.00 1.48 36.0 38.00  2.00    0    -2.75 1.00
-----------------------------------------------------------------------
group: C
              vars n  mean   sd median trimmed  mad  min   max range skew kurtosis   se
Machine_name*    1 2  3.00 0.00   3.00    3.00 0.00  3.0  3.00  0.00  NaN      NaN 0.00
Pressure         2 2 71.91 0.30  71.91   71.91 0.31 71.7 72.12  0.42    0    -2.75 0.21
Temperature      3 2 32.00 0.00  32.00   32.00 0.00 32.0 32.00  0.00  NaN      NaN 0.00
-----------------------------------------------------------------------
group: D
              vars n  mean   sd median trimmed  mad   min   max range skew kurtosis   se
Machine_name*    1 2  4.00 0.00   4.00    4.00 0.00  4.00  4.00  0.00  NaN      NaN 0.00
Pressure         2 2 77.03 4.50  77.03   77.03 4.71 73.85 80.21  6.36    0    -2.75 3.18
Temperature      3 2 32.50 2.12  32.50   32.50 2.22 31.00 34.00  3.00    0    -2.75 1.50
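In this example, the describeBy() function calculates the descriptive statistics for the Pressure and Temperature columns, grouped by the Machine_name column. The asterisk in the Machine_name* row marks a non-numeric column that was converted to numeric codes, so the statistics in that row can be ignored.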
The output includes the number of observations, mean, standard deviation, median, trimmed mean, median absolute deviation, minimum, maximum, range, skewness, kurtosis, and standard error for each group.
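If the psych package is not available, or you only need one or two statistics per group, base R's aggregate() function offers a lightweight way to cross-check the grouped results. The sketch below rebuilds the same data frame and computes the group means and standard deviations; it is an alternative to describeBy(), not part of it, and the names group_means and group_sds are chosen here for illustration.

```r
# Same data frame as in the describeBy() example
df <- data.frame(
  Machine_name = c("A", "A", "B", "B", "C", "C", "D", "D"),
  Pressure = c(78.2, 80.21, 78.2, 82.56, 71.7, 72.12, 73.85, 80.21),
  Temperature = c(35, 36, 36, 38, 32, 32, 31, 34)
)

# Mean of each numeric column, one row per machine
group_means <- aggregate(cbind(Pressure, Temperature) ~ Machine_name,
                         data = df, FUN = mean)

# Sample standard deviation of each numeric column, one row per machine
group_sds <- aggregate(cbind(Pressure, Temperature) ~ Machine_name,
                       data = df, FUN = sd)

print(group_means)
print(group_sds)
```

Each result is an ordinary data frame with one row per group, which makes it easy to merge, filter, or export, whereas describeBy() is more convenient when you want the full set of statistics at once.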