The iris dataset is a built-in dataset in R. It contains measurements of 150 iris flowers, 50 from each of three species (setosa, versicolor, and virginica), for four features in centimeters: sepal length, sepal width, petal length, and petal width.
In this article, we will see how to load, explore, summarize, and visualize the iris dataset in R.
Load the Iris Dataset
To load the iris dataset, we use the data() function:
# Load the iris dataset
data(iris)
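For a quick first look at the data, the sketch below uses the base R functions str(), class(), and levels(). In a default R session the iris data frame comes from the datasets package, which R attaches automatically, so these calls should also work without calling data(iris) first.

```r
# Compact overview: number of observations, column types, first values
str(iris)

# iris is stored as a data frame
class(iris)

# Species is a factor with three levels
levels(iris$Species)
```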
Let's see how to get the first six rows of the iris dataset:
# Get the first few rows (six by default) of the dataset
head(iris)
The following output shows the first six rows of the iris dataset.
Output:
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
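head() returns six rows by default. As a small illustration beyond the original walkthrough, the sketch below shows the n argument of head() and tail() and two common ways to pick rows: by position and with subset(). The variable name long_petals and the threshold 6.5 are arbitrary, hypothetical choices for the example.

```r
# First and last three rows; n overrides the default of six
head(iris, n = 3)
tail(iris, n = 3)   # rows 148 to 150, which belong to virginica

# Select rows by position: rows 51 to 53 are the first versicolor flowers
iris[51:53, ]

# Select rows by a condition (6.5 is an arbitrary illustrative threshold)
long_petals <- subset(iris, Petal.Length > 6.5)
long_petals
```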
Summarize the Iris Dataset
To summarize the iris dataset, we use the summary() function:
# Get summary statistics for each column of the dataset
summary(iris)
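The output below shows a quick summary of each variable: the minimum, quartiles, median, mean, and maximum for the numeric columns, and the number of flowers per level for the Species factor.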
Output:
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
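summary() describes each column over all 150 flowers at once. To compare species, one option is to compute group means. The sketch below uses base R's aggregate() and tapply(); the names species_means and petal_means are illustrative choices, not part of the original article.

```r
# Mean of every numeric column, computed separately for each species
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
species_means

# Mean petal length per species, for a single column
petal_means <- tapply(iris$Petal.Length, iris$Species, mean)
petal_means
```

Because every species has exactly 50 flowers, the average of the three group means equals the overall mean reported by summary().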
Get Dimension of the Iris Dataset
To get the number of rows and columns of the iris dataset, we use the dim() function:
# Show the number of rows and columns
dim(iris)
The output below shows that the iris dataset has 150 rows and 5 columns.
Output:
[1] 150 5
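dim() returns both sizes in one vector. A minimal sketch of the alternatives: nrow() and ncol() return them separately, and table() counts the rows in each species. The name species_counts is an illustrative choice.

```r
# Number of rows and number of columns, returned separately
nrow(iris)
ncol(iris)

# Number of flowers in each species
species_counts <- table(iris$Species)
species_counts
```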
Get Column Names of the Iris Dataset
To get the column names of the iris dataset, we use the names() function:
# Show the column names
names(iris)
The following output shows the column names of the iris dataset.
Output:
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
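Column names do not show what kind of data each column holds. Assuming we also want each column's type, the sketch below applies class() to every column with sapply() and then uses the names to select columns. The names col_classes and petals are illustrative choices.

```r
# Class of each column: four numeric measurements and one factor
col_classes <- sapply(iris, class)
col_classes

# Select the two petal columns by name
petals <- iris[, c('Petal.Length', 'Petal.Width')]
head(petals)
```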
Visualize the Iris Dataset
Base R provides several functions for visualizing a dataset. Let's look at some of them one by one.
Let's create a histogram of petal length using the hist() function:
# Plot a histogram of petal length values
hist(iris$Petal.Length,
col='green',
main='Histogram',
xlab='Length',
ylab='Frequency')
The following output shows the histogram of the petal length variable.
Output:
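A single histogram mixes all three species together. As an illustrative extension, the sketch below draws one histogram per species side by side using par(mfrow = ...), with a shared set of breaks so the panels are comparable; the break width of 0.5 and the names by_species, breaks, and h are arbitrary choices. Passing plot = FALSE to hist() returns the bin counts without drawing anything.

```r
# Split petal length by species: a named list of three numeric vectors
by_species <- split(iris$Petal.Length, iris$Species)

# Shared bin edges (0.5 wide, chosen for illustration) covering 1.0 to 6.9
breaks <- seq(1, 7, by = 0.5)

# Draw one histogram per species in a 1 x 3 layout
old_par <- par(mfrow = c(1, 3))
for (sp in names(by_species)) {
  hist(by_species[[sp]],
       breaks = breaks,
       col = 'green',
       main = sp,
       xlab = 'Petal Length')
}
par(old_par)  # restore the previous plot layout

# Bin counts for all flowers, computed without plotting
h <- hist(iris$Petal.Length, breaks = breaks, plot = FALSE)
h$counts
```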
To create a scatterplot, we use the plot() function:
# Create scatterplot of petal length (y-axis) against petal width (x-axis)
plot(iris$Petal.Width, iris$Petal.Length,
col='red',
main='Scatterplot',
xlab='Petal Width',
ylab='Petal Length',
pch=19)
The output below shows the scatterplot, with petal width on the x-axis and petal length on the y-axis.
Output:
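Coloring the points by species makes the groups visible in the same scatterplot. The sketch below is one way to do it in base R: it maps the factor codes 1 to 3 onto a color vector, adds a legend, draws a scatterplot matrix of all four measurements with pairs(), and computes the correlation between the two plotted variables with cor(). The colors and the names species_col and r are illustrative choices.

```r
# Map each species to a color: factor levels are coded 1, 2, 3
palette_3 <- c('red', 'blue', 'darkgreen')
species_col <- palette_3[as.integer(iris$Species)]

plot(iris$Petal.Width, iris$Petal.Length,
     col = species_col,
     main = 'Scatterplot by Species',
     xlab = 'Petal Width',
     ylab = 'Petal Length',
     pch = 19)

# Legend matching each color to its species name
legend('topleft',
       legend = levels(iris$Species),
       col = palette_3,
       pch = 19)

# Scatterplot matrix of the four numeric columns, colored the same way
pairs(iris[, 1:4], col = species_col, pch = 19)

# Pearson correlation between petal width and petal length
r <- cor(iris$Petal.Width, iris$Petal.Length)
r
```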
We can create a boxplot using the boxplot() function:
# Create boxplot of petal length by species
boxplot(Petal.Length ~ Species,
data=iris,
main='Petal Length by Species',
xlab='Species',
ylab='Petal Length',
col='steelblue',
border='black')
The output shows the boxplot of petal length grouped by species.
Output:
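boxplot() also returns the statistics it draws. A minimal sketch: with plot = FALSE, nothing is drawn and the returned list can be inspected directly. The name b is an illustrative choice.

```r
# Compute the boxplot statistics for petal length by species, without drawing
b <- boxplot(Petal.Length ~ Species, data = iris, plot = FALSE)

b$names   # group labels, one per species
b$n       # number of observations in each group
b$stats   # 5 x 3 matrix: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # values that would be drawn as outliers (may be empty)
```

The third row of b$stats holds the per-species medians, the thick lines in the boxplot.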
Using these functions, we can visualize the iris dataset and explore the relationships between its variables.