The preceding sections created a number of R objects. You can see them in the Environment pane in RStudio, or list them by entering ls() at the console. R has a number of fundamental data types that are the building blocks for data analysis. The sections below explore these data types and illustrate further operations on them.
Vectors
Examples of vectors are:
# numeric
c(2,3,5,2,7,1)
# The numbers 3, 4, .., 10
3:10
# logical
c(TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE)
# character
c("London","Leeds","New York","Montevideo", NA)
Vectors may have different modes, such as logical, numeric or character. The first two vectors above are numeric, the third is logical (i.e. a vector with elements of mode logical), and the fourth is a character vector (i.e. a vector of text strings, with elements of mode character). The missing value symbol, NA, can be included as an element of a vector of any mode.
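The modes described above can be checked with mode(), and missing values located with is.na(). This is a small sketch; the object names nums, flags and cities are illustrative and are not used elsewhere in this section.

```r
# Illustrative vectors, one per mode (names are hypothetical)
nums   <- c(2, 3, 5, 2, 7, 1)
flags  <- c(TRUE, FALSE, FALSE, TRUE)
cities <- c("London", "Leeds", "New York", "Montevideo", NA)

mode(nums)    # "numeric"
mode(3:10)    # "numeric" (the values are stored as integers)
mode(flags)   # "logical"
mode(cities)  # "character"

# NA marks a missing element; is.na() flags its position
is.na(cities)
```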
The c in c(2,3,5,2,7,1) above is short for concatenate, i.e. the meaning is: join these numbers together into a vector. Existing vectors may be included among the elements that are to be concatenated. In the following code, we form vectors x and y (overwriting any x and y that were defined earlier), which we then concatenate to form a vector z:
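A minimal sketch that is consistent with the printed output that follows, assuming the values are typed in directly:

```r
# Form x and y, then join them into z
x <- c(2, 3, 5, 2, 7, 1)
x
y <- c(10, 15, 12)
y
# Existing vectors can be passed to c() just like single values
z <- c(x, y)
z
```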
## [1] 2 3 5 2 7 1
## [1] 10 15 12
## [1] 2 3 5 2 7 1 10 15 12
The concatenate function c() may also be used to join lists.
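As a brief illustration of joining lists, the sketch below uses two made-up lists, a and b; the names and contents are hypothetical.

```r
# Two small lists with elements of different types (illustrative only)
a <- list(1, "two")
b <- list(TRUE)

# c() joins them into a single list with three elements
ab <- c(a, b)
length(ab)
class(ab)
```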
Vectors can be subsetted. There are two common ways to extract subsets from vectors. Note the use of square brackets [ ] in both cases.
- Specify or index the elements that are to be extracted, e.g.
## [1] 3 2
Note that negative numbers can be used to omit specific vector elements:
## [1] 2 2 7 1
- Specify a vector of logical values to select elements. The elements that are extracted are those for which the logical value is TRUE. Thus suppose we want to extract the values of x that are greater than 4:
## [1] 5 7
Examine the logical selection:
## [1] FALSE FALSE TRUE FALSE TRUE FALSE
The relational operators that may be used to extract subsets of vectors are < <= > >= == !=. The first four compare magnitudes, == tests for equality, and != tests for inequality.
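The subsetting operations described above can be sketched as follows, assuming x is the vector c(2, 3, 5, 2, 7, 1) formed earlier. The particular index values are an assumption, chosen because they reproduce the printed results.

```r
x <- c(2, 3, 5, 2, 7, 1)

# 1. Index the elements to extract
x[c(2, 4)]     # 3 2

# Negative indices omit elements (here the 2nd and 3rd)
x[-c(2, 3)]    # 2 2 7 1

# 2. Select with a logical vector
x > 4          # FALSE FALSE TRUE FALSE TRUE FALSE
x[x > 4]       # 5 7

# The other relational operators work the same way
x[x == 2]      # 2 2
x[x != 2]      # 3 5 7 1
```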
Matrices and Data Frames
The fundamental difference between a matrix and a data.frame is that a matrix can contain only a single data type (numeric, logical, character, etc.), whereas a data frame can have a different type of data in each column, with all elements of any one column being of the same type, i.e. all numeric, all factors, all logical, all character, etc.
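The difference can be seen by combining numbers and text in each structure. In this sketch the objects m and df, and the column names id and label, are hypothetical; stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version.

```r
# In a matrix, mixed input is coerced to a single type (here character)
m <- cbind(1:3, c("a", "b", "c"))
mode(m)

# In a data frame, each column keeps its own type
df <- data.frame(id = 1:3,
                 label = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
sapply(df, class)
```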
Matrices are easy to define:
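A sketch of calls to matrix() that would produce the three matrices printed below, assuming the values come from 1:10 and the built-in letters vector; the names m1, m2 and m3 are illustrative.

```r
# Filled column by column (the default)
m1 <- matrix(1:10, ncol = 2)
m1

# Filled row by row
m2 <- matrix(1:10, ncol = 2, byrow = TRUE)
m2

# A character matrix built from the first ten lower-case letters
m3 <- matrix(letters[1:10], ncol = 2)
m3
```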
##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10
##      [,1] [,2]
## [1,] "a"  "f"
## [2,] "b"  "g"
## [3,] "c"  "h"
## [4,] "d"  "i"
## [5,] "e"  "j"
Many R packages come with datasets. The iris dataset is built into R (it is part of the datasets package) and is loaded into your R session with the code below.
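One common way to do this, assuming the standard datasets package that ships with R, is the data() function:

```r
# Load the built-in iris data into the current session
data(iris)
```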
This is a data.frame:
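The class of an object can be checked with class(); a call along these lines would give the output that follows:

```r
# Report the class of the iris object
class(iris)
```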
## [1] "data.frame"
The code below uses the head() function to print out the first 6 rows and the dim() function to tell us the dimensions of iris (the number of rows and columns).
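A sketch of the two calls, relying on the default of 6 rows for head():

```r
# First 6 rows of the data frame
head(iris)

# Number of rows and columns
dim(iris)
```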
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## [1] 150   5
The str() function can be used to show the structure of iris, including the type of each attribute (column, field):
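The call itself is short; a sketch that would print a structure listing like the one below:

```r
# Compact display of the structure: one line per column
str(iris)
```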
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Here we can see that 4 of the attributes are numeric and the other is a factor (a categorical variable whose possible values are stored as a fixed set of levels).
The summary() function is also very useful and shows a summary of each individual attribute (column) in iris: the quartiles and mean for numeric columns, and a count for each level of a factor.
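A sketch of the call that would produce the table below:

```r
# Per-column summaries: quartiles and mean for numeric columns,
# counts per level for the factor column
summary(iris)
```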
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
The main R graphics function is plot(), and when it is applied to a data frame it draws a scatterplot for every pair of columns, showing how the attribute values relate to one another (pairs() gives the same display for a matrix). There are various other helpful forms of graphical summary. The scatterplot matrix shown in Figure 2.2 is of the first 4 fields (columns) in iris. Note how the inclusion of upper.panel=panel.smooth causes lowess curves to be added to the upper panels of Figure 2.2.
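A sketch of a call that would draw a plot like this, assuming the first 4 columns are selected by position:

```r
# Scatterplot matrix of the four numeric columns;
# panel.smooth adds a lowess curve to each upper panel
plot(iris[, 1:4], upper.panel = panel.smooth)
```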

Figure 2.2: A plot of the numeric variables in the iris data.
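The individual data types can also be investigated using the sapply() function. This applies a function to each column of a data frame (for a matrix, sapply() works element by element, so apply() with MARGIN = 2 is the column-wise equivalent):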
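For example, applying class() to every column of iris; the object name col_types is illustrative:

```r
# Apply class() to each column and collect the results
# in a named character vector
col_types <- sapply(iris, class)
col_types
```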
A key property of a data.frame is that its columns can be vectors of any type. It is effectively a list (group) of column vectors, all of equal length.
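The list-like nature of a data frame can be checked directly. This is a small sketch using iris; the object name col_lengths is illustrative.

```r
# A data frame is stored as a list with one element per column
is.list(iris)
length(iris)

# Every column has the same number of elements
col_lengths <- sapply(iris, length)
col_lengths

# Columns can be extracted with list-style indexing
iris[["Species"]][1:3]
```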
Further Data types
There are many more data types in R. Chapter 1 in Comber and Brunsdon (2021) provides a brief introduction to some of the important ones, and Chapter 2 in Brunsdon and Comber (2018) provides a comprehensive overview with worked examples and exercises.