A vector is the simplest type of data structure in R. The R manual defines a vector as “a single entity consisting of a collection of things.” A collection of numbers, for example, is a numeric vector: the first five positive integers form a numeric vector of length 5.
##Returns a vector of 1, 2, 3, 4, 5
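The command that produced this vector is not shown, so the following is only a minimal sketch of two common ways to build it. The variable names `v` and `w` are illustrative and not from the original text.

```r
# Two ways to build a vector holding 1, 2, 3, 4, 5
v <- 1:5                 # integer sequence created with the colon operator
w <- c(1, 2, 3, 4, 5)    # same values combined with c(), stored as doubles

length(v)       # number of elements in the vector
is.numeric(v)   # both integer and double vectors count as numeric
v * 2           # arithmetic is applied element by element
```

Both forms give a numeric vector of length 5; the only difference is the storage type (integer versus double).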
A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. All columns in a matrix must have the same mode (all numeric, all character, etc.) and the same length. A matrix is a special case of an array, one with exactly two dimensions (arrays are described below).
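As a small illustration of the points above, the sketch below builds a matrix and shows what happens when values of different modes are combined. The names `m` and `mixed` are made up for this example.

```r
# Build a 2 x 3 matrix; R fills it column by column by default
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)        # dimensions: rows, then columns
m[2, 3]       # element in row 2, column 3
is.array(m)   # a matrix is a two-dimensional array

# Mixing numbers and characters forces a single common mode
mixed <- cbind(1:2, c("a", "b"))
mode(mixed)   # the numbers have been converted to character strings
```

Because a matrix can hold only one mode, the numeric column in `mixed` is silently converted to character.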
Arrays are similar to matrices but can have more than two dimensions. For example:
R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18
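To show how the three dimensions of the array X are addressed, here is a short sketch that recreates X and extracts a single element and a single slice. The name `slice2` is an illustrative choice.

```r
# Recreate the 3 x 2 x 3 array from the example
X <- array(1:18, dim = c(3, 2, 3))

dim(X)        # the three dimension sizes
X[2, 1, 3]    # row 2, column 1 of the third slice

# Leaving an index empty selects everything along that dimension
slice2 <- X[, , 2]
is.matrix(slice2)   # a single slice is an ordinary matrix
```

Indexing with one empty position per dimension, as in `X[, , 2]`, drops the array back down to a two-dimensional matrix.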
The main difference between a matrix and a data frame (class data.frame) is that a data frame can store columns of different modes. As a rule of thumb, use a data frame when the columns (variables) can be expected to be of different types (numeric, character, logical, etc.), and use a matrix when all the data are of the same type.
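The contrast can be sketched with a small made-up data frame. The column names `id`, `name`, and `paid` are hypothetical, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version.

```r
# A data frame keeps a separate type for each column
df <- data.frame(
  id   = 1:3,
  name = c("a", "b", "c"),
  paid = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

sapply(df, class)   # one class per column: integer, character, logical

# Converting to a matrix forces every value into one common mode
as_mat <- as.matrix(df)
mode(as_mat)        # everything has become character
```

The conversion with `as.matrix()` shows why mixed-type data belongs in a data frame: the matrix can only keep the values by turning them all into strings.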
A list is an ordered collection of data of arbitrary types. You can create a list of vectors, matrices, or even other lists.
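A minimal sketch of such a list follows, holding a vector, a matrix, and another list. All component names are invented for illustration.

```r
# A list whose components have different types and shapes
lst <- list(
  numbers = 1:3,
  mat     = matrix(1:4, nrow = 2),
  nested  = list(label = "inner", flag = TRUE)
)

length(lst)              # number of top-level components
lst$numbers              # access a component by name
lst[[2]][1, 2]           # access by position, then index into the matrix
lst[["nested"]]$label    # reach into the nested list
```

Double brackets `[[ ]]` (or `$`) return the component itself, whereas single brackets `[ ]` would return a shorter list.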
Qualitative data (categorical variables) that can assume only a discrete set of values are represented in R by factors. In a factor, the qualitative values are stored as integer codes, and the ‘link’ between these codes and the corresponding original categories is stored as the factor's ‘levels’.
##Several hundred entries with “True” if there was a credit default, and “False” if there was no credit default
##Stores the entries as integer codes (1 for “False”, 2 for “True”), and now treats them as a nominal variable
credit$Default <- factor(credit$Default)
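The credit data set itself is not shown, so the sketch below uses a tiny hypothetical stand-in with four entries to illustrate what the conversion does. Only the column name `Default` comes from the text.

```r
# Hypothetical stand-in for the credit data (the real data are not shown)
credit <- data.frame(
  Default = c("True", "False", "False", "True"),
  stringsAsFactors = FALSE
)

credit$Default <- factor(credit$Default)

levels(credit$Default)       # the categories, sorted alphabetically
as.integer(credit$Default)   # the underlying integer codes
table(credit$Default)        # counts per category
```

Note that the integer codes start at 1 and follow the order of the levels, so with default settings “False” is coded 1 and “True” is coded 2.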