Data Types in R

Vectors

A vector is the simplest data structure in R. The R manual defines a vector as “a single entity consisting of a collection of things.” A collection of numbers, for example, is a numeric vector: the first five positive integers form a numeric vector of length 5.

## Returns a vector of 1, 2, 3, 4, 5
c(1, 2, 3, 4, 5)
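
As a small sketch of the definition above, the snippet below builds the same vector and inspects its length and mode. The names `v` and `mixed` are made up for illustration. The second vector shows that a vector holds a single mode, so mixing numbers and text converts every element to character.

```r
# Build the numeric vector from the text (v is an illustrative name)
v <- c(1, 2, 3, 4, 5)

length(v)   # number of elements
mode(v)     # mode shared by all elements

# A vector holds one mode only: mixing numbers and strings
# coerces every element to character
mixed <- c(1, "a", 3)
mode(mixed)
```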

Matrices

A matrix is a collection of data elements arranged in a two-dimensional rectangular layout. All columns in a matrix must have the same mode (all numeric, all character, etc.) and the same length. A matrix is a special case of an array: one with exactly two dimensions (arrays are described below).
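
A minimal sketch of these properties, using base R only. The names `m` and `m_chr` are illustrative. It checks the dimensions and mode of a small matrix, confirms that R treats it as an array, and shows what happens when a character column is added.

```r
# Fill a 2 x 3 matrix column by column (m is an illustrative name)
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)        # number of rows, then number of columns
mode(m)       # one mode shared by every element
is.array(m)   # a matrix is also an array

# Binding a character column onto the matrix forces
# every element to character
m_chr <- cbind(m, c("a", "b"))
mode(m_chr)
```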

Arrays

Arrays are similar to matrices but can have more than two dimensions. For example:

R> X <- array(1:18, dim = c(3, 2, 3))
R> X
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18
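
To illustrate how the extra dimension is used, here is a short sketch that indexes into the same array. Indices are given in the order row, column, slice; leaving an index empty selects everything along that dimension.

```r
# Same array as in the example above
X <- array(1:18, dim = c(3, 2, 3))

dim(X)       # size of each of the three dimensions
X[2, 1, 3]   # row 2, column 1 of the third slice

# Dropping down to one slice gives an ordinary 3 x 2 matrix
slice2 <- X[, , 2]
is.matrix(slice2)
```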

Data Frames

The main difference between a matrix and a data frame is that a data frame can store columns of different modes. As a rule of thumb, use a data frame when the columns (variables) can be expected to be of different types (numeric, character, logical, etc.), and a matrix when all the data are of the same type.
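
The difference can be seen in a small sketch. The data below are invented purely for illustration, and `stringsAsFactors = FALSE` is set explicitly so the result is the same across R versions.

```r
# Hypothetical data: the names and values are made up for illustration
df <- data.frame(
  name    = c("Ann", "Bob", "Cy"),
  age     = c(31, 45, 27),
  default = c(FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

sapply(df, class)   # each column keeps its own type

# Forcing the same data into a matrix coerces every
# element to a single mode (character here)
m <- as.matrix(df)
mode(m)
```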

Lists

A list is an ordered collection of data of arbitrary types. You can create a list of vectors, matrices, or even other lists.
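
For instance, the following sketch stores a vector, a matrix, and a nested list in one list and then pulls values back out. All component names are illustrative.

```r
# A list mixing a vector, a matrix, and another list
# (all names here are illustrative)
lst <- list(
  numbers = c(1, 2, 3),
  grid    = matrix(1:4, nrow = 2),
  nested  = list(label = "inner", flag = TRUE)
)

length(lst)           # number of top-level components
lst$numbers           # access a component by name
lst[["grid"]][2, 1]   # index into the matrix stored in the list
lst$nested$label      # reach into the nested list
```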

Factors

Qualitative data (categorical variables) that can assume only a discrete set of values are represented in R by factors. In a factor, the qualitative values are stored as integer codes, and the ‘link’ between these codes and the corresponding original categories is stored as the factor's ‘levels’.

## credit$Default holds several hundred entries: "True" if there was a credit default, "False" if there was none
## factor() stores the entries as integer codes (1 and 2, not 0 and 1) and treats them as a nominal variable
credit$Default <- factor(credit$Default)
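
Because the `credit` data set is not shown here, the sketch below uses a short made-up vector, `default`, as a stand-in for `credit$Default`. It shows the two parts of a factor: the levels and the underlying integer codes, which start at 1.

```r
# Hypothetical stand-in for credit$Default
default <- c("True", "False", "False", "True", "False")
f <- factor(default)

levels(f)       # the original categories, sorted alphabetically
as.integer(f)   # the underlying integer codes, which start at 1
table(f)        # counts per category
```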
