Fix NULL levels problem in R: stringsAsFactors

26 May 2020, 1781 words

You have read in your data and are ready to work with a factor column. However, when you check the levels of that column, you get NULL:

> caff = read.table(file="caffeine.txt", header = T)
> caff$Dose
 [1] "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Low"  "Low"  "Low"  "Low"  "Low" 
[16] "Low"  "Low"  "Low"  "Low"  "Low"  "High" "High" "High" "High" "High" "High" "High" "High" "High" "High"
> levels(caff$Dose)
NULL

The most common cause of this problem is that your factor column (caff$Dose in the example) has the wrong data type. To check, use the str() command:

> str(caff)
'data.frame':	30 obs. of  2 variables:
 $ Taps: int  242 245 244 248 247 248 242 244 246 242 ...
 $ Dose: chr  "Zero" "Zero" "Zero" "Zero" ...

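Note that the Dose column’s data type is chr (character), not factor. levels() only returns something for objects that carry a levels attribute, such as factors; for a plain character vector it returns NULL.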
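A minimal sketch, independent of caffeine.txt, shows the difference: the same values stored as a character vector and as a factor (the dose_chr and dose_fac names are made up for illustration).

```r
# A character vector has no "levels" attribute, so levels() returns NULL
dose_chr <- c("Zero", "Low", "High", "Zero")
class(dose_chr)    # "character"
levels(dose_chr)   # NULL

# The same values converted to a factor carry levels (sorted alphabetically)
dose_fac <- factor(dose_chr)
class(dose_fac)    # "factor"
levels(dose_fac)   # "High" "Low"  "Zero"
```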

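As far as I knew, R’s default behaviour when reading in data was to convert string columns to factors. Yet on a new environment I recently set up, it did the opposite. The reason is R 4.0.0 (released in April 2020), which changed the default of the stringsAsFactors argument of read.table(), data.frame() and related functions from TRUE to FALSE, so string columns now stay as chr unless you ask otherwise.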
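Since the default changed in R 4.0.0, the same read.table() call gives a different column type depending on the R version. The sketch below uses a few made-up inline rows (passed through read.table’s text argument) as a stand-in for caffeine.txt, and shows that setting stringsAsFactors explicitly removes the version dependence.

```r
# Made-up inline rows standing in for caffeine.txt
txt <- "Taps Dose
242 Zero
245 Low
248 High"

# TRUE on R 4.0.0 or later, where stringsAsFactors defaults to FALSE
getRversion() >= "4.0.0"

# Relying on the default: "character" on R >= 4.0.0, "factor" on older R
caff_default <- read.table(text = txt, header = TRUE)
class(caff_default$Dose)

# Being explicit gives a factor on every R version
caff_explicit <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
class(caff_explicit$Dose)   # "factor"
```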

You can apply the fix at three different levels:

  1. Fix existing data:

    caff$Dose <- as.factor(caff$Dose)
    
  2. Fix it when reading in the data:

    caff = read.table(file="caffeine.txt", header = TRUE, stringsAsFactors = TRUE)
    
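  3. Change the default at the start of the R session, or in your R profile (.Rprofile). Be aware that this global option is deprecated as of R 4.0.0, which warns when you set it, so later versions may not honour it: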

    options(stringsAsFactors = TRUE)
    

Where to make the change depends on what fits the data you work with best. Some people prefer to keep strings as chr rather than factor.
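The first two fixes can be compared on a few made-up inline rows standing in for caffeine.txt. One difference worth knowing: as.factor() converts only the column you name, while stringsAsFactors = TRUE converts every string column. The to_factors() helper below is a hypothetical name of mine, not a base R function; it is one way to get the all-columns behaviour on data you have already read in.

```r
# Made-up inline rows standing in for caffeine.txt
txt <- "Taps Dose
242 Zero
245 Low
248 High"

# Fix 1: convert an existing character column
caff1 <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
caff1$Dose <- as.factor(caff1$Dose)

# Fix 2: convert at read time
caff2 <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Both give the same levels and the same integer codes
levels(caff1$Dose)
levels(caff2$Dose)

# Hypothetical helper: convert every character column of a data frame at once
to_factors <- function(df) {
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  df
}
caff3 <- to_factors(read.table(text = txt, header = TRUE, stringsAsFactors = FALSE))
str(caff3)
```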

Whichever fix you choose, str() should now show Dose as a factor:

> str(caff)
'data.frame':	30 obs. of  2 variables:
 $ Taps: int  242 245 244 248 247 248 242 244 246 242 ...
 $ Dose: Factor w/ 3 levels "High","Low","Zero": 3 3 3 3 3 3 3 3 3 3 ...
> levels(caff$Dose)
[1] "High" "Low"  "Zero"
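The levels above are sorted alphabetically, which puts High before Low and Zero. If you would rather have them in dose order, one option (a suggestion beyond the original fix) is to pass the levels to factor() explicitly. The sketch uses a made-up vector rather than caffeine.txt.

```r
dose <- factor(c("Zero", "Low", "High", "Low"))
levels(dose)       # "High" "Low"  "Zero"  (alphabetical default)

# Re-create the factor with the levels listed in the order you want
dose <- factor(dose, levels = c("Zero", "Low", "High"))
levels(dose)       # "Zero" "Low"  "High"
as.integer(dose)   # 1 2 3 2
```

If the order should also carry meaning in comparisons, factor() accepts ordered = TRUE as well.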