You have read in your data and are ready to work with a factor column in it. However, when you check the levels of that column, you get a NULL result:
> caff = read.table(file="caffeine.txt", header = T)
> caff$Dose
[1] "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Zero" "Low" "Low" "Low" "Low" "Low"
[16] "Low" "Low" "Low" "Low" "Low" "High" "High" "High" "High" "High" "High" "High" "High" "High" "High"
> levels(caff$Dose)
NULL
The most common cause of this problem is that your factor column (caff$Dose in the example) has the wrong data type. To verify this, use the str command:
> str(caff)
'data.frame': 30 obs. of 2 variables:
$ Taps: int 242 245 244 248 247 248 242 244 246 242 ...
$ Dose: chr "Zero" "Zero" "Zero" "Zero" ...
Note that the Dose column’s data type is chr, not factor.
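If you only care about one column, class() or is.factor() gives a quicker check than str. The sketch below uses a few made-up rows passed inline (they stand in for caffeine.txt and are not the real data), and sets stringsAsFactors = FALSE explicitly so it behaves the same on any R version:

```r
# Hypothetical inline data standing in for caffeine.txt
txt <- "Taps Dose
242 Zero
248 Low
246 High"

demo <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

class(demo$Dose)      # "character"
is.factor(demo$Dose)  # FALSE
levels(demo$Dose)     # NULL: a character vector has no levels attribute
```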
As far as I know, R’s default behaviour when reading in data used to be to convert string columns to factor. That changed in R 4.0.0, where the default became stringsAsFactors = FALSE, which would explain why a new environment I recently set up does the opposite.
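The fix can be applied at different levels:
- Fix the existing data:
  caff$Dose <- as.factor(caff$Dose)
- Fix it when reading in the data:
  caff = read.table(file="caffeine.txt", header = T, stringsAsFactors = T)
- Change the default at the beginning of the R session, or in your R profile (as far as I know this only helps on older R versions; from R 4.1.0 onward the option is deprecated and no longer sets the default for read.table):
  options(stringsAsFactors = TRUE)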
Where to put the change really depends on what best fits the data you work with. Some people may prefer chr rather than factor.
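If you want only some columns as factors and the rest left alone, a middle ground is the colClasses argument of read.table. A minimal sketch, again with made-up inline rows rather than the real file:

```r
# Hypothetical inline data standing in for caffeine.txt
txt <- "Taps Dose
242 Zero
248 Low
246 High"

# Convert only Dose to a factor; Taps is still detected automatically
demo <- read.table(text = txt, header = TRUE,
                   colClasses = c(Dose = "factor"))

is.factor(demo$Dose)  # TRUE
class(demo$Taps)      # "integer"
```

Naming the column in colClasses keeps the choice next to the read call, so it does not depend on session-wide settings.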
Anyway, if you now run the str command again, the output should look like this:
> str(caff)
'data.frame': 30 obs. of 2 variables:
$ Taps: int 242 245 244 248 247 248 242 244 246 242 ...
$ Dose: Factor w/ 3 levels "High","Low","Zero": 3 3 3 3 3 3 3 3 3 3 ...
> levels(caff$Dose)
[1] "High" "Low" "Zero"
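Note that the levels come out in alphabetical order, so High sorts before Low and Zero. If the order matters to you (for plots or model output, say), you can set it explicitly with factor() instead of as.factor(). A small sketch with a made-up vector:

```r
# Hypothetical values, not the real caffeine data
dose <- c("Zero", "Low", "High", "Low")

# Default: levels are sorted alphabetically
default_levels <- levels(as.factor(dose))

# Explicit order via the levels argument
dose_f <- factor(dose, levels = c("Zero", "Low", "High"))

levels(dose_f)      # "Zero" "Low" "High"
as.integer(dose_f)  # 1 2 3 2
```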