Random Forest in R: New factor levels not present in the training data

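Fellow newbie here; I've been toying around with the Titanic data set lately too. I don't think it makes sense to treat the Parch variable as a factor, so one option is to make it numeric. That should avoid the error altogether, because the "New factor levels" check only applies to factor predictors: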

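train$Parch <- as.numeric(as.character(train$Parch))  # as.character() first, in case Parch is already a factor
test$Parch  <- as.numeric(as.character(test$Parch))   # convert the test set the same way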
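
One gotcha with the numeric route: if Parch has already been read in as a factor, as.numeric() on its own returns the internal level codes, not the original values. A minimal base-R sketch with made-up values (not the Titanic data) shows the difference:

```r
# Made-up Parch values stored as a factor (NOT the Titanic data).
parch <- factor(c(0, 2, 9, 2))

levels(parch)                     # "0" "2" "9"
as.numeric(parch)                 # 1 2 3 2  (internal level codes, not the values)
as.numeric(as.character(parch))   # 0 2 9 2  (the actual values)
as.numeric(levels(parch))[parch]  # 0 2 9 2  (same result, the idiom from R FAQ 7.10)
```

If Parch is still an int column (as it was in my data), a plain as.numeric() is fine; the as.character() step only matters once it has become a factor.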

If you keep Parch as a factor, the error comes from the test data: it has 2 observations with Parch = 9, a value that never occurs in the training data:

> table(train$Parch)

  0   1   2   3   4   5   6
678 118  80   5   4   5   1

> table(test$Parch)

  0   1   2   3   4   5   6   9
324  52  33   3   2   1   1   2
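
To catch this kind of mismatch before fitting, you can compare the distinct values in the two sets. The sketch below rebuilds only the Parch columns from the counts in the tables above (so it runs without the original CSV files); unseen_values() is a hypothetical helper of my own, and with the real data frames you would call it as unseen_values(train$Parch, test$Parch):

```r
# Rebuild only the Parch columns from the counts in the tables above,
# so the check runs without the original CSV files.
train_parch <- rep(0:6, times = c(678, 118, 80, 5, 4, 5, 1))
test_parch  <- rep(c(0:6, 9), times = c(324, 52, 33, 3, 2, 1, 1, 2))

# Hypothetical helper: values in test_col that never occur in train_col.
# Comparing as character makes it work for numeric and factor columns alike.
unseen_values <- function(train_col, test_col) {
  setdiff(unique(as.character(test_col)), unique(as.character(train_col)))
}

unseen <- unseen_values(train_parch, test_parch)
unseen                                      # "9"
sum(as.character(test_parch) %in% unseen)   # 2 test rows affected
```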

Alternatively, if you need the variable to be a factor, you can add the missing level to the training factor so that both data sets share the same levels:

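train$Parch <- as.factor(train$Parch)  # in my data, Parch is type int
levels(train$Parch)                    # "0" "1" ... "6": 7 levels
levels(train$Parch) <- c(levels(train$Parch), "9")  # append 9 as an extra level
levels(train$Parch)                    # now Parch has 8 levels
table(train$Parch)                     # level 9 is empty (count 0)
test$Parch <- factor(test$Parch, levels = levels(train$Parch))  # give the test set the same levels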
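
If you'd rather not hard-code the 9, a more general variant is to build one level set from the values in both data sets and pass it as levels = to factor() for each. A minimal sketch on small made-up vectors (train_parch and test_parch are hypothetical stand-ins for the two Parch columns); as far as I understand, randomForest remembers the factor levels it saw at training time, so this has to happen before you fit the model:

```r
# Hypothetical stand-ins for train$Parch and test$Parch (made-up values).
train_parch <- c(0L, 1L, 2L, 6L, 1L)
test_parch  <- c(0L, 9L, 2L)

# One shared level set from both data sets, sorted numerically,
# used for both factors so their levels are identical.
all_levels <- sort(union(unique(train_parch), unique(test_parch)))
train_f <- factor(train_parch, levels = all_levels)
test_f  <- factor(test_parch,  levels = all_levels)

identical(levels(train_f), levels(test_f))  # TRUE
levels(train_f)                             # "0" "1" "2" "6" "9"
table(train_f)                              # level "9" has a count of 0
```

Sorting the combined values keeps the levels in numeric order rather than in order of first appearance.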
