r – Principal Components Analysis: Error in colMeans(x, na.rm = TRUE) : x must be numeric

You can convert a character vector to numeric values by going via a factor: each unique value then gets a unique integer code. In this example there are four values, so the codes are 1 to 4, assigned in sorted (here alphabetical) order of the values:

> d = data.frame(country=c("foo","bar","baz","qux"),x=runif(4),y=runif(4))
> d
  country          x         y
1     foo 0.84435112 0.7022875
2     bar 0.01343424 0.5019794
3     baz 0.09815888 0.5832612
4     qux 0.18397525 0.8049514
> d$country = as.numeric(as.factor(d$country))
> d
  country          x         y
1       3 0.84435112 0.7022875
2       1 0.01343424 0.5019794
3       2 0.09815888 0.5832612
4       4 0.18397525 0.8049514

You can then run prcomp:

> prcomp(d)
Standard deviations:
[1] 1.308665216 0.339983614 0.009141194

               PC1          PC2          PC3
country -0.9858920  0.132948161 -0.101694168
x       -0.1331795 -0.991081523 -0.004541179
y       -0.1013910  0.009066471  0.994805345

Whether this makes sense for your application is up to you, since the integer codes are arbitrary labels rather than measurements. Maybe you just want to drop the first column with prcomp(d[,-1]) and work with the numeric data only, which seems to be what the other answers are trying to achieve.
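As a minimal sketch of the second option, the numeric columns can be selected by type rather than by position. The data frame below is made up for illustration, and the names num_cols and pca are just placeholders:

```r
# Hypothetical data: one character column and two numeric columns.
set.seed(1)
d <- data.frame(country = c("foo", "bar", "baz", "qux"),
                x = runif(4), y = runif(4))

# Flag the columns that are already numeric.
num_cols <- sapply(d, is.numeric)

# Run PCA on those columns only; drop = FALSE keeps a data frame
# even if a single column is selected.
pca <- prcomp(d[, num_cols, drop = FALSE])

# One standard deviation per principal component (two here).
length(pca$sdev)
```

Selecting by type avoids hard-coding the column index, which helps if the character column is not the first one.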

The first column of the data frame is character, so you can move it into the row names instead:

library(tidyverse)  # provides %>%, remove_rownames() and column_to_rownames()
data2 <- data2 %>% remove_rownames %>% column_to_rownames(var = "country")
princ <- prcomp(data2)

Alternatively, in base R:

# Copy the first column into the row names before dropping it
rownames(data2) <- data2[,1]
data2 <- data2[,-1]
princ <- prcomp(data2)

In R, converting a character column to a factor does not make it numeric.
Encoding the column lets a machine learning model handle it mathematically, but the codes are still labels, not numeric data.

Example: if you have a list of names and encode them as numbers, one name may end up with a higher code than another, and the model may treat that difference as meaningful.
That should not happen, because names (text that only labels a specific set) generally should not influence how a model behaves.
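To illustrate how arbitrary these codes are, here is a small sketch in which the same names get different numbers depending only on how the factor levels are ordered. The vector names_vec is invented for the example:

```r
names_vec <- c("foo", "bar", "baz", "qux")

# Default levels are sorted, so "bar" gets code 1 and "foo" gets code 3.
codes_default <- as.numeric(factor(names_vec))

# Same data, levels in order of appearance: "foo" now gets code 1.
codes_appearance <- as.numeric(factor(names_vec, levels = unique(names_vec)))

codes_default      # 3 1 2 4
codes_appearance   # 1 2 3 4
```

Nothing about the data changed between the two encodings, yet a distance-based or variance-based method such as PCA would see different values.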

Also, if you try to work with this data as though it were numeric, you may get the following error:

Error in colMeans(x, na.rm = TRUE) : x must be numeric

The reason for this error is explained above.

To overcome this problem, scale only the numeric columns:

training_set[,2:3] = scale(training_set[,2:3])
test_set[,2:3] = scale(test_set[,2:3])

In the example data set (shown as an image in the original post), columns 1 and 4 hold encoded categorical data and cannot be treated as numeric. Columns 2 and 3 contained numeric data from the start, so we can run our model on that part of the data only. The code above shows how to select it: all rows, and columns 2 and 3.
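The sketch below reproduces the situation on a small made-up training_set whose layout (categorical columns 1 and 4, numeric columns 2 and 3) is assumed from the description above. The column names and values are hypothetical:

```r
# Hypothetical training set: columns 1 and 4 are categorical, 2 and 3 numeric.
training_set <- data.frame(
  country   = factor(c("France", "Spain", "Germany", "Spain")),
  age       = c(44, 27, 30, 38),
  salary    = c(72000, 48000, 54000, 61000),
  purchased = factor(c("No", "Yes", "No", "Yes"))
)

# Scaling the whole data frame fails because of the factor columns;
# tryCatch captures the error message instead of stopping the script.
err <- tryCatch(scale(training_set), error = function(e) conditionMessage(e))

# Scaling only the numeric columns works.
training_set[, 2:3] <- scale(training_set[, 2:3])

err                            # the "must be numeric" message
colMeans(training_set[, 2:3])  # approximately 0 after centering
```

The same column selection applies to prcomp, which centers its input internally and fails in the same way.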
