Help on subsetting a dataframe

Developer https://www.devze.com 2023-03-27 07:53 Source: Internet
I am using %in% for subsetting and I came across a strange result.

> my.data[my.data$V3 %in% seq(200,210,.01),]
        V1     V2        V3         V4       V5      V6         V7
56     470   48.7    209.73        yes     26.3      54        470

That was correct. But when I widen the range... row 56 just disappears

> my.data[my.data$V3 %in% seq(150,210,.01),]
        V1     V2        V3         V4       V5      V6         V7
51     458   48.7    156.19        yes     28.2      58        458
67     511   30.5    150.54        yes     26.1      86        511
73     535   40.6    178.76        yes     29.5      73        535

Can you tell me what's wrong? Is there a better way to subset the dataframe?

Here is its structure

> str(my.data)
'data.frame':   91 obs. of  7 variables:
 $ V1: Factor w/ 91 levels "100","10004",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ V2: num  44.6 22.3 30.4 38.6 15.2 18.3 16.3 12.2 36.7 12.2 ...
 $ V3: num  110.83 25.03 17.17 57.23 2.18 ...
 $ V4: Factor w/ 2 levels "no","yes": 1 2 2 2 1 1 1 1 1 1 ...
 $ V5: num  22.3 30.5 24.4 25.5 4.1 28.4 7.9 5.1 24 12.2 ...
 $ V6: int  50 137 80 66 27 155 48 42 65 100 ...
 $ V7: chr  "" "10004" "10005" "10012" ...


Oops. You are trying to do exact matching on floating-point numbers, and a computer can't represent most decimal fractions exactly.

> any(209.73 == seq(200,210,.01))
[1] TRUE
> any(209.73 == seq(150,210,.01))
[1] FALSE
> any(209.73 == zapsmall(seq(150,210,.01)))
[1] TRUE

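The discrepancy arises because, in the second sequence, the element that should be 209.73 is not stored as exactly 209.73. It differs by a tiny floating-point rounding error, so == and %in% fail to match it. This is something you have to keep in mind whenever you compute with floating-point numbers.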
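To make the rounding issue concrete, here is a minimal sketch of comparing within a tolerance instead of testing exact equality. The helper name near_in and the tolerance 1e-8 are assumptions chosen for illustration; they are not part of the original post.

```r
# Classic illustration: 0.1 + 0.2 is not stored as exactly 0.3
x <- 0.1 + 0.2
exact_match <- (x == 0.3)                   # exact comparison of doubles
near_match  <- isTRUE(all.equal(x, 0.3))    # comparison within a tolerance

# Hypothetical helper (not from the original post):
# is `value` within `tol` of at least one element of `grid`?
near_in <- function(value, grid, tol = 1e-8) {
  any(abs(grid - value) < tol)
}

# The value that went missing from the wider sequence in the question
found <- near_in(209.73, seq(150, 210, 0.01))

print(c(exact_match = exact_match, near_match = near_match, found = found))
```

all.equal() returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE() here.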

This is covered in many places on the interweb, but in relation to R, see point 7.31 in the R FAQ.

Anyway, that said, you are going about the problem incorrectly. You want to use proper numeric operators:

my.data[my.data$V3 >= 150 & my.data$V3 <= 210, ]
## or
subset(my.data, V3 >= 150 & V3 <= 210)
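As a quick check of the range approach, here is a small sketch on made-up data. The data frame toy and its id column are invented for illustration; only the V3 values are borrowed from the rows shown in the question.

```r
# Made-up data standing in for my.data; only the V3 column matters here
toy <- data.frame(
  id = c(51, 56, 67, 73, 80),
  V3 = c(156.19, 209.73, 150.54, 178.76, 110.83)
)

# Range test with comparison operators: no exact equality is involved
in_range <- toy[toy$V3 >= 150 & toy$V3 <= 210, ]

# The same filter written with subset()
in_range2 <- subset(toy, V3 >= 150 & V3 <= 210)

print(in_range)
```

Unlike matching against a 0.01 grid, the range test also keeps values that do not fall on the grid at all, such as 209.735. One difference between the two forms: if V3 contains NA, the bracket version returns rows of NA for those entries, while subset() drops them.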