Subsetting data frame using variable with same name as column

Source: https://www.devze.com, 2023-04-07 19:51 (reposted from the web)
I have a data frame and I'm trying to run a subset on it. In my data frame, I have a column called "start" and I'm trying to do this:

sub <- subset(data,data$start==14)

and I correctly get a subset of all the rows where start=14.

But, when I do this:

for(start in seq(1,20,by=1)) {
   sub <- subset(data,data$start==start)
   print(sub)
}

it does not correctly find the subsets. It just prints the entire data frame.

Why is this and how do I fix it?


You can also specify the environment you're working with:

x<-data.frame(
  start=sample(3,20,replace=TRUE),
  someValue=runif(20))

env<-environment()
start<-3
cat("\nDefault scope:")
print(subset(x,start==start)) # all entries, as start==start is evaluated to TRUE

cat("\nSpecific environment:")
print(subset(x,start==get('start',env)))  # the second start is looked up in the environment captured in env above. Equivalent to subset(x,start==3)


Fixing it is easy. Just rename either your for loop counter or your data frame column to something other than start.

This happens because subset evaluates the expression data$start == start inside the data frame data, where column names are looked up first. It finds the column start and stops there, so it never reaches the variable start you defined in the for loop.

Perhaps a better insight into why R gets confused here is to note that when using subset you don't in general need to refer to variables using data$. So imagine telling R:

subset(data,start == start)

R evaluates both occurrences of start inside data, so it compares the column with itself and gets back a vector of all TRUEs.
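
As a rough sketch of the rename fix described above, here is a small example. The data frame contents and the counter name s are invented for illustration and are not from the question:

```r
# Hypothetical example data; the values are invented for illustration.
data <- data.frame(start = c(1, 2, 2, 3), value = c(10, 20, 30, 40))

# With the clashing name, both sides resolve to the column, so every row is kept.
start <- 2
all_rows <- subset(data, start == start)
print(nrow(all_rows))

# With a counter whose name is not a column, subset finds it in the calling environment.
counts <- integer(0)
for (s in 1:3) {
  sub <- subset(data, start == s)
  counts <- c(counts, nrow(sub))
}
print(counts)
```

Any counter name works as long as it does not match a column of the data frame.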


Another approach is to use bracket subsetting rather than the subset function.

for(start in seq(1,20,by=1)) {
   sub <- data[data$start==start,]
   print(sub)
}

subset has non-standard evaluation rules, which is leading to the scoping problem you are seeing (to which start are you referring?). If there are (or may be) NA's in data$start, you probably need

sub <- data[!is.na(data$start) & data$start==start,]
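
To see why the NA guard matters, here is a small sketch with invented data. A bare logical index turns each NA comparison into a row of NAs, while the guarded form drops those rows. The which() variant is an alternative that is not mentioned above; it is included only for comparison:

```r
# Hypothetical data with a missing value in start.
data <- data.frame(start = c(1, NA, 2, 1), value = c(10, 20, 30, 40))
start <- 1

# NA == 1 is NA, and an NA in a logical row index yields a row filled with NAs.
naive <- data[data$start == start, ]

# Guarding with !is.na() keeps only the rows that really match.
guarded <- data[!is.na(data$start) & data$start == start, ]

# which() returns only the positions that are TRUE, so NA comparisons are dropped too.
via_which <- data[which(data$start == start), ]

print(naive)
print(guarded)
```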

Note this warning from the subset help page:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
