
Rep values from a data frame to another data frame. apply? sapply?

开发者 https://www.devze.com 2023-03-31 07:56 Source: Internet

I have the following data frame

data<-data.frame(ID=c("a", "b", "c", "d"), zeros=c(3,2,5,4), ones=c(1,1,2,1))


   ID zeros ones
1  a     3    1
2  b     2    1
3  c     5    2
4  d     4    1

and I wish to create another data frame with 2 columns:

First column (id): the ID is repeated (zeros + ones) times. Second column (value) should be c(rep(0, zeros), rep(1, ones)).

so that the result would be

    id value
1   a  0
2   a  0
3   a  0
4   a  1
5   b  0
6   b  0
7   b  1
8   c  0
9   c  0
10  c  0
11  c  0
12  c  0
13  c  1
14  c  1
15  d  0
16  d  0
17  d  0
18  d  0
19  d  1

I tried data.frame(id=(rep(data$ID, (data$zeros+data$ones))), value=c(rep(0, data$zeros), rep(1, data$ones))) but it doesn't work. Any ideas? Thank you in advance.
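
The attempt above most likely fails at rep(0, data$zeros): rep() is given a single value but four repeat counts, which it rejects. A minimal sketch of one way around that, building the 0/1 run for each row with Map() and flattening it. The names dat, attempt, value and result are illustrative, not from the original post.

```r
dat <- data.frame(ID = c("a", "b", "c", "d"), zeros = c(3, 2, 5, 4), ones = c(1, 1, 2, 1))

# One value paired with four counts: rep() signals an error, captured here as text
attempt <- tryCatch(rep(0, dat$zeros), error = function(e) conditionMessage(e))

# Build the run of zeros and ones for each row separately, then flatten the list
value <- unlist(Map(function(z, o) c(rep(0, z), rep(1, o)), dat$zeros, dat$ones))

result <- data.frame(id = rep(dat$ID, times = dat$zeros + dat$ones), value = value)
```

Printing attempt shows the error message from rep(), and result holds the 19 rows requested in the question.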


This is perhaps overkill, using ddply from the plyr package, but it's the first thing that came to me:

library(plyr)
ddply(dat, .(ID), function(x) data.frame(value = rep(c(0, 1), times = c(x$zeros, x$ones))))

Oh and I changed the name of your data frame to dat to avoid a bad habit (data is the name of an oft used function).


Here's a base R solution. I prefer the overkill of plyr myself:

dat <- data.frame(ID = letters[1:4], zeros = c(3,2,5,4), ones = c(1,1,2,1))

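do.call("rbind",
    apply(dat, 1, function(x)
        # apply() passes each row as a character vector, so convert the counts back to integers
        data.frame(id = x[["ID"]], value = rep(0:1, times = as.integer(x[c("zeros", "ones")])))
    )
)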


Since you've already got a base R solution for the first column, this is one for your second column:

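counts <- as.vector(t(as.matrix(data[, 2:3])))  # notice the t: it interleaves zeros and ones row by row
what <- rep(c(0, 1), nrow(data))                # 0, 1, 0, 1, ... one pair per row
value <- rep(what, counts)                      # the value column (counts avoids masking base lengths())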

Edit: changed a minor thing above and tested it. It works now.
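
Combining that second column with the first column from the question, the whole data frame could be assembled without any apply-style loop. This is a sketch under the assumption that the columns are named as in the question; dat stands in for data, and counts and result are illustrative names.

```r
dat <- data.frame(ID = c("a", "b", "c", "d"), zeros = c(3, 2, 5, 4), ones = c(1, 1, 2, 1))

# Transposing before flattening yields zeros1, ones1, zeros2, ones2, ...
counts <- as.vector(t(as.matrix(dat[, c("zeros", "ones")])))

result <- data.frame(
  id    = rep(dat$ID, times = dat$zeros + dat$ones),           # each ID repeated zeros + ones times
  value = rep(rep(c(0, 1), times = nrow(dat)), times = counts) # alternating runs of 0s and 1s
)
```

Because both columns come from vectorized rep() calls, the row order follows dat and no sorting or rbind step is needed.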


I also prefer the plyr method, but I thought I'd throw in another base R solution that reshapes the data first and then replicates it (also using dat instead of data):

names(dat)[2:3] <- c("times.0", "times.1")
tmp <- reshape(dat, varying=2:3, direction="long")
tmp <- tmp[rep(seq(length=nrow(tmp)),tmp$times),c("ID","time")]
names(tmp) <- c("id","value")
tmp <- tmp[order(tmp$id, tmp$value),]
rownames(tmp) <- NULL

Not as elegant as some of the other base solutions because it requires intermediate storage, but possibly interesting.
