Aggregate data in one column based on values in another column

Developer https://www.devze.com 2023-04-07 12:42 Source: web

I know there is an easy way to do this, but I can't figure it out.

I have a data frame in my R script that looks something like this:

A      B    C
1.2    4    8
2.3    4    9
2.3    6    0
1.2    3    3
3.4    2    1 
1.2    5    1

Note that A, B, and C are column names. And I'm trying to get variables like this:

sum1 <- [the sum of all B values such that A is 1.2]
num1 <- [the number of times A is 1.2]

Any easy way to do this? I basically want to end up with a data frame that looks like this:

    A     num     totalB
   1.2    3       12
   etc    etc     etc

Where "num" is the number of times that particular A value appeared, and "totalB" is the sum of the B values given the A value.

I'd use aggregate to get the two aggregates and then merge them into a single data frame:

> df
    A B C
1 1.2 4 8
2 2.3 4 9
3 2.3 6 0
4 1.2 3 3
5 3.4 2 1
6 1.2 5 1

> num <- aggregate(B~A,df,length)
> names(num)[2] <- 'num'

> totalB <- aggregate(B~A,df,sum)
> names(totalB)[2] <- 'totalB'

> merge(num,totalB)
    A num totalB
1 1.2   3     12
2 2.3   2     10
3 3.4   1      2
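The two aggregate calls above can also be folded into one. This is only a sketch of an alternative, not part of the original answer: it assumes the sample data from the question and relies on aggregate() storing a vector-valued FUN result as a matrix column, which do.call(data.frame, ...) then flattens.

```r
# Sample data from the question
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# FUN returns a named vector, so aggregate() keeps both results
# in a single matrix column called B
res <- aggregate(B ~ A, data = df,
                 FUN = function(x) c(num = length(x), totalB = sum(x)))

# Flatten the matrix column into ordinary columns, then rename them
res <- do.call(data.frame, res)
names(res) <- c("A", "num", "totalB")
res
```

This avoids the merge step, at the cost of the extra flattening line.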

In dplyr:

library(tidyverse)
A <- c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2)
B <- c(4, 4, 6, 3, 2, 5)
C <- c(8, 9, 0, 3, 1, 1)

df <- tibble(A, B, C)  # data_frame() is deprecated in favor of tibble()

df %>%
    group_by(A) %>% 
    summarise(num = n(),
              totalB = sum(B))

Here is a solution using the plyr package:

plyr::ddply(df, plyr::.(A), plyr::summarize, num = length(A), totalB = sum(B))

Here is a solution using data.table, for memory and time efficiency:

library(data.table)
DT <- as.data.table(df)
DT[, list(totalB = sum(B), num = .N), by = A]

To aggregate over only the rows where C == 1 (as per the comment on @aix's answer):

DT[C==1, list(totalB = sum(B), num = .N), by = A]
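For comparison, the same row filter can probably be expressed in base R through the subset argument of the formula method of aggregate(). This is a sketch under the assumption that the data is the sample from the question; the name filtered is made up for illustration.

```r
# Sample data from the question
df <- data.frame(A = c(1.2, 2.3, 2.3, 1.2, 3.4, 1.2),
                 B = c(4, 4, 6, 3, 2, 5),
                 C = c(8, 9, 0, 3, 1, 1))

# Keep only rows where C == 1, then sum B within each value of A.
# Groups with no matching rows (here A = 2.3) do not appear in the result.
filtered <- aggregate(B ~ A, data = df, FUN = sum, subset = C == 1)
names(filtered)[2] <- "totalB"
filtered
```

Swapping sum for length in the same call gives the per-group row count.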
