
How can I efficiently use R to add summary rows with 0 cases?

Developer https://www.devze.com 2023-04-12 12:32 Source: web
I have a data set that includes cases by year and month. Some months are missing, and I'd like to create rows with a case count of zero for those months.

Here is an example, and my current brute force approach. Thanks for any pointers. Obviously, I'm new at this.

# fake data
library(plyr)
if (exists("FakeData")) rm(FakeData)  # avoids a warning when FakeData does not exist yet
FakeData <- data.frame(DischargeYear=c(rep(2010, 7), rep(2011,7)),
                       DischargeMonth=c(1:7, 3:9),
                       Cases=trunc(rnorm(14, mean=100, sd=20)))

# FakeData is missing data for some year/months
FakeData

# Brute force attempt to add rows with 0 and then total
for (i in 1:12) {
  for (j in 1:length(unique(FakeData$DischargeYear))) {
    FakeData <- rbind(FakeData, data.frame(
      DischargeYear = unique(FakeData$DischargeYear)[j],
      DischargeMonth = i,
      Cases = 0))
  }
}

FakeData <- ddply(FakeData, c("DischargeYear","DischargeMonth"), summarise, Cases=sum(Cases))

# FakeData now has every year/month represented
FakeData


Using your FakeData data frame, try this:

# Create all combinations of months and years
allMonths <- expand.grid(DischargeMonth=1:12, DischargeYear=2010:2011)
# Keep all month-year combinations (all.x=TRUE) and add in 'Cases' from FakeData
allData <- merge(allMonths, FakeData, all.x=TRUE)
# 'allData' contains 'NA' for missing values. Set them to 0.
allData[is.na(allData)] <- 0
# Print results
allData
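
If the years should not be hard-coded, the same idea can be sketched as follows. This is only an illustration under a few assumptions: the years are taken from the data itself, only the Cases column is zero-filled, and the set.seed call and the final ordering step are additions for reproducibility and readability rather than part of the answer above.

```r
# Reproducible copy of the question's fake data (the seed is an assumption)
set.seed(42)
FakeData <- data.frame(
  DischargeYear  = c(rep(2010, 7), rep(2011, 7)),
  DischargeMonth = c(1:7, 3:9),
  Cases          = trunc(rnorm(14, mean = 100, sd = 20))
)

# Build the full grid from the years actually present in the data
allMonths <- expand.grid(
  DischargeMonth = 1:12,
  DischargeYear  = sort(unique(FakeData$DischargeYear))
)

# Left join on the shared columns, then zero-fill only the Cases column
allData <- merge(allMonths, FakeData, all.x = TRUE)
allData$Cases[is.na(allData$Cases)] <- 0

# merge() sorts by month first, so reorder by year, then month
allData <- allData[order(allData$DischargeYear, allData$DischargeMonth), ]
```

Filling only Cases leaves any other columns untouched, which matters if the real data has columns where NA should not become 0.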


Another solution would be to use cast from the reshape package.

require(reshape)
cast(FakeData, DischargeYear + DischargeMonth ~ ., add.missing = TRUE, fill = 0)

Note that this only adds 0 for the combinations missing from the data: months 8 and 9 for year 2010 and months 1 and 2 for year 2011. To ensure that you have all months 1:12, change the definition of DischargeMonth to be a factor with levels 1:12 before calling cast, using

FakeData = transform(FakeData, 
   DischargeMonth = factor(DischargeMonth, levels = 1:12))
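
The factor-levels idea also works without reshape. The sketch below swaps in base R's xtabs in place of cast, so it is a different technique from the one in this answer; the set.seed call and the name filled are assumptions made for the illustration.

```r
# Reproducible copy of the question's fake data (the seed is an assumption)
set.seed(1)
FakeData <- data.frame(
  DischargeYear  = c(rep(2010, 7), rep(2011, 7)),
  DischargeMonth = c(1:7, 3:9),
  Cases          = trunc(rnorm(14, mean = 100, sd = 20))
)

# Declare all 12 months as factor levels so empty months are kept
FakeData$DischargeMonth <- factor(FakeData$DischargeMonth, levels = 1:12)

# xtabs sums Cases per year/month cell; empty cells come out as 0
counts <- xtabs(Cases ~ DischargeYear + DischargeMonth, data = FakeData)

# Back to long format, one row per year/month combination
filled <- as.data.frame(counts, responseName = "Cases",
                        stringsAsFactors = FALSE)
filled$DischargeYear  <- as.integer(filled$DischargeYear)
filled$DischargeMonth <- as.integer(filled$DischargeMonth)
filled <- filled[order(filled$DischargeYear, filled$DischargeMonth), ]
```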


Here is a zoo solution. zoo FAQ #13 discusses forming the grid, g. We also convert the year and month to a "yearmon" class variable, which is represented as a year plus a fractional month (0 = Jan, 1/12 = Feb, 2/12 = Mar, etc.)

library(zoo)

# create zoo object with yearmon index
DF <- FakeData
z <- zoo(DF[,3], yearmon(DF[,1] + (DF[,2]-1)/12))

# create grid g. Merge zero width zoo object based on it.  Fill NAs with 0s.
g <- seq(start(z), end(z), 1/12)
z0 <- na.fill(merge(z, zoo(, g)), fill = 0)

which gives

> z0
Jan 2010 Feb 2010 Mar 2010 Apr 2010 May 2010 Jun 2010 
     149      113      110       99      110       96 
Jul 2010 Aug 2010 Sep 2010 Oct 2010 Nov 2010 Dec 2010 
     108        0        0        0        0        0 
Jan 2011 Feb 2011 Mar 2011 Apr 2011 May 2011 Jun 2011 
       0        0       91       72      119      130 
Jul 2011 Aug 2011 Sep 2011 
      93       74      112 

or converting to "ts" class:

> as.ts(z0)
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2010 149 113 110  99 110  96 108   0   0   0   0   0
2011   0   0  91  72 119 130  93  74 112

Note that if z is a zoo object then coredata(z) is its data and time(z) are its index values.
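
To get back to a data frame shaped like the original, coredata and time can be combined roughly as follows. This is a sketch only: the three-point series and the name filled are made up for the illustration, and it assumes the zoo package is installed.

```r
library(zoo)

# Small stand-in series: Jan 2010, Aug 2010, Mar 2011 (illustrative values)
z0 <- zoo(c(149, 0, 91), as.yearmon(2010 + c(0, 7, 14) / 12))

# time() returns the yearmon index, coredata() the underlying values
idx <- time(z0)
filled <- data.frame(
  DischargeYear  = as.integer(format(idx, "%Y")),
  DischargeMonth = as.integer(format(idx, "%m")),
  Cases          = coredata(z0)
)
```

Using format on the index, rather than arithmetic on the fractional year, avoids floating-point rounding when extracting the month.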
