开发者

Merging aggregate data in R

开发者 https://www.devze.com 2023-02-20 14:51 出处:网络
Following up my previous question about aggregating hourly data into daily data, I want to continue with (a) monthly aggregate and (b) merging the monthly aggregate into the original dataframe.

Following up my previous question about aggregating hourly data into daily data, I want to continue with (a) monthly aggregate and (b) merging the monthly aggregate into the original dataframe.

My original dataframe looks like this:

Lines <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"

The daily aggregates have been answered in my previous question, and then I can find my way to produce the monthly aggregates from there, to something like this:

Lines <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"

Where the OutdoorAVE is the monthly average of the daily minimum and maximum outdoor temperature. What I want to have in the end is something like this:

Lines <- "Date,Outdoor,Indoor,Month,OutdoorAVE
01/01/2000 01:00,30,25,Jan,31.33
01/01/2000 02:00,31,26,Jan,31.33
01/01/2000 03:00,33,24,Jan,31.33
02/01/2000 01:00,29,25,Feb,31.67
02/01/2000 02:00,27,26,Feb,31.67
02/01/2000 03:00,39,24,Feb,31.67
12/01/2000 02:00,27,26,Dec,31.33
12/01/2000 03:00,39,24,Dec,31.33
12/31/2开发者_StackOverflow社区000 23:00,28,25,Dec,31.33"

I do not know enough R on how to do that. Any help is greatly appreciated.


Try ave and eg POSIXlt to extract the month:

zz <- textConnection(Lines)
Data <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)

Data$Month <- strftime(
     as.POSIXlt(Data$Date,format="%m/%d/%Y %H:%M"),
     format='%b')
Data$outdoor_ave <- ave(Data$Outdoor,Data$Month,FUN=mean)

Gives :

> Data
              Date Outdoor Indoor Month outdoor_ave
1 01/01/2000 01:00      30     25   Jan    31.33333
2 01/01/2000 02:00      31     26   Jan    31.33333
3 01/01/2000 03:00      33     24   Jan    31.33333
4 02/01/2000 01:00      29     25   Feb    31.66667
5 02/01/2000 02:00      27     26   Feb    31.66667
6 02/01/2000 03:00      39     24   Feb    31.66667
7 12/01/2000 02:00      27     26   Dec    31.33333
8 12/01/2000 03:00      39     24   Dec    31.33333
9 12/31/2000 23:00      28     25   Dec    31.33333

Edit : Then just calcualte Month in Data as shown above and use merge :

zz <- textConnection(Lines2) # Lines2 is the aggregated data
Data2 <- read.table(zz,header=T,sep=",",stringsAsFactors=F)
close(zz)

> merge(Data,Data2[-1],all=T)
  Month             Date Outdoor Indoor OutdoorAVE
1   Dec 12/01/2000 02:00      27     26      31.33
2   Dec 12/01/2000 03:00      39     24      31.33
3   Dec 12/31/2000 23:00      28     25      31.33
4   Feb 02/01/2000 01:00      29     25      31.67
5   Feb 02/01/2000 02:00      27     26      31.67
6   Feb 02/01/2000 03:00      39     24      31.67
7   Jan 01/01/2000 01:00      30     25      31.33
8   Jan 01/01/2000 02:00      31     26      31.33
9   Jan 01/01/2000 03:00      33     24      31.33


This is tangential to your question, but you may want to use RSQLite and a separate tables for various aggregate values instead, and join the tables with simple SQL commands. If you use many kinds of aggregations your data frame can easily get large and ugly.


Here's a zoo/xts solution. Note that Month is numeric here because you can't mix types in zoo/xts objects.

require(xts) # loads zoo too
Lines1 <- "Date,Outdoor,Indoor
01/01/2000 01:00,30,25
01/01/2000 02:00,31,26
01/01/2000 03:00,33,24
02/01/2000 01:00,29,25
02/01/2000 02:00,27,26
02/01/2000 03:00,39,24
12/01/2000 02:00,27,26
12/01/2000 03:00,39,24
12/31/2000 23:00,28,25"
con <- textConnection(Lines1)
z <- read.zoo(con, header=TRUE, sep=",",
    format="%m/%d/%Y %H:%M", FUN=as.POSIXct)
close(con)

zz <- merge(z, Month=.indexmon(z),
    OutdoorAVE=ave(z[,1], .indexmon(z), FUN=mean))
zz
#                     Outdoor Indoor Month OutdoorAVE
# 2000-01-01 01:00:00      30     25     0   31.33333
# 2000-01-01 02:00:00      31     26     0   31.33333
# 2000-01-01 03:00:00      33     24     0   31.33333
# 2000-02-01 01:00:00      29     25     1   31.66667
# 2000-02-01 02:00:00      27     26     1   31.66667
# 2000-02-01 03:00:00      39     24     1   31.66667
# 2000-12-01 02:00:00      27     26    11   31.33333
# 2000-12-01 03:00:00      39     24    11   31.33333
# 2000-12-31 23:00:00      28     25    11   31.33333

Update: How do get the above result using two different data sets.

Lines2 <- "Date,Month,OutdoorAVE
01/01/2000,Jan,31.33
02/01/2000,Feb,31.67
12/01/2000,Dec,31.33"
con <- textConnection(Lines2)
z2 <- read.zoo(con, header=TRUE, sep=",", format="%m/%d/%Y",
    FUN=as.POSIXct, colClasses=c("character","NULL","numeric"))
close(con)

zz2 <- na.locf(merge(z1, Month=.indexmon(z1), OutdoorAVE=z2))[index(z1)]
# same output as zz (above)
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号