Problem of data frame and replacement in a loop in R

Developer https://www.devze.com 2023-04-01 21:35 Source: Internet

I'm using R on a dataset containing trips. Each line is a trip (from A to B). On each line, I know the identity of the individual (a number), the purpose of the trip (1,2,3 or 4), the time category (1,2 or 3) and a number identifying the tour in which the trip was done (a tour is a group of trips; all these trips go from A to A).

I would like to create a new column: for the same individual, what was the purpose of the previous trip in the same time category but in a different tour. This variable is called "prevDistanceSameTimeCategoryDifferentTour".

I have this error:

Error in `$<-.data.frame`(`*tmp*`, "prevDistanceSameTimeCategoryDifferentTour", : replacement has 2 rows, data has 1167

Here is my code:

prevPersonTimeCategory <- array(-999, dim=c(3,3))
prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
for(i in 2:nrow(TgData)) {
    if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
            if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
                }   
            }
        else {
            if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
                prevPersonTimeCategory[1,3] <- TgData$tour[i]
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
                prevPersonTimeCategory[1,2] <- TgData$purpose[i]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
                prevPersonTimeCategory[1,2] <- -999
                }
            }
        }
    else if (TgData$timeCategory[i] == 2) {
        if (TgData$tour[i] == prevPersonTimeCategory[2,3]) {
            if (prevPersonTimeCategory[2,1] == TgData$PersonID[i]) {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[2,2]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[2,1] <- TgData$PersonID[i]
                }   
            }
        else {
            if (prevPersonTimeCategory[2,1] == TgData$PersonID[i]) {
                print(i)
                prevPersonTimeCategory[2,3] <- TgData$tour[i]
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[2,2]
                prevPersonTimeCategory[2,2] <- TgData$purpose[i]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[2,1] <- TgData$PersonID[i]
                prevPersonTimeCategory[2,2] <- -999
                }
            }
        }
    else if (TgData$timeCategory[i] == 3) {
        if (TgData$tour[i] == prevPersonTimeCategory[3,3]) {
            if (prevPersonTimeCategory[3,1] == TgData$PersonID[i]) {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[3,2]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[3,1] <- TgData$PersonID[i]
                }   
            }
        else {
            if (prevPersonTimeCategory[3,1] == TgData$PersonID[i]) {
                prevPersonTimeCategory[3,3] <- TgData$tour[i]
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[3,2]
                prevPersonTimeCategory[3,2] <- TgData$purpose[i]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[3,1] <- TgData$PersonID[i]
                prevPersonTimeCategory[3,2] <- -999
                }
            }
        }
    else {
        TgData$prevPurposeSameTimeCategoryDifferentTour[i] = -999
        }
    }

I'm creating an array to store information for each time category. In this array, the first value is the identity of the individual (prevPersonTimeCategory[1,1], prevPersonTimeCategory[2,1], prevPersonTimeCategory[3,1], one for each time category), the second is the purpose (prevPersonTimeCategory[1,2], etc.), and the third is the tour number (prevPersonTimeCategory[1,3], etc.). Then I'm just reading each line (for) and writing a few conditions (if).
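To make the layout described above easier to follow, here is a minimal sketch of the same 3x3 array with labelled dimensions. The variable name `state` and the `dimnames` labels are my own additions for illustration; the original code uses plain numeric indices.

```r
# One row per time category, one column per stored field.
# The dimnames are hypothetical labels, not part of the original code.
state <- array(-999, dim = c(3, 3),
               dimnames = list(timeCategory = c("1", "2", "3"),
                               field = c("PersonID", "purpose", "tour")))

# Equivalent to prevPersonTimeCategory[2,2] <- 4 in the original notation:
# remember purpose 4 as the last purpose seen in time category 2.
state["2", "purpose"] <- 4

print(state)
```

Numeric indexing still works on the labelled array, so `state[2, 2]` and `state["2", "purpose"]` refer to the same cell.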

I really don't see where I'm making a mistake.

My dataset contains 36'784 lines, but I'm testing on 1932 lines (-1 line for headers). The data looks like this:

PersonID    purpose tour    timeCategory
1   1   1   2
1   4   2   3
1   4   2   3
1   4   3   3
1   3   4   3
1   4   5   3
1   4   5   2
1   4   5   3
1   3   5   3
1   4   6   2
1   4   6   2
1   4   6   3
1   3   7   3
1   4   8   3
1   4   9   3
1   4   10  3
1   4   10  3
1   4   11  1
1   4   12  1
1   4   13  1
1   4   14  1
1   4   16  1
1   1   17  2
1   4   18  3
1   4   19  2
1   3   20  3
1   4   20  3
1   4   21  3
1   1   22  2
1   3   22  3
1   3   23  3
1   4   24  3
1   4   25  3
1   4   25  3
1   4   26  3
1   1   27  2
1   3   27  3
1   4   28  3
1   3   28  3
1   4   29  3
1   4   29  3
1   1   30  2
1   4   31  3
1   1   31  2
1   4   32  3
1   3   32  3
1   4   33  3
1   3   34  3
1   4   35  3
1   1   36  2
1   3   36  3
1   4   37  3
1   3   38  3
1   4   39  3
1   3   39  3
1   4   39  3
1   4   40  3
1   4   40  2
1   4   40  3
1   3   41  3
1   4   42  3
1   4   43  3
1   1   44  2
1   3   45  3
1   4   46  3
1   3   47  3
1   3   47  3
1   4   48  2
1   1   49  2
1   4   50  3
1   1   51  2
1   1   51  2
1   2   51  3
1   3   52  3
1   3   53  1
1   4   54  1
1   4   55  1
1   4   55  1
1   4   55  1
1   1   56  3
1   4   57  3
1   4   58  3
1   1   59  2
1   3   59  3
1   4   60  3
1   4   61  3
1   1   62  3
1   3   63  3
1   4   64  3
1   3   65  3
1   4   66  3
1   3   67  3
1   2   68  1
2   3   69  3
2   1   70  3
2   4   71  2
2   1   72  3
2   3   72  3
2   1   72  2

If I run this short version of my code, I have no problems:

prevPersonTimeCategory <- array(-999, dim=c(3,3))
prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
for(i in 2:nrow(TgData)) {
    if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
            if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
                }   
            }
        }
    }

But if I add a few more lines like here:

prevPersonTimeCategory <- array(-999, dim=c(3,3))
prevPersonTimeCategory[1,1] <- TgData$PersonID[1]
prevPersonTimeCategory[2,1] <- TgData$PersonID[1]
prevPersonTimeCategory[3,1] <- TgData$PersonID[1]
for(i in 2:nrow(TgData)) {
    if (TgData$timeCategory[i] == 1) {
        if (TgData$tour[i] == prevPersonTimeCategory[1,3]) {
            if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
                }   
            }
        else {
            if (prevPersonTimeCategory[1,1] == TgData$PersonID[i]) {
                prevPersonTimeCategory[1,3] <- TgData$tour[i]
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- prevPersonTimeCategory[1,2]
                prevPersonTimeCategory[1,2] <- TgData$purpose[i]
                }
            else {
                TgData$prevPurposeSameTimeCategoryDifferentTour[i] <- -999
                prevPersonTimeCategory[1,1] <- TgData$PersonID[i]
                prevPersonTimeCategory[1,2] <- -999
                }
            }
        }
    }

The error comes back:

Error in $<-.data.frame(*tmp*, "prevPurposeSameTimeCategoryDifferentTour", : replacement has 18 rows, data has 1150
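The row count in the message looks like the loop index of the first assignment. The column does not exist yet, so `TgData$prevPurposeSameTimeCategoryDifferentTour` is `NULL`, and assigning element `i` of `NULL` produces a vector of length `i`. In the sample data above, row 18 is the first row from row 2 onward with `timeCategory == 1`, which would match "replacement has 18 rows". The short version presumably runs without error only because its single guarded branch (`tour[i] == -999`) is never entered, so no assignment ever happens. A small sketch of this reading, using only the first 20 `timeCategory` values from the sample (the variable names are mine):

```r
# First 20 timeCategory values copied from the sample data above.
timeCategory <- c(2, 3, 3, 3, 3, 3, 2, 3, 3, 2,
                  2, 3, 3, 3, 3, 3, 3, 1, 1, 1)

# The loop starts at row 2; find the first row it visits with timeCategory 1.
loopRows <- 2:length(timeCategory)
firstAssigned <- loopRows[timeCategory[loopRows] == 1][1]
print(firstAssigned)

# Assigning element i of NULL creates a vector of length i (padded with NA).
x <- NULL
x[firstAssigned] <- -999
print(length(x))
```

Both printed values are 18 for this sample.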


Creating a new empty column, as joran suggested, works.

Run this before you start the loop:

TgData$prevPurposeSameTimeCategoryDifferentTour <- NA
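
As a rough illustration of why this helps, here is a small sketch on a toy data frame. The data frame `df` and the column name `newCol` are made up for the example; they stand in for `TgData` and the new column. Without the column, the element-wise assignment builds a short vector that cannot replace a full column; with the column pre-allocated, only one element is changed.

```r
# Toy stand-in for TgData (hypothetical values, 5 rows).
df <- data.frame(PersonID = c(1, 1, 1, 2, 2),
                 purpose  = c(1, 4, 4, 3, 1))

# Without pre-allocation: df$newCol is NULL, so df$newCol[2] <- -999
# yields a length-2 vector, which cannot fill a 5-row data frame.
err <- tryCatch({
  df$newCol[2] <- -999
  "no error"
}, error = function(e) conditionMessage(e))
print(err)

# With pre-allocation: the column already has 5 elements,
# so assigning to element 2 only changes that element.
df$newCol <- NA
df$newCol[2] <- -999
print(df$newCol)
```

The same pattern applies inside the loop: once the column exists with one element per row, each `TgData$...[i] <- value` assignment only touches row `i`.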
