Using plyr in R with a complex function that returns multiple variables

Developer https://www.devze.com 2023-04-03 22:29 Source: web
I have a data set with three grouping variables: condition, sub, and delay. Here is a simplified version of my data (the real data set is much longer):

sub condition delay later_value choiceRT later_choice primeRT        cue
 10      SIZE    10          27     1832            1     888      CHILD
 10      PAST     5          11      298            0    1635      PANTS
 10      SIZE    21          13      456            0     949      CANDY
 11      SIZE   120          22      526            1    7963        BOY
 11    FUTURE   120          27      561            1    4389   CHILDREN
 11      PAST     5          13      561            1    2586     SPRING

I have a complicated set of procedures to apply to these data (the details are not important). I wrote the following function, which does what I want when the data are split by the three grouping variables. It returns three variables that I am interested in (indiff, p_intercept, and p_lv):

 getIndiffs <- function(currdelay){
      if (mean(currdelay$later_choice) == 1) {
        indiff = 10.5
        p_intercept = "laters"
        p_lv = "laters"
      }

      else if (mean(currdelay$later_choice) == 0) {
        indiff = 30.5

        # no p-val here, code that this was not calculated
        p_intercept = "nows"
        p_lv = "nows"
      }

      else {
        F <- factor(currdelay$later_choice)

        fit <- glm(F~later_value,data=currdelay,family=binomial())
        indiff <- -coef(fit)[1]/coef(fit)[2]

        if (indiff < 10) indiff = 10.5
        else if (indiff > 30) indiff = 30.5

        p_intercept = round(summary(fit)$coef[, "Pr(>|z|)"][1],3)
        p_lv = round(summary(fit)$coef[, "Pr(>|z|)"][2], 3)
        c(indiff,p_intercept,p_lv)
      }

I am trying to use ddply to apply it to each subset of the data per the 3 grouping variables:

ddply(data,.(sub,condition,delay),getIndiffs)

However, when I run this I get the error

Error in list_to_dataframe(res, attr(.data, "split_labels")) : Results do not have equal lengths

Strangely, this works fine when I use only 1 grouping variable but throws the error with 2 or more.

Also, when I "simulate" splitting the data set myself into a data frame containing only one subset defined by the 3 grouping variables, my function works just fine. (Note: I've tried different ways of returning 3 variables, or even returning just 1 variable, and it does not work either.)

Basically, what I want to know is how to use plyr with a function that returns multiple variables.

Any other solutions to my problem that are fundamentally different are also welcome.


That error usually happens to me when my function, applied to one of my pieces, returns an empty data frame. In any case, an easy way to debug the situation is to use dlply instead of ddply and examine the output; for instance

x <- dlply(data,.(sub,condition,delay),getIndiffs)
sapply(x,ncol)

to check that they all have the same number of columns. If they do not, make your function return the same structure for every piece.
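The dlply check above can be tried on a small example. This is only a sketch: `uneven` and `toy` are hypothetical stand-ins, not the asker's function or data. Because `uneven` returns plain vectors rather than data frames, `ncol` would give NULL for every piece, so the sketch compares `length` instead.

```r
library(plyr)

# Hypothetical function that mimics the problem: it returns a vector of
# length 1 for some pieces and a vector of length 3 for others.
uneven <- function(piece) {
  if (mean(piece$later_choice) == 1) "laters" else c(1, 2, 3)
}

# Made-up data with two groups: one all-later group and one mixed group.
toy <- data.frame(
  sub          = c(10, 10, 11, 11),
  condition    = c("SIZE", "SIZE", "PAST", "PAST"),
  later_choice = c(1, 1, 0, 1)
)

# dlply keeps each piece's result as a separate list element, so uneven
# results can be inspected instead of failing when they are combined.
x <- dlply(toy, .(sub, condition), uneven)
lens <- sapply(x, length)

# Groups whose result is shorter than the longest one.
lens[lens != max(lens)]
```

Any group listed by the last line is one where the function took a branch that returns fewer values than the others.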

It looks like your function getIndiffs is designed to run on a single row, not on a whole data frame. d*ply(x, vars, fn) hands fn() an entire data frame consisting of the subset of observations matching that group. Hm, also, the function can return in three different places, at the end of each conditional clause, and only the last clause returns all three values. I think you meant to put c(indiff, p_intercept, p_lv) after the last } (and end your function with another }).
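One way to act on that suggestion is sketched below. It moves the return value to the end of the function and returns a one-row data frame so that ddply gets the same columns for every group. Two things here are assumptions rather than part of the original code: the p-value columns are kept numeric (NA when no model is fitted) with a separate `status` column carrying the "laters"/"nows" labels, and `toy` is made-up data.

```r
library(plyr)

getIndiffs <- function(currdelay) {
  prop_later <- mean(currdelay$later_choice)

  # Defaults for the two branches where no model is fitted.
  p_intercept <- NA_real_
  p_lv <- NA_real_

  if (prop_later == 1) {
    indiff <- 10.5
    status <- "laters"
  } else if (prop_later == 0) {
    indiff <- 30.5
    status <- "nows"
  } else {
    fit <- glm(factor(later_choice) ~ later_value,
               data = currdelay, family = binomial())
    indiff <- unname(-coef(fit)[1] / coef(fit)[2])

    # Same clamping as in the question's function.
    if (indiff < 10) {
      indiff <- 10.5
    } else if (indiff > 30) {
      indiff <- 30.5
    }

    p <- summary(fit)$coefficients[, "Pr(>|z|)"]
    p_intercept <- round(p[[1]], 3)
    p_lv <- round(p[[2]], 3)
    status <- "fitted"
  }

  # Single exit point: every group yields one row with the same columns.
  data.frame(indiff = indiff, p_intercept = p_intercept,
             p_lv = p_lv, status = status)
}

# Made-up data: one all-later group, one all-now group, one mixed group.
toy <- data.frame(
  sub          = c(10, 10, 10, 10, rep(11, 6)),
  condition    = c("SIZE", "SIZE", "PAST", "PAST", rep("SIZE", 6)),
  delay        = c(10, 10, 5, 5, rep(120, 6)),
  later_value  = c(27, 13, 11, 13, 11, 13, 15, 22, 25, 27),
  later_choice = c(1, 1, 0, 0, 0, 1, 0, 0, 1, 1)
)

res <- ddply(toy, .(sub, condition, delay), getIndiffs)
res
```

The result has one row per combination of sub, condition, and delay. Groups too small or too uniform for glm to estimate both coefficients would still need their own handling.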
