开发者

Looping through a list and using elements within that list

开发者 https://www.devze.com 2023-04-08 09:41 出处:网络
In the code below, instead of starting with p=0.01 and then incrementing it, I\'d like to be able to do something like p=(1:99)/100 and then just loop through p. For example, instead of p=0.01, let\'s

In the code below, instead of starting with p=0.01 and then incrementing it, I'd like to be able to do something like p=(1:99)/100 and then just loop through p. For example, instead of p=0.01, let's let p=(1:99)/100. Now, I try to replace 1:99 in my for loop with p. When I run the code, however, I start having problems with coverage[i] (it returns numeric(0)). It seems like this should be fairly trivial so I'm hoping I'm just overlooking something in my code.

Also, if you see any easy boosts in efficiency please feel free to chime in! Thanks =)

w=seq(.01,.99,by=.01)
开发者_StackOverflowcoverage=seq(1:99)
p=0.01

for (i in 1:99){
    count = 0
    for (j in 1:1000){
        x = rbinom(30,1,p)  
        se=sqrt(sum(x)/30*(1-sum(x)/30)/30)
        if( sum(x)/30-1.644854*se < p && p < sum(x)/30+1.644854*se )
        count = count + 1
    }
    coverage[i]=count/1000
    print(coverage[i])
    p=p+0.01 
}


I would work on the middle part rather than that outer loop.

coverage <- p <- 1:99/100
z <- qnorm(0.95)

for (i in seq(along=p) ){
  # simulate all of the binomials/30 at once
  x <- rbinom(1000, 30, p[i])/30

  # ses
  se <- sqrt(x * (1-x)/30)

  # lower and upper limits
  lo <- x - z*se
  hi <- x + z*se

  # coverage
  coverage[i] <- mean(p[i] > lo & p[i] < hi)
}

This is almost instantaneous for me. The trick is to vectorize those simulations and calculations. Increasing to 100,000 simulation replicates took just 4 seconds on my 6-year-old Mac Pro.

(You'll want to increase the replicates to see the structure in the results; plot(p, coverage, las=1) with 100k reps gives the following; it wouldn't be clear with just 1000 reps.)

Looping through a list and using elements within that list


To answer the original question, setting i in p where p is 0.01, 0.02, etc, means that coverage[i] is trying to do coverage[0.01]; since [] requires an integer it truncates it to zero, resulting in a numeric of length zero, numeric(0).

One of the other solutions is better, but for reference, to do it using your original loop, you'd probably want something like

p <- seq(.01, .99, by=.01)
coverage <- numeric(length(p))
N <- 1000
n <- 30
k <- qnorm(1-0.1/2)
for (i in seq_along(p)) {
    count <- 0
    for (j in 1:N) {
        x <- rbinom(n, 1, p[i]) 
        phat <- mean(x) 
        se <- sqrt(phat * (1 - phat) / n)
        if( phat-k*se < p[i] && p[i] < phat+k*se ) {
            count <- count + 1
        }
    }
    coverage[i] <- count/N
}
coverage


@Karl Broman has a great solution that really shows how vectorization makes all the difference. It can still be improved slightly though (around 30%):

I prefer using vapply - although the speed improvement of it isn't noticeable here since the loop is only 99 times.

f <- function(n=1000) {
    z <- qnorm(0.95)/sqrt(30)

    vapply(1:99/100, function(p) {
      # simulate all of the binomials/30 at once
      x <- rbinom(n, 30, p)/30

      zse <- z*sqrt(x * (1-x))

      # coverage
      mean(x - zse < p & p < x + zse)
    }, numeric(1))
}

system.time( x <- f(50000) ) # 1.17 seconds

Here's a version of the OP's original code using vapply and it's about 20% faster than the original -but of course still magnitudes slower than the fully vectorized solutions...

g <- function(n=1000) {
    vapply(seq(.01,.99,by=.01), function(p) {
        count = 0L
        for (j in seq_len(n)) {
            x <- sum(rbinom(30,1,p))/30 

        # the constant is qnorm(0.95)/sqrt(30)
            zse <- 0.30030781175850279*sqrt(x*(1-x))
            if( x-zse < p && p < x+zse )
                count = count + 1L
        }
        count/n
    }, numeric(1))
}

system.time( x <- g(1000) ) # 1.04 seconds
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号