I would like to implement a simulation program, which requires the following structure:
It has a for loop, the program will generate an vector in each iteration. I need each generated vector is appended 开发者_高级运维to the existing vector.
I do not how how to do this in R. Thanks for the help.
These answers work, but they all require a call to a non-deterministic function like sample()
in the loop. This is not loop-invariant code (it is random each time), but it can still be moved out of the for
loop. The trick is to use the n
argument and generate all the random numbers you need beforehand (if your problem allows this; some may not, but many do). Now you make one call rather than n
calls, which matters if your n
is large. Here is a quick example random walk (but many problems can be phrased this way). Also, full disclosure: I haven't had any coffee today, so please point out if you see an error :-)
steps <- 30
n <- 100
directions <- c(-1, 1)
results <- vector('list', n)
for (i in seq_len(n)) {
walk <- numeric(steps)
for (s in seq_len(steps)) {
walk[s] <- sample(directions, 1)
}
results[[i]] <- sum(walk)
}
We can rewrite this with one call to sample()
:
all.steps <- sample(directions, n*steps, replace=TRUE)
dim(all.steps) <- c(n, steps)
walks <- apply(all.steps, 1, sum)
Proof of speed increase (n=10000):
> system.time({
+ for (i in seq_len(n)) {
+ walk <- numeric(steps)
+ for (s in seq_len(steps)) {
+ walk[s] <- sample(directions, 1)
+ }
+ results[[i]] <- sum(walk)
+ }})
user system elapsed
4.231 0.332 4.758
> system.time({
+ all.steps <- sample(directions, n*steps, replace=TRUE)
+ dim(all.steps) <- c(n, steps)
+ walks <- apply(all.steps, 1, sum)
+ })
user system elapsed
0.010 0.001 0.012
If your simulation needs just one random variable per simulation function call, use sapply()
, or better yet the multicore
package's mclapply()
. Revolution Analytics's foreach
package may be of use here too. Also, JD Long has a great presentation and post about simulating stuff in R on Hadoop via Amazon's EMR here (I can't find the video, but I'm sure someone will know).
Take home points:
- Preallocate with
numeric(n)
orvector('list', n)
- Push invariant code out of
for
loops. Cleverly push stochastic functions out of code with theirn
argument. - Try hard for
sapply()
orlapply()
, or better yetmclapply
. - Don't use
x <- c(x, rnorm(100))
. Every time you do this, a member of R-core kills a puppy.
Probably the best thing you can do is preallocate a list of length n (n is number of iterations) and flatten out the list after you're done.
n <- 10
start <- vector("list", n)
for (i in 1:n) {
a[[i]] <- sample(10)
}
start <- unlist(start)
You could do it the old nasty way. This may be slow for larger vectors.
start <- c()
for (i in 1:n) {
add <- sample(10)
start <- c(start, add)
}
x <- rnorm(100)
for (i in 100) {
x <- c(x, rnorm(100))
}
This link should be useful: http://www.milbo.users.sonic.net/ra/
Assuming your simulation function -- call it func -- returns a vector with the same length each time, you can store the results in the columns of a pre-allocated matrix:
sim1 <- function(reps, func) {
first <- func()
result <- matrix(first, nrow=length(first), ncol=reps)
for (i in seq.int(from=2, to=reps - 1)) {
result[, i] <- func()
}
return(as.vector(result))
}
Or you could express it as follows using replicate:
sim2 <- function(reps, func) {
return(as.vector(replicate(reps, func(), simplify=TRUE)))
}
> sim2(3, function() 1:3)
[1] 1 2 3 1 2 3 1 2 3
精彩评论