I have a table with more than 2M rows. I am only interested in percentiles of one variable vs. percentiles of the number of observations (e.g. a Lorenz curve).
- How do I create a smaller data frame that contains e.g. observations number 1, 101, 201, 301, ..., last, or the observations that correspond to e.g. the 1st, 2nd, 3rd, ..., 100th percentile of the total number of observations?
- Is there a quick way to get the Lorenz curve of (index, variable) with axes on a percentage basis? Right now I am thinking of adding variables for the percentiles of the index and of the variable and then plotting them against each other.
Thanks,
Roberto
As for the first question, I would use the quantile function to subset the data frame to the rows whose first-column value equals the (rounded) 1st, 2nd, 3rd, ..., 100th percentile of the first column (this assumes integer values in column 1):
df[df[,1] %in% round(quantile(df[,1], probs = c(1:100)/100)),]
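If the column is not integer-valued, matching on rounded quantiles may return few or no rows. A minimal sketch of an alternative, assuming the goal is one row per percentile of the row count after sorting by the variable; the data frame `df` and the column `y` below are made-up illustration data, not the asker's table:

```r
# Hypothetical example data; `df`, `id` and `y` are illustrative names.
set.seed(1)
df <- data.frame(id = 1:2000, y = rexp(2000))

# Sort the rows by the variable of interest.
sorted <- df[order(df$y), ]

# Row positions at the 1st, 2nd, ..., 100th percentile of the row count.
n <- nrow(sorted)
idx <- unique(ceiling(n * (1:100) / 100))

# One row per percentile (fewer if the table has under 100 rows).
small <- sorted[idx, ]
```

Because this indexes by position rather than by value, it does not depend on the column holding integers.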
For a 'big' dataset
dfr <- data.frame(x = 1:1000, y = runif(1000))
You can take subsets of regularly spaced rows with
dfr[!(seq_len(nrow(dfr)) %% 50),]
Or random subsets with
dfr[sample(nrow(dfr), 20),]
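Note that the modulo expression above keeps rows 50, 100, 150, and so on. To get exactly rows 1, 101, 201, ..., plus the last row as asked in the question, a sketch along these lines should work (the step size of 100 and the name `step` are just illustrative choices):

```r
dfr <- data.frame(x = 1:1000, y = runif(1000))

# Rows 1, 101, 201, ...; append the last row and drop it if already included.
step <- 100
idx <- unique(c(seq(1, nrow(dfr), by = step), nrow(dfr)))

sub <- dfr[idx, ]
```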
As gd047 mentioned, use quantile to get quantiles/percentiles.
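For the second question, the Lorenz curve can be built directly from cumulative sums, without adding percentile columns to the full table. A minimal base-R sketch, assuming a non-negative variable; the vector `y` and the column names `pop_pct` and `var_pct` are hypothetical:

```r
# Hypothetical data; `y` stands in for the variable of interest.
set.seed(42)
y <- rexp(1e5)

ys <- sort(y)
n <- length(ys)

# Cumulative share of observations and of the variable's total, in percent.
pop_share <- 100 * seq_len(n) / n
var_share <- 100 * cumsum(ys) / sum(ys)

# Keep only the points at whole percentiles of the observations,
# and add the origin so the curve starts at (0, 0).
keep <- ceiling(n * (1:100) / 100)
lorenz <- data.frame(pop_pct = c(0, pop_share[keep]),
                     var_pct = c(0, var_share[keep]))

plot(lorenz$pop_pct, lorenz$var_pct, type = "l",
     xlab = "% of observations", ylab = "% of total")
abline(0, 1, lty = 2)  # line of perfect equality
```

Only 101 points are plotted, so this stays quick even when the input has millions of rows; the sort is the expensive step.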