开发者

How can I do SQL like operations on a R data frame?

开发者 https://www.devze.com 2023-03-25 14:12 出处:网络
For example, I have a data frame with data across categories and subcategories and I want to be able to get row with maximum value in a particular column etc.

For example, I have a data frame with data across categories and subcategories and I want to be able to get row with maximum value in a particular column etc.

SQL is what comes to mind first. But since I am not interested in joins or indices etc, python's list comprehensions would do the same thing better with a more modern syntax.

What's best practice in R for such operations?

EDIT: For now I think I am fine with which.max. Why I asked the question the way I did is simply that I have come to learn that in R there are many libraries etc doing pretty much the same thing. Just by reading the documentation it's very hard to evaluate how popular (ie how well the library fulfills its purpose). My personal experience with Python is that the day you figure ou开发者_StackOverflow中文版t how to use list comprehensions (with itertools as a bonus), you are pretty much covered. Over time this has evolved as best practice, you don't see lambda and filter for example that often in the general python debate these days as list comprehensions does the same thing easier and more uniform.


If you really mean SQL, a pretty straightforward answer is the 'sqldf' package:

http://cran.at.r-project.org/web/packages/sqldf/index.html

From the help for ?sqldf

library(sqldf)
a1s <- sqldf("select * from warpbreaks limit 6")


Some additional context would help, but from the sounds of it - you may be looking for which.max() or the related functions. For group by operations, I default to the plyr family of functions, but there are certainly faster alternatives in base R if speed is of utmost importance.

library(plyr)
#Make a local copy of mycars data and add the rownames as a column since ddply
#seems to drop them. I've never encountered that before actually...
myCars <- mtcars
myCars$carname <- rownames(myCars)

#Find the max mpg
myCars[which.max(myCars$mpg) ,]

                mpg cyl disp hp drat    wt qsec vs am gear carb        carname
Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.9  1  1    4    1 Toyota Corolla

#Find the max mpg by cylinder category
ddply(myCars, "cyl", function(x) x[which.max(x$mpg) ,])

   mpg cyl  disp  hp drat    wt  qsec vs am gear carb          carname
1 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1   Toyota Corolla
2 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1   Hornet 4 Drive
3 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 Pontiac Firebird
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号