I'm searching the John Tukey algorithm which compute a "resistant line" or "median-median line" on my linear regression with R.
A student on a mailling list explain this algorithm in these terms :
"The way it's calculated is to divide the data into three groups, find the x-median and y-median values (called the summary point) for each group, and then use those three summary points to determine the line. The outer two summary points determine the slope, and an average of all of them determines the intercept."
Article about John tukey's median 开发者_运维百科median for curious : http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/
Do you have an idea of where i could find this algorithm or R function ? In which packages, Thanks a lot !
There's a description of how to calculate the median-median line here. An R implementation of that is
median_median_line <- function(x, y, data)
{
  if(!missing(data))
  {
    x <- eval(substitute(x), data) 
    y <- eval(substitute(y), data) 
  }
  
  stopifnot(length(x) == length(y))
  #Step 1
  one_third_length <- floor(length(x) / 3)
  groups <- rep(1:3, times = switch((length(x) %% 3) + 1,
     one_third_length,
     c(one_third_length, one_third_length + 1, one_third_length),
     c(one_third_length + 1, one_third_length, one_third_length + 1)
  ))
  #Step 2
  x <- sort(x)
  y <- sort(y)
  
  #Step 3
  median_x <- tapply(x, groups, median)                                 
  median_y <- tapply(y, groups, median)
  #Step 4
  slope <- (median_y[3] - median_y[1]) / (median_x[3] - median_x[1])
  intercept <- median_y[1] - slope * median_x[1]
  #Step 5
  middle_prediction <- intercept + slope * median_x[2]
  intercept <- intercept + (median_y[2] - middle_prediction) / 3
  c(intercept = unname(intercept), slope = unname(slope))
}
To test it, here's an example:
dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61, .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7, 150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))
  
median_median_line(time, distance, dfr) 
#intercept     slope 
#   -113.6     520.0
Note the slightly odd way of specifying the groups.  The instructions are quite picky about how you define group sizes, so the more obvious method of cut(x, quantile(x, seq.int(0, 1, 1/3))) doesn't work.
I'm a little late to the party, but have you tried line() from the stats package?
From the helpfile:
Value
An object of class "tukeyline".
References
Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.
As member of the R Core team, I now have digged in the source code, and also studied the history of it.
Conclusion: The source C source code, added in 19961997, when R was still called alpha (and around version 0.14alpha) already computed the quantiles not quite correctly... for some sample sizes.
More about this on the R mailing lists (not yet).
 
         
                                         
                                         
                                         
                                        ![Interactive visualization of a graph in python [closed]](https://www.devze.com/res/2023/04-10/09/92d32fe8c0d22fb96bd6f6e8b7d1f457.gif) 
                                         
                                         
                                         
                                         加载中,请稍侯......
 加载中,请稍侯......
      
精彩评论