开发者

group by range in mysql

开发者 https://www.devze.com 2023-03-20 08:31 出处:网络
Table: new_table user_number| diff 2|0 1|28 2|32 1|40 1|53 1|59 1|101 1|105 2|108 2|129 2|130 1|144 (result) v range| number of users
Table:   
new_table                                                    
user_number  | diff                  
     2       |  0                      
     1       |  28  
     2       |  32  
     1       |  40  
     1       |  53  
     1       |  59  
     1       |  101  
     1       |  105  
     2       |  108  
     2       |  129  
     2       |  130    
     1       |  144  


            |(result)
            v

range  | number of users  
0-20   |  2  
21-41  |  3  
42-62  |  1  
63-83  |  2  
84-104 |  1  
105-135|  0  
136-156|  3


select t.range as [range], count(*) as [number of users]  
from (  
  select case    
    when diff between 0 and 20 then ' 0-20'  
    when diff between 21 and 41 then ' 21-41'  
    when diff between 42 and 62 then ' 42-62'  
    when diff between 63 and 83 then ' 63-83'  
    when diff between 84 and 104 then ' 84-104'  
    when diff between 105 and 135 then ' 105-135'  
    else '136-156'   
     end as range  
  from new_table) t  
group by t.diff  

Error:

You have an error in your SQL syntax, near '[range], count(*) as [number of users]  
from (  
  select case  
    when' at line 1  
开发者_如何学Python


Here is general code to group by range since doing a case statement gets pretty cumbersome.

The function 'floor' can be used to find the bottom of the range (not 'round' as Bohemian used), and add the amount (19 in the example below) to find the top of the range. Remember to not overlap the bottom and top of the ranges!

mysql> create table new_table (user_number int, diff int);
Query OK, 0 rows affected (0.14 sec)

mysql>  insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53),
        (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144);
Query OK, 12 rows affected (0.01 sec)
Records: 12  Duplicates: 0  Warnings: 0

mysql> select concat(21*floor(diff/21), '-', 21*floor(diff/21) + 20) as `range`,
       count(*) as `number of users` from new_table group by 1 order by diff;
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               3 |
| 42-62   |               2 |
| 84-104  |               1 |
| 105-125 |               2 |
| 126-146 |               3 |
+---------+-----------------+
6 rows in set (0.01 sec)


Here's a solution that will work for any magnitude of diff:

select
  concat(21 * round(diff / 21), '-', 21 * round(diff / 21) + 20) as `range`,
  count(*) as `number of users`
from new_table
group by 1
order by diff;

Here's some testable code and its output:

create table new_table (user_number int, diff int);
insert into new_table values (2, 0), (1, 28), (2, 32), (1, 40), (1, 53), (1, 59), (1, 101), (1, 105), (2, 108), (2, 129), (2, 130), (1, 144); 
-- run query, output is: 
+---------+-----------------+
| range   | number of users |
+---------+-----------------+
| 0-20    |               1 |
| 21-41   |               1 |
| 42-62   |               2 |
| 63-83   |               2 |
| 105-125 |               3 |
| 126-146 |               2 |
| 147-167 |               1 |
+---------+-----------------+


If you have regular ranges, a quicker solution would be to group with the help of div function.

For instance:

select diff div 20 as range_code, sum(user_number)
from new_table
group by diff div 20;

Ranges are represented as single digits in that case and you have to know what they mean: 0 = 0-19, 1 = 20-39, 2 = 40-59,...

If you need different ranges use different divider or maybe subtract some number from diff. For instance "(diff - 1) div 10" gives you ranges 1-10, 11-20, 21-30,...


Mysql as a delimiter for keywords uses backtick sign " ` ", not square brackets (like sql server)


select 
case
when diff between 0 and 20 then ' 0-20'
when diff between 0 and 20 then ' 21-41'
when diff between 0 and 20 then ' 42-62'
when diff between 0 and 20 then ' 63-83'
when diff between 0 and 20 then ' 84-104'
when diff between 0 and 20 then ' 105-135'
else '136-156'
end; as 'range',
count(*) as 'number of users'


from new_table
group by range


range is a mysql keyword. You should "scape" it using ´ :

select t.`range` as [`range`], ...


One obvious mistake : Mysql uses backticks(

`

), not [](as sqlserver) . Change t.range as [range], count(*) as [number of users] to

t.range as `range`, count(*) as `number of users`


You might want to check Are square brackets valid in an SQL query?

I suspect that '[' and ']' are used in Microsoft's SQL but not mysql.


This won't be the exact solution for this question, but it is just similar suggestion for someone other. I need to create the number buckets too, because if I groupped by number i got 9k different values.

I need to have smaller count of groups.

I managed it by grouping by logarithm (and round it). Now instead of 9k groups i have only 18 groups. (Then i will use it for PDF or CDF for 1-x scale score computation).

SELECT COUNT(*) AS `Rows`, round(log(`diff`)) f FROM `users` GROUP BY f ORDER BY f

group by range in mysql


Here is a more generalized approach to binning in SQL:

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize
        FROM dual
    ) as prm
GROUP BY 1
ORDER BY floor(diff / binsize)

This way, you only have to provide the size of your range (called bins) once, in the sub query from dual.

The sub query returns a table of size 1 in both dimensions, a single row with a single column. This table is cross tabulated with each row of the other table, so its value is accessible in each row of the first table. This works without specifying a join condition.

As long as you only return a single row, you can add parameters to your sub query. For example, you can define upper and lower bounds to exclude certain features from your result this way:

SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table,
    (
        SELECT
            21 as binsize,
            21 as above,
            83 as below
        FROM dual
    ) as prm
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)

If your RDBMS supports it, consider restructuring your query to a CTE (Common Table Expression), which helps in making the expression look neater and more tidy by putting the declaration of parameters right to the start of the whole statement:

WITH prms as (
    SELECT
        21 as binsize,
        21 as above,
        83 as below
    FROM dual
)
SELECT
    concat(
        binsize * floor(diff / binsize),
        ' - ',
        binsize * floor(diff / binsize) + binsize - 1
    ) as range,
    count(*) as number_of_rows
FROM
    new_table, prms
WHERE
    diff >= above
    AND diff <= below
GROUP BY 1
ORDER BY floor(diff / binsize)
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号