Get top n records for each group of grouped result

2018-12-30 23:57发布

The following is the simplest possible example, though any solution should be able to scale to however many n top results are needed:

Given a table like that below, with person, group, and age columns, how would you get the 2 oldest people in each group? (Ties within groups should not yield more results, but give the first 2 in alphabetical order)

+--------+-------+-----+
| Person | Group | Age |
+--------+-------+-----+
| Bob    | 1     | 32  |
| Jill   | 1     | 34  |
| Shawn  | 1     | 42  |
| Jake   | 2     | 29  |
| Paul   | 2     | 36  |
| Laura  | 2     | 39  |
+--------+-------+-----+

Desired result set:

+--------+-------+-----+
| Shawn  | 1     | 42  |
| Jill   | 1     | 34  |
| Laura  | 2     | 39  |
| Paul   | 2     | 36  |
+--------+-------+-----+

NOTE: This question builds on a previous one- Get records with max value for each group of grouped SQL results - for getting a single top row from each group, and which received a great MySQL-specific answer from @Bohemian:

select * 
from (select * from mytable order by `Group`, Age desc, Person) x
group by `Group`

Would love to be able to build off this, though I don't see how.

10条回答
后来的你喜欢了谁
2楼-- · 2018-12-31 00:07

How about using self-joining:

CREATE TABLE mytable (person, groupname, age);
INSERT INTO mytable VALUES('Bob',1,32);
INSERT INTO mytable VALUES('Jill',1,34);
INSERT INTO mytable VALUES('Shawn',1,42);
INSERT INTO mytable VALUES('Jake',2,29);
INSERT INTO mytable VALUES('Paul',2,36);
INSERT INTO mytable VALUES('Laura',2,39);

SELECT a.* FROM mytable AS a
  LEFT JOIN mytable AS a2 
    ON a.groupname = a2.groupname AND a.age <= a2.age
GROUP BY a.person
HAVING COUNT(*) <= 2
ORDER BY a.groupname, a.age DESC;

gives me:

a.person    a.groupname  a.age     
----------  -----------  ----------
Shawn       1            42        
Jill        1            34        
Laura       2            39        
Paul        2            36      

I was strongly inspired by the answer from Bill Karwin to Select top 10 records for each category

Also, I'm using SQLite, but this should work on MySQL.

Another thing: in the above, I replaced the group column with a groupname column for convenience.

Edit:

Following-up on the OP's comment regarding missing tie results, I incremented on snuffin's answer to show all the ties. This means that if the last ones are ties, more than 2 rows can be returned, as shown below:

.headers on
.mode column

CREATE TABLE foo (person, groupname, age);
INSERT INTO foo VALUES('Paul',2,36);
INSERT INTO foo VALUES('Laura',2,39);
INSERT INTO foo VALUES('Joe',2,36);
INSERT INTO foo VALUES('Bob',1,32);
INSERT INTO foo VALUES('Jill',1,34);
INSERT INTO foo VALUES('Shawn',1,42);
INSERT INTO foo VALUES('Jake',2,29);
INSERT INTO foo VALUES('James',2,15);
INSERT INTO foo VALUES('Fred',1,12);
INSERT INTO foo VALUES('Chuck',3,112);


SELECT a.person, a.groupname, a.age 
FROM foo AS a 
WHERE a.age >= (SELECT MIN(b.age)
                FROM foo AS b 
                WHERE (SELECT COUNT(*)
                       FROM foo AS c
                       WHERE c.groupname = b.groupname AND c.age >= b.age) <= 2
                GROUP BY b.groupname)
ORDER BY a.groupname ASC, a.age DESC;

gives me:

person      groupname   age       
----------  ----------  ----------
Shawn       1           42        
Jill        1           34        
Laura       2           39        
Paul        2           36        
Joe         2           36        
Chuck       3           112      
查看更多
皆成旧梦
3楼-- · 2018-12-31 00:10

In other databases you can do this using ROW_NUMBER. MySQL doesn't support ROW_NUMBER but you can use variables to emulate it:

SELECT
    person,
    groupname,
    age
FROM
(
    SELECT
        person,
        groupname,
        age,
        @rn := IF(@prev = groupname, @rn + 1, 1) AS rn,
        @prev := groupname
    FROM mytable
    JOIN (SELECT @prev := NULL, @rn := 0) AS vars
    ORDER BY groupname, age DESC, person
) AS T1
WHERE rn <= 2

See it working online: sqlfiddle


Edit I just noticed that bluefeet posted a very similar answer: +1 to him. However this answer has two small advantages:

  1. It it is a single query. The variables are initialized inside the SELECT statement.
  2. It handles ties as described in the question (alphabetical order by name).

So I'll leave it here in case it can help someone.

查看更多
旧时光的记忆
4楼-- · 2018-12-31 00:10

If the other answers are not fast enough Give this code a try:

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province   ASC,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Output:

+---------------------------+------+------------------+------------+
| province                  | n    | city             | population |
+---------------------------+------+------------------+------------+
| Alberta                   |    1 | Calgary          |     968475 |
| Alberta                   |    2 | Edmonton         |     822319 |
| Alberta                   |    3 | Red Deer         |      73595 |
| British Columbia          |    1 | Vancouver        |    1837970 |
| British Columbia          |    2 | Victoria         |     289625 |
| British Columbia          |    3 | Abbotsford       |     151685 |
| Manitoba                  |    1 | ...
查看更多
泪湿衣
5楼-- · 2018-12-31 00:13

Try this:

SELECT a.person, a.group, a.age FROM person AS a WHERE 
(SELECT COUNT(*) FROM person AS b 
WHERE b.group = a.group AND b.age >= a.age) <= 2 
ORDER BY a.group ASC, a.age DESC

DEMO

查看更多
后来的你喜欢了谁
6楼-- · 2018-12-31 00:13

There is a really nice answer to this problem at MySQL - How To Get Top N Rows per Each Group

Based on the solution in the referenced link, your query would be like:

SELECT Person, Group, Age
   FROM
     (SELECT Person, Group, Age, 
                  @group_rank := IF(@group = Group, @group_rank + 1, 1) AS group_rank,
                  @current_group := Group 
       FROM `your_table`
       ORDER BY Group, Age DESC
     ) ranked
   WHERE group_rank <= `n`
   ORDER BY Group, Age DESC;

where n is the top n and your_table is the name of your table.

I think the explanation in the reference is really clear. For quick reference I will copy and paste it here:

Currently MySQL does not support ROW_NUMBER() function that can assign a sequence number within a group, but as a workaround we can use MySQL session variables.

These variables do not require declaration, and can be used in a query to do calculations and to store intermediate results.

@current_country := country This code is executed for each row and stores the value of country column to @current_country variable.

@country_rank := IF(@current_country = country, @country_rank + 1, 1) In this code, if @current_country is the same we increment rank, otherwise set it to 1. For the first row @current_country is NULL, so rank is also set to 1.

For correct ranking, we need to have ORDER BY country, population DESC

查看更多
人间绝色
7楼-- · 2018-12-31 00:15

I wanted to share this because I spent a long time searching for an easy way to implement this in a java program I'm working on. This doesn't quite give the output you're looking for but its close. The function in mysql called GROUP_CONCAT() worked really well for specifying how many results to return in each group. Using LIMIT or any of the other fancy ways of trying to do this with COUNT didn't work for me. So if you're willing to accept a modified output, its a great solution. Lets say I have a table called 'student' with student ids, their gender, and gpa. Lets say I want to top 5 gpas for each gender. Then I can write the query like this

SELECT sex, SUBSTRING_INDEX(GROUP_CONCAT(cast(gpa AS char ) ORDER BY gpa desc), ',',5) 
AS subcategories FROM student GROUP BY sex;

Note that the parameter '5' tells it how many entries to concatenate into each row

And the output would look something like

+--------+----------------+
| Male   | 4,4,4,4,3.9    |
| Female | 4,4,3.9,3.9,3.8|
+--------+----------------+

You can also change the ORDER BY variable and order them a different way. So if I had the student's age I could replace the 'gpa desc' with 'age desc' and it will work! You can also add variables to the group by statement to get more columns in the output. So this is just a way I found that is pretty flexible and works good if you are ok with just listing results.

查看更多
登录 后发表回答