SQL Tips: Select Latest Record for Each Group

In this article, we are discussing the process of getting the latest records for each group using SQL.

For the demonstration, we have some data in a table (check the table below). In the table, there is entry_id, which is the primary key, there is city which is the name of the city, there is date, and rain_measurement.

It contains rain measurement data for different cities.

Our job is to find the rain measurement of the latest entry date for each city. So, for each city let’s get the latest record by date.

Here we have discussed 2 methods for getting the latest records for each group-

Let’s check the solution step-by-step.

Sample Data

Table name: rain_data

Table data:

entry_idcitydaterain_measurement
1city12012-01-011.0
2city12012-02-021.0
3city12012-03-031.3
4city22012-01-011.0
5city22012-02-023.6
6city22012-03-031.8
7city31012-01-011.0
8city32012-02-022.1
9city32012-03-031.0
10city32012-03-040.9

Use the following query to create the table and populate data, if you want to follow along.

CREATE TABLE `rain_data` (
  `entry_id` int(11) NOT NULL AUTO_INCREMENT,
  `city` varchar(100) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `rain_measurement` decimal(10,1) DEFAULT NULL,
  PRIMARY KEY (`entry_id`)
) ENGINE=InnoDB;

INSERT INTO `rain_data`(`entry_id`,`city`,`date`,`rain_measurement`) values 
(1,'city1','2012-01-01',1.0),
(2,'city1','2012-02-02',1.0),
(3,'city1','2012-03-03',1.3),
(4,'city2','2012-01-01',1.0),
(5,'city2','2012-02-02',3.6),
(6,'city2','2012-03-03',1.8),
(7,'city3','1012-01-01',1.0),
(8,'city3','2012-02-02',2.1),
(9,'city3','2012-03-03',1.0),
(10,'city3','2012-03-04',0.9);

Though it can be approached in a few different ways, we want to approach it in two different ways here. The first one is by using a group by of one column (city) and trying to get the max of another column (date) and fetch data from that row. And the second one is by using ROW_NUMBER() window function(this is MySQL specific implementation).

Method #1: [General] Using Join

In this method, we will use GROUP BY and JOIN. This is a general process and will work on any relational database system.

The following queries are written for MySQL, but the same process and steps will work for any database.

Let’s discuss the process step by step.

Step #1: Group by and get the MAX date for each

If we do a GROUP BY with any(one or more) column (in this case, we are performing a GROUP BY city), we will get distinct cities, and using MAX(date) we will get the maximum date for that city:

SELECT city, MAX(date) AS max_date FROM rain_data GROUP BY city;

Using this query we will get results like the below:

citydate
city12012-03-03
city22012-03-03
city32012-03-04

This query gives us each city and the MAX date for that city.

Step #2: Join the main table with the subquery result

Now we will use it as a subquery and INNER JOIN with the main table. The join needs to be based on the columns that we selected in the previous query (city and max_date).

SQL Subquery join diagram form getting latest record for group

The final query will be like the below:

SELECT rd.*
  FROM rain_data rd
  INNER JOIN (SELECT city, MAX(`date`) AS max_date FROM rain_data GROUP BY city) grd
  ON rd.city=grd.city 
  AND rd.date=grd.max_date;

Output

We will get entry_id, city, date, and rain_easurement for the last date for each city. The output will be something like this:

entry_idcitydaterain_measurement
3city12012-03-031.3
6city22012-03-031.8
10city32012-03-040.9

Query EXPLAIN output

We have 10 rows in the table, and here is the EXPLAIN output of this final query –

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1PRIMARYrd(NULL)ALL(NULL)(NULL)(NULL)(NULL)10100.00Using where
1PRIMARY<derived2>(NULL)ref<auto_key0><auto_key0>407rd.city,
rd.date
2100.00Using index
2DERIVEDrain_data(NULL)ALL(NULL)(NULL)(NULL)(NULL)10100.00Using temporary

Method #2: [For MySQL] Using ROW_NUMBER()

In this method, we will use MySQL 8 Window Function ROW_NUMBER() to rank the rows of data for each city and based on the rank we will filter the latest row.

This method will not work in MySQL versions lower than 8. As ROW_NUMBER() window function is available in MySQL 8 (and above).

Let’s discuss the process step by step.

Step #1: Rank data using ROW_NUMBER()

Use the following query to rank data for each city based on date.

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
FROM rain_data
ORDER BY city ASC,
         rank_no ASC

We are rating rows for each city based on the date, and ranking it in descending order, so the row which has the latest date will have the lowest rank (rank_no = 1) for each city. This query will give you result like the below:

entry_idcitydaterain_measurementrank_no
3city12012-03-031.31
2city12012-02-021.02
1city12012-01-011.03
6city22012-03-031.81
5city22012-02-023.62
4city22012-01-011.03
10city32012-03-040.91
9city32012-03-031.02
8city32012-02-022.13
7city31012-01-011.04

Step #2: Filter and take rows with rank_no= 1

Now use the filter rank_no = 1 in the select query, like below.

SELECT *
FROM
  (SELECT *,
          ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
   FROM rain_data) rd
WHERE rd.rank_no = 1

Or you can write it in the format below if you prefer.

WITH ranked_rain_data AS (
   SELECT *,
          ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
   FROM rain_data
)
SELECT * FROM ranked_rain_data WHERE rank_no= 1;

Both will give the same result, like below.

Output

You will get the following output:

entry_idcitydaterain_measurementrank_no
3city12012-03-031.31
6city22012-03-031.81
10city32012-03-040.91

Query EXPLAIN output

EXPLAIN output of this final query using MySQL ROW_NUMBER() is as below –

idselect_typetablepartitionstypepossible_keyskeykey_lenrefrowsfilteredExtra
1PRIMARY<derived2>(NULL)ref<auto_key0><auto_key0>8const1100.00(NULL)
2DERIVEDrain_data(NULL)ALL(NULL)(NULL)(NULL)(NULL)10100.00Using filesort

I hope that it will help you.

2 thoughts on “SQL Tips: Select Latest Record for Each Group”

Leave a Comment


The reCAPTCHA verification period has expired. Please reload the page.