In this article, we are discussing the process of getting the latest records for each group using SQL.
For the demonstration, we have some data in a table (check the table below). In the table, there is entry_id, which is the primary key, there is city which is the name of the city, there is date, and rain_measurement.
It contains rain measurement data for different cities.
Our job is to find the rain measurement of the latest entry date for each city. So, for each city let’s get the latest record by date.
Here we have discussed 2 methods for getting the latest records for each group-
Let’s check the solution step-by-step.
Sample Data
Table name: rain_data
Table data:
entry_id | city | date | rain_measurement |
---|---|---|---|
1 | city1 | 2012-01-01 | 1.0 |
2 | city1 | 2012-02-02 | 1.0 |
3 | city1 | 2012-03-03 | 1.3 |
4 | city2 | 2012-01-01 | 1.0 |
5 | city2 | 2012-02-02 | 3.6 |
6 | city2 | 2012-03-03 | 1.8 |
7 | city3 | 1012-01-01 | 1.0 |
8 | city3 | 2012-02-02 | 2.1 |
9 | city3 | 2012-03-03 | 1.0 |
10 | city3 | 2012-03-04 | 0.9 |
Use the following query to create the table and populate data, if you want to follow along.
CREATE TABLE `rain_data` (
`entry_id` int(11) NOT NULL AUTO_INCREMENT,
`city` varchar(100) DEFAULT NULL,
`date` date DEFAULT NULL,
`rain_measurement` decimal(10,1) DEFAULT NULL,
PRIMARY KEY (`entry_id`)
) ENGINE=InnoDB;
INSERT INTO `rain_data`(`entry_id`,`city`,`date`,`rain_measurement`) values
(1,'city1','2012-01-01',1.0),
(2,'city1','2012-02-02',1.0),
(3,'city1','2012-03-03',1.3),
(4,'city2','2012-01-01',1.0),
(5,'city2','2012-02-02',3.6),
(6,'city2','2012-03-03',1.8),
(7,'city3','1012-01-01',1.0),
(8,'city3','2012-02-02',2.1),
(9,'city3','2012-03-03',1.0),
(10,'city3','2012-03-04',0.9);
Though it can be approached in a few different ways, we want to approach it in two different ways here. The first one is by using a group by of one column (city) and trying to get the max of another column (date) and fetch data from that row. And the second one is by using ROW_NUMBER() window function(this is MySQL specific implementation).
Method #1: [General] Using Join
In this method, we will use GROUP BY
and JOIN
. This is a general process and will work on any relational database system.
The following queries are written for MySQL, but the same process and steps will work for any database.
Let’s discuss the process step by step.
Step #1: Group by and get the MAX date for each
If we do a GROUP BY with any(one or more) column (in this case, we are performing a GROUP BY city), we will get distinct cities, and using MAX(date) we will get the maximum date for that city:
SELECT city, MAX(date) AS max_date FROM rain_data GROUP BY city;
Using this query we will get results like the below:
city | date |
---|---|
city1 | 2012-03-03 |
city2 | 2012-03-03 |
city3 | 2012-03-04 |
This query gives us each city and the MAX date for that city.
Step #2: Join the main table with the subquery result
Now we will use it as a subquery and INNER JOIN with the main table. The join needs to be based on the columns that we selected in the previous query (city and max_date).
The final query will be like the below:
SELECT rd.*
FROM rain_data rd
INNER JOIN (SELECT city, MAX(`date`) AS max_date FROM rain_data GROUP BY city) grd
ON rd.city=grd.city
AND rd.date=grd.max_date;
Output
We will get entry_id, city, date, and rain_easurement for the last date for each city. The output will be something like this:
entry_id | city | date | rain_measurement |
---|---|---|---|
3 | city1 | 2012-03-03 | 1.3 |
6 | city2 | 2012-03-03 | 1.8 |
10 | city3 | 2012-03-04 | 0.9 |
Query EXPLAIN output
We have 10 rows in the table, and here is the EXPLAIN output of this final query –
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | rd | (NULL) | ALL | (NULL) | (NULL) | (NULL) | (NULL) | 10 | 100.00 | Using where |
1 | PRIMARY | <derived2> | (NULL) | ref | <auto_key0> | <auto_key0> | 407 | rd.city, rd.date | 2 | 100.00 | Using index |
2 | DERIVED | rain_data | (NULL) | ALL | (NULL) | (NULL) | (NULL) | (NULL) | 10 | 100.00 | Using temporary |
Method #2: [For MySQL] Using ROW_NUMBER()
In this method, we will use MySQL 8 Window Function ROW_NUMBER() to rank the rows of data for each city and based on the rank we will filter the latest row.
This method will not work in MySQL versions lower than 8. As ROW_NUMBER() window function is available in MySQL 8 (and above).
Let’s discuss the process step by step.
Step #1: Rank data using ROW_NUMBER()
Use the following query to rank data for each city based on date.
SELECT *,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
FROM rain_data
ORDER BY city ASC,
rank_no ASC
We are rating rows for each city based on the date, and ranking it in descending order, so the row which has the latest date will have the lowest rank (rank_no = 1) for each city. This query will give you result like the below:
entry_id | city | date | rain_measurement | rank_no |
---|---|---|---|---|
3 | city1 | 2012-03-03 | 1.3 | 1 |
2 | city1 | 2012-02-02 | 1.0 | 2 |
1 | city1 | 2012-01-01 | 1.0 | 3 |
6 | city2 | 2012-03-03 | 1.8 | 1 |
5 | city2 | 2012-02-02 | 3.6 | 2 |
4 | city2 | 2012-01-01 | 1.0 | 3 |
10 | city3 | 2012-03-04 | 0.9 | 1 |
9 | city3 | 2012-03-03 | 1.0 | 2 |
8 | city3 | 2012-02-02 | 2.1 | 3 |
7 | city3 | 1012-01-01 | 1.0 | 4 |
Step #2: Filter and take rows with rank_no= 1
Now use the filter rank_no = 1
in the select query, like below.
SELECT *
FROM
(SELECT *,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
FROM rain_data) rd
WHERE rd.rank_no = 1
Or you can write it in the format below if you prefer.
WITH ranked_rain_data AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
FROM rain_data
)
SELECT * FROM ranked_rain_data WHERE rank_no= 1;
Both will give the same result, like below.
Output
You will get the following output:
entry_id | city | date | rain_measurement | rank_no |
---|---|---|---|---|
3 | city1 | 2012-03-03 | 1.3 | 1 |
6 | city2 | 2012-03-03 | 1.8 | 1 |
10 | city3 | 2012-03-04 | 0.9 | 1 |
Query EXPLAIN output
EXPLAIN output of this final query using MySQL ROW_NUMBER() is as below –
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | <derived2> | (NULL) | ref | <auto_key0> | <auto_key0> | 8 | const | 1 | 100.00 | (NULL) |
2 | DERIVED | rain_data | (NULL) | ALL | (NULL) | (NULL) | (NULL) | (NULL) | 10 | 100.00 | Using filesort |
I hope that it will help you.
Lifesaver! Thank You.
Happy to help, Andre