SQL: Select Latest Record for Each Group

In this article, we are discussing the process of getting the latest records for each group using SQL.

For the demonstration, we have records of rain data for different cities, on different dates(check the table below).

We want to find the rain measurement of the latest entry date for each city.

Sample Data

Here is the sample data we are using here for this demo-

Table name: rain_data

Table data:

In the table, there is entry_id, which is the primary key. There is city which is the name of the city, there is date, and rain_measurement.

entry_idcitydaterain_measurement
1city12012-01-011.0
2city12012-02-021.0
3city12012-03-031.3
4city22012-01-011.0
5city22012-02-023.6
6city22012-03-031.8
7city31012-01-011.0
8city32012-02-022.1
9city32012-03-031.0
10city32012-03-040.9

Use the following query to create the table and populate data, if you want to follow along.

CREATE TABLE `rain_data` (
  `entry_id` int(11) NOT NULL AUTO_INCREMENT,
  `city` varchar(100) DEFAULT NULL,
  `date` date DEFAULT NULL,
  `rain_measurement` decimal(10,1) DEFAULT NULL,
  PRIMARY KEY (`entry_id`)
) ENGINE=InnoDB;

INSERT INTO `rain_data`(`entry_id`,`city`,`date`,`rain_measurement`) values 
(1,'city1','2012-01-01',1.0),
(2,'city1','2012-02-02',1.0),
(3,'city1','2012-03-03',1.3),
(4,'city2','2012-01-01',1.0),
(5,'city2','2012-02-02',3.6),
(6,'city2','2012-03-03',1.8),
(7,'city3','1012-01-01',1.0),
(8,'city3','2012-02-02',2.1),
(9,'city3','2012-03-03',1.0),
(10,'city3','2012-03-04',0.9);

Expected Result

Here is the final result we are expecting, for the above sample data-

entry_idcitydaterain_measurement
3city12012-03-031.3
6city22012-03-031.8
10city32012-03-040.9

Though it can be approached in a few different ways, we want to approach it in 3 different ways here.

Here are the methods we are discussing here, for getting the latest records for each group-

Let’s check the solution step-by-step.

Method #1: Using Join and Subquery

In this method, we will use GROUP BY and JOIN. This is a general process and will work on any relational database system.

The following queries are written for MySQL, but the same process and steps will work for any database.

Let’s discuss the process step by step.

Step #1: Group by and get the MAX date for each

If we do a GROUP BY with any(one or more) column (in this case, we are performing a GROUP BY city), we will get distinct cities, and using MAX(date) we will get the maximum date for that city:

SELECT city, MAX(date) AS max_date FROM rain_data GROUP BY city;

Using this query we will get results like the below:

citydate
city12012-03-03
city22012-03-03
city32012-03-04

This query gives us each city and the MAX date for that city.

Step #2: Join the main table with the subquery result

Now we will use it as a subquery and INNER JOIN with the main table. The join needs to be based on the columns that we selected in the previous query (city and max_date).

SQL Subquery join diagram form getting latest record for group

The final query will be like the below:

SELECT rd.*
  FROM rain_data rd
  INNER JOIN (SELECT city, MAX(`date`) AS max_date FROM rain_data GROUP BY city) grd
  ON rd.city=grd.city 
  AND rd.date=grd.max_date;

Output

We will get entry_id, city, date, and rain_easurement for the last date for each city. The output will be something like this:

entry_idcitydaterain_measurement
3city12012-03-031.3
6city22012-03-031.8
10city32012-03-040.9

Method #2: Using ROW_NUMBER()

In this method, we will use MySQL 8 Window Function ROW_NUMBER() to rank the rows of data for each city and based on the rank we will filter the latest row.

This method will not work in MySQL versions lower than 8. As ROW_NUMBER() window function is available in MySQL 8 (and above).

Let’s discuss the process step by step.

Step #1: Rank data using ROW_NUMBER()

Use the following query to rank data for each city based on date.

SELECT *,
       ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
FROM rain_data
ORDER BY city ASC,
         rank_no ASC

We are rating rows for each city based on the date, and ranking it in descending order, so the row which has the latest date will have the lowest rank (rank_no = 1) for each city. This query will give you result like the below:

entry_idcitydaterain_measurementrank_no
3city12012-03-031.31
2city12012-02-021.02
1city12012-01-011.03
6city22012-03-031.81
5city22012-02-023.62
4city22012-01-011.03
10city32012-03-040.91
9city32012-03-031.02
8city32012-02-022.13
7city31012-01-011.04

Step #2: Filter and take rows with rank_no= 1

Now use the filter rank_no = 1 in the select query, like below.

SELECT *
FROM
  (SELECT *,
          ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
   FROM rain_data) rd
WHERE rd.rank_no = 1

Or you can write it in the format below if you prefer.

WITH ranked_rain_data AS (
   SELECT *,
          ROW_NUMBER() OVER (PARTITION BY city ORDER BY `date` DESC) AS rank_no
   FROM rain_data
)
SELECT * FROM ranked_rain_data WHERE rank_no= 1;

Both will give the same result, like below.

Output

You will get the following output:

entry_idcitydaterain_measurementrank_no
3city12012-03-031.31
6city22012-03-031.81
10city32012-03-040.91

Method #3: Using LEFT JOIN with Same Table

Here is another approach we can take to achieve the same result. We are using LEFT JOIN here, and joining the same table.

The trick is to add a condition in the where clause that is date_from_first_table_ref < date_from_second_table_ref, and then filter only those rows that do not have any entry on the right.

Let’s check step by step-

Step #1: LEFT JOIN the same table

LEFT JOIN rain_data table with the same table. The join will be based on the “city” column, if the city column is the same then we are joining.

SELECT
  rd1.*, rd2.*
FROM
  rain_data AS rd1
LEFT JOIN
  rain_data AS rd2
ON
  rd1.city = rd2.city

The output will be as below. The number of rows will be large, that’s why we have omitted some rows in the table below-

entry_idcitydaterain_measuremententry_idcitydaterain_measurement
1city12012-01-011.03city12012-03-031.3
1city12012-01-011.02city12012-02-021.0
1city12012-01-011.01city12012-01-011.0
2city12012-02-021.03city12012-03-031.3
2city12012-02-021.02city12012-02-021.0
2city12012-02-021.01city12012-01-011.0
9city32012-03-031.07city31012-01-011.0
10city32012-03-040.910city32012-03-040.9
10city32012-03-040.99city32012-03-031.0
10city32012-03-040.98city32012-02-022.1
10city32012-03-040.97city31012-01-011.0

We are getting a large and weird result set here, that does not make any sense. As no filter was applied to the query.
Check the following steps, as we gradually achieve our desired result, by applying filters on this JOIN query.

Step #2: Add condition rd1.date < rb2.date

On the join condition, add condition rd1.date < rd2.date

SELECT
  rd1.*, rd2.*
FROM
  rain_data AS rd1
LEFT JOIN
  rain_data AS rd2
ON
  rd1.city = rd2.city AND
  rd1.date < rd2.date

We get a smaller result set as below, as we are only getting results that have smaller dates for the left part-

entry_idcitydaterain_measuremententry_idcitydaterain_measurement
1city12012-01-011.03city12012-03-031.3
1city12012-01-011.02city12012-02-021.0
2city12012-02-021.03city12012-03-031.3
3city12012-03-031.3(NULL)(NULL)(NULL)(NULL)
4city22012-01-011.06city22012-03-031.8
4city22012-01-011.05city22012-02-023.6
5city22012-02-023.66city22012-03-031.8
6city22012-03-031.8(NULL)(NULL)(NULL)(NULL)
7city31012-01-011.010city32012-03-040.9
7city31012-01-011.09city32012-03-031.0
7city31012-01-011.08city32012-02-022.1
8city32012-02-022.110city32012-03-040.9
8city32012-02-022.19city32012-03-031.0
9city32012-03-031.010city32012-03-040.9
10city32012-03-040.9(NULL)(NULL)(NULL)(NULL)

The most important thing to notice here are the rows that have NULL on the right. As those rows do not have any large dates for the right part. Those rows are our actual target rows.

Step #3: Filter out rows which have NULL on the right

Let’s add a WHERE clause and only get rows which has null on the right site. Use the following query-

SELECT
  rd1.*, rd2.*
FROM
  rain_data AS rd1
LEFT JOIN
  rain_data AS rd2
ON
  rd1.city = rd2.city AND
  rd1.date < rd2.date
WHERE
  rd2.entry_id IS NULL

We will get results like the below-

entry_idcitydaterain_measuremententry_idcitydaterain_measurement
3city12012-03-031.3(NULL)(NULL)(NULL)(NULL)
6city22012-03-031.8(NULL)(NULL)(NULL)(NULL)
10city32012-03-040.9(NULL)(NULL)(NULL)(NULL)

Step #4: Final Query

Only take the left part of the result using the following query.

This is the final query of this approach.

SELECT
  rd1.*
FROM
  rain_data AS rd1
LEFT JOIN
  rain_data AS rd2
ON
  rd1.city = rd2.city AND
  rd1.date < rd2.date
WHERE
  rd2.entry_id IS NULL

Output

Here is the final output-

entry_idcitydaterain_measurement
3city12012-03-031.3
6city22012-03-031.8
10city32012-03-040.9

2 thoughts on “SQL: Select Latest Record for Each Group”

Leave a Comment


The reCAPTCHA verification period has expired. Please reload the page.