MySQL: Find and Delete Duplicate Rows (for single/multiple columns)

In this article, I will discuss 2 ways to get rid of duplicate rows from your MySQL database table. The first one will work for any MySQL version, but the second one will work for MySQL 8 or a later version.

So, let’s jump into it.

Sample Data

Table Name: bin_quantity

Table Data

idbin_iddatequantity
12692021-10-115
22692021-10-115
32692021-10-1550
42692021-10-1523
101002021-10-1510
111002022-04-1510
23982022-02-28100
242772021-10-280
252772021-11-28100
262782021-09-280
272782021-10-28100
542792021-11-0550
722762021-10-2950
732762021-10-2950
742762021-10-2950
842762021-10-2950
852762021-10-2950

Table & Data Query

If you want to use this sample data and want to try the processes, then use the following query.

CREATE TABLE `bin_quantity` (
  `id` BIGINT(20) DEFAULT NULL,
  `bin_id` BIGINT(20) DEFAULT NULL,
  `date` DATE DEFAULT NULL,
  `quantity` DECIMAL(12,0) DEFAULT NULL
);

INSERT  INTO `bin_quantity`(`id`,`bin_id`,`date`,`quantity`) VALUES 
(23,98,'2022-02-28',100),
(11,100,'2022-04-15',10),
(10,100,'2021-10-15',10),
(3,269,'2021-10-15',50),
(4,269,'2021-10-15',23),
(2,269,'2021-10-11',5),
(1,269,'2021-10-11',5),
(85,276,'2021-10-29',50),
(84,276,'2021-10-29',50),
(74,276,'2021-10-29',50),
(73,276,'2021-10-29',50),
(72,276,'2021-10-29',50),
(25,277,'2021-11-28',100),
(24,277,'2021-10-28',0),
(27,278,'2021-10-28',100),
(26,278,'2021-09-28',0),
(54,279,'2021-11-05',50);

Problem With This Data

For each bin_id for a specific date, there should be only one row in the table. But there are multiple records for each bin_id and date in the record.

Find Duplicate Rows

Let’s check if there exist any duplicate rows.

As we expect that for each ‘bin_id‘ and ‘date‘ combination only one row should exist. So use the following query to check which rows are duplicates, and also see the count of duplicate rows.

SELECT bin_id, `date`, COUNT(*) AS num_of_rows
FROM bin_quantity
GROUP BY bin_id, `date`
HAVING num_of_rows > 1

From this query, you will get results like the below:

bin_iddatenum_of_rows
2692021-10-152
2692021-10-112
2762021-10-295

That means we have duplicate (unexpected data there).

Let’s clean that data.

Delete Duplicate Rows

Use the following methods to detect and delete duplicate rows. Here we have discussed 2 methods for that, let’s take a look one by one:

Method #1: Using Join

This is a general process and will work on any MySQL version.

  • Precondition: Make sure that there is at least one column that has a distinct value, it can be a Primary Key or Unique Key. This will be used to differentiate between the rows while comparing.
  • Find Duplicate Data: To find the duplicate data – join the same table based on the common criteria column. (in our case it’s the combination of bin_id and date). If you use the following select query then you will get the duplicate rows and for which rows those are duplicates for.
SELECT bq1.*, bq2.* 
FROM bin_quantity bq1
INNER JOIN bin_quantity bq2 
	ON bq1.bin_id = bq2.bin_id
	AND bq1.date = bq2.date
WHERE bq1.id < bq2.id
ORDER BY bq1.bin_id ASC, 
	bq1.date ASC, 
	bq1.id ASC,
	bq2.id ASC;

This Gives the following data:

idbin_iddatequantityidbin_iddatequantity
12692021-10-11522692021-10-115
32692021-10-155042692021-10-1523
722762021-10-2950732762021-10-2950
722762021-10-2950742762021-10-2950
722762021-10-2950842762021-10-2950
722762021-10-2950852762021-10-2950
732762021-10-2950742762021-10-2950
732762021-10-2950842762021-10-2950
732762021-10-2950852762021-10-2950
742762021-10-2950842762021-10-2950
742762021-10-2950852762021-10-2950
842762021-10-2950852762021-10-2950
  • Delete Duplicate Data: Delete the duplicate rows comparing the unique criteria. (in our case it’s the ‘id’ column).

If you want to keep the rows that are created earlier (have a smaller ‘id’ value) then use this query.

DELETE bq2 
FROM bin_quantity bq1
INNER JOIN bin_quantity bq2 
	ON bq1.bin_id = bq2.bin_id
	AND bq1.date = bq2.date
	AND bq1.id < bq2.id;

If you want to keep the later rows, use “>” for comparing the id part, like ‘bq1.id > bq2.id`.

After deleting, you will get clean data like this.

idbin_iddatequantity
23982022-02-28100
101002021-10-1510
111002022-04-1510
12692021-10-115
32692021-10-1550
722762021-10-2950
242772021-10-280
252772021-11-28100
262782021-09-280
272782021-10-28100
542792021-11-0550

Method #2: Using “ROW_NUMBER()” Window Function

This process will work on MySQL 8 or later.

Reason: We are going to use a MySQL window function named “ROW_NUMBER()”, which is not available in the version earlier than 8.

Check the documentation of MySQL ROW_NUMBER() window function here: https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html#function_row-number

  • Find Duplicate Data: Using the ROW_NUMBER() function of MySQL we can generate a unique row number, based on certain criteria/columns. Like in this case we want to assign a unique id to the rows which have the same bin_id and date.

Use the following query:

SELECT 
ROW_NUMBER() OVER(
	PARTITION BY bin_id, `date`
	ORDER BY id) AS row_num,
bq.*
FROM bin_quantity bq
ORDER BY bin_id ASC, 
`date` ASC, 
row_num ASC;

After running this query you will be data like this.

row_numidbin_iddatequantity
123982022-02-28100
1101002021-10-1510
1111002022-04-1510
112692021-10-115
222692021-10-115
132692021-10-1550
242692021-10-1523
1722762021-10-2950
2732762021-10-2950
3742762021-10-2950
4842762021-10-2950
5852762021-10-2950
1242772021-10-280
1252772021-11-28100
1262782021-09-280
1272782021-10-28100
1542792021-11-0550

Here we have a unique ‘row_num’ in a group of rows that have the same ‘bin_id’ and ‘date’.

  • Get Ids of Duplicate Data: Now use the following query to get the `id’(s) of those rows with ‘row_num’ greater than one.
SELECT id
FROM (
	SELECT 
	ROW_NUMBER() OVER(
		PARTITION BY bin_id, `date`
		ORDER BY id) AS row_num,
	bq.id
	FROM bin_quantity bq
) AS bqr
WHERE row_num > 1;

It will give the ids like below.

id
2
4
73
74
84
85
id
  • Delete duplicate data: Now you have the ‘id’(s’), so just use a DELETE query to delete the rows which have those ids.
DELETE
FROM bin_quantity
WHERE id IN (
	SELECT id
	FROM (
		SELECT 
		ROW_NUMBER() OVER(
			PARTITION BY bin_id, `date`
			ORDER BY id) AS row_num,
		bq.id
		FROM bin_quantity bq
	) AS bqr
	WHERE row_num > 1
);

After running this query, you will have clean data like the below:

idbin_iddatequantity
23982022-02-28100
101002021-10-1510
111002022-04-1510
12692021-10-115
32692021-10-1550
722762021-10-2950
242772021-10-280
252772021-11-28100
262782021-09-280
272782021-10-28100
542792021-11-0550

Leave a Comment


The reCAPTCHA verification period has expired. Please reload the page.