Wednesday, August 29, 2007

“N” Maximum or Minimum entries in a table

This is a famous question in SQL, very often asked in the interview process. During my college days, most of the companies used to ask the question. More important, we might have the same business requirement as well in our industry life.

So, I have to plan to write a series of articles on the above and also analysing the performance of the queries.

In order to facilitate with a working example, we will make use of the table employeeDetails (the schema is provided as part of bulid scripts). The sample entries/rows to work with are also provided as part of the bulid scripts. All you to have to do, just copy the bulid files to any desired location and save it. Execute the file in the SQL prompt as follows:

SQL> @Absolute Location of the file

Bulid Script:

DROP TABLE EmployeeDetails;
COMMIT;
//

CREATE TABLE EmployeeDetails(
EmployeeName varchar2(30),
EmployeeID number);
//

INSERT INTO EmployeeDetails VALUES('Ram',2);
INSERT INTO EmployeeDetails VALUES('Rahul',3);
INSERT INTO EmployeeDetails VALUES('Sharma',57);
INSERT INTO EmployeeDetails VALUES('Nishi',28);
INSERT INTO EmployeeDetails VALUES('Sathya',46);
INSERT INTO EmployeeDetails VALUES('Shreeya',49);
INSERT INTO EmployeeDetails VALUES('Sandhya',24);
INSERT INTO EmployeeDetails VALUES('Sandip',47);
INSERT INTO EmployeeDetails VALUES('Kiran',19);
INSERT INTO EmployeeDetails VALUES('Prabhu',30);
INSERT INTO EmployeeDetails VALUES('Jasmine',41);
INSERT INTO EmployeeDetails VALUES('Satyendra',47);

COMMIT;
//

Copy the above contents to sample.sql stored in c:\.
In the SQL prompt, exceute the following code:
SQL> @ c:\sample.sql

The creation of the table and the insertion of the sample entries are done. Explore the table structure and contents to understand better about the schema and the contents.
Lets start with the objective of this article having spent enough time on the background.
Problem:
The table EmployeeDetails contains two columns namely employeeName (stores the name of the employee) and employeeID (stores the ID of the employee). Our objective would be getting the maximum/minimum Nth entries from the table based on the column employeeID.

Method 1:
Lets go step by step to understand the query better.

Step 1:
select emp1.employeeName, emp1.employeeID from
employeeDetails emp1, employeeDetails emp2

It performs a join between the same table employeeDetails (type of join? Ans: self join) with no conditions (no where clause in the SQL query). If you have n rows in your table, the result would be n*n rows.

Step 2:
Lets add the appropriate filter condition in the where clause.

select emp1.employeeName, emp1.employeeID from
employeeDetails emp1, employeeDetails emp2
where emp1.employeeID <= emp2.employeeID The where clause makes a comparison on the same fields but on the different aliases of the table. Step 3: Lets add the group by clause to the query: select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID <= emp2.employeeID group by emp1.employeeName, emp1.employeeID order by emp1.employeeID asc The group by clause makes grouping based on emp1.employeeName, emp1.employeeID and uses the count(*) method to get the total count based on the grouping clause. The order by clause is used for better readability (to display the results in an order). If you had understood upto this, you would have got the answer by this time. Lets go ahead. Step 4: select empDetails.employeeName, empDetails.employeeID from (select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID <= emp2.employeeID group by emp1.employeeName, emp1.employeeID order by emp1.employeeID asc) empDetails where rownum <= &n Finally we got the solution. The understanding is left as an exercise for the reader. The solution helps us to find the N minimum entries in the table. How to get the N maximum entries in the table? Simple! Just change the condition. select empDetails.employeeName, empDetails.employeeID from (select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID >= emp2.employeeID
group by emp1.employeeName, emp1.employeeID
order by emp1.employeeID desc) empDetails
where rownum <= &n

Hope you would have understood. Happy Programming.

Before you move off, these are few questions to test yourself.

1) The above solution is for n maximum/minimum entries. Suppose the user needs to get the nth maximum/minimum entry, what should he do? Just change the condition from rownum <= &n to rownum = &n
It does not work? Why?

Clue:
It’s to do with the internal implementation of the rownum.

2) How about the performance of this query?

Clue:
It uses joins, group by, order by, where condition. Definitely a performance issue.

(In the forthcoming articles, let us discuss alternate ways of the doing the same).

2 comments:

Kamalesh R S said...

hey nice post... expecting the continuation of the article discussing on the performance issues...
-kamalesh

Kamalesh R S said...

hei navan, this site has suggested one other way of doing this, i see that a sub query is involved with this but will it be a better option than a cartesian product (self join)
Thanks
kamalesh

link:
http://www.adp-gmbh.ch/ora/sql/examples/first_rows.html