Wednesday, August 29, 2007

“N” Maximum or Minimum entries in a table

This is a famous question in SQL, very often asked in the interview process. During my college days, most of the companies used to ask the question. More important, we might have the same business requirement as well in our industry life.

So, I have to plan to write a series of articles on the above and also analysing the performance of the queries.

In order to facilitate with a working example, we will make use of the table employeeDetails (the schema is provided as part of bulid scripts). The sample entries/rows to work with are also provided as part of the bulid scripts. All you to have to do, just copy the bulid files to any desired location and save it. Execute the file in the SQL prompt as follows:

SQL> @Absolute Location of the file

Bulid Script:

DROP TABLE EmployeeDetails;
COMMIT;
//

CREATE TABLE EmployeeDetails(
EmployeeName varchar2(30),
EmployeeID number);
//

INSERT INTO EmployeeDetails VALUES('Ram',2);
INSERT INTO EmployeeDetails VALUES('Rahul',3);
INSERT INTO EmployeeDetails VALUES('Sharma',57);
INSERT INTO EmployeeDetails VALUES('Nishi',28);
INSERT INTO EmployeeDetails VALUES('Sathya',46);
INSERT INTO EmployeeDetails VALUES('Shreeya',49);
INSERT INTO EmployeeDetails VALUES('Sandhya',24);
INSERT INTO EmployeeDetails VALUES('Sandip',47);
INSERT INTO EmployeeDetails VALUES('Kiran',19);
INSERT INTO EmployeeDetails VALUES('Prabhu',30);
INSERT INTO EmployeeDetails VALUES('Jasmine',41);
INSERT INTO EmployeeDetails VALUES('Satyendra',47);

COMMIT;
//

Copy the above contents to sample.sql stored in c:\.
In the SQL prompt, exceute the following code:
SQL> @ c:\sample.sql

The creation of the table and the insertion of the sample entries are done. Explore the table structure and contents to understand better about the schema and the contents.
Lets start with the objective of this article having spent enough time on the background.
Problem:
The table EmployeeDetails contains two columns namely employeeName (stores the name of the employee) and employeeID (stores the ID of the employee). Our objective would be getting the maximum/minimum Nth entries from the table based on the column employeeID.

Method 1:
Lets go step by step to understand the query better.

Step 1:
select emp1.employeeName, emp1.employeeID from
employeeDetails emp1, employeeDetails emp2

It performs a join between the same table employeeDetails (type of join? Ans: self join) with no conditions (no where clause in the SQL query). If you have n rows in your table, the result would be n*n rows.

Step 2:
Lets add the appropriate filter condition in the where clause.

select emp1.employeeName, emp1.employeeID from
employeeDetails emp1, employeeDetails emp2
where emp1.employeeID <= emp2.employeeID The where clause makes a comparison on the same fields but on the different aliases of the table. Step 3: Lets add the group by clause to the query: select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID <= emp2.employeeID group by emp1.employeeName, emp1.employeeID order by emp1.employeeID asc The group by clause makes grouping based on emp1.employeeName, emp1.employeeID and uses the count(*) method to get the total count based on the grouping clause. The order by clause is used for better readability (to display the results in an order). If you had understood upto this, you would have got the answer by this time. Lets go ahead. Step 4: select empDetails.employeeName, empDetails.employeeID from (select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID <= emp2.employeeID group by emp1.employeeName, emp1.employeeID order by emp1.employeeID asc) empDetails where rownum <= &n Finally we got the solution. The understanding is left as an exercise for the reader. The solution helps us to find the N minimum entries in the table. How to get the N maximum entries in the table? Simple! Just change the condition. select empDetails.employeeName, empDetails.employeeID from (select emp1.employeeName, emp1.employeeID, count(*) from employeeDetails emp1, employeeDetails emp2 where emp1.employeeID >= emp2.employeeID
group by emp1.employeeName, emp1.employeeID
order by emp1.employeeID desc) empDetails
where rownum <= &n

Hope you would have understood. Happy Programming.

Before you move off, these are few questions to test yourself.

1) The above solution is for n maximum/minimum entries. Suppose the user needs to get the nth maximum/minimum entry, what should he do? Just change the condition from rownum <= &n to rownum = &n
It does not work? Why?

Clue:
It’s to do with the internal implementation of the rownum.

2) How about the performance of this query?

Clue:
It uses joins, group by, order by, where condition. Definitely a performance issue.

(In the forthcoming articles, let us discuss alternate ways of the doing the same).

Wednesday, August 8, 2007

Basics for Tuning SQL Statements

To get the maximum performance from our application, we need to
tune our sql statements.Tuning a sql statement means discovering the execution plan.
Once we know the execution plan,we can attempt to improve it.

Discovering the execution plan for a statement can be done in at least two ways
* Issue an EXPLAIN PLAN statement from SQL* Trace.
* Using SQL* PLUS Autotrace.

Using Explain Plan Feature :
-----------------------------------
Oracle's Explain Plan feature allows us to discover the
execution plan for a sql statement.This can be done by using a
sql statement,EXPLAIN PLAN, to which we append the query of interest.

Example :
EXPLAIN PLAN
SET statement_id='stmt_id'
FOR
SELECT * FROM table_name;

After issuing this statement, we can query a special table known
as PLAN TABLE to find the execution plan that oracle intends to use
for the query.Before using this table we need to create a plan table to
hold the results.

CREATING THE PLAN TABLE:
------------------------------------
The easiest way to create a plan table is to use the Oracle supplied script "utlxplan.sql".
The script will be in the location $ORACLE_HOME/rdbms/admin directory.

Example:
SQL>@e:\oracle\ora92\rdbms\admin\utlxplan

The name of the table created by the script is PLAN_TABLE. This is also the default
table name used by EXPLAIN PLAN.

EXPLAINING A QUERY :
------------------------------
EXPLAIN PLAN
SET statement_id='user_defined_statement_id'
FOR query;

Note : Explaining the plan does not execute the sql statement.

DELETING DATA FROM PLAN_TABLE :
----------------------------------------------

DELETE
FROM plan_table
WHERE statement_id='user_defined_statement_id';

Deleting is necessary because EXPLAIN PLAN always adds to the plan table.
Deleting helpls to avoid possibilities of Duplication.

DISPLAYING THE EXECUTION PLAN FOR A QUERY:
---------------------------------------------------------------
We can make use of the following query to display an execution plan once
it has been generated.
SELECT id,
LPAD(' ', 2*(LEVEL-1)) || operation
|| ' ' || options
|| ' ' || object_name
|| ' ' ||
DECODE(id,0,'Cost = ' || position) step_description
FROM plan_table
START WITH id=0 AND statement_id='statement_id'
CONNECT BY PRIOR id=parent_id
AND statement_id = 'statement_id'
ORDER BY id,position;

USING SQL*PLUS AUTOTRACE:
-------------------------------------
The process is simple, we turn autotrace on and we issue the query as we normally would.

SQL> SET AUTOTRACE ON EXPLAIN
SQL> SELECT field_name1
FROM table_name
ORDER BY field_name1;

Using Autotrace is convinient because we don't need to manually query the plan
table, it is done automatically.However the query will be executed.
We can avoid seeing the results of the query executed by using TRACEONLY setting.

Example:
SQL> SET AUTORACE TRACEONLY EXPLAIN

Note : Even when we use TRACEONLY, sql plus still sends the query to database where it is executed.
The TRACEONLY setting just prevents sql plus from pulling back the results.