Performance Comparison of (SELECT TOP 1) Verses MAX()

Santech Solutions - Affordable SQL Server services

"SQL Server Performance Tuning Experts"

I recently came across a post on Stackoverflow regarding the performance of using a SELECT TOP 1 query in comparison to using the MAX() function.

My initial thoughts on this were that the MAX() function would provide the best performance however, in order to be absolutely certain and prove my initial hypothesis I decided to put the theory to the test.

Create the Test Environment

To effectively test the various T-SQL statements we need some data to play with. The script below will create a table containing one million records and was adapted from the article Creating Random SQL Server Test Data by Mitchel Sellers.

CREATE TABLE tmpTable
(
ID INT IDENTITY(1, 1),
SomeData VARCHAR(100),
SomeDate datetime
)
GO

DECLARE @RowCount INT
DECLARE @RowString VARCHAR(10)
DECLARE @Random INT
DECLARE @Upper INT
DECLARE @Lower INT
DECLARE @InsertDate DATETIME

SET @Lower = -730
SET @Upper = -1
SET @RowCount = 0

WHILE @RowCount < 1000000
BEGIN
SET @RowString = CAST(@RowCount AS VARCHAR(10))
SELECT
@Random = ROUND(((@Upper - @Lower - 1) * RAND()
+ @Lower), 0)
SET @InsertDate = DATEADD(dd, @Random, GETDATE())

INSERT INTO
tmpTable (SomeData, SomeDate)
VALUES
(
REPLICATE('0', 10 - DATALENGTH(@RowString))
+ @RowString,
DATEADD(dd, 1, @InsertDate)
)

SET @RowCount = @RowCount + 1
END

Activate SQL Statistical Logging

To evaluate the performance of each of the T-SQL queries, we are going to want to review information regarding both the number of I/O requests issued by SQL Server and the time that is takes to perform the queries, so enable both STATISTICS IO and STATISTICS TIME.
SET STATISTICS IO ON
SET STATISTICS TIME ON

Ensuring a Fair Test

In order to ensure a clean slate to work from for our testing, it is a good idea to free the SQL Server plan cache prior to the execution of each of our T-SQL statements to be tested. The following statements will achieve this requirement.
DBCC FREEPROCCACHE --Free the procedure cache
DBCC FREESYSTEMCACHE('ALL') --Free the plan cache

Performance Comparison with no index on the test table

So the first query:
DBCC FREEPROCCACHE; DBCC FREESYSTEMCACHE('ALL')
SELECT TOP 1 ID FROM tmpTable ORDER BY id desc

This produces the query plan shown in Figure 1 and contains three operators. A Table Scan, which is to be expected as there is no index on the table and a Sort operator that is responsible for 95% of the cost of the overall plan.

Figure 1: Select Top query, with no index.

Figure 1: SELECT TOP 1 , with no index.

Executing the query produces the following results:

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 4 ms.
Table ‘tmpTable’. Scan count 1, logical reads 6050, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 344 ms, elapsed time = 338 ms.

Executing the second query using the MAX() operator:
DBCC FREEPROCCACHE; DBCC FREESYSTEMCACHE('ALL')
SELECT MAX(ID) FROM tmpTable

This produces the query plan shown in Figure 2 and contains three operators. Again, a Table Scan is performed however this time a Stream Aggregate operator is used and the Table Scan accounts for 88% of the query cost.

max_no_index

Figure 2: SELECT MAX(), with no index

Executing the query produces the following results:

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 4 ms.

Table ‘tmpTable’. Scan count 1, logical reads 4546, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 281 ms, elapsed time = 280 ms.

Performance Comparison with no index on the test table

Create a clustered index on the test table:
CREATE CLUSTERED INDEX cl_ID ON tmpTable(ID)
Now execute the first query once again. Looking at how the estimated execution plan differs, Figure 3, notice that the Sort operator changes to a pure Top operator (that incidentally has an estimated number of rows of 1) and that all the cost of the query is placed on the Clustered Index Scan operator. This effectively removes the need for the Sort and hence it’s replacement in this plan.

Figure 3: SELECT TOP 1 , with clustered index.

Figure 3: SELECT TOP 1 , with clustered index.

The results of executing this query are:

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 1 ms.

Table ‘tmpTable’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

Now execute the second query once again. Again, this produces a different plan in comparison to when the query was executed without a clustered index on the table however, the key point is that the Clustered Index Scan operator that bears 100% of the estimated cost of the plan.  This demonstrates that the execution cost of both queries is almost identical, when executed on a table & column covered by a clustered index.

Figure 4: SELECT MAX(), with clustered index.

Figure 4: SELECT MAX(), with clustered index.

The results of executing this query are:

SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 2 ms.

Table ‘tmpTable’. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 1 ms.

Conclusion

When a clustered index is present on the table & column that is to be selected, both the MAX() operator and the query (SELECT TOP 1 ColumnName order by ColumnName) have almost identical performance.

When there is no clustered index on the table & column to be queried, the MAX() operator offers the better performance.

I hope you enjoyed the article and please feel free to post any comments, questions and feedback below.

  • http://www.johnsansom.com John Sansom

    Hi Kris, you’re very welcome and I’m glad you found it beneficial.

  • Keshav Vadrevu

    Simple and precise. Has been quite useful !

  • http://www.johnsansom.com John Sansom

    Hi Keshav,

    Thank you for your comment and I’m glad you found the post useful. Always happy to help!

  • Dario Ribeiro

    very good job mate, simple and clear!

    • http://www.johnsansom.com John Sansom

      Thanks Dario!

  • Devarley Cabral

    Hi John,

    Very good your post. If I may make a doubt, the comparison would have the same result with the clause count () instead of max ().

    Sorry for my bad english.

    Thanks!

    • http://www.johnsansom.com John Sansom

      Hi Devarley,

      That’s a great question! I will look into it and post a follow up.

  • B1

    Sweet, I was just wondering this, and glad you had an answer!

    • http://www.johnsansom.com John Sansom

      You’re welcome, thanks for your comments.

  • Pingback: Aaron Bertrand : Why generalizations are dangerous

  • Tahir

    Hi John,

    Thank you, very good post.

    • http://www.johnsansom.com John Sansom

      Thanks Tahir!

  • Ajay Khatri

    Hi,

    You have cleared all the doubts…. Such a nice artical…… Tons of thanks to you ….

    • http://www.johnsansom.com John Sansom

      Thanks Ajay, glad you liked it.

      How are you currently working with SQL Server, DEV or DBA?

      • Ajay Khatri

        As A Dev …!!!!!!!!!!!!!!!!