Sunday, 8 July 2012

SQL Azure performance optimization

Database tuning is continuous and iterative process. It involves identifying the bottlenecks, find their cause, apply resolution and check if resolution works as expected.  QL Azure is same database engine as on premises SQL Server, most of the things apply equally on SQL Azure as well.

In order to make use of tools like SQL profiler it is always good idea to create and perform initial optimization using on premises instance and later move to SQL Azure. It might save a lot of time and cost.
It is always good idea to create performance baseline before starting up performance optimization task to measure the effectiveness of optimization.

For on premises database we focus on various areas as listed below.
1) Database design
2) Query design
3) Configuration setting
4) Hardware and software administration.
5) Database backup

In case of windows azure apart from first two items, other have become less relevant for performance tuning. These first two points are the area where best results are achieved in performance tuning . Which is also shown in figure below.



1) Database design:- Following is checklist for database design for better performance

a) Database design start with applying normalization and create highly normalized databases. These databases are good for write performance. Once we are done with normalization and ready for performance optimization phase, de normalization is applied appropriately to improve the performance.
b) Separate OLAP (Online Analytical processing ) and OLTP( Online transaction processing workloads.
c) Apply entity integrity constraints – this enable optimizer to come up with better query plans.
d) Apply domain and referential integrity constraints – similar to above it helps query optimizer.
e) Choose appropriate data type.
f) Use indexed views in de-normalization.

g)      Create indexes and follow index best practices : -

I.            Use narrow column for indexes.
II.            Use columns with high selectivity for indexes.
III.            For composite index use leading column with high selectivity.
IV.            Design indexes based on the where clause in queries.
V.            Use index covering with include clause.
VI.            Create clustered index very carefully since every non clustered indexes are based on the clustered index.

2) Query Design :- Following is a list of various items which should be kept in mind while designing queries.

a)      Avoid arithmetic operator and functions in where clause.
b)      Ensure no implicit type conversions.
c)       Adopt best practices for reusing the execution plans.
d)      Adopt best practices for database transactions.
e)      Eliminate or reduce overhead of  database cursors.
f)       Evaluate alternate to temp tables.

Above guide lines should help to stay streamlined.



Inspecting execution plan and statistics


This article is Part 1 of the “Tuning SQL Azure database” series. The scope of this article is to explain how to improve the performance of SQL Azure database by inspecting the execution plan and statistics. The next article will explain how to improve SQL Azure database performance by using information from dynamic management views.

A unique thing about SQL Azure’s multi-tenant architecture is the way it tackles the queries that tend to throttle the SQL Azure’s (shared) server. Therefore, a longer running query is more likely to get prematurely terminated. The rule of thumb is to keep queries as short as possible. To do so, you need to tune the query so that it runs for shorter duration of time. Inspecting the execution plan and statistics of a particular query is one way to do so. Also with SQL Azure database, you need to factor in performance delay due to network latency and bandwidth for an application that is not near to the database. If it’s an application hosted on the Azure platform that connects to SQL Azure database, the best practice is to keep SQL Azure database and the Azure application in the same data center. Apart from performance gain, doing so will help you save on data transfer costs. With SQL Azure database, a data transfer that takes place in the same data center is not charged.

Traditionally, tuning a database involved some or all of the following steps:
  1. 1. Application tuning
    2. Designing databases properly
    3. Analyzing and optimizing queries
    4. Changing hardware configuration or SQL Server configuration to minimize 
     
  2.     bottlenecks
    5. Altering system software parameters for efficient resource allocation 
Because SQL Azure does not provide control over underlying VM’s, system software or hardware; to tune a SQL Azure database; we must focus on tuning the application, database design, and query optimization. In fact, considerable performance gains are obtained by employing correct development techniques and using efficient data access code as shown in Figure 1: performance gain vs. methodology employed to improve performance:
 

Figure 1: Performance gain vs Methodology employed to improve performance
Since we know that tuning a database provides considerable performance gains, in this article, we will discuss how to improve SQL Azure database performance by inspecting the information available in the execution plan and statistics.

Execution Plan 

One of the best ways to analyze the performance of a particular query is to inspect the query’s execution plan. SQL Azure does not currently support SQL Profiler and Index Tuning Wizard; however, we can view query’s execution plan and tune it based on the information available through the execution plan. In addition to execution plans, it is possible to inspect CPU time and I/O information of a query execution using STATISTICS TIME and STATISTICS IO, which we will discuss later in the article.

If you are already familiar with using the execution plan and statistics for query optimization with SQL Server, then you will see that it’s not different with SQL Azure.

I use SQL Server Management Studio 2008 R2 to connect to SQL Azure server and inspect the execution plan and statistics.

Let’s explore the execution plan first. By default, a query’s execution plan is not displayed. To view a query’s execution plan, we need to explicitly include it before executing the query. Right-Click the query window and select Include Actual Execution Plan, as Figure 2 shows. 

Figure 2 Including a query’s execution plan.png
Figure 2: Including a query’s execution plan

Alternatively, you can select the option to include the actual execution plan by clicking the icon in the SQL Editor Taskbar, as Figure 3 shows:
 
Figure 3 Selecting the Taskbar icon to include a query’s execution plan.png
Figure 3: Selecting the Taskbar icon to include a query’s execution plan

For the purpose of this article, we will view the execution plan in graphical format, but you can also view it in text or XML format. In addition, you can view estimated execution plans for a query. Viewing estimated execution plans can be beneficial with SQL Azure because the data is not returned, so we save on the data transfer cost. Note that the query is not executed for the estimated plan, whereas the query is executed for the actual execution and the plan displayed is the output from the engine showing the plan used. Analyzing estimated execution plans instead of actual plans can be handy in a lot of scenarios because you save the time to execute the query. However, there are cases in which inspecting the actual execution plan is preferred—for example, when cardinality estimates are not accurate or when the code uses either dynamic T-SQL code or temporary tables. The other approach is to tune the queries for the on-premises version of the database and later migrate the database to SQL Azure. This approach can also save on the data transfer cost. In addition, after the database is migrated, the actual performance can be inspected using dynamic management views, which we will explore in Part 2 of this article.


For the demonstrations in this article, I used the AdventureWorksLT sample database, which is the official sample database for SQL Azure. You can download this database from the Microsoft SQL Azure download page, at http://msftdbprodsamples.codeplex.com/releases/view/37304

Let’s run a query:
select firstname.png
Figure 4 shows the execution plan for this query.
 

Figure 4 Execution plan of a query.png
Figure 4: Execution plan of a query: select firstname,lastname from saleslt.customer where phone like '1%';
Note a couple of things in the execution plan:

  1. 1. Clustered Index Scan means that server scanned all records to answer the
  2.     query
    2. Suggestion to create an index, including specifications
A table scan is required when we are displaying every record, but when we have filtered our query using the where clause and only a fraction of rows are returned, we can create an index to improve performance.
One thing that should be reviewed in creating an index for SQL Azure database is the space occupied by the index. You might not mind a single index occupying few kilobytes of space. But what if your database has a lot of indexes? This will add to the database size. And with SQL Azure, we are charged based on the size of the database. So it becomes vital to keep a tab on the space that indexes occupy. Also, creating an index will impact the performance of DML queries (which I discuss later in this part of the article)

Following is the T-SQL code to review the size of indexes in a table for SQL Azure database:
SELECT.png

Note that SQL Azure does not support tables without a clustered index. Every Table should have exactly one clustered index.

Now, Let’s create the index based on the index suggestion and inspect the performance gain, if any.
create nonclustered index.png

Note the “include” clause with the index. By using an index with included columns, we can cover more queries. Later in the article, I demonstrate this fact.

Now run the query again:
select firstname - copia.png

Figure 5 shows the execution plan:

Figure 5 Execution plan after rerunning the query.png
 
Figure 5: Execution plan after rerunning the query

Note that the execution plan now shows “Index Seek” which means that particular rows are found using a seek operation on the index. This improves performance because not all rows are supposed to be scanned to find the relevant information. Now, let’s run three separate queries to see if the index impacts other queries as well.

Figure 6 shows the execution plan for each of the three queries:
 
Figure 6 Execution plan for three separate queries.png
 
Figure 6: Execution plan for three separate queries.

Recall that Index with included columns can cover more than one query. We can see In Figure 6 that a non-clustered index can improve performance for multiple queries. For more information about indexes with included columns, see the Microsoft article “Index with Included Columns”.
You can drop the non-clustered index named nonclustered_index_cust_phone created earlier by using the following code:
drop index.png
Now,

Before creating indexes, we also have to factor in the performance cost that will be incurred from creating, updating, and deleting queries on the table. Figure 7 shows the results before index creation; Figure 8 shows the results after index creation.

Figure 7 Before index creation.png
Figure 7: Before index creation

As you can see in Figure 7, I inserted a batch of 100 rows in the “before index creation” scenario. The elapsed time is 53 seconds.
 
Figure 8 After index creation.png
Figure 8: After index creation

In the “after index creation” scenario, I create a non-clustered index with included columns. I then insert the batch of 100 rows again, with other parameters constant. The elapsed time is 55 seconds.

Thus, you need to keep the following considerations in mind with index creation:
  1. 1. Performance cost for create, update, and delete statements after index
  2.     creation
    2. Size of the index vs. performance gain; because we are charged for the size
  3.     of the database, it is important to analyze the performance gain vs. the
  4.     additional cost due to the increase in size of the database with index
  5.     creation

Indexing is an art as well as science, and it is important to balance the performance gain vs. cost for a particular scenario. Here, in this article, my purpose was to point out the fact that Execution plan is available in SQL Azure and the Information available through the execution plan is useful in tuning SQL Azure database.
 
STATISTICS

Statistics are a handy tool for tuning queries. Information such as CPU time and I/O information for query execution is accessible through STATISTICS IO and STATISTICS TIME. Based on this information available from it, we can see whether or not query was tuned.

By default, the STATISTICS IO and STATISTICS TIME are OFF. The following T-SQL code turns them ON:
SET STATITICS.png

You can see the statistics in the ‘Message’ tab of the output window. Now, let’s try to tune a query and see how the information available from STATISTICS IO and STATISTICS TIME can be used during the performance tuning process.
select name.png

If you inspect the message tab, you will see the following output:

SQL Server parse.png

The most interesting information available from STATISTICS TIME, is the following:
SQL Server execution.png
“CPU time” shows the amount of CPU time it took to execute a query; and “elapsed time” shows the total time it took to run the query. Because of varying loads on the server, the elapsed time will show varying times, but the CPU time is invariably consistent. Because CPU time is consistent, this parameter can help in determining whether the changes made in the query, during performance tuning, are acually helping (or hurting). Because it a simple query, we encounter Zero’s.
When I ran “select * from saleslt.product,” I encountered the following output:
SQL Server execution time.png

As discussed earlier, the ‘elapsed time’ will show inconsistent results.

The information available from STATISTICS, that interest me is the ‘logical reads’. Logical reads are the number of pages server had to read from the data cache in order to produce the results of the specified query. In addtion, logical reads remain constant from one execution of a query to an other execution of the same query. If this number goes down when you are tunign a query, it is an indication that the query is using less resources and that its performace will improve.

Now, let’s create an index:
create nonclustered index 2.png

Then, lets run the query again and inspect the Messages tab:
SQL Server parse 2.png

Note that after index creation, the logical reads decreased from 103 to 2. As discussed earlier, this is an indication that the server is using less resources and therefore, perofrmance will improve. Based on the information avaialable from STATISTICS IO and STATISTICS TIME, we verified that our query is getting tuned.

No comments :