Applications running with dynamic SQL are ever-increasing. Without some methods to contain the onslaught of queries, performance drops and phone calls increase. What can a DBA do to remain sane and avoid a visit from the company’s vice president?
This article addresses common bottlenecks encountered in performance of applications containing dynamic SQL, and how to avoid them. While our primary focus is distributed applications that access DB2 for z/OS on the mainframe, most of these bottlenecks and their solutions arise with other DBMSs as well.
Dynamic SQL and the Distributed Environment
The mainframe-centric DBA tends to think of a distributed application as one executing on an application server (usually in the enterprise’s intranet) that accesses DB2 data on the mainframe. The application itself is usually coded in Java, although other languages and environments also exist.
Access to DB2 data can be accomplished in several ways. These different methods vary in their performance characteristics, maintainability, and security, and are usually a function of whether the application is built in-house or purchased. Here are the most common methods:
• Dynamic SQL, where the SQL statements in the application exist either as pre-coded character strings or are dynamically constructed during application execution
• Static SQL, where the application invokes a local (to DB2) process that executes static SQL and common options are stored procedures or user-defined functions
• Utilities (non-SQL) such as load and unload.
There also are variations on these. For example, programs can contain both static and dynamic SQL. We’ll focus here on distributed applications that mainly execute dynamic SQL and we’ll refer to this as the Distributed, Dynamic Environment (DDE).
To organize our work let’s consider general tuning strategies and how they might be used in a DDE to prevent, detect, and correct application bottlenecks.
Tuning Strategies
Common tuning strategies available to the DBA include:
• Application tuning focuses attention on suites of programs that access particular tables, batch windows, units of work, and online service levels. This strategy concentrates on meeting Service-Level Agreements (SLAs) through analysis of transaction units of work, locking effects, commit frequency and transaction design, and data access paths.
• Object tuning concentrates on the general definitions and configurations of tables, indexes, stored procedures, and other database objects.
• SQL tuning involves SQL review and knowledge of potential access paths, table and index data distribution, and statistics. Here, the emphasis is on managing the SQL creation process, creating an inventory of existing (and planned) SQL and access paths, and proactively addressing potential issues.
• Resource constraint tuning is an analysis of possible trade-offs among CPU, elapsed time, I/O, memory, network traffic, and other resources.
• System tuning deals with the DB2 DBMS infrastructure that supports the previously mentioned strategies.
In a DDE, it may not be clear which strategy or strategies are the best use of one’s tuning time and resources. (For a more detailed discussion on this, see “Approaches to DB2 Tuning: What’s the Best Use of Your Time?” in the June/ July 2009 issue of z/Journal.) Here, we’ll concentrate on application tuning, an area particularly relevant in a DDE. The other strategies tend to involve in-depth technical knowledge or advanced performance reporting tools.
Relieving bottlenecks in this environment is best done proactively—before applications move to production. Application tuning is easily incorporated into early phases of the application development lifecycle.
A universal remedy for application bottlenecks doesn’t exist. A DBA can try to get as much detail as possible to the right people as often as possible and then try to make good performance choices.
Bottlenecks
Bottlenecks are blockages, jams, or restrictions that prevent or delay quick transaction processing. In a DDE they usually manifest themselves as long transaction elapsed times. Underlying causes include using too much CPU time, excessive waiting for I/Os to complete, or waiting for locked data to be released. Another possible contributing problem may be poor choice of access paths to data.
What do bottlenecks look like? Some bottlenecks show up as extremely long elapsed times for transaction execution and are easily detected. In practice you should have access to a monitoring tool that can measure events, actions, and delays. This is especially helpful if you need to compare different configurations or measure improvements after performance tuning.
There are several places in the network where delays are possible, including client and gateway hardware, Web and application servers, network transit, and firewalls. Here, we concentrate on bottlenecks related to the database server; the most common categories of delay deal with specific resources:
• CPU: Spinning cycles trying to get an answer
• I/O: Running through buffers and cache to get data
• Lock contention: Multiple transactions simultaneously accessing the same data.
Common Questions to Ask
When you suspect a bottleneck exists there are some common questions to ask and data to gather during problem definition, including:
• What things have recently changed for applications?
• Have any changes introduced new SQL statements?
• Were code or SQL walk-throughs executed for this application?
• Is this application monitored, and is there data available on recent (and historical) performance?
• What is the commit strategy used for this application?
• Has data volume recently changed (i.e., is the data more volatile, have tables grown faster than originally expected, or are there far more transactions per unit of time than planned?).
This last point is worth expanding. If the number of table rows significantly changes, it should be able to be explained by the business (i.e., 10,000 new customers were added or 500,000 new orders were processed). When the volume of rows stored and searched changes, application performance may change with it. DB2 requires relevant data distribution statistics to choose a good access path.
Being Proactive About Monitoring
The DBA (or other responsible party) should set up specific active and historical performance monitoring reports for each application. In some cases, this should include detail down to specific SQL statements. Several standard reports are extremely helpful. These top-N reports contain lists of applications and SQL statements (by environment) that consume large amounts of resources:
• Worst CPU users
• High synchronous I/O waits
• High number of GetPages
• High number of locks.
Simply executing the reports isn’t enough. They must be relevant to a business unit that has a vested interest in performance, they should be regularly reviewed, and the process that produces them may need to be updated.
For example, you may find that some of your SQL statements responsible for a large number of I/Os are already optimally tuned. This may occur for analytical queries, or those associated with a data warehouse. These queries that are statistically different from others (or significantly change over time) need to be considered separately.
On the other hand, a query that suddenly shows up as long-running begs the question: Why did this happen? The query that used to run for two seconds and now runs for two hours should be assigned to the programming group for analysis. The query then needs to be documented or marked in some way so it doesn’t re-appear on the report.
The next step is to regularly execute these reports and store the results in a repository (such as a database) where they can be queried. Make this information easily available to the business unit so they can check on their own problems that crop up. This allows for convenient historical analysis, especially if you’re making multiple performance enhancements and want to correlate their effects.
Let’s review why being proactive is so important.
An Ounce of Prevention
The sanity check of computer application review will reduce or eliminate problems. A checklist of items relevant to your shop can be used to review the application and then pass or fail this test. You can make it part of the application test plan.
Who is involved in application review? The reviewing group should have a vested interest in performance and availability for the company’s applications and needs to have a documented application checkout process. For example, use a short list of teams involved and questions they want answered: tech support OS, tech support DASD, middleware mainframe connectivity, application programmer and analyst, project lead, and manager of the business area the application is supporting. With the right people involved and an adequate checklist, many production issues can be avoided.
This forum is used so questions can be posed and answers given. It’s key to have the right people involved or it becomes a carte blanche sign-off without any meaning. Capacity should question the entire pipeline and know the numbers for each touch-point. The pipeline list could include: firewall, Web server, application server, middleware/ connectivity, and back-end database. The questions that are best often are asked by people who’ve been around awhile and know the inter-connectivity at the company.
At each point you’d need to know the current configuration, start-up values, concurrent processing values, and the peak it will effectively use, thereby knowing when a capacity failure is likely to occur and be predictable. (If you add 10,000 new users, the Web servers can’t handle the load, etc.) The process flow is usually hardware to software, and don’t forget the vendor configurations. The application area brings their activity with the application.
Then ask questions such as: Do you have enough CPU cycles or DASD available? Is this application coming through UNIX servers? If so, how will you know if you need to scale the application? How many connections do you need to support? How many concurrent users? What is the peak usage timeframe? Is there a most contentious table or set of tables? How much and how often for insert, update, deletes?
Whatever is important to your business and keeps it running needs to be validated. Set priorities by asking, “What will happen to the business if this application goes down?”
Add to your checklist a daily, weekly, and monthly process review denoting who will need to do what and when. Determine the impact to the application of an outage to upgrade hardware X. Review DB2 use of dynamic access, batch access, loads and unloads, reorganizations, or interfaces to other systems.
What if this is an entirely new application? There’s no substitute for actively monitoring an application. Document what you want watched for the application and know the ways you can get the numbers on those items.
When an application first goes live make sure there are people watching usage. Schedule a meeting time for the group to review a first day check-out. Have them report on a concrete list of SLA items. Some of these items might include:
• Maximum users reached n at time x
• The user-perceived response time averaged n seconds, with the longest time being m
• We started with n MB of allocated memory and grew to m MB
• We saw n deadlocks and m timeouts
• We had n long-running SQL statements.
When an application is running, use monitoring tools and reports to determine CPU, I/O, getpages, buffer usage, etc. Have the reports gathered for key points of the day and reviewed on a quarterly basis.
Application Tuning for Bottlenecks
The most common application-related bottlenecks in a DDE are caused by locks, commits, and poor data access patterns. Here are some points to ponder.
Are you holding locks? We love the phone call from the user that begins with the question, “Why is my application slow?” What does slow mean? And the onslaught of questions begins … How do we know it’s slow? Was it slow yesterday? And so forth. To address these questions with accuracy, having a monitoring tool or set of tools is imperative.
Understand the difference between locks and claims. The DBA needs to understand how and why DB2 uses transaction locks for concurrent control and how claims are used for SQL serialization control.
Carefully consider using uncommitted read access for SELECT statements. It’s possible, and sometimes desirable, to SELECT data from a database without acquiring locks on that data. This can be accomplished at the package level with the appropriate bind option, or on a statement by statement basis using the WITH UR parameter. This option lets the application read while acquiring few locks, but at the risk of reading uncommitted data.
Remember that binding a package with uncommitted read may raise red flags with auditors. Using uncommitted read risks accessing data that isn’t yet committed and may be changed (or deleted!) before the SQL statement finishes. In addition, some applications (such as payroll) need the data to remain fixed and unchanged.
Know the lock sizes of the objects and how application SQL will access them. In a DDE transaction, elapsed time is often a concern. To avoid deadlocks and timeouts on some tables, the DBA will consider row-level locking rather than page-level locking. The effect of this reduced lock granularity is to minimize the transaction “locking footprint.” However, there are several side effects, including additional CPU spent on lock management and an increase in the absolute number of locks the transaction holds. These must be taken into account during database design and application design phases.
Coordinate application transaction execution with required infrastructure jobs. DB2 has evolved to the point where utility execution (copy, reorg, and so forth) rarely contends with application data access. This is primarily due to the Sharelevel Change options added to the utilities, which allow concurrent update to the data during utility execution. One exception occurs during a table space reorganization. At the end of the reorganization process, the utility requires access to the table called a drain that prevents new SQL statement execution and waits for existing claims to be released.
As the DB2 for z/OS Administration Guide, V8 states: “The drain quiesces the applications by allowing each one to reach a commit point, but preventing any of them, or any other applications, from making a new claim. When no more claims exist, the process that drains (the drainer) controls access to the drained object. The applications that were drained can still hold transaction locks on the drained object, but they can’t make new claims until the drainer has finished.”
This means that to successfully complete, the Reorg utility requires a drain on the table being reorged. This necessitates that applications be designed to commit at appropriate points.
Commit Processing
Are you correctly executing commit logic? The COMMIT statement releases most locks and provides a recovery point. Applications in a DDE will normally commit logic at the end of one or more transactions, depending on the application’s purpose. At the database server, commits result in logging of data denoting the end of the transaction, and externalizing (i.e., writing to DASD) information necessary for recovery.
Since the commit process involves a certain amount of CPU and I/O, don’t commit too frequently. Here’s where a monitoring tool that can show you the impact of the commit comes in handy. Monitoring can point out the effects that multiple DDE applications can have upon each other, such as deadlocks and timeouts. It also may indicate other related issues such as concurrent, long-running batch programs, which may interfere with DDE application data access.
The primary reason for commit processing from the application’s perspective is to set a logical point for recovery. In a DDE it’s rare for an application to be coded with restart logic (that is, the ability to recover from an in-flight error such as a deadlock or timeout and then re-drive the current transaction from the beginning). It’s still possible for fatal errors to occur. In these cases, the application design is predicated on the ability of the database management system to return the database to a consistent state by rolling back recent changes to the previous commit point.
Data Access and Its Effects
It’s important to understand how data volatility affects the database. The “hot spot” is a physical place in the database where a lot of data access is happening. The worst situation is when a lot of inserts, updates, deletes, and selects are happening to a set of rows in one location. A flood of inserted rows requires DB2 to find space to store the data, eating up embedded free space. Simultaneous updates and selects to rows can lead to deadlocks and timeouts. Hot spots are common in databases accessed in a DDE.
Usually, hot spots can be predicted (and perhaps avoided) during database design. Take the case of an account table where new rows are assigned new, ascending account numbers. If the table is clustered by account number, the physical “end” of the table now becomes a hot spot where all new rows will be inserted.
Another example might be a table keyed by date, where all transactions for a particular date range are clustered. Some hot spots can be avoided, or at least mitigated, during database design. The DBA can embed free space for rows to be inserted, and can schedule frequent reorgs to redistribute rows. It may be possible to physically cluster data in ways that spread new row inserts evenly across the table. The DBA also can coordinate with the data owners to ensure old or unused data is regularly purged or archived.
Tables of ever-increasing size usually result in longer query execution times; this drives the need for a purge or data archive process. There are several database design alternatives that may make this easier. One common possibility is to partition data so you can purge by partition. For example, a table keyed by date where the chronologically oldest data is to be purged can be partitioned by date range. The purge process might then be designed to simply empty the oldest partition, rather than execute costly SQL delete statements to remove the rows. Such designs also lend themselves to data archival processes.
Summary
Dynamic SQL in a distributed environment presents several performance challenges to the application team and DBA. Typical bottlenecks tend to be due to delays associated with waiting for locks, access path issues, or other resource constraints. Application tuning in the DDE should proactively concentrate on good database design to avoid hot spots, monitoring to detect problems, regular performance reporting, and trend analysis.