DBA: August 2005

Wednesday, August 31, 2005

Essential performance forecasting, part 1

Craig Shallahamer
16 Aug 2005

SearchOracle: Oracle tips, scripts, and expert advice

Craig's Corner
Insights on Oracle technologies and trends by Craig Shallahamer

At a basic level, forecasting Oracle performance is something every DBA absolutely needs to understand and be able to do. When performance begins to degrade, it's the DBA who hears about it, and it's the DBA who's supposed to fix it. It's the DBA who has the most intimate knowledge of the database server, so shouldn't they be able to forecast performance? When a bunch of new users are going to be added to a system, it's the DBA who is quickly asked, "That's not going to be a problem, is it?" Therefore, DBAs need the ability to quickly forecast performance. Low-precision forecasting can be done very quickly, and it is a great way to get started forecasting Oracle performance.

The key metrics we want to forecast are utilization, queue length, and response time. With only these three metrics, as a DBA you can perform all sorts of low-precision what-if scenarios. To derive the values, you essentially need three things:

- a few simple formulas
- some basic operating system statistics
- some basic Oracle statistics

Before you are inundated with the formulas, it's important to understand some definitions and recognize their symbols.

S : Time to service one workload unit. This is known as the service time or service demand: how long it takes the CPU to service a single transaction. For example, 1.5 seconds per transaction, or 1.5 sec/trx. For Oracle systems, the best way to get this value is simply to derive it from formula 1 below, using the measured utilization, arrival rate, and number of CPUs.

U : Utilization, or CPU busyness. It is commonly displayed as a percentage, but in the formulas it must be expressed as a fraction: 75% is entered as 0.75, not 75. A simple way to gather CPU utilization is to run sar -u 60 1, which reports the average CPU utilization over a single 60-second interval.

λ : Workload arrival rate. This is how many transactions enter the system per unit of time. For example, 150 transactions each second or 150 trx/sec. When working with Oracle, there are many possible statistics that can be used for the "transaction" arrival rate. Common statistics gathered from v$sysstat are logical reads, block changes, physical writes, user calls, logons, executes, user commits, and user rollbacks. You can also mix and match as your experience increases. For this paper, we will simply use user calls.

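Q : Queue length. This is the number of transactions waiting to be serviced. This excludes the number of transactions currently being serviced. We will derive this value.

R : Response time. This is the average time a transaction spends in the system: its service time plus any time spent waiting in the queue. We will derive this value too.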

M : Number of CPUs. You can get this from the instance parameter cpu_count.
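The v$sysstat statistics are cumulative counters since instance startup. That means λ has to come from the difference between two snapshots taken a known interval apart, ideally the same interval sar measures. Below is a minimal Python sketch. It assumes you have already captured the 'user calls' value at the start and end of the window. The function name arrival_rate and the snapshot values are hypothetical.

```python
def arrival_rate(start_count: int, end_count: int, elapsed_sec: float) -> float:
    """Return lambda (workload units per second) from two cumulative counter snapshots.

    start_count and end_count are values of one v$sysstat statistic, such as
    'user calls', captured elapsed_sec seconds apart.
    """
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    if end_count < start_count:
        # The counters restart from zero when the instance restarts.
        raise ValueError("counter decreased; was the instance restarted?")
    return (end_count - start_count) / elapsed_sec


# Hypothetical snapshots 60 seconds apart: 750 user calls in the window.
print(arrival_rate(1_000_000, 1_000_750, 60))  # 12.5
```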

The CPU formulas for calculating averages are as follows:

U = ( S λ ) / M [Formula 1]

R = S / (1 - U^M) [Formula 2]

Q = ( MU / (1 - U^M) ) - M [Formula 3]
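Formulas 1 to 3 translate directly into code. The following is a minimal Python sketch. It assumes U is supplied as a fraction and M as a whole number of CPUs, and the function names are hypothetical.

```python
def utilization(s: float, lam: float, m: int) -> float:
    """Formula 1: U = (S * lambda) / M, returned as a fraction (0.75, not 75)."""
    return s * lam / m


def response_time(s: float, u: float, m: int) -> float:
    """Formula 2: R = S / (1 - U^M). Only meaningful while U < 1."""
    return s / (1 - u ** m)


def queue_length(u: float, m: int) -> float:
    """Formula 3: Q = (M*U / (1 - U^M)) - M. Only meaningful while U < 1."""
    return (m * u / (1 - u ** m)) - m


# Single CPU at 50% busy: formula 2 gives twice the service time.
print(response_time(0.1, 0.5, 1))  # 0.2
```

With a single CPU, formula 2 reduces to the familiar single-server form R = S / (1 - U).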

Before we dive into real-life examples, let's check these formulas out by doing some thought experiments.

Thought experiment 1. Using formula (1), if the utilization were 50% with 1 CPU, it would be 25% with 2 CPUs, and that's exactly what the formula says. As you have probably already figured out, scalability is not taken into consideration: the formula assumes that doubling the CPUs halves the utilization, with no added overhead.

Thought experiment 2. Using formula (1), if we increased the arrival rate, CPU utilization would also increase.

Thought experiment 3. Using formula (1), if we used faster CPUs, the service time would decrease, so the utilization would also decrease.
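Thought experiments 1 to 3 can be checked numerically with formula 1 alone. This minimal Python sketch uses a hypothetical workload of S = 0.1 sec/trx and λ = 5 trx/sec.

```python
import math


def utilization(s: float, lam: float, m: int) -> float:
    # Formula 1: U = (S * lambda) / M
    return s * lam / m


base = utilization(0.1, 5.0, 1)         # 0.5 on one CPU
more_cpus = utilization(0.1, 5.0, 2)    # experiment 1: twice the CPUs
more_work = utilization(0.1, 6.0, 1)    # experiment 2: higher arrival rate
faster_cpu = utilization(0.08, 5.0, 1)  # experiment 3: shorter service time

print(base, more_cpus, more_work, faster_cpu)
# Perfect scaling is built into formula 1: doubling M exactly halves U.
print(math.isclose(more_cpus, base / 2))
```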

Thought experiment 4. Using formula (2), if the utilization increased, the denominator would decrease, which would cause the response time to increase!

Thought experiment 5. This one's tricky, so take your time. Using formula (2), if the number of CPUs increased, U^M would shrink (because U is less than 1, raising it to a higher power makes it smaller), so the denominator would increase, which would cause the response time to decrease.

Thought experiment 6. Using formula (3), if the utilization increased, the denominator would decrease and the numerator would increase, which would cause the queue length to increase.
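Thought experiments 4 to 6 can be checked the same way. This sketch first tabulates formulas 2 and 3 for a two-CPU box as U rises. It then holds S and λ fixed while adding CPUs. S = 0.104 sec/call and λ = 12.5 calls/sec are taken from Example 1, and the U values are arbitrary.

```python
def response_time(s: float, u: float, m: int) -> float:
    # Formula 2: R = S / (1 - U^M)
    return s / (1 - u ** m)


def queue_length(u: float, m: int) -> float:
    # Formula 3: Q = (M*U / (1 - U^M)) - M
    return (m * u / (1 - u ** m)) - m


S = 0.104   # sec/call, as derived in Example 1
LAM = 12.5  # calls/sec, as in Example 1

# Experiments 4 and 6: two CPUs, rising utilization.
for u in (0.65, 0.75, 0.85, 0.95):
    print(f"M=2  U={u:.2f}  R={response_time(S, u, 2):.3f}  Q={queue_length(u, 2):.2f}")

# Experiment 5: same S and lambda, more CPUs. Formula 1 also lowers U here,
# which only strengthens the effect of the shrinking U^M term.
for m in (2, 4, 8):
    u = S * LAM / m
    print(f"M={m}  U={u:.3f}  R={response_time(S, u, m):.3f}")
```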

Now that you have a feel for, and some trust in, the formulas, let's take a look at a real-life example.

Example 1. Let's say that for the last 60 seconds you gathered the average CPU utilization and the number of user calls from a two-CPU Linux box. You found the average utilization was 65% and Oracle processed 750 user calls. The number of user calls each second is then 12.5 (i.e., 750/60 = 12.5).

Therefore,

S = 0.104 sec/call, from U = ( S λ ) / M: 0.650 = ( S * 12.500 ) / 2

R = 0.180 sec/call, from R = S / (1 - U^M): R = 0.104 / ( 1 - 0.65^2 )

Q = 0.251 calls, from Q = ( MU / (1 - U^M) ) - M: Q = ( 2*0.65/(1-0.65^2) ) - 2
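Here is the same arithmetic in Python. It solves formula 1 for S and then feeds S into formulas 2 and 3. This sketch simply restates the inputs given in Example 1.

```python
# Inputs from Example 1 (symbols as in the article).
M = 2             # CPUs
U = 0.65          # average utilization over the 60 s sample, as a fraction
calls = 750       # user calls processed in the same 60 s
lam = calls / 60  # arrival rate: 12.5 calls/sec

S = U * M / lam                  # formula 1 solved for S
R = S / (1 - U ** M)             # formula 2
Q = (M * U / (1 - U ** M)) - M   # formula 3

print(f"lambda={lam:.3f} S={S:.3f} R={R:.3f} Q={Q:.3f}")
# lambda=12.500 S=0.104 R=0.180 Q=0.251
```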

The only number that is immediately useful to us is the queue length. There is, on average, less than one process waiting for CPU cycles. That's OK for performance and for our users. But there is some queuing occurring, so now would be a good time to plan for the future!

The response time and service time calculations become more useful when we recalculate them for a different configuration or workload scenario. For example, let's say your workload is expected to increase 15% each quarter. How many quarters do we have until response time significantly increases? Example 2 demonstrates this.

Example 2. Let's suppose performance is currently acceptable, but the DBA has no idea how long the situation is going to last. Assuming the worst case, workload will increase each quarter by 15%. Using the system configuration described in Example 1 and using our three basic formulas, here's the situation quarter by quarter.
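This minimal Python sketch recomputes the quarter-by-quarter figures from formulas 1 to 3. As Example 2 does, it holds S = 0.104 sec/call and M = 2 fixed while λ grows 15% per quarter from today's 12.5 calls/sec. The forecast function is hypothetical.

```python
S = 0.104        # sec/call, derived in Example 1
M = 2            # CPUs
BASE_LAM = 12.5  # calls/sec today
GROWTH = 0.15    # worst case: +15% per quarter


def forecast(quarters: int = 4) -> list[tuple[int, float, float, float, float]]:
    """Return (quarter, lambda, U, R, Q) for quarter 0 (today) through `quarters`."""
    rows = []
    for q in range(quarters + 1):
        lam = BASE_LAM * (1 + GROWTH) ** q
        u = S * lam / M                    # formula 1
        # Formulas 2 and 3 go negative once U > 1 (undefined at exactly U = 1).
        r = S / (1 - u ** M)               # formula 2
        qlen = (M * u / (1 - u ** M)) - M  # formula 3
        rows.append((q, lam, u, r, qlen))
    return rows


print("Qtr  lambda      U    R(sec)        Q")
for q, lam, u, r, qlen in forecast():
    print(f"{q:>3}  {lam:6.2f}  {u:5.0%}  {r:8.3f}  {qlen:7.2f}")
```

For quarters 0 through 4, utilization comes out at roughly 65%, 75%, 86%, 99%, and 114%. The third-quarter queue length is about 85. In the fourth quarter, both R and Q turn negative because U exceeds 1.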

Right away we can see that utilization is over 100% by the fourth quarter (i.e., 114%). This results in an unstable system because the queue length will always increase. The response time and queue length calculations also both go negative, indicating an unstable system.

The answer to the question, "When will the system run out of gas?" is something like, "Sometime between the first and second quarter." The answer is not the third quarter and probably not even the second quarter! While the system is technically stable in the second and third quarters, the response time has massively increased and by the third quarter there are typically 85 processes waiting for CPU power! Let's dig a little deeper.

Performance degradation occurs well before utilization reaches 100%. Our simple example shows that at 75% utilization, response time has already increased by roughly a third, and on average more than one process is waiting for CPU power. So while the system will technically function into the third quarter, practically speaking it will not come close to meeting users' expectations.

Based upon these figures, users are probably OK with the current performance. If you have done a good job setting expectations, they may be OK with performance into the first quarter. But once you get into the second quarter, with utilization at 86%, response time more than doubled, and over four processes waiting for CPU power, no matter what you do, your users will be very, very unhappy.

So what are the options? There are many options at this point, but we'll save that for another article…sorry.

The forecast precision using the method described above is very low, for several reasons: only one data sample was gathered, the forecasts were not validated, the workload was not carefully characterized, and our model only considered the CPU subsystem. When a more precise forecast is required, a product like HoriZone (horizone.orapub.com) is needed. But many times a quick, low-precision forecast is all that is necessary. When this is the case, you can get a general idea of the sizing situation using the formulas outlined above.

As you can see, with only a few basic formulas and some performance data, an amazing amount of useful forecasting can be done. Performance forecasting is a fascinating area that can expand a DBA's area of expertise, help answer those nagging questions we all get asked at 4:30pm on Fridays, and help anticipate poor performance.

About the Author
Craig Shallahamer has 18-plus years of experience in IT. As the president of OraPub, Inc., his objective is to empower Oracle performance managers by "doing" and teaching others to "do" whole system performance optimization (reactive and proactive) for Oracle-based systems.

In addition to course development and delivery, Craig is a consultant who was previously involved with developing a landmark performance management product, technically reviews Oracle books and articles, and keynotes at various Oracle conferences.

To view more of Craig's work, visit www.orapub.com.

Wednesday, August 10, 2005

Script to show problem tablespaces

SearchOracle.com Brett Ogletree
02 Aug 2005

[Ed. note: This script is now corrected and has been tested on 8.1.7.4 and 9.2.0.6.0.]

I've seen a lot of scripts that tell you about all the tablespaces in a database, but very few show only the ones that are going to give you problems.

I've been using this script for a few years now and it has really saved me from dialing in at nights and on the weekends. I use it as a cursor for a procedure and have it build an e-mail and/or page notification that is sent to myself and others.

This script is useful because it drills down to what is going to give you a problem. I don't have a lot of time to wade through reports to find out which tablespace is running out of space; this script is short and sweet and lets me get on with my day. I've run it on 8, 8i, and 9i. Just make sure you are connected as SYSTEM or another user that can read the data dictionary.

-- Tablespaces under 10% free, or where the largest NEXT extent will not fit in the largest free chunk
SELECT space.tablespace_name, space.total_space, NVL(free.total_free,0) AS total_free,
ROUND(NVL(free.total_free,0)/space.total_space*100) AS pct_free,
ROUND((space.total_space-NVL(free.total_free,0)),2) AS total_used,
ROUND((space.total_space-NVL(free.total_free,0))/space.total_space*100) AS pct_used,
NVL(free.max_free,0) AS max_free, next.max_next_extent
FROM
-- allocated size per tablespace, in MB
(SELECT tablespace_name, SUM(bytes)/1024/1024 total_space
FROM dba_data_files
GROUP BY tablespace_name) space,
-- total free space and largest free chunk per tablespace, in MB
(SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024,2) total_free, ROUND(MAX(bytes)/1024/1024,2) max_free
FROM dba_free_space
GROUP BY tablespace_name) free,
-- largest NEXT extent any segment in the tablespace will request, in MB
(SELECT tablespace_name, ROUND(MAX(next_extent)/1024/1024,2) max_next_extent
FROM dba_segments
GROUP BY tablespace_name) next
WHERE space.tablespace_name = free.tablespace_name (+)
AND space.tablespace_name = next.tablespace_name (+)
-- NVL: a completely full tablespace has no rows in dba_free_space and must still be reported
AND (ROUND(NVL(free.total_free,0)/space.total_space*100) /*pct_free*/ < 10
OR next.max_next_extent > NVL(free.max_free,0))
/
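The WHERE clause flags a tablespace in either of two cases. The first is when less than 10% of it is free. The second is when the largest NEXT extent requested by any of its segments is bigger than its largest free chunk, so the next extension could fail. Below is a minimal Python sketch of that rule, useful if you fetch these columns into your own notification procedure. The function names are hypothetical, and the sketch treats a tablespace with no rows in dba_free_space as having zero free space.

```python
import math
from typing import Optional


def oracle_round(x: float) -> int:
    # Oracle's ROUND(n) rounds .5 up for positive n; Python's round() does not.
    return math.floor(x + 0.5)


def is_problem_tablespace(total_mb: float,
                          free_mb: Optional[float],
                          max_free_chunk_mb: Optional[float],
                          max_next_extent_mb: Optional[float]) -> bool:
    """True when a tablespace meets either condition in the script's WHERE clause."""
    # Assumption: no dba_free_space rows means the tablespace is full (0 MB free).
    free = 0.0 if free_mb is None else free_mb
    chunk = 0.0 if max_free_chunk_mb is None else max_free_chunk_mb
    # Condition 1: less than 10% free.
    if oracle_round(free / total_mb * 100) < 10:
        return True
    # Condition 2: the largest NEXT extent will not fit in the largest free chunk.
    return max_next_extent_mb is not None and max_next_extent_mb > chunk


print(is_problem_tablespace(1000, None, None, 1))  # True: completely full
print(is_problem_tablespace(1000, 500, 400, 1))    # False: plenty of room
print(is_problem_tablespace(1000, 500, 0.5, 1))    # True: next extent does not fit
```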

Reader feedback

Perry W. writes: "This script is based on an Oracle7 mentality. It does not provide valid data if autoextend is used for datafiles. The column max_bytes must be used to identify how large a datafile can *potentially* grow. He must also account for the fact some datafiles may have autoextend on and some may not. Also, a monitor for disk space available must also be included in the monitoring infrastructure. This is a rookie script in my opinion and can provide misleading results with autoextend turned on."
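The reader's objection can be made concrete. dba_data_files exposes each file's AUTOEXTENSIBLE flag (YES or NO) and its MAXBYTES ceiling. A tablespace's potential size is therefore the sum of MAXBYTES for autoextensible files plus BYTES for fixed ones. The following is a minimal Python sketch over rows you have already fetched. The DataFile class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    # Hypothetical container for three dba_data_files columns.
    size_bytes: int      # BYTES: current size
    max_bytes: int       # MAXBYTES: autoextend ceiling
    autoextensible: str  # AUTOEXTENSIBLE: 'YES' or 'NO'


def potential_size(files: list[DataFile]) -> int:
    """Bytes the tablespace could reach if every autoextend file grew to its limit."""
    total = 0
    for f in files:
        if f.autoextensible == "YES":
            # A file can already be larger than MAXBYTES (e.g. resized manually),
            # so never count less than its current size.
            total += max(f.size_bytes, f.max_bytes)
        else:
            total += f.size_bytes
    return total


def potential_pct_free(files: list[DataFile], free_bytes: int) -> float:
    """Free space as a percentage of potential, not current, size."""
    current = sum(f.size_bytes for f in files)
    potential = potential_size(files)
    # Room autoextend can still add, on top of what dba_free_space reports.
    return 100 * (free_bytes + potential - current) / potential


files = [DataFile(100, 1000, "YES"), DataFile(200, 0, "NO")]
print(potential_size(files))          # 1200
print(potential_pct_free(files, 30))  # 77.5
```

This assumes the underlying disk actually has room for that growth. That is why the reader also recommends monitoring available disk space.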

Tuning: People, processes and technology

SearchOracle.com Jeremy Kadlec
31 May 2005

All too often, organizations attempt to resolve performance issues with a quick fix or magic bullet to keep the business moving ahead of the competition. The perception is that there is no time for planning, designing or testing a solution; you have to move from problem identification to immediately implementing the solution. Unfortunately, the reality is that accurate solutions derived in this manner are few and far between.

Quick fixes typically become long-term nightmares that no one wants to work on. It's only a matter of time before the IT team knows how fragile the system is and what is really required. To top it off, the quick fix is typically some other piece of software that "can just be integrated" and forgotten. In most cases, this is simply not possible.

It has been common knowledge for years that every IT solution consists of people, processes and technology. But when it comes to building a solution, those first two components are left in the dust by the idea that a single piece of technology is the solution. However, what is really necessary is to have your team and management support all three components and avoid shooting from the hip.

In this article, we will explore the components for properly leveraging people, processes and technology for performance tuning.

People: Trained, motivated and performance-tuned

In my opinion, people are the most important aspect of the triad because with good people, the correct processes and technology can be developed, tested and implemented. Without knowledgeable professionals focused on the issues, performance will continue to suffer. As such, I recommend focusing on the following:

Team -- Motivate a strong team to address performance-tuning needs. At a minimum, it should consist of developers, database administrators, network administrators, desktop technicians, testers and users.

Training -- Make sure the team is properly trained on the technology required to support the application from the front-end application all the way to the storage subsystems. Training in team building and knowledge sharing yields high-performance team members who are well aware of the challenges faced by other team members.

Time -- Our scarcest resource is time, but with proper time management we can achieve momentous results. Without the ability to manage your time well, you will never have enough of it, which leads to pressure and stress. Vigilant time management gives you ample time to manage your workload. You do not want to be so busy that at the eleventh hour you realize an issue raised two months earlier has been ignored, because ever-mounting demands left you brain cycles only for the work right in front of you.

Processes include well-known and lesser-known components

Building a process that matches your team's skill set and comfort level with performance-tuning needs is easier said than done. That's why I believe most "solutions" lack processes that address things like implementation, maintenance and support, upgrades, testing and troubleshooting. Do not make processes too complex. Break the work down into small, manageable steps that can be distributed among team members.

Project management -- Address performance-tuning needs as a project, with a set of goals, start and end dates and, most importantly, managerial support for the project in terms of time and the members of the team. Most of the time, performance tuning is considered a side project to be worked on as time permits. Break this habit! Legitimize these needs.

Communication -- First and foremost, communicate your needs for completing the project. Identify, document and test the processes, implement them and learn from the experience and share the knowledge so the same performance-tuning issues are not re-created in the future.

Simplicity -- Build accurate and efficient processes and reuse them with the goal of distributing the workload among the team members. Don't risk having all of the knowledge in one person's head. This, too, is easier said than done. But after building a few processes, you will recognize opportunities to reuse processes and streamline the overall project.

Technology, remember, is one of three components

Do not pass by the first two components and proceed directly to a tool to solve your performance problems. A tool may help identify a performance bottleneck but will typically not be able to correct and validate the technical issue in your environment.

Components -- Keep in mind that the tool is a third component of the equation, not the overall solution. Once you find the right tool for the issue, the team may have to learn something new and processes may change.

Evaluation -- Make sure the technology meets your needs conceptually and practically. Bring the technology in-house and validate that it will meet user and IT expectations. Discuss the short- and long-term potential for the technology to be sure the plans and your expectations align and that support will be available when you need it.

Once you allocate the proper time for evaluation, training, and planning, then you can implement the solution in a timely manner -- while the information is fresh and the team can focus on the issues at hand. Then begin to reap the benefits of your team's efforts in terms of high-performing systems.

--------------------------------------------------------------------------------

Jeremy Kadlec is the principal database engineer at Edgewood Solutions, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and delivers frequent presentations at regional SQL Server users groups and nationally at SQL PASS. Kadlec is the SearchSQLServer.com Performance Tuning expert.

Tuesday, August 02, 2005

A script showing explain plan for currently running queries

Pachot Franck http://www.dba-village.com/village/dvp_scripts.ScriptDetails?ScriptIdA=2182

Script:
SET linesize 1000 pagesize 0 feedback OFF

SELECT /* tag F354R334A56N47C687K645P6A628C7638H608O658758T8 */
DECODE(id,0,'
=== SID,SERIAL: ('||sid||','||serial#||') USER: '||username||' , ROWS_PROCESSED: '||rows_processed||' , BUFFER_GETS: '||buffer_gets||'
=== PROGRAM: '||program ||' , MODULE: ' || s.MODULE||'
'||'
'||sql_text|| '
'||'
EXPLAIN PLAN: ',LPAD(''||depth||'.'||position||') ',6+2*depth,' '))||
INITCAP(operation||DECODE(options,NULL,'',' '||options||'')) ||
DECODE(object_name,NULL,'',' '||object_owner||'.'||object_name)||
DECODE(OBJECT#,NULL,'',DECODE(optimizer,'ANALYZED','',' not analyzed'))||
DECODE(partition_start,NULL,'',' partition '||partition_start||'->'||partition_stop||' ')||
DECODE(cardinality,NULL,'',' card='||DECODE(SIGN(cardinality-1000), -1, cardinality||'',DECODE(SIGN(cardinality-1000000), -1, ROUND(cardinality/1000)||'K',DECODE(SIGN(cardinality-1000000000), -1, ROUND(cardinality/1000000)||'M',ROUND(cardinality/1000000000)||'G')))) ||
DECODE(cost,NULL,' ',' cost='||DECODE(SIGN(cost-10000000), -1, cost||'',DECODE(SIGN(cost-1000000000), -1, ROUND(cost/1000000)||'M',ROUND(cost/1000000000)||'G'))) ||
DECODE(bytes,NULL,' ',' bytes='||DECODE(SIGN(bytes-1024), -1, bytes||'',DECODE(SIGN(bytes-1048576), -1, ROUND(bytes/1024)||'K',DECODE(SIGN(bytes-1073741824), -1, ROUND(bytes/1048576)||'M',ROUND(bytes/1073741824)||'G'))))||
DECODE(cpu_cost,NULL,' ',' cpu_cost='||DECODE(SIGN(cpu_cost-10000000), -1, cpu_cost||'',DECODE(SIGN(cpu_cost-1000000000), -1, ROUND(cpu_cost/1000000)||'M',ROUND(cpu_cost/1000000000)||'G'))) ||
DECODE(io_cost,NULL,' ',' io_cost='||DECODE(SIGN(io_cost-10000000), -1, io_cost||'',DECODE(SIGN(io_cost-1000000000), -1, ROUND(io_cost/1000000)||'M',ROUND(io_cost/1000000000)||'G'))) ||
DECODE(temp_space,NULL,' ',' temp='||DECODE(SIGN(temp_space-1024), -1, temp_space||'',DECODE(SIGN(temp_space-1048576), -1, ROUND(temp_space/1024)||'K',DECODE(SIGN(temp_space-1073741824), -1, ROUND(temp_space/1048576)||'M',ROUND(temp_space/1073741824)||'G'))))||
'' text
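FROM v$session s,v$sql q,v$sql_plan p
-- match each session to the cursor it is running; address plus hash avoids hash collisions
WHERE s.sql_address=q.address AND s.sql_hash_value=q.hash_value
AND q.users_executing>0
-- plan lines for that exact child cursor
AND q.address=p.address AND q.hash_value=p.hash_value AND q.child_number=p.child_number
-- exclude this monitoring query itself
AND sql_text NOT LIKE '%F354R334A56N47C687K645P6A628C7638H608O658758T8%'
ORDER BY buffer_gets,s.sid,s.serial#,p.hash_value,p.child_number,p.id;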

Sample Output:
=== SID,SERIAL: (90,33387) USER: APP , ROWS_PROCESSED: 0 , BUFFER_GETS: 152
=== PROGRAM: sqlplus.exe , MODULE: test.sql

INSERT /*+ append nologging */ INTO FACTS subpartition (DWH_P_333_NLGROC) (dwh_tpr_id, dwh_sho_id, ...

EXPLAIN PLAN: Insert Statement cost=267970

1.1) Load As Select
2.1) View card=22M cost=267970 bytes=18G
3.1) Window Sort card=22M cost=267970 bytes=18G io_cost=267970 temp=42G
4.1) View card=22M cost=3693 bytes=18G
5.1) Union-All
6.1) Table Access Full APP.FACTS partition 1179->1179 card=14M cost=1895 bytes=1G io_cost=1895
6.2) Table Access Full APP.FACTS partition 1173->1173 card=8M cost=1798 bytes=321M io_cost=1798
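The nested DECODE(SIGN(...)) expressions in the script turn raw numbers into the compact card=22M and bytes=18G figures in the sample output. Below is a minimal Python sketch of the same rule for the cardinality, bytes, and temp columns. The function names are hypothetical. The cost columns use different thresholds (M and G suffixes only), which are not reproduced here.

```python
import math


def oracle_round(x: float) -> int:
    # Oracle's ROUND(n) rounds .5 up for positive n; Python's round() rounds half to even.
    return math.floor(x + 0.5)


def human_bytes(n: int) -> str:
    """bytes= and temp= columns: 1024-based K, M, G suffixes."""
    if n < 1024:
        return str(n)
    if n < 1048576:
        return f"{oracle_round(n / 1024)}K"
    if n < 1073741824:
        return f"{oracle_round(n / 1048576)}M"
    return f"{oracle_round(n / 1073741824)}G"


def human_card(n: int) -> str:
    """card= column: 1000-based K, M, G suffixes."""
    if n < 1000:
        return str(n)
    if n < 1_000_000:
        return f"{oracle_round(n / 1000)}K"
    if n < 1_000_000_000:
        return f"{oracle_round(n / 1_000_000)}M"
    return f"{oracle_round(n / 1_000_000_000)}G"


print(human_card(22_000_000))  # 22M
print(human_bytes(2560))       # 3K (2.5 rounds up, as Oracle's ROUND does)
```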