Thursday, December 22, 2005

Salesforce "failover"?

CNET reported on December 21, 2005: A Salesforce.com outage lasting nearly a day cut off access to critical business data for many of the company's customers on Tuesday in what appears to be Salesforce's most severe service disruption to date.

Salesforce, which has been growing rapidly, has undertaken efforts to bolster its computing infrastructure. For instance, it has configured its database to run on four different computers so if a machine fails, others will pick up the slack, Francis said. But the "failover" feature didn't prevent Tuesday's problems.

Salesforce's database supplier helped to restore service, Francis said. While he declined to identify who that supplier was, he did identify Oracle as Salesforce's biggest database supplier.

For details, see http://news.com.com/Salesforce+outage+angers+customers/2100-1012_3-6004625.html?tag=nefd.top

Tuesday, December 06, 2005

Manually Resolving In-Doubt Transactions: Different Scenarios

NOTE 1: If using Oracle 9i and DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY fails with
ORA-30019: Illegal rollback Segment operation in Automatic Undo mode, use the following workaround:
SQL> alter session set "_smu_debug_mode" = 4;
SQL> execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('local_tran_id');

SQL> select * from dba_2pc_pending;

SQL> select LOCAL_TRAN_ID, STATE, MIXED, ADVICE from dba_2pc_pending;

LOCAL_TRAN_ID          STATE            MIX A
---------------------- ---------------- --- -
3.7.99084              prepared         no

http://www-rohan.sdsu.edu/doc/oracle/server803/A54647_01/ch4e.htm

SQL> COMMIT FORCE '3.7.99084';

SQL> select LOCAL_TRAN_ID, STATE, MIXED, ADVICE from dba_2pc_pending;

LOCAL_TRAN_ID          STATE            MIX A
---------------------- ---------------- --- -
3.7.99084              forced commit    no

SQL> select * from dba_pending_transactions;

FORMATID
----------
GLOBALID
--------------------------------------------------------------------------------
BRANCHID
--------------------------------------------------------------------------------
48801
34A257C2BC134A007FFD
73616D705841436F6E6E506F6F6C

alter session set "_smu_debug_mode" = 4;

execute DBMS_TRANSACTION.PURGE_LOST_DB_ENTRY('3.7.99084');

SQL> select * from dba_pending_transactions;

no rows selected


Oracle Export Utility

http://www.dba-oracle.com/tips_oracle_export_utility.htm

http://www.orafaq.com/faqiexp.htm

http://builder.com.com/5100-6388-5054021.html#Listing%20B

Wednesday, November 09, 2005

Developing Signatures for Data Buffers

Remember, change occurs in the data buffers rapidly, and sometimes a long-term analysis will provide clues that point to processing problems within the database. Almost every Oracle database exhibits patterns that are linked to regular processing schedules, called signatures.

To solve this problem, the DBA might schedule a dynamic adjustment to add more RAM to db_cache_size every day.

http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/burleson_auto_pt2.html

Wednesday, October 26, 2005

Fast guide to finding and keeping an Oracle job

This guide provides tips and advice to help you stand out from the hundreds of other job seekers.

Frequently asked questions and myths about indexes

[SearchOracle.com]

Popular author Tom Kyte tackles some of the most common questions about Oracle indexes and debunks some myths in the process.

Monday, October 24, 2005

44% of database devs use MySQL

44% of database devs use MySQL by ZDNet's ZDNet Research -- Open source database deployments are up more than 20% in the last six months, according to Evans Data. MySQL use has increased by more than 25% in six months and is approaching a majority in the database space, with 44% of developers using the open source database. More than 60% of database developers say their [...]

Oracle Statspack Survival Guide

http://www.akadia.com/services/ora_statspack_survival_guide.html

Wednesday, October 19, 2005

Fast guide to finding and keeping an Oracle job

Elisa Gabbert, Assistant Site Editor
03.16.2005


Looking for your first Oracle DBA or developer job? Concerned about the future of your current position? You're not alone! With the IT market becoming increasingly competitive, it can be very difficult to find a job or keep the one you've got. We know how time-consuming it can be to search the Web for relevant career information, so we've gathered some valuable Oracle-related resources for you. This guide provides tips and advice to help you stand out from the hundreds of other job seekers. If you're interested in brushing up on the latest interviewing techniques, checking out Oracle salaries or simply searching for open positions, you've come to the right place.

Essential performance forecasting, part 2: I/O

Craig Shallahamer

SearchOracle.com

As I wrote in the first part of this series, forecasting Oracle performance is absolutely essential for every DBA to understand and perform. When performance begins to degrade, it's the DBA who hears about it, and it's the DBA who's supposed to fix it. Fortunately, low precision forecasting can be done very quickly and it is a great way to get started forecasting Oracle performance. This time, I'll focus on I/O performance forecasting.

The key metrics we want to forecast are utilization, queue time, and response time. With only these three metrics, as a DBA you can perform all sorts of low precision what-if scenarios. To derive the values, you essentially need 3 things:

- a few simple formulas
- some basic operating system statistics
- some basic Oracle statistics

Modern I/O subsystems can be extremely difficult to forecast. Just as with Oracle, there is batching, caching, and a host of complex algorithms centered around optimizing performance. While these features are great for performance, they make intricate forecast models very complex. This may seem like a problem, but actually it's not. I have found that by keeping the level of detail and complexity at a consistently lower level (i.e., less detail), overall system I/O forecasts are typically more than adequate.

At a very basic level, an I/O subsystem is modeled differently than a CPU subsystem. A CPU subsystem routes all transactions into a single queue. All CPUs feed off of this single queue. This is why, with a mature operating system, any one CPU should be just as busy as the next. If you have had I/O performance problems you know the situation is very different.

In contrast to a CPU subsystem, each I/O device has its own queue. A transaction cannot simply be routed to any device. It must go specifically where the data it needs resides, or where it has been told to write a specific piece of data. This is why each device needs its own queue and why some I/O queues are longer than others. This is also why balancing I/O between devices is still the number one I/O subsystem bottleneck solution.

Today an I/O device can mean just about anything. It could be a single physical disk, a disk partition, a RAID array, or some combination of these. The key when forecasting I/O is that whatever you call a "device" remains a device throughout the entire forecast. If a device is a five-disk RAID array, then make sure that whenever a device is referenced, everyone involved understands it means a RAID array of five physical disks, not a single disk. If your device definition is consistent, you'll avoid many problems.

The forecasting formulas we'll use below assume the I/O load is perfectly distributed across all devices. While today's I/O subsystems do a fantastic job of distributing I/O activity, many DBAs do not. I have found that while an array's disk activity is nearly perfectly balanced, the activity from one array to the next may not be very well balanced. Hint: If an I/O device is not very active (utilization less than 5%), do not count it as a device. It is better to be conservative than aggressive when forecasting.

Before you are inundated with formulas, it's important to understand some definitions and recognize their symbols.

S : Time to service one workload unit. This is known as the service time or service demand. It is how long it takes a device to service a single transaction. For example, 1.5 seconds per transaction or 1.5 sec/trx. For simplicity's sake, this value will be derived.

U : Utilization or device busyness. Commonly shown as a percentage, and that's how it works in our formulas. For example, in the formula it should be something like 75% or 0.75, but not 75. This value can be gathered from either sar or iostat.

λ : Workload arrival rate. This is how many transactions enter the system per unit of time. For example, 150 transactions each second or 150 trx/sec. When working with Oracle, there are many possible statistics that can be used for the "transaction" arrival rate. For simplicity's sake, this value will be derived and will refer to the general workload.

M : Number of devices. You can get this from the sar or iostat report. Be careful not to count both a disk and that disk's partitions, which would result in a double count.

W : Wait time, more commonly called queue time. This is how long a transaction must wait before it begins to be serviced. For simplicity's sake, this value will be derived.

R : Response time. This is how long it takes for a single transaction to complete, including both the service time and any queue/wait time. This will be gathered from the sar and iostat commands (details below).

The I/O formulas for calculating averages are as follows:

U = ( S λ ) / M (1)

R = S / (1 - U) (2)

R = S + W (3)

Before we dive into real-life examples, let's check these formulas out by doing some thought experiments.

Thought experiment 1. Using formula (1), if the arrival rate doubles, so will the utilization.

Thought experiment 2. Using formula (1), if we used slower disks, the service time (S) would increase, and therefore the utilization would also increase.

Thought experiment 3. Using formula (2), if we used faster devices, the service time would decrease, then the response time would also decrease.

Thought experiment 4. Using formula (2), if the device utilization decreased, the denominator would increase, which would cause the response time to decrease.

Thought experiment 5. Using formula (3), if we used faster devices, the service time would decrease, and therefore the response time would also decrease.

While gathering I/O subsystem data is simple, the actual meaning of the data and how to apply it to our formulas is not so trivial. One approach, which is fine for low precision forecasting like this, is to gather only the response time, the utilization, and the number of devices. From these values, we can derive the arrival rate and service time.

Gathering device utilization is very simple, as both sar -d and iostat clearly label these columns. However, gathering response time is not that simple. What iostat labels as service time is more appropriately the response time. Response time from sar -d is what you would expect: the service time plus the wait time. (For details, see "System Performance Tuning" by Musumeci and Loukides.)

There are many different ways we can forecast I/O subsystem activity. We could forecast at the device level or perhaps at the summary level. While detail level forecasting provides a plethora of numerical data, forecasting at the summary level allows us to easily communicate different configuration scenarios both numerically and graphically. For this article, I will present one way to consolidate all devices into a single representative device.

Capacity planners call this process of consolidating or summarizing "aggregation." While there are many ways to aggregate, the better the aggregation, the more precise and reliable your forecasts will be. For this example, our aggregation objective is to derive a single service time representing all devices, along with the total system arrival rate. The total system arrival rate is trivial; it's just the sum of all the arrivals. Based upon the table below, the total arrival rate is 0.34 trx/ms.

To aggregate the service time, we should weight the average device service time based upon each respective device's arrival rate. But for simplicity and space, we will simply use the average service time across all devices. Based upon the table below, the average service time is 4.84 ms/trx.
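As a sketch of that aggregation, here is a minimal Python example. The per-device figures are invented for illustration (the original device table is not reproduced here), chosen only so that the totals match the article's 0.34 trx/ms and 4.84 ms/trx:

```python
# Aggregating five I/O devices into one representative device.
# Each tuple is (arrival rate in trx/ms, service time in ms/trx);
# the individual figures below are illustrative, not the article's.
devices = [(0.10, 4.2), (0.08, 5.1), (0.07, 4.9), (0.05, 5.0), (0.04, 5.0)]

# Total system arrival rate: just the sum of the per-device arrivals.
total_arrival = sum(lam for lam, s in devices)

# Simple average service time (what the article uses for brevity).
simple_avg = sum(s for lam, s in devices) / len(devices)

# Arrival-rate-weighted average (the more precise aggregation).
weighted_avg = sum(lam * s for lam, s in devices) / total_arrival
```

With these figures the simple average is 4.84 ms/trx and the total arrival rate 0.34 trx/ms; the weighted average comes out slightly lower because the busier devices happen to be faster.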



Armed with the number of devices (5), the average service time (4.84 ms/trx), and the system arrival rate (0.34 trx/ms), we are ready to forecast!

Example 1. Let's say the I/O workload is expected to increase 20% each quarter and you need to know when the I/O subsystem will need to be upgraded. To answer the classic question, "When will we run out of gas?", we will forecast the average queue/wait time, response time, and utilization. The table below shows the raw forecast values.



Here's an example of the calculations with the arrival rate increased by 80% (arrival rate 0.71 trx/ms).

U = ( S λ ) / M = ( 4.84*0.71 ) / 5 = 0.69

R = S / (1 - U) = 4.84 / ( 1 - 0.69 ) = 15.46

W = R - S = 15.46 - 4.84 = 10.62
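Those hand calculations can be replayed with a few lines of Python (a sketch using the article's formulas; small differences from the printed 15.46 and 10.62 come from rounding U to two digits):

```python
# Low-precision I/O forecast using the article's formulas:
#   U = (S * lam) / M    utilization
#   R = S / (1 - U)      response time
#   W = R - S            queue/wait time
S = 4.84     # aggregated service time, ms/trx
M = 5        # number of devices
lam = 0.71   # forecast arrival rate, trx/ms

U = (S * lam) / M      # ~0.69
R = S / (1 - U)        # ~15.5 ms/trx
W = R - S              # ~10.6 ms/trx
```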

So what's the answer to our question? Technically speaking, the system will operate with a 120% workload increase. But stating that in front of management is what I would call a "career decision." Looking closely at the forecasted utilization, wait time, and response time, you can see that once the utilization goes over 57%, the wait time skyrockets! Take a look at the resulting classic response time graph below.

Monday, October 17, 2005

SQL Formatter

A good SQL formatter that helps reformat SQL captured from V$SQLAREA.

http://www.sqlinform.com

Monday, October 03, 2005

Oracle Monitoring/Tuning on Solaris

A good article from sun.com on Oracle database monitoring and tuning on the Sun platform, published by Sun itself. It covers:
1) Monitoring Memory
2) Monitoring Disks
3) Monitoring CPU
4) Monitoring Networks
5) Tuning Buffer Cache
6) Checking ISM (Intimate Shared Memory)


http://www.sun.com/blueprints/0602/816-7191-10.pdf

Thursday, September 29, 2005

Senior management change at Oracle China

World Computer ICXO.COM (September 29, 2005)

Yesterday (the 28th), China Business News learned from Oracle (China) that Li Shaotang (李绍唐), managing director for East and West China, will formally leave Oracle on October 16. This is the second senior management change since Lu Chunchu (陆纯初), former general manager of Oracle Greater China, departed last year.

Keith Budge, Oracle's regional senior vice president for Asia-Pacific, will serve as interim managing director for East and West China in addition to his current duties.

"Li Shaotang will continue to pursue his career at a non-competing company in China," an Oracle spokesperson said, adding that a formal successor will be announced later. The spokesperson did not disclose which company Li is joining.

Li Shaotang joined Oracle in May 2000 as managing director for Taiwan and was appointed managing director for East and West China in July 2003.

In the middle of last year, Oracle restructured its Greater China organization and Lu Chunchu, then general manager of Greater China, left the company. That left Li Shaotang, Li Hanzhang (李翰璋, managing director of Oracle China's northern region and general manager for telecom in Greater China) and Pan Yinglin (潘应麟, managing director for South China and Hong Kong) as Oracle China's top management team. Li Shaotang's departure marks the first crack in that three-person team.

According to Oracle's fiscal 2005 report, new license revenue from China ranked first in the Asia-Pacific market for the first time, and China's global ranking rose from tenth three years ago to sixth. As one of Oracle's three most important regions in China, the East and West China region contributed significantly to the growth of Oracle China's overall results.

Interview questions for aspiring Oracle apps DBAs


Naveen Nahata

08.28.2005



[With the IT job market so tight, every available position typically draws an avalanche of applicants. Naveen Nahata offers this list of technical interview questions for Oracle E-Business Suite DBA applicants that helps him quickly weed out the poseurs. Hiring managers: if you have similar questions, email me and I'll add them to the list. --Ed.]

Questions

1. What happens if the ICM goes down?
2. How will you speed up the patching process?
3. How will you handle an error during patching?
4. Provide a high-level overview of the cloning process and post-clone manual steps.
5. Provide an introduction to AutoConfig. How does AutoConfig know which value from the XML file needs to be put in which file?
6. Can you tell me a few tests you will do to troubleshoot self-service login problems? Which profile options and files will you check?
7. What could be wrong if you are unable to view concurrent manager log and output files?
8. How will you change the location of concurrent manager log and output files?
9. If the user is experiencing performance issues, how will you go about finding the cause?
10. How will you change the apps password?
11. Provide the location of the DBC file and explain its significance and how applications know the name of the DBC file.

Answers

1. All the other managers will keep working. ICM only takes care of the queue control requests, which means starting up and shutting down other concurrent managers.


2. You can merge multiple patches.
You can create a response file for non-interactive patching.
You can apply patches with options (nocompiledb, nomaintainmrc, nocompilejsp) and run these once after applying all the patches.

3. Look at the log of the failed worker, identify and rectify the error and restart the worker using adctrl utility.

4. Run pre-clone on the source (all tiers), duplicate the DB using RMAN (or restore the DB from a hot or cold backup), copy the file systems and then run post-clone on the target (all tiers).
Manual steps (there can be many more):

Change all non-site profile option values (RapidClone only changes site-level profile options).
Modify workflow and concurrent manager tables.
Change printers.

5. AutoConfig uses a context file to maintain key configuration files. The context file is an XML file in the $APPL_TOP/admin directory and serves as the centralized repository.
When you run AutoConfig, it reads the XML file and creates all the AutoConfig-managed configuration files.

For each configuration file maintained by AutoConfig, there exists a template file that determines which values to pick from the XML file.

6. Check guest user/password in the DBC file, profile option guest user/password, the DB.
Check whether apache/jserv is up.
Run IsItWorking, FND_WEB.PING, aoljtest, etc.

7. Most likely the FNDFS listener is down. Look at the value of OUTFILE_NODE_NAME and LOGFILE_NODE_NAME in the FND_CONCURRENT_REQUESTS table. Look at the FND_NODES table. Look at the FNDFS_ entry in tnsnames.ora.

8. The location of log files is determined by parameter $APPLCSF/$APPLLOG and that of output files by $APPLCSF/$APPLOUT.

9. Trace the user's session (with waits) and use tkprof to analyze the trace file.
Take a statspack report and analyze it.
O/s monitoring using top/iostat/sar/vmstat.
Check for any network bottleneck by using basic tests like ping results.


10. Use FNDCPASS to change APPS password.
Manually modify wdbsvr.app/cgiCMD.dat files.
Change any DB links pointing from other instances.


11. Location: $FND_TOP/secure directory.
Significance: Points to the DB server amongst other things.
The application knows the name of the DBC file by using profile option "Applications Database Id."

Tracking the progress of long-running queries

Manish Upadhyay

09.14.2005 SearchOracle.com


Sometimes there are batch jobs or long-running queries in the database that may take a while to complete. This query will show the status of the query -- how much of it is completed. In other words, this may be viewed as a "progress bar" for the query. It has been tested on v. 9.2.0.4 on Tru64 and Windows. (Note: This tip is a modified version of a tip from Oracle documentation.)

SELECT * FROM
(SELECT username, opname, sid, serial#, context, sofar, totalwork,
        ROUND(sofar/totalwork*100, 2) "% Complete"
   FROM v$session_longops)
WHERE "% Complete" != 100
/

Resource-intensive SQL

Vasan Srinivasan

09.14.2005 SearchOracle.com


Here is a simple script to find the most resource-intensive SQL in the database. It has been of immense help to me several times. It has been used on 8.1.7.4 and 9.2.0.5. However, there may be a better way of doing this in 9i that I have yet to learn.

In the SQL below, I am ordering the results by the descending number of executions, but by changing the order to refer to the dre or bge columns, you can find the SQLs with the most disk reads or buffer gets respectively.

select a.executions,
       a.disk_reads,
       a.disk_reads/a.executions dre,
       a.buffer_gets,
       a.buffer_gets/a.executions bge,
       b.username,
       a.first_load_time,
       a.sql_text
  from v$sql a, all_users b
 where a.executions > 0
   and a.parsing_user_id = b.user_id
 order by 1 desc;

Monitoring rollback progress

QIANGHUA MA
09.14.2005 SearchOracle.com

When a large transaction takes a long time to roll back, it is good to know how much of the rollback is done and to estimate how long it will take. Given the session's SID, this can be done with the simple statement below, tested on Oracle9i. When a transaction is rolling back, t.used_ublk and t.used_urec decrease until they reach 0. By sampling the two measures at different points in time, you can calculate how fast the rollback is proceeding and when it will complete.

SELECT t.used_ublk, t.used_urec
FROM v$session s, v$transaction t
WHERE s.taddr=t.addr
and s.SID =:sid;
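The sampling arithmetic the tip describes amounts to a simple rate calculation. A sketch in Python, with made-up sample values:

```python
# Two samples of v$transaction.used_urec, taken interval_sec apart
# (all three numbers are illustrative).
urec_first = 1_200_000   # undo records still pending at first sample
urec_second = 1_050_000  # pending at second sample
interval_sec = 60

rate = (urec_first - urec_second) / interval_sec  # records undone per second
eta_sec = urec_second / rate                      # seconds until rollback completes
```

Here the rollback is undoing 2,500 records per second, so roughly seven minutes remain.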

Wednesday, August 31, 2005

Essential performance forecasting, part 1

Craig Shallahamer
16 Aug 2005

SearchOracle: Oracle tips, scripts, and expert advice

Craig's Corner
Insights on Oracle technologies and trends by Craig Shallahamer

At a basic level, forecasting Oracle performance is absolutely essential for every DBA to understand and perform. When performance begins to degrade, it's the DBA who hears about it, and it's the DBA who's supposed to fix it. It's the DBA who has the most intimate knowledge of the database server, so shouldn't they be able to forecast performance? When a bunch of new users are going to be added to a system, it's the DBA who is quickly asked, "That's not going to be a problem, is it?" Therefore, DBAs need the ability to quickly forecast performance. Low precision forecasting can be done very quickly and it is a great way to get started forecasting Oracle performance.

The key metrics we want to forecast are utilization, queue length, and response time. With only these three metrics, as a DBA you can perform all sorts of low precision what-if scenarios. To derive the values, you essentially need 3 things:

- a few simple formulas
- some basic operating system statistics
- some basic Oracle statistics


Before you are inundated with the formulas, it's important to understand some definitions and recognize their symbols.

S : Time to service one workload unit. This is known as the service time or service demand. It is how long it takes the CPU to service a single transaction. For example, 1.5 seconds per transaction or 1.5 sec/trx. The best way to get the value for Oracle systems is to simply derive it.

U : Utilization or CPU busyness. It's commonly displayed as a percentage and that's how it works in this formula. For example, in the formula it should be something like 75% or 0.75, but not 75. A simple way to gather CPU utilization is simply running sar -u 60 1. This will give you the average CPU utilization over a 60 second period.

λ : Workload arrival rate. This is how many transactions enter the system per unit of time. For example, 150 transactions each second or 150 trx/sec. When working with Oracle, there are many possible statistics that can be used for the "transaction" arrival rate. Common statistics gathered from v$sysstat are logical reads, block changes, physical writes, user calls, logons, executes, user commits, and user rollbacks. You can also mix and match as your experience increases. For this paper, we will simply use user calls.

Q : Queue length. This is the number of transactions waiting to be serviced. This excludes the number of transactions currently being serviced. We will derive this value.

M : Number of CPUs. You can get this from the instance parameter cpu_count.

The CPU formulas for calculating averages are as follows:

U = ( S λ ) / M [Formula 1]

R = S / (1 - U^M) [Formula 2]

Q = ( MU / (1 - U^M) ) - M [Formula 3]


Before we dive into real-life examples, let's check these formulas out by doing some thought experiments.

Thought experiment 1. Using formula (1), if the utilization was 50% with 1 CPU, it should be 25% with 2 CPUs. And that's what the formula says. As you probably already figured out, scalability is not taken into consideration.

Thought experiment 2. Using formula (1), if we increased the arrival rate, CPU utilization would also increase.

Thought experiment 3. Using formula (1), if we used faster CPUs, the service time would decrease, then the utilization would also decrease.

Thought experiment 4. Using formula (2), if the utilization increased, the denominator would decrease, which would cause the response time to increase!

Thought experiment 5. This one's tricky, so take your time. Using formula (2), if the number of CPUs increased, the denominator would increase, which would cause the response time to decrease.

Thought experiment 6. Using formula (3), if the utilization increased, the denominator would decrease and the numerator would increase, which would cause the queue length to increase.

Now that you have a feel and some trust in the formulas, let's take a look at a real life example.

Example 1. Let's say for the last 60 seconds you gathered the average CPU utilization and the number of user calls from a two CPU Linux box. You found the average utilization was 65% and Oracle processed 750 user calls. The number of user calls each second is then 12.5 (i.e., 750/60 = 12.5).

Therefore,

S = 0.104 sec/call ; U = ( S λ ) / M ; 0.650 = ( S * 12.500 ) / 2

R = 0.180 sec/call ; R = S / (1 - U^M); R = 0.104 / ( 1 - 0.65^2 )

Q = 0.251 calls ; Q = ( MU / (1 - U^M) ) - M ; Q = ( 2*0.65/(1-0.65^2) ) - 2
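These three numbers are easy to check in Python; the sketch below derives S from the measured utilization and plugs it into formulas (2) and (3):

```python
# Example 1: two CPUs, 65% average utilization, 12.5 user calls/sec.
M = 2        # CPUs (from cpu_count)
U = 0.65     # measured average CPU utilization
lam = 12.5   # user calls per second

S = U * M / lam                  # formula (1) solved for S: 0.104 sec/call
R = S / (1 - U**M)               # formula (2): ~0.180 sec/call
Q = (M * U / (1 - U**M)) - M     # formula (3): ~0.251 calls waiting
```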

The only number that is immediately useful to us is the queue length. There is, on average, less than one process waiting for CPU cycles. That's OK for performance and for our users. But there is some queuing occurring, so now would be a good time to plan for the future!

The response time and service time calculations become more useful when we recalculate them under a different configuration or workload scenario. For example, let's say your workload is expected to increase 15% each quarter. How many quarters do we have until response time significantly increases? Example 2 demonstrates this.

Example 2. Let's suppose performance is currently acceptable, but the DBA has no idea how long the situation is going to last. Assuming the worst case, workload will increase each quarter by 15%. Using the system configuration described in Example 1 and using our three basic formulas, here's the situation quarter by quarter.
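Since the quarter-by-quarter table is not reproduced here, the following sketch regenerates it from the three formulas, growing Example 1's arrival rate 15% per quarter:

```python
# Quarter-by-quarter CPU forecast: workload grows 15% per quarter
# from Example 1's baseline (12.5 user calls/sec, 2 CPUs, S = 0.104).
M, S = 2, 0.104
lam = 12.5
for quarter in range(5):   # now, then quarters 1 through 4
    U = S * lam / M
    if U < 1:
        R = S / (1 - U**M)
        Q = (M * U / (1 - U**M)) - M
        print(quarter, round(U, 2), round(R, 3), round(Q, 1))
    else:
        # Utilization at or above 100%: the queue grows without bound.
        print(quarter, round(U, 2), "unstable", "unstable")
    lam *= 1.15
```

This reproduces the figures the article quotes: roughly 86% utilization and over four waiting processes in the second quarter, about 85 waiting processes in the third, and 114% utilization (an unstable system) in the fourth.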

Right away we can see that utilization is over 100% by the fourth quarter (i.e., 114%). This results in an unstable system because the queue length will always increase. The response time and queue length calculations also both go negative, indicating an unstable system.

The answer to the question, "When will the system run out of gas?" is something like, "Sometime between the first and second quarter." The answer is not the third quarter and probably not even the second quarter! While the system is technically stable in the second and third quarters, the response time has massively increased and by the third quarter there are typically 85 processes waiting for CPU power! Let's dig a little deeper.

Performance degradation occurs way before utilization reaches 100%. Our simple example shows that at 75% utilization, response time has increased by 33% and there is usually over one process waiting for CPU power. So while the system will technically function into the third quarter, practically speaking it will not come close to meeting users' expectations.

Based upon the above table, users are probably OK with the current performance. If you have done a good job setting expectations, they may be OK with performance into the first quarter. But once you get into the second quarter, with the utilization at 86%, the response time more than doubling, and over four processes waiting for CPU power, no matter what you do, your users will be very, very unhappy.

So what are the options? There are many options at this point, but we'll save that for another article…sorry.

The forecast precision of the method described above is very low, for a few reasons: only one data sample was gathered, the forecasts were not validated, the workload was not carefully characterized, and our model only considered the CPU subsystem. When a more precise forecast is required, you may need a product like HoriZone (horizone.orapub.com). But many times a quick, low-precision forecast is all that is necessary, and in that case you can get a general idea of the sizing situation using the formulas outlined above.

As you can see, with only a few basic formulas and some performance data, an amazing amount of useful forecasting can occur. Performance forecasting is a fascinating area that can expand a DBA's expertise, help answer those nagging questions we all get asked at 4:30pm on Fridays, and help anticipate poor performance.


About the Author
Craig Shallahamer has 18-plus years experience in IT. As the president of OraPub, Inc., his objective is to empower Oracle performance managers by "doing" and teaching others to "do" whole system performance optimization (reactive and proactive) for Oracle-based systems.

In addition to course development and delivery, Craig is a consultant who was previously involved with developing a landmark performance management product, technically reviews Oracle books and articles, and keynotes at various Oracle conferences.

To view more of Craig's work, visit www.orapub.com.

Wednesday, August 10, 2005

Script to show problem tablespaces

SearchOracle.com Brett Ogletree
02 Aug 2005


[Ed. note: This script is now corrected and has been tested on 8.1.7.4 and 9.2.0.6.0.]

I've seen a lot of scripts that tell you about all the tablespaces in a database, but very few show only the ones that are going to give you problems.

I've been using this script for a few years now and it has really saved me from dialing in at nights and on the weekends. I use it as a cursor for a procedure and have it build an e-mail and/or page notification that is sent to myself and others.

This script is useful because it drills down to what is going to give you a problem. I don't have a lot of time to wade through a report to find out which tablespace is running out of space; this script is short and sweet and lets me get on with my day. I've run the script on 8, 8i, and 9i. Just make sure you are using system or another user that can read the data dictionary.


SELECT space.tablespace_name, space.total_space, free.total_free,
ROUND(free.total_free/space.total_space*100) as pct_free,
ROUND((space.total_space-free.total_free),2) as total_used,
ROUND((space.total_space-free.total_free)/space.total_space*100) as pct_used,
free.max_free, next.max_next_extent
FROM
(SELECT tablespace_name, SUM(bytes)/1024/1024 total_space
FROM dba_data_files
GROUP BY tablespace_name) space,
(SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024,2) total_free, ROUND(MAX(bytes)/1024/1024,2) max_free
FROM dba_free_space
GROUP BY tablespace_name) free,
(SELECT tablespace_name, ROUND(MAX(next_extent)/1024/1024,2) max_next_extent
FROM dba_segments
GROUP BY tablespace_name) NEXT
WHERE space.tablespace_name = free.tablespace_name (+)
AND space.tablespace_name = next.tablespace_name (+)
AND (ROUND(free.total_free/space.total_space*100) /*pct_free*/ < 10 OR next.max_next_extent > free.max_free)
/

Reader feedback

Perry W. writes: "This script is based on an Oracle7 mentality. It does not provide valid data if autoextend is used for datafiles. The column max_bytes must be used to identify how large a datafile can *potentially* grow. He must also account for the fact that some datafiles may have autoextend on and some may not. Also, a monitor for available disk space must be included in the monitoring infrastructure. This is a rookie script in my opinion and can provide misleading results with autoextend turned on."

Tuning: People, processes and technology

SearchOracle.com Jeremy Kadlec
31 May 2005


All too often, organizations attempt to resolve performance issues with a quick fix or magic bullet to keep the business moving ahead of the competition. The perception is that there is no time for planning, designing or testing a solution; you have to move from problem identification to immediately implementing the solution. Unfortunately, the reality is that accurate solutions derived in this manner are few and far between.

Quick fixes typically become long-term nightmares that no one wants to work on. It's only a matter of time before the IT team knows how fragile the system is and what is really required. To top it off, the quick fix is typically some other piece of software that "can just be integrated" and forgotten. In most cases, this is simply not possible.

It has been common knowledge for years that every IT solution consists of people, processes and technology. But when it comes to building a solution, those first two components are left in the dust by the idea that a single piece of technology is the solution. However, what is really necessary is to have your team and management support all three components and avoid shooting from the hip.

In this article, we will explore the components for properly leveraging people, processes and technology for performance tuning.

People: Trained, motivated and performance-tuned

In my opinion, people are the most important aspect of the triad because with good people, the correct processes and technology can be developed, tested and implemented. Without knowledgeable professionals focused on the issues, performance will continue to suffer. As such, I recommend focusing on the following:


Team -- Motivate a strong team to address performance-tuning needs. At a minimum, it should consist of developers, database administrators, network administrators, desktop technicians, testers and users.


Training -- Make sure the team is properly trained on the technology required to support the application from the front-end application all the way to the storage subsystems. Training in team building and knowledge sharing yields high-performance team members who are well aware of the challenges faced by other team members.


Time -- Our scarcest resource is time, but with proper time management we can achieve momentous results. Poor time management leads to insufficient time, pressure and stress; vigilant time management gives you ample time to manage your workload. You do not want to be so busy that, at the eleventh hour, you realize an issue raised two months earlier was ignored because ever-mounting demands left you with brain cycles only for the work right in front of you.

Processes include well-known and lesser-known components

Building a process that meets your team's skill set and comfort level with performance-tuning needs is easier said than done. That's why I believe most "solutions" lack processes that address things like implementation, maintenance and support, upgrades, testing and troubleshooting. Do not make processes too complex. Break the work down into small, manageable steps that can be distributed among team members.


Project management -- Address performance-tuning needs as a project, with a set of goals, start and end dates and, most importantly, managerial support for the project in terms of time and the members of the team. Most of the time, performance tuning is considered a side project to be worked on as time permits. Break this habit! Legitimize these needs.


Communication -- First and foremost, communicate your needs for completing the project. Identify, document and test the processes, implement them and learn from the experience and share the knowledge so the same performance-tuning issues are not re-created in the future.


Simplicity -- Build accurate and efficient processes and reuse them with the goal of distributing the workload among the team members. Don't risk having all of the knowledge in one person's head. This, too, is easier said than done. But after building a few processes, you will recognize opportunities to reuse processes and streamline the overall project.

Technology, remember, is one of three components

Do not pass by the first two components and proceed directly to a tool to solve your performance problems. A tool may help identify a performance bottleneck but will typically not be able to correct and validate the technical issue in your environment.


Components -- Keep in mind that the tool is a third component of the equation, not the overall solution. Once you find the right tool for the issue, the team may have to learn something new and processes may change.


Evaluation -- Make sure the technology meets your needs conceptually and practically. Bring the technology in-house and validate that it will meet user and IT expectations. Discuss the short- and long-term potential for the technology to be sure the plans and your expectations align and that support will be available when you need it.

Once you allocate the proper time for evaluation, training, and planning, then you can implement the solution in a timely manner -- while the information is fresh and the team can focus on the issues at hand. Then begin to reap the benefits of your team's efforts in terms of high-performing systems.



--------------------------------------------------------------------------------

Jeremy Kadlec is the principal database engineer at Edgewood Solutions, a technology services company delivering professional services and product solutions for Microsoft SQL Server. He has authored numerous articles and delivers frequent presentations at regional SQL Server users groups and nationally at SQL PASS. Kadlec is the SearchSQLServer.com Performance Tuning expert.

Tuesday, August 02, 2005

A script showing explain plan for currently running queries

Pachot Franck http://www.dba-village.com/village/dvp_scripts.ScriptDetails?ScriptIdA=2182

Script:
SET linesize 1000 pagesize 0 feedback OFF

SELECT /* tag F354R334A56N47C687K645P6A628C7638H608O658758T8 */
DECODE(id,0,'
=== SID,SERIAL: ('||sid||','||serial#||') USER: '||username||' , ROWS_PROCESSED: '||rows_processed||' , BUFFER_GETS: '||buffer_gets||'
=== PROGRAM: '||program ||' , MODULE: ' || s.MODULE||'
'||'
'||sql_text|| '
'||'
EXPLAIN PLAN: ',LPAD(''||depth||'.'||position||') ',6+2*depth,' '))||
INITCAP(operation||DECODE(options,NULL,'',' '||options||'')) ||
DECODE(object_name,NULL,'',' '||object_owner||'.'||object_name)||
DECODE(OBJECT#,NULL,'',DECODE(optimizer,'ANALYZED','',' not analyzed'))||
DECODE(partition_start,NULL,'',' partition '||partition_start||'->'||partition_stop||' ')||
DECODE(cardinality,NULL,'',' card='||DECODE(SIGN(cardinality-1000), -1, cardinality||'',DECODE(SIGN(cardinality-1000000), -1, ROUND(cardinality/1000)||'K',DECODE(SIGN(cardinality-1000000000), -1, ROUND(cardinality/1000000)||'M',ROUND(cardinality/1000000000)||'G')))) ||
DECODE(cost,NULL,' ',' cost='||DECODE(SIGN(cost-10000000), -1, cost||'',DECODE(SIGN(cost-1000000000), -1, ROUND(cost/1000000)||'M',ROUND(cost/1000000000)||'G'))) ||
DECODE(bytes,NULL,' ',' bytes='||DECODE(SIGN(bytes-1024), -1, bytes||'',DECODE(SIGN(bytes-1048576), -1, ROUND(bytes/1024)||'K',DECODE(SIGN(bytes-1073741824), -1, ROUND(bytes/1048576)||'M',ROUND(bytes/1073741824)||'G'))))||
DECODE(cpu_cost,NULL,' ',' cpu_cost='||DECODE(SIGN(cpu_cost-10000000), -1, cpu_cost||'',DECODE(SIGN(cpu_cost-1000000000), -1, ROUND(cpu_cost/1000000)||'M',ROUND(cpu_cost/1000000000)||'G'))) ||
DECODE(io_cost,NULL,' ',' io_cost='||DECODE(SIGN(io_cost-10000000), -1, io_cost||'',DECODE(SIGN(io_cost-1000000000), -1, ROUND(io_cost/1000000)||'M',ROUND(io_cost/1000000000)||'G'))) ||
DECODE(temp_space,NULL,' ',' temp='||DECODE(SIGN(temp_space-1024), -1, temp_space||'',DECODE(SIGN(temp_space-1048576), -1, ROUND(temp_space/1024)||'K',DECODE(SIGN(temp_space-1073741824), -1, ROUND(temp_space/1048576)||'M',ROUND(temp_space/1073741824)||'G'))))||
'' text
FROM v$session s,v$sql q,v$sql_plan p
WHERE s.sql_hash_value=q.hash_value AND q.users_executing>0 AND q.hash_value=p.hash_value AND q.child_number=p.child_number
AND sql_text NOT LIKE '%F354R334A56N47C687K645P6A628C7638H608O658758T8%'
ORDER BY buffer_gets,s.sid,s.serial#,p.hash_value,p.child_number,p.id;

Sample Output:
=== SID,SERIAL: (90,33387) USER: APP , ROWS_PROCESSED: 0 , BUFFER_GETS: 152
=== PROGRAM: sqlplus.exe , MODULE: test.sql

INSERT /*+ append nologging */ INTO FACTS subpartition (DWH_P_333_NLGROC) (dwh_tpr_id, dwh_sho_id, ...

EXPLAIN PLAN: Insert Statement cost=267970

1.1) Load As Select
2.1) View card=22M cost=267970 bytes=18G
3.1) Window Sort card=22M cost=267970 bytes=18G io_cost=267970 temp=42G
4.1) View card=22M cost=3693 bytes=18G
5.1) Union-All
6.1) Table Access Full APP.FACTS partition 1179->1179 card=14M cost=1895 bytes=1G io_cost=1895
6.2) Table Access Full APP.FACTS partition 1173->1173 card=8M cost=1798 bytes=321M io_cost=1798
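On Oracle 10g and later (an assumption; the script above targets 9i-era views), DBMS_XPLAN.DISPLAY_CURSOR offers a simpler way to get the plan of a statement a session is currently running. A minimal sketch:

```sql
-- Sketch, assuming Oracle 10g+ where V$SESSION exposes SQL_ID and
-- SQL_CHILD_NUMBER, and DBMS_XPLAN.DISPLAY_CURSOR is available.
-- :sid is the session of interest.
SELECT p.plan_table_output
FROM   v$session s,
       TABLE(DBMS_XPLAN.DISPLAY_CURSOR(s.sql_id, s.sql_child_number)) p
WHERE  s.sid = :sid
  AND  s.sql_id IS NOT NULL;
```

The TABLE() expression is left-correlated to V$SESSION, so no plan-table setup or tag trick is needed.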

Friday, March 11, 2005

Autotrace in SQLPLUS

http://asktom.oracle.com/~tkyte/article1/autotrace.html

Here is what I like to do to get autotrace working:
cd $ORACLE_HOME/rdbms/admin
log into sqlplus as SYSTEM
run SQL> @utlxplan
run SQL> create public synonym plan_table for plan_table
run SQL> grant all on plan_table to public
exit sqlplus and cd $ORACLE_HOME/sqlplus/admin
log into sqlplus as SYS
run SQL> @plustrce
run SQL> grant plustrace to public
You can replace PUBLIC with a specific user if you want. By making it public, you let anyone trace using SQL*Plus (not a bad thing, in my opinion).
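The steps above can be sketched as a single scripted session (same paths and grants as listed; PUBLIC can be narrowed to one user):

```sql
-- Run from $ORACLE_HOME/rdbms/admin while connected as SYSTEM
@utlxplan
CREATE PUBLIC SYNONYM plan_table FOR plan_table;
GRANT ALL ON plan_table TO PUBLIC;

-- Then from $ORACLE_HOME/sqlplus/admin while connected as SYS
@plustrce
GRANT plustrace TO PUBLIC;
```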
About Autotrace

You can automatically get a report on the execution path used by the SQL optimizer and the statement execution statistics. The report is generated after successful SQL DML (that is, SELECT, DELETE, UPDATE and INSERT) statements. It is useful for monitoring and tuning the performance of these statements.
Controlling the Report

You can control the report by setting the AUTOTRACE system variable.

SET AUTOTRACE OFF - No AUTOTRACE report is generated. This is the default.
SET AUTOTRACE ON EXPLAIN - The AUTOTRACE report shows only the optimizer execution path.
SET AUTOTRACE ON STATISTICS - The AUTOTRACE report shows only the SQL statement execution statistics.
SET AUTOTRACE ON - The AUTOTRACE report includes both the optimizer execution path and the SQL statement execution statistics.
SET AUTOTRACE TRACEONLY - Like SET AUTOTRACE ON, but suppresses the printing of the user's query output, if any.
To use this feature, you must have the PLUSTRACE role granted to you and a PLAN_TABLE table created in your schema. For more information on the PLUSTRACE role and PLAN_TABLE table, see the AUTOTRACE variable of the SET command in Chapter 6 of the SQL*Plus Guide.
Execution Plan

The Execution Plan shows the SQL optimizer's query execution path.
Each line of the Execution Plan has a sequential line number. SQL*Plus also displays the line number of the parent operation.
The Execution Plan consists of four columns displayed in the following order:

Column Name           Description
------------------------------------------------------------------------
ID_PLUS_EXP           Shows the line number of each execution step.
PARENT_ID_PLUS_EXP    Shows the relationship between each step and its
                      parent. This column is useful for large reports.
PLAN_PLUS_EXP         Shows each step of the report.
OBJECT_NODE_PLUS_EXP  Shows the database links or parallel query servers
                      used.

The format of the columns may be altered with the COLUMN command. For example, to stop the PARENT_ID_PLUS_EXP column being displayed, enter:
SQL> COLUMN PARENT_ID_PLUS_EXP NOPRINT
The default formats can be found in the site profile (for example, glogin.sql).
The Execution Plan output is generated using the EXPLAIN PLAN command. For information about interpreting the output of EXPLAIN PLAN, see the Oracle7 Server Tuning guide.
The following is an example of tracing statements for performance statistics and query execution path.
If the SQL buffer contains the following statement:
SQL> SELECT D.DNAME, E.ENAME, E.SAL, E.JOB
  2  FROM EMP E, DEPT D
  3  WHERE E.DEPTNO = D.DEPTNO

The statement can be automatically traced when it is run:
SQL> SET AUTOTRACE ON
SQL> /

DNAME ENAME SAL JOB
-------------- ---------- ---------- ---------
ACCOUNTING CLARK 2450 MANAGER
ACCOUNTING KING 5000 PRESIDENT
ACCOUNTING MILLER 1300 CLERK
RESEARCH SMITH 800 CLERK
RESEARCH ADAMS 1100 CLERK
RESEARCH FORD 3000 ANALYST
RESEARCH SCOTT 3000 ANALYST
RESEARCH JONES 2975 MANAGER
SALES ALLEN 1600 SALESMAN
SALES BLAKE 2850 MANAGER
SALES MARTIN 1250 SALESMAN
SALES JAMES 950 CLERK
SALES TURNER 1500 SALESMAN
SALES WARD 1250 SALESMAN

14 rows selected.
Execution Plan
-----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 MERGE JOIN
2 1 SORT (JOIN)
3 2 TABLE ACCESS (FULL) OF 'DEPT'
4 1 SORT (JOIN)
5 4 TABLE ACCESS (FULL) OF 'EMP'

Statistics
----------------------------------------------------------
148 recursive calls
4 db block gets
24 consistent gets
6 physical reads
43 redo size
591 bytes sent via SQL*Net to client
256 bytes received via SQL*Net from client
33 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
Note: The output may vary depending on the version of the server to which you are connected and the configuration of the server.