Thursday, October 1, 2009

Tip#26 OMS generates lot of core.xxx files

Recently we had disk full issue on our Grid Control OMS server, firstly we thought it must be just few backup copies which are lying around. After cleaning up the disk of unwanted backups got some breathing space (atleat thats what we thought!) but soon found out it is full again in like an hour.

So digging further found out that OMS was generating HUGE amount of logs (like core.xxx) and at very brisk speed too. Metalink Doc ID 419999.1 says it is due to "the access_log for the http server of the OMS is over 2Gb in size and this is causing the http server to core dump." and to fix it we need to stop all OMS process, remove the log file and startup the OMS process. So basically,

/opmn/bin/opmnctl stopall

rm /Apache/Apache/logs/access_log

/opmn/bin/opmnctl startall

After doing that all seems to be back to normal. To avoid the same in future, as suggested in the metalink, Consider rotating the Apache logs on a regular basis as part of the routine maintenance of the system. This can be done while the OMS is down during monthly maintenance tasks.

Wednesday, July 15, 2009

Tip#25 Fix for ORA-38029 : object statistics are locked

Recently while analyzing tables in one of our test database, we got ORA-38029: object statistics are locked. It turns out that in 10gR2, when you import (imp or impdp) table without data i.e. structure only, oracle will lock the table statistics. (ref : metalink doc id 433240.1)

You can see list of all locked tables in a schema by running following query:

select table_name, stattype_locked from dba_tab_statistics where owner = 'MBS' and stattype_locked is not null;

To generate unlock statement for all tables in the schema you can use following,

select 'exec DBMS_STATS.UNLOCK_TABLE_STATS ('''|| owner ||''','''|| table_name ||''');' from dba_tab_statistics where owner = 'MBS' and stattype_locked is not null;

OR for each table individually the following,

exec DBMS_STATS.UNLOCK_TABLE_STATS('owner','table name');

Wednesday, June 17, 2009

Tip#24 Fix for ORA-12545 on RAC

We had a case recently with a 10gR2 RAC, Some clients getting ORA-12545 errors i.e. ORA-12545 Connect failed because target host or object does not exist. TNS and Listener entries were all fine and TNSPING was also working so it had to be something else.

On digging further and googling a bit it turned out that root cause was that the client was being redirected to the server hostname instead of virtual addresses. Due to that the client had to resolve the server hostname and in some cases it could not ( e.g. no entry in their DNS & client Host file), hence the error. Here is the link where it is nicely explained Link .

Hence two thing needed to be fixed,

1) Client should have been pointed(redirected) to Virtual address and not real host names
2) Client should be able to resolve virtual address

Point 1 can be fixed by setting up LOCAL_LISTENER parameter on all RAC nodes. Here is how,

Add entry in TNSnames.ora on each node e.g. on Node 1

LISTENER_NODE1 =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = )(PORT =1521))
)

set local listner for each node,

Alter system set LOCAL_LISTENER= 'LISTENER_NODE1' scope=both sid='SID1';

Repeat above steps on each node with respective host & tns name.

Point 2 can be fixed by either add DNS entry for the host names or if not possible, add entry in Client's host file e.g. on Linux add entry in /etc/hosts or on Windows add entry in windows\system32\drivers\etc\hosts.

Wednesday, June 3, 2009

Tip#23 2>&1 and Argument list too long

What does 2>&1 mean?

Sometimes I call my SQL script via shell script which generally have this kind of statement in it,

$SCRIPTDIR/run_me.sh 2>&1 > $LOGDIR/log_me.log

What above statement does for me is, it runs run_me.sh script from my script folder and logs the detail in files called log_me.log in my log folder. Over here what I am interested in is what is 2>&1. Well firstly let me define three data streams in linux i.e. STDIN, STDOUT, and STDERR,

STDIN : Standard input usually comes from the keyboard or from another program

STDOUT : The program usually prints to standard output

STDERR : The program sometimes prints to standard error

In Linux/Unix, The built-in numberings for them are 0, 1, and 2, in that order.

The command above is redirecting standard error into standard output (you have to put an & in front of the destination when you do this) and redirecting standard output into my log file, which is a place I want to dump anything my scripts writes out.

So effectively, all output from this command should be logged into my log file. So in case of any issue I can always look at the log to find out the trouble area.


--------------------------------------------------------------------------------------------

Sometimes there will be cases where a directory is filled with lots of files e.g. dump directory with lots of trace file. In such case when I tried following,

/bin/rm *.trc

I got following error message:

bash: /bin/rm: Argument list too long

Solution:

find . -name "*.trc"| xargs /bin/rm.

Depending on the number of files, after a while all files will be erased.

Tip#22 Fix for ORA-01882

ORA-01882: timezone region %s not found

Recently we got this error while running via TOAD

select * from dba_scheduler_jobs;

The error message itself turns out not very informative.

01882, 00000, "timezone region %s not found"
// *Cause: The specified region name was not found.
// *Action: Please contact Oracle Customer Support.

In short, the error is because there are 7 timezone region IDs changed from version 3 and above. If you have old Timezone data from Version 2 that using one of these IDs the error raises.

In our case, server had time zone files were already upgraded so running same query on the DB server was working fine. So to fix it we had fix it on the client side,

1) Download patch 5731535 for a 10.2.0.X client ( 10.2.0.1 to 10.2.0.3)

2) Copy the 2 .dat files located at Client_patch\5731535\files\oracore\zoneinfo and the readme.txt file into the %ORACLE_HOME%\oracore\zoneinfo directory on your oracle client.

3) restart the TOAD


Ref : Metalink Doc ID: 414590.1 and 417893.1

Friday, April 3, 2009

Tip#21 DB Link Name Suffix

Recently came across installations where all DB link were having suffix US.ORACLE.COM or REGRESS.RDBMS.DEV.US.ORACLE.COM and was bit annoying for users so here is how we went forward to fix it.

As per documentation , whenever we create DB link, the db_domain is automatically appended to it. So following SQL should help determines naming of all DB links,


select name,value from v_$parameter where name IN ('db_name', 'db_domain');

db_name = MYDB
db_domain = NULL

So in our case db_domain was NULL then where does US.ORACLE.COM or REGRESS.RDBMS.DEV.US.ORACLE.COM come form, lets check the global name, in case you dont know, GLOBAL_NAME = db_name.db_domain

select global_name from global_name;

global_name = MYDB.US.ORACLE.COM

hmm so global name does show me domain name now, how do I fix it?

Option 1 :

Rename global name with desired domain name e.g. MYWORLD

alter database rename global_name to MYDB.MYWORLD;

Please note with above alter statement, you MUST specify db_domain else it wont work.

Option 2 :

In case you dont wont any domain, Update global name to db_name without db_domain

UPDATE GLOBAL_NAME SET GLOBAL_NAME = 'MYDB';
commit;


Now cross check the changes,

select global_name from global_name ;


global_name = MYDB

Now you can go and create DB link as any user and it will be as expected i.e. with desired suffix or without.

While we are on DB link topic, please note global_name <> global_names i.e.

Global Name is global name of the DB determined by db_name.db_domain

Global Names is a parameter which determines whether DB link should always be named AFTER the global name of the database it connects to. You can set parameter value to be TRUE or FALSE (default) . If you have a replication environment then it is in general a good thing to set it to true.

Monday, March 23, 2009

Tip#20 Session Marked for Kill Forever

I had a situation recently where our junior DBA accidentally killed an MV refresh job session and session killed was 'marked for kill' forever i.e. it wont release the resource (in this case lock) hence any subsequent refresh were also failing.

Lock holding session can be determined by,

select SID from v$lock where ID1 = (select object_id from dba_objects where object_name = *Object name* and OWNER = *Owner*)

Also any subsequent try to kill/disconnect session using alter session will result in ORA-00031: session marked for kill. So only way to release resources is to kill the process at OS level and let the PMON do the cleanup.

Now since sessoin is already killed and we are not yet using 11g (which gives you OS PID after killing as well), I tried getting OS PID using following SQL (Also suggest to read Metalink Doc ID 1020720.102) i.e.

SQL> SELECT spid FROM v$process WHERE NOT EXISTS ( SELECT 1 FROM v$session WHERE paddr = addr) AND spid IS NOT NULL;

unfortunately this didnt help as PID returned was not JOB process, you can check on Linux using,

% ps -ef | grep spid

The process I was expecting to be something like ora_j00x_sid but didnt found any so search for OS PID was still on.

Since we still have an entry in v$session with 'KILLED' status, we decided to get a 'Logon time' and see if we can match that with OS PID for same time and with command format 'ora_j00x_sid'. And fortunately we had hit the bulls eye we had only one OS PID with same time and expected command so we could just easily nailed it. So we killed the process at OS level

% kill spid

and after few moments we could see the lock was released!

Wednesday, March 11, 2009

Tip#19 Oracle Date Insight

Note : Partial text from Metalink Note:69028.1

Oracle DATE values are always stored in 7 bytes, excluding the length byte, within a datafile. These bytes store the century, year, month, day, hour, minute, and second details respectively. The following is the definition of Oracle's internal DATE storage structure:

BYTE Meaning
---- -------
1 Century -- stored in excess-100 notation
2 Year -- " "
3 Month -- stored in 0 base notation
4 Day -- " "
5 Hour -- stored in excess-1 notation
6 Minute -- " "
7 Second -- " "

Note that the century and year components are stored in 'excess 100 format', which means that 100 must be deducted from the byte's value. If a negative number results, then we've got a BC date at which point we take the absolute number. Also, to avoid ever having a zero byte, 1 is added to the hour, minute and second bytes. Therefore, 1 must be detected to get the correct value.

For example, take the following date:
11-MAR-2009 13:08:00

we would expect this date to be stored internally as follows:
120,109,3,11,14,9,1

Let's confirm this,

SQL> create table test1 as select sysdate sd from dual;

SQL> select to_char(sd, 'DD-MON-YYYY HH24:MI:SS'), dump(sd) from test1;

Result:

TO_CHAR(SD,'DD-MON-YYYYHH24:MI:SS') : 11-MAR-2009 13:08:00

DUMP(SD) : Typ=12 Len=7: 120,109,3,11,14,9,1

Let's try using the DUMP() function to do the same thing with TO_DATE now. Issue the following statement:

SQL> SELECT dump(to_date('11-MAR-2009 13:08:00', 'DD-MON-YYYY HH24:MI:SS'))
FROM dual;

Result: Typ=13 Len=8: 217,7,3,11,13,8,0,0

Note the different "Typ=" values to understand why we are seeing these results. The datatype returned is 13 and not 12, the external DATE datatype. This occurs because we rely on the TO_DATE function! External datatype 13 is an internal c-structure whose length varies depending on how the c-compiler represents the structure. Note that the "Len=" value is 8 and not 7. Type 13 is not a part of the published 3GL interfaces for Oracle and is used for date calculations mainly within PL/SQL operations.

Note that the same result can be seen when DUMPing the value SYSDATE.

Using deductive logic, we can derive the following storage format for type 13 data:

Byte 1 - Base 256 year modifier
2 - Base 256 year
3 - Month
4 - Day
5 - Hours
6 - Minutes
7 - Seconds
8 - Unused

For AD dates, the year and base 256 modifier are stored in base 0 notation and we must add the modifier to the year to obtain the true year. For BC dates, the year and base 256 modifier are stored in excess-255 notation. We must subtract the modifier from the year to obtain the true year.

For our year 2009, we could read this to be, Byte 1 + Byte 2 * 256. In other words, 217 + 7 * 256 = 2009.

Oracle is capable of handling dates from 01-JAN-4712 BC 00:00:00 TO 31-DEC-9999 AD 23:59:59 AD OR in terms of Julian Day: 1 through Julian Day: 5373484

The Julian Day number is a count of days elapsed since Greenwich mean noon on 1 January 4712 B.C. in the Julian proleptic calendar. The Julian Date is the Julian Day number followed by the fraction of the day elapsed since the preceding noon.

Monday, February 16, 2009

Tip#18 Rebuilding A Table

The DBA may have to rebuild a table after maintenance e.g. after a huge delete would like to lower the High Water Mark (HWM). There are several options to do it,

Option 1: Export and Import the table
Option 2: CTAS - Create table as Select
Option 3: Alter table MOVE (> 8.1.6)
Optoin 4: Alter table SHRINK (> 10g )

First two option are fairly straight forward so I wont go in the details here.

ALTER TABLE MOVE :

This one is my preferred option compare to the first two (if not already on 10g or above) and would like to point out few facts.

The ALTER TABLE MOVE command was implemented in 8.1.6. The command will allow tables to be moved from one tablespace to another, reorganise a table, provide the ability to modify of the INITIAL parameter. Also note that we don't lose grants (unlike CTAS), above all it is easy and fast.

Although it has a drawback, please note that when you use it, all the indices on the table become invalid, have to do an ALTER INDEX REBUILD.

e.g. I have a large table MBS with 50 million rows and now due to some process change I need to delete 25 million rows. After such huge delete obviously I would like to lower the HWM to avoid scanning those empty blocks below HWM.

Since I dont want to change the tablespace or initial parametere I would simply execute this,

Alter table MBS move; --tablespace test_tbs pctfree 20 pctused 40;

Since indices on the table will become invalid, I will also need to do this,

SELECT index_name FROM user_indexes WHERE table_name = 'MBS';

Once you know the indices used by the table just rebuild each of them as shown below,

Alter index pk_mbs rebuild; -- unrecoverable; --to avoid generating redo


ALTER TABLE SHRINK :

If you are on 10g or above, both of the above mentioned steps (table move and index rebuild) are done using single statement using SHRINK option.

alter table MBS enable row movement;
alter table MBS shrink space cascade;
alter table MBS disable row movement;

Just for comparison of Alter table options,

1.) deallocate unused : claims back unused space above HWM and doesnt lowers the HWM.
2.) move : claims back unused space below HWM and lowers the HWM
3.) shrink : claims back unused space below and above HWM and lowers the HWM

This blog was intended to just give you an overview, for more details on both the command please refer to the documentation. Also as usual always do backup before and after the changes.