Channel: Bryan's Oracle Blog

Calculating disk space usage on an exadata.

I have been working on figuring out where all our space is, and how much space is actually available on our Exadata.  First, to clarify what the calculations are based on:

1/2 rack, SATA drives, normal redundancy

This means we have
  • 7 storage cells
  • Each storage cell contains 12 disks
  • Each disk is 2tb (which is about 1.862 tb usable)
  • The first 2 disks in each storage cell have ~30g already partitioned for the OS (which is mirrored).
Next I looked to see how the disks were allocated within each storage cell (they are all consistent)

list griddisk attributes name, celldisk, size
DATA_DMPF_CD_00_srcell1 CD_00_srcell1 733G
DATA_DMPF_CD_01_srcell1 CD_01_srcell1 733G
DATA_DMPF_CD_02_srcell1 CD_02_srcell1 733G
DATA_DMPF_CD_03_srcell1 CD_03_srcell1 733G
DATA_DMPF_CD_04_srcell1 CD_04_srcell1 733G
DATA_DMPF_CD_05_srcell1 CD_05_srcell1 733G
DATA_DMPF_CD_06_srcell1 CD_06_srcell1 733G
DATA_DMPF_CD_07_srcell1 CD_07_srcell1 733G
DATA_DMPF_CD_08_srcell1 CD_08_srcell1 733G
DATA_DMPF_CD_09_srcell1 CD_09_srcell1 733G
DATA_DMPF_CD_10_srcell1 CD_10_srcell1 733G
DATA_DMPF_CD_11_srcell1 CD_11_srcell1 733G
DBFS_DG_CD_02_srcell1 CD_02_srcell1 29.109375G
DBFS_DG_CD_03_srcell1 CD_03_srcell1 29.109375G
DBFS_DG_CD_04_srcell1 CD_04_srcell1 29.109375G
DBFS_DG_CD_05_srcell1 CD_05_srcell1 29.109375G
DBFS_DG_CD_06_srcell1 CD_06_srcell1 29.109375G
DBFS_DG_CD_07_srcell1 CD_07_srcell1 29.109375G
DBFS_DG_CD_08_srcell1 CD_08_srcell1 29.109375G
DBFS_DG_CD_09_srcell1 CD_09_srcell1 29.109375G
DBFS_DG_CD_10_srcell1 CD_10_srcell1 29.109375G
DBFS_DG_CD_11_srcell1 CD_11_srcell1 29.109375G
RECO_DMPF_CD_00_srcell1 CD_00_srcell1 1099.546875G
RECO_DMPF_CD_01_srcell1 CD_01_srcell1 1099.546875G
RECO_DMPF_CD_02_srcell1 CD_02_srcell1 1099.546875G
RECO_DMPF_CD_03_srcell1 CD_03_srcell1 1099.546875G
RECO_DMPF_CD_04_srcell1 CD_04_srcell1 1099.546875G
RECO_DMPF_CD_05_srcell1 CD_05_srcell1 1099.546875G
RECO_DMPF_CD_06_srcell1 CD_06_srcell1 1099.546875G
RECO_DMPF_CD_07_srcell1 CD_07_srcell1 1099.546875G
RECO_DMPF_CD_08_srcell1 CD_08_srcell1 1099.546875G
RECO_DMPF_CD_09_srcell1 CD_09_srcell1 1099.546875G
RECO_DMPF_CD_10_srcell1 CD_10_srcell1 1099.546875G
RECO_DMPF_CD_11_srcell1 CD_11_srcell1 1099.546875G


This gives me a lot of information about how things were configured as grid disks.

I can tell from this that there are 3 sets of grid disks (for my disk groups).

Data - this is composed of 12 grid disks of 733g each
reco - this is composed of 12 grid disks of 1100g each
dbfs - this is composed of 10 grid disks of 29g each

As I mentioned previously, the first 2 disks are used for the OS (mirrored); this is why there are only 10 grid disks of 29g available for the dbfs disk group.

I then ran the numbers for each one (7 cells * number of disks * grid disk size)

data -  61.572 tb
reco -  92.4 tb
dbfs -   2.03 tb

Remember this is raw disk available, and I am running normal redundancy (mirrored); if you are running triple mirrored, keep this in mind.
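For reference, here is that arithmetic as a quick SQL*Plus check (grid disk sizes in GB from the listing above; the post's convention of 1 tb = 1000 g is used):

-- raw capacity per grid disk set: cells x disks-per-cell x grid disk size (GB), shown in TB
select round(7 * 12 * 733         / 1000, 3) data_raw_tb,
       round(7 * 12 * 1099.546875 / 1000, 3) reco_raw_tb,
       round(7 * 10 * 29.109375   / 1000, 3) dbfs_raw_tb
  from dual;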

Now this gets me a starting point, so I took a look at what ASM is showing for disk usage to see what is going on.


There are 3 values that I am looking at trying to figure out.

 Disk Group    SIZE     USED     USABLE FREE
 data          61.572   32.692     10.042
 reco          92.4      3.003     38.082
 dbfs           2.03     2.018      -.135
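(I haven't shown the query that produced these three values; one way to pull the same figures yourself, assuming a connection to the database or ASM instance, is against v$asm_diskgroup, which reports sizes in MB:)

select name,
       total_mb,                          -- SIZE (raw)
       total_mb - free_mb   used_mb,      -- USED (raw)
       usable_file_mb,                    -- USABLE FREE (after mirroring and the failure reserve)
       required_mirror_free_mb            -- the reserve itself
  from v$asm_diskgroup;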

Now these numbers don't seem to add up.. Only the size seems to match what I was expecting.

These are the things I started wondering about
  • How can I be using 33 tb out of 62tb raw when I am mirrored (unless it is the total raw used)
  • How can my usable free be 10tb if I am using 1/2 of the raw disk ?
  • How can my usable free be negative ???
Well, looking at the numbers further and looking at the data, I was able to answer the first question. The 32 tb is raw, so to state it again in actual (mirrored) usage...

Disk group   mirrored_used
data               16.346
reco                 1.502
dbfs                  1.009

OK, this makes a little more sense.  Looking at this, the following must also be true...

Disk group      raw left
data                 28.88
reco                 89.397
dbfs                     .019


OK, first number solved. Now let's look at the next number. The usable free must be the amount of mirrored storage available (rather than raw), so if I go back to the usable free and convert back to raw (x2 for mirrored) I get

Disk group       Usable free     Raw usable
data                   10.042          20.082
reco                    38.082         76.164
dbfs                      -.135             -.270

OK, I'm getting close, but why the discrepancy, and why the negative number??? Let's look at the difference.

Disk group    Raw left   Raw usable   Missing raw storage
data          28.88      20.082        8.8
reco          89.397     76.164       13.233
dbfs            .019      -.270         .29

Now let's take a closer look at the numbers, and at what it means to be negative.

TIP The usable free space specifies the amount of space that can be safely used for data. A value above zero means that redundancy can be properly restored after a disk failure.

So we need to reserve some space to absorb a disk loss. In this case, it means being able to lose an entire storage cell and still be able to mirror on a different cell. So let's take that calculation (one cell's worth of grid disks) and see what happens:
grid disk size * disks

Disk group      calculation                  Storage cell usage
data               (.733 x 12)                     8.8
reco                (1.1 x 12)                    13.23
dbfs               (.029 x 10)                     .29

Well, that accounts for the missing space, and I have answers to all my questions.

Well to summarize.

1) How much space is there to use on a 1/2 rack with 2tb Sata drives mirrored (normal redundancy) ???
    ((29g * 10 disks)     * 6 cells +
    (1833g * 12 disks) * 6 cells)/2

    66.858 tb mirrored

2) What do the values SIZE and USED mean when I am looking at ASM?
SIZE is the raw space available across all cells, and USED is the amount of that raw space already allocated.

3) What does USABLE FREE show me?
This is the amount of space you can safely allocate to your data. Unlike the 2 values above, it is not measured in raw space, but in usable (mirrored) space.


If anyone sees anything wrong with my calculations, let me know.  They seem to add up, and explain all the numbers...





Here is some good information, and the display from the storage cell to confirm my sizes on what's available from the disks. My numbers match up.

http://blog.enkitec.com/wp-content/uploads/2011/02/Enkitec-Exadata-Storage-Layout11.pdf


CellCLI> list celldisk attributes name, devicePartition, size where diskType = 'HardDisk'
CD_00_srcell1 /dev/sda3 1832.59375G
CD_01_srcell1 /dev/sdb3 1832.59375G
CD_02_srcell1 /dev/sdc 1861.703125G
CD_03_srcell1 /dev/sdd 1861.703125G
CD_04_srcell1 /dev/sde 1861.703125G
CD_05_srcell1 /dev/sdf 1861.703125G
CD_06_srcell1 /dev/sdg 1861.703125G
CD_07_srcell1 /dev/sdh 1861.703125G
CD_08_srcell1 /dev/sdi 1861.703125G
CD_09_srcell1 /dev/sdj 1861.703125G
CD_10_srcell1 /dev/sdk 1861.703125G
CD_11_srcell1 /dev/sdl 1861.703125G

Disk space layout on your Exadata

This blog post is a product of my last post on Exadata disk usage.

I have multiple Exadatas (both full racks and 1/2 racks), and I want to know exactly how each one is configured now that ACS has left.  How do I go about finding out how they are set up?

Well let's start with the basics.

Each Storage cell

  • Has 12 physical spinning disks.
  • The first 2 disks contain the os which utilizes ~29g of space
  • The disks come in either 600g (SAS) or 2tb (SATA). The newer model now has 3tb (SATA).
  • Each cell contains 384G of flash cache, made up of 4 96g f20 PCI cards..

Now let's log on to a storage cell and see how it is configured.

First go to cellcli, and look at the physical disks.
CellCLI> list physicaldisk
20:0 R0DQF8 normal
20:1 R1N71G normal
20:2 R1NQVB normal
20:3 R1N8DD normal
20:4 R1NNBC normal
20:5 R1N8BW normal
20:6 R1KFW3 normal
20:7 R1EX24 normal
20:8 R2LWZC normal
20:9 R0K8MF normal
20:10 R0HR55 normal
20:11 R0JQ9A normal
FLASH_1_0 3047M04YEC normal
FLASH_1_1 3047M05079 normal
FLASH_1_2 3048M052FD normal
FLASH_1_3 3047M04YF7 normal
FLASH_2_0 3047M04WXN normal
FLASH_2_1 3047M04YAJ normal
FLASH_2_2 3047M04WTR normal
FLASH_2_3 3047M04Y9L normal
FLASH_4_0 3047M0500W normal
FLASH_4_1 3047M0503G normal
FLASH_4_2 3047M0500X normal
FLASH_4_3 3047M0501G normal
FLASH_5_0 3047M050XG normal
FLASH_5_1 3047M050XP normal
FLASH_5_2 3047M05098 normal
FLASH_5_3 3047M050UH normal


From this you can see that there are 12 physical disks (20:0 - 20:11), and 16 flash disks.
Now let's look at the detail for these 2 types of disks.  I will use the command


list physicaldisk {diskname} detail



CellCLI> list physicaldisk 20:0 detail
name: 20:0
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_0
makeModel: "SEAGATE ST32000SSSUN2.0T"
physicalFirmware: 0514
physicalInsertTime: 2011-09-20T10:19:00-04:00
physicalInterface: sata
physicalSerial: R0DQF8
physicalSize: 1862.6559999994934G
slotNumber: 0
status: normal



This is what you would see for a SAS 600g disk
CellCLI> list physicaldisk 20:9 detail

name: 20:9
deviceId: 17
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 23
errOtherCount: 0
foreignState: false
luns: 0_9
makeModel: "TEST ST360057SSUN600G"
physicalFirmware: 0805
physicalInsertTime: 0000-03-24T22:10:19+00:00
physicalInterface: sas
physicalSerial: E08XLW
physicalSize: 558.9109999993816G
slotNumber: 9
status: normal


This is what the configuration of the FLASH drives looks like

CellCLI> list physicaldisk FLASH_5_0 detail
name: FLASH_5_0
diskType: FlashDisk
errCmdTimeoutCount: 0
errHardReadCount: 0
errHardWriteCount: 0
errMediaCount: 0
errOtherCount: 0
errSeekCount: 0
luns: 5_0
makeModel: "MARVELL SD88SA02"
physicalFirmware: D20Y
physicalInsertTime: 2011-09-20T10:20:17-04:00
physicalInterface: sas
physicalSerial: 3047M050XG
physicalSize: 22.8880615234375G
sectorRemapCount: 0
slotNumber: "PCI Slot: 5; FDOM: 0"
status: normal



So this gives me a good idea of what disks the storage is made up of. In my case you can see that the 12 disks are SATA, and they contain 1862g of usable space.
In the case of the SAS, you can see they contain 558g of usable space.

You can also see that the flash storage comprises 16 separate disks, connected through 4 PCI cards. Each card contains 4 22g flash disks.

For now (and the rest of this post), I will not talk about the flash. It is possible to use these cell disks, and provision them as usable storage, but I won't be discussing that.

Now that we have the physical disk layout, we can move to the next level.  First, to review:

We have 12 physical disks.  Each disk contains 1862.65 g of space. (22,352g/cell)

Now the next step is to look at the LUNs that were created out of the physical disks.  The LUN size is the usable space left after the disks have been turned into block devices and presented to the server; you can see the overhead is a small amount. Below is the output (truncated after the first 2 disks, with one flash disk included to show that detail).


CellCLI> list lun detail
name: 0_0
cellDisk: CD_00_tpfh1
deviceName: /dev/sda
diskType: HardDisk
id: 0_0
isSystemLun: TRUE
lunAutoCreate: FALSE
lunSize: 1861.712890625G
lunUID: 0_0
physicalDrives: 20:0
raidLevel: 0
lunWriteCacheMode: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
status: normal

name: 0_1
cellDisk: CD_01_tpfh1
deviceName: /dev/sdb
diskType: HardDisk
id: 0_1
isSystemLun: TRUE
lunAutoCreate: FALSE
lunSize: 1861.712890625G
lunUID: 0_1
physicalDrives: 20:1
raidLevel: 0
lunWriteCacheMode: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
status: normal

name: 2_2
cellDisk: FD_06_tpfh1
deviceName: /dev/sdab
diskType: FlashDisk
id: 2_2
isSystemLun: FALSE
lunAutoCreate: FALSE
lunSize: 22.8880615234375G
overProvisioning: 100.0
physicalDrives: FLASH_2_2
status: normal



So from this you can see that we have 1861.7g of usable space on each drive, and you can see that the LUNs are given names that refer to the server. In this case tpfh1 is the name of the storage cell, and it is included in the cell disk name to easily identify the disk.

The next step is to take a look at the cell disks that were created out of these luns.

The item to note in this output is that the first 2 disks contain the OS. You will see that the usable space left after the creation of the OS partitions is less than on the other disks.  The overhead for the cell software on each disk is also taken out (though it is a small amount).

Here is what we have next as celldisks.



CellCLI> list celldisk detail
name: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:19:30-04:00
deviceName: /dev/sda
devicePartition: /dev/sda3
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: a15671cd-2bab-4bfe
interleaving: none
lun: 0_0
raidLevel: 0
size: 1832.59375G
status: normal

name: CD_01_tpfh1
comment:
creationTime: 2011-09-23T00:19:34-04:00
deviceName: /dev/sdb
devicePartition: /dev/sdb3
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: de0ee154-6925-4281
interleaving: none
lun: 0_1
raidLevel: 0
size: 1832.59375G
status: normal

name: CD_02_tpfh1
comment:
creationTime: 2011-09-23T00:19:34-04:00
deviceName: /dev/sdc
devicePartition: /dev/sdc
diskType: HardDisk
errorCount: 0
freeSpace: 0
id: 711765f1-90cc-4b53
interleaving: none
lun: 0_2
raidLevel: 0
size: 1861.703125G
status: normal


Now you can see the first 2 disks have 1832.6g available, and the remaining 10 disks have 1861.7g available (I didn't include the last 9 disks in the output).

So to review where we are: there are 12 physical disks, which are carved into LUNs and then become cell disks.  Each cell has (2 x 1832.6) + (10 x 1861.7) = 22,282g of raw disk available.

Now these cell disks get carved up into grid disks. The grid disks are what is presented to ASM.  Let's see how my storage cell is carved up.  While looking at the output, notice that the cell disks are named CD_00_{cellname} through CD_11_{cellname}.  Here is a snippet.



CellCLI> list griddisk detail
name: DATA_DMPF_CD_00_tpfh1
availableTo:
cellDisk: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:21:59-04:00
diskType: HardDisk
errorCount: 0
id: 2f72fb5a-adf5
offset: 32M
size: 733G
status: active

name: DATA_DMPF_CD_01_tpfh1
availableTo:
cellDisk: CD_01_tpfh1
comment:
creationTime: 2011-09-23T00:21:59-04:00
diskType: HardDisk
errorCount: 0
id: 0631c4a2-2b39
offset: 32M
size: 733G
status: active
.......
.......
.......

name: DATA_DMPF_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:22:00-04:00
diskType: HardDisk
errorCount: 0
id: ccd79051-0e24
offset: 32M
size: 733G
status: active

name: DBFS_DG_CD_02_tpfh1
availableTo:
cellDisk: CD_02_tpfh1
comment:
creationTime: 2011-09-23T00:20:37-04:00
diskType: HardDisk
errorCount: 0
id: d292062b-0e26
offset: 1832.59375G
size: 29.109375G
status: active

name: DBFS_DG_CD_03_tpfh1
availableTo:
cellDisk: CD_03_tpfh1
comment:
creationTime: 2011-09-23T00:20:38-04:00
diskType: HardDisk
errorCount: 0
id: b8c478a9-5ae1
offset: 1832.59375G
size: 29.109375G
status: active

name: DBFS_DG_CD_04_tpfh1
availableTo:
cellDisk: CD_04_tpfh1
comment:
creationTime: 2011-09-23T00:20:39-04:00
diskType: HardDisk
errorCount: 0
id: 606e3d69-c25b
offset: 1832.59375G
size: 29.109375G
status: active
.....
.....
.....
name: DBFS_DG_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:20:45-04:00
diskType: HardDisk
errorCount: 0
id: 58af96a8-3fc8
offset: 1832.59375G
size: 29.109375G
status: active

name: RECO_DMPF_CD_00_tpfh1
availableTo:
cellDisk: CD_00_tpfh1
comment:
creationTime: 2011-09-23T00:22:09-04:00
diskType: HardDisk
errorCount: 0
id: 77f73bbf-09a9
offset: 733.046875G
size: 1099.546875G
status: active

.....
.....
.....

name: RECO_DMPF_CD_11_tpfh1
availableTo:
cellDisk: CD_11_tpfh1
comment:
creationTime: 2011-09-23T00:22:09-04:00
diskType: HardDisk
errorCount: 0
id: fad57e10-414f
offset: 733.046875G
size: 1099.546875G
status: active



Now by looking at this you can see that there are 3 sets of grid disks.

DATA - this is carved out of every disk, and contains 733g of storage.  It starts at offset 32m (the beginning of the disk).

RECO - this is carved out of every disk also, and contains 1099.5g of storage. This starts at offset 733G.

So now we are getting the picture: each cell disk is carved into grid disks, starting with DATA, followed by RECO.

DBFS - This is carved out of the last 10 disks (starting with disk 2) at offset 1832.59g, and it contains 29.1g.  I can only conclude this matches the size of the OS partition on the first 2 disks.

So here is what we have for sizing on each Storage cell.

DATA  -  8,796g
RECO - 13,194g
DBFS -       290g

Total   22,280

The thing to keep in mind with this number is that the OS partitions have caused us a bit of trouble. There are only 10 of these DBFS grid disks per cell, and they are only 29g.  If we pull this out, we have ~22tb of usable disk on each storage cell.

Now to figure out how much space is in each disk group (assuming these grid disks will all go directly into 3 disk groups).

The first thing to remember is the redundancy level.  Are the disk groups going to be normal redundancy (mirrored) or high redundancy (triple mirrored)?  With normal redundancy, the disk groups are configured with each disk being redundant with a disk on another cell.  With high redundancy, each disk is redundant with 2 other disks on 2 other cells. To maintain this level of redundancy, you must set aside 1 storage cell's worth of storage for normal redundancy, and 2 storage cells' worth for high redundancy, to ensure that you are completely protected.

So what does this mean for sizing?  The larger your array, the more usable disk you get. With a half rack, you must set aside 1 out of 7 storage cells (normal) or 2 out of 7 storage cells (high) for redundancy.  For a full rack you need to set aside 1 out of 14, or 2 out of 14, storage cells for redundancy.

Now let's run the numbers.


HALF RACK  -

Data - Normal (8,796g / 2) * 6 usable cells = 26,388g of usable space
       High   (8,796g / 3) * 5 usable cells = 14,660g of usable space

Reco - Normal (13,194g / 2) * 6 usable cells = 39,582g of usable space
       High   (13,194g / 3) * 5 usable cells = 21,990g of usable space

Dbfs - Normal (290g / 2) * 6 usable cells = 870g of usable space
       High   (290g / 3) * 5 usable cells = 483g of usable space

TOTAL usable (minus DBFS)
    Normal Redundancy - 65.9tb
    High Redundancy   - 36.6tb


FULL RACK -


Data - Normal (8,796g / 2) * 13 usable cells = 57,174g of usable space
       High   (8,796g / 3) * 12 usable cells = 35,184g of usable space


Reco - Normal (13,194g / 2) * 13 usable cells = 85,761g of usable space
       High   (13,194g / 3) * 12 usable cells = 52,776g of usable space


Dbfs - Normal (290g / 2) * 13 usable cells = 1885g of usable space
       High   (290g / 3) * 12 usable cells = 1160g of usable space

TOTAL usable (minus DBFS)
    Normal Redundancy - 142.9 tb
    High Redundancy   - 87.96tb
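As a sanity check, the same arithmetic can be run straight from SQL*Plus (the 8,796g and 13,194g per-cell figures come from the grid disk sizes above):

-- usable space (GB) = (per-cell GB / redundancy) * surviving cells
select round((8796/2)*6  + (13194/2)*6)  half_rack_normal_gb,   -- ~65,970g  (65.9 tb)
       round((8796/3)*5  + (13194/3)*5)  half_rack_high_gb,     -- ~36,650g  (36.6 tb)
       round((8796/2)*13 + (13194/2)*13) full_rack_normal_gb,   -- ~142,935g (142.9 tb)
       round((8796/3)*12 + (13194/3)*12) full_rack_high_gb      -- ~87,960g  (87.96 tb)
  from dual;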


So the takeaways I get from this are:

There is a much higher cost for higher redundancy levels, and this cost is proportionally higher for smaller rack systems.
A certain portion of each cell is a small grid disk that exists on only 10 of the physical disks and is hard to utilize well.

OLTP compression slow for large data set

I am working on loading a large dataset (about 500 Million rows that take up about 100g of space).

I am loading them into a partitioned table, and I was hoping to use HCC compression, or at least OLTP compression.

After loading for a while, the inserts seemed to go slower and slower. I was able to test my table structure with OLTP compression and with no compression, and found that there was indeed a bottleneck with compression, but it didn't really get bad until about an hour into the process.

My process is to do a bulk collect (in pl/sql) of 500 rows, and insert them into my partitioned table.
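The load loop looks roughly like the sketch below (the cursor, staging and target table names are made up for illustration; for the compressed runs the target table was created with COMPRESS FOR OLTP):

declare
  cursor src_cur is
    select * from stage_detl;                            -- hypothetical staging source
  type src_tab is table of src_cur%rowtype;
  l_rows src_tab;
begin
  open src_cur;
  loop
    fetch src_cur bulk collect into l_rows limit 500;    -- 500 rows per batch
    exit when l_rows.count = 0;
    forall i in 1 .. l_rows.count
      insert into fact_detl values l_rows(i);            -- partitioned target table
    commit;
  end loop;
  close src_cur;
end;
/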




Below is an example. From 7:00 am until 9:30 (the first block of numbers below) I was inserting data into the table with compression off.
You can see that the number of rows processed in each 15-minute interval was consistently ~30 million, for a throughput of 127 million/hour.
Also take a look at buffer gets/exec, elapsed time/exec, and CPU time/exec.  These values all remain fairly consistent after the first 45 minutes.
At the end of 2.5 hours, 351 million rows were loaded.

Now compare to the second block (starting at 10:15), where I was inserting data into the same table with compression on.
You can see that the rows processed started at about 22 million (for the first 15 minutes), but kept trending downward.  You will also notice that the reads went up.
Compare the same values (buffer gets, rows processed, CPU time), and you can see the performance just continues to degrade.
Finally, look at the third block of numbers. This is a snapshot of the currently running load after over 250 million rows have been loaded.
Notice that we are processing at about 10 million rows/hour, the buffer gets are up, and the CPU time has increased.

OLTP compression seems to be significantly slowing down the loads once they get moving along.

 
Has anyone seen this, or have any idea why it slows down?  The only theory I can come up with is "garbage collection" for the partitions: I reach a point where I am inserting into blocks that haven't been compressed, and Oracle is now going back and compressing the blocks to make room.

Here are the performance numbers.  I've also included the AWR logical read output; if you take the number of executions * buffer gets per execution, you find that the logical reads are all from the inserts.


END_TIME         ELAPSED_TIME  EXECUTIONS  TOTAL_READS_PER_EXECUTION  ROWS_PROCESSED  BUFFER_GETS  CPU_TIME
2/10/2012 7:00   0.004         56,711      283                        28,355,500      283          4,316
2/10/2012 7:15   0.004         83,225      262                        41,612,500      262          4,206
2/10/2012 7:30   0.004         81,178      293                        40,589,000      293          4,332
2/10/2012 7:45   0.007         66,630      945                        33,315,000      945          6,821
2/10/2012 8:00   0.007         62,374      1,190                      31,187,000      1,190        7,353
2/10/2012 8:15   0.009         58,031      1,640                      29,015,500      1,640        8,912
2/10/2012 8:30   0.008         59,598      1,442                      29,799,000      1,442        8,292
2/10/2012 8:45   0.009         57,116      1,648                      28,558,000      1,648        8,952
2/10/2012 9:00   0.008         60,477      1,410                      30,238,500      1,410        8,057
2/10/2012 9:15   0.009         56,334      1,710                      28,167,000      1,710        9,060
2/10/2012 9:30   0.008         61,627      1,293                      30,813,500      1,293        7,681

Total rows processed: 351,650,500
Throughput:           127,872,909 rows/hour

END_TIME          ELAPSED_TIME  EXECUTIONS  TOTAL_READS_PER_EXECUTION  ROWS_PROCESSED  BUFFER_GETS  CPU_TIME
2/10/2012 10:15   0.013         45,964      1,940                      22,982,000      1,940        12,878
2/10/2012 10:30   0.019         33,048      3,014                      16,524,000      3,014        19,466
2/10/2012 10:45   0.018         36,192      2,235                      18,096,000      2,235        18,024
2/10/2012 11:00   0.017         37,362      1,737                      18,681,000      1,737        17,507
2/10/2012 11:15   0.018         34,992      1,526                      17,496,000      1,526        17,799
2/10/2012 11:30   0.036         20,757      6,253                      10,378,500      6,253        35,703
2/10/2012 11:45   0.046         16,744      8,714                      8,372,000       8,714        46,436

Total rows processed: 112,529,500
Throughput:           64,302,571 rows/hour

END_TIME         ELAPSED_TIME_DELTA  EXECUTIONS_DELTA  TOTAL_READS_PER_EXECUTION  ROWS_PROCESSED_DELTA  DISK_READS_DELTA  BUFFER_GETS  CPU_TIME
2/9/2012 22:00   0.186               4,572             33,631                     2,286,000             11                33,620       171,338
2/9/2012 22:15   0.188               4,632             33,240                     2,316,000             11                33,229       171,302
2/9/2012 22:30   0.19                4,545             33,574                     2,272,500             10                33,564       174,641
2/9/2012 22:45   0.182               4,762             33,027                     2,381,000             11                33,016       167,433

Total rows processed: 9,255,500

Segments by Logical Reads

  • Total Logical Reads: 159,380,402
  • Captured Segments account for 99.5% of Total
Owner   Tablespace Name  Object Name       Subobject Name   Obj. Type        Logical Reads  %Total
BGRENN  BGRENN_2009      FACT_BGRENN_DETL  ERD_BGRENN_2009  TABLE PARTITION  36,379,968     22.83
BGRENN  BGRENN_2010      FACT_BGRENN_DETL  ERD_BGRENN_2010  TABLE PARTITION  35,459,344     22.25
BGRENN  BGRENN_2011      FACT_BGRENN_DETL  ERD_BGRENN_2011  TABLE PARTITION  34,801,888     21.84
BGRENN  BGRENN_2008      FACT_BGRENN_DETL  ERD_BGRENN_2008  TABLE PARTITION  33,651,856     21.11
BGRENN  BGRENN_2007      FACT_BGRENN_DETL  ERD_BGRENN_2007  TABLE PARTITION  9,641,168      6.05

Setting aside a node for maintenance on Exadata

Actually, this isn't exadata specific, but it becomes even more important on a multi-node cluster.

First the background.

  I have a data warehouse application in which we are loading up lots of data.  At the same time, we have users reporting off the data.  I am finding that we actually have 2 needs, and they are opposed

USERS -- Their needs
  • Lots of concurrency
  • Small amounts of data
  • Small PGA
  • small temp
  • Large SGA
If the users need more than this something probably went wrong with their query..

 DBA/ODI jobs

  • Very little concurrency (except for some parallelization)
  • Large amounts of data
  • HUGE PGA
  • HUGE Temp
  • HUGE Undo segments
  • Small SGA

The temp issue is easy enough to fix with a separate temp tablespace for each user, and by setting up a temporary tablespace group for the users.

But what about the other things the data load jobs need??  The only answer seems to be to set aside 1 (or more) nodes out of my cluster for maintenance/loading.  This node (or nodes) will have a different configuration.  This node, let's say node 8, has the following characteristics.

  • The only services running on this node are my ODI (data load) service, and a service for the DBAs to use for index rebuilds
  • PGA Automatic memory management is not enabled
  • work_area_size_policy is manual
  • sort_area_size=60g
  • hash_area_size=60g
  • undo tablespace size is set to 1tb, much, much larger than the other nodes.  Undo_retention is set to a very large number.
The only work done on Node 8 will be loading of large tables, and  rebuild/creation of indexes.
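A minimal sketch of the session-level settings such a load session on node 8 would use (the parameter names are standard; the values here are just placeholders, not the sizes mentioned above):

-- hypothetical settings for an ODI/DBA session on the "heavy lifting" node
alter session set workarea_size_policy = manual;
alter session set sort_area_size = 1073741824;   -- bytes; placeholder, sized for the big sorts
alter session set hash_area_size = 1073741824;   -- bytes; placeholder, sized for the big hash joins
alter session enable parallel dml;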

I was even thinking about getting the memory expansion kit for just this node, to bring it up to 144g from 96g.

Anyone else do this ? set aside a node specifically for "heavy lifting" with a different configuration ?

UPDATE --- After writing this, and looking at my load process, I noticed that most of my load time is going to temp, both reads and writes, since I am doing huge hashes.  I am considering dedicating SSD LUNs to the temp area for my ODI jobs only.  I might even try dedicating SSD to the I$ (intermediate staging) tables that ODI uses.

IOUG Real World Performance tour

Last Thursday the Real World Performance Tour came to Rochester, NY.

I know what you're probably thinking. One of 2 things.

1) Why did it come there, aren't you a suburb of NYC (we are actually about a 7 hour drive from NYC)
                   or
2) Why there ?  Did the cows enjoy it ?

Well, we had a huge turnout. There were about 90 people in attendance.  For this area, that is one of the biggest attendances I have ever seen, especially since it was a paid event and the lunch was boxed.

The Tour consists of three individuals

1) Tom Kyte... He needs no explanation.

2) Andrew Holdsworth - Head of the Real World Performance team.  As a point of full disclosure, I've had a couple of meetings with Andrew in the past, so I had already discussed some of the topics with him in those meetings.

3) Graham Wood -  Oracle Database Architect in database development.  He is the person responsible for AWR reports.

The day was broken up into 2 halves.  The morning concentrated on how to manage a data warehouse, and the afternoon concentrated on OLTP.  Of course the approach to each of these areas is different.

The morning covered a number of topics, especially concentrating on the challenges of a data warehouse.

Parallelization
Hash joins vs Nested loops
indexing vs FTS.

Some of the presentation talked about HCC and the exadata, but I would say in general only about 10-20% was exadata specific. No sales pitch, just reasons why it helps..

The afternoon was dedicated to the issues revolving around an OLTP system.  A lot of it covered the material in the YouTube videos narrated by Andrew on connection pooling, and how it affects performance.

It was a great day, and there was a lot of great material.. I have talked to Andrew before, and I've seen his videos, but I still got a lot out of the day.

If it is coming to your city, it is definitely worth going to.

Here are some links to check out.

Here is Tom's presentation, but like most good presentations, the slides miss a lot.

Here are the Youtube videos from Andrew .. Thanks Andrew for creating these !


And finally, here is the upcoming schedule of events.



Benchmarks for CPU's

I have been doing some benchmarks on a couple of different systems for LIO's. I have been using Kevin Closson's great SLOB toolkit.  You can find more information about it on his blog here.

I have been looking at 2 different systems, and here are my results
These 2 systems are both HP.

The first is an AMD 6276 server. 2 socket x 16 cores. (465 G7)

./runit.sh 0 40
The awr is posted.

Here is the summary of throughput.
Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 37.9 46.6 0.00 5.96
DB CPU(s): 30.3 37.2 0.00 4.76
Redo size: 15,791.3 19,408.2
Logical reads: 10,119,426.4 12,437,215.0
Block changes: 83.4 102.5
Physical reads: 0.4 0.6
Physical writes: 11.5 14.1
User calls: 6.4 7.8
Parses: 3.0 3.7
Hard parses: 0.1 0.1
W/A MB processed: 0.2 0.2
Logons: 0.1 0.1
Executes: 39,333.0 48,342.0
Rollbacks: 0.0 0.0

I then looked at the new Intel E7 2870 I got.  2 socket 10 core, dual threaded (BL620 E7)

./runit.sh 0 43

the awr is here
Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 40.9 108.6 0.00 6.93
DB CPU(s): 37.8 100.4 0.00 6.41
Redo size: 10,053.3 26,674.3
Logical reads: 13,203,419.8 35,032,516.8
Block changes: 36.9 97.9
Physical reads: 0.0 0.0
Physical writes: 9.6 25.4
User calls: 5.9 15.7
Parses: 4.0 10.7
Hard parses: 0.0 0.0
W/A MB processed: 0.2 0.6
Logons: 0.3 0.7
Executes: 51,300.2 136,114.4
Rollbacks: 0.0 0.1
Transactions: 0.4


Look at that throughput: the 43-process run gives the best throughput, over 13 million LIOs/second.
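If you want to pull the LIO rate straight from AWR rather than reading it off the report, something along these lines works (a sketch against the standard AWR views; :bid and :eid are the begin and end snap ids of the SLOB run, and it assumes no instance restart between them):

select round( (e.value - b.value) /
              ((cast(se.end_interval_time as date) -
                cast(sb.end_interval_time as date)) * 86400) ) lio_per_sec
  from dba_hist_sysstat b, dba_hist_sysstat e,
       dba_hist_snapshot sb, dba_hist_snapshot se
 where b.stat_name = 'session logical reads'
   and e.stat_name = 'session logical reads'
   and b.snap_id = :bid and e.snap_id = :eid
   and b.dbid = e.dbid and b.instance_number = e.instance_number
   and sb.snap_id = b.snap_id and sb.dbid = b.dbid
   and sb.instance_number = b.instance_number
   and se.snap_id = e.snap_id and se.dbid = e.dbid
   and se.instance_number = e.instance_number;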

WOW, the new AMD Bulldozer has great numbers, but the Intel really rocks!

Cooking for Geeks (shameless plug)

I am part of the O'Reilly Blogger Review program, and long, long ago I picked up the book

"Cooking for Geeks" by Jeff Potter  for review. 

This was a fantastic book. I am both a geek and a cook, and this book does a great job of tying the 2 together.  It not only talks about spices and ingredients, but explains in geekspeak why they work together.

I thoroughly enjoyed this book, and I would recommend it to anyone who is interested in cooking, or at all interested in the chemistry and biology behind how and why we enjoy foods.

Here is a link to the book.
http://www.amazon.com/Cooking-Geeks-Science-Great-Hacks/dp/0596805888

R and visualizing your execution times

Well, I think I'm a little late to the party.  I know Greg Rahn did a great post on utilizing R to visualize your ASH data.  I figured I would do a simple example of how to build something myself, to show how easy it is to utilize R to visualize query execution times.

Well, first I started by downloading R from cran.r-project.org.

Once I downloaded R, I went to one of my databases, and found a query that had different execution times I wanted to play with.  I created an output file from the query.. Here is the script I used..


set pagesize 10000
set feedback off
spool rtest.txt

select trunc((elapsed_time_delta/executions_delta)/1000000,4) avg_execution_time "AVG_EXECUTION_TIME",
PLAN_HASH_VALUE "PLAN_HASH_VALUE",
execution_date "EXECUTION_DATE"
from
(
select sum(elapsed_time_delta) elapsed_time_delta,
sum(executions_delta) executions_delta,
PLAN_HASH_VALUE,
to_char(trunc(end_interval_time),'mm/dd/yy') execution_date
from dba_hist_sqlstat a,
dba_hist_snapshot b
where sql_id='19sqmxkc58wqm'
and a.snap_id=b.snap_id
and a.instance_number=b.instance_number
--and executions_delta>0
group by plan_hash_value,to_char(trunc(end_interval_time),'mm/dd/yy')
)
where executions_delta > 0
order by execution_date;
spool off


This script created a file I brought over to my pc and cleaned up the format. Here is part of the file..

AVG_EXECUTION_TIME PLAN_HASH_VALUE execution_date                                     
20.4368 566875892 01/01/12
50.3253 4009342004 01/01/12
21.4655 566875892 01/02/12
19.8312 4009342004 01/02/12
69.9299 4009342004 01/03/12
135.7153 4009342004 01/04/12
39.3972 4009342004 01/05/12
65.2833 4009342004 01/06/12
39.8093 4009342004 01/07/12
35.8615 4009342004 01/08/12
18.7553 566875892 01/09/12
134.7431 4009342004 01/09/12
76.2954 4009342004 01/10/12
115.8707 4009342004 01/11/12
60.0754 4009342004 01/12/12
102.6432 4009342004 01/13/12
22.2528 566875892 01/14/12
119.8541 4009342004 01/14/12
21.8552 566875892 01/15/12
18.5785 4009342004 01/15/12
19.3179 566875892 01/16/12
80.794 4009342004 01/16/12
67.0872 4009342004 01/17/12
107.1604 4009342004 01/18/12
28.9797 4009342004 01/19/12



I put this file into c:\r and named it query_performance.txt.

I then went into R and ran the following commands.


setwd("c:\\r")
query_data <- read.table("query_performance.txt",header=T)


max_num <- max(query_data$AVG_EXECUTION_TIME)

hist(query_data$AVG_EXECUTION_TIME,col=heat.colors(max_num),breaks=max_num,xlim=c(0,max_num),
right=F,main="Execution Time Histogram",las=1)






You can see I just ran a few simple commands...

setwd --- set the working directory to c:\r
read.table --- read in my space-delimited table (there is a read.csv for a comma separated file)
max_num  --- is set to the maximum execution time in the file

hist   -- creates a histogram of the execution times.. Check out below what comes out. Sweet !!


This was easy, and gives me a great picture of the variance in execution times. 

I am going to work more with this file, since it has 2 different plans and I want to visualize the differences.

Analyzing the last query run

I don't know how many times I find myself trying to tune a query, and going through the same tasks.

I run the sql in my session (most likely on sqlplus on the unix server), but I then try to track down the sql_id, and finally (if I am lucky) find the SQL Monitor report (tuning pack license required).

So I came up with a script that I am putting on all my servers: post_execute.sql

This script can be executed in your shell as the next step after running the query in question.. but first.

If you want to use the SQL Monitor output, you are best adding the /*+monitor */ hint to your query just to be sure it is captured.

Well this is what the script does.

1) run your query.. I used this query for my example.


select /*+monitor */ a.owner,b.table_name ,sum(bytes) from dba_segments a,
dba_tables b
where a.owner=b.owner and
a.segment_name=b.table_name
group by a.owner,b.table_name
order by a.owner,b.table_name;


2) It captures the dbms_xplan.display_cursor output for the last run sql.. Here is snippet of what the output looked like..

Plan information for current query




SQL_ID 208ttrm2nbt0u, child number 0
-------------------------------------
select /*+monitor */ a.owner,b.table_name ,sum(bytes) from dba_segments
a, dba_tables b where
a.owner=b.owner and a.segment_name=b.table_name group by
a.owner,b.table_name order by a.owner,b.table_name

Plan hash value: 1601540958

--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 14425 (100)| |
| 1 | SORT GROUP BY | | 6 | 2184 | 14425 (1)| 00:02:54 |
|* 2 | HASH JOIN OUTER | | 6 | 2184 | 14424 (1)| 00:02:54 |
| 3 | NESTED LOOPS OUTER | | 6 | 2160 | 14422 (1)| 00:02:54 |
| 4 | NESTED LOOPS OUTER | | 6 | 2106 | 14410 (1)| 00:02:53 |
| 5 | NESTED LOOPS OUTER | | 6 | 2070 | 14408 (1)| 00:02:53 |
| 6 | NESTED LOOPS | | 6 | 1998 | 14396 (1)| 00:02:53 |
| 7 | NESTED LOOPS | | 6 | 1974 | 14390 (1)| 00:02:53 |
|* 8 | HASH JOIN | | 67 | 19899 | 14318 (1)| 00:02:52 |
|* 9 | HASH JOIN | | 5095 | 1298K| 13638 (1)| 00:02:44 |
| 10 | MERGE JOIN CARTESIAN | | 131 | 11135 | 5 (20)| 00:00:01 |
|* 11 | HASH JOIN | | 1 | 68 | 1 (100)| 00:00:01 |
|* 12 | FIXED TABLE FULL | X$KSPPI | 1 | 55 | 0 (0)| |
| 13 | FIXED TABLE FULL | X$KSPPCV | 100 | 1300 | 0 (0)| |
| 14 | BUFFER SORT | | 131 | 2227 | 5 (20)| 00:00:01 |
| 15 | TABLE ACCESS STORAGE FULL | USER$ | 131 | 2227 | 4 (0)| 00:00:01 |
| 16 | VIEW | SYS_DBA_SEGS | 5095 | 875K| 13633 (1)| 00:02:44 |
| 17 | UNION-ALL | | | | | |
|* 18 | HASH JOIN RIGHT OUTER | | 672 | 103K| 12305 (1)| 00:02:28 |
| 19 | TABLE ACCESS STORAGE FULL | USER$ | 131 | 2227 | 4 (0)| 00:00:01 |
|* 20 | HASH JOIN | | 672 | 94752 | 12301 (1)| 00:02:28 |
| 21 | NESTED LOOPS | | 672 | 68544 | 11623 (1)| 00:02:20 |
|* 22 | HASH JOIN | | 672 | 63168 | 11623 (1)| 00:02:20 |
| 23 | TABLE ACCESS STORAGE FULL | TS$ | 167 | 1336 | 48 (0)| 00:00:01 |
|* 24 | HASH JOIN | | 672 | 57792 | 11574 (1)| 00:02:19 |
| 25 | TABLE ACCESS STORAGE FULL | SEG$ | 16421 | 449K| 821 (1)| 00:00:10 |
| 26 | VIEW | SYS_OBJECTS | 21369 | 1210K| 10752 (1)| 00:02:10 |
| 27 | UNION-ALL | | | | | |
|* 28 | TABLE ACCESS STORAGE FULL| TAB$ | 7284 | 177K| 2681 (1)| 00:00:33 |
| 29 | TABLE ACCESS STORAGE FULL| TABPART$ | 1525 | 28975 | 12 (0)| 00:00:01 |
| 30 | TABLE ACCESS STORAGE FULL| CLU$ | 10 | 140 | 2680 (1)| 00:00:33 |
|* 31 | TABLE ACCESS STORAGE FULL| IND$ | 9647 | 188K| 2681 (1)| 00:00:33 |
| 32 | TABLE ACCESS STORAGE FULL| INDPART$ | 1322 | 25118 | 12 (0)| 00:00:01 |
|* 33 | TABLE ACCESS STORAGE FULL| LOB$ | 1547 | 32487 | 2680 (1)| 00:00:33 |
| 34 | TABLE ACCESS STORAGE FULL| TABSUBPART$ | 32 | 448 | 2 (0)| 00:00:01 |
| 35 | TABLE ACCESS STORAGE FULL| INDSUBPART$ | 1 | 52 | 2 (0)| 00:00:01 |
| 36 | TABLE ACCESS STORAGE FULL| LOBFRAG$ | 1 | 17 | 2 (0)| 00:00:01 |
|* 37 | INDEX UNIQUE SCAN | I_FILE2 | 1 | 8 | 0 (0)| |
| 38 | TABLE ACCESS STORAGE FULL | OBJ$ | 88138 | 3356K| 677 (1)| 00:00:09 |
| 39 | NESTED LOOPS | | 34 | 3638 | 450 (1)| 00:00:06 |
| 40 | NESTED LOOPS | | 34 | 3366 | 450 (1)| 00:00:06 |
|* 41 | HASH JOIN OUTER | | 34 | 3094 | 416 (1)| 00:00:05 |
| 42 | NESTED LOOPS | | 34 | 2516 | 411 (0)| 00:00:05 |
|* 43 | TABLE ACCESS STORAGE FULL | UNDO$ | 204 | 8568 | 3 (0)| 00:00:01 |
|* 44 | TABLE ACCESS CLUSTER | SEG$ | 1 | 32 | 2 (0)| 00:00:01 |
|* 45 | INDEX UNIQUE SCAN | I_FILE#_BLOCK# | 1 | | 1 (0)| 00:00:01 |
| 46 | TABLE ACCESS STORAGE FULL | USER$ | 131 | 2227 | 4 (0)| 00:00:01 |
| 47 | TABLE ACCESS CLUSTER | TS$ | 1 | 8 | 1 (0)| 00:00:01 |
|* 48 | INDEX UNIQUE SCAN | I_TS# | 1 | | 0 (0)| |
|* 49 | INDEX UNIQUE SCAN | I_FILE2 | 1 | 8 | 0 (0)| |
|* 50 | HASH JOIN | | 4389 | 321K| 878 (1)| 00:00:11 |
| 51 | TABLE ACCESS STORAGE FULL | FILE$ | 569 | 6828 | 3 (0)| 00:00:01 |
|* 52 | HASH JOIN RIGHT OUTER | | 4389 | 270K| 874 (1)| 00:00:11 |
| 53 | TABLE ACCESS STORAGE FULL | USER$ | 131 | 2227 | 4 (0)| 00:00:01 |
|* 54 | HASH JOIN | | 4389 | 197K| 870 (1)| 00:00:11 |
| 55 | TABLE ACCESS STORAGE FULL | TS$ | 167 | 1336 | 48 (0)| 00:00:01 |
|* 56 | TABLE ACCESS STORAGE FULL | SEG$ | 4389 | 162K| 821 (1)| 00:00:10 |
|* 57 | TABLE ACCESS STORAGE FULL | OBJ$ | 88138 | 3098K| 679 (1)| 00:00:09 |
|* 58 | TABLE ACCESS CLUSTER | TAB$ | 1 | 32 | 2 (0)| 00:00:01 |
|* 59 | INDEX UNIQUE SCAN | I_OBJ# | 1 | | 1 (0)| 00:00:01 |
| 60 | TABLE ACCESS CLUSTER | TS$ | 1 | 4 | 1 (0)| 00:00:01 |
|* 61 | INDEX UNIQUE SCAN | I_TS# | 1 | | 0 (0)| |
| 62 | TABLE ACCESS CLUSTER | SEG$ | 1 | 12 | 2 (0)| 00:00:01 |
|* 63 | INDEX UNIQUE SCAN | I_FILE#_BLOCK# | 1 | | 1 (0)| 00:00:01 |
|* 64 | INDEX RANGE SCAN | I_OBJ1 | 1 | 6 | 2 (0)| 00:00:01 |
|* 65 | INDEX RANGE SCAN | I_OBJ1 | 1 | 9 | 2 (0)| 00:00:01 |
| 66 | INDEX FULL SCAN | I_USER2 | 131 | 524 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CX"."OWNER#"="CU"."USER#")
8 - access("SEGMENT_NAME"="O"."NAME" AND "O"."OWNER#"="U"."USER#")
9 - access("OWNER"="U"."NAME")
11 - access("KSPPI"."INDX"="KSPPCV"."INDX")
12 - filter("KSPPI"."KSPPINM"='_dml_monitoring_enabled')
18 - access("O"."OWNER#"="U"."USER#")
20 - access("O"."OBJ#"="SO"."OBJECT_ID" AND "O"."TYPE#"="SO"."OBJECT_TYPE_ID")
22 - access("S"."TS#"="TS"."TS#")
24 - access("S"."FILE#"="SO"."HEADER_FILE" AND "S"."BLOCK#"="SO"."HEADER_BLOCK" AND
"S"."TS#"="SO"."TS_NUMBER" AND "S"."TYPE#"="SO"."SEGMENT_TYPE_ID")
28 - filter(BITAND("T"."PROPERTY",1024)=0)
31 - filter(("I"."TYPE#"=1 OR "I"."TYPE#"=2 OR "I"."TYPE#"=3 OR "I"."TYPE#"=4 OR "I"."TYPE#"=6 OR
"I"."TYPE#"=7 OR "I"."TYPE#"=8 OR "I"."TYPE#"=9))
33 - filter((BITAND("L"."PROPERTY",64)=0 OR BITAND("L"."PROPERTY",128)=128))
37 - access("S"."TS#"="F"."TS#" AND "S"."FILE#"="F"."RELFILE#")
41 - access("S"."USER#"="U"."USER#")
43 - storage("UN"."STATUS$"<>1)
filter("UN"."STATUS$"<>1)
44 - filter(("S"."TYPE#"=1 OR "S"."TYPE#"=10))
45 - access("S"."TS#"="UN"."TS#" AND "S"."FILE#"="UN"."FILE#" AND "S"."BLOCK#"="UN"."BLOCK#")
48 - access("S"."TS#"="TS"."TS#")
49 - access("UN"."TS#"="F"."TS#" AND "UN"."FILE#"="F"."RELFILE#")
50 - access("S"."TS#"="F"."TS#" AND "S"."FILE#"="F"."RELFILE#")
52 - access("S"."USER#"="U"."USER#")
54 - access("S"."TS#"="TS"."TS#")
56 - filter(("S"."TYPE#"<>6 AND "S"."TYPE#"<>5 AND "S"."TYPE#"<>8 AND "S"."TYPE#"<>10 AND
"S"."TYPE#"<>1))
57 - storage(BITAND("O"."FLAGS",128)=0)
filter(BITAND("O"."FLAGS",128)=0)
58 - filter(BITAND("T"."PROPERTY",1)=0)
59 - access("O"."OBJ#"="T"."OBJ#")
61 - access("T"."TS#"="TS"."TS#")
63 - access("T"."TS#"="S"."TS#" AND "T"."FILE#"="S"."FILE#" AND "T"."BLOCK#"="S"."BLOCK#")
64 - access("T"."BOBJ#"="CO"."OBJ#")
65 - access("T"."DATAOBJ#"="CX"."OBJ#")
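I haven't reproduced post_execute.sql itself here, but the core of this step can be as simple as calling display_cursor with nulls, which reports on the last statement executed by the current session (a sketch, run in the same SQL*Plus session right after the query; /tmp/analyze_dbms_xplan.txt is the file the mail step below picks up):

set linesize 200 pagesize 0 trimspool on
spool /tmp/analyze_dbms_xplan.txt
select * from table(dbms_xplan.display_cursor(null, null, 'TYPICAL'));
spool off

One caveat: make sure serveroutput is off when you do this, or the "last statement" the session sees will be the dbms_output call rather than your query.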

3) It captures some execution statistics that I like to look at .. Here is the output below.





Summary of query stats





************** plan_hash_value = 1188305021
************** sql_id = 208ttrm2nbt0u

avg_elapsed_time = 2.59
total_executions = 2
avg_rows_processed = 5372
avg_disk_reads = 202
avg_buffer_gets = 189970
avg_cpu_time = 2426631
avg_iowait = 99965
avg_cluster_wait = 245947
avg_direct_writes = 0
avg_plssql_exec_time = 162398
avg_cell_offload = 0



And finally it creates the beautiful sql_monitor output we are all used to.
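That piece can be generated with dbms_sqltune.report_sql_monitor (Tuning Pack licensed, as noted above); a sketch of what that step might look like, spooling to the /tmp/analyze_html.htm file the mail code below attaches (the &sql_id substitution is a placeholder):

set long 10000000 longchunksize 1000000 linesize 32767 pagesize 0 trimspool on
spool /tmp/analyze_html.htm
select dbms_sqltune.report_sql_monitor(
         sql_id       => '&sql_id',   -- sql_id of the statement just run
         type         => 'HTML',
         report_level => 'ALL')
  from dual;
spool off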


Pretty cool huh ?  It creates these 3 pieces based on the last sql executed in the session.. Now this is the best part.. At the end of the script is this code.

!echo "From: bryan.grenn@oracle.com"  > /tmp/file
!echo "To: bryan.grenn@oracle.com" >> /tmp/file
!echo "Subject: analyze output sql_monitor" >> /tmp/file
!echo "Mime-Version: 1.0" >> /tmp/file
!echo 'Content-Type: multipart/mixed; boundary="DMW.Boundary.605592468"' >> /tmp/file
!echo "--DMW.Boundary.605592468" >> /tmp/file
!echo " " >> /tmp/file
!echo " analyze report sql_monitor " >> /tmp/file
!echo " " >> /tmp/file
!echo "--DMW.Boundary.605592468" >> /tmp/file
!echo 'Content-Disposition: inline; filename="analyze_dbms_xplan.txt"' >> /tmp/file
!echo "Content-Transfer-Encoding: 7bit" >> /tmp/file
!cat /tmp/analyze_dbms_xplan.txt >> /tmp/file
!echo "--DMW.Boundary.605592468" >> /tmp/file

!echo 'Content-Disposition: inline; filename="analyze_query_stats.txt"' >> /tmp/file
!echo "Content-Transfer-Encoding: 7bit" >> /tmp/file
!cat /tmp/analyze_query_stats.txt >> /tmp/file
!echo "--DMW.Boundary.605592468" >> /tmp/file

!echo 'Content-Disposition: inline; filename="analyze_sql_monitor.htm"' >> /tmp/file
!echo "Content-Transfer-Encoding: 7bit" >> /tmp/file
!cat /tmp/analyze_html.htm >> /tmp/file
!echo "--DMW.Boundary.605592468" >> /tmp/file

!/usr/sbin/sendmail bryan.grenn@oracle.com< /tmp/file



This code will take the 3 output files, and mail them to you as attachments.  Notice my e-mail is in the script, so change it to what you need to get it going.  

I'm going to use this often, by putting it on all my servers and just running it after any stubborn sql shows up. I will be instantly e-mailed the pieces I need to figure out what is going on with the sql.

Hyperthreading with Oracle (update)

After my first post on hyperthreading, and all the hits I've been getting I've decided to update it..

My first post was based on testing with a DL980. This is an 8 socket server with X7560 @ 2.27GHz processors.

This updated post is on a new 2 socket server with the E7-2870 @ 2.40GHz chipset.

The servers I tested on were

2 Socket
10 Core (dual threaded)
11.2.0.3 Oracle
Linux RHEL  2.6.18-274.17.1.el5
132g of ram.


I tested by using Kevin Closson's SLOB test, which can be found here.

I tested using multiple process settings, and you can see how the server scaled up with the process count.

I warmed up with 15 processes.  Looking at the LIO count, we are doing 7.2 million. output



Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 14.4 24.9 0.00 4.60
DB CPU(s): 14.3 24.7 0.00 4.56
Redo size: 13,974.5 24,137.1
Logical reads: 7,259,470.5 12,538,708.3
Block changes: 32.9 56.8
Physical reads: 69.9 120.8
Physical writes: 15.8 27.2
User calls: 3.1 5.4
Parses: 3.8 6.6
Hard parses: 0.0 0.0
W/A MB processed: 0.4 0.7
Logons: 0.1 0.2
Executes: 28,206.8 48,719.3
Rollbacks: 0.0 0.0
Transactions: 0.6


Then 20 processes.  8.7 million still looking good.  output


Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 18.9 127.0 0.00 10.83
DB CPU(s): 18.9 126.8 0.00 10.82
Redo size: 15,073.6 101,207.6
Logical reads: 8,714,693.4 58,512,431.8
Block changes: 39.2 263.4
Physical reads: 2.1 13.9
Physical writes: 13.0 87.0
User calls: 1.8 11.7
Parses: 3.8 25.6
Hard parses: 0.0 0.1
W/A MB processed: 0.4 2.5
Logons: 0.1 0.4
Executes: 33,859.7 227,341.5
Rollbacks: 0.0 0.0
Transactions: 0.2

Now let's try 25 and go past the number of cores.  9.4 million. Hyperthreading is scaling nicely. output


Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 23.1 91.7 0.00 9.26
DB CPU(s): 23.0 91.6 0.00 9.24
Redo size: 21,634.2 86,063.7
Logical reads: 9,406,658.2 37,420,998.8
Block changes: 68.2 271.4
Physical reads: 3.7 14.8
Physical writes: 5.7 22.5
User calls: 2.5 9.9
Parses: 3.3 13.3
Hard parses: 0.1 0.3
W/A MB processed: 0.4 1.6
Logons: 0.1 0.5
Executes: 36,544.7 145,380.0
Rollbacks: 0.0 0.0
Transactions: 0.3


Next 35.. Still scaling... output


Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 33.5 310.9 0.00 18.74
DB CPU(s): 33.5 310.3 0.00 18.70
Redo size: 9,379.2 86,997.8
Logical reads: 11,039,158.6 102,395,221.3
Block changes: 17.7 164.4
Physical reads: 1.3 12.0
Physical writes: 4.0 36.9
User calls: 1.8 16.6
Parses: 2.0 18.9
Hard parses: 0.1 1.0
W/A MB processed: 0.2 2.2
Logons: 0.1 0.6
Executes: 42,882.8 397,765.4
Rollbacks: 0.0 0.0
Transactions: 0.1

Finally, 40 processes - the number of threads.  This appears to be the peak. output


Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 38.3 387.0 0.00 17.48
DB CPU(s): 37.6 380.4 0.00 17.18
Redo size: 9,207.1 93,060.2
Logical reads: 11,577,951.1 117,023,088.1
Block changes: 16.4 166.1
Physical reads: 1.1 11.5
Physical writes: 4.5 45.1
User calls: 2.2 22.1
Parses: 2.0 19.9
Hard parses: 0.1 0.6
W/A MB processed: 0.2 2.5
Logons: 0.1 0.9
Executes: 44,975.6 454,587.0
Rollbacks: 0.0 0.0
Transactions: 0.1

Now let's go up to 45... things start dropping off.  output


Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 40.7 482.9 0.00 16.78
DB CPU(s): 36.0 427.5 0.00 14.86
Redo size: 8,205.8 97,318.2
Logical reads: 11,100,719.6 131,651,002.6
Block changes: 14.0 165.9
Physical reads: 0.9 10.3
Physical writes: 3.2 38.3
User calls: 2.4 28.8
Parses: 2.3 27.5
Hard parses: 0.1 0.6
W/A MB processed: 0.3 3.6
Logons: 0.1 1.4
Executes: 43,122.5 511,418.6
Rollbacks: 0.0 0.0
Transactions: 0.1




So if you look at the throughput:

20 processes    8.7 million LIOs
40 processes   11.6 million LIOs

That is not quite a linear progression, but things don't really top off until I get to the number of threads.  This scaled much better, and it seems that hyperthreading is helping push more workload through.

Analytical functions

I have been working on a query that uses an analytic function.
I'm not very familiar with them, and using my usual query tuning approach got me into some trouble.

This is the query.

WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
T17189.RTP_CAL_DT as c4,
T17189.RTP_CAL_YR as c5,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY
T17189.RTP_CAL_YR DESC) as c6,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,
T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,
T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */ )
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213


The data in the table looks like this.

SUR_REF_TIME_PERD_ID    varchar(8)
RPT_CAL_DT                      date
RPT_CAL_YR                      varchar(4)

Some example values are.

SUR_REF_TIME_PERD_ID   RPT_CAL_DT   RPT_CAL_YR
19000101               01/01/1900   1900
20120101               01/01/2012   2012
...
20120125               01/25/2012   2012
...
20120430               04/30/2012   2012
...
20240213               02/13/2024   2024
...
47121231               12/31/4712   4712


Now back to my query..

WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
T17189.RTP_CAL_DT as c4,
T17189.RTP_CAL_YR as c5,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY T17189.RTP_CAL_YR DESC) as c6,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */ )
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213



I noticed that the query did a full table scan of 219,000 rows.




  I figured, well I only wanted 1 row, so why can’t I just change the
query ??

select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
T17189.RTP_CAL_DT as c4,
T17189.RTP_CAL_YR as c5,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY T17189.RTP_CAL_YR DESC) as c6,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
V_REF_TIME_PERD T17189 where SUR_REF_TIME_PERD_ID=20240213


My plan looks much better



But the answer is different ??

The first query returns.


SUR_REF_TIME_PERD_ID C4                            C5         C6         C7
-------------------- --------------------- ---------- ---------- ----------
20240213 2024-02-13 00:00:00 2024 44 1

The second query returns



SUR_REF_TIME_PERD_ID C4 C5 C6 C7
-------------------- --------------------- ---------- ---------- ----------
20240213 2024-02-13 00:00:00 2024 1 1


So what exactly is the query doing ???

The query is actually analyzing more data than just my row.  After examining the query, I noticed the 4th column (C6) is actually the day of the year.  2/13 is the 44th day of the year.

So what exactly is the query doing to get this ???

It is starting with all rows in the table, then sorting them by RPT_CAL_YR and SUR_REF_TIME_PERD_ID.  This is column c7 in the output. If we look back, we see this column is

ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR, T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC, T17189.SUR_REF_TIME_PERD_ID DESC) as c7

This column does the sorting to get the result. Now if you look at C6, you see that it is a row_number over just the 1 column RPT_CAL_YR.  Since the rows are now sorted in date order (thanks to c7), C6 is just the row number for the value passed to the query.

I know it gets kind of complicated. The lesson learned from this is that analytic functions sometimes need to do a full table scan to do their work.  By giving the key value within the select, I removed the query's ability to look at the full table's data.

I also learned that by properly limiting the values in the right spot, you can eliminate rows.  Here is a better query. I am limiting the analytical function to just look at the current year.


   WITH
SACOMMON71991 AS (select T17189.SUR_REF_TIME_PERD_ID as SUR_REF_TIME_PERD_ID,
T17189.RTP_CAL_DT as c4,
T17189.RTP_CAL_YR as c5,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR ORDER BY
T17189.RTP_CAL_YR DESC) as c6,
ROW_NUMBER() OVER (PARTITION BY T17189.RTP_CAL_YR,
T17189.SUR_REF_TIME_PERD_ID ORDER BY T17189.RTP_CAL_YR DESC,
T17189.SUR_REF_TIME_PERD_ID DESC) as c7
from
REF_TIME_PERD T17189 /* Dim_TIME_PERD_D_Check_Date */
where RTP_CAL_YR='2024')
select * from SACOMMON71991 where SUR_REF_TIME_PERD_ID=20240213

The plan is going through less data, but the appropriate data to get the right answer.



Where is my space on DBFS


I just ran into an issue on DBFS where I ran out of space.

First here is the df -k

Filesystem 1K-blocks Used Available Use% Mounted on
dbfs-dbfs_admin2@:/ 20983808 11443696 9540112 55% /dbfs/dba


OK, everything looks good.. I am using 11g and I have 9.5g available.

I go to copy a file on the os (you can see it is 240m).  Lots of room

 ls -al bsg.out
-rw-r--r-- 1 oracle oinstall 240794862 May 18 11:37 bsg.out


cp bsg.out bsg.out1
cp: writing `bsg.out1': No space left on device
cp: closing `bsg.out1': No space left on device


So where is my space??  I found this query:

set serveroutput on;
declare
v_segment_size_blocks number;
v_segment_size_bytes number;
v_used_blocks number;
v_used_bytes number;
v_expired_blocks number;
v_expired_bytes number;
v_unexpired_blocks number;
v_unexpired_bytes number;
begin
dbms_space.space_usage ('DBFS_OWNER', 'LOB_SFS$_FST_12345', 'LOB',
v_segment_size_blocks, v_segment_size_bytes,
v_used_blocks, v_used_bytes, v_expired_blocks, v_expired_bytes,
v_unexpired_blocks, v_unexpired_bytes );
dbms_output.put_line('Segment Size blocks = '||v_segment_size_blocks);
dbms_output.put_line('Segment Size bytes = '||v_segment_size_bytes);
dbms_output.put_line('Used blocks = '||v_used_blocks);
dbms_output.put_line('Used bytes = '||v_used_bytes);
dbms_output.put_line('Expired Blocks = '||v_expired_blocks);
dbms_output.put_line('Expired Bytes = '||v_expired_bytes);
dbms_output.put_line('UNExpired Blocks = '||v_unexpired_blocks);
dbms_output.put_line('UNExpired Bytes = '||v_unexpired_bytes);
end;
/



And I see this output

Segment Size blocks = 2619024
Segment Size bytes = 21455044608
Used blocks = 1425916
Used bytes = 11681103872
Expired Blocks = 1190111
Expired Bytes = 9749389312
UNExpired Blocks = 0
UNExpired Bytes = 0


So.. according to this.. The segment is 21.4 g

11.7g is used space
  9.7g is Expired space
   0g   is unexpired space.

So if I have 9.7g of Expired space, why can't I use it??  My file is only ~240m, and I should have 9.7g available.

So my questions out of this are (if anyone knows the answer).

1) How does this happen and how do I avoid it ?

2) How do I size tablespaces for DBFS?  They need more space to be available than I need for the file system.

3) How do I monitor the sizing since the DF -k does not report unexpired bytes that are available to be used ?

4) How does the clause "retention" fit into this ?  retention defaults to "auto" rather than "none".  Can I set it to "none", but what happens and does this solve my problem ?


Oh, and I wanted to make sure that I included the output of what's in the tablespace.

SEGMENT_NAME                   SEGMENT_TYPE       SEGMENT_SIZE
------------------------------ ------------------ ------------
LOB_SFS$_FST_12345             LOBSEGMENT                20461
T_ADMIN                        TABLE                        17
IP_SFS$_FST_12345              INDEX                         4
IPG_SFS$_FST_12345             INDEX                         3
IG_SFS$_FST_12345              INDEX                         2
SYS_IL0000095835C00007$$       LOBINDEX                      0
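
For reference, a listing like the one above can be pulled straight from dba_segments.  This is just a sketch; the tablespace name (DBFS_TS) is an assumption, so substitute the tablespace your DBFS store actually lives in.

select segment_name,
       segment_type,
       round(bytes/1024/1024) segment_size      -- size in MB
from   dba_segments
where  tablespace_name = 'DBFS_TS'              -- assumed DBFS tablespace name
order by bytes desc;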




UPDATE (6/13/12)  --

After working with support on this, it was filed as a bug.  This occurred because I was using DBFS as a filesystem for my dbreplay capture.  After thinking about it, the capture is probably the most intensive workload I could throw at DBFS: not only does it simultaneously write to multiple files, it writes to those files across multiple nodes.  In my capture there were 4 nodes writing to hundreds of files at the same time.
   I will be testing the patch to see if it corrects the problem.  Support is telling me that the writing across multiple nodes is causing some of the issues.

Problem debugging for DBFS

In trying to find out the cause of a DBFS issue, I learned what is expected (or helpful) when working with Support on DBFS issues.

1)  A logon trigger for the DBFS user to create a trace file:
CREATE OR REPLACE TRIGGER DBFS_LOGON
AFTER LOGON ON DATABASE
declare
  username VARCHAR2(30);
BEGIN
  username := SYS_CONTEXT('USERENV','SESSION_USER');
  IF username like 'FOO' THEN
    dbms_dbfs_content.setTrace(3);
    execute immediate 'alter session set events ''45050 trace name context forever, level 0xfffff'' ';
  END IF;
EXCEPTION
  WHEN OTHERS THEN
    NULL;
END;
/



2) Start the DBFS client with tracing turned on (-otrace_file=,trace_level=2,trace_size=0).  See "How to trace DBFS when any failure happens" [ID 1320683.1].

Remember, if you are running DBFS you are probably on a multi-node clustered environment, so you only need to do these steps on one of the nodes to gather the data.  I enabled the logon trigger and remounted the filesystem with tracing, reproduced the issue, and verified the trace files were created; then I disabled the trigger and remounted without tracing.  Doing this on only one node let me gather what support needed with minimal disruption.
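
When you are done, the trigger from step 1 can be switched off (or removed) so normal logons are no longer traced.  A quick sketch, using the trigger name from above:

-- stop tracing future DBFS sessions
alter trigger DBFS_LOGON disable;

-- or remove it entirely once the SR is closed
drop trigger DBFS_LOGON;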


Taking a career change

Well, I have decided to make a change and take a job with Oracle.  I am very excited about this move, and I look forward to being more involved in big data.  As any of you who have read my blog posts know, I have taken a strong interest in this area, and I know I'm not the only one.  You have probably heard the terms "Data Scientist", Hadoop, R...  These are all areas I'm going to be delving into in my new position.
  I will continue to blog, probably mostly about the same topics I blog about now.  I am looking forward to this change and to becoming part of this evolution.  Many people are saying that Big Data is the next big change (like the internet); whether or not that is true, we shall see.

AWR compare report

I came across this while doing some dbreplays, and found it very useful.

First, let's say you have a RAC cluster and you want to do some performance comparisons.  What's one of the issues you run into?  For me it is trying to figure out which nodes I care about and running the AWR report for each of them.  This is exacerbated with a full rack Exadata: 8 nodes to compare.  Below is what I use to compare 2 time periods across all nodes.  I also increase some of the reporting thresholds.

First the script to gather the report. (here)

To get this to work, change the following:

dbid  - dbid for the first time period
begin_snap - begin snap first time period
end_snap - end snap first time period

dbid2 - dbid for the second time period
begin_snap2 - begin snap second time period
end_snap2 - end snap second time period

Also notice that I changed the top_n_* values to give me more data.
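
If you are not sure what dbid and snap ids to plug in, they are easy to look up first.  A quick sketch against the standard AWR views:

select dbid, name from v$database;

select snap_id, begin_interval_time, end_interval_time
from   dba_hist_snapshot
order by snap_id;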

Rem    NAME
Rem awr_full.sql - Workload Repository Global Compare Periods Report
Rem
Rem DESCRIPTION
Rem RAC Version of Compare Period Report
Rem
Rem
Rem NOTES
Rem Run as SYSDBA. Generally this script should be invoked by awrgdrpt,
Rem unless you want to pick a database and/or specific instances
Rem other than the default.
Rem
Rem If you want to use this script in an non-interactive fashion,
Rem without executing the script through awrgdrpt, then
Rem do something similar to the following:
Rem
define num_days = 0;
define dbid =2415508472;
define instance_numbers_or_ALL = 'ALL';
define begin_snap = 35727;
define end_snap = 35728;
define num_days2 = 0;
define dbid2 = 2415508472;
define instance_numbers_or_ALL2 = 'ALL';
define begin_snap2 = 35728;
define end_snap2 = 35729;
define report_type = 'html';
define report_name = /tmp/awr_report.html
define top_n_files = 50;
define top_n_segments = 50;
define top_n_services = 50;
define top_n_sql = 100;
@@?/rdbms/admin/awrgdrpi
!./mail_full.sql
exit



The second to last line of the script is to mail the report, and the script is here.

echo "From: replay_report@oracle.com"  > /tmp/mailfilebsg
echo "To: raddba@oracle.com" >> /tmp/mailfilebsg
echo "Subject: DBREPLAY output " >> /tmp/mailfilebsg
echo "Mime-Version: 1.0" >> /tmp/mailfilebsg
echo 'Content-Type: multipart/mixed; boundary="DMW.Boundary.605592468"' >> /tmp/mailfilebsg
echo "--DMW.Boundary.605592468" >> /tmp/mailfilebsg
echo " " >> /tmp/mailfilebsg
echo " dbreplay output " >> /tmp/mailfilebsg
echo " " >> /tmp/mailfilebsg
echo "--DMW.Boundary.605592468" >> /tmp/mailfilebsg
echo 'Content-Disposition: inline; filename="dbreplay.html"' >> /tmp/mailfilebsg
echo "Content-Transfer-Encoding: 7bit" >> /tmp/mailfilebsg
cat /tmp/awr_report.html >> /tmp/mailfilebsg
echo "--DMW.Boundary.605592468" >> /tmp/mailfilebsg
/usr/sbin/sendmail bryan..grenn@oracle.com< /tmp/mailfilebsg


The second script will mail you the output as an attachment.  When using it, be sure to make the e-mail addresses yours, and change the subject and filename to be what you want.

Enjoy. 

What happened to my sql (sql_id) ?


While finishing up a few things, I ran across a query that wasn't playing nicely.  It had 4 different plans over the course of the last couple of days, and I wanted to see what happened.  I came up with the nifty query below.  If you plug in a sql_id, it will go through the AWR history and return, ordered by the date last executed, the plans grouped by plan_hash_value.  Within each plan_hash_value it will give you the objects in the plan and when they were last analyzed.  By using this you should see which plans are good, when they were last executed, and whether anything was analyzed that changed the plan.


set linesize 160
set pagesize 1000
break on plan_hash_value skip 1 nodup on last_executed skip 1 nodup on avg_exec_time skip 1
select object_owner ||'.'|| object_name object_name,
object_type,
a.plan_hash_value,
case object_type
when 'INDEX' then (select last_analyzed from dba_indexes b where owner=object_owner and index_name=object_name)
when 'TABLE' then (select last_analyzed from dba_tables b where owner=object_owner and table_name=object_name)
else null
end last_analyzed,
to_char((select max(end_interval_time) from dba_hist_snapshot b,
dba_hist_sqlstat c
where c.sql_id=a.sql_id and
c.plan_hash_value=a.plan_hash_value and
b.snap_id=c.snap_id),'mm/dd/yy hh24:mi') last_Executed,
to_char((select sum(elapsed_time_delta)/sum(executions_delta) from dba_hist_sqlstat d where d.sql_id=a.sql_id and d.plan_hash_value=a.plan_hash_value)/1024/1024,'999.99') avg_exec_time
from DBA_HIST_SQL_PLAN a
where a.SQL_ID='gbug7dg8adhgh'
and object_type in ('INDEX','TABLE')
order by last_executed desc ,a.plan_hash_value , last_analyzed desc;


Here is an example of the output


              OBJECT_NAME                                 OBJECT_TYPE          PLAN_HASH_VALUE LAST_ANALYZED       LAST_EXECUTED  AVG_EXE
-------------------------------------------------------------- -------------------- --------------- ------------------- -------------- -------
MY_SCHEMA.SNP_CDC_SUBS TABLE 2518369181 2012-07-06 09:25:15 07/06/12 10:00 791.37
MY_SCHEMA.SNP_CDC_SUBS TABLE 2012-07-06 09:25:15
MY_SCHEMA.D$TAB_REG TABLE 2012-07-06 09:25:06
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
ERD.DIM_TABS_COMP_PLCY_AGMT TABLE 2012-07-06 00:39:33
MY_SCHEMA.TAB_COMP_PLCY_AGMT TABLE 2012-05-25 09:20:39
MY_SCHEMA.IDX_WCPA_AGMT_ID INDEX 2012-05-25 09:20:39
MY_SCHEMA.TAB_COMP_PLCY_ST_CLSF_VT TABLE 2012-05-15 18:49:43
MY_SCHEMA.TAB_COMP_PLCY_ST_CLSF TABLE 2012-05-15 18:49:43
MY_SCHEMA.WPTD_COMP_PER_TAX TABLE 2012-05-15 18:39:30
MY_SCHEMA.TAB_REG TABLE 2012-05-15 18:31:18
MY_SCHEMA.CO_TAB TABLE 2012-05-15 18:27:50
MY_SCHEMA.TAB_PAYR TABLE 2012-05-15 18:26:47
MY_SCHEMA.AGMT_REG TABLE 2012-05-15 18:21:09



MY_SCHEMA.SNP_CDC_SUBS TABLE 1903861587 2012-07-06 09:25:15 07/06/12 09:00 882.94
MY_SCHEMA.SNP_CDC_SUBS TABLE 2012-07-06 09:25:15
MY_SCHEMA.D$TAB_REG TABLE 2012-07-06 09:25:06
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
MY_SCHEMA.J$TAB_REG TABLE 2012-07-06 09:24:50
ERD.DIM_TABS_COMP_PLCY_AGMT TABLE 2012-07-06 00:39:33
MY_SCHEMA.TAB_COMP_PLCY_AGMT TABLE 2012-05-25 09:20:39
MY_SCHEMA.IDX_WCPA_AGMT_ID INDEX 2012-05-25 09:20:39
MY_SCHEMA.TAB_COMP_PLCY_ST_CLSF_VT TABLE 2012-05-15 18:49:43
MY_SCHEMA.TAB_COMP_PLCY_ST_CLSF TABLE 2012-05-15 18:49:43
MY_SCHEMA.WPTD_COMP_PER_TAX TABLE 2012-05-15 18:39:30
MY_SCHEMA.TAB_REG TABLE 2012-05-15 18:31:18
MY_SCHEMA.CO_TAB TABLE 2012-05-15 18:27:50
MY_SCHEMA.TAB_PAYR TABLE 2012-05-15 18:26:47
MY_SCHEMA.AGMT_REG TABLE 2012-05-15 18:21:09

Exadata tips

I wanted to write up an Exadata tip that I learned.

Background :  I wanted to do a simple "select count(1) from mytable".  mytable has a primary key on it.  The count seemed to be taking a long time for an Exadata.

First the "select count(1) from mytable". You can see that it uses an index storage fast full scan.  The top wait event is "cell multiblock physical read".  The query does  2 Million Disk reads in 3 Minutes 28 seconds.

But this seems slow...



select count(1)
from
MYTABLE MYTABLE

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 56.46 208.27 2030378 2031797 20 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 56.46 208.27 2030378 2031797 20 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=2031797 pr=2030378 pw=0 time=208276577 us)
592848893 592848893 592848893 INDEX STORAGE FAST FULL SCAN PK_MYTABLE (cr=2031797 pr=2030378 pw=0 time=245927651 us cost=523897 size=0 card=572441788)(object id 312310)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
library cache lock 4 0.00 0.00
Disk file operations I/O 42 0.00 0.00
library cache pin 2 0.00 0.00
SQL*Net message to client 2 0.00 0.00
cell single block physical read 15 0.00 0.00
cell list of blocks physical read 2 0.00 0.00
cell multiblock physical read 15959 0.26 152.20
latch: object queue header operation 1 0.00 0.00
SQL*Net message from client 2 10.38 10.38
********************************************************************************



Next I did a FTS:

select /*+ full(MYTABLE) */ count(1) from MYTABLE MYTABLE;

You can see this did 2.2 million disk reads (more than the index scan), but the dominant wait is now just SQL*Net.  With the "cell smart table scan" the reads were offloaded to the storage cells; there were very few waits, and the total wait time was much shorter (about 1 second versus 152 seconds of "cell multiblock physical read" for the index scan).



select /*+full(MYTABLE) */ count(1)
from
MYTABLE MYTABLE

call count cpu elapsed disk query current rows
------- ------ -------- ---------- ---------- ---------- ---------- ----------
Parse 1 0.00 0.00 0 0 0 0
Execute 1 0.00 0.00 0 0 0 0
Fetch 2 65.11 66.48 2224642 2225028 21 1
------- ------ -------- ---------- ---------- ---------- ---------- ----------
total 4 65.11 66.48 2224642 2225028 21 1

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max) Row Source Operation
---------- ---------- ---------- ---------------------------------------------------
1 1 1 SORT AGGREGATE (cr=2225028 pr=2224642 pw=0 time=66486729 us)
592848893 592848893 592848893 PARTITION RANGE ALL PARTITION: 1 33 (cr=2225028 pr=2224642 pw=0 time=140325566 us cost=533066 size=0 card=572441788)
592848893 592848893 592848893 TABLE ACCESS STORAGE FULL MYTABLE PARTITION: 1 33 (cr=2225028 pr=2224642 pw=0 time=54479242 us cost=533066 size=0 card=572441788)


Elapsed times include waiting on following events:
Event waited on Times Max. Wait Total Waited
---------------------------------------- Waited ---------- ------------
SQL*Net message to client 2 0.00 0.00
Disk file operations I/O 129 0.00 0.00
gc current block 2-way 65 0.00 0.00
enq: KO - fast object checkpoint 101 0.00 0.02
reliable message 33 0.09 0.30
gc current block 3-way 13 0.00 0.00
cell smart table scan 1403 0.03 1.01
gc cr block 3-way 1 0.00 0.00
gc current grant busy 18 0.00 0.00
gc cr block 2-way 17 0.00 0.00
gc cr multi block request 5 0.00 0.00
cell single block physical read 11 0.00 0.00
cell list of blocks physical read 2 0.00 0.00
gc cr grant 2-way 3 0.00 0.00
SQL*Net message from client 2 9.62 9.62






Bottom line: if you want to do a count on a table, use the "FULL" hint.  The Exadata is built for table scans, and this example shows that.

It should also make you rethink when to use indexes for an application; as you can see, they can hurt you in some cases.
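
If you want to confirm whether a statement was actually offloaded, the smart scan columns in v$sql are a quick check.  A sketch (the sql_id substitution variable is just a placeholder for your own statement):

select sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes
from   v$sql
where  sql_id = '&sql_id';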

What extended stats do I have on my database?

I've been starting to work with extended statistics to help the optimizer find the best plan. This is a great feature that is outlined by @sqlmaria (Maria Colgan) here.
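
Creating them is a one-liner with DBMS_STATS.  A minimal sketch, using a hypothetical owner, table, and column group chosen to match the sample output further down:

-- returns the system-generated name of the extension (a hidden virtual column)
select dbms_stats.create_extended_stats('BGRENN','TAB_SCHR_PERD','(COL1,COL2)') from dual;

-- gather stats so the new column group actually gets statistics
exec dbms_stats.gather_table_stats('BGRENN','TAB_SCHR_PERD', method_opt=>'for all columns size auto')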

But once you create extended statistics, how do you know what is there?  I wrote this query to find out what function based indexes and what extended statistics exist, and what their definitions are.

Here is my script.

column table_owner alias "owner" format a15
column table_name alias "Table Name" format a30
column function_index alias "F Index" format a8
column Index_name alias "Index Name" format a30
column data_default alias "Definition" format a50
set pagesize 1000
select table_owner,
table_name,
nvl2(index_name,'YES','NO') function_index,
index_name,
data_default
from
(
select owner table_owner,table_name,
(select distinct index_name from dba_ind_columns b where a.column_name=b.column_name and a.owner=b.index_owner and a.table_name=b.table_name) index_name
,data_default
-- , DBMS_LOB.SUBSTR( to_lob(data_default),100,1)
from dba_tab_cols a
where virtual_column='YES' and hidden_column='YES' and (owner not in ('SYS','WMSYS','XDB','SYSMAN','MDSYS','EXFSYS','PR_MDS') and owner not like 'APEX_%')
)
order by table_owner,table_name;


and this is what the output looks like..

TABLE_OWNER     TABLE_NAME                     FUNCTION INDEX_NAME                     DATA_DEFINITION
--------------- ------------------------------ -------- ------------------------------ --------------------------------------------------
BGRENN          TAB_SCHR_PERD                  NO                                      COALESCE("COL1","COL2")
BGRENN          TAB2                           YES      IDX_TAB2                       "COL1"||' '||"COL2"
BGRENN          TAB3                           NO                                      COALESCE("COL1","COL2")
BGRENN          TAB4                           YES      IDX_TAB4                       COALESCE("COL1","COL2",0)
BGRENN          TAB4                           YES      IDX_TAB4                       COALESCE("COL3",0)
BGRENN          TAB5                           YES      IDX_TAB5                       COALESCE("COL1","COL2",0)
BGRENN          TAB6                           YES      IDX_TAB6                       NVL("COL1",(-1))
BGRENN          TAB6                           YES      IDX_TAB6                       NVL("COL2",(-1))
BGRENN          TAB6                           YES      IDX_TAB6                       NVL("COL3",(-1))
BGRENN          TAB6                           YES      IDX_TAB6                       NVL("COL4",'x')
BGRENN          TAB7                           YES      IDX_COMPOSITE                  "COL1"
BGRENN          TAB7                           YES      IDX_COMPOSITE                  "COL3"

Notice the FUNCTION column.  It is "YES" or "NO" depending on whether this is a function-based index or just extended statistics.

This should help tell where your extended statistics are in your database.

Exadata sizing updated for 3tb drives 1/2 rack SATA

OK, I now have a new Exadata coming in that has 3tb drives, and the first question asked is: how much disk do I have to configure on it?  I'm going to expand on a previous entry I did on sizing.

1/2 Rack. Sata drives. normal redundancy

This means we have
  • 7 storage cells
  • Each storage cell contains 12 disks
  • each disk is 3tb (which is about 2.794 tb usable)  *** This is calculated using base 1024 
  • The first 2 disks in each storage cell has 29.103g already partitioned for the OS (which is mirrored).
  • The matching ~29g slice on the remaining 10 disks in each cell is used for DBFS
Given this, I am going to calculate the total disk available and then subtract out the 29g slices (for the OS and DBFS).

First, 12 disks * 7 cells * 2.794tb = 234.696 tb of total raw storage.
Subtract out 29g * 2 disks * 7 cells = 406g      ----- OS
Subtract out 29g * 10 disks * 7 cells = 2.03tb   ----- DBFS
Available raw is 234.696 - 2.436 = 232.26 tb

Now, I said we were running normal redundancy.  This means that we lose half:

DBFS = 1.015tb
OS = 29g
Remaining for Data and Reco = 116.13 tb

But of course we need to account for a cell being offline.  This takes out 1/7 of the storage:

DBFS            ===  .870 tb   (29g * 10 disks * 6 cells) / 2
Everything else ===  99.54 tb  (2.765tb * 12 disks * 6 cells) / 2

So now we have 99.54 tb of mirrored storage available for Data and Reco.

This is now easy to figure out.  You really have about 100tb of usable storage (with normal redundancy, allowing for one cell offline) to split up between Data and Reco.

Now a full rack is easy to do:

(2.765tb * 12 disks * 13 cells) / 2 = 215.67 tb
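
If you want to play with the numbers for other configurations, the arithmetic drops easily into a quick query.  A sketch, reusing the 2.765tb usable-per-disk figure and the one-cell-offline assumption from above:

select round( (2.765 * 12 * 6)  / 2, 2) as half_rack_usable_tb,   -- 1/2 rack, 6 surviving cells
       round( (2.765 * 12 * 13) / 2, 2) as full_rack_usable_tb    -- full rack, 13 surviving cells
from dual;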

Recognize the magic optimizer numbers

Well, I figured I would document some of the magic numbers that the optimizer uses, to help me remember them and to help others.  The background of this is simple.

I was looking through a query that was running for a long, long time, and the cardinality looked wrong.  I knew the developers were using a table operation (looping over a LOB that was treated like a table).

The cardinality estimate for the step was 8168, and I thought, hmmm, I've seen that before when dynamic sampling didn't happen.  After some digging I came across this page: Cardinality

The page contained the handy chart shown further down.  These are important numbers to remember, because when you see a cardinality matching this chart it is probably because the optimizer couldn't estimate the correct cardinality and couldn't dynamically sample.  Immediately below is a snippet from the plan of the query I was investigating; notice the cardinality on the first line.


0 0 0 COLLECTION ITERATOR PICKLER FETCH PARSE_DYNAMIC_COLS (cr=0 pr=0 pw=0 time=0 us cost=29 size=16336 card=8168)
0 0 0 HASH JOIN RIGHT OUTER (cr=0 pr=0 pw=0 time=0 us cost=8757 size=233200 card=100)
0 0 0 VIEW (cr=0 pr=0 pw=0 time=0 us cost=8614 size=14 card=1)
0 0 0 HASH UNIQUE (cr=0 pr=0 pw=0 time=0 us cost=8614 size=2069 card=1)
0 0 0 FILTER (cr=0 pr=0 pw=0 time=0 us)
0 0 0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us)
0 0 0 NESTED LOOPS (cr=0 pr=0 pw=0 time=0 us cost=8613 size=2069 card=1)
0 0 0 HASH JOIN (cr=0 pr=0 pw=0 time=0 us cost=8612 size=2044 card=1)



Default cardinality for database objects

The following table shows the estimated cardinalities (using an 8K blocksize) of various objects which have had no statistics generated for them:

Object Type                                      Estimated Cardinality
-----------------------------------------------  ---------------------
Heap Table                                       82
Global Temporary Table                           8168
Index-Organized Table                            1
System Generated Materialized View               8168
(such as the output of the TABLE operator)
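
When you hit this with a TABLE() collection (like the pickler fetch above), one common workaround is simply to tell the optimizer how many rows to expect.  A hedged sketch reusing the function name from the plan above (the :doc bind and call signature are assumptions, and the CARDINALITY hint, while widely used, is undocumented):

-- tell the optimizer to expect roughly 100 rows instead of the 8168 default
select /*+ cardinality(t 100) */ *
from   table(PARSE_DYNAMIC_COLS(:doc)) t;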