An Oracle database can almost never get enough IOPS or low enough latency, so it is important to address these factors when designing the storage for an Oracle database. However, providing high-throughput storage in large heterogeneous data centres poses some challenges. This article looks at how to maximize throughput using Symantec Storage Foundation in large shared storage area networks (SANs).
To achieve the best possible performance for an Oracle database, many factors need to be addressed, among them computing power (such as processor speed), memory bandwidth, network bandwidth, and storage. This article focuses on the storage.
When designing storage for an Oracle database, it is important to understand the type of load the storage must be able to meet. There are many types of load, and no two applications have the same load pattern, but we can differentiate at least two typical loads: online transaction processing (OLTP) and decision support systems (DSS).
Typically, OLTP systems rely on indexes to serve many concurrent SQL queries quickly. The primary goal is therefore to place the indexes on the fastest possible storage, and it is important to know how many indexes, and of what size, the storage must accommodate.
DSS systems typically handle large quantities of data, and the SQL queries are often dynamic and not known in advance. Indexes therefore matter less, as many SQL queries end up doing sequential table scans, so DSS storage is typically configured for long-running scans.
So we effectively have two completely different requirements: OLTP needs a small quantity of very fast storage for the indexes, with slower storage for the actual data, whereas DSS typically only needs a large quantity of slower-performing storage.
However, in both cases Symantec Storage Foundation will increase the aggregated throughput through optimized I/O policies over multiple paths (VxDMP), an extent-based file system (VxFS), support for Oracle Disk Manager (ODM), and caching (SmartIO and memory cache).
Another factor with significant impact on general performance is Oracle’s requirement of atomicity, consistency, isolation, and durability (ACID). This means that each committed transaction must be acknowledged through the whole chain: DB writer, ODM, file system, HBA, and disk storage system. ACID imposes a significant performance penalty in many cases, primarily on write operations, as caches cannot simply absorb the writes.
Storage management consists of many components, such as disk storage systems, the SAN infrastructure, OS disk devices used for either raw or formatted access, and file systems.
Today a typical enterprise disk storage system consists of multiple disks with different performance characteristics. We usually categorize them into tiers:
|Tier 0||Fastest disks available. These could be SSDs in shared storage systems or local SSDs in the host systems. This tier can also include flash memory cards connected directly to the I/O bus of the host systems.|
|Tier 1||Fastest spinning disks available. Today these would typically be SAS 15K or FC 15K disks.|
|Tier 2||Fast spinning disks; typically SAS 10K or FC 10K disks.|
|Tier 3||Nearline disks, typically SATA 7.2K disks. Used primarily for large data quantities, such as disk-based backups or archived data.|
Table 1 – Tier levels
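To make the tier differences concrete, the table above can be turned into a rough capacity-planning sketch. The per-device IOPS figures below are illustrative assumptions, not vendor specifications:

```python
# Rough aggregate random-IOPS estimate per tier. The per-device figures
# are illustrative assumptions only; consult vendor data sheets for
# real planning numbers.
TYPICAL_IOPS = {
    "tier0_ssd": 20000,   # assumed per SSD/flash device
    "tier1_15k": 180,     # assumed per 15K SAS/FC spindle
    "tier2_10k": 140,     # assumed per 10K SAS/FC spindle
    "tier3_7k2": 80,      # assumed per 7.2K SATA spindle
}

def aggregate_iops(tier: str, spindles: int) -> int:
    """Naive estimate: per-device IOPS scaled by spindle count."""
    return TYPICAL_IOPS[tier] * spindles

# 16 x 15K spindles give roughly the random IOPS of a single SSD / 7.
print(aggregate_iops("tier1_15k", 16))  # 2880
print(aggregate_iops("tier0_ssd", 1))   # 20000
```

The gap between tier 0 and the spinning tiers is the reason index placement dominates OLTP design later in this article.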
Many disk storage systems use tier 2 disks instead of tier 1, combining them with a global cache in the storage controllers. Latency will be higher than with tier 1 disks, as not all data can be cached, but in many cases the lower price of tier 2 wins out over tier 1.
The combination of tier 0 with tier 1 or 2 is gaining ground, as the aggregated throughput reaches an acceptable level when the provisioned storage type is carefully matched to the Oracle databases. This is where Symantec Storage Foundation really shows its benefits, as it simplifies the storage management.
For automatic tiering on the host, Symantec Storage Foundation provides two technologies: SmartTier and SmartIO.
SmartTier is a VxFS feature that enables you to allocate file storage space from different storage tiers. It leverages two technologies: multi-volume file systems and automatic policy-based placement of files. A multi-volume file system occupies two or more virtual storage volumes of different tiers and thereby enables a single file system to span across multiple, possibly heterogeneous, physical storage devices.
SmartIO is a new feature that enables efficient use of SSDs or PCIe flash cards through data caching. SmartIO improves I/O performance by reducing the number of I/O operations to the back-end storage, and it supports both read and write caching for VxFS file systems. SmartIO also includes a database I/O load template framework that can be used to further improve the performance of an Oracle database.
Another important factor is the type of RAID used to protect the data. RAID provides redundancy if a disk should fail. There are many RAID levels and RAID combinations; the levels relevant for Oracle are listed below:
|RAID0||Striped block layout with no redundancy. The data blocks are distributed over multiple disks in chunks. RAID0 is used to achieve better performance, as more disk spindles are in use.|
|RAID1||Mirrored block layout. All data blocks are mirrored to one or more disks for redundancy. RAID1 adds some overhead for write operations, as at least two copies of each block are written. Read operations can, however, perform better, as some storage systems can load-balance reads across the disks in the mirror.|
|RAID0+1||Combination of striping and mirroring. Write operations are first striped over multiple disks and then mirrored to the other set of striped disks. Reconstruction is more difficult if a disk fails.|
|RAID1+0||Combination of mirroring and striping. Write operations are first mirrored, then striped. RAID1+0 is easier to reconstruct and is usually the preferred combination, as it gives good redundancy, better performance, and simpler rebuilds.|
|RAID5||Block-level striping with a distributed parity block. Provides redundancy for a single drive failure and good read performance, but write operations incur higher latency due to the parity update.|
|RAID50||RAID0 striped over two or more RAID5 groups. Used to combine multiple RAID5 groups into large logical volumes.|
|RAID6||Block-level striping with two distributed parity blocks. Provides redundancy for dual drive failures. Slower write performance than RAID5 due to the second parity calculation.|
|RAID60||RAID0 striped over two or more RAID6 groups. Used to combine multiple RAID6 groups into large logical volumes.|
Table 2 – RAID levels
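The trade-offs in Table 2 can be sketched as a small calculator. The write-penalty figures are the common rules of thumb (physical disk writes per logical write); real arrays with write-back caches behave considerably better:

```python
# Usable capacity and write penalty for the RAID levels in Table 2.
# Write penalties are textbook rules of thumb, not measurements.
def raid_profile(level: str, disks: int, disk_size_tb: float):
    """Return (usable capacity in TB, physical writes per logical write)."""
    profiles = {
        "raid0":  (disks,     1),  # no redundancy, no overhead
        "raid1":  (disks / 2, 2),  # two copies of every block
        "raid10": (disks / 2, 2),  # mirror pairs, then stripe
        "raid5":  (disks - 1, 4),  # read+write data and parity
        "raid6":  (disks - 2, 6),  # two parity blocks to update
    }
    data_disks, write_penalty = profiles[level]
    return data_disks * disk_size_tb, write_penalty

usable, penalty = raid_profile("raid6", disks=10, disk_size_tb=4)
print(usable, penalty)  # 32 TB usable, write penalty 6
```

This is why RAID1+0 is preferred for redo logs and undo (penalty 2) while RAID6 is acceptable for read-mostly data, where the write penalty rarely matters.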
Today, most storage vendors recommend RAID60, as very large pools of RAID6 groups can be combined, enabling the provisioning of very large logical volumes to the hosts. Many enterprise storage vendors also support automatic storage tiering pools, meaning the most frequently accessed data is automatically moved from slower tiers to the fastest available.
Once the logical disks have been provisioned to the host, they show up as disk devices in the OS layer. If multi-pathing is used, the same disks show up multiple times, leading to more complex management.
To handle this, Symantec Storage Foundation provides the dynamic multi-pathing option (VxDMP), which supports many disk storage systems currently on the market. By providing support for specific systems, VxDMP optimizes I/O access according to the storage vendor's recommendations.
Once the disks have been claimed by VxDMP, the logical DMP disk devices can be put into disk groups in Veritas Volume Manager (VxVM). If the disk system does not provide RAID protection, VxVM can provide host-based RAID 0, 1, and 5. However, most storage systems are already configured with RAID, so the most common use is to concatenate multiple disks (concat layout) in order to build larger logical volumes. With VxVM, file systems can be extended on the fly by simply adding a disk to the VxVM disk group, so database downtime can be avoided.
Oracle Automatic Storage Management
Automatic Storage Management (ASM) is Oracle’s implementation of a volume manager and file system. ASM supports both single-instance databases and Oracle Real Application Clusters (RAC).
ASM works directly on raw devices and therefore removes the need for third-party file systems, as ASM provides an internal Oracle file system distributed across all the disks in the ASM disk groups.
Symantec Storage Foundation fully supports ASM, as ASM can use VxVM managed volumes, and thus the Oracle databases can achieve better performance as VxVM and VxDMP optimizes the throughput.
Although Oracle promotes the use of ASM, there are still OS-related issues, such as disk track alignment and limited flexibility once in production. No two disk systems have the same track alignment offset, and manually trying to find the offset tends to be difficult. With VxVM, track alignment is managed automatically: VxVM uses its database of supported disk systems to select the appropriate alignment.
VxFS file system
Symantec Storage Foundation provides an extent-based file system (VxFS) which is highly optimized for many types of I/O load.
With its integration with Oracle Disk Manager, Oracle I/O throughput can be greatly improved. VxFS also provides caching, which can be combined with ODM.
The benefits of using Oracle Disk Manager are many: asynchronous I/O, an efficient file system layout built from contiguous files, no double buffering, and direct I/O support, to mention a few. By using ODM and VxFS, ACID compliance can be assured even though the I/O itself is asynchronous, which leads to much better performance.
Putting it all together…
So with that bit of technical background, we are ready to look at the best possible designs for Oracle databases.
As we know, OLTP systems require very fast access to the indexes, whereas the actual data can reside on slower tiers. In this case we would like to use tier 0 disks for indexes and tier 1 for control files, redo logs, and rollback/undo data files. The general recommendation is to use the fastest RAID level, and here RAID1+0 is most likely the best fit.
Local SSD disks or flash cards used only as a SmartIO read cache do not necessarily require RAID protection, as the back-end data on disk will be used if the local cache fails. However, if we use SSD disks without SmartIO, building RAID1+0 is highly recommended.
So far the decisions have been fairly simple. But when looking at data files, temporary files, and archived redo log files, it gets more complex. If we use SmartTier or the disk storage system's own automatic storage tiering, the block locations are managed automatically. But if we want more static control, we need to choose the tier level, while still keeping the RAID levels in mind.
|Type of data||Tier/RAID level|
|Control files, redo logs, rollback/undo logs||Tier 1 with RAID1+0|
|Indexes||Tier 0 with RAID0|
|Data files (active partitions)||Tier 1 with RAID1+0|
|Temporary files||Tier 1 with RAID1+0|
|Data files (readonly/archived partitions)||Tier 2-3 with RAID6|
Table 3 – OLTP preferred tier and RAID levels
If we can control which RAID level is used in the shared disk storage system, we can use disks from RAID1+0 disk pools. More typically, though, this is not the case: the host is one of many hosts, and the storage system typically uses RAID50 or RAID60. In the latter case we can usually only choose the tier level, so tier 1 is preferred if possible. This gives us low latency, although performance is worse than with RAID1+0.
If the OLTP database holds a lot of static data that is not accessed very often, then tier 2 or even tier 3 can become relevant in the design as well, as the lower throughput and higher latency can most likely be accepted.
As mentioned previously, DSS databases typically do not benefit as much from fast indexes as an OLTP database would, and we expect long sequential table scans. Although we can generally use slower tiers, we should use tier 1 for control files, redo logs, indexes, and rollback/undo data files. The general recommendation is to use the fastest RAID level, and here RAID1+0 is most likely the best choice. The indexes do not have to be on tier 0 storage.
|Type of data||Tier/RAID level|
|Control files, redo logs, rollback/undo logs||Tier 1 with RAID1+0|
|Indexes||Tier 1 with RAID1+0|
|Data files (active partitions)||Tier 1 with RAID6|
|Temporary files||Tier 1 with RAID6|
|Data files (read-only partitions)||Tier 2 with RAID6|
|Data files (archived partitions)||Tier 3 with RAID6|
Table 4 – DSS preferred tier and RAID levels
As DSS databases are typically also much larger, we need to be more relaxed about the tier and RAID levels. As a rule of thumb: active partitions on tier 1 with RAID6, and read-only or archived partitions on tier 2 and tier 3 respectively, using RAID6.
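The placement rules from Tables 3 and 4 can be expressed as a simple lookup, which is handy when scripting provisioning decisions. The keys and labels here are my own shorthand, not Storage Foundation terminology:

```python
# Tables 3 and 4 as a lookup: (workload, data type) -> (tier, RAID level).
# Labels are illustrative shorthand for the categories in the tables.
PLACEMENT = {
    ("oltp", "control/redo/undo"): ("tier1",   "raid10"),
    ("oltp", "indexes"):           ("tier0",   "raid0"),
    ("oltp", "data_active"):       ("tier1",   "raid10"),
    ("oltp", "temp"):              ("tier1",   "raid10"),
    ("oltp", "data_readonly"):     ("tier2-3", "raid6"),
    ("dss",  "control/redo/undo"): ("tier1",   "raid10"),
    ("dss",  "indexes"):           ("tier1",   "raid10"),
    ("dss",  "data_active"):       ("tier1",   "raid6"),
    ("dss",  "temp"):              ("tier1",   "raid6"),
    ("dss",  "data_readonly"):     ("tier2",   "raid6"),
    ("dss",  "data_archived"):     ("tier3",   "raid6"),
}

def placement(workload: str, data_type: str):
    """Preferred (tier, RAID) for a given workload and data type."""
    return PLACEMENT[(workload, data_type)]

print(placement("dss", "indexes"))  # ('tier1', 'raid10')
```

The two workloads differ mainly in where the indexes live and in the RAID level tolerated for active data.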
ASM or VxFS
Choosing ASM over VxFS is typically a policy decision rather than a performance decision. Oracle naturally recommends ASM, and it can be managed with Oracle tools and Oracle Enterprise Manager. VxFS fully supports ODM and provides much better throughput, as shown below. Symantec Storage Foundation also supports the use of Oracle Enterprise Manager for visibility and reporting on storage utilization.
The perfect design
Our best design is thus a combination of Oracle Database and Symantec Storage Foundation, where ODM uses the VxFS plugin to optimize throughput, VxVM handles disk management, and VxDMP handles multipathing, with optional use of SmartIO.
Tied together with the preferred storage tiers and RAID protection, these components raise throughput and lower latency, which leads to much faster transaction handling in the Oracle database instance.
So does Storage Foundation actually make a difference for your Oracle database? The short answer is yes. Comparing the alternatives, we can clearly see that Storage Foundation adds value, as general performance is several times better than with Oracle ASM and the native OS I/O kernel modules.
The performance tests were conducted on an Intel-based HP DL380 G8 server at 3GHz with 512GB memory, running Red Hat Enterprise Linux 6.3 and Oracle 11gR2. The underlying storage is provisioned as logical LUNs from tier 2 disks distributed over many RAID6 groups. These tests therefore represent a DSS database rather than a typical OLTP one, although in many cases OLTP would also be using RAID6. The local flash cache is a PCIe XtremSF card from EMC.
There are four types of tests:
- Oracle ASM with native Linux multipathd
- Oracle ASM with VxDMP
- Oracle ODM with VxFS (implicit VxDMP)
- Oracle ODM with VxFS and SmartIO (implicit VxDMP)
The four types of tests use two I/O sizes: 8K (the normal Oracle page size) and 1M (big pages). All tests are read-only I/O operations.
To simulate Oracle ASM load, the ORION test tool from Oracle was used. To simulate Oracle-type load on file systems, the FIO tool was used, with ODM support enabled in VxFS.
Looking at the 8K results, we can already see the benefits of Storage Foundation: from ~27MB/s using Oracle ASM with Linux multipathing, we get almost three times the throughput with ASM on VxDMP. Moving to VxFS, we see a boost up to 364-392MB/s.
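Expressed as speedup factors, the quoted 8K numbers work out as follows. This quick sketch uses only the figures stated above (the ~27 MB/s ASM/multipathd baseline and the two VxFS-based results):

```python
# Speedup of the quoted 8K read throughput figures relative to the
# ~27 MB/s ASM + Linux multipathd baseline.
def speedup(mb_s: float, baseline_mb_s: float = 27.0) -> float:
    """Throughput ratio versus the baseline configuration."""
    return mb_s / baseline_mb_s

for mb_s in (364.0, 392.0):  # the two VxFS-based results quoted above
    print(f"{mb_s} MB/s -> {speedup(mb_s):.1f}x the baseline")
```

In other words, the VxFS configurations deliver well over an order of magnitude more 8K read throughput than the baseline in this test.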
The 1M results are more evenly distributed, except when using SmartIO. With SmartIO, we get more than three times better throughput as all the blocks are read from the local flash PCIe card.
Latency is very important, as Oracle requires ACID-committed operations: each read/write operation must be committed through the whole chain before the next operation is allowed. So obviously we want the lowest possible latencies, although with many disk spindles latency is seldom a problem.
Just as with throughput, we can see some gains from using Storage Foundation. However, as the measured latencies are so low, the difference is quite marginal, the exception being I/O from the PCIe flash. But even a difference as small as 250µs would most likely give better overall throughput.
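The effect of latency on serialized, ACID-committed I/O is easy to quantify: with one outstanding operation at a time, the ceiling on operations per second is simply the inverse of the latency. The 1 ms starting latency below is an assumed example, not a measurement from the tests:

```python
# Ceiling on strictly serialized I/O: one outstanding operation at a
# time means at most 1/latency operations per second. A 250 us saving
# therefore translates directly into more committed operations.
def max_serial_iops(latency_us: float) -> float:
    """Upper bound on ops/s with a single outstanding operation."""
    return 1_000_000 / latency_us

print(max_serial_iops(1000))        # 1 ms latency  -> 1000 ops/s
print(max_serial_iops(1000 - 250))  # 250 us saved  -> ~1333 ops/s
```

This is why even a marginal-looking latency improvement can matter for commit-bound workloads.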
Although we have only looked at accelerating an Oracle database through storage tuning, we can clearly see a major performance boost from using Storage Foundation with VxVM, VxFS, VxDMP, and SmartIO. The disks used in the tests are provisioned from a very large enterprise-class shared disk storage system, utilizing hundreds of disks for its disk pools and servicing I/O to several hundred hosts in a heterogeneous data centre.
By using Storage Foundation, we significantly improve throughput and latency figures. The main reason is that Storage Foundation is written to provide the best possible I/O performance, with multipathing I/O policies and modules for many storage vendors, leading to optimized performance. Storage Foundation also provides integration into Oracle Disk Manager for improved performance, as well as Oracle Enterprise Manager for visibility and reporting on storage utilization.
Furthermore, by using the SmartIO feature with local SSDs or PCIe flash, we can minimize the number of I/O operations that have to travel all the way to the shared storage systems, leading to higher throughput, lower latencies, and less reliance on the shared storage system.