ORACLE OFFERS SEVERAL APPROACHES TO HIGH AVAILABILITY
By Dan Norris
To remain competitive in the global economy, businesses need uninterrupted access to mission-critical data. Decision-makers must have reliable access to timely data so that they can react to changing circumstances. Because of the importance of the applications the database supports, Oracle database administrators have developed several techniques to improve its availability. Each technique targets a certain availability level and, accordingly, carries its own cost, implementation time and complexity. Logically, techniques that yield higher availability tend to be more complex, take longer to implement and cost more.
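To make the idea of an "availability level" concrete, the short sketch below converts an availability percentage into the downtime it allows per year. The function name and the sample targets are illustrative assumptions, not figures from Oracle or from this article.

```python
# Illustrative only: convert an availability target into allowed downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime (in minutes) allowed by an availability target."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical targets chosen only to show how quickly the budget shrinks.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which helps explain why the more available options tend to cost more and take longer to implement.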
Oracle has created the Maximum Availability Architecture (MAA), which provides practical advice on how some of the options discussed here can be combined. Anyone attempting to increase availability for their Oracle environment should review the Oracle MAA at www.oracle.com.
The first few options to increase database availability are basic and common in most database environments today. Logical backups are the most basic availability tool, as they provide some way to recover data. The primary tools for creating logical backups are Oracle Export/Import and (in 10g and newer releases) Data Pump. The two are not compatible with one another and are architected completely differently, but they meet similar goals. Each toolset has a simple interface, along with advanced features that allow for additional options and flexibility.
Export/Import is very easy to use and has a long history of stability and cross-platform, cross-version compatibility that supports its continued popularity. Data Pump was introduced in 10g and will likely become the new standard for logical backup. It offers a new architecture, much faster operations, and a more flexible set of options for conducting export and import activities. Logical backups are not generally considered an availability option because they provide only the most basic recovery in the event of a failure.
Physical backups are commonly created using Oracle Recovery Manager (RMAN). Physical backups provide greater availability since data loss can be minimized by applying archived redo during recovery. With less data loss, the system can be restored to a usable state much more quickly, and that results in higher availability. On the other hand, physical backups often take a toll on database server resources in terms of time or I/O overhead, so they are usually conducted during low-activity periods. These types of backups are commonly considered essential to any database environment and serve as the primary backups used during a failure scenario.
The next option commonly used for increasing availability is Oracle Data Guard. Data Guard is a database option that involves a primary database that sends copies of all transactional data to another database (called a standby database) for recovery. The standby database (commonly located on a server in a separate geographic location) remains in constant recovery mode while receiving and applying all transactions from the primary database. When a failure occurs on the primary database, the standby database can be activated and assume the primary role. Users or applications must be directed to connect to the standby database site, which can sometimes be problematic. Most Data Guard sites use a manual failover process which requires human intervention to activate the standby site. Data Guard is commonly used as an option for disaster recovery instead of high availability.
Another multi-database option for increased availability is replication. There are two options for database replication in current releases of Oracle Database. The older, more mature option is Advanced Replication. Oracle Database 9i Release 2 introduced a new replication technology called Oracle Streams. The two choices do not share much in terms of their architecture or methods for replicating data, but they both result in data being copied from one database to another. Oracle Streams is usually faster at replicating changes, but Advanced Replication is generally more configurable and stable.
To use replication as an availability option, a DBA would need to ensure that users or applications were configured to connect to the correct database. In a failure event, the connections would need to be shifted to the secondary site. To avoid replication conflicts, most replication environments used for high availability only allow users to connect to one site at a time. Replication is not commonly used as an availability solution, but it does provide the necessary infrastructure to increase overall availability of the data.
One of the most common solutions for Oracle Database availability is a failover cluster. This solution relies on two or more systems (cluster members) that share storage and a private network called an interconnect to allow the cluster members to communicate about their health and status. One or more Oracle Databases may be supported in a single cluster, but the Oracle Database instance only runs on one node in the cluster at any time. Should that node fail, the instance and its dependencies would be automatically migrated to a healthy cluster member quickly and without any required interaction from the administrators.
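The active/passive behavior described above can be pictured with a toy model. The sketch below is only a simplified illustration: the class and method names are hypothetical, and it does not represent how any particular cluster software or Oracle itself implements failover.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class FailoverCluster:
    """Toy model: one database instance that runs on exactly one node at a time."""

    members: List[str]
    healthy: Set[str] = field(init=False)
    active: Optional[str] = field(init=False)

    def __post_init__(self) -> None:
        if not self.members:
            raise ValueError("a cluster needs at least one member")
        self.healthy = set(self.members)
        self.active = self.members[0]  # the instance starts on the first node

    def report_failure(self, node: str) -> None:
        """Mark a node unhealthy and relocate the instance if it was running there."""
        self.healthy.discard(node)
        if node == self.active:
            self.active = self._next_healthy()

    def _next_healthy(self) -> Optional[str]:
        # Pick the first healthy member in configured order; None if none remain.
        for member in self.members:
            if member in self.healthy:
                return member
        return None


if __name__ == "__main__":
    cluster = FailoverCluster(["node1", "node2"])
    cluster.report_failure("node1")
    print("instance now runs on:", cluster.active)
```

In a real cluster, the interconnect carries the health and status messages that would trigger something like `report_failure`, and the shared storage is what lets the surviving node open the same database.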
Typically, a "failover" (the action of migrating to another node) happens within a few minutes. Users would simply need to reconnect in order to continue their work. Any uncommitted transactions in flight at the time of the failure will need to be resubmitted once the instance is available again. Usually, applications can be adapted to handle the common failover-related exceptions properly. Due to its quick, unattended handling of a failure, the failover cluster is commonly considered the first option providing true "high" availability for Oracle Database.
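One common way to adapt an application to failover is to retry the connection with an increasing delay until the instance is available on another node. The following is a minimal sketch under stated assumptions: `connect_with_retry` and its parameters are hypothetical, and Python's built-in `ConnectionError` stands in for whatever exception a real database driver raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts are exhausted.

    ConnectionError is an assumption standing in for a driver-specific error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up and let the caller handle the failure
            sleep(delay)  # wait while the instance restarts on another node
            delay *= backoff  # back off so retries do not flood the cluster
    raise AssertionError("unreachable")
```

A caller would pass in a function that opens a session with its own driver. Once reconnected, the application is still responsible for resubmitting any work that had not been committed before the failure.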
Finally, the option providing the highest availability for Oracle Database is the Oracle Real Application Clusters (RAC) database option. RAC has a configuration similar to failover clusters, but multiple instances access a single database simultaneously. Users can access the database through any of the instances in the cluster and, since all instances are running all the time, failover events can be very fast, often less than three seconds. Often, users may not even know a failover has occurred. RAC adds some complexity and cost to the availability equation, but recent releases have increased manageability significantly with new features and enhanced tools.
The most critical high-availability requirements often combine more than one of the options examined here to create an environment that provides supreme availability coupled with solid disaster recovery and multiple options for media recovery. The most basic options may not provide the highest availability, but should not be omitted in any environment due to their important role providing a recovery option when all else fails.
Dan Norris is technology services practice manager at IT Convergence, an IT consulting company specializing in Oracle DBA and Unix/Linux administration. He routinely works with clients from almost every industry. Reach him at dnorris@itconvergence.com.