Being always up and running is the goal of every IT service. Whether it is enterprise applications, IT applications, cloud services, or supporting data center services, business resilience depends on how quickly mission-critical services are restored after a disruption. That disruption could be a single server failure affecting one application, network downtime affecting a set of systems at one location, or an entire site outage caused by a major crisis. Resiliency is a crucial element of an organization’s business continuity (BC) and disaster recovery (DR) plan. The plan sets out preparedness, protection, response, and recovery objectives so that disrupted services can be made operational again quickly and effectively, with minimal data loss and user impact.
Data storage plays a pivotal role in ensuring application availability and performance. Storage-related downtime has a detrimental impact on application access and, in turn, on business continuity. Let’s look at the importance and applicability of three principal metrics. In the storage world they matter at least as much as they do anywhere else in IT:
- Recovery point objective (RPO)
- Recovery time objective (RTO)
- Recovery time actual (RTA)
The adage ‘less is more’ could not be more apt than for this trifecta. Each of these metrics is measured in units of time. The shorter they are, the more effectively an organization can respond to storage failures and resume business services. Ideally, all three would be zero. In practice, the IT team's goal is to get them as close to zero as possible. Having the right data backup and recovery practices in place is instrumental to achieving shorter recovery times.
RPO: Recovery Point Objective
Consider a site-level incident in which the entire data storage is down, affecting many applications. Here, RPO can be understood as the window of data loss the applications suffer. The window runs back from the time of the incident to the point at which the last known good copy of the data is available for recovery. In other words, RPO is a service level objective, or a measure of loss tolerance: what period of data loss can the organization realistically accept when a storage failure cuts off data access? For example, suppose the business continuity plan defines an RPO of 12 hours and the last available backup before the outage was taken 9 hours earlier. In that case the RPO threshold has not been violated.
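The RPO check in the example above comes down to comparing two durations. The sketch below is illustrative only. The function name `rpo_met` and the incident timestamp are hypothetical, and it assumes the data-loss window is simply the time between the last good backup and the incident.

```python
from datetime import datetime, timedelta

def rpo_met(incident_time: datetime, last_good_backup: datetime, rpo: timedelta) -> bool:
    """Return True if the data-loss window stays within the RPO (hypothetical helper)."""
    # Data-loss window: everything written after the last good backup is lost.
    data_loss_window = incident_time - last_good_backup
    return data_loss_window <= rpo

# Scenario from the text: 12-hour RPO, last good backup taken 9 hours before the outage.
incident = datetime(2024, 1, 15, 18, 0)  # hypothetical incident time
last_backup = incident - timedelta(hours=9)
print(rpo_met(incident, last_backup, timedelta(hours=12)))  # True
```

If the last good backup had been 13 hours old, the same call would return `False`, meaning the RPO was violated.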
RTO: Recovery Time Objective
RTO is another service level objective. It sets the target the IT team works to when bringing the service back into operation. RTO denotes the period of time, counted from the moment of disruption (in our case, a storage issue), within which the organization expects the affected service to be restored. For example, in a high availability scenario, the RTO for a small incident such as a disk failure might be set at 5 minutes, where recovery requires a mirror copy to be made active. A disaster recovery scenario is different. The primary site and DR site may be separated by a long distance, and terabytes of backup data must be made available at the DR site (typically through remote replication). Many connections must also be reconfigured and services restarted, so the RTO can be many hours or even days.
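To see why moving terabytes of data can push RTO into hours, here is a rough back-of-the-envelope sketch. It estimates only the data-transfer part of a DR recovery. It assumes decimal terabytes and a sustained link throughput. The `efficiency` factor, the data sizes, and the link speeds are all hypothetical. Reconfiguring connections and restarting services would add further time on top of this.

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate hours to move data_tb terabytes over a link_gbps link (hypothetical helper)."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = data_tb * 1e12 * 8                    # decimal TB -> bits
    effective_bps = link_gbps * 1e9 * efficiency  # usable throughput in bits per second
    return bits / effective_bps / 3600            # seconds -> hours

# Hypothetical figures: 10 TB over a fully used 1 Gbps link,
# then over a 10 Gbps link running at 70% efficiency.
print(round(transfer_hours(10, 1), 1))        # 22.2
print(round(transfer_hours(10, 10, 0.7), 1))  # 3.2
```

Even under these idealized assumptions, copying 10 TB over 1 Gbps takes almost a full day.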
RTA: Recovery Time Actual
RTA refers to the time that actually elapses before the data recovery is complete and the storage copy is available for application access. RTO is the target value; RTA is the actual time measured against it. For good data governance and compliance, the RTA achieved must not exceed the RTO set in the BC/DR plan. Some IT teams simulate a DR-like scenario in a test environment, parallel to and independent of production, and measure RTA to examine how effective their backup and recovery tools are. If the measured RTA significantly exceeds the target RTO, you may need to revisit your failover strategy so that the switch from source to target happens faster.
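The DR test described above can be sketched as timing a recovery procedure and comparing the result with the RTO. This is a minimal illustration, not a recovery tool. The helpers `measure_rta`, `meets_rto`, and `simulated_failover` are hypothetical, the recovery step is a sleep stub standing in for real failover work, and the 5-minute RTO is an assumed value.

```python
import time
from typing import Callable

def measure_rta(recover: Callable[[], None]) -> float:
    """Run a recovery procedure and return the elapsed time in seconds (the RTA)."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    recover()
    return time.monotonic() - start

def meets_rto(rta_s: float, rto_s: float) -> bool:
    """The achieved RTA must not exceed the RTO set in the BC/DR plan."""
    return rta_s <= rto_s

def simulated_failover() -> None:
    # Hypothetical stand-in for a test-environment recovery: restore the
    # storage copy, reconfigure connections, restart services.
    time.sleep(0.2)

rta = measure_rta(simulated_failover)
rto = 300.0  # hypothetical 5-minute RTO, in seconds
if meets_rto(rta, rto):
    print(f"RTA {rta:.1f}s is within the {rto:.0f}s RTO (margin {rto - rta:.1f}s)")
else:
    print(f"RTA {rta:.1f}s exceeds the {rto:.0f}s RTO by {rta - rto:.1f}s; revisit the failover strategy")
```

In a real test, `simulated_failover` would be replaced by the actual recovery runbook steps. Repeating the run several times gives a better sense of how much the RTA varies.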
Download this white paper to dive deeper into this storage trifecta that impacts business resiliency and understand how to plan your recovery objectives. Learn about the three lines of defense that DataCore SANsymphony software-defined storage provides to help achieve your business continuity and disaster recovery goals and improve RPO and RTO.