Y2K Contingency Planning

 

  1. Overview
  2. How Year2000 Compares to Other Planning Events
    1. Year2000 is scheduled, Y2K failures are not.
    2. What will actually happen is a complete unknown.
      1. Potential complexity can be overwhelming
      2. Fundamentally, Year2000 is a software problem,
      3. Failures will occur across the end of the century.
      4. Year2000 may be incorrectly cited as the cause of a failure.
    3. Y2K - many small, isolated failures spread over an extended period rather than one big failure or event.
    4. Impact of Y2K will be distributed rather than focused at one location or area.
    5. Remediation of internal systems does not obviate the risk of Y2K problems.
    6. Problems could occur internally or with utilities, communications services or business partners.
    7. Year2000 problems will require different solutions than just switching to a duplicate system.
    8. Year2000 is a business issue rather than a technology/hardware issue
      1. Loss of public confidence about Year2000 issues may engender client actions that cause problems of their own.
        1. Call Center volumes could escalate.
        2. Unusual or excessive business activity could occur.
        3. There could be excessive or unusual requests for account statements.
        4. Automated information systems could experience excessive traffic.
        5. Banks could experience unusual cash demands.
      2. Inability of a business partner to process may have impact.
  3. Contingency Planning as a Business Imperative
    1. Consumers (and most businesses) normally assume that everything works.
    2. Failure is inherent in everything -- materials fatigue, people make errors, programs have bugs.
    3. The past is not necessarily prologue.
    4. New is not necessarily better.
    5. Due to the high cost of failure, some industries consider contingency planning a normal part of business.
  4. A General Approach to Contingency Planning
    1. An essential precursor -- understand how the business works.
      1. What are the core business processes?
      2. What is the chain of relationships between suppliers, process and consumers which enable the process to produce value?
      3. What are the flows of information, material and control within those processes?
      4. Know what is important -- both internally and externally.
    2. Starting Assumption - Focus on Problem Sources or Solutions?
    3. A simple scenario planning methodology
      1. Establish a contingency planning working group
      2. Enumerate key sources and services
        1. Utilities
          1. Electricity
          2. Water
          3. Natural Gas
          4. Sewer Services
          5. Trash Removal
        2. Services
          1. Street access
          2. Vehicle Fuel
          3. Public Transportation
          4. Local Telephone Service
          5. Long Distance Telephone Service
          6. Internet Access
            1. Local Access
            2. Web Servers
            3. Email Servers
            4. Internet Name Service (DNS)
            5. Wide area routing
          7. Mail Delivery
          8. Package Delivery
        3. Business Services
          1. Market Data
          2. Market Access for Trading
          3. Clearing and Settlement Services
          4. Trading partner EDI access
        4. Internal Services
          1. Telephone PBX
          2. Voice Mail
          3. EMail Services
          4. File Servers
          5. Application Hosts
          6. Database Servers
          7. Internal Networks
      3. Document business processes and services.
      4. Develop failure scenarios
      5. Some possible ways that a service could be affected:
        1. Unavailable
        2. Available intermittently
        3. Available but wrong
      6. Pruning Strategy - Selective Focus
      7. Develop A Business Response Portfolio
        1. Probable Event Horizon
        2. Scenario detection approach
        3. Response Validity Timeframes
        4. Fallback strategy if limits exceeded
        5. Plan for return to normalcy
        6. Test Strategy
        7. Do not ignore routine failures
      8. Identify and implement impact mitigation responses
      9. Develop Situation Management Strategies
        1. War Room
        2. Situation Management Team - Standing
        3. Situation Management Team - Event-Driven
        4. Business Support and Escalation Processes
        5. Issues to Consider
          1. Event Horizon
          2. Event Duration
          3. Multiplicity of Events
            1. Track Everything
            2. Assign Priorities or Relative Impacts
            3. Manage the Situation
            4. Afterwards, Review What Was Learned
          4. Impact to Communications and Dispatch Mechanisms
          5. Technical vs Business Problem Management
      10. And Hope for the Best

 


 

  1. Overview
    With the impending turn of the century and the discovery that some previously desirable technology shortcuts had undesirable side-effects, there has been considerable interest in both remediation and contingency planning. Remediation for Year2000 problems, the systematic examination of internal systems and correction of inherent date problems, should be just about complete. Development of business plans to address the greater than usual uncertainties over the next few months should be underway. These business plans, or contingency plans, detail at some level the actions to be taken should an abnormal event scenario begin to unfold. While the focus will likely be scenarios around the 'year2000' problem, it should be noted that contingency planning is in reality a normal part of the management process. As business processes become more interconnected as a result of e-commerce, risk management and the establishment of contingency plans will become widespread.
  2. How Year2000 Compares to Other Planning Events
    1. Year2000 is scheduled, Y2K failures are not.
      Unlike the calendar transition from 1999 to 2000, the event horizon for Year2000 failures is not a single, focused point but a broad band that stretches back many years and can be expected to extend well into the future. A Cap Gemini America executive survey covered by CNN on August 28, 199 ("Early Year 2000 glitches provide sneak preview") found that 40% of firms surveyed had already encountered Year2000 failures. Date-related problems that would now be classified as Y2K have been encountered for many years. For example, a number of Canadian brokerage firms encountered the as-yet-unnamed Y2K problem back in 1990, when a widely used bond calculator failed to evaluate 10-year bonds.
      The difficulty lies in the many different ways that a date problem could impact the function of a system. It would be very convenient if the system just shut down. But most likely the application would compute an invalid value, causing incorrect decisions to be recommended or (worse) passing bad data on to the next user. Total system failure is usually fairly obvious and easy to detect. Data corruption problems may not be easy to detect -- particularly with the widespread attitude that if an application executed then what it did must be correct (it was tested once, wasn't it?).
      The event horizon for a Y2K problem in an individual application or system will be specific to how the date is being used. If the date procedure is calculating future events such as when a product lot expires or the number of days interest paid before maturity then the failures should have occurred in the past. If the date procedure is evaluating past events such as the age of an outstanding invoice or the expiration of an interest-free grace period then the failures should occur in the future, sometime after the calendar transition. There is even the chance that a calculation will only fail for a while -- and heal itself when both dates it is working with are from the same century.
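The three timing patterns described above can be sketched in a few lines. This is a minimal illustration that assumes a system storing only the last two digits of the year and subtracting them directly; the function names and sample dates are hypothetical, and the actual failure mode of any particular application may differ.

```python
def years_between_2digit(start_yy: int, end_yy: int) -> int:
    """Elapsed years computed from two-digit years only (the space-saving shortcut)."""
    return end_yy - start_yy


def years_between_4digit(start_year: int, end_year: int) -> int:
    """Elapsed years computed from full four-digit years."""
    return end_year - start_year


# Hypothetical cases: one forward-looking, one backward-looking, one "self-healed".
cases = [
    ("bond issued 1990, maturing 2000", 1990, 2000),
    ("invoice issued 1999, aged in 2000", 1999, 2000),
    ("invoice issued 2000, aged in 2001", 2000, 2001),
]

for label, start, end in cases:
    short = years_between_2digit(start % 100, end % 100)
    full = years_between_4digit(start, end)
    status = "ok" if short == full else "WRONG"
    print(f"{label}: two-digit={short}, four-digit={full} -> {status}")
```

The forward-looking bond calculation goes wrong as early as 1990, the invoice aging goes wrong only after the calendar transition, and the last case comes out right again because both dates fall in the same century.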
    2. What will actually happen is a complete unknown.
      1. Potential complexity can be overwhelming
        A detailed examination of the webs of dependencies between and within businesses can be overwhelming. It is suggested that initial analysis and planning be performed at a high level and selectively expanded where appropriate. Bottom-up analysis from a technologist's perspective is likely to be less productive than top-down analysis from the perspective of the senior management team.
      2. Fundamentally, Year2000 is a software problem,
        so limited precedent can be derived from software industry experience creating and maintaining programs (see Capers Jones article "Probability of Year2000 Damages" on the Year2000 archive at www.year2000.com/archive/proby2k.html). Analysis of software defects and the efficacy of their removal suggests that problems can occur even with 'remediated' systems.
      3. Failures will occur across the end of the century.
        The year2000 transition has not in any way suspended normal, routine failures. Furnaces will still go out, power will fail in some localities, computers will disgorge garbage -- hardware will continue to break and software will continue to encounter 'will never happen' problems. And none of these failures will have anything to do with the Y2K bug.
      4. Year2000 may be incorrectly cited as the cause of a failure.
        There have been a number of widely reported incidents described as Year2000 failures which in fact had completely unrelated causes (see CNN Aug. 16 article "Londoners suffer Y2K power outage" and the later "Eclectic Electrical Supply" analysis by Peter de Jager on www.year2000.com). This syndrome can be expected to become more common as the millennium approaches.
    3. Y2K - many small, isolated failures spread over an extended period rather than one big failure or event.
      Television coverage has suggested that multiple small failures could cascade into a major event, as major blackouts have demonstrated in the past. The electric utilities are seriously worried about this and have made plans (or should have) to isolate areas just in case.
    4. Impact of Y2K will be distributed rather than focused at one location or area.
    5. Remediation of internal systems does not obviate the risk of Y2K problems.
      The remediation process only addresses the systems under local control -- the systems of suppliers and other business partners may not have been remediated.
    6. Problems could occur internally or with utilities, communications services or business partners.
    7. Year2000 problems will require different solutions than just switching to a duplicate system.
      Failing over to a backup system would not work if the problem were in the program logic or data feeds. Contingency service providers have been worried about firms declaring disasters due to Y2K problems and then tying up valuable recovery centre resources duplicating those problems in the backup systems.
    8. Year2000 is a business issue rather than a technology/hardware issue
      1. Loss of public confidence about Year2000 issues may engender client actions that cause problems of their own.
        1. Call Center volumes could escalate.
        2. Unusual or excessive business activity could occur.
        3. There could be excessive or unusual requests for account statements.
        4. Automated information systems could experience excessive traffic.
        5. Banks could experience unusual cash demands.
      2. Inability of a business partner to process may have impact.
  3. Contingency Planning as a Business Imperative
    1. Consumers (and most businesses) normally assume that everything works.
      It is more comfortable to assume that things just work the way they are supposed to. This assumption needs to be tested regularly for self-preservation. How often becomes an issue of survival.
    2. Failure is inherent in everything -- materials fatigue, people make errors, programs have bugs.
    3. The past is not necessarily prologue.
      A problem-free past is predictive only to the extent that nothing has changed and the environment is stable and predictable. This is rarely true.
    4. New is not necessarily better.
      The cynic would say that nothing new ever works. What he really means to say is that new things have rarely been tested sufficiently to weed out either design flaws or implementation flaws -- components are subject to infant mortality, programs may have bugs in significant places, human processes can be shaky. From a statistical point of view the likelihood of failure is often thought of as a 'bathtub' curve -- high at the start, falling to a low value through most of the product's life, rising at the end as things wear out.
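One common way to picture the bathtub curve is as the sum of three failure rates: a falling early-life rate, a constant background rate, and a rising wear-out rate. The sketch below assumes Weibull hazard functions for the first and last terms; every parameter value is invented purely to produce the shape and does not describe any real product.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative failure rate at age t_years (all parameters are assumptions)."""
    infant_mortality = weibull_hazard(t_years, shape=0.5, scale=10.0)  # falls with age
    random_failures = 0.01                                             # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=12.0)          # rises with age
    return infant_mortality + random_failures + wear_out


for age in (0.1, 1.0, 5.0, 10.0, 12.0):
    print(f"age {age:>4} years: failure rate {bathtub_hazard(age):.3f}")
```

With these made-up parameters the rate is high for a brand-new item, lowest in mid-life, and climbs again toward the end of life.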
    5. Due to the high cost of failure, some industries consider contingency planning a normal part of business.
      Risk management and contingency planning have traditionally been the province of military and government planners. But as businesses become more connected with each other and their ultimate customers they become increasingly exposed to internal and external failures. Risk management will need to be considered around every controlled change and contingency plans developed to minimize the impact of these inevitable disruptions.
  4. A General Approach to Contingency Planning
    1. An essential precursor -- understand how the business works.
      The Value-Chain metaphor can be useful when looking at the business as a whole and its constituent processes. What this suggests is that business value is created not just by the processes within a business but through transportation and communications links that tie those processes to both suppliers and consumers.
      1. What are the core business processes?
        Know what business functions and processes are most important to the health and profitability of the firm. Watch out for low-profile activity that provides essential input into more critical functions -- if withholding this contribution compromises the viability of more 'critical' functions then it should be considered critical as well. Understanding how long a particular function can be shut down is helpful in establishing its criticality. Be aware, though, that processes which are executed at intervals as part of the business cycle should not be discounted -- year-end close is critical when it is time to close. For this reason, the criticality of some processes may need to be tied to the business calendar.
      2. What is the chain of relationships between suppliers, process and consumers which enable the process to produce value?
      3. What are the flows of information, material and control within those processes?
        All business processes consume resources of some type without which their function would be impossible. These inputs are transformed (or perhaps just redirected) by the process and the results propagated to subsequent consumers. Decision-making information is required from a variety of sources internal and external to manage and direct the process. Knowing these items is essential in being able to estimate the impact of Y2K degradation or failures.
      4. Know what is important -- both internally and externally.
        Being connected is a reality of the global business environment. Problems experienced by business partners can easily affect other businesses -- particularly if they are linked by electronic information exchanges or other tightly coupled services. But not everything will be affected uniformly -- internal processes could fail but have minimal impact if their output could be deferred. Similarly, not all computer clocks and calendars are important -- consider how much it would really affect things if the calendar on a computer were set back to a pre-2000 date to avoid Y2K problems.
    2. Starting Assumption - Focus on Problem Sources or Solutions?
      Year2000 is distinguished from most business problems in that the uncertainty around what could go wrong is exceptionally high -- and the time frame in which business contingency plans are needed is very close and fixed. It will be helpful to decide up front if the planning approach seeks to identify what could fail or if it is assumed that key things will fail and plan accordingly.
    3. A simple scenario planning methodology
      What is outlined here is a simple approach to developing contingency plans. More detailed information can be found by consulting the web sites listed in 'Directory of Related Links'. The MITRE Contingency Planning site is a personal favorite. The SIA Year2000 contingency planning working committee reports are also helpful -- particularly for financial service firms.
      1. Establish a contingency planning working group
        A cross-functional working group should be established to review failure scenarios and develop contingency plans. This group should have representatives from each major business area including technology. It cannot be stressed enough that contingency planning is a business management exercise, not a technical function. Contingency plans encapsulate decisions about the strategy and tactics for a business unit to cope with unexpected changes that impact its ability to function. The technologists can develop possible failure scenarios but evaluating potential impacts quickly is likely only from the management team. A cross-functional group is recommended because of the potential for indirect effects across the organization.
      2. Enumerate key sources and services
        This list contains some suggestions that could be considered. It may be impossible to anticipate what could affect specific services -- but perhaps less difficult to predict that some services could be affected. A list such as this would be developed for rounds of 'what-if' scenario planning.
        1. Utilities
          1. Electricity
          2. Water
          3. Natural Gas
          4. Sewer Services
          5. Trash Removal
        2. Services
          1. Street access
          2. Vehicle Fuel
          3. Public Transportation
          4. Local Telephone Service
          5. Long Distance Telephone Service
          6. Internet Access
            1. Local Access
            2. Web Servers
            3. Email Servers
            4. Internet Name Service (DNS)
            5. Wide area routing
          7. Mail Delivery
          8. Package Delivery
        3. Business Services
          1. Market Data
          2. Market Access for Trading
          3. Clearing and Settlement Services
          4. Trading partner EDI access
        4. Internal Services
          1. Telephone PBX
          2. Voice Mail
          3. EMail Services
          4. File Servers
          5. Application Hosts
          6. Database Servers
          7. Internal Networks
      3. Document business processes and services.
        Write down what could be affected -- a spreadsheet arraying business functions against input services is a useful planning tool.
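For teams that prefer to generate the grid rather than type it by hand, the spreadsheet can be sketched as follows. The business functions and their dependencies are hypothetical placeholders; the service names are taken from the list of key sources and services.

```python
import csv
import io

# Input services (columns), drawn from the enumeration of key services.
services = ["Electricity", "Local Telephone Service", "Internet Access", "Market Data"]

# Hypothetical business functions (rows) and the services each one relies on.
dependencies = {
    "Order Entry": {"Electricity", "Local Telephone Service"},
    "Trading Desk": {"Electricity", "Market Data", "Internet Access"},
    "Customer Statements": {"Electricity"},
}


def build_matrix(dependencies, services):
    """Return spreadsheet rows: a header row, then one row per business function."""
    rows = [["Business function"] + services]
    for function, needed in sorted(dependencies.items()):
        rows.append([function] + ["X" if s in needed else "" for s in services])
    return rows


def functions_affected_by(service, dependencies):
    """List the business functions that depend on the given service."""
    return sorted(f for f, needed in dependencies.items() if service in needed)


def to_csv(rows):
    """Render the rows as CSV text that any spreadsheet program can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


print(to_csv(build_matrix(dependencies, services)))
print("Affected by loss of Market Data:", functions_affected_by("Market Data", dependencies))
```

Reading down a column answers the 'what-if' question for one service; reading across a row shows everything a single business function needs.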
      4. Develop failure scenarios
        by examining the inputs and processes at a high level and imagining what could go wrong (or better, assume that an essential process stops or produces wrong information). Then examine the business consequences for that action. This is a business exercise looking at the overall function of business units, not a technical exercise looking only at computer systems. Assign a likelihood of occurrence in simple terms (like low, medium, high) and a severity of impact (like low, medium or severe). Determining economic impact is useful, particularly when discussing scenarios with senior management, but is not always easy to evaluate.
      5. Some possible ways that a service could be affected:
        1. Unavailable
          This is the most widely anticipated kind of Y2K interruption.
        2. Available intermittently
          Intermittent availability is normal for many services even without Y2K factors. But it can be problematic if unexpected (as with electric power going off and on multiple times over a short period).
        3. Available but wrong
          A service that continues to work but contains incorrect information is perhaps the most insidious kind of Y2K problem. There is nothing unique about bad data -- particularly in that very few firms validate inputs at the source where errors can be contained. Instead, bad data is normally allowed to propagate deep into other systems to fester until discovered later. Data quality is suggested to be a very likely Y2K problem.
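Containing bad data at the source can be as simple as a plausibility check applied before a record is accepted. The sketch below assumes a hypothetical incoming record with a trade_date field and an arbitrary seven-day staleness limit; real feeds would need checks suited to their own content.

```python
from datetime import date


def validate_trade_record(record: dict, today: date, max_age_days: int = 7) -> list:
    """Return a list of problems found in the record; an empty list means accept it."""
    problems = []
    trade_date = record.get("trade_date")
    if not isinstance(trade_date, date):
        problems.append("trade_date missing or not a date")
        return problems
    if trade_date > today:
        problems.append("trade_date is in the future")
    elif (today - trade_date).days > max_age_days:
        problems.append("trade_date is implausibly old")
    return problems


today = date(2000, 1, 3)
good = {"trade_date": date(2000, 1, 3)}
bad = {"trade_date": date(1900, 1, 3)}  # a feed that rolled over to 1900

print(validate_trade_record(good, today))
print(validate_trade_record(bad, today))
```

A record that fails the check can be quarantined and reported instead of being passed downstream, which keeps the error where it is cheapest to fix.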
      6. Pruning Strategy - Selective Focus
        It will be helpful to categorize the evaluated scenarios and focus on developing contingencies in a predetermined order -- most severe impact to least severe, for example. Low impact scenarios can be excluded initially. It is suggested that scenarios that are considered to be unlikely but severe should not be excluded -- although ordering severe scenarios by likelihood of occurrence may be valid. It is important to keep in mind that what will happen is unknown, so guessing that a particular scenario has a low probability of occurrence could be wishful thinking.
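The pruning rule described above (set aside low-impact scenarios, keep unlikely-but-severe ones, and work from most to least severe) can be written down as a small sort. The scenario names and ratings below are invented examples, and the numeric weights are an assumption used only to order the labels.

```python
from dataclasses import dataclass

# Assumed numeric ordering for the simple rating labels.
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}
SEVERITY = {"low": 1, "medium": 2, "severe": 3}


@dataclass
class Scenario:
    name: str
    likelihood: str  # "low", "medium" or "high"
    severity: str    # "low", "medium" or "severe"


def planning_order(scenarios):
    """Drop low-severity scenarios, then order by severity and, within it, likelihood."""
    kept = [s for s in scenarios if s.severity != "low"]
    return sorted(
        kept,
        key=lambda s: (SEVERITY[s.severity], LIKELIHOOD[s.likelihood]),
        reverse=True,
    )


scenarios = [
    Scenario("Voice mail unavailable", "high", "low"),
    Scenario("Regional power outage", "low", "severe"),
    Scenario("Local telephone service intermittent", "high", "medium"),
    Scenario("Market data feed delivers wrong prices", "medium", "severe"),
]

for s in planning_order(scenarios):
    print(f"{s.severity:>6} / {s.likelihood:<6} {s.name}")
```

Note that the regional power outage stays on the list despite its low likelihood rating, since that rating is only a guess.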
      7. Develop A Business Response Portfolio
        For each scenario a business strategy should be developed. Non-technical alternatives should be explored where possible -- manual processing, selective shutdowns, alternate processing approaches. A number of factors should be considered for each business strategy:
        1. Probable Event Horizon
          It is helpful to identify when certain problems are likely to occur -- some may be very specific. With Y2K problems this is expected to be around certain dates - Jan 1, 2000, February 29, 2000, etc.
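February 29, 2000 is worth watching because 2000 is a leap year only under the full Gregorian rule (century years are leap years when divisible by 400). A program that implemented just the "century years are not leap years" exception would skip the day. The sketch below contrasts the two rules; the function names are illustrative.

```python
import calendar
from datetime import date, timedelta


def naive_is_leap(year: int) -> bool:
    """Incomplete rule: divisible by 4, except century years."""
    return year % 4 == 0 and year % 100 != 0


def gregorian_is_leap(year: int) -> bool:
    """Full rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


for year in (1900, 1996, 2000):
    print(year, naive_is_leap(year), gregorian_is_leap(year), calendar.isleap(year))

# The day after February 28, 2000 according to the standard library.
day_after = date(2000, 2, 28) + timedelta(days=1)
print(day_after.isoformat())
```

The two rules agree for 1900 and 1996 and disagree only for 2000, which is why this date deserves its own entry on the event horizon.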
        2. Scenario detection approach
          The contingency plan should contain some ideas around how problems would be detected that would suggest a particular scenario was unfolding -- power outages should be easy, data quality problems may not be.
        3. Response Validity Timeframes
          Contingency business strategies are usually workable for limited periods. It is useful to identify those limits during the planning process.
        4. Fallback strategy if limits exceeded
          A second-tier contingency response should be considered. This alternative would be executed if the time limits ascertained for the primary contingency response were exceeded. This could include decisions to shut down certain services, reroute orders to an alternate location, etc. -- these choices will be unique to the individual business.
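Recording the validity limit alongside each response makes the switch to the second tier a matter of checking the clock rather than debating it during the event. The sketch below is one possible shape for such a record; the scenario, the 48-hour limit, and the responses are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContingencyPlan:
    scenario: str
    primary_response: str
    primary_valid_for: timedelta  # how long the primary response stays workable
    fallback_response: str


def current_response(plan: ContingencyPlan, invoked_at: datetime, now: datetime) -> str:
    """Return the response that applies, given how long the contingency has run."""
    if now - invoked_at <= plan.primary_valid_for:
        return plan.primary_response
    return plan.fallback_response


plan = ContingencyPlan(
    scenario="Electronic order system unavailable",
    primary_response="Accept orders on paper documents",
    primary_valid_for=timedelta(hours=48),
    fallback_response="Reroute orders to an alternate location",
)

invoked_at = datetime(2000, 1, 3, 9, 0)
print(current_response(plan, invoked_at, invoked_at + timedelta(hours=12)))
print(current_response(plan, invoked_at, invoked_at + timedelta(hours=72)))
```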
        5. Plan for return to normalcy
          The individual contingency plan should consider the return to normal processing. For example, if the contingent response were to accept orders using paper documents rather than an electronic order system, consideration should be given to whether to transcribe the paper orders to the electronic sales history files at the end. And how to coordinate order numbers and update inventories to rebalance the system.
        6. Test Strategy
          Key to validating a potential business response is testing whether it addresses the problem in an acceptable and cost-effective manner. For any given situation there are many potential solutions -- only some of which will work and of those only some which are affordable. At the very minimum it is helpful to conduct a walkthrough exercise to simulate the impact of the alternative. Participants should represent all involved areas -- not just the immediate affected function. The walkthrough should attempt to identify areas of conflict or capacity -- what works in one area may have undesirable impacts elsewhere.
        7. Do not ignore routine failures
          The intrinsic reliability of an application or service should not be ignored in developing contingency plans. Even though Y2K may not have an effect, failures could still occur and impact the business.
      8. Identify and implement impact mitigation responses
        Scenario analysis may identify vulnerabilities that can be offset by preemptive changes. Customer service PCs in a Call Centre might be vulnerable to power interruptions even though the main servers were on UPS/APS -- moving some or all to protected power would mitigate their vulnerability.
      9. Develop Situation Management Strategies
        The appropriate situation management strategy should be considered as part of the contingency planning process. Notification processes to decision makers and necessary decision response times should be considered in formulating the situation management approach. Unreliable notification mechanisms might dictate that decision makers be kept close at hand during probable trouble periods. Relaxed response time requirements could suggest that a negative response approach be used for coordination.
        1. War Room
          A situation room for managing problems should be prepared prior to the critical periods. Key reference materials - maps, system documentation, diagrams, staff and vendor lists, etc. should be collected. White boards are useful for displaying status messages and notices in a clear but easily changeable manner. Access to the room in the event of problems should be considered - a war room on the top floor of a high rise might not be useful in the event of a power failure.
        2. Situation Management Team - Standing
          Many larger organizations are adopting the approach of a standing situation management team located in a war room with rotating teams on duty across the entire perceived vulnerability period. One challenge of a standing team is maintaining vigilance -- particularly if the war room team is a key part of the situation detection process. This approach is well-suited to environments that cover large geographical areas or have particularly critical response requirements.
        3. Situation Management Team - Event-Driven
          An alternate approach is to constitute the situation management team in the event problems are detected. One issue that should be considered is how event detection will occur and through what mechanisms will notification be delivered. Pager dispatch may not work if the system at fault is the local phone network. One alternative to consider is negative notification - distributing an 'all is well' heartbeat during the critical periods. Failure to receive the heartbeat would be considered a problem notification.
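The negative notification idea can be sketched as a small monitor on the receiving side: silence beyond an agreed interval is treated as the alarm. The class name, the 60-second interval, and the 15-second tolerance are assumptions for illustration; how the heartbeat is actually delivered (phone, pager, network) is outside the sketch.

```python
import time


class HeartbeatMonitor:
    """Treats a missing 'all is well' message as a problem notification."""

    def __init__(self, interval_seconds, tolerance_seconds=0.0, clock=time.monotonic):
        self._limit = interval_seconds + tolerance_seconds
        self._clock = clock
        self._last_seen = clock()

    def record_heartbeat(self):
        """Call whenever an 'all is well' message arrives."""
        self._last_seen = self._clock()

    def is_overdue(self):
        """True when no heartbeat has arrived within the allowed window."""
        return self._clock() - self._last_seen > self._limit


# Demonstration with a fake clock so the example runs instantly.
fake_now = [0.0]
monitor = HeartbeatMonitor(60, tolerance_seconds=15, clock=lambda: fake_now[0])

fake_now[0] = 70.0
print("overdue after 70s:", monitor.is_overdue())

fake_now[0] = 80.0
print("overdue after 80s:", monitor.is_overdue())

monitor.record_heartbeat()
print("overdue after new heartbeat:", monitor.is_overdue())
```

The design choice is that the recipient needs no working channel from the failing site to learn of trouble; the absence of the expected message is itself the signal.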
        4. Business Support and Escalation Processes
          The criteria and decision path to escalate problems within the organization and to external vendors should be considered in advance. This needs to be documented (see 'Developing a Support Strategy'). Computerized problem management systems are very helpful to encapsulate this kind of information but should not be relied upon exclusively. After all, the problem management system could fail as well. External vendors should be contacted to ascertain their resource availability plans during critical periods. Staff should be consulted to ensure that key people are not on holiday during critical periods. Some firms are requiring that staff be on site or locally available.
        5. Issues to Consider
          1. Event Horizon
            Be clear on when problems are expected to occur -- these will be unique to the systems they are embedded in and how those systems are used. Remember that many Y2K problems will not manifest themselves precisely at 12:01am, Jan 1, 2000. Some areas in financial services were affected in 1990, for example. Some firmware date problems will only be seen when the machine is restarted.
          2. Event Duration
            Think through how long a problem could be extant before it affected the business; and how long the contingency strategy could be followed before alternate approaches were needed. And remember, people need rest to be effective -- plan for staff rotation.
          3. Multiplicity of Events
            Do not be surprised if there are multiple events occurring. Anticipating and managing the cross-impact of multiple events could be a real challenge - know when to change and adapt strategies.
            1. Track Everything
              Some problems will be routine failures, others may be Y2K-related. But all could affect the business in one manner or another.
            2. Assign Priorities or Relative Impacts
              Focus resources on problems that matter and can be solved.
            3. Manage the Situation
              Maintain a policy of periodic situation reviews to ensure that resources are used effectively over time and redeploy staff as status changes.
            4. Afterwards, Review What Was Learned
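The four steps above can be supported by even a very simple log, on paper or in software. The sketch below is one hypothetical shape for it: every event is tracked, the work queue puts high-impact solvable problems first, and a summary is available for the review afterwards. The impact scale and the sample events are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Event:
    description: str
    impact: int              # assumed scale: 1 = minor, 2 = moderate, 3 = severe
    solvable: bool = True    # can the team do anything about it right now?
    y2k_related: bool = False
    resolved: bool = False


class SituationLog:
    def __init__(self):
        self.events = []

    def track(self, event):
        """Track everything, routine failures included."""
        self.events.append(event)
        return event

    def work_queue(self):
        """Open problems that matter and can be solved, highest impact first."""
        actionable = [e for e in self.events if e.solvable and not e.resolved]
        return sorted(actionable, key=lambda e: e.impact, reverse=True)

    def review(self):
        """Summary counts for the after-the-fact review."""
        return {
            "total": len(self.events),
            "y2k_related": sum(e.y2k_related for e in self.events),
            "unresolved": sum(not e.resolved for e in self.events),
        }


log = SituationLog()
log.track(Event("Furnace failure in branch office", impact=1))
log.track(Event("Statement run prints wrong dates", impact=3, y2k_related=True))
log.track(Event("Regional telephone outage", impact=3, solvable=False))
log.track(Event("File server disk full", impact=2))

for event in log.work_queue():
    print(event.impact, event.description)
print(log.review())
```

Calling work_queue() again at each periodic situation review reflects whatever has been resolved or reclassified since the last one.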
          4. Impact to Communications and Dispatch Mechanisms
            Do not forget that the communications systems and message dispatch tools are also suspect. Plan alternate means of communication (including changing physical proximity) to maintain control in the event of problems.
          5. Technical vs Business Problem Management
            And remember that the objective is to maintain the business -- with the technology in a supporting role, not the other way around. Be clear and firm on deadlines and thresholds and don't hesitate to invoke contingency if the technical situation fails to correct itself in time. Technologists are optimists about one more fix that is sure to correct the problem -- don't let their optimism jeopardize the business.
      10. And Hope for the Best

Copyright Technology Strategists, Inc. 1997, 1998, 1999

Copyright Technology Strategists, Inc. 2003