= Disaster Recovery =

 * The following are notes from [[Advisory/AMinutes20081222|meeting 20081222]].

== 1. Business Processes Analysis ==

=== 1.1 Core Processes ===

Need to identify Core Processes from the [[BusinessProcesses]].

 * Not according to revenue.
 * Reliance is the claim made by CAcert, and the only time-critical components of reliance are CRL/OCSP.

How to do a revocation:

 * online system for users
 * support-generated:
   * need the support system
   * need arbitration to authorise
   * need mail + maillists

So, in terms of business continuity / disaster recovery, these are '''CORE''':

 1. '''critical systems'''
 1. '''OCSP/CRL servers'''

=== 1.2 Secondary Processes ===

Secondary:

 1. email + maillists - all redundant
 1. support - receive certificate complaints, do revocations on them
 1. arbitration

Discretionary -- all other processes in the list are discretionary. In the context of a Disaster, these are ignored for the time being.

== 2. Standard Process Times ==

Standard Process Times (SPT) are needed as a baseline.

 1. revocation
   * Support -- rebuild + startup?
   * redundant channels:
     * email support
     * website POST box
     * phone??? VoIP??? SMS???
     * IRC + chat
   * 0 time for receiving certificate complaints
   * 1 hour to pass to arbitration
 1. Arbitration
   * 1 mailing list
   * 1 hour to designate Arbitrator
   * 24 hours to get 1st ruling on revocation
   * does arbitrator need guidelines?
     * false positives, false negatives, discretion amongst arbitrators....
 1. Revocation by Support
   * 1 hour to revoke
 1. Critical Systems
   * new CRL from support - 0 time
   * distribution to OCSP / CRL servers - 0 time

Then, the SPT for revocation is: '''3 + 24 = 27''' hours (1 hour to pass to arbitration + 1 hour to designate an Arbitrator + 1 hour to revoke, plus 24 hours for the first ruling).

== 3. Recovery Time Objectives ==

Recovery Time Objectives (RTOs) for core processes are how long it takes to recover the core + secondary processes needed.

|| 27 hours ||

 1. critical systems -- rebuild and start up -- ??
   * this would have to be faster than total revocation time
   * board will have to define this time:
     * within 24 hours
 1. OCSP/CRL -- rebuild and start up????
   * 0 time: must have redundancy
 1. Mail + mailing lists (Arbitration)
   * 0 time - redundant -- requirement, we need redundant mail for arbitrators?

=== 3.1 Failure Times ===

How long will it take then? Target is 27 hours.

 * Notification of total failure (support systems) - 1 hour
 * Investigation to determine total failure (sysadm team) - 1 hour
 * Decision to rebuild (board, 2 members) - 1 hour
 * Rebuild (sysadm team, 2 people) - 24 hours

|| Total: 27 hours ||

== 4. Maximum Acceptable Outage ==

Maximum Acceptable Outage (MAO) is the total time that the business decrees it can be down for in this context.

 1. OCSP/CRL == 0 time for existing ones
 1. 2 days before new revocations issued
 1. email / support / maillists == 0 time (redundant)
   * how long does it take to realise problems with mail systems?
   * throw at tech people ... we want redundant mail + 0 time

== 5. Recovery Point Objective ==

Recovery Point Objective (RPO) is the time back to which we recover.

 1. what time before Disaster do we have data for? (Backups)?
 1. revocation: 24 hours (normal incremental backups)
   * ==> revocations can be lost
   * ==> user / Arbitrator must do confirm/retry manually
   * ==> write in CPS "you must check within 24 hours to confirm/retry"
 1. mail: RPO == 1 hour on mail incoming (so 1 hour SPT can be met)
 1. OCSP/CRL: no issue because source files on critical systems
   * and on other OCSP servers
   * ==> requirement to load up from other OCSPs and form source.
   * RPO == 0 time
 1. critical systems: RPO == 24 hours

== 6. Others ==

Service Delivery Objectives: not offered (community CA).

Best efforts standard for revocation:

 * support: 1 hour ?? 24 hours???
 * arbitration: 24 hours?? 7 days ??

== 7. Strategy and Planning ==

What plans exist to put in place the systems and infrastructure required to meet the targets?

 * general backups 24 hours
 * mail backups 1 hour
 * OCSP - 3 redundant
 * channels to Arbitration - dual? we don't know
   * (e.g., support people to monitor channel and duplicate on other list)
 * backup supplier of hardware???
   * -- escrow hardware for signing server
   * if the signing server is secured fully (which it must be anyway??)
   * ==> redundant CA (database, online, signing, geographical)
 * mirrored drives in all machines
 * redundant comms already provided by ISP.
 * Alternate processing location, etc.: none
 * Maintenance

== 8. Decision Contact Info ==

 * State requirement in SM: ''who needs contact info for whom''
 * Within CAcert
   * all sysadms, all board
 * Need contact information deposited somewhere offline?
   * e.g., somewhere that is not affected by systems going offline
   * Email that goes out on every change.
 * Sysadms must have contact info for all of Oophaga
   * (not for SM, as described in contract)

=== 8.1 Oophaga ===

 * need a comment in Oophaga agreement to include disaster recovery
 * notification to open notifications mailing list
   * open to all members
   * public archives
   * 2 messages:
     * we intend to do X on date Y
     * we're out, it's done, here's the report
 * check cameras
   * are they on CAcert aisle
   * talk to BIT about new cameras

== 9. Threats & Disasters ==

As in Security Manual.

 1. data breach
 1. false certificate issuance
   * arbitration -> revocation
   * arbitration -> investigation, checking the logs
 1. root compromise
   * revoke root with vendors (business protocol)
   * reissue root
   * revoke subroot / certs

== 10. Side Question ==

 * quality of support process
 * quality of arbitration processes

----
 . CategoryPolicy
 . CategorySystems
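
The arithmetic in sections 2 and 3.1 can be cross-checked with a short sketch. The hour values are copied from the notes; the step labels, dictionary names and helper functions are illustrative assumptions and not part of any CAcert tooling.

```python
# Durations in hours, taken from sections 2 and 3.1 of these notes.
# All names below are hypothetical and exist only for this illustration.
REVOCATION_STEPS = {
    "receive certificate complaint": 0,
    "pass to arbitration": 1,
    "designate Arbitrator": 1,
    "first ruling on revocation": 24,
    "revoke (Support)": 1,
    "new CRL from support": 0,
    "distribute to OCSP/CRL servers": 0,
}

FAILURE_STEPS = {
    "notification of total failure": 1,
    "investigation to determine total failure": 1,
    "decision to rebuild": 1,
    "rebuild": 24,
}


def total_hours(steps):
    """Sum the per-step durations (hours) of a sequential process."""
    return sum(steps.values())


def within_target(steps, target_hours):
    """Return True if the process completes within the target time."""
    return total_hours(steps) <= target_hours


SPT_HOURS = total_hours(REVOCATION_STEPS)     # Standard Process Time for revocation
RECOVERY_HOURS = total_hours(FAILURE_STEPS)   # time to recover from total failure

if __name__ == "__main__":
    print("SPT for revocation:", SPT_HOURS, "hours")
    print("Recovery from total failure:", RECOVERY_HOURS, "hours")
    print("Recovery within SPT target:", within_target(FAILURE_STEPS, SPT_HOURS))
```

Keeping the steps in a table like this makes it easy to see the effect of changing any single estimate (for example, a 7-day arbitration ruling instead of 24 hours) on the overall target.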