Arbitrations/a20120622.1

Case Number: a20120622.1
Status: running
Claimants: Bas D
Respondents: CAcert
Initial Case Manager: AlexRobertson
Case Manager: BernhardFröhlich
Arbitrator: UlrichSchroeter
Date of arbitration start: 2012-06-22
Date of ruling: 201Y-MM-DD
Case closed: 201Y-MM-DD
Complaint: Authorize emergency visit
Relief: TBD

Before: Arbitrator UlrichSchroeter (A), Respondent: CAcert (R), Claimant: Bas D (C), Case: a20120622.1

History Log

2012-06-21 (Support): Blog post Problems with signing certificates (resolved)
2012-06-22 (issue.c.o) case s20120622.180
2012-06-22 (iCM): added to wiki
2012-06-22 (A): I'll take this and appoint CM
2012-06-22 (A): Intermediate Ruling #1
2012-06-22 (C): called (A) by mobile
2012-06-23 (Critical team): Visit BIT 22.06.2012 Report
2012-06-23 (C): Visit BIT 22.06.2012 Report Confirmation
2012-06-25 (Critical team): scheduled visit #2 - Tuesday 2012-06-26
2012-06-25 (Support): Blog post maintenance announcement: Server Downtime 2012-06-26 about 12:00 UTC to 14:00 UTC
2012-06-25 (iCM): should this be appended to a20120622.1?
2012-06-25 (A): Intermediate Ruling #2
2012-06-25 (Critical Admin): note to intermediate ruling #2
2012-06-25 (A): response to note given
2012-06-26 (iCM): response to (Critical Admin) note to intermediate ruling #2
2012-06-26 (Critical Admin): response to (iCM) note
2012-07-03 (C): call to (A) by mobile (16:08): questions regarding disk shredding

Original Dispute, Discovery (Private Part)

Link to Arbitration case a20120622.1 (Private Part), access for (CM) + (A) only

EOT Private Part

Intermediate Ruling #1

I hereby follow case a20120528.1 and give the following Intermediate Ruling:

Intermediate ruling #1: I order that one access engineer and one or two critical administrator(s) are allowed to access the BIT facilities to analyse and probably fix the current signer problem. If further authorisation is required according to SP, you can call me by mobile +##-####-###.####. The critical team shall prepare a report for later review according to SP procedures.

Frankfurt/Main, 2012-06-22

Discovery

Public Support mailing list
(Support) Blog post Problems with signing certificates (resolved)
- "The signer was down from June 21 02:00 UTC to June 22 23:00 UTC."
2012-06-22 (C): called (A) by mobile (all before the 1st visit)
- Discussion whether the team of 1 AE + 1 Critical Admin is allowed to access the critical system (Signer)
- A second Critical Admin is not available
- (A): SP 1.2 Principles: dual control, four eyes, redundancy, escrow, logging, separation of concerns, Audit, Authority
  - confirmation that the four eyes principle is kept intact with 1 Access Engineer and 1 Critical Admin; the Access Engineer takes the oversight role
- Round #2 via (Support)
  - (A): What about the previous (Critical Admin) Stefan Kooman, who moved to (AE)?
  - (C): Stefan isn't available either
2012-06-23 Visit BIT 22.06.2012 Report by (Critical Admin)
- 2012-06-22 Emergency Visit BIT report
- Persons:
  - Mendel Mobach (CAcert)(Critical Admin)
  - Bas van den Dikkenberg (Oophaga)(Access Engineer)
2012-06-23 Visit BIT 22.06.2012 Report Confirmation by (C)
- 2012-06-22 Emergency Visit Report confirmation
- NB Recommendation to CritSys team to replace failing disk

2012-06-25 (Critical team): scheduled visit #2 - Tuesday 2012-06-26

A visit to BIT by Bas van den Dikkenberg (Oophaga), Mendel Mobach (CAcert)
and Wytze van der Raay (CAcert) has been scheduled for 26 June 2012 at
14:00 CEST. The purpose of the visit is to replace a broken disk in the
signing server, and correct the time on the signing server. Due to this
work the signing server will be unavailable for approximately two hours.

2012-06-25 (Support): maintenance announcement
- "Server Downtime 2012-06-26 about 12:00 UTC to 14:00 UTC"
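
The visit time and the announced downtime can be cross-checked against each other: CEST is UTC+2, so a 14:00 CEST start corresponds to 12:00 UTC, and an outage of approximately two hours ends at 14:00 UTC. A minimal Python sketch of that conversion follows. It assumes a fixed UTC+2 offset for CEST, and the variable names are illustrative only, not part of any CAcert procedure.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CEST is modelled as a fixed UTC+2 offset (valid for June 2012).
CEST = timezone(timedelta(hours=2), "CEST")

# Scheduled visit start as stated in the visit announcement.
visit_start_local = datetime(2012, 6, 26, 14, 0, tzinfo=CEST)
visit_start_utc = visit_start_local.astimezone(timezone.utc)

# The announcement expects roughly two hours of signer unavailability.
visit_end_utc = visit_start_utc + timedelta(hours=2)

print(visit_start_utc.strftime("%Y-%m-%d %H:%M UTC"))
print(visit_end_utc.strftime("%Y-%m-%d %H:%M UTC"))
```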

Intermediate ruling #2

A team of Access Engineers and Critical team members is allowed to visit BIT again as a follow-up to the visit of Friday, 2012-06-22, to replace the broken disk as proposed by Bas van den Dikkenberg in his confirmation email dated 2012-06-23 of the last visit report by Mendel, also dated 2012-06-23. The critical team shall prepare a report of this visit as well, for later review according to SP procedures.

Frankfurt/Main, 2012-06-25

Discovery

2012-06-25 (Critical Admin): note to ruling #2

We probably don't need arbitration, as this is a normal visit with only normal fixes and normal maintenance (with impact for users, but OK).

2012-06-25 (A): response to note given

By default, you're probably correct.

This visit is a follow-up to last week's visit,
which was moved into the disputes queue;
one of its tasks is the replacement of a defective disk.

As we have a running case open, all unforeseen actions
can be handled under this current case until the
initial problem is solved and the running arbitration
case is closed.

So the upcoming visit can also be seen as an
evidence gathering process for the previous visit.
The requested report follows SP 2.3.3. Access Logging,
which includes "reporting to all".

Probably related SP areas that are
covered by the current case:
SP 2.2.3.3 Retirement 
SP 2.3.2. Access Profiles
SP 5.4. Investigation
SP 5.6. Report

2012-06-26 (iCM): response to (Critical Admin) note to intermediate ruling #2

IMHO it's probably only relevant in that it follows up Bas' request to
replace the disk from his confirmation of the earlier visit - as that was
logged in the running case, this provides "completion" to that case.

2012-06-26 (Critical Admin): response to (iCM) note

My motivation for copying the visit announcement to support@cacert.org
was to make them aware of the actions being taken in response to their
reports of failing user certs and failing OCSP responses, and of course
the expected outage of the server during the maintenance activity.

Potentially affected Policies and Manuals
- Security Policy
  - SP 1.2 Principles: dual control, four eyes, redundancy, escrow, logging, separation of concerns, Audit, Authority
  - SP 2.3.3. Access Logging (this includes "reporting to all")
  - SP 2.2.3.3 Retirement (Storage media)
    - ```
    Storage media that is exposed to critical data and is to be retired from service shall be destroyed or otherwise secured. The following steps are to be taken:
    
        The media is securely destroyed, or
        the media is securely erased, and stored securely. 
    
    Records of secure erasure and method of final disposal shall be tracked in the asset inventory. Where critical data is involved, two Systems Administrators must sign-off on each step. 
```
- Question that arises: How to handle broken disks?
- One answer is probably given under a20090301.1
- The answer is given under SystemAdministration/Procedures/DriveRetirement
- SP 2.3.2. Access Profiles
  - ```
  According to the Security Manual 2.3.2 updates to the signer may require the presence of two critical system administrators.
```
  - see also Arbitration case a20090810.4
  - more details under SystemAdministration/Procedures/DriveRetirement
  - SP 5.4. Investigation
  - SP 5.6. Report
  - SP 6. DISASTER RECOVERY
- Certification Practice Statement p 5.7
  - ```
  5.7. Compromise and disaster recovery
  
  Refer to Security Policy 5, 6 (COD8). (Refer to §1.4 for limitations to service.) 
```
- SP 5. INCIDENT RESPONSE
- SP 6. DISASTER RECOVERY
- SecurityManual
  - 6.2. Recovery Times
    - ```
    DisasterRecovery sets the recovery time for revocation services at 27 hours.
```
- Interruption of the Signer affects the OCSP responder service
- DisasterRecovery
  - DisasterRecovery#2._Standard_Process_Times
  - DisasterRecovery#3._Recovery_Time_Objectives
- System Procedures
  - List of System procedures: SystemAdministration/Procedures and SystemAdministration
  - SystemAdministration/Procedures/DriveRetirement
  - SystemAdministration/Procedures/OcspResponder
- Systems
  - System: Signer
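
For orientation, the outage quoted in the first Discovery section (signer down from June 21 02:00 UTC to June 22 23:00 UTC) can be set against the 27-hour figure quoted from SecurityManual 6.2. The Python sketch below is plain arithmetic on those two quoted values. Whether the recovery time for revocation services applies to this signer outage is not decided here, and the variable names are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Outage window as quoted from the (Support) blog post.
signer_down = datetime(2012, 6, 21, 2, 0, tzinfo=timezone.utc)
signer_up = datetime(2012, 6, 22, 23, 0, tzinfo=timezone.utc)

# Recovery time for revocation services as quoted from SecurityManual 6.2.
recovery_time = timedelta(hours=27)

outage = signer_up - signer_down
outage_hours = outage.total_seconds() / 3600

print(f"Signer outage: {outage_hours:.0f} hours")
print(f"Longer than the quoted 27 hours: {outage > recovery_time}")
```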

2012-07-03 (C): call to (A) by mobile (16:08): questions regarding disk shredding

Dear CC-party,

Regarding arbitration case a20120622.1
https://wiki.cacert.org/Arbitrations/a20120622.1

Today I've received a call from Bas by mobile
at 2012-07-03 16:08
that relates to the running case.
For documentation purposes I'll document
this under this case:

Bas, Wytze and Mendel are currently at the BIT Ede datacenter.
(visit re-scheduled for 2012-07-03 14:00 CEST)
The purpose of the visit is to execute
the follow-up actions defined after the previous site visit of
26 June 2012:

* Retire the old (somewhat broken) system drive of the signing server,
  following the "Suggested simplified procedure" as described in:
  https://wiki.cacert.org/SystemAdministration/Procedures/DriveRetirement

Part 0 -- Zero -- CAcert Systems Administrator
has been executed.
The question now is whether Wytze is allowed to take the disk
home to execute
Part 1 -- Shred -- CAcert Systems Administrator
as this phase takes a long time.

(A): Is it possible to do the process in the datacenter?
(C): Yes it is.
(A): refers to https://wiki.cacert.org/Arbitrations/a20090301.1
     where the procedure was changed because of a lack of hardware
     to proceed,
     so the current answer is to carry it out in the datacenter
--End of call--


=== Discussion [on] ===

Reading the procedure defined under
https://wiki.cacert.org/SystemAdministration/Procedures/DriveRetirement
now, there is a slight difference between the procedures, and the
procedures themselves raise some questions:

The visit announcement references the "Suggested simplified procedure"
(more on that later), but there is also a main procedure, which
describes the 2-step procedure:
Part 0 + Part 1

One of the possibly conflicting points is the following line under
Part 0:
"After completion of this, remove the old drive and take it off-site for
Phase 1."


What does "off-site" mean here?
Take it away from the production system, but keep it in the secure
environment (the CAcert rack)?


The Notes under this section raise a further question:

Notes:

* Two CAcert administrators need to be present at the start
  and the finish, and sign-off on the completed process.

 * The Machine plus drive need to be in a location with
   reasonable security. E.g., a secured office location
   or a populated home location. 
 * if the drive to be shredded contains hard media defects
   which block writing of certain sectors, the above
   procedure may not run to completion, and another
   (physical) method will be required to render the
   remaining data on the drive inaccessible. This will
   mean that CAcert Systems Administrators will also need
   to be present in Oophaga phase 2 below. 


"Two CAcert administrators need to be present"
OK, so the next section
   The Machine plus drive need to be in a location with
   reasonable security. E.g., a secured office location
   or a populated home location. 
makes no sense if the disk is taken off-site (out of BIT Ede)
and handed to one administrator, who takes the disk home
to start the shred process ...
unless both admins travel to one admin's house.

So the requirement:
 "Two CAcert administrators need to be present at the start
  and the finish, and sign-off on the completed process."
only makes sense if the disk is kept in the rack and
the rack is opened by one Access Engineer: the two CAcert
administrators start the shred process, the Access Engineer
closes the rack, and all parties schedule a revisit.

There is one more topic, Notes part 2:
"if the drive to be shredded contains hard media defects
 which block writing of certain sectors, the above procedure
 may not run to completion, and another (physical) method
 will be required to render the remaining data on the drive
 inaccessible. This will mean that CAcert Systems
 Administrators will also need to be present in Oophaga
 phase 2 below. "

With the analysis and knowledge from the previous visits,
we know that we have a defective disk with potentially
defective blocks that can cause the problems described
under Notes part 2 ...


Then we have the scheduled visit announcement
for the "simplified procedure"

Suggested simplified procedure (not agreed as yet)
step 1:
requirement: two CAcert System Administrators present
"After completion of this, remove the old drive and take it off-site."

Here "off-site" probably means out of the BIT Ede datacenter?
Is this correct?

So one main question that arises when comparing both procedures
is: why is there such a big difference between step 1 of procedure 1
and step 2 of procedure 2?

proc 1, step 1 requires 2 admins in a secured location, probably
access controlled by an Access Engineer
in contrast
proc 2, step 2 doesn't have these requirements

So what is the reason that allows reduced security control
in procedure 2 step 2 compared to proc 1 step 1?

Is the "zero the data" procedure on the disk sufficient
to ensure that the bits and bytes can no longer be recovered,
so that the security requirements can be lowered
and one admin is allowed to take the disk away from
the secured location and take it home to carry out step 2
of the suggested procedure?

Why then are 2 more steps, a shredding and a physical destruction
step, required for such a disk?

Ruling

Execution

Similar Cases

a20090301.1	CAcert disk destruction procedure has changed compared to the CAcert Board decision
a20090627.1	Emergency access to CAcert critical systems
a20090810.4	Emergency access to CAcert critical systems
a20120528.1	Emergency Dispute: Access to server due to signer problem