- Case Number: a20140322.1
- Status: closed
- Claimant: CAcert
- Respondent: Michael T
- Initial Case Manager: EvaStöwe
- Case Manager: BernhardFröhlich
- Arbitrator: EvaStöwe
- Date of arbitration start: 2014-03-23
- Date of ruling: 2015-01-30
- Case closed: 2015-02-18
- Complaint: Maybe lost information in the database
- Relief: TBD
Arbitrator: EvaStöwe (A), Respondent: Michael T (R), Claimant: CAcert (C), Case: a20140322.1
History Log
- 2014-03-22 (issue.c.o): case [s20140321.159]
- 2014-03-22 (iCM): added to wiki,
- 2014-03-22 (iCM): request for CM / A, informed R about case
- 2014-03-23 (R): informed iCM that the case should be considered urgent
- 2014-03-23 (CM): I'll take this case as Case Manager, EvaStöwe has volunteered to act as Arbitrator.
- 2014-03-23 (CM): Sent initial mail
- 2014-03-24 (A): intermediate ruling I, sent to R, support, CM
- 2014-04-29 (R/Software): informed A that there had been major review and testing for bug 1138, but that one review was failing (verbally, in the software team meeting)
- 2014-05-11 (A): reminds support to send requested collection of cases
- 2014-06-06 (SA): asks critical team to install 3 patches, to coordinate with support to check the results and confer with arbitration for further actions
- 2014-06-07 (Support): suggests procedure how to check correct behavior of new patches
- 2014-06-07 (Critical): patches were installed
- 2014-06-07 (A): asks for verification that the problem was solved
- 2014-06-07 (A): asks Support about the status of the collection of name/DoB-changes and delete account changes asked for in intermediate ruling I
- 2014-06-08 (A): asks R and software teams questions about the error, how it could take place, how to fix it in the future and if other persons should have found it
- 2014-06-09 (R): answers questions
- 2014-06-10 (Critical): asks to proceed as Support had suggested
- 2014-06-10 (A): intervenes as it has to be clarified first if the issue is resolved from the point of view of software team
- 2014-06-10 (Critical): asks what is missing
- 2014-06-10 (A): clarifies question to critical team
- 2014-06-10 (Critical): asks how to solve the possible deadlock
- 2014-06-10 (A): hopes that C may resolve the issue as he reported it and should know when it is solved as software officer
- 2014-06-10 (C): problem should be solved by the patches
- 2014-06-10 (A): consults internal Auditor informally about suggested procedure (via a chat tool)
- 2014-06-11 (A, C, SA): live session, including discussion about suggested procedure for the verification of the correct behavior after new patches
- 2014-06-13 (A): consults CM about suggested procedure for the verification of the correct behavior after new patches (via chat tool)
- 2014-06-15 (A): intermediate ruling II sent to C, support, critical team, CM
- 2014-06-15 (Support): provides file with collection of cases, including suggestions of sql-queries for the needed entries to correct the DB
- 2014-06-15 (Support): reports about tests done with a new account
- 2014-06-16 (Critical): reports about an error detected in the log at the end of the reported time frame
- 2014-06-16 (Support): asks a software assessor about the error
- 2014-06-18 (A): asks the same software assessor if the error is related to the patches touched by this case
- 2014-06-24 (SA): declares that the error is unrelated (VoIP, chat, in the presence of A and a Support team member)
- 2014-06-28 (A): informs support that the software assessor had declared the error as unrelated to the patches discussed in this case and states that the execution of name/DoB changes or account deletions should be allowed again
- 2014-07-19 (A): asks 3 software assessors to approve of the pattern for the sql-queries provided by support to correct the missing entries
- 2014-07-19 (2 SAs): confirm sql-query-pattern (with added ' at two places)
- 2014-07-19 (A): partial ruling sent to C, critical team, CM
- 2014-07-19 (A): sent list with sql-queries encrypted to critical team
- 2014-07-20 (critical): executed sql-queries, sent report encrypted to A, CM
- 2015-01-30 (A): final ruling (2. partial ruling)
- 2015-01-30 (A): thanks to CM and software, critical and support team for their help in this case
Private Part
Link to Arbitration case a20140322.1 (Private Part), Access for (CM) + (A) only
EOT Private Part
original Dispute
Hi,
I want to file a dispute against myself. In a mail to the critical admins [1] for the application of bug 1135 [2] I requested that two database migration scripts should be run while in fact only the first should have been run. The effect is probably that some other query relying on the database structure and keeping track of renames/changes of DOB by support engineers and account deletion fails to record that change. This is critical information that might be needed in case of a Dispute that may be lost now.
Therefore I hereby file a dispute against myself to investigate the issue. Please note that this issue might be TIME-CRITICAL so important evidence and data may be recovered while it's not yet deleted.
Places in the source code that might fail:
/includes/account.php:2704
/includes/notary.inc.php:914
[1]: https://lists.cacert.org/wws/arc/cacert-devel/2014-01/msg00000.html
[2]: https://bugs.cacert.org/view.php?id=1135
- additional mail from R while issuing dispute against himself:
> Dear SEs,
>
> I just noticed that I made a quite significant mistake a short while ago that causes information loss when using the automated deletion routine and the rename of an account in the Support Engineer interface. It's OK if users that haven't been assured yet change their name in their account on their own, it's just the support/admin interface that is affected.
>
> So as an immediate measure please stop using the automated deletion and renaming in the support engineer interface until we know what to do.
>
> @OTRS: I have filed a Dispute for that issue. Please merge this email to the other one I just sent containing the request for dispute.
Discovery
Containment of the problem
- 2014-01-15
R (in his role as software assessor) asked critical team to install two scripts together with the patch for bug #1135. The patch was ready to go at this time, but one of the scripts should not have been installed yet, as it changed the DB structure to prepare for the patch of bug #1138, which was not ready. Because of this, the records of changes to the name and date of birth (DoB) fields were lost if the changes were made before #1138 got installed.
Critical team installed the script together with the patch for #1135 without problems. There is no reason to assume that they should have encountered any problems. According to R there would not have been any error messages to observe for critical team even when those fields were changed by support actions as "the return code [was] not checked and never used so it may be that there was no error message" before the patch to #1138 was installed.
- 2014-03-22
R detected his mistake and filed the dispute to this case.
A issued the intermediate ruling, so that there should not be further changes or deletions by support to the affected fields until it is ensured that the issue was fixed by the installation of another software patch.
This did not cover name and DoB changes made by the members themselves. As the software team was busy working at high priority on the patch to solve the original issue, it did not make sense to ask them to prepare something to cover those changes instead of fixing the issue itself. Moreover, members can only make such changes themselves as long as the account has no assurance. At that point the fields have not been verified at all, so a change means little one way or the other, as neither the original nor the changed value is reliable. There is also probably little reason for members to change the fields themselves while no assurances play a role.
- 2014-06-07
Critical team installed the needed patches to fix the issue as asked by another software assessor.
That software assessor also asked critical team to "coordinate with Support to verify information on previously affected operations are now properly recorded by the system".
- 2014-06-10
R attested that the issue was fixed by the installation of the patches on 2014-06-07 and that it should be safe to change or delete entries for the affected fields again.
After some discussion with the internal auditor, the software team and the CM about how to proceed, A thinks that the block on changing names/DoBs or deleting accounts can cautiously be removed.
Support had previously suggested a procedure for checking whether the affected operations are properly recorded by the system. This included "Critical [to] check the results in the database and gives the result back to support".
There have to be good reasons to allow such a deep inspection of accounts. Verifying that a patch is good enough to be run on the production server should not be one of those reasons, at least not without other evidence that there really is such an issue.
If such a detailed check were needed, the patch should not be considered sufficiently verified to be installed on the production system at all. In that case the corresponding tests should first be done on a test server set to the state the production server was in after #1135 was installed.
Otherwise the current patches have to be considered secure and good enough to be executed without such a detailed inspection of the account of at least one member (or ex-member).
- 2014-06-11
At a live session between R (as software assessor), another software assessor (Benny) and A, both software assessors assured A that in their view the patches should not need such a deep check, even though they should be installed and monitored with care, as should be the case after every change to the software on a production server. It was also clarified that the instruction to the critical team was meant only as a precaution.
- 2014-06-15
A allowed individual, cautious executions of name/DoB changes and account deletions, which were to be monitored. Support was allowed to do this first on a real account of their own, with correct data. During those executions the critical team encountered an error message, which the software team later declared to be unrelated (and known).
- 2014-06-28
The execution of name/DoB changes and account deletions was declared to be safe again, and support was told that they were allowed to execute them as usual.
- 2014-07-20
Critical team executed sql-queries that inserted the missing entries into the DB, according to the information provided by support team. The queries had the OK of 2 SAs.
All issues resulting from the early execution of the script should be fixed now. There is no need to inform the affected members / ex-members, as the data loss did not affect the members or their accounts but only the records for the audit trail.
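The silent failure described under 2014-01-15 can be illustrated with a small sketch. It is not CAcert's code: the table layout, the column names and all helper functions are hypothetical, and Python with an in-memory SQLite database stands in for the real setup. It only shows how an audit-log insert whose result is never checked can fail unnoticed once the table no longer matches the code.

```python
import sqlite3


def make_migrated_db():
    """Hypothetical adminlog table AFTER the early migration: it already has a
    new mandatory column that the not-yet-patched code does not know about."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE adminlog (admin TEXT, action TEXT, kind TEXT NOT NULL)"
    )
    return conn


def run_query(conn, sql, params=()):
    """Return True on success and False on failure instead of raising,
    mimicking an API whose result the caller has to check."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.Error:
        return False


# Insert statement as the old (unpatched) code would issue it (assumption).
OLD_INSERT = "INSERT INTO adminlog (admin, action) VALUES (?, ?)"


def log_change_unchecked(conn, admin, action):
    # Result ignored: a failed insert produces no error and no log entry.
    run_query(conn, OLD_INSERT, (admin, action))


def log_change_checked(conn, admin, action):
    # Result checked: the missing audit entry becomes visible immediately.
    if not run_query(conn, OLD_INSERT, (admin, action)):
        raise RuntimeError("audit log entry could not be written")
```

With the unchecked variant the support action appears to succeed while the audit trail stays empty. The checked variant would have surfaced the problem on the first affected name change or account deletion.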
need for corrective actions
interview of R
R was asked to answer the following questions. He gave the following answers:
- After the patch for #1135 was applied (with the scripts), there should have been some error messages if someone would have tried to change a name or delete an account. Is there anything else that normally causes this kind of error message?
I don't think there will be an error message at all. The critical part happens in line 912 of the notary.inc.php in the function account_delete(). There an entry should be created in the adminlog table which was changed by the script that was accidentally executed but the return code is not checked and never used so it may be that there was no error message.
- Can you explain why you accidentally handed over the script too soon to critical team?
Because there were two scripts in the same bug tracker item and there were two months between the last time I checked the script and when I sent it out. So I didn't remember at that time when I sent it out that only the first one should have been executed.
- Was there anything that could have indicated to critical team, that the script should not have been applied at this time or that there were some problems with the script at that time?
I can't see how the critical team should have noticed this, there was no big note when executing the script or something. There was no problem with the script itself but it should have been kept with the code that needed it instead of being mangled with the other one.
- Is there anything that could prevent that software hands over scripts or anything else to critical team too soon? Or anything else that you can think about to prevent such a mistake in the future?
One thing we should do is keep changes not related by the part that needs changing (in this case the database scheme) but the logical relation. In this case the version4.sh script should have been in the related bug-1138.
- How did you come to see that you had done a mistake?
I checked in the release state of another database migration script and noticed that it hadn't been applied on the test server yet. As I wanted to execute it on the test server it failed because the previous script which was the one in question for this arbitration case hadn't been run (a safe-guard I put in the database migration mechanism). When I checked why it hadn't been run I discovered my previous error.
- Mistakes occur, but we try to reduce them with a 4 eyes principle - was there anything that could have told other SAs that there was a script applied too early?
Maybe if they remembered correctly and observed the mails to the critical team. But for them too some time had gone by.
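The safeguard R mentions in his answers, a migration script refusing to run when its predecessor has not been applied, can be sketched roughly as follows. This is a minimal illustration under assumptions: the `schema_version` table, the function names and the use of SQLite are hypothetical and are not taken from the CAcert migration mechanism.

```python
import sqlite3


class MigrationOrderError(Exception):
    """Raised when a migration is applied out of sequence."""


def new_db():
    # Hypothetical bookkeeping table holding the applied migration versions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schema_version (version INTEGER PRIMARY KEY)")
    return conn


def current_version(conn):
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] if row[0] is not None else 0


def apply_migration(conn, version, statements):
    """Apply migration `version` only if `version - 1` is already in place."""
    have = current_version(conn)
    if have != version - 1:
        raise MigrationOrderError(
            f"migration {version} needs schema version {version - 1}, found {have}"
        )
    with conn:  # commits on success, rolls back if a statement fails
        for statement in statements:
            conn.execute(statement)
        conn.execute(
            "INSERT INTO schema_version (version) VALUES (?)", (version,)
        )
```

A guard of this kind detects a missing predecessor, which is how the mistake was found on the test server. On its own it does not prevent a script from being run in the correct order but ahead of the code that needs it, which is what happened in this case.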
Deduction
- The problem was caused by a mistake and not deliberately.
- Critical team, who should monitor the logs, could not have detected a problem. For them everything should have looked OK.
- Support team could not have detected the data loss, as the changes were not visible at the frontend at that time.
- The problem was caused by software team because they had not split up the scripts between the bug entries in the bugtracker according to the contexts in which they should be executed.
- The 4-eyes principle did not help here, because mails from software team to critical team do not need to be reviewed; evidently the mail in question was not thoroughly checked by other SAs in this case.
The current process does not require such mails to be checked by other SAs. The scripts themselves had probably passed their review. Nothing was in place to check for the correct time of execution. This is currently only defined in the mails sent to critical team by the software team.
An interview with another SA revealed that the bugtracker in use does not allow a bug to be set to the status "ready to deploy" if it has a dependency on another bug that is not at least in the status "ready to deploy". This could be used as a safeguard, which would be seen not only by every software assessor but also by critical team. A minor reconfiguration of the settings may be needed for this.
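The dependency rule described by the other SA can be modelled in a few lines. The sketch below is not the bugtracker's actual API or configuration: the class, the function names, the bug ids and all status names other than "ready to deploy" are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass, field

# Hypothetical status ladder; only "ready to deploy" is named in the case.
STATUS_ORDER = ["new", "in review", "ready to deploy", "deployed"]


@dataclass
class Bug:
    bug_id: int
    status: str = "new"
    depends_on: list = field(default_factory=list)  # ids of prerequisite bugs


def blocking_dependencies(bug, tracker):
    """Return ids of dependencies that are not yet at least 'ready to deploy'."""
    needed = STATUS_ORDER.index("ready to deploy")
    return [
        dep_id
        for dep_id in bug.depends_on
        if STATUS_ORDER.index(tracker[dep_id].status) < needed
    ]


def mark_ready_to_deploy(bug, tracker):
    """Set the status only when every dependency has reached it as well."""
    blockers = blocking_dependencies(bug, tracker)
    if blockers:
        raise ValueError(f"bug {bug.bug_id} is blocked by {blockers}")
    bug.status = "ready to deploy"
```

For the rule to act as a safeguard, a script has to be filed under the bug whose code needs it and the dependency between the bugs has to be recorded, so that the blocked status is visible to both software assessors and critical team.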
Rulings
Intermediate Ruling I
- Support should not execute any name changes or delete account actions until notified to do so again by an arbitrator with a reference to a20140322.1. Name change or delete account support cases should be processed up to the point of execution and then put on hold (with a possible information of affected users) so that they can be executed when it is safe to execute them again without data loss.
- Support should also collect all cases of name changes and account deletions that were executed by support since 2014-01-15 (including) and report them to the arbitrator and case manager of a20140322.1. If this report contains personal information it should be done encrypted.
- Software team should give any needed or possible fix for the problem that arose by the wrong execution of the scripts at the application of bug 1135 (see dispute of a20140322.1) a high priority, so that it can be executed as soon as possible. If no security issues are detected, this should not be done by an emergency process. The arbitrator of a20140322.1 should be informed about any major steps or problems in this context.
-- Cologne, 2014-03-24
Intermediate Ruling II
Support should be allowed to execute name and DoB changes and account deletions, again.
They should select one case if possible of each kind and execute them. After the execution they should take a look at the affected account in the support interface and check if everything looks like it should be. They should also inform critical team about the execution and ask critical team to take a close look at the according log files.
If they feel the need, support may create a real (additional) account for a support member with correct data that may be deleted without any normal delays or mails normally needed based on a20111128.3.
Both teams should report the results back to A and CM of this case, especially if anything unusual or unexpected was detected, that may be related to this case.
If there is no indication that there remains an issue support should be allowed to execute all other cases of name/DoB changes or account deletions again as usual.
-- Kiel, 2014-06-15
Partial ruling
Arbitration was provided by the support team with a list of entries for name/DoB-changes and account deletions, missing in the database because of the early execution of a script together with the patch for bug # 1135.
The list also contains sql-queries for each of those cases to add the missing entries, which follow a pattern confirmed by 2 software assessors.
The Arbitrator should provide critical team with this list in an encrypted mail.
Critical team afterwards should execute those queries and report the results back to the Arbitrator and Case Manager of this case, again in an encrypted mail.
-- Kiel, 2014-07-19
Final ruling
The issue was caused by a mistake, because two scripts that should not be installed at the same time were handled in the same bugtracker entry. It is fixed and the data is restored.
The software team processes at that time were followed correctly, but could not prevent such a mistake.
A Software Assessor made a proposal to change some settings within the bugtracker that may help to improve this situation.
If not already done, software team is advised to consider the proposed settings or to find other ways to improve their processes to prevent this kind of mistake in the future.
-- Münster, 2015-01-30
Execution
- 2014-03-24 (A): intermediate ruling I, sent to R, support, CM
- 2014-05-11 (A): reminds support to send requested collection of cases
- 2014-06-15 (A): intermediate ruling II sent to C, support, critical team, CM
- 2014-06-15 (Support): provides file with collection of cases, including suggestions of sql-queries for the needed entries to correct the DB
- 2014-06-15 (Support): reports about tests done with a new account
- 2014-06-16 (Critical): reports about an error detected in the log at the end of the reported time frame
- 2014-06-16 (Support): asks a software assessor about the error
- 2014-06-18 (A): asks the same software assessor if the error is related to the patches touched by this case
- 2014-06-24 (SA): declares that the error is unrelated (VoIP, chat, in the presence of A and a Support team member)
- 2014-06-28 (A): informs support that the software assessor had declared the error as unrelated to the patches discussed in this case and states that the execution of name/DoB changes or account deletions should be allowed again
- 2014-07-19 (A): partial ruling sent to C, critical team, CM
- 2014-07-19 (A): sent list with sql-queries encrypted to critical team
- 2014-07-20 (critical): executed sql-queries, sent report encrypted to A, CM
- 2015-01-30 (A): final ruling (partial ruling 2)
- 2015-02-18 (CM): The case is closed now.
Similar Cases