From the desk of: Sam C. Chan

Bravo/Mach 4 Network Advisory: Recovery Status

All times are in the Bravo HQ time zone, EDT (GMT-4), although the server is located in CDT. UK clients: add 5 hours for BST.
Bravo Incident Clock notation: { total elapsed time / business hours }
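
For anyone who wants to reproduce the incident clock figures, here is a minimal sketch. The business window of 9am to 5pm, Monday to Friday, is an assumption and is not stated in this advisory. The function name incident_clock and the calendar date in the example are also hypothetical, chosen only for illustration.

```python
from datetime import datetime, time, timedelta

# Assumed business window (not defined in the advisory): 9am-5pm, Mon-Fri.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def incident_clock(start, now):
    """Return (total elapsed hours, business hours) between two datetimes."""
    total = (now - start).total_seconds() / 3600
    business = 0.0
    day = start.date()
    while day <= now.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            window_open = datetime.combine(day, BUSINESS_START)
            window_close = datetime.combine(day, BUSINESS_END)
            # Count only the part of the outage that overlaps the window.
            overlap_start = max(start, window_open)
            overlap_end = min(now, window_close)
            if overlap_end > overlap_start:
                business += (overlap_end - overlap_start).total_seconds() / 3600
        day += timedelta(days=1)
    return total, business


# Example: an outage from 3:45am to 6:48pm on a Tuesday (date is illustrative).
total, business = incident_clock(
    datetime(2008, 6, 3, 3, 45), datetime(2008, 6, 3, 18, 48)
)
print(f"{{{total:.0f}/{business:.1f}}}")  # prints {15/8.0}
```

Under these assumptions the example matches the {15/8.0} shown for the Tuesday 6:48pm entry. Other entries may differ slightly depending on how the figures were rounded.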

 Hold Ctrl and click Refresh in your browser to see the latest update.

Thursday 5:20am  {13/1.2}   FELI in Houston, TX resumes normal operations. Incident concluded. Will report after diagnostics and observation period.

Thursday 2:22am  Datacenter unable to make significant progress. Response temporarily hampered by an exhausted team. Also, permanent cut-over to new power postponed to June 14 (tentative).  SEE: YOUR OPTIONS

Wednesday 3:42pm   Datacenter experiencing DNS issue. Expected downtime 2 to 4 hours. This is relatively minor, unlike the previous outage, which was monumental and required a physical infrastructure rebuild.

Tuesday 6:48pm {15/8.0}   FELI in Houston, TX resumes normal operations. Server diagnostics passed. Incident concluded. Post-incident report to follow.

Tuesday 6:11pm  Generator load test in progress. Service will likely be restored within an hour, pending successful load test.

Tuesday 4:19pm  Houston still down. New generator arrived at datacenter. Installation completed. Fuel transfer started.

Tuesday 12:42pm   Status of our redundant server added to main page, as some clients have elected to switch over there. Replacement of the generator breaker unit did not solve the problem. New generator en route to datacenter, ETA 2 hours. Deployment will take a few more hours.

Tuesday 3:45am   Apparently, a faulty sensor on the H1 Phase 1 generator tripped the breaker, cutting power to the entire H1 Phase 1 zone and affecting 4000+ servers. Sensors were replaced and everything was up within 15 minutes, but power failed again within an hour. Pursuing replacement of the entire breaker unit.

Monday 3:50pm {42/6.8}  FELI in Houston, TX resumes normal operations. Extensive diagnostics and assessment in progress. Post-incident report to follow. Currently running on a temporary generator while the permanent underground infrastructure is being rebuilt. There will be another scheduled downtime of approx. 8 hours in about 1 week's time. To be announced shortly.

Monday 10:19am   Datacenter reported 90% of Phase 2 (2nd floor) servers reinstated successfully. Construction work to restore power to Phase 1 (where our server is) is on track so far. ETA for us remains at end-of-today or early tomorrow. 

Monday 5:37am {35/0} H1 Phase 1 zone nameservers being reconstructed. It is very likely that this section will be up by end-of-day. Temporary power infrastructure expected to be in place today. There will be another brief scheduled outage to install the permanent structure, in about 1 week's time.  SEE: YOUR OPTIONS

Monday 1:55am   Fire marshal just gave green light to start generator. HVAC initiated (2 hrs). Phase 2 zone racks being restored (4 hrs). Normal traffic convergence expected. Phase 1 must wait for additional emergency repairs.

Monday 12:12am   2nd official report. All progressing well.

Sunday 6:30am {12/0}   Datacenter personnel allowed inside facility. Damage found to be worse than initially thought. Phase 2 zone (2nd floor) suffered no physical damage. Phase 1 zone (where our server is located) suffered severe building damage from the initial explosion and requires a complete rebuild of the utility power wiring.

Sunday 2:10am   Bravo fully functional: email, VPN, download center, monitoring, dropbox, and DNS nameservers.

Sunday 12:40am {6/0}  Bravo activates contingency plan and escalates status to extended outage. Reroute critical in-house sites to our redundant server to preserve full support infrastructure. Review client data. Plan capacity and rehearse procedures. Prioritize schedule and tasks. Notify tier-1 clients and resellers/consultants. Establish web portal for incident.

Saturday 11:46pm   First official report from datacenter after management emergency meeting. Outline recovery plans and timeline.

Saturday 9:17pm   Assurance that servers were not damaged or lost. Complete power outage. Per fire dept. order, cannot start backup generators, pending investigation.

Saturday 7:29pm   Initial report from datacenter. Cause: Explosion and fire. Building evacuated.

Saturday 6:05pm   Assess technical status. Contact datacenter (The Planet).

Saturday 6:01pm   I was first alerted to the outage by our monitoring alarm.

Saturday 5:55pm {0/0}  Explosion and fire.


Copyright © 2005-2008 Bravo Technology Center