Bravo/Mach 4 Network Advisory:
Recovery Status
All times are in Bravo HQ's time zone, EDT (GMT-4), although the server is located in CDT. UK clients: add 5 hours for BST.
Bravo Incident Clock notation: {total elapsed time / business hours}
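As a rough sketch of how the {elapsed / business} pair could be computed: the snippet below assumes business hours are 9:00 to 17:00, Monday to Friday, in EDT (the advisory does not define them), and truncates elapsed time to whole hours. The function name incident_clock and the sample date are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumption (not stated in the advisory): business hours are 9:00-17:00 EDT, Monday-Friday.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def incident_clock(start: datetime, now: datetime) -> tuple[int, float]:
    """Return (whole elapsed hours, business hours rounded to 0.1) between start and now."""
    # Total elapsed time, truncated to whole hours.
    elapsed_hours = int((now - start).total_seconds() // 3600)

    # Sum the overlap between [start, now] and each weekday's business window.
    business = timedelta()
    day = start.date()
    while day <= now.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4; weekends count as zero
            window_open = datetime.combine(day, BUSINESS_START)
            window_close = datetime.combine(day, BUSINESS_END)
            overlap_start = max(start, window_open)
            overlap_end = min(now, window_close)
            if overlap_end > overlap_start:
                business += overlap_end - overlap_start
        day += timedelta(days=1)

    return elapsed_hours, round(business.total_seconds() / 3600, 1)


if __name__ == "__main__":
    # Hypothetical Saturday 5:55pm start, checked at Sunday 6:30am.
    start = datetime(2008, 5, 31, 17, 55)
    print(incident_clock(start, datetime(2008, 6, 1, 6, 30)))
```

Under these assumptions, an incident starting Saturday 5:55pm reads {12/0} at Sunday 6:30am, which matches the Sunday 6:30am entry below. Weekend hours count toward elapsed time only.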
Press Ctrl+Refresh in your browser to see the latest updates.
Thursday 5:20am {13/1.2}
FELI in Houston, TX resumes normal operations. Incident concluded. We will report after the diagnostics and observation period.
Thursday 2:22am
Datacenter unable to make significant progress. Response temporarily hampered by an exhausted team. Also, the permanent cut-over to new power has been postponed to June 14 (tentative).
SEE: YOUR OPTIONS
Wednesday 3:42pm
Datacenter experiencing a DNS issue. Expected downtime: 2 to 4 hours. This issue is relatively minor, unlike the previous one, which was monumental and required a rebuild of physical infrastructure.
Tuesday 6:48pm {15/8.0}
FELI in Houston, TX resumes normal operations. Server diagnostics passed. Incident concluded. Post-incident report to follow.
Tuesday 6:11pm
Generator load test in progress. Service will likely be restored within an hour, pending a successful load test.
Tuesday 4:19pm
Houston still down. The new generator has arrived at the datacenter. Installation completed; fuel transfer started.
Tuesday 12:42pm
Status of our redundant server added to the main page, as some clients have elected to switch over to it. Replacing the generator breaker unit did not solve the problem. A new generator is en route to the datacenter, ETA 2 hours. Deployment will take a few more hours.
Tuesday 3:45am
Apparently, a faulty sensor on the H1 Phase 1 generator tripped the breaker, cutting power to the entire H1 Phase 1 zone and affecting 4,000+ servers. The sensors were replaced and everything was back up within 15 minutes, but it failed again within an hour. Pursuing replacement of the entire breaker unit.
Monday 3:50pm {42/6.8}
FELI in Houston, TX resumes normal operations. Extensive diagnostics and assessment in progress. Post-incident report to follow. Currently running on a temporary generator while the permanent underground infrastructure is being rebuilt. There will be another scheduled downtime of approx. 8 hours in about 1 week's time, to be announced shortly.
Monday 10:19am
Datacenter reported that 90% of Phase 2 (2nd floor) servers have been reinstated successfully. Construction work to restore power to Phase 1 (where our server is) is on track so far. Our ETA remains end of today or early tomorrow.
Monday 5:37am {35/0}
H1 Phase 1 zone nameservers being reconstructed. That section could very likely be up by end of day. Temporary power infrastructure is expected to be in place today. There will be another brief scheduled outage, in about 1 week's time, to install the permanent structure.
SEE: YOUR OPTIONS
Monday 1:55am
Fire marshal just gave the green light to start the generator. HVAC initiated (2 hrs). Phase 2 zone racks being restored (4 hrs). Normal traffic convergence expected. Phase 1 must wait for additional emergency repairs.
Monday 12:12am
2nd official report. All progressing well.
Sunday 6:30am {12/0}
Datacenter personnel allowed inside the facility. Damage found to be worse than initially thought. Phase 2 zone (2nd floor) suffered no physical damage. Phase 1 zone (where our server is located) suffered severe building damage from the initial explosion and requires a complete rebuild of utility power wiring.
Sunday 2:10am
Bravo fully functional: email, VPN, download center, monitoring, dropbox, and DNS nameservers.
Sunday 12:40am {6/0}
Bravo activates contingency plan and escalates status to: extended outage. Reroute critical in-house sites to our redundant server to preserve full support infrastructure. Review client data. Plan capacity and rehearse procedures. Prioritize schedule and tasks. Notify tier-1 clients and resellers/consultants. Establish web portal for incident.
Saturday 11:46pm
First official report from the datacenter after an emergency management meeting, outlining recovery plans and timeline.
Saturday 9:17pm
Assurance that servers were not damaged or lost. Complete power outage. Per fire department order, backup generators cannot be started pending investigation.
Saturday 7:29pm
Initial report from datacenter. Cause: explosion and fire. Building evacuated.
Saturday 6:05pm
Assess technical status. Contact datacenter (The Planet).
Saturday 6:01pm
I was first alerted to the outage by our monitoring alarm.
Saturday 5:55pm {0/0}
Explosion and fire.