Between 22:14 - 22:49 AEST 18/07/2020 we experienced a hardware fault on a firewall cluster which resulted in a 35 minute outage to our shared and reseller hosting infrastructure.
Our engineers were alerted immediately and troubleshooting began, it took some time to find the root cause.
The root cause was that the active firewall node at the time had a partial XFP transceiver failure where it was in a half-broken state which was not enough to trigger an automatic failover to the redundant node. Engineers performed a manual failover to the secondary firewall node and traffic was restored.
The faulty XFP in question was replaced and redundancy has been restored.
We will be reviewing our architecture and practises to see how we can avoid such an outage from occurring again.
We are sorry for any inconvenience caused.