We wanted to provide some more information before we marked this incident as closed. On 21/07/2021 at approximately 09:17 customers may have experienced a brief disruption of our services.
The servers which run our services live in two data centres on opposite sides of London. Under normal circumstances there is a reasonable amount of network traffic going between the two data centres. At 09:17 this morning the supplier who provides the primary connection between the two data centres suffered an issue which caused traffic to be temporarily dropped.
When traffic began to be dropped, the network started to converge automatically and send traffic over the secondary link between the two data centres. Unfortunately, because the primary link was not marked as down but was instead silently dropping traffic, this resulted in an influx of traffic bouncing between the two data centres. This extra traffic delayed the network's convergence.
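As a rough illustration of why a link that silently drops traffic is slower to route around than one that is marked as down, the sketch below models failure detection with periodic keepalive probes. It is a simplified model under stated assumptions, not a description of the actual network configuration: the `Link` class, the `detection_delay_ms` function, and the timer values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical model of an inter-site link."""
    name: str
    carrier_up: bool   # what the interface itself reports
    forwarding: bool   # whether packets actually reach the far end


def detection_delay_ms(link: Link, probe_interval_ms: int,
                       missed_probes_allowed: int) -> Optional[int]:
    """Return how long the failure takes to be detected, or None if healthy.

    Assumption: a link that reports loss of carrier is detected at once,
    while a link that stays "up" but drops traffic is only detected after
    the allowed number of consecutive keepalive probes have been missed.
    """
    if link.carrier_up and link.forwarding:
        return None  # nothing to detect
    if not link.carrier_up:
        return 0     # interface is marked down, failover can start immediately
    # Silent drop: the interface looks fine, so only missed probes reveal it.
    return probe_interval_ms * missed_probes_allowed


# Illustrative values only.
marked_down = Link("primary", carrier_up=False, forwarding=False)
silent_drop = Link("primary", carrier_up=True, forwarding=False)

print(detection_delay_ms(marked_down, probe_interval_ms=1000, missed_probes_allowed=3))
print(detection_delay_ms(silent_drop, probe_interval_ms=1000, missed_probes_allowed=3))
```

In this model the silently failing link keeps attracting traffic for the whole detection window, which is the period during which traffic can bounce between the sites.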
At approximately 09:19 all network traffic was passing again as expected. Once the primary link had recovered, traffic automatically began using this path again.
We did have reports of a few customers being unable to connect to some applications until around 12:00. We believe this to be an application issue which was triggered by the events on the network; the issues after 09:19 were not caused directly by network traffic being dropped. We are troubleshooting these issues at the moment.
Now that we have an understanding of this morning's issues, we’ll be reviewing the network to see whether anything can be done to stop a similar issue happening again.
We’re always striving to keep the platform as stable as possible and would like to apologise for any inconvenience caused this morning.
UPDATE: July 21 09:26
We have been investigating logs and can see there was a network interruption this morning between 09:17 and 09:19. Service should have been operational again from 09:19 for the majority of users; however, we were seeing reports of some users not being able to log in to some applications until around 09:43. We are continuing to investigate with our vendors and will provide a further update later today. Service should be operating as expected at the moment.
We are currently investigating an incident which occurred at 09:17 this morning. Further updates to follow.