Show My Homework - Message from Chief Technology Officer

As promised, I wanted to follow up with a message from our Chief Technology Officer (CTO) outlining the details of the outage and the steps we have taken to rectify the issue.

Message from the CTO

I wanted to explain why the site was unavailable for over 7 hours on Monday 11th September 2017 and for 4 hours on Tuesday 12th September 2017.

Cause

Given that the vast majority of schools have returned over the past week or so, the Engineering team at Satchel anticipated increasing activity on our website yesterday, and examination of activity on the site both for last week and for last year suggested we had sufficient capacity within our infrastructure to accommodate expected traffic.

However, the amount and scale of user activity on Show My Homework on the afternoon of 11th September caused our database performance to degrade to the point where we were no longer able to serve any information to users. Furthermore, as performance slowed, users could not log in either to the website or to our mobile applications. In short, part of our database infrastructure ground to a standstill. The changes we made on the 11th to overcome the performance issues proved insufficient, as witnessed by the outage on the 12th.

Resolution

Underpinning Show My Homework is a collection of API Services which ultimately reference information inside the Show My Homework database. One of the challenges for any heavily used site like ours is to figure out the infrastructure needed to run these services in such a way that they respond quickly to user requests whilst at the same time not overwhelming our database. Our existing configuration suggested we had the balance between user activity, available API services and our database capacity correct; clearly, however, we got this wrong.

In brief, the changes made over the course of the two outages were:

  • We introduced a further three read replicas on our MySQL database to handle incoming reads. This effectively means we can now handle substantially more database queries at a given point in time compared with the available capacity at the start of the week. Part of the delay experienced by users was in getting these built and made ready for use by the site.
  • We increased the infrastructure running the API services by 50%, again allowing us to handle more user requests from the website.

At no time was any data lost or compromised.

Post-Mortem

Despite our planning, we underestimated the amount of activity we were likely to receive, especially when schools started the new academic year. Therefore, the key lesson learnt by Satchel is that our assessment of available operational capacity was insufficient. In addition, whilst we have done much over the past year to improve site performance, there are still parts of the website that need to be tuned to improve performance and reduce the occurrence of issues such as those we experienced yesterday. I will work with the team to ensure this optimisation work continues.

I believe what we now have in place will be sufficient for our needs over the coming months, but longer term, as we continue to grow our user base and product line, we recognise that we need a more scalable solution. This is something I will ensure the Engineering team puts in place well in advance of September 2018.

Again, please accept our apologies for yesterday’s outage; Satchel appreciates that at this early point in the academic year, the problems with Show My Homework were unacceptable to all our users. The Engineering team here will continue to work to improve matters and ensure there is no repeat of this issue.

Steve Westwood, CTO, Satchel, 13th September 2017.

 

I appreciate that this is a technical overview of what has happened, which is not always useful when you have to explain this to a parent. We have included it because many of our customers requested it and we feel that we have a duty to be transparent with you.

However, I have also included a link here to a summary which you are able to share with parents should you need to.

In addition, we will be contacting parents with an apology on their SMHW app today to ensure they know that this was a fault with Show My Homework and not with the school.

Again, if there’s anything else I can do to assist in any way, please do not hesitate to get in touch with me directly.


Sincerely,
Danni O’Mahoney
Head of Account Management
Show My Homework, Satchel
www.teamsatchel.com