Instrumenting and Observing Micro-services Part 1: What do you expect from your micro-service?

A friend of mine tells a great story of a team avoiding a great deal of grief. All of their system health checks were green, but the live graph of purchases dropped to zero and stayed there. Despite the many positive system indicators, the team could see they had a problem and were able to react quickly to find and fix it. It turned out that user purchases was a key indicator of success.
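
As a rough illustration of watching a business indicator alongside system health checks, here is a minimal sketch in Python. The function name, the per-minute sampling and the five-sample window are all assumptions made for this example; the story does not say how the team's graph or alerting was actually built.

```python
from typing import Sequence


def purchases_flatlined(counts_per_minute: Sequence[int], window: int = 5) -> bool:
    """Return True when the most recent `window` samples are all zero.

    `counts_per_minute` is a hypothetical series of purchase counts,
    oldest first. With fewer than `window` samples there is not enough
    data to decide, so the function returns False.
    """
    if len(counts_per_minute) < window:
        return False
    return all(count == 0 for count in counts_per_minute[-window:])


# Health checks can all be green while this indicator still raises the alarm.
recent = [12, 9, 0, 0, 0, 0, 0]
if purchases_flatlined(recent):
    print("ALERT: no purchases in the last 5 samples")
```

A real check would probably also account for quiet periods such as overnight traffic, which this sketch ignores.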


TES Hack Days

Hack days are an interesting idea based on the 15% time from 3M in 1948. The fundamental goal for hack days is to empower engineers to solve the problems they see, however they want to. Without the day-to-day delivery pressure, they can solve problems that other people don’t even realize they have, or solve them in ways that are incredibly creative.


Outage on 25th April - the impact of a misconfigured redis server

On Tuesday evening, after the launch of the new home page, we had a second set of performance problems that impacted the entire tes.com site around 6pm for 40 minutes, and then again during two periods at 9pm and 12am. The root cause turned out to be a misconfigured Redis caching server that we had moved to in response to the issues on the 24th of April. During the post-mortem of the issues the day before we had agreed that a key action was to upgrade and improve the monitoring of the part of our platform that does the composition of the shared fragments (e.g.


Outage on 24th April - how one inefficient piece of code impacted tes.com

We had a number of site-related performance issues on Monday 24th April that impacted the entirety of tes.com. The fix (as always) was deceptively simple, and resulted in average response times dropping from 100ms to 10ms and CPU usage on the server reducing by almost 400%. As part of the rebrand we have been rebuilding the services that supply shared assets to all parts of our platform, which include core styles, images and the fragments of HTML for the masthead, footer and left-hand navigation rail.


Debugging with the MongoDB oplog

Debugging allows us developers to assume the role of detective, and like any good detective, we need to consult all of our sources to understand what’s going on. If your application uses MongoDB for persistence, one source you have available is the oplog. What is the oplog? The MongoDB oplog, or operations log, is a standard capped MongoDB collection. Each document in the collection is a record of a write operation (a delete, update or insertion) that has resulted in data being changed.
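
To make the idea concrete, here is a small Python sketch of how oplog entries could be filtered and summarised. The `op`, `ns`, `o` and `o2` fields are the ones the oplog uses; the helper names, the `shop.orders` namespace and the sample documents are hypothetical and only there for illustration. The exact shape of the `o` field on updates varies between MongoDB versions, so the sample is indicative only.

```python
from typing import Any, Dict, Iterable, List

# Single-letter codes found in the "op" field of oplog entries for
# operations that change data.
WRITE_OPS = {"i": "insert", "u": "update", "d": "delete"}


def oplog_filter(namespace: str) -> Dict[str, Any]:
    """Build a query for write operations on one "database.collection" namespace."""
    return {"ns": namespace, "op": {"$in": sorted(WRITE_OPS)}}


def summarise(entries: Iterable[Dict[str, Any]]) -> List[str]:
    """Turn raw oplog documents into short human-readable lines."""
    lines = []
    for entry in entries:
        kind = WRITE_OPS.get(entry.get("op"), "other")
        # Inserts and deletes carry the _id in "o"; updates carry it in "o2".
        target = entry.get("o2") or entry.get("o", {})
        lines.append(f"{kind} on {entry.get('ns')} _id={target.get('_id')}")
    return lines


# Hypothetical sample entries, shaped like oplog documents, for illustration only.
sample = [
    {"op": "i", "ns": "shop.orders", "o": {"_id": 1, "total": 20}},
    {"op": "u", "ns": "shop.orders", "o2": {"_id": 1}, "o": {"$set": {"total": 25}}},
    {"op": "d", "ns": "shop.orders", "o": {"_id": 1}},
]
print("\n".join(summarise(sample)))
```

Against a real replica set, and assuming PyMongo is installed, the same filter could be passed to `MongoClient(uri).local["oplog.rs"].find(oplog_filter("shop.orders"))`, since the oplog lives in the `local` database as the `oplog.rs` collection.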


React, or no framework?

This post is personal opinion rather than representative of the Tes development team in general. If you ask three other developers at Tes you’ll most likely get three different answers (and maybe three more blog posts after this one!). React has been the JavaScript framework of choice at Tes for the last couple of years, recently paired with Redux. Dozens of apps have been built in a variety of styles. Opinion across the whole development team has varied, from ‘Reactify the world’ to ‘use only if strictly necessary’.


Accelerated Mobile Pages

Accelerated Mobile Pages are the normal stuff of the web in many respects - publicly reachable via a URL, viewable in any browser, built with HTML, CSS and JavaScript. The difference is, they load almost instantly on mobile - music to the ears of anyone who's waited 10 seconds for a page to load over 3G (all of us, then). More information is on the AMP project website. The payoff if you make an AMP version of your page is SEO - AMP pages can be served higher up Google search results, for example in the carousel of article stories that you sometimes see when googling for news.


Page load speed (part 2) - faster images

This is the second in a series of posts about improving page performance. Part 1 discussed what we're measuring and how. A video of me talking about the performance issues discussed in this post. The problem For the job details page, we accept banners supplied by schools which aren't compressed as well as they could be. Large images don't block the rendering of the main content; however, they hog bandwidth, especially on mobile.


Improving page load speed at Tes (part 1)

A video of me talking about the performance issues discussed in this post. How are we defining 'page load speed'? How quickly the user can see and interact with core page content after they navigate. Non-core content could be adverts, their user avatar, or recommended links. It's important they appear as quickly as possible but they're not the main reason the user navigated to the page. Where are the biggest gains to be made?


Secure file uploads with redux-plupload, ClamAV and S3

We have recently added a new feature that allows a user to upload a file from our webpage. We implemented this using redux-plupload, ClamAV and S3 to satisfy the following requirements: the file should be uploaded from the client, to avoid excessive memory use on the server while streaming files; the upload must be secure and the file must be stored securely (and ideally encrypted at rest); the file should be virus-free so that it can be downloaded without worry.