Over the past few weeks, you may have noticed that your Selenium tests on CrossBrowserTesting are running faster and with more consistent timing. This isn’t a fluke or random variance. We made significant changes to the way our Selenium Hub works in our infrastructure that greatly reduced the time it takes to execute each Selenium command. This change, along with some reworked back-end load balancing, means shorter overall test run times as well as less variation between test runs.
By moving our Selenium Hub infrastructure to Kubernetes, we have not only improved our performance (quite drastically, as you’ll see), but also made it easier to update, troubleshoot, and scale.
Before the change, our internal load test timings on our Selenium Hub were good, but not where we wanted them to be. With our new Selenium Hub, we not only hit our performance mark under load, but are also now delivering a much better experience to the regular users of our service.
You can probably tell just from glancing at the above graph exactly when we put the new hub in place – the sharp drop in test time on December 6th is due to the new Hub being placed into production. In the above 1-week graph, you can see that our average command execution time dropped from approximately 25 seconds to right around 12.5 seconds – half the time of the previous week. The daily average metric (not pictured) bears this out as well, going from 22 seconds to 11.
As mentioned, our timings are also more consistent – while I don’t have metrics through the end of today (as of time of writing), this graph of our standard deviation (a measure of how widely timings spread around the average) shows quite clearly that the spread has diminished, from 11 a week ago to 7. This means that not only do tests run faster overall, but they also run in a narrower band of timings.
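To put those figures side by side, here is a small Python sketch that works out the relative reduction for each number quoted above. The function name is made up for illustration, and the inputs are only the rounded values from this post, not our raw metrics.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Rounded figures quoted in this post (hypothetical variable names).
weekly_average = percent_reduction(25.0, 12.5)      # 1-week graph average
daily_average = percent_reduction(22.0, 11.0)       # daily average metric
standard_deviation = percent_reduction(11.0, 7.0)   # spread of timings

print(f"weekly average:     {weekly_average:.1f}% lower")
print(f"daily average:      {daily_average:.1f}% lower")
print(f"standard deviation: {standard_deviation:.1f}% lower")
```

Both averages come out at half of their previous value, and the standard deviation is roughly a third lower, which is the "faster and in a narrower band" result described above.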
How We Made This Happen
Traditionally, we have had a Selenium Hub service running on several physical servers inside our data center. While this was easier to understand and maintain when it was originally implemented, the needs of our Selenium Hub, and of our users, were well on their way to outgrowing what we could support with the existing infrastructure. We could have expanded our Selenium Hub to more boxes, but that becomes a maintenance bottleneck, and, as I mentioned previously, toil is expensive and can reduce productivity well beyond the time taken to handle it.
So we built out a Kubernetes cluster that runs on some of the hardware we have recently repurposed, and after extensive testing, we felt comfortable rolling it out to users.
This was a fairly straightforward process, which was great, because this was the first time we’d used Kubernetes for any of our workloads at CrossBrowserTesting. We also had another advantage – because of the way that our Selenium Hub was written, putting it into a Docker container required exactly zero code changes. All it took was setting a few environment variables, exposing a port, and picking a base image to build from.
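Why can a service move into a container with no code changes? Usually because it already takes its settings from the environment rather than from hard-coded paths or hostnames. The sketch below shows that general pattern in Python. It is not our actual Selenium Hub code: the variable names (`HUB_PORT`, `HUB_UPSTREAM_URL`, `HUB_LOG_LEVEL`), the defaults, and the `HubConfig` class are all assumptions made up for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HubConfig:
    """Hypothetical settings for a hub-like service."""
    listen_port: int
    upstream_url: str
    log_level: str


def load_config(env: Optional[Mapping[str, str]] = None) -> HubConfig:
    """Build the config from environment variables, falling back to defaults.

    All variable names and default values here are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return HubConfig(
        listen_port=int(env.get("HUB_PORT", "4444")),
        upstream_url=env.get("HUB_UPSTREAM_URL", "http://localhost:8080"),
        log_level=env.get("HUB_LOG_LEVEL", "INFO"),
    )


if __name__ == "__main__":
    config = load_config()
    print(f"would listen on port {config.listen_port}, "
          f"forwarding to {config.upstream_url}")
```

A service written this way runs the same on a physical server or in a container. Only the values handed to it change, which is what the environment variables and exposed port in the image definition take care of.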
We knew going in that Kubernetes offered certain benefits, but overall, we found these to be the biggest points in favor of using it for our workloads:
- Completely decoupling software from the environment it’s running in is mind-bogglingly helpful – if we want to update software to work on a new version of Python, that change is as easy as rebuilding on a new Python image, and there’s no need to worry about the impact it might have on other software using Python.
- Deployments are now safer, faster, and easier. Writing push-button deployments for Kubernetes is easy, especially when you start using Helm on top of it. The deployment process is considerably faster, with most of the heavy lifting being handled by built-in components.
- Changing resource configurations becomes incredibly simple – if we want to change the replica count, scaling bounds, scaling parameters, or even switch to a completely different image, that’s a single clean rollout away.
- Adding resources to a cluster is as easy as setting up a server and joining it using a token. From there, that node can immediately begin handling some of the workload, which becomes especially useful when we start adding in Kubernetes’ built-in autoscaling functionality.
- Having a unified way to handle logging and metrics makes working on these systems much simpler and more efficient.
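To make "scaling bounds" and "scaling parameters" a little more concrete, here is a rough Python sketch of the replica calculation that the Kubernetes Horizontal Pod Autoscaler documents: the desired count is the current count scaled by the ratio of the observed metric to the target metric, rounded up, then clamped to the configured minimum and maximum. This assumes the Horizontal Pod Autoscaler is the autoscaling in question. The function and parameter names are made up for illustration, and the sketch leaves out details such as the tolerance band and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Simplified version of the documented autoscaler calculation.

    desired = ceil(current_replicas * current_metric / target_metric),
    then clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must not exceed max_replicas")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical numbers: 4 replicas at 90% CPU with a 60% target.
print(desired_replicas(4, 90.0, 60.0, min_replicas=2, max_replicas=10))
# Same load, but the upper scaling bound caps the result.
print(desired_replicas(4, 90.0, 60.0, min_replicas=2, max_replicas=5))
```

Changing the bounds or the target is then just a configuration change, which is why adjusting them amounts to a single rollout rather than a code change.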
Of course, there is no silver bullet to any problem. While Kubernetes has incredible power and flexibility, there is always a cost:
- Kubernetes may not work for every workload, and this may include yours. Stateful applications can be difficult to run well in Kubernetes, and the very small overhead imposed by Kubernetes and Docker may be an issue if your workload requires the absolute most performance your hardware is capable of.
- Kubernetes and Docker are ecosystems unto themselves, and if you don’t have someone who can read the docs and glue things together, it becomes much more difficult to actually make it all functional.
Overall, I think this was an important change at CrossBrowserTesting – not only have we made a vital user-facing service considerably faster and more resilient, we have also set out a path for ourselves to begin improving even more of our platform. We already have some really exciting plans in the works to make the platform better, faster, more resilient, and more useful no matter who you are!