CrossBrowserTesting.com

A Design, Development, Testing Blog

Why Your Tests Are Suddenly Running Faster

February 5, 2020 By Harold Schreckengost

Over the past few weeks, you may have noticed that your Selenium tests on CrossBrowserTesting are running faster and with more consistent timing. This isn’t a fluke or random variance. We made significant changes to the way our Selenium Hub works in our infrastructure that greatly reduced the time it takes to execute each Selenium command. This change, along with reworking some back-end load balancing, means shorter overall test run times as well as less variation between test runs.

By moving our Selenium Hub infrastructure to Kubernetes, we have not only improved our performance (quite drastically, as you’ll see) but also made it easier to update, troubleshoot, and scale.

The Numbers

Before the change, our internal load test timings on our Selenium Hub were good, but not where we wanted them to be. With our new Selenium Hub, we not only hit our performance mark under load, but are also now delivering a much better experience to the regular users of our service.

You can probably tell just from glancing at the above graph exactly when we put the new hub in place – the sharp drop in test time on December 6th is due to the new Hub being placed into production.  In the above 1-week graph, you can see that our average command execution time dropped from approximately 25 seconds to right around 12.5 seconds – half the time of the previous week.  The daily average metric (not pictured) bears this out as well, going from 22 seconds to 11.

As mentioned, our timings are also more consistent. While I don’t have metrics through the end of today (as of the time of writing), this graph of our standard deviation (a measure of spread) shows quite clearly that our variation in timing has diminished, from 11 seconds a week ago to 7. This means that not only do tests run faster overall, they also run in a narrower band of timings.

How We Made This Happen

Traditionally, we have had a Selenium Hub service running on several physical servers inside our data center. While this was easier to understand and maintain when it was originally implemented, the needs of our Selenium Hub, and of our users, were well on their way to expanding beyond what we could support with the existing infrastructure. We could expand our Selenium Hub to more boxes, but that becomes a maintenance bottleneck, and, as I mentioned previously, toil is expensive and can reduce productivity well beyond the time taken to handle it.

So we built out a Kubernetes cluster that runs on some hardware we had recently repurposed, and after extensive testing, we felt comfortable rolling it out to users.

This was a fairly straightforward process, which was great, because this is the first time we’ve used Kubernetes for any of our workloads at CrossBrowserTesting. We also had another advantage: because of the way our Selenium Hub was written, putting it into a Docker container required exactly zero code changes. All we had to do was set a few environment variables, expose a port, and pick a base image to build from.

We knew going in that Kubernetes offered certain advantages, but these turned out to be the biggest points in favor of using it for our workloads:

  • Completely decoupling software from the environment it’s running in is mindbogglingly helpful – if we want to update software to work on a new version of Python, that change is as easy as rebuilding on a Python image, and there’s no need to worry about the impact it might have on other software using Python.
  • Deployments are now safer, faster, and easier. Writing push-button deployments for Kubernetes is easy, especially when you start using Helm on top of it.  The deployment process is considerably faster, with most of the heavy lifting being handled by built-in components.
  • Changing resource configurations becomes incredibly simple – if we want to change the replica count, scaling bounds, or scaling parameters, or even switch to a completely different image, that’s a single clean rollout away (see the sketch after this list).
  • Adding resources to a cluster is as easy as setting up a server and joining it using a token.  From there, that node can immediately begin handling some of the workload, which becomes especially useful once we start adding in Kubernetes’ built-in auto-scaling functionality.
  • Having a unified way to handle logging and metrics makes working on these systems much simpler and more efficient.
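
To give a feel for how simple those resource changes are, here is a minimal sketch using the official Kubernetes Python client to change a Deployment’s replica count. The deployment and namespace names ("selenium-hub", "hubs") are hypothetical placeholders, and in practice a change like this would go out through declarative manifests or Helm rather than an ad-hoc script.

```python
# A minimal sketch of scaling a Deployment with the Kubernetes Python client.
# The deployment/namespace names are hypothetical placeholders.
from kubernetes import client, config

def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    # Load credentials from ~/.kube/config; use config.load_incluster_config()
    # instead when running inside the cluster.
    config.load_kube_config()
    apps = client.AppsV1Api()
    # Patch only the replica count; the rest of the Deployment is untouched.
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

if __name__ == "__main__":
    scale_deployment("selenium-hub", "hubs", replicas=5)
```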

Of course, there is no silver bullet to any problem.  While Kubernetes has incredible power and flexibility, there is always a cost:

  • Kubernetes may not work for every workload, and that may include yours.  Stateful applications can be difficult to run well in Kubernetes, and the small overhead imposed by Kubernetes and Docker may be an issue if your workload needs the absolute most performance your hardware is capable of.
  • Kubernetes and Docker are ecosystems unto themselves, and if you don’t have someone who can read the docs and glue things together, it becomes much more difficult to actually make it all functional.

Overall, I think this was an important change at CrossBrowserTesting – not only have we made a vital user-facing service considerably faster and more resilient, we have also set ourselves on a path to improving even more of our platform.  We already have some really exciting plans in the works to make the platform better, faster, more resilient, and more useful no matter who you are!

Filed Under: Development Tagged With: automation, Kubernetes, Selenium

Extending the Elastic Stack to Fit Our Needs

August 28, 2019 By Harold Schreckengost


We are big proponents of the Elastic Stack here at CrossBrowserTesting – we use it for gathering metrics about system performance, tracing down issues in our stack, and identifying targeted improvements to our systems.

The Elastic Stack’s core tooling, produced by Elastic, offers a lot of flexibility and power.  We are able to take whatever data we want to examine and make use of it in the Elastic Stack, which opens up a lot of doors in a system the size of ours.

Elasticsearch

Elasticsearch is Elastic’s search and analytics engine, and I’d be lying if I said I wasn’t a big advocate of it. It’s fast, scalable, and efficient – even on modest cloud instances, we can have thousands of messages stored, indexed, and aggregated without the system breaking a sweat. It has joined HAProxy as one of my go-to examples of incredible software design.

Elasticsearch provides a JSON-based API and gives us a lot of novel ways to look at our data. Some examples include filtered indexes, aggregations, and roll-up indexes that we can use to summarize data for long-term storage and use.
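
To give a rough idea of what that JSON API looks like, here is a small sketch of an aggregation query using the official Python client; the index and field names ("selenium-commands-*", "command_duration_ms") are made-up examples rather than our actual schema.

```python
# A sketch of an Elasticsearch aggregation query (7.x-style request body)
# using the official Python client. Index and field names are hypothetical.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# Average command duration over the last day, broken down by browser.
response = es.search(
    index="selenium-commands-*",
    body={
        "size": 0,  # we only want the aggregations, not the matching documents
        "query": {"range": {"@timestamp": {"gte": "now-1d"}}},
        "aggs": {
            "avg_command_time": {"avg": {"field": "command_duration_ms"}},
            "by_browser": {
                "terms": {"field": "browser.keyword"},
                "aggs": {"avg_time": {"avg": {"field": "command_duration_ms"}}},
            },
        },
    },
)

print(response["aggregations"]["avg_command_time"]["value"])
```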

Kibana

Kibana is Elastic’s front end for Elasticsearch. Its interface can be daunting at first, but there’s plenty of utility lying right under the surface. The log viewing interface makes it easy to spot anomalous states in our logs; dashboards can be built from any set of visualizations, covering anything from performance metrics to error breakdowns; and APM and platform metrics modules can be attached, offering insight into where performance issues are coming from with pre-configured dashboards and easy-to-set-up integrations.

Logstash

Logstash is Elastic’s heavy-duty logging tool, used for parsing and cleaning up incoming data before it is ingested by Elasticsearch. The ramp-up for it is pretty straightforward, and it is extremely powerful. The use of Grok and Oniguruma expressions makes it easy to follow what you’re actually parsing in the text – a real step up from standard regular expressions.

Beats

The introduction of Beats by Elastic was one of the biggest changes to their ecosystem. With Beats, there are simple-to-use, powerful, pre-packaged tools for turning existing data into actionable data inside Elasticsearch.

Elastic Stack Beats and what they do:

  • Filebeat: Reads the content of files and makes them available in Elasticsearch
  • Metricbeat: Sends process and system metrics to Elasticsearch
  • Packetbeat: Sends network metrics to Elasticsearch
  • Winlogbeat: Sends Windows event logs
  • Auditbeat: Sends audit and integrity data
  • Heartbeat: Sends uptime and ping data
  • Functionbeat: Sends data for serverless architectures

These tools fit the most common needs of most businesses. Of course, I wouldn’t be writing about extending the Elastic Stack to meet our needs if our needs were completely within those boundaries, would I?

How to build your own Beats

The Beats platform that Elastic releases to the public is built on libbeat, a Go library that handles things like configuration parsing, sending data, and command-line arguments. All of the current Beats are built on this library.

Fortunately, libbeat is readily available!

Elastic publishes a guide to using libbeat, and if you know Go, you can get started writing custom data shippers right away with it.

The second thing we built at CrossBrowserTesting using libbeat was a custom shipper for Android mobile device logs. We have hundreds of real Android devices and trying to keep track of logging across that many devices poses a unique challenge that the majority of businesses will never have.

With Androidbeat, our in-house data shipper for Android mobile devices, we had several requirements:

  • It needed to use the standard Android Debug Bridge (adb) functionality built into Android
  • It needed to handle temporary unavailability of a device gracefully
  • It needed to be able to add in metadata about the devices attached, such as our unique identifiers, the severity of the logs, and the time the message was sent.

We probably could have hacked something together using Filebeat and the existing adb tool, but it would have been fragile and required a number of moving parts – not something we want to build into an already complex system.

Fortunately, each problem was straightforward to solve:

  • We were able to find an adb library for Go that made connecting to our Android devices easy
  • By having a goroutine stop when the connection is broken and having it start again when the device is reconnected, we’re able to gracefully handle disconnects, and by tracking the time of the last logs, we can prevent ourselves from getting duplicates
  • Thanks to the configuration parser inside libbeat, it’s easy to create a YAML file containing things like a device’s ID, its type, and even a log level specific to a family of devices or to a single device, and this metadata can then be inserted into the data that gets sent to Elasticsearch

It took about a day to build Androidbeat, but there is one major caveat you should be aware of if you feel like traveling down this rabbit hole: your first project using libbeat will take considerably longer. I realize this sounds like simply not accounting for spin-up time, but the libbeat platform has a few things that make getting started take a little longer: a new build system (using mage), a set of setup scripts that make it much easier to get the full functionality going but can also add a layer of confusion, and a build configuration system that is, to put it mildly, somewhat arcane.

Overall, the Elastic Stack is extremely powerful, and, even better, it’s open source (under the Apache 2.0 license) and more or less freely available, the only caveat being that certain features are locked behind the licensing process. Like any powerful tool, there’s a learning curve, but you can make it do whatever you need it to do.

Filed Under: Development Tagged With: devops, perf, performance

Using DevOps Practices to Improve Productivity and Happiness

June 19, 2019 By Harold Schreckengost


DevOps has become such a common topic of discussion that it almost sounds like a buzzword at times.  But there are some really great ideas that live under the DevOps umbrella; you just need to peel the onion back a little.  Even if you are tired of hearing people say, “We use JIRA, we’re Agile now!”, I believe it’s worth checking out.

Look, I get it, this is probably the third or fourth silver bullet you’ve been promised. This isn’t a silver bullet, but more of a roadmap to help you make the right choices for yourself and your team.

The core of DevOps can be broken down into three categories:

  • Reducing the risk of a failure – If we can reduce the chance of something failing, that’s less time we spend fixing it, less cost associated with the fix, and less frustration because of the downtime.
  • Reducing the impact of a failure – If a failure can wipe out the entire system, we have to be exceptionally careful to make sure that nothing ever fails. But we all know that no matter how careful we are, we’re going to see a failure at some point. If we can reduce the impact of a given failure, we can see the same benefits as reducing the risk.
  • Reducing toil – Toil is mindless, repetitive work. No one likes doing mindless work. By reducing toil, people are able to spend more time on things that they’re uniquely qualified for, allowing for more impactful improvements for the product and business.

Getting Started

In general, just because you’re following “best practices” doesn’t mean you’re actually gaining much value from them.  When you’re evaluating a process change, consider whether it will provide real value to you and your team, and weigh the pros and cons to make sure it’s actually the best way forward.  At the end of the day, you want to make your team better, and time spent building out a process that doesn’t help is time wasted.

Automated Tests

Automated tests can be a simple place to get started; they help your developers become more confident in what they’re writing without having to invest significant time in verifying every change they make.  This increases the efficiency of those developers, letting them do fewer repetitive tasks (reducing toil) while also reducing the risk of a failure. A common side effect is that code ends up more thoroughly tested: once testing is low effort, people are more willing to do it.
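
If you are starting from nothing, even a handful of small, fast tests pays off. Here is a minimal sketch using pytest; the slugify() function is a made-up example, but the shape is the point: cheap to write, fast to run, and easy to wire into CI so it runs on every change.

```python
# A minimal pytest sketch. slugify() is a hypothetical example of the kind of
# small, pure function that is cheapest to start testing.
import re

def slugify(title: str) -> str:
    """Turn an article title into a URL-friendly slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

def test_slugify_basic():
    assert slugify("Why Your Tests Are Suddenly Running Faster") == \
        "why-your-tests-are-suddenly-running-faster"

def test_slugify_strips_punctuation():
    assert slugify("DevOps: Practices & Productivity!") == "devops-practices-productivity"
```

Running pytest in the project directory picks these up automatically, so the cost of running them on every commit is essentially zero.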

Automated Deployments

Everyone has that one bit of software that is awful to deploy to production: maybe it takes a long time, maybe it’s error prone, maybe it’s just a lot of steps to get through. By automating the deployment, rather than requiring it to be done by hand every time, you can reduce the risk of a failure (a well-crafted automated process is consistent and never fat-fingers a key) while reducing the time it takes to deploy.
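
What “automated” means will vary wildly between teams. The sketch below is a bare-bones, hypothetical push-button deploy (the host, paths, test script, and service name are all placeholders), but even something this simple beats a page of manual steps because it runs the same way every time and stops at the first failure.

```python
# A bare-bones sketch of a push-button deploy: test, ship, restart.
# Host, paths, test script, and service name are hypothetical placeholders.
import subprocess

def run(cmd: list[str]) -> None:
    # Fail loudly on the first error instead of limping on with a half-deploy.
    subprocess.run(cmd, check=True)

def deploy(host: str = "app01.example.com") -> None:
    run(["./run_tests.sh"])  # never deploy a build that fails its tests
    run(["rsync", "-az", "--delete", "build/", f"{host}:/srv/app/"])
    run(["ssh", host, "sudo", "systemctl", "restart", "example-app"])

if __name__ == "__main__":
    deploy()
```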

Cattle Over Pets

This is one of the biggest things to work towards, in my opinion: treating a system as replaceable, rather than as something we have to take special care of, means we don’t have to worry about what happens when a specific machine fails. We can toss it out, rebuild it on the spot, and go on our merry way. So if something does fail, it’s much less likely to be truly catastrophic, and the process of fixing it is the same as setting up the box in the first place – reducing impact and reducing toil.

Continuous Integration and Continuous Delivery

Continuous Integration and Continuous Delivery serve slightly different purposes but are very much related to each other at the same time. Both stem from the same principle: by doing work in small, concrete chunks, we have an easier time making sure those changes are valid, and we have an easier time deploying small, concrete changes over monolithic, far-reaching changes. I’ve heard both of these referred to using the idea of a blast radius – even a lot of smaller changes won’t be enough to completely destroy something, but one big change can be enough to bring down pretty much anything. Judicious use of both CI and CD can help reduce risk, reduce the impact of a change, and reduce toil.

Where to go from here

A common theme for a lot of these changes is that they are designed around making it easier for developers to do the right thing. Build it and they will come. Build the tools that make peoples’ lives easier, and they will use them.

The benefits of testing earlier (“shifting left”) and more often are immense – if a bug caught early in the development cycle costs $50, for example, one that makes it into production can easily cost 10 times as much, and one that lives in production for a while just gets more and more expensive as time goes on.

Take inspiration from what other people are doing but realize that you don’t have to be held to that. Just because something works for someone else doesn’t mean it will work for you, and it probably won’t if you don’t understand the problem they were able to solve with a tool. As long as you’re making things better, that’s the important part.  It doesn’t have to be scary.

Filed Under: DevOps Tagged With: automated testing, Continuous Delivery, Continuous Integration, devops

You Are (Not) a Fraud: Dealing With Impostor Syndrome

February 19, 2019 By Harold Schreckengost


When I started at CrossBrowserTesting a little over a year ago, I came here with no formal training in much of what I do. The people who I work with are all exceptionally intelligent, creative, and just plain cool people.

I felt out of my element — here I was, surrounded by all these people who I felt were so much more competent than I was. I found myself staring down a dilemma. Everyone here seemed brilliant — everyone but me, at least. I felt I might never stack up, and I was afraid I would cause the people who had taken a chance on me to have doubts that I could successfully do the job.

For a while, the anxiety took a toll on me, and I would be lying if I said that I was entirely free of that anxiety today. I’ve just gotten better at managing it.

For me, part of learning to deal with this feeling was learning more about why I felt this way. Understanding has always been a powerful tool for me to handle things that cause me stress. As it turns out, this isn’t an uncommon thing; in fact, it’s so common that it has a name: impostor syndrome.

Impostor syndrome — that feeling that you’re just one slip-up away from being discovered as a fraud, that you really don’t belong somewhere — has been estimated to impact up to 70% of people at some point in their lives. It has even been known to affect people at the top of their industries, such as Tom Hanks and Starbucks CEO Howard Schultz.

“Very few people, whether you’ve been in that job before or not, get into the seat and believe today that they are now qualified to be the CEO. They’re not going to tell you that, but it’s true,” said Schultz.

It feels like this is somehow more common in technology, though maybe we’re just better about talking about these things as a community.

Causes of Impostor Syndrome

While there is no one cause of Impostor Syndrome, there are many factors that play into it.

For example, take what some call the “Facebook effect,” which has existed in the world for ages, but is easiest to understand in the context of social media.

Most of the time we only ever see the positive side of someone’s life — their great relationships, the exciting things they do, their fancy vacations to exotic places. We rarely get to see their failures, their hardships, their struggles, and so it can be very easy to look at someone’s life and immediately think they’re so much better off.

This happens within tech, as well — we see people talking about their open source projects or their “next big thing”, and we almost always miss the context, the hours of hard work, and the stress.

Technology, as an industry, can also make it hard to avoid this trap. Within a company of any real size, there will always be people with at least some variance across their skillsets. Some people know CSS front-, back-, and sideways, while others know Go inside and out.

The potential problem here, then, isn’t so much that there’s a disparity between people, but rather a simple difference in skillsets. I look at the people who write our front-end and sometimes I feel like I must be an idiot, despite the work I do, because a lot of the specifics of what they are doing each day can go right over my head.

In addition to the breadth of skills present in the industry, the stakes in the technology industry can be enormous. It can be difficult to recognize that there are interesting, valuable, amazing companies that aren’t at the “unicorn” billion-dollar valuation level.

When all we hear about are the Facebooks and Googles, and those are where we think the brilliant people all go to work, of course it’s easy to feel like if you’re not there, you’re not a real programmer.

Compounding this is the fact that tech has incredibly low barriers to entry; it’s not at all uncommon for someone who is self-taught to be able to break into the industry. Anyone can end up at the highest tiers of the industry with little to no formal education, and we see constant success stories of people throughout our industry at all experience levels.

While these are positives in our industry, they can have negative side effects on our own mental health — “That person was able to do this, why can’t I? I must be faking it.” It’s a common, and damaging, thought.

Impacts of Impostor Syndrome

While the business impacts of Impostor Syndrome aren’t heavily studied, we can look at it through the costs of mental health.

According to the American Psychological Association, anxiety disorders alone cost the US economy upwards of $4.1 billion each year and can result in days of work lost per month for those most heavily affected. Depression can be even more costly, upwards of $44 billion per year. While these figures don’t exactly capture the costs of impostor syndrome, they help illustrate the size of the problem and how much of a business impact it has.

Even outside the direct monetary costs of impostor syndrome, there is a very real business cost — it can stifle creativity and innovation. Imagine that you worked with someone every day who told you that you were stupid, incompetent, or unqualified, regardless of evidence. Most people would end up being less creative, would feel less safe in their own career, and would end up taking fewer risks, which is where innovation really happens.

In other words, people who can’t feel confident in their own abilities can’t work as effectively, especially in an industry as heavy on experimentation as technology.

There’s a very real personal cost to impostor syndrome, as well. For those affected, impostor syndrome often increases their stress levels — this can impact work relationships, personal relationships, and can even spiral into drug or alcohol problems. Careers can be limited by one’s own skewed perceptions.

What You Can Do

There are plenty of things that can be done to help, depending on who you are.

I’m someone who feels this way

  • Find yourself a good, supportive community. A lot of the people in the CrossBrowserTesting office in Memphis are involved with the Memphis Technology Foundation, and we find a lot of support, both technical and personal, in the community. For myself, at least, it has helped me to feel much more confident.
  • Learn everything you can. While learning can be frustrating when you feel like an impostor, the more you know, the more you can avoid that feeling of not understanding something others do. Even better — learn how you learn so that you can get the most out of your time.
  • Teach everyone you can. Chances are, you know a lot more than you realize, and teaching people is a great way to examine your knowledge and learn more.
  • Push your boundaries. While it can be uncomfortable, if you start pushing past your boundaries, you’ll find your sense of where you’re lacking slowly shifting. So apply for that conference you want to attend; go to that hackathon that looks interesting; just keep doing things that will push you outside your comfort zone.
  • Take care of yourself, too. Many of the people I’ve known with impostor syndrome end up pushing too hard and burning out. Take the time to do what you need to do, and know that there is absolutely nothing wrong with seeking out mental health treatment, such as a therapist or psychiatrist, when the burden is too much.

I work with someone who might feel this way

  • Teach your colleagues about what you can. Not only can this help them feel more confident, it also has some real benefits for you, such as breaking down information silos and helping you know your team better.
  • Be a role model. Talk about your own struggles when it’s relevant or when you think it might help someone. Take good care of yourself, and keep pushing your own limits.
  • Encourage the people around you. As long as it’s genuine, a little encouragement can go a very long way toward helping anyone, especially those who aren’t entirely confident in themselves. Let people know when they’ve done a good job, when they bring up a good point, and so on. Most importantly, though, even if you aren’t the encouraging type, avoid demeaning or degrading people. Pointing out their mistakes or calling them unqualified might feel good in the moment, but words like that, especially when they’re used repeatedly and over the long term, can really impact a person and their work performance.

I am an employer/manager

  • Encourage people to do new things. Every time someone goes to a conference, presents at a meetup, or learns a new tool, they will likely feel more confident, more productive, and more content.
  • Listen to people. Everyone has a unique perspective, and it’s always nice to be heard, especially when you constantly worry that you’re not good enough. Even if you can’t do something a particular way, having your ideas acknowledged is important and sometimes newer people make great suggestions because they aren’t stuck in the same mental ruts — what I like to call being “unburdened by experience”.
  • Support people. Sometimes, mental health is hard to understand, especially when it’s not something you’ve experienced. Sometimes, an otherwise amazing person may be struggling with it, and it’s important that they have stability and support. Sometimes, flexibility can be helpful, sometimes helping someone with the load they’re under is in order. As long as the people you manage feel that you have their back when it matters, it can go a long way in helping them feel more confident in their work and their abilities.
  • Mitigate toxic influences. Some people just drag a team down by being negative or distracting. Unfortunately, these impacts can be amplified if a team is already in a position where they might lack confidence. By mitigating these impacts — by teaching team members to act or speak with care, by moving roles around, or, in extreme cases, completely removing a negative influence — not only can you improve the mental health of the team, but you can also have far-reaching impacts on your company’s culture and direction.

Impostor Syndrome has a scale. For some people, it manifests as a realization that they’re not the best they could be and they can use it to push themselves to better things. For others, it can become a crippling anxiety that leaves them barely able to function both professionally and personally.

The key to working with this is to know how you, and those around you, work best and try to capture all the strengths of any team, while mitigating the downsides as much as possible. There is no silver bullet, but it’s still important to keep working toward improving the work environment for everyone.

Other Resources

  • Compassionate Coding @compassioncode
  • Open Sourcing Mental Illness (OSMI)  @OSMIhelp

This blog is based on a conference talk I gave in Huntsville, AL in October of 2018, and will be giving again in May of 2019.

Filed Under: Development Tagged With: impostor syndrome

Creating Our New Transcoding Service

November 20, 2018 By Harold Schreckengost


Every time a video is recorded by a CrossBrowserTesting user, it goes through a handful of steps.

The video is recorded in the Flash Video (.flv) format because it is cheap to write to. Because of format issues on the browser side, we then transcode the FLV video to a web-compatible standard inside of an MP4 container.

Unfortunately, this transcoding can be computationally expensive, especially when multiple videos are being transcoded simultaneously. When other services are running on the same boxes, those services slow to a crawl, impacting the entire user experience end to end.

Given our growth, we are routinely seeing more and more of these scenarios where a system becomes overloaded and cascades into others, so we needed to start working to make the services independent of each other.

Investigating Our Options

When we first started looking into offloading the transcoding processes, we were looking at services similar to Amazon’s Elastic Transcoder. These services are versatile, simple, and can scale to effectively any size we could conceivably need.

There’s just one problem — for our use case, these services are prohibitively expensive.

CrossBrowserTesting users record nearly 300,000 videos every month, and that number is only expected to grow as we do. With the pricing model of Elastic Transcoder and similar services, we would be charged per minute of video transcoded.

Since almost all of our recordings are considered HD by Amazon’s standards, Elastic Transcoder costs 3 cents per minute of video output. That doesn’t sound like a lot, but with 300,000 videos it adds up quickly: even at an average of just two minutes per video, that would come to roughly $18,000 a month. So, unfortunately, Elastic Transcoder and its ilk are not viable options for our requirements.

Fortunately, our needs are pretty simple — all we really need to do is transcode video to the input resolution, and maintain the same codecs and container format every time. There are very few changes between transcoding runs.

So, we ended up building our own scalable transcoding service for customer videos.

How It All Works

A week after starting this search, we were ready to start sending customers’ videos to the new transcoder.

When a user finishes recording a video, the FLV file created is sent to Amazon S3 for storage. S3 can emit a notification any time a file is created inside a bucket or, as in our case, any time a file matching a pattern is created, and these notifications can be sent to Amazon’s Simple Queue Service (SQS), a simple message queue.
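
Setting up that notification is a one-time configuration on the bucket. The sketch below shows roughly what it looks like with boto3; the bucket name and queue ARN are hypothetical placeholders, and the suffix filter limits notifications to newly created .flv files.

```python
# A sketch of wiring S3 "object created" events to an SQS queue with boto3.
# The bucket name and queue ARN are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="example-video-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:example-transcode-queue",
                "Events": ["s3:ObjectCreated:*"],
                # Only notify for new .flv uploads, not every object in the bucket.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".flv"}]}
                },
            }
        ]
    },
)
```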

These notifications sit in the queue until they are handled, and they are pulled by a continuously running program on some number of EC2 instances. This program is written in Python: it pulls the notifications, downloads the file from S3, transcodes it using ffmpeg, and then uploads the result back to S3.

A call to our API marks the video as completed, and the transcoding of a single video is done. This program is set up to handle a certain number of ‘slots’ for transcoding, allowing a single box to transcode multiple videos simultaneously.
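
Put together, each slot in the worker boils down to a loop: receive a message, fetch the file, shell out to ffmpeg, upload the result, mark the video complete, delete the message. The sketch below is a heavily trimmed, single-slot version of that loop; the queue URL, bucket handling, ffmpeg flags, and completion API endpoint are all hypothetical placeholders rather than our production code.

```python
# A simplified, single-slot sketch of the transcoding worker loop.
# Queue URL, ffmpeg flags, and the completion API are hypothetical placeholders.
import json
import subprocess

import boto3
import requests

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-transcode-queue"

def run_forever() -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20  # long poll
        )
        for msg in resp.get("Messages", []):
            record = json.loads(msg["Body"])["Records"][0]
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]

            s3.download_file(bucket, key, "/tmp/input.flv")
            # Transcode the FLV into an MP4 container for web playback.
            subprocess.run(
                ["ffmpeg", "-y", "-i", "/tmp/input.flv", "/tmp/output.mp4"],
                check=True,
            )
            out_key = key.rsplit(".", 1)[0] + ".mp4"
            s3.upload_file("/tmp/output.mp4", bucket, out_key)

            # Hypothetical internal API call that marks the video as completed.
            requests.post("https://api.example.com/videos/complete", json={"key": out_key})

            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

if __name__ == "__main__":
    run_forever()
```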

How Not to Scale

When we first built this service for ourselves, we had set it to work on two t2.xlarge EC2 instances. We found that these instances provided a good balance of cost-efficiency and speed. After a little bit of trial and error, we settled on this number because it handled our normal load handily, and left us plenty of room for expansion.

At least, that’s what we thought. One day, we noticed that videos were taking an exceedingly long time to become available to our users. This was due to a massive spike in the number of videos in our queue; while our normal load was easily handled by two of these boxes, our load during this incident peaked at over 1,100 videos waiting to be transcoded.

Our two boxes simply couldn’t keep up with our demand, so the queue grew steadily, and I eventually stood up two more transcoding servers. Thanks to the way these are deployed, it doesn’t take long to set up more of these instances, but that was still more work and took more time than I would like to dedicate to handling an incident like this.

After this incident, I set up an autoscaling group for these transcoding boxes. Because it was designed with elasticity — the ability to start and stop additional horizontally-scaled services without causing issues — in mind, this was really simple. All I had to do was take a running box, shut it down, and create an image, which could then be applied to a launch configuration, which in turn is set up as the definition for the Auto Scaling group.

Scaling on the CPU utilization of the boxes worked great, but one day I received an alert that the backlog was beginning to creep up again despite the auto-scaling. As it turns out, under a specific set of circumstances, the boxes can end up in a weird state where their CPU utilization never goes above 50%, even when they’re transcoding on all slots.

At this point, I realized that CPU utilization, in addition to being fallible, is also a really bad proxy for what we really want out of the transcoder — we don’t care how busy the boxes are, only that the videos are being processed in a timely manner.

After a little experimentation, we now have the autoscaling set up to track the number of videos waiting in the queue. By scaling with the metric we are actually concerned about, we will never end up in a situation where a bad proxy for a metric (in this case, CPU Utilization as a bad proxy for how many videos need to be processed) will cause the system to lag behind.
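
As a rough illustration, a target-tracking policy keyed off the queue’s built-in ApproximateNumberOfMessagesVisible metric looks something like the sketch below; the Auto Scaling group name, queue name, and target value are hypothetical, and AWS also suggests a backlog-per-instance metric for this kind of scaling.

```python
# A sketch of a target-tracking scaling policy driven by SQS queue depth.
# Group name, queue name, and target value are hypothetical placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="example-transcoder-asg",
    PolicyName="scale-on-queue-depth",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "CustomizedMetricSpecification": {
            # Built-in SQS metric: how many messages are waiting to be processed.
            "MetricName": "ApproximateNumberOfMessagesVisible",
            "Namespace": "AWS/SQS",
            "Dimensions": [{"Name": "QueueName", "Value": "example-transcode-queue"}],
            "Statistic": "Average",
        },
        # Add capacity whenever the backlog exceeds roughly 100 waiting videos.
        "TargetValue": 100.0,
    },
)
```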

The end result

In the end, our new transcoder system runs, on average, about 1.3 t3.xlarge instances in a month, and can scale as needed to match our customer demand.

This is a significant savings over services like Elastic Transcoder, while still easily meeting our needs by only doing the things we really need it to do.

It does one thing — transcodes video for the web — and it does it well, allowing us to focus on making the system better every day instead of putting out fires.

Filed Under: Product Update Tagged With: devops, product, transcoding, video
