GC-QA (2018-2019)

TL;DR: I built a QA tool for a private tech firm in four weeks using Python, HTML, and a bunch of other technologies, and it's still used today. Then it got upgraded a whole shit ton one year later.

This is a retype of the previous GC-QA post. Sorry, no fun story with 30-word paragraphs!

Over the past two summers, I've been extremely fortunate to land internships at a tech firm in New York City. While no company names are being revealed here, this is what you need to know:

  • This company provisions iOS devices for healthcare and other industries.
  • A launchpad is a hub that devices connect to.
  • A workflow is a set of instructions to deploy on a device connected to a launchpad.
  • A deployment is a workflow running on a device.
  • An activity ID is an ID tied to a deployment.
  • Since this software is mostly cloud-run, launchpads act as gateways to the devices that workflows run on.

If you know the company I’m talking about, congratulations, you get 1 internet point!

 

For the 2018 internship, I was tasked with developing a tool to perform quality assurance (QA) on sets of workflows over long periods of time (which is great for emulating, say, checking a device in and out, or just making sure a workflow keeps working over thousands of deployments), and then summarizing that data in charts: which workflows failed, which launchpads failed, and so on.

In 2018, I made the first version of this QA tool, which took about 90 hours to build over the course of 4 weeks. It was an impressive feat at the time, especially since I was using tons of new frameworks and services.

When I got started in 2018, I first needed to figure out how to use their API. Simple, easy, did it in a few hours. That’s how I roll. Next!

After that first day I got busy making a very rudimentary version of the tool in the terminal. Why exactly wasn't there a GUI at the start? Not sure. But, sooner rather than later, a user could input an API key and a single workflow ID, and the tool would deploy that workflow to all the connected devices, with no clue how the deployments finished. I encountered a bit of an issue: how do I check the status of these deployments…especially if there's more than 1 device?

Multiprocessing to the rescue!

 

I had never used multiprocessing before, so this was completely new stuff to me. But, sooner rather than later, I got the QA tool to start checking the status of these deployments. The premise was simple: check the status every 5 seconds by querying the API. If the deployment was ongoing, wait another 5 seconds. If it failed, return a failure; if it succeeded, return a success. I also had to wrestle with variables shared across multiple processes…but I hear that multiprocessing has a Manager that can share values across processes. Hmm…maybe I used that in my program. Not sure.

In other news, I was also dealing with “write collisions”, which shouldn’t have happened, but apparently they did! They were throwing off the total success/failure counts, so my solution was to sleep the script for a random period of time. I didn’t know about multiprocessing locks. I’m sorry.
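For the curious, here's roughly what that status-checking setup could have looked like. This is a sketch, not the real GC-QA code: get_deployment_status and the status strings are stand-ins for the company's API, and the Manager and Lock are the pieces I just mentioned.

```python
import time
from multiprocessing import Process, Manager

POLL_INTERVAL = 5  # seconds between status checks

def get_deployment_status(api_key, activity_id):
    """Hypothetical helper that asks the vendor API for a deployment's status."""
    raise NotImplementedError

def watch_deployment(api_key, activity_id, counts, lock):
    """Poll one deployment until it finishes, then record the result."""
    while True:
        status = get_deployment_status(api_key, activity_id)
        if status == "ongoing":
            time.sleep(POLL_INTERVAL)  # still running, check again in 5 seconds
            continue
        with lock:  # the lock prevents the "write collisions" I hit in v1
            if status == "succeeded":
                counts["success"] += 1
            else:
                counts["failure"] += 1
        return

def watch_all(api_key, activity_ids):
    with Manager() as manager:
        counts = manager.dict(success=0, failure=0)  # shared across processes
        lock = manager.Lock()
        workers = [Process(target=watch_deployment, args=(api_key, a, counts, lock))
                   for a in activity_ids]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return dict(counts)
```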

 

After this first week or so, the QA script was taking shape. Sooner rather than later the script was able to detect which individual devices to run the QA automation on, and you could manually set the number of QA iterations, which is pretty important if you want to leave QA running overnight! I also added support for printing final results in the terminal, so you could see which device models, iOS versions, and launchpad versions were failing. I thought the script was done!

And then I had a talk with the CEO that went something along the lines of this:

“I think the QA tool is done!” – Me

“Wow! It looks great, but I think controlling it from a terminal isn’t the right way to go. Maybe we should control it from a web browser?” – CEO

“Okay…I’ll try that…” – Me

I proceeded to die inside because I had approximately zero web development knowledge at the time. I then set about figuring out how to run Python in a web browser.

I got really scared at this point; I had no idea what I was supposed to do. But thankfully, I'm an intern! So the architecture was reconsidered.

 

The new architecture? A Python program with a GUI for inputting the settings and such, which also did the backend QA work, plus a web viewer for the results. This was a good compromise between new skills, what I already knew, and, most importantly, what I was comfortable doing.

 

So, I got to work. I decided to stick with appJar for the program's GUI, as it was what I was most comfortable with back in 2018 (yes, Python is not meant for GUI applications). This GUI would serve as the input method for all the details of the QA run. After the details were confirmed, the QA would start in the same script, and then a browser would launch with the results viewer webpage.
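To give you an idea of the appJar side, here's a minimal sketch of the kind of input window I'm describing. The field names and the start_qa callback are illustrative, not the real GC-QA GUI code.

```python
from appJar import gui

def start_qa(button):
    # Pull the run details out of the form; appJar passes the button name in.
    api_key = app.getEntry("API Key")
    workflow_ids = app.getEntry("Workflow IDs")
    iterations = app.getEntry("Iterations")
    # ...validate the inputs, kick off the QA run, open the results page...

app = gui("GC-QA")
app.addLabelEntry("API Key")
app.addLabelEntry("Workflow IDs")
app.addLabelEntry("Iterations")
app.addButton("Start QA", start_qa)
app.go()
```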

To keep track of all this data, there needed to be a database somewhere in the middle. The plan at first was to use Firebase, but I ended up switching to DynamoDB.

Pretty quickly, I got things working with storing data in Dynamo, and got the GUI portion of GC-QA working as well. The GUI isn't terribly tricky: just a bunch of inputs for API keys, workflow IDs, and extra settings that rolled in through later iterations of GC-QA. There's also a bunch of data checking, so the QA tool will get mad if you make a mistake with the API key entry or set the wrong QA type.
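The Dynamo write path looked something like this in spirit. The table name, key names, and the record_result helper are made up for illustration; the real schema isn't shown in this post.

```python
import time
import boto3

# Hypothetical table name and schema, just to show the shape of the writes.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("gcqa-results")

def record_result(run_uuid, activity_id, workflow_id, status):
    """Store one deployment result so the web viewer can chart it later."""
    table.put_item(Item={
        "run_uuid": run_uuid,
        "activity_id": activity_id,
        "workflow_id": workflow_id,
        "status": status,
        "timestamp": int(time.time()),
    })
```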

 

All was going well, and now it was time for the web portal.

 

The web portal is a simple HTML page that lives on the client's machine. The entire QA tool is client-side, except of course for DynamoDB. When development first started on the web portal, I realized there needed to be a way to distinguish one QA run from another, and that's where the UUID system came into play.
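The idea, sketched with the same hypothetical table and key names as above: generate a UUID when the run starts, tag every record with it, and let the viewer pull back only that run's data.

```python
import uuid
import boto3
from boto3.dynamodb.conditions import Key

# A fresh UUID is generated when a QA run starts...
run_uuid = str(uuid.uuid4())

# ...and since every record written during the run carries it, the results
# for just that run can be queried back out. (Hypothetical table/key names.)
table = boto3.resource("dynamodb").Table("gcqa-results")
items = table.query(KeyConditionExpression=Key("run_uuid").eq(run_uuid))["Items"]
```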

In just a few days, the web portal was working swimmingly. I was able to render charts with the Google Charts API, something completely new to me, and load in data from DynamoDB as well. But with the web portal I had to get used to the notion of callbacks, since at this time 99% of my coding experience was in Python. Python code is, by nature, blocking: you make a web request, the script waits for the response, then moves on and executes the lines below it. But JavaScript code is generally non-blocking: you make a web request, and while waiting for the response the JS interpreter continues to execute the lines of code below it.
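Since the web portal itself is JavaScript, this is only an analogy written in Python, but it's roughly the mental shift I had to make. The URLs are placeholders and the callback-via-thread pattern is purely for illustration.

```python
import threading
import requests  # assumes the 'requests' package is installed

# Blocking (how my Python brain worked): nothing below runs until the response arrives.
resp = requests.get("https://example.com/")
print("this line had to wait:", resp.status_code)

# Callback-style (roughly what the JS in the web portal does): fire off the
# request elsewhere and get called back, while the lines below keep running.
def on_response(response):
    print("callback got:", response.status_code)

threading.Thread(
    target=lambda: on_response(requests.get("https://example.com/"))
).start()
print("this line did not wait")
```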

Once the web viewer really began to take shape, I coded some more interesting features into it. Depending on the status of the QA run, the web viewer would automatically refresh the entire page (I do understand that I could have just called the functions that fetched Dynamo data and rendered the charts…but oh well!). Additionally, information about the QA run was added, like which workflows were run, and when the QA run started and ended.

 

At this point, for some perspective, only 2 of the 4 weeks had passed. This was seriously becoming an impressive feat, especially for me, given I was 15 at the time and had only put in about 20-40 hours. As testing of the QA tool began, well, there were a lot, and I do mean a lot, of overlooked features, bugs, and just general errors in the QA program.

Commence the third week, where I grew a third arm that included a bug swatter, just for the price of $2499! Just kidding, but I was fixing bugs left and right that week. I had to add support for status messages I'd never seen before, and also listen to coworkers' feature suggestions for the QA tool.

Another stupidly overlooked feature: the web viewer didn't automatically resize charts depending on screen size. Yes, I didn't know about Bootstrap! But classic me, I added support for Bootstrap within a few hours. That's just how I roll, neeextt!

 

More cool QA features! First off were different QA types, which were important for emulating certain environments where multiple workflows get run a lot. So, I added a feature where you could select a QA type: Single for a single workflow, Sequential for running many workflows in order, or Random, where the QA tool randomly picks a workflow from a defined list and deploys it to a device.
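Here's roughly how I think of those three modes. The real selection code isn't shown in this post, so the function below is just an illustrative sketch.

```python
import random

def pick_workflow(qa_type, workflow_ids, iteration):
    """Choose which workflow to deploy this iteration, based on the QA type."""
    if qa_type == "Single":
        return workflow_ids[0]                               # always the same workflow
    if qa_type == "Sequential":
        return workflow_ids[iteration % len(workflow_ids)]   # cycle through in order
    if qa_type == "Random":
        return random.choice(workflow_ids)                   # any workflow from the list
    raise ValueError(f"unknown QA type: {qa_type}")
```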

Time-based runs also came to fruition, but they were a slight challenge to implement, especially in the web portal. On the Python end, I just had to set the iteration count to 99999 and check whether the time limit had been reached to stop QA (if the QA type was Sequential, the tool waits for the full sequence of workflows to finish before ending). On the web viewer side, I let the Python script update the time remaining, but long workflows meant there wouldn't be an update on the web viewer for many minutes. To fix this, I used moment.js to display a countdown until the run was finished. When the QA run was finishing, the status switched to “QA wrapping up”.
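The Python end of a time-based run boils down to something like this sketch; run_one_iteration and the constant are stand-ins, not the real code.

```python
import time

MAX_ITERATIONS = 99999  # effectively "run forever"; the time limit stops us first

def run_timed_qa(run_one_iteration, time_limit_seconds):
    """Keep deploying until the time limit passes (hypothetical helper name)."""
    deadline = time.time() + time_limit_seconds
    for i in range(MAX_ITERATIONS):
        if time.time() >= deadline:
            break                    # this is where the status flips to "QA wrapping up"
        run_one_iteration(i)         # for Sequential runs, this is a full sequence of workflows
```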

 

As the final week came around, I tied up a few loose ends to make the tool the best it could be. I coded in Run History, which is pretty self-explanatory: just a history of QA runs. When I coded this in, I set an arbitrary limit of the 20 most recent runs…but more on this later. I also added in more data categories…well, one to be exact: data on which workflow IDs were failing.

The final few days mainly involved me fixing minor bugs, adding really minor features, and writing tons and tons of code comments and documentation. And then the 4 weeks were up! The first internship was done.

After I left, I got emails and texts saying that the QA tool was still being used. I really reflected on how great this internship was, and although there were some flaws, I was still really proud of myself.

 

Cue 2019: I needed an internship over the summer. Turns out I really love keeping things the same (this also explains why I got the same lunch, at the same place, every day, 19 times in a row (I also started eating lunch this year…I might have forgotten to last year)), so I went back. Great decision, since the QA tool was a mess. It needed to be fixed.

The issue? Numbers were way, way, way off. The QA tool keeps a local count of total deployments, but that count assumes that every device available at the start of QA stays available the whole time, and that every device can actually be deployed to.

 

The first week back, my brain was in a swirl. How on Earth is it not counting properly? Are the status-checker processes clashing with each other? No, but let's add an exclusive lock just in case. Is Python's in operator working? Yes, of course it is. So what is being returned as the deployment status?

Oh…

Device could not be deployed to

Handling those 6 words took about 10-20 hours to fix. Then GC-QA became 250% more reliable.
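My guess at the shape of the fix, with made-up status strings apart from the one quoted above: give the surprise status its own bucket instead of letting it silently skew the success/failure totals.

```python
def tally(status, counts):
    """Bucket a deployment status into the run's counters (illustrative only)."""
    if status == "succeeded":
        counts["success"] = counts.get("success", 0) + 1
    elif status == "failed":
        counts["failure"] = counts.get("failure", 0) + 1
    elif status == "Device could not be deployed to":
        counts["undeployable"] = counts.get("undeployable", 0) + 1
    else:
        counts["unknown"] = counts.get("unknown", 0) + 1  # never let a surprise status vanish again
```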

And then came the wave of extra features that should have been implemented in GC-QA v1. Here are some cool features I added in 2019!

Emailing results! Actually a pretty cool feature! It uses Selenium, Firefox, the Mandrill API, and Geckodriver to work its magic. When I first started on the emailing feature, I had a mini-relapse back to my GC-QA v1 brain, since I had no idea what was going on. But after a few days, mail was being sent out of GC-QA like there was no tomorrow.
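I won't pretend this is the actual GC-QA emailing code, but here's a hedged guess at how those pieces could fit together: headless Firefox (via Geckodriver) grabs a screenshot of the results page, and the Mandrill API sends it off. The helper name, sender address, and the screenshot approach are all assumptions on my part.

```python
import base64
import mandrill                      # the classic Mandrill API client
from selenium import webdriver       # drives Firefox via geckodriver

def email_results(results_url, mandrill_key, recipients):
    """Screenshot the results page with headless Firefox, then email it via Mandrill."""
    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(results_url)
        png = driver.get_screenshot_as_png()
    finally:
        driver.quit()

    client = mandrill.Mandrill(mandrill_key)
    client.messages.send(message={
        "subject": "GC-QA run results",
        "from_email": "qa@example.com",   # placeholder sender
        "to": [{"email": r} for r in recipients],
        "text": "Results attached.",
        "images": [{
            "type": "image/png",
            "name": "results.png",
            "content": base64.b64encode(png).decode(),
        }],
    })
```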

An actual logger! Yes, an actual logger. GC-QA v1 used “if verbosity is True” statements; now a full-on logger is used…and it has file logging for easy debugging!
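In spirit, the switch looked like this (the logger name and log file path are just examples):

```python
import logging

# A full-on logger instead of scattered "if verbosity:" prints,
# with file logging so debugging a long overnight run is actually possible.
logger = logging.getLogger("gcqa")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())               # console output
logger.addHandler(logging.FileHandler("gcqa_run.log"))   # persisted to disk

logger.info("QA run %s started", "b3f1c2ae")
logger.debug("deployment %s returned status %s", "12345", "succeeded")
```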

Serial Number result data! Another actually useful feature, to see if one device is failing more than the others (maybe due to faulty hardware or another reason). Additionally, I had to bake in backwards-compatibility code to make sure that runs without serial number data didn't show the serial number chart and cause lots of errors.

Resetting Smart Hubs on failure – Cool feature. It’s experimental. It was never fully tested. It might work. Neeeeeext!

Run History limits out the window – GC-QA v1 had a stupid limitation where run history was capped at 20 runs. GC-QA v2 says no more to that, and can store unlimited run history. Additionally, GC-QA v2 has methods to convert an old (v1) run history database to the new (v2) format.

More specific model data: If I can code this feature in quickly enough, GC-QA v2 should have more specific model data. GC-QA v1 was designed to only show iPhone, iPad and iPod as models, but with GC-QA v2 you’ll be able to see specific iDevice models.

The QA tool can't be used on the same machine as a launchpad! This was actually a major issue that resulted in another project I was working on getting wrapped up…ungracefully. I spent a few days trying to figure out why on Earth the QA tool was crashing all the time. Turns out, for whatever dumb reason, you can't run the launchpad and the QA tool together. It just started happening randomly, I don't know why, Python on macOS sucks, the end.

I need to stop developing on Linux! Everything works too well on Linux. I hear macOS is a great platform for development since there’s a bajillion quirks. Thaaaaat’s Apple for ya!

More small features you won’t care about:

  • Being able to add a description to a run
  • Run UUIDs were shortened from really long to 8 characters
  • In the analyzer and emails you can see what version of GC-QA was used to complete the run
  • There’s like actual versioning
  • requirements.txt file for easy third-party library installs

 

Basically, the new GC-QA is miles better than what I produced at the end of 2018. But hey, I know more about coding now, I'm a more advanced developer, and I'll laugh at myself one year from now about what I coded this year.