Flight Finder (2019)

This project has a visual representation available on my Portfolio.

https://portfolio.owenthe.dev/flight-finder


Flight Finder is a proof-of-concept experiment to automatically solve the traveling salesman problem via brute force, using real-world data for pricing. It uses a combination of Selenium and the Google Flights website to find the cheapest flights to take.

 

The simple premises of Flight Finder is that a user enters a starting/ending airport, and all the airports they want to pass through on a journey. The user provides a date for when the flights should be calculated for, and at the end of it all, the cheapest flight combination is displayed.

In the initial stages of development, I tried finding a flight API to no luck. All of them were too pricey for this experiment, or just provided too much functionality. I then settled on using Selenium on Google Flights to automatically find the best flights possible. The way this manages to work is because Google Flights has some static URL parameters that will control the website, and a convenient class that contains the cheapest flight price.

During development, I also realized that I’d need some sort of cache to prevent a certain flight leg from being queried multiple times. For instance, calculating JFK-LAX-SEA-IAD-JFK and JFK-LAX-IAD-SEA-JFK use a certain flight leg (JFK-LAX). The cache would make it so that in the first calculation of the JFK-LAX leg, Google Flights would be queried, but in the second calculation, it would use the cached price.

Another quirk in development was being able to catch when Google Flights results in a no flights available error. This happens most if you select some urban but not popular destinations, and set the filters to nonstop flights. With this happening, I also added in a set of curated filters into Flight Finder, along with instructions on how to import filters if desired.

Once all of this is done, Flight Finder starts looping through all the possible combinations of flights, and away it goes! At first, I coded Flight Finder so that it would calculate all the possible combinations beforehand. This is fine, up until about 10-11 airports, when the total number of combinations reaches a very high number and you run out of RAM. Apparently, looping through all the possible combinations calculates it on the fly, so RAM shortages don’t really become an issue.

 

However, because we’re doing the traveling salesman problem via brute force, the Big O Notation for this program is n!. If you’re not a computer expert, I’ll put it this way. If a user inputs 7 airports to travel to, Flight Finder calculates 5,040 possible legs, which takes about 1 minute. However, if the user inputs 9 airports to travel to, Flight Finder calculates 362,880 possible legs, which takes about 10 minutes to calculate.

At 11 airports, the number of possible legs goes up to 39.9 million, which is practically the highest Flight Finder supports, as running these calculations takes about 40 minutes.

At 20 airports, the number of possible legs goes up to 2.43 x 10^18 combinations. At 60 airports, the number of possible legs starts reaching the number of atoms in the universe. This is the issue with the brute force method – past about 15-20 cities, it becomes pretty much impossible to calculate efficiently.

 

Of course, Flight Finder is coded in Python, so we get that to deal with as well. If you’re not aware, Python is a single-threaded language, so it only uses 1 CPU core, and it cannot take full advantage of the multi-core processors in computers. I’ve considered adding multiprocessing support to this program so that it can use all 4-cores, but I don’t know how feasible it is, especially with overhead. I’ve tried using Cython (Python to C converter) on the program and saw about a 1.3-1.5x performance boost, but nothing crazy.

 

The traveling salesperson problem is a really cool piece of math. It’s a question that anyone can understand, but has so much underlying math to it in trying to optimize algorithms to reduce computational time and get as close to the real answer as possible.