Python Flight Search

Posted on Tue 29 March 2016 in hacking

Like many people, I enjoy travel. With family and friends all across the United States and a home base in Berlin, it's fairly easy to find a reason to travel -- either globally or within the EU. What I find more difficult is determining the best way to get from one place to another. I have used many flight trackers before and have generally been happy with the results, but I always wondered if there was more to the flight matrix...

As I was planning a potential visit to Cuba, many of the "normal" sites came up short on available trips. Since I'm based in Berlin, it's also easy (and cheap -- thanks, budget airlines!) to fly out of Frankfurt, Paris, Amsterdam or London. This usually means setting up countless alert variations on numerous sites.[1]

As someone who has written a few scrapers in her time, I thought I'd at least write one to compare some of the popular flight search sites. I was curious to see what options each site offered and whether the same flights were listed at different prices.

Diving into GitHub

It's always good to see what's out there when you're building something new -- just in case what you're building already exists (or mostly exists). After some searching, I came across several flight trackers and scrapers written in Python.

FlightScanner's Python SDK looked great. I applied for an API key and so far haven't heard back.[2]

I found a GitHub flight scraper from @mayanez, but after installing it, I realized it no longer worked. This is a big problem for scrapers: they usually need constant maintenance to keep functioning, and every time the underlying site or API changes, your project can be rendered obsolete.

I also tracked down QPX Express, the Google API behind Google Flight Search (which grew out of ITA Software's Matrix search, acquired by Google). I registered and created a client in my Google Cloud Developer Console (hint: you must search for the API for it to show up), and read through the search documentation. Note that this API charges for requests beyond the first 50 per day.
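To give a feel for what a QPX Express search looked like, here is a minimal sketch that builds the JSON request body. The endpoint URL and field names (`slice`, `passengers`, `adultCount`, `solutions`) follow the v1 request format as best I recall it and should be treated as assumptions; `build_qpx_request` and `send_qpx_request` are hypothetical helpers, not part of any SDK. Google has since retired QPX Express, so this is illustrative only.

```python
import json
import urllib.request

# Assumed v1 endpoint; the API key is passed as a query parameter.
QPX_SEARCH_URL = "https://www.googleapis.com/qpxExpress/v1/trips/search"


def build_qpx_request(origin, destination, date, adults=1, max_solutions=20):
    """Build a one-way search body; date is an ISO string (YYYY-MM-DD)."""
    return {
        "request": {
            # One "slice" per leg of the trip; a round trip would have two.
            "slice": [
                {"origin": origin, "destination": destination, "date": date}
            ],
            "passengers": {"adultCount": adults},
            # Upper bound on how many itineraries to return.
            "solutions": max_solutions,
        }
    }


def send_qpx_request(body, api_key):
    """POST the body as JSON and return the decoded response (not called here)."""
    req = urllib.request.Request(
        f"{QPX_SEARCH_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


body = build_qpx_request("TXL", "SFO", "2016-05-01")
print(json.dumps(body, indent=2))
```

Keeping request construction separate from sending makes it easy to inspect (or count) requests before they hit the paid tier.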

I was interested in comparing Google Flight Search with some of the popular search sites here in Europe. Momondo was sadly out, with no API and a strict "no automation" policy in its Terms of Service. With some luck, I found that SkyPicker (another great site for low-fare searches) does have an API, with some documentation.
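A SkyPicker search was a plain GET request with query parameters. The sketch below only builds the URL; the base URL, the parameter names (`flyFrom`, `to`, `dateFrom`, `dateTo`) and the `dd/mm/yyyy` date format are assumptions based on the API as it existed at the time, so check the current documentation before relying on them. `skypicker_search_url` is a hypothetical helper.

```python
from datetime import date
from urllib.parse import urlencode

# Assumed base URL for the flights search endpoint.
SKYPICKER_URL = "https://api.skypicker.com/flights"


def skypicker_search_url(fly_from, fly_to, date_from, date_to):
    """Build a GET URL for flights fly_from -> fly_to departing between two dates."""
    params = {
        "flyFrom": fly_from,
        "to": fly_to,
        # Assumption: dates are expected as dd/mm/yyyy.
        "dateFrom": date_from.strftime("%d/%m/%Y"),
        "dateTo": date_to.strftime("%d/%m/%Y"),
    }
    # urlencode percent-escapes the slashes in the dates (/ -> %2F).
    return f"{SKYPICKER_URL}?{urlencode(params)}"


url = skypicker_search_url("TXL", "HAV", date(2016, 5, 1), date(2016, 5, 31))
print(url)
```

Searching a date window rather than a single day is what makes this kind of API handy for flexible trips like the Cuba one.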

I also found, a popular aggregator here in Germany, has a simple search and no restrictions on automation. I was able to write a scraper to parse responses on their site.
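The post doesn't describe the aggregator's actual markup, so the following is only a sketch of the general approach using the standard library's `html.parser`. It assumes (hypothetically) that each price sits in a `<span class="price">` element; a real page will differ, and any markup change breaks a parser like this, which is exactly the maintenance problem mentioned above.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect the text of every <span class="price"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


sample = (
    '<div class="result"><span class="price">512 €</span></div>'
    '<div class="result"><span class="price">498 €</span></div>'
)
parser = PriceParser()
parser.feed(sample)
print(parser.prices)  # ['512 €', '498 €']
```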

I've collected the code I wrote in a repository on GitHub. Note that these API responses contain a lot more information than I used, so you could easily extend the code to filter for (or against) your favorite and least favorite airlines and airports. I've also included the script I used to pull the results into a pandas DataFrame for easy comparison and analysis.
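The key step in comparing sources is normalizing each one's results into a common set of columns before combining them. This sketch assumes the results have already been reduced to dicts with hypothetical fields (`carriers`, `departure`, `duration_hours`, `price`); the values are made up and the schema is not the repository's actual one.

```python
import pandas as pd

# Made-up, already-normalized results keyed by source name.
results = {
    "qpx": [
        {"carriers": "LH", "departure": "2016-05-01 07:05",
         "duration_hours": 13.5, "price": 2450.0},
    ],
    "skypicker": [
        {"carriers": "AB/UA", "departure": "2016-05-01 18:40",
         "duration_hours": 19.0, "price": 1610.0},
    ],
}

frames = []
for source, rows in results.items():
    df = pd.DataFrame(rows)
    df["source"] = source  # remember which site each row came from
    frames.append(df)

# One long table: one row per listed flight, tagged with its source.
flights = pd.concat(frames, ignore_index=True)
flights["departure"] = pd.to_datetime(flights["departure"])

# Cheapest listing per source.
print(flights.groupby("source")["price"].min())
```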

What I found

The first thing I noticed was that, although there were some duplicates, there was definite variance. (Aha! See??? I'm not crazy!) Some of the sites offered quite a few mixed-carrier flights (usually cheaper but longer routes), while others focused on direct flights. The duplicates I saw were always listed with the same times and prices (conspiracy theory thwarted... 😢).
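One way to spot duplicates across sites is to group listings by itinerary and check whether the prices agree. In this sketch, treating identical carriers plus departure and arrival times as "the same flight" is an assumption (flight numbers would be a stricter key), `find_duplicates` is a hypothetical helper, and the rows are made up; "aggregator" stands in for the German site.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group (source, carriers, departure, arrival, price) rows by itinerary;
    keep only itineraries listed by more than one source."""
    groups = defaultdict(list)
    for source, carriers, departure, arrival, price in rows:
        groups[(carriers, departure, arrival)].append((source, price))
    return {key: listings for key, listings in groups.items() if len(listings) > 1}


rows = [
    ("qpx", "LH/UA", "2016-05-01 07:05", "2016-05-01 18:35", 2450.0),
    ("aggregator", "LH/UA", "2016-05-01 07:05", "2016-05-01 18:35", 2450.0),
    ("skypicker", "AB/UA", "2016-05-01 18:40", "2016-05-02 04:40", 1610.0),
]

for itinerary, listings in find_duplicates(rows).items():
    prices = {price for _, price in listings}
    verdict = "same price" if len(prices) == 1 else f"prices differ: {sorted(prices)}"
    print(itinerary, "->", [source for source, _ in listings], verdict)
```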

The results varied considerably depending on the search input. For the most part, I was searching for flights out of Berlin to long-haul destinations (America, Asia, the Caribbean). Your mileage may vary (HAAAA..😂😂😂).

I also wondered how travel time compared to price (the eternal time vs. money question). I assumed the two would be negatively correlated, with price decreasing linearly as travel duration increased. I was wrong.
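The hypothesis can be checked with a Pearson correlation coefficient: a value near -1 means price falls linearly as duration rises. The sketch below computes it with the standard library; pandas' `Series.corr` (Pearson by default) gives the same number on a DataFrame column. The sample values are made up to show what a perfectly "as expected" result would look like, not the real data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up data where every extra hour saves exactly 100 in price.
durations = [12, 15, 18, 24]       # hours
prices = [2400, 2100, 1800, 1200]
print(round(pearson(durations, prices), 3))  # -1.0
```

A single coefficient only measures the linear part of the relationship, so it's worth plotting duration against price too; a weak r can hide a clearly non-linear pattern.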

[Figure: Flight Duration versus Price]

I also looked at mean prices across time-of-day buckets. I like to take morning flights so I can just get them out of the way… but for this particular flight search (Berlin to San Francisco), that preference is costly:

departure_tod (mean price)
early am        2495.080635
morning         2459.062500
afternoon       2392.573200
evening         1663.772432
late evening    1544.032000
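A table like this can be produced by binning departure hours with `pd.cut` and averaging price per bucket. The post doesn't say where the bucket boundaries were drawn, so the edges below are assumptions, and the prices are made up.

```python
import pandas as pd

# Assumed bucket edges in hours; intervals are [0, 6), [6, 12), ... with right=False.
BINS = [0, 6, 12, 17, 21, 24]
LABELS = ["early am", "morning", "afternoon", "evening", "late evening"]

flights = pd.DataFrame({
    "departure": pd.to_datetime([
        "2016-05-01 05:30", "2016-05-01 09:15", "2016-05-01 14:00",
        "2016-05-01 19:45", "2016-05-01 22:10",
    ]),
    "price": [900.0, 850.0, 800.0, 600.0, 550.0],  # made-up values
})

# Label each flight with its time-of-day bucket based on departure hour.
flights["departure_tod"] = pd.cut(
    flights["departure"].dt.hour, bins=BINS, labels=LABELS, right=False
)

# observed=False keeps empty buckets in the output (as NaN means).
print(flights.groupby("departure_tod", observed=False)["price"].mean())
```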

There are plenty of other questions to ask and answer with this dataset, so feel free to play around with your own searches, or let me know if there's anything in particular you'd like me to explore.[3]

For now, I have a solid way to compare prices across a few aggregators, plus some new airline price search tools to use going forward.

  1. My Feelings about this. 

  2. To be fair, they do note that they receive thousands of requests and cannot fulfill all of them. If you have a business need for their API, I'm fairly certain you could get an API key much faster. 

  3. I'm hoping to write some price comparison over time blog posts from this data, so let me know if you have any specific questions.