Sky turns to machine learning to shut down Premier League piracy

The UK broadcaster Sky has turned to Google Cloud's BigQuery data warehouse and machine learning algorithms to investigate web traffic and shut down pirate sites in real time.

Scott Carey Jul 30th 2018


Speaking on stage during Google Cloud Next in San Francisco this week, Mohamed Hammady, CTO at Sky UK, said: "We invest close to $8 billion (£6 billion) a year in content. A major part of this investment is buying sports rights and the crown jewel is the broadcasting rights of English Premier League matches."

Sky pays $1.6 billion (£1.2 billion) a year for the rights to 126 Premier League matches, and piracy is a clear threat to that investment.

Following the landmark High Court decision in 2017, which was strengthened earlier this month to stop streams at the server level, internet service providers (ISPs) are responsible for shutting down illegal sites streaming Premier League football.

"Unfortunately, like in any industry, there are bad guys trying to illegally stream football matches," Hammady said. "This can harm the sports industry and can not be tolerated."

However, the court order didn't solve the large-scale technical challenge of identifying and shutting down these sites in real time.

Sky, which is in the unique position of being an ISP as well as a sports broadcaster, turned to Google Cloud technology to solve this problem.

The team at Sky collected its NetFlow traffic information as a means of "sampling the traffic on our core network," according to Hammady. Over the course of a year this produced 500 billion data records, so Sky needed a highly scalable data warehouse to process them.
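NetFlow exports summarise traffic as flow records rather than full packet captures, and on large networks they are often sampled. The sketch below is a minimal illustration of the kind of aggregation involved, not Sky's pipeline: it assumes 1-in-N packet sampling and a hypothetical, heavily simplified `FlowRecord` with only a destination address, a byte count and a start time. Sky has not published its schema, so every name here is illustrative. The function scales sampled byte counts back up and rolls them into per-destination, per-minute totals, the sort of aggregate a warehouse such as BigQuery would compute over billions of rows.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, heavily simplified flow record (real exports carry many more fields)."""
    dst_ip: str    # destination address of the flow
    octets: int    # bytes counted in the sampled flow
    start_ts: int  # flow start time, Unix seconds


def estimate_bytes_per_destination(records, sampling_rate, bucket_seconds=60):
    """Sum estimated bytes per (destination, time bucket).

    Assumes 1-in-N packet sampling: each sampled byte stands in for roughly
    `sampling_rate` bytes of real traffic, so counts are scaled back up.
    """
    totals = defaultdict(int)
    for r in records:
        # Round the start time down to the beginning of its time bucket.
        bucket = r.start_ts - (r.start_ts % bucket_seconds)
        totals[(r.dst_ip, bucket)] += r.octets * sampling_rate
    return dict(totals)


if __name__ == "__main__":
    sample = [
        FlowRecord("203.0.113.7", 1500, 1532970000),
        FlowRecord("203.0.113.7", 1500, 1532970030),
        FlowRecord("198.51.100.2", 400, 1532970090),
    ]
    print(estimate_bytes_per_destination(sample, sampling_rate=1000))
    # {('203.0.113.7', 1532970000): 3000000, ('198.51.100.2', 1532970060): 400000}
```

Scaling by the sampling rate gives only an estimate; the sampling rate of 1,000 used here is an arbitrary example value.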

"Using BigQuery and an in-house algorithm, which cost $10,000 (£7,500) to develop, we are now able to continuously study traffic patterns with an always up-to-date list of suspect pirate sites," Hammady said. "Once they have been confirmed as illegal they are shut down.
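Sky has not disclosed how its in-house algorithm works. As a purely illustrative stand-in, a simple heuristic could flag destinations whose traffic during a live match is both large and far above their normal level, while skipping an allowlist of known legitimate services. The function name `flag_suspects`, the thresholds and the allowlist below are hypothetical assumptions. The output is only a candidate list for human review, in line with Hammady's point that sites are confirmed as illegal before being shut down.

```python
def flag_suspects(match_bytes, baseline_bytes, allowlist, min_bytes, min_ratio):
    """Hypothetical heuristic: return destinations whose traffic during a match
    window is large in absolute terms and far above their non-match baseline,
    ordered by the size of the spike (largest first)."""
    suspects = []
    for dst, during in match_bytes.items():
        # Ignore known-legitimate services and low-volume destinations.
        if dst in allowlist or during < min_bytes:
            continue
        baseline = baseline_bytes.get(dst, 0)
        # A destination never seen outside matches gets a baseline of 1 byte,
        # which keeps the ratio finite while still ranking it highly.
        ratio = during / max(baseline, 1)
        if ratio >= min_ratio:
            suspects.append((dst, ratio))
    suspects.sort(key=lambda pair: pair[1], reverse=True)
    return [dst for dst, _ in suspects]


if __name__ == "__main__":
    match = {
        "203.0.113.7": 5_000_000_000,    # big jump over a small baseline
        "198.51.100.2": 8_000_000_000,   # busy, but busy all the time
        "192.0.2.10": 2_000_000,         # too little traffic to matter
        "192.0.2.55": 300_000_000,       # only appears during matches
        "198.51.100.99": 9_000_000_000,  # allowlisted legitimate service
    }
    baseline = {"203.0.113.7": 10_000_000, "198.51.100.2": 7_000_000_000}
    print(flag_suspects(match, baseline, {"198.51.100.99"},
                        min_bytes=100_000_000, min_ratio=20))
    # ['192.0.2.55', '203.0.113.7']
```

A ratio test like this is crude: a legitimate service that happens to launch during a match would also spike, which is one reason a confirmation step before blocking matters.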

"The time to run the query on Google Cloud is less than 30 seconds and it costs 23 cents for every run. These are the 23 cents of my PnL [profit and loss] that I am most proud to spend," he said.

"The result is a phenomenal reduction in pirate sites in the UK."

This experience helped convince Sky to turn to Google Cloud for its broader data warehouse needs, according to Hammady.

"This success has opened the door for a wider collaboration with Google Cloud as we have decided to build our data lake on the Google Cloud Platform," he said.

A key part of this internal data strategy is driving more personalisation for customers by joining up all of their data in one place.

The aim is to "progressively personalise every interaction to be quicker and more relevant to individual customer needs," Hammady said. "This ranges from content recommendation to offer management and troubleshooting on service calls, across all channels."