An Introduction to Machine Learning – Our Supply Traffic Filter

With self-driving cars, voice-controlled personal assistants, and recommendation engines built into your favourite streaming services, it seems we have only seen the tip of the iceberg when it comes to Machine Learning (ML). For many of us, ML increasingly governs how we live and choose to spend our time.

Ever passionate about the technology that defines our world, our engineers at Index Exchange have leaned heavily into ML, specifically in the realm of supervised learning. Our ML-enhanced pipeline, “Supply Traffic Filter,” is a good example of how we’ve invested in the space and we’re here to explain exactly how we did it. 

A graphic displaying the process of following and testing a Machine Learning Challenge.

Pick a Challenge

The first step in an ML project is defining the challenge. Not all problems are good candidates for ML. Consider: 

  1. Does solving the problem manually take a lot of time or involve extensive rules?
  2. Do outcomes change so often that a manual solution is hard to maintain?
  3. Is it difficult to gain insight into the problem because there is too much data?

At IX, one such challenge was managing increasingly massive scale. When COVID hit, for example, we almost immediately saw a huge increase in traffic to our edge, at the same time as travel restrictions grounded our data center technicians. We needed to determine whether we could continue scaling traffic while handling it more efficiently, with little or no impact on the publishers and advertisers who rely on our platform. Predicting the value of incoming auctions would also be challenging given our huge data volume. Together, these factors made the problem a good fit for ML, and specifically for a supervised machine learning approach.

You can read more about how we approached the situation here

Select Features

We now have a machine learning challenge to solve. Exciting! Next is choosing the best features. An ML feature is a measurable property or characteristic. Feature selection is crucial, as an ML model’s performance is highly dependent on it. The right features can significantly improve model performance and decrease training time. Conversely, irrelevant features can adversely affect model performance. So, how do you know whether a feature is suitable for your needs?

Identifying Features

Start by finding all possible features. When IX first started working on Supply Traffic Filter, our researchers faced a multitude of potential features, so they reached out to domain experts within the company to help gather the list.

Pre-processing the Feature Set

In a perfect world we would have clean data, but in reality, there is a lot of noise and randomness. As a result, ML engineers typically need to spend a lot of time understanding and cleaning their data.

Some features are textual, but ML models expect numerical values. It is the engineers’ responsibility to translate these features into a numerical format, most commonly with one-hot encoding or integer encoding. One-hot encoding often performs better than integer encoding, but comes with a larger training time cost. As a rule of thumb, one-hot encoding is recommended for small datasets, while integer encoding is usually the more practical choice for large ones.
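To make the two encodings concrete, here is a minimal sketch in plain Python. The `device_types` feature and both helper functions are made up for illustration and are not taken from the Supply Traffic Filter code.

```python
def integer_encode(values):
    """Map each distinct category to an integer, in order of first appearance."""
    mapping = {}
    for value in values:
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Map each category to a vector with a single 1 at that category's index."""
    codes, mapping = integer_encode(values)
    width = len(mapping)
    return [[1 if i == code else 0 for i in range(width)] for code in codes]


# Hypothetical categorical feature: the device type of an ad request.
device_types = ["mobile", "desktop", "mobile", "tablet"]

integer_codes, mapping = integer_encode(device_types)
one_hot_rows = one_hot_encode(device_types)

print(mapping)        # {'mobile': 0, 'desktop': 1, 'tablet': 2}
print(integer_codes)  # [0, 1, 0, 2]
print(one_hot_rows)   # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

The trade-off is visible in the output: one-hot encoding adds one column per category, which is where the extra training cost comes from, while integer encoding stays at a single column but imposes an arbitrary order on the categories.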

Reducing the Feature Set

Next, we try to reduce the number of features in a process known as dimensionality reduction. Having a lot of features does not necessarily improve your model’s performance. There are two key reasons to limit the number of features:

  1. Generalize the model. While a feature may look correlated with an outcome, this correlation might not generalize to future data.
  2. Reduce training and prediction time. Fewer features enable faster model training and outcome prediction.

Okay, so we understand that the number of features must be limited, but how do we reduce our feature dimensionality? While there are many different techniques, we suggest going with “Feature Importance.” This technique scores the relevance of a feature to an outcome. In essence, this means that we should keep the high-scoring features and drop the low-scoring ones. 

In our case, the Supply Traffic Filter project started out with 50 features but we were able to narrow it down to just 11 to start.
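As a rough illustration of feature importance, the sketch below trains a scikit-learn decision tree on synthetic data and keeps the top-scoring features. The feature names, the data, and the choice of `keep_top_k` are all assumptions made for the example. The post does not say how IX actually computed its scores.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n_samples = 2000

# Synthetic stand-in data: only the first two columns influence the target.
feature_names = ["signal_a", "signal_b", "noise_a", "noise_b"]  # hypothetical names
X = rng.random((n_samples, 4))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.1, n_samples)

model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, y)

# feature_importances_ sums to 1.0; a higher score means the feature
# contributed more to the tree's splits.
ranked = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")

# Keep the high-scoring features and drop the rest.
keep_top_k = 2  # illustrative cut-off
selected = [name for name, _ in ranked[:keep_top_k]]
print(selected)
```

With this synthetic data, the two signal columns should come out on top and the noise columns should score close to zero.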

Select a Learning Model

At this stage we have to choose the learning algorithm. In supervised learning, the two main types of model algorithms are: 

Algorithm Type    Predicts
Classification    Target classes
Regression        Value-likelihood

For our Supply Traffic Filter challenge, our goal was to identify the “no-bid” auctions before they happen, which looks like a classification problem. Interestingly, however, our research team ended up choosing a regression model, because we want to filter traffic by a dynamic amount based on load. If our exchange nodes are heavily overloaded, we need more aggressive filtering to keep them operational and performant. The regression model scores each auction’s predicted relevance to our buyers. Based on this score, our servers can dynamically filter traffic in proportion to their current load.
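One way to picture load-proportional filtering is the sketch below. It is an assumption-laden illustration, not IX’s implementation: the linear ramp, the `start_load` and `max_fraction` values, and the use of recent scores to pick a cut-off are all hypothetical.

```python
def filter_fraction(load, start_load=0.7, max_fraction=0.5):
    """Fraction of auctions to drop for a server load in [0, 1].

    Below start_load nothing is dropped; above it the fraction ramps up
    linearly to max_fraction at full load. Both values are made up.
    """
    if load <= start_load:
        return 0.0
    overload = (min(load, 1.0) - start_load) / (1.0 - start_load)
    return max_fraction * overload


def score_threshold(recent_scores, fraction):
    """Score below which the lowest-scoring `fraction` of recent auctions falls."""
    if fraction <= 0.0 or not recent_scores:
        return float("-inf")  # nothing is filtered
    ordered = sorted(recent_scores)
    cut = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[cut]


def should_filter(score, load, recent_scores):
    """Drop an auction when its score falls below the load-dependent cut-off."""
    return score < score_threshold(recent_scores, filter_fraction(load))


# Hypothetical model scores for the last 100 auctions.
recent_scores = [i / 100 for i in range(100)]

print(should_filter(0.2, load=0.5, recent_scores=recent_scores))  # False: no overload
print(should_filter(0.2, load=1.0, recent_scores=recent_scores))  # True: bottom half dropped
print(should_filter(0.9, load=1.0, recent_scores=recent_scores))  # False: high score is kept
```

This is also why a continuous score is convenient here: a fixed yes/no classification would give the server only one operating point, whereas a score lets the cut-off move with load.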

Selecting Machine Learning Algorithms

In supervised learning, there are many widely used learning algorithms. When picking one, it is important that we have a good understanding of feature characteristics and pros/cons of each algorithm. 

Logistic Regression

  Pros:
  • Simple to implement and understand
  • Low cost on hyperparameter tuning
  • Fast to train

  Cons:
  • Poor performance on non-linear data
  • Sensitive to noise

Decision Tree

  Pros:
  • Scales well with large training data size
  • Easy to visualize and explain

  Cons:
  • Sensitive to overfitting
  • Sensitive to data

Support Vector Machine

  Pros:
  • Robust against outliers
  • Great performance with small training data size

  Cons:
  • Slow to train when data size is big
  • Sensitive to hyperparameters

K-Nearest Neighbor

  Pros:
  • Simple to understand and implement
  • Low cost on hyperparameter tuning
  • No assumptions about the data

  Cons:
  • Slow to infer for large datasets
  • Doesn’t scale well with large datasets

Neural Network

  Pros:
  • Works well with non-linear datasets
  • Works well with high dimensional datasets
  • Large body of academic research

  Cons:
  • Computationally expensive to train
  • Difficult to explain the model

The next step is to look at our dataset and find the algorithm that is best suited to it. It is not enough to just look at the pros and cons.

For example, IX transacts tens of billions of auctions every day, which means ‘Support Vector Machine’ and ‘K-Nearest Neighbor’ can be eliminated, because these algorithms are slow to train or to infer on large datasets. Next, many of the features in our dataset were categorical, and therefore non-linear. This allowed our research team to eliminate the ‘Logistic Regression’ algorithm (optimized for linear datasets). Lastly, we wanted an algorithm that was easy to explain, eliminating ‘Neural Network.’ This left us with a single learning algorithm: Decision Tree.

Train

If you followed along until this step, you now have the features and learning algorithm required to train a model! ML engineers spend much of their time cleaning up the features and finding the right training algorithm. With that out of the way, we can get to work training and testing the model.

A graphic displaying the breakdown of datasets into 3 different groups: training, validating, and testing.

Prior to training a model, we need to split the dataset into three groups: training, validation, and testing. This is so that we can generalize the model and collect unbiased performance scores: the validation set is used while tuning the model, and the test set is held back for the final measurement. A common split is for 70% of the dataset to be used for training, 15% for validation, and the remaining 15% for testing.
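The 70/15/15 split can be sketched with the standard library alone. The `split_dataset` helper and the integer rows below are placeholders for illustration.

```python
import random


def split_dataset(rows, train=0.70, validation=0.15, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],  # whatever remains is the test set
    )


rows = list(range(1000))  # placeholder for real samples
train_rows, validation_rows, test_rows = split_dataset(rows)
print(len(train_rows), len(validation_rows), len(test_rows))  # 700 150 150
```

Shuffling before slicing avoids accidental ordering effects. For data whose characteristics drift over time, a chronological split may be a better fit than a random one.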

Training and Tuning the Model

The next step is training and optimizing the model to maximize performance. Model performance can vary depending on something called hyper-parameters. The learning algorithm defines the high-level learning style, whereas the hyper-parameters define the details of how the model should learn. At IX, we used the ‘Decision Tree’ model which happens to have a risk of “overfitting.” Tuning these hyper-parameters allows us to reduce statistical noise while maintaining the model’s ability to follow the statistical trends in the data. The key hyper-parameters for Decision Tree are:

  1. Minimum leaf samples: The minimum number of samples within a leaf node. Too low and the model becomes overfitted (the model memorizes the training data, including its noise). Too high and the model becomes underfitted (the model fails to capture the underlying trend of the training data).
  2. Maximum depth: The maximum depth of the decision tree. The tree stops branching once its depth reaches this value. Too high and the model becomes overfitted. Too low and the model becomes underfitted.
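The effect of these two hyper-parameters can be seen in a small experiment. The sketch below uses scikit-learn’s `DecisionTreeRegressor`, where they are called `min_samples_leaf` and `max_depth`. The synthetic data and the specific values (4 and 20) are chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: a smooth trend plus noise that the model should not memorize.
X = rng.uniform(0.0, 6.0, size=(600, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.5, size=600)
X_train, y_train = X[:400], y[:400]
X_val, y_val = X[400:], y[400:]

# No limits: the tree keeps splitting until every leaf is pure.
unconstrained = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Limits on depth and leaf size (illustrative values).
constrained = DecisionTreeRegressor(
    max_depth=4, min_samples_leaf=20, random_state=0
).fit(X_train, y_train)

for name, model in [("unconstrained", unconstrained), ("constrained", constrained)]:
    train_r2 = model.score(X_train, y_train)
    val_r2 = model.score(X_val, y_val)
    print(f"{name}: depth={model.get_depth()}, "
          f"train R^2={train_r2:.2f}, validation R^2={val_r2:.2f}")
```

The unconstrained tree should fit the training data almost perfectly yet score noticeably worse on the validation data, which is the overfitting pattern described above.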

How do we ensure that the hyper-parameters are optimal and generalized? Grid search, which tries each combination of candidate values, and cross-validation, which scores each combination on several held-out folds, can be combined to tune hyper-parameters systematically.
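Here is a minimal sketch of grid search with cross-validation using scikit-learn’s `GridSearchCV`. The candidate values in `param_grid`, the five folds, and the synthetic data are assumptions for the example rather than the settings used for Supply Traffic Filter.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 6.0, size=(400, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.5, size=400)

# Candidate values for the two decision tree hyper-parameters (illustrative).
param_grid = {
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 5, 20, 50],
}

# Every combination is trained and scored on 5 cross-validation folds.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print(search.best_params_)   # combination with the best average fold score
print(-search.best_score_)   # its mean squared error across the folds
```

Because each combination is scored on data it was not trained on, the winning setting is the one that generalizes best across folds, not the one that memorizes the training set.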

Testing the Model

We now have a trained model that best generalizes across the training and validation datasets, so it’s time for a test. For Supply Traffic Filter, as we wanted to filter out as many no-bid auctions as possible without losing predictive power, we used two KPIs to measure model performance:

  1. Recall Rate: The share of valuable auctions (those that would have received a bid) that the model correctly lets through. It is OK to let through irrelevant auctions, but not to filter auctions that actually would have received a bid.
  2. Auction Reduction Rate (Predicted Condition Negative Rate): The percentage of auctions the model is dropping, i.e. the server resource savings we can expect.
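Both KPIs reduce to simple counts. The sketch below assumes two parallel lists of booleans: whether each auction would have received a bid, and whether the model let it through. The function name and the sample data are hypothetical.

```python
def filter_kpis(would_bid, let_through):
    """Return (recall, auction_reduction_rate) for paired boolean lists.

    recall         = valuable auctions let through / all valuable auctions
    reduction rate = auctions dropped / all auctions
    """
    kept_valuable = sum(1 for bid, kept in zip(would_bid, let_through) if bid and kept)
    valuable = sum(1 for bid in would_bid if bid)
    dropped = sum(1 for kept in let_through if not kept)
    recall = kept_valuable / valuable if valuable else 0.0
    reduction_rate = dropped / len(let_through) if let_through else 0.0
    return recall, reduction_rate


# Ten hypothetical auctions: the first four would have received a bid.
would_bid = [True] * 4 + [False] * 6

# The model drops three no-bid auctions and keeps everything else.
let_through = [True] * 7 + [False] * 3
print(filter_kpis(would_bid, let_through))  # (1.0, 0.3)

# The model also drops one auction that would have received a bid.
let_through_lossy = [True, True, True, False] + [True] * 3 + [False] * 3
print(filter_kpis(would_bid, let_through_lossy))  # (0.75, 0.4)
```

The two numbers pull against each other: dropping more auctions raises the reduction rate but risks lowering recall, so they need to be read together.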

We created a model capable of dropping a double-digit percentage of auctions with at least 99% recall rate. This is noteworthy because it means we can identify a large number of unproductive auctions before they take place with near-zero impact to our partners. 

Now, depending on the market, the characteristics of the data can rapidly change over time. What is valuable today may become non-valuable tomorrow, and vice-versa. ML engineers must always define the retraining frequency, especially if training is done in batches on offline servers. If a model’s predictive power decreases rapidly, we recommend dropping features that are sensitive to market changes.

Go Live

We have built a model that has great predictive power and generalizes well to the data. We are steps away from enabling it in production.

First is building a pipeline to collect unbiased training data. Once the model is live in production, its own decisions can bias the data collected for future training, so this data should be independent of the model’s effects. In the Supply Traffic Filter project, once the model starts to filter auctions, the outcome of an auction depends on the prediction: a filtered auction never runs, so we never observe whether it would have received a bid. To remove this training bias, we created a separate pipeline that bypasses our filtering process. The bypassed auctions are then used for training new, future models.
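A bypass of this kind can be sketched as a small random holdout. Everything in the example is an assumption for illustration: the 5% `BYPASS_RATE`, the `route_auction` function, and the fixed threshold are not IX’s actual values or design.

```python
import random

BYPASS_RATE = 0.05  # hypothetical share of traffic that skips the filter


def route_auction(score, threshold, rng, bypass_rate=BYPASS_RATE):
    """Return (run_auction, use_for_training) for one incoming auction."""
    if rng.random() < bypass_rate:
        # Bypassed: always run the auction, whatever the model predicted,
        # so its outcome is not influenced by the filter.
        return True, True
    # Normal path: the model decides, and the sample is not used for training.
    return score >= threshold, False


rng = random.Random(0)
training_scores = []
for _ in range(10_000):
    score = rng.random()  # stand-in for the model's predicted relevance
    run_auction, use_for_training = route_auction(score, threshold=0.5, rng=rng)
    if use_for_training:
        training_scores.append(score)

# The training sample covers low-scoring auctions too, which the filter
# would otherwise have hidden from us.
print(len(training_scores), min(training_scores), max(training_scores))
```

Because the bypass decision is random and independent of the score, the collected sample mirrors the full traffic mix rather than only the auctions the current model approves of.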

Lastly, we need to monitor model performance in real time. A sudden event can, for instance, trigger a market shift. When this happens, a new model may need to be redeployed on short notice. IX uses the bypassed auction traffic to monitor model performance in real time. This allows us to track model degradation and take immediate action if we see unexpected or problematic behavior.
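Monitoring on bypassed traffic could look like the sketch below, which tracks recall over a sliding window and flags degradation. The `RecallMonitor` class and its window size are hypothetical. The 99% floor simply echoes the recall figure quoted earlier.

```python
from collections import deque


class RecallMonitor:
    """Track recall over the most recent bypassed auctions that received a bid."""

    def __init__(self, window_size=1000, min_recall=0.99):
        self.window = deque(maxlen=window_size)  # oldest entries fall out automatically
        self.min_recall = min_recall

    def record(self, would_have_kept, received_bid):
        # Recall only concerns auctions that actually received a bid.
        if received_bid:
            self.window.append(bool(would_have_kept))

    def recall(self):
        if not self.window:
            return None  # not enough data yet
        return sum(self.window) / len(self.window)

    def degraded(self):
        current = self.recall()
        return current is not None and current < self.min_recall


monitor = RecallMonitor(window_size=100, min_recall=0.99)

# Healthy period: the model would have kept every auction that got a bid.
for _ in range(100):
    monitor.record(would_have_kept=True, received_bid=True)
print(monitor.recall(), monitor.degraded())  # 1.0 False

# Market shift: the model would now have dropped some valuable auctions.
for _ in range(5):
    monitor.record(would_have_kept=False, received_bid=True)
print(monitor.recall(), monitor.degraded())  # 0.95 True
```

A flag like this is only a trigger. What happens next, such as retraining or rolling back to an earlier model, is an operational decision outside the scope of the sketch.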

A graphic displaying the Ad Request Auction Process through the use of Machine Learning.

Conclusion

Congratulations, you’re operational! 

While there is much work involved, with the right machine learning development process, many can benefit from their own data in new ways. Machine learning is and continues to be a high priority for our engineers at IX and we encourage those who have taken the time to read this blog to spearhead new ML projects themselves!