Machine learning isn’t confined to science fiction movie plots anymore; it has fueled the proliferation of technologies that touch our everyday lives, including voice recognition in Siri and Alexa, Facebook’s automatic photo tagging, and recommendations from Amazon and Spotify. And many enterprises are eager to leverage machine learning algorithms to increase the efficiency of their networks. In fact, some are already using it to enhance their threat detection and optimize wide area networks.
As with any technology, machine learning could wreak havoc on a network if improperly implemented. Before embracing this technology, enterprises should be aware of the ways machine learning can fall flat, to avoid setting back their operations and turning the C-suite away from future projects. Roman Sinayev, security intelligence software engineer at Juniper Networks, outlines the top machine learning missteps and how to avoid them.
It’s amazing what a computer will consider important that a human will immediately dismiss as trivial. This is why it’s imperative to consider as many relevant variables and potential outcomes as possible prior to deploying a machine learning algorithm.
Take for example a model trained to separate images of vehicles into two categories – trucks and cars. If all the images of trucks were taken at night and all the car photos were taken during the day, the model would determine that any image of a vehicle taken at night must be a truck.
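The night/day trap above is a spurious correlation: a shortcut feature that separates the training data just as well as the real cue. The toy sketch below (all names and data are illustrative, not from the article) trains a single-feature "decision stump" on such a dataset; because every truck photo is a night photo, the stump cannot tell the shortcut apart from the genuine vehicle feature, and with ties broken by feature order it settles on "night".

```python
# Hypothetical toy example: the "night" feature correlates perfectly with the
# "truck" label, so a model can score perfectly without learning vehicle shape.

def train_stump(samples):
    """Pick the single binary feature that best separates the labels."""
    features = samples[0][0].keys()
    best_feature, best_accuracy = None, 0.0
    for f in features:
        correct = sum(1 for x, y in samples if (x[f] == 1) == (y == "truck"))
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:  # ties keep the earlier feature
            best_feature, best_accuracy = f, accuracy
    return best_feature

# Every truck photo was taken at night; every car photo in daylight.
training = [
    ({"night": 1, "has_cargo_bed": 1}, "truck"),
    ({"night": 1, "has_cargo_bed": 1}, "truck"),
    ({"night": 0, "has_cargo_bed": 0}, "car"),
    ({"night": 0, "has_cargo_bed": 0}, "car"),
]

# Both features separate this data perfectly, so the shortcut is chosen here;
# a model like this fails the moment it sees a daytime photo of a truck.
print(train_stump(training))  # prints "night"
```

Varying the training data, e.g. adding a daytime truck photo, breaks the spurious correlation and forces the model back onto the real feature.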
Addressing key variables and outcomes up front helps reduce the chance of unwanted and unexpected behavior once the solution is deployed.
In order to build a trained statistical model, one has to understand the origin and collection of the data being analyzed. This information is critical to determining the variables and potential outcomes that influence the algorithm’s performance.
Additionally, if a model is misclassifying data, it may be because the model wasn’t trained on data representative enough to support an ideal solution.
Producing a useful model comes down to the structure and quality of the training data. Before releasing machine learning into the enterprise, data scientists will test an algorithm model with data sets to ensure its performance. The data has to be diligently visualized and the whole data pipeline monitored as new data is added for self-training. Data scientists may try to test a model as quickly as possible and use too few testing data sets, ones that don’t represent the information the algorithm will encounter in the real world.
It’s critical to have enough data for the selected variables to be properly weighted, since that is what genuinely tests the algorithm model. Feeding more data during this phase helps improve performance substantially and ensures that once in a production environment, the machine learning project truly enhances operations.
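One basic guard against testing a model on data it has already seen is a held-out test set: shuffle the labeled data, then reserve a fraction that is used only for evaluation. A minimal sketch (the function name, split ratio, and stand-in records are illustrative, not from the article):

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data, then hold out a fraction strictly for testing."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # stand-in for 100 labeled samples
train, test = train_test_split(records)
print(len(train), len(test))  # 80 20
```

In practice the held-out portion also needs to mirror real-world conditions, e.g. by stratifying the split so rare classes appear in both halves.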
A project’s final goal may create new obstacles that can lead to potential blunders. In one famous example, a large company launched a social media bot designed to mimic language patterns of a teenager and evolve from its interactions. Users overloaded the bot with controversial topics, which it began to repurpose as part of its learned behaviors – leading the company that launched the bot to shut it down in less than 24 hours.
Not every machine learning project will be so public or give users open access to manipulate data, but awareness of the environment the algorithm lives in will help prevent similar missteps.
When performance testing does not yield the expected results, there are two options – design a better learning algorithm or collect more data. Adding more data helps engineers understand performance limitations. If it is easy to collect more data, continue feeding it to the algorithm to see if the correct outcome can be reached without a redesign.
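One way to decide between the two options is a learning curve: score the model on held-out data while training it on progressively larger slices of the available data. If accuracy is still climbing, collecting more data is likely cheaper than a redesign; if it has flattened, the algorithm itself is the bottleneck. A tiny deterministic sketch with a 1-nearest-neighbor model (all data and names are illustrative):

```python
# Hypothetical learning-curve check: held-out accuracy measured as the
# training set grows from 2 to 4 examples.

def predict_1nn(train, x):
    """Return the label of the closest training point to x."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def accuracy(train, test):
    correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return correct / len(test)

train = [(5, "pos"), (6, "pos"), (-5, "neg"), (-6, "neg")]
test = [(-3, "neg"), (-1, "neg"), (2, "pos"), (4, "pos")]

curve = [accuracy(train[:n], test) for n in (2, 4)]
print(curve)  # [0.5, 1.0] -- still improving, so more data is worth collecting
```

With only the first two (all-positive) examples the model cannot predict "neg" at all; doubling the data fixes that without touching the algorithm.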
One type of algorithm that has recently been successful in practical applications is ensemble learning – a process by which multiple models are combined to solve a computational intelligence problem. One example of ensemble learning is stacking simple classifiers such as logistic regressions. These ensemble methods can improve predictive performance beyond what any of the individual classifiers achieves alone.
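The intuition behind ensembles can be shown with majority voting, a simpler cousin of the stacking mentioned above (stacking replaces the vote with a trained meta-model, such as a logistic regression, over the base models' outputs). In the hypothetical sketch below, three weak classifiers each misclassify a different example; the vote corrects every individual mistake because the other two models outvote it. The predictions are hard-coded purely for illustration:

```python
# Illustrative sketch: three classifiers that err on different examples.

def majority_vote(predictions_per_model):
    """Combine per-model predictions by majority vote at each position."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(max(set(votes), key=votes.count))
    return combined

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

truth   = ["car", "car", "truck", "truck", "car"]
model_a = ["car", "car", "truck", "car", "car"]      # wrong on example 4
model_b = ["car", "truck", "truck", "truck", "car"]  # wrong on example 2
model_c = ["truck", "car", "truck", "truck", "car"]  # wrong on example 1

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))  # 0.8 1.0
```

The gain depends on the base models making reasonably independent errors; three copies of the same classifier would vote away nothing.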