Microsoft revamps machine learning tools for Apache Spark

The new open source release integrates Spark with Cognitive Toolkit and other Microsoft machine learning offerings

Serdar Yegulalp Oct 25th 2018

Microsoft has revamped its MMLSpark open source project to better integrate “many deep learning and data science tools to the Spark ecosystem,” according to the notes on the project repository.

MMLSpark, originally released last year, is a collection of projects intended to make Spark more useful in many contexts—mainly machine learning, but also in some general-purpose ways.

MMLSpark wraps its functions in a set of APIs available for both Scala and Python. The repository contains some quick-start examples, such as using web services in Spark, using OpenCV on Spark for image manipulation, and training a deep image classifier using Azure VMs with GPUs.

MMLSpark itself can be installed on existing Spark clusters as a package, used in the Databricks cloud (or a Databricks appliance on Azure), installed directly in an instance of Python or Anaconda, or run in a Docker container. Integration is also available for the R language, but right now only via a beta auto-generated wrapper.
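To give a rough sense of the package route, here is a minimal Python sketch that assembles a `spark-submit` command line using Spark's standard `--packages` flag. The package coordinate and version (`Azure:mmlspark:0.14`), the helper name `spark_submit_command`, and the script name `train_model.py` are assumptions made for illustration, not details from the project notes; check the repository for the current coordinate before using it.

```python
import shlex

# Assumed coordinate and version, for illustration only.
# Check the MMLSpark repository for the current release.
MMLSPARK_PACKAGE = "Azure:mmlspark:0.14"


def spark_submit_command(app_path, packages=(MMLSPARK_PACKAGE,)):
    """Build a spark-submit argument list that pulls in extra packages.

    This is a hypothetical helper; it only assembles the command and
    does not launch Spark.
    """
    args = ["spark-submit"]
    if packages:
        # --packages takes a comma-separated list of package coordinates.
        args += ["--packages", ",".join(packages)]
    args.append(app_path)
    return args


# train_model.py is a placeholder name for your own Spark application.
command = spark_submit_command("train_model.py")
print(" ".join(shlex.quote(arg) for arg in command))
```

The same coordinate could instead be supplied through the `spark.jars.packages` configuration property when a session is created in code, which may suit notebook environments better than a command line.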