Does VMware's Approach Signal That Big Data Is Ready for Prime Time?

Thor Olavsrud June 14, 2012
Deploying, configuring and maintaining Hadoop clusters is challenging and time-intensive, but VMware aims to change that with a new open source project that virtualizes the Hadoop cluster and makes it ready for the cloud.

"Hadoop is a Big Data processing de facto standard," says Fausto Ibarra, senior director of product management, Cloud Application Platform, at VMware. "One of the biggest challenges in the adoption of Hadoop is the difficulty in deploying Hadoop and the cost associated with that. What we're basically doing is dramatically simplifying what it takes to deploy, configure and manage Hadoop clusters."

Open Source Serengeti Virtualizes Hadoop
VMware today took the wraps off a new open source project dubbed Serengeti, designed to be a "one-click" toolkit for deploying highly available Hadoop clusters, along with common Hadoop components such as Apache Pig and Apache Hive, on VMware's vSphere platform. VMware is leading the Serengeti project in collaboration with key Hadoop distribution vendors, including Cloudera, Greenplum, Hortonworks, IBM and MapR.

Currently, Hadoop is primarily deployed on physical infrastructure. Such deployments can take days, weeks or even months depending on scale, as IT obtains the necessary hardware, installs the distribution on the nodes and then configures the cluster and all of the Hadoop components. And if the cluster turns out to be incorrectly sized for your needs, resizing it can mean redoing much of that work.

"With Serengeti you can deploy a Hadoop cluster in as little as 10 minutes without having to learn anything new," Ibarra says. "You have your choice of Hadoop distribution, and you will be able to reuse your existing virtual infrastructure running on vSphere; all while using the same skills and operations requirements as other things on vSphere."