Developers have been kids in a candy store for the past decade with exciting options for databases (such as MongoDB, Elasticsearch, and Cassandra), cloud storage services like Amazon S3, and new paradigms like microservices and serverless computing architectures. With these new approaches, applications are being developed more quickly and efficiently. That’s the good news.
The bad news is that the data generated by these applications creates new challenges when it comes to analytics. Those copious quantities of data largely reside in JSON and other nonrelational formats, inaccessible to traditional tools and methodologies for analytics. For any enterprise looking to embrace the future of data, its biggest enemy is the crufty data infrastructure upon which its past has been built.
Fortunately, there’s hope: A new breed of open source projects, like Dremio and Presto, has arisen to bridge the gap between traditional business intelligence (BI) tools and newfangled data sources. While still early, these tools show promise as a way to let developers use their preferred tools while someone else stitches together the silos.
Old analytics dogs meet new data tricks
While application development practices have evolved dramatically in recent years, the way companies manage data for analytics hasn’t changed nearly as much. This wouldn’t matter so much if there weren’t so darn many BI users in any given company. We rightly tout the importance of software developers, but there are probably ten times as many BI users as software developers.
Leaving them behind really isn’t an option, yet that’s exactly what we’re doing.
By and large, most analysis is performed using BI tools like Tableau, Looker, Power BI, Qlik, and Cognos. These tools all assume the data is in one place, in a relational model. Unfortunately, no company of any size keeps all its data in a single data warehouse, or even in one of those much-hyped data lakes. There are and always will be silos.
A number of projects have emerged to bridge this gap between traditional BI tools and modern data sources. These include the open source Presto and Dremio, as well as managed services like Amazon Athena (which is based on Presto) and Google BigQuery. These projects sit between data sources (relational, file systems, NoSQL sources) and SQL-based consumers: BI tools as well as data science platforms based on Python and R.
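To make that architecture concrete, here is a minimal sketch, in Python, of the kind of SQL a federated engine like Presto accepts. The catalog, schema, and table names (`mongodb.app.users`, `hive.weblogs.events`) are hypothetical; in a real deployment each catalog prefix maps to a configured connector, so one statement can join data living in MongoDB with Parquet files on S3 exposed through the Hive connector.

```python
import re

# Sketch of a federated query as engines like Presto express it.
# The catalog names ("mongodb", "hive") and tables are assumptions for
# illustration; each catalog would map to a configured connector.
FEDERATED_QUERY = """
SELECT u.name, COUNT(*) AS clicks
FROM mongodb.app.users AS u          -- document data in MongoDB
JOIN hive.weblogs.events AS e        -- Parquet files on S3, via the Hive connector
  ON u.id = e.user_id
GROUP BY u.name
ORDER BY clicks DESC
LIMIT 10
"""

def referenced_catalogs(sql: str) -> set:
    """Return the catalog prefix of every fully qualified
    catalog.schema.table name in the statement."""
    return set(re.findall(r"\b(\w+)\.\w+\.\w+\b", sql))

if __name__ == "__main__":
    # One ordinary SQL statement, two storage systems: the engine
    # plans and executes the cross-source join.
    print(sorted(referenced_catalogs(FEDERATED_QUERY)))
```

The point is that the consumer writes plain SQL while the engine resolves each catalog to its backing silo; in practice the statement would be submitted through a standard client library rather than inspected as a string.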
Dremio is different from the other new data kids
While each purports to fulfill that goal, Dremio is different. It’s much more than the query execution engine provided by something like Presto. Dremio integrates other key functional areas, including query acceleration, data curation, data lineage, and a data catalog, and delivers the solution as a self-service model that is similar to Google Docs, but for data sets.
That’s cool, and it just got a bit cooler.
Just this week Dremio announced support for Looker, a popular BI platform. This lets users reach a broader range of sources than they could before (MongoDB, Elasticsearch, S3, HDFS, Azure ADLS, etc.), perform joins across sources, and accelerate queries. It expands the reach of data consumers using Looker, and it helps make them more independent and self-directed—no more waiting in the data bread lines for IT to move data into one silo for analysis.
This is all part of a bigger trend to let data users use their favorite tools, let developers build apps on their favorite databases and filesystems, and solve the technology mismatch problem with a new layer that sits between the tools and the data.
In short, the choice moves from “either/or” to “and,” which is a great selling point for IT pros and others who need to extend the value of existing BI investments while embracing a more modern, open source-driven data future. It lets developers be developers, without slowing down to deal with the data silos they leave in their wake.