Apache Spark: Where is it going?
Databricks ran the Apache Spark Survey 2016 this summer to identify how organizations are using Apache Spark. The results suggest that Spark’s growth continues across industries, with people in a variety of functional roles using it to build sophisticated data solutions.
- Databricks 2016 survey results reflect answers from 900 distinct organizations and 1615 respondents, who were predominantly Apache Spark users.
- Over the year, code contributors almost doubled, and Apache Spark Meetup members more than tripled, from 66K to 225K.
- Also, since the release of DataFrames in 2015, their usage has more than doubled, from 15% to 38%; the share of Windows users jumped from 23% to 32%.
- More than half (51%) of the respondents in this survey consider Spark Streaming an essential component for building real-time streaming use cases, and 82% of respondents say the same for advanced analytics.
- This year, the production use of Spark Streaming jumped from 14% (in 2015) to 22% (in 2016), and that of Machine Learning from 13% (in 2015) to 18% (in 2016).
- Spark deployments in the cloud are at 61% this year, up from 51% last year. By contrast, Spark deployments using on-premises cluster managers fell by an average of 5%.
- Seventy-four percent of respondents use two or more components, while 64% use three or more in production.
- Along with Spark Streaming and Machine Learning, 38% use DataFrames, while 40% use Spark SQL in production.
Today, there are over 1000 Spark contributors, compared to 600 in 2015 from 250+ organizations. With so many contributors and organizations investing in its future development, Spark has engaged a global community of developers.