- All Delta Lake enhancements contributed to Linux Foundation
with release of Delta Lake 2.0
- MLflow 2.0 with ML Pipelines accelerates time-to-production
for machine learning projects
- Spark Connect introduced to allow Apache Spark™
to run on any device
- Project Lightspeed revealed for next generation Spark
Streaming
SAN FRANCISCO, June 28, 2022 /PRNewswire/ -- Databricks,
the data and AI company and pioneer of the data lakehouse paradigm,
today announced several contributions to popular data and AI open
source projects including Delta Lake, MLflow, and Apache Spark.
At the Data + AI Summit, the largest gathering of the open
source data and AI community, Databricks announced that the company
will contribute all features and enhancements it has made to Delta
Lake to the Linux Foundation and open source all Delta Lake APIs as
part of the Delta Lake 2.0 release. In addition, the company
announced MLflow 2.0, which includes MLflow Pipelines, a new
feature to accelerate and simplify ML model deployments. Finally,
the company introduced Spark Connect, to enable the use of Spark on
virtually any device, and Project Lightspeed, a next generation
Spark Structured Streaming engine for data streaming on the
lakehouse.
"From the beginning, Databricks has been committed to open
standards and the open source community. We have created,
contributed to, fostered the growth of, and donated some of the
most impactful innovations in modern open source technology," said
Ali Ghodsi, Co-Founder and CEO of
Databricks. "Open data lakehouses are quickly becoming the standard
for how the most innovative companies handle their data and AI.
Delta Lake, MLflow and Spark are all core to this architectural
transformation, and we're proud to do our part in accelerating
their innovation and adoption."
Delta Lake 2.0 Brings the Lakehouse to Everyone
Delta Lake 2.0 will bring unmatched query performance to all Delta
Lake users and enable everyone to build a highly performant data
lakehouse on open standards. With this contribution, Databricks
customers and the open source community will benefit from the full
functionality and enhanced performance of Delta Lake 2.0. The Delta
Lake 2.0 Release Candidate is now available and is expected to be
fully released later this year. The breadth of the Delta Lake
ecosystem makes it flexible and powerful in a wide range of use
cases. Fueling this is a vibrant community of over 6,400 members,
with developers from more than 70 contributing organizations.
"Databricks provides Akamai with a table storage format that is
open and battle-tested for demanding workloads such as ours. The
lakehouse powers interactive analytics at scale so that our
customers can have near real-time analysis of security events
within our Edge platform," said Aryeh Sivan, VP Engineering at
Akamai. "We are very excited about
the rapid innovation that Databricks, along with the rapidly
growing community, is bringing to Delta Lake. We are also looking
forward to collaborating with other developers on the project to
move the data community to greater heights."
"The Delta Lake project is seeing phenomenal activity and growth
trends indicating the developer community wants to be a part of the
project. Contributor strength has increased by 60% during the last
year and the growth in total commits is up 95% and the average
lines of code per commit is up 900%. We are seeing this upward
velocity from contributing organizations like Uber Technologies,
Walmart and CloudBees, Inc., among others," said Jim Zemlin,
Executive Director of the Linux Foundation.
MLflow 2.0 Introduces MLflow Pipelines to Templatize and Automate MLOps
As one of the most successful open source
machine learning (ML) projects, MLflow set the standard for ML
platforms. The release of MLflow 2.0 introduces MLflow
Pipelines to the platform, substantially decreasing time to
production and improving execution at scale through
standardization. MLflow Pipelines offers data scientists
pre-defined, production-ready templates based on the model type
they're building to allow them to reliably bootstrap and accelerate
model development without requiring intervention from production
engineers.
Next Generation Streaming Engine and Spark Whenever and Wherever
As the leading unified engine for large-scale data
analytics, Spark scales seamlessly to handle data sets of all
sizes. However, the lack of remote connectivity and the burden of
applications developed and run on the driver node make it hard to
meet the requirements of modern data applications. To tackle this,
Databricks introduced Spark Connect, a client and server
interface for Apache Spark based on the DataFrame API that will
decouple the client and server for better stability, and allow for
built-in remote connectivity. With Spark Connect, users will
be able to access Spark from any device.
In collaboration with the Spark community, Databricks also
announced Project Lightspeed, the next generation of the
Spark streaming engine. As the diversity of applications
moving into streaming data has increased, new requirements have
emerged to support the most in-demand data streaming workloads on
the lakehouse. Spark Structured Streaming has been widely
adopted since the early days of streaming because of its ease of
use, performance, large ecosystem, and developer communities. With
that in mind, Databricks will collaborate with the community and
encourage participation in Project Lightspeed to improve
performance and ecosystem support for connectors, enhance
functionality for processing data with new operators and APIs, and
simplify deployment, operations, monitoring and
troubleshooting.
To learn more about Databricks' commitment to the open source
community visit: https://databricks.com/product/open-source.
About Databricks
Databricks is the data and AI
company. More than 7,000 organizations worldwide — including
Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 —
rely on the Databricks Lakehouse Platform to unify their data,
analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe.
Founded by the original creators of Delta Lake, Apache Spark™, and
MLflow, Databricks is on a mission to help data teams solve the
world's toughest problems. To learn more, follow Databricks on
Twitter, LinkedIn and Facebook.
Safe Harbor Statement
This information is provided to
outline Databricks' general product direction and is for
informational purposes only. Customers who purchase Databricks
services should make their purchase decisions relying solely upon
services, features, and functions that are currently available.
Unreleased features or functionality described in forward-looking
statements are subject to change at Databricks' discretion and may
not be delivered as planned or at all.
Contact: Press@databricks.com
SOURCE Databricks