- Advanced data warehousing and data governance capabilities
highlight the future of the modern data stack
- Databricks Marketplace and Data Cleanrooms functionality
accelerate the company's vision for open and collaborative data
sharing
- New data engineering optimizations automatically execute
batch and streaming data pipelines in the most cost-efficient
manner
- Enhancements across the machine learning lifecycle radically
simplify MLOps at production scale
SAN FRANCISCO, June 28, 2022 /PRNewswire/ -- Databricks, the
data and AI company and pioneer of the data lakehouse paradigm,
today unveiled the evolution of the Databricks Lakehouse Platform
to a sold-out crowd at the annual Data + AI Summit in San Francisco. New capabilities revealed
include best-in-class data warehousing performance and
functionality, expanded data governance, new data sharing
innovations including an analytics marketplace and data clean
rooms for secure data collaboration, automatic cost optimization
for ETL operations, and machine learning (ML) lifecycle
improvements.
"Our customers want to be able to do business intelligence, AI,
and machine learning on one platform, where their data already
resides. This requires best-in-class data warehousing capabilities
that can run directly on their data lake. Benchmarking ourselves
against the highest standards, we have proven time and again that
the Databricks Lakehouse Platform gives data teams the best of both
worlds on a simple, open, and multi-cloud platform," said
Ali Ghodsi, Co-founder and CEO of
Databricks. "Today's announcements are a significant step forward
in advancing our Lakehouse vision, as we are making it faster and
easier than ever to maximize the value of data, both within and
across companies."
The Best Data Warehouse is the Lakehouse
Organizations like Amgen, AT&T, Northwestern Mutual and Walgreens are making
the move to the lakehouse because of its ability to deliver
analytics on both structured and unstructured data. Today,
Databricks unveiled new data warehousing capabilities in its
platform to further enhance analytics workloads:
- Databricks SQL Serverless, available in preview on AWS,
provides instant, secure, and fully managed elastic compute for
improved performance at a lower cost.
- Photon, the record-setting query engine for lakehouse
systems, will be generally available on Databricks Workspaces in
the coming weeks, further expanding Photon's reach across the
platform. In the two years since Photon was announced, it has
processed exabytes of data, run billions of queries, and delivered
benchmark-setting price/performance up to 12x better than
traditional cloud data warehouses.
- Open source connectors for Go, Node.js, and
Python now make it even simpler to access the lakehouse
from operational applications.
- Databricks SQL CLI now enables developers and analysts
to run queries directly from their local computers.
- Databricks SQL now provides query federation, offering
the ability to query remote data sources including PostgreSQL,
MySQL, Amazon Redshift, and others without the need to first extract
and load the data from the source systems.
Data Governance Highlighted as a Top Priority with Advanced
Capability for Unity Catalog
Unity Catalog, generally available on AWS and Azure in the coming
weeks, offers a centralized governance solution for all data and AI
assets, with built-in search and discovery, automated lineage for
all workloads, and performance and scalability for a lakehouse on
any cloud.
Also, Databricks introduced data lineage for Unity Catalog
earlier this month, significantly expanding data governance
capabilities on the lakehouse and giving businesses a complete view
of the entire data lifecycle. With data lineage, customers gain
visibility into where data in their lakehouse came from, who
created it and when, how it has been modified over time, how it's
being used across data warehousing and data science workloads, and
much more.
Enhanced Data Sharing Enabled By Databricks Marketplace and
Cleanrooms
As the first marketplace for all data and AI,
available in the coming months, Databricks Marketplace
provides an open marketplace to package and distribute data and
analytics assets. Going beyond marketplaces that simply offer
datasets, Databricks Marketplace enables data providers to securely
package and monetize a host of assets such as data tables, files,
machine learning models, notebooks and analytics dashboards. Data
consumers can easily discover new data and AI assets, jumpstart
their analysis and gain insights and value from data faster. For
example, instead of acquiring access to a dataset and investing
their own time to develop and maintain dashboards to report on it,
they can choose to simply subscribe to pre-existing dashboards that
already provide the necessary analytics. Databricks Marketplace is
powered by Delta Sharing, allowing data providers to share their
data without having to move or replicate the data from their cloud
storage. This allows providers to deliver data to other clouds,
tools, and platforms from a single source.
Databricks is also helping customers share and collaborate with
data across organizational boundaries. Cleanrooms, available
in the coming months, will provide a way to share and join data
across organizations with a secure, hosted environment and no data
replication required. In the context of media and advertising, for
example, two companies may want to understand audience overlap and
campaign reach. Existing clean room solutions have limitations, as
they are commonly restricted to SQL tools and run the risk of data
duplication across multiple platforms. With Cleanrooms,
organizations can easily collaborate with customers and partners on
any cloud and provide them the flexibility to run complex
computations and workloads using both SQL and data science-based
tools - including Python, R, and Scala - with consistent data
privacy controls.
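The audience-overlap question in the media and advertising example can be illustrated with a minimal Python sketch. It shows only the overlap and reach arithmetic on pseudonymized identifiers. It does not show how Cleanrooms is implemented, and a shared salted hash is not by itself a sufficient privacy control. The function names, the salt, and the sample addresses are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Set


def hashed_ids(emails: Iterable[str], salt: str) -> Set[str]:
    """Normalize each address and replace it with a salted SHA-256 digest."""
    return {
        hashlib.sha256((salt + email.strip().lower()).encode("utf-8")).hexdigest()
        for email in emails
    }


def audience_overlap(a: Set[str], b: Set[str]) -> Dict[str, float]:
    """Compare two hashed audiences without exposing the raw identifiers."""
    shared = a & b
    return {
        "overlap": len(shared),                          # people in both audiences
        "reach": len(a | b),                             # distinct people across both
        "share_of_a": len(shared) / len(a) if a else 0.0,
    }


# Hypothetical sample audiences; both parties are assumed to agree on the salt.
advertiser = hashed_ids(["ann@example.com", "Bob@example.com ", "cy@example.com"], salt="demo")
publisher = hashed_ids(["bob@example.com", "cy@example.com", "dee@example.com"], salt="demo")
result = audience_overlap(advertiser, publisher)
print(result["overlap"], result["reach"])
```

In this sample, two identifiers appear in both audiences and four distinct identifiers are reached in total.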
MLflow 2.0 Streamlines and Accelerates Production Machine
Learning at Scale
Databricks continues to lead the way in
MLOps innovation with the introduction of MLflow 2.0. Getting a
machine learning pipeline into production requires setting up
infrastructure, not just writing code. This can be difficult for
new users and tedious for everyone at scale. MLflow
Pipelines, made possible by MLflow 2.0, now handles the
operational details for users. Instead of setting up orchestration
of notebooks, users can simply define the elements of the pipeline
in a configuration file and MLflow Pipelines manages execution
automatically. Looking beyond MLflow, Databricks also added
Serverless Model Endpoints to directly support production model
hosting, as well as built-in Model Monitoring dashboards to help
teams analyze real-world model performance.
Delta Live Tables Includes Industry-First Performance
Optimizer for Data Engineering Pipelines
Delta Live Tables (DLT) is the first ETL framework to use a simple,
declarative approach to building reliable data pipelines. Since
DLT's launch earlier this year, Databricks has continued to expand
it with new capabilities, including a new performance optimization
layer designed to speed up execution and reduce the cost of ETL.
Additionally, new Enhanced Autoscaling is purpose-built to
intelligently scale resources with the fluctuations of streaming
workloads, and Change Data Capture (CDC) for Slowly Changing
Dimensions (Type 2) tracks every change in source data for
both compliance and machine learning experimentation purposes.
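As a rough illustration of what Slowly Changing Dimensions Type 2 means, the following plain-Python sketch keeps every prior version of a record instead of overwriting it. It is a conceptual sketch only and does not use or represent the DLT API. The class, field, and function names are hypothetical, and deletes are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VersionedRow:
    key: str
    value: str
    valid_from: int           # sequence number at which this version became current
    valid_to: Optional[int]   # None while this version is still current


def apply_changes(history: List[VersionedRow], changes: List[dict]) -> List[VersionedRow]:
    """Apply change events in sequence order, closing old versions rather than overwriting them."""
    current: Dict[str, VersionedRow] = {r.key: r for r in history if r.valid_to is None}
    for change in sorted(changes, key=lambda c: c["seq"]):
        key, value, seq = change["key"], change["value"], change["seq"]
        live = current.get(key)
        if live is not None and live.value == value:
            continue                      # value unchanged, so no new version
        if live is not None:
            live.valid_to = seq           # close the previous version
        row = VersionedRow(key, value, seq, None)
        history.append(row)               # the new version becomes current
        current[key] = row
    return history


# Hypothetical change feed: cust-1 moves from Berlin to Paris.
events = [
    {"key": "cust-1", "value": "Berlin", "seq": 1},
    {"key": "cust-1", "value": "Paris", "seq": 2},
    {"key": "cust-2", "value": "Oslo", "seq": 3},
]
history = apply_changes([], events)
for row in history:
    print(row)
```

Because the Berlin row is closed rather than replaced, the full change history remains queryable, which is the property the compliance and experimentation use cases rely on.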
To learn more about the Databricks Lakehouse Platform, visit
https://databricks.com/product/data-lakehouse. Tune in virtually
for more Data + AI Summit keynotes by registering for the
free, immersive online experience.
About Databricks
Databricks is the data and AI
company. More than 7,000 organizations worldwide — including
Comcast, Condé Nast, H&M, and over 40% of the Fortune 500 —
rely on the Databricks Lakehouse Platform to unify their data,
analytics and AI. Databricks is headquartered in San Francisco, with offices around the globe.
Founded by the original creators of Delta Lake, Apache Spark™, and
MLflow, Databricks is on a mission to help data teams solve the
world's toughest problems. To learn more, follow Databricks on
Twitter, LinkedIn and Facebook.
Safe Harbor Statement
This information is provided to
outline Databricks' general product direction and is for
informational purposes only. Customers who purchase Databricks
services should make their purchase decisions relying solely upon
services, features, and functions that are currently available.
Unreleased features or functionality described in forward-looking
statements are subject to change at Databricks' discretion and may
not be delivered as planned or at all.
Contact: Press@databricks.com