talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked

582

Sessions & talks

Showing 476–500 of 582 · Newest first

dbt + Machine Learning: What Makes a Great Baton Pass?

2022-07-19 · Watch video

dbt has done a great job of building an elegant, common interface between data engineers and data analysts: uniting on SQL. As the data industry evolves, there's plenty of pain and room to grow in building that interface between data scientists and data analysts. There isn't a good answer for when things go wrong in the machine learning arena: should the data analyst own fine-tuning the pre-processing data (think: prepping transformed data even further so machine learning models can work with it better)? Should we increase the SQL surface area to build ML models, or should we leave that to non-SQL interfaces (Python/Scala/etc.)? Does this have to be an either/or future? Whatever the interface evolves into, it must center people, create a low bar and high ceiling, and focus on outcomes rather than the mystique of features/tools behind a learning curve.

DELETE, UPDATE, MERGE Operations in Data Source

2022-07-19 · Watch video

If you’ve ever had to delete a set of records for regulatory compliance, update a set of records to fix an issue in the ingestion pipeline, or apply changes in a transaction log to a fact table, you know that row-level operations are becoming critical for modern data lake workflows. This talk will focus on some of the upcoming features in Spark 3.3 that will enable execution of row-level operations and allow Spark to pass to connectors only the rows to delete, update, or insert. As a result, data sources won’t have to provide low-level SQL extensions for Spark and will be able to benefit from a scalable built-in implementation that works across all connectors. The presentation will be useful for data source developers as well as data engineers and analysts interested in performing DELETE, UPDATE, and MERGE operations in Spark.
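As a rough, hedged illustration of the row-level operations discussed above, the sketch below runs DELETE, UPDATE, and MERGE from Python via Spark SQL; the table names (target_facts, updates) are hypothetical, and the statements assume a connector that supports row-level operations (e.g. Delta Lake).

```python
# Minimal sketch (hypothetical table names) of row-level operations in Spark SQL.
# Assumes a Spark session with a connector supporting DELETE/UPDATE/MERGE (e.g. Delta Lake).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("row-level-ops").getOrCreate()

# Apply a change set ("updates", same schema as the target) to a fact table:
spark.sql("""
    MERGE INTO target_facts AS t
    USING updates AS u
    ON t.id = u.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Targeted correction of an ingestion issue, and a regulatory delete:
spark.sql("UPDATE target_facts SET amount = amount / 100 WHERE currency = 'cents'")
spark.sql("DELETE FROM target_facts WHERE user_id = 'user-to-forget'")
```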

Delta Lake 2.0 Overview

2022-07-19 · Watch video

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running it in production was no small feat, and we have the ever-expanding Delta community to thank!

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together, including:

Integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more.

Features such as OPTIMIZE ZORDER, data skipping using column stats, S3 multi-cluster writes, Change Data Feed, and more.

Language APIs including Rust, Python, Ruby, GoLang, Scala, and Java.
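As a hedged sketch of a couple of the features listed above, the snippet below uses Delta Lake SQL from Python to Z-Order a table and to enable and read the Change Data Feed; the table name (events) and starting version are hypothetical, and a Spark session with Delta Lake 2.0 configured is assumed.

```python
# Illustrative sketch (hypothetical table name) of Delta Lake 2.0 features.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-2-features").getOrCreate()

# Compact files and co-locate related data so column-stats data skipping works harder.
spark.sql("OPTIMIZE events ZORDER BY (date)")

# Turn on the Change Data Feed so downstream consumers can read row-level changes.
spark.sql("ALTER TABLE events SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")

# Read the change feed starting from a given table version.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 5)
    .table("events")
)
changes.show()
```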

Delta Live Tables: Modern Software Engineering and Management for ETL

2022-07-19 · Watch video

Data engineers have the difficult task of cleansing complex, diverse data and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages, and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance; how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations; how built-in quality controls and monitoring ensure accurate BI, data science, and ML; and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.
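For a feel of the programming model, here is a minimal, hedged DLT sketch in Python; the source path and the expectation rule are hypothetical, and the code only runs as part of a Delta Live Tables pipeline on Databricks (where the spark session is provided).

```python
# Minimal Delta Live Tables sketch (hypothetical source path and quality rule);
# runs only inside a DLT pipeline on Databricks, where `spark` is provided.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage.")
def orders_raw():
    return spark.read.format("json").load("/mnt/raw/orders")  # hypothetical path

@dlt.table(comment="Cleaned orders with a built-in quality control.")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return dlt.read("orders_raw").where(col("order_id").isNotNull())
```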

Democratizing Metrics at Airbnb

2022-07-19 · Watch video

Data democratization is the process of enabling self-service access to insights from data for anyone in an organization, at varying levels of data expertise. Without deliberate planning, this process often leads to a proliferation of data tools that makes it inherently challenging to ensure consistent insights. At Airbnb, we’ve created a centralized metrics platform named Minerva to guarantee data consistency at scale. You can read about the introduction of Minerva in a three-part series on the Airbnb Tech Blog. In this talk, we’ll share several architectural changes we’ve made to allow for unprecedented flexibility while maintaining consistency, and introduce our plan for open-sourcing Minerva.

Discover Data Lakehouse With End-to-End Lineage

2022-07-19 · Watch video

Data lineage is key for managing change, ensuring data quality, and implementing data governance in an organization. There are a few use cases for data lineage:

Data Governance: For compliance and regulatory purposes, our customers are required to prove that the data and reports they are submitting came from a trusted and verified source. This typically means identifying the tables and data sets used in a report or dashboard and tracing the source of these tables and fields. Another governance use case is understanding the spread of sensitive data within the lakehouse.

Data Discovery: Data analysts looking to self-serve and build their own analytics and models typically spend time exploring and understanding the data in their lakehouse. Lineage is a key piece of information that enhances the understanding and trustworthiness of the data the analyst plans to use.

Problem Identification: Data teams are often called on to solve errors in analysts' dashboards and reports ("Why is the total number of widgets different in this report than the one I have built?"). This usually leads to an expensive forensic exercise by the DE team to understand the sources of data and the transformations applied to it before it hits the report.

Change Management: It is not uncommon for data sources to change; a new source may stop delivering data, or a field in the source system may change its semantics. In this scenario the DE team would like to understand the downstream impact of the change - to get a sense of how many datasets and users will be affected. This helps them determine the impact of the change, manage user expectations, and address issues ahead of time.

In this talk, we will cover in detail how we capture table and column lineage for Spark, Delta, and Unity Catalog for our customers, and how users can leverage data lineage to serve the various use cases mentioned above.

Doubling the Capacity of the Data Platform Without Doubling the Cost

2022-07-19 · Watch video

The data and ML platform at Scribd is growing. I am responsible for understanding and managing its cost, while enabling the business to solve new and interesting problems with our data. In this talk we'll discuss each of the following concepts and how they apply at Scribd and more broadly to other Databricks customers.

Embedding Privacy by Design Into Data Infrastructure Through Open-Source, Extensible Tooling

2022-07-19 · Watch video

The systemic privacy issues in our digital infrastructure stem largely from a fundamental design flaw: privacy is only considered reactively, once personal data is already flowing. Consumer trust is more valuable than ever, and the legal stakes for respecting personal data continue to climb. Appointing a privacy engineer to check boxes at the time of deployment won't cut it: the status quo for data context and data control - in other words, privacy controls - needs to change.

Analogous to AppSec's leftward shift, privacy responsibility lies with the builders and maintainers of data and software systems. This requires giving developers resources to embrace their role in tasks like evaluating privacy risk, with minimal friction and compatibility with the array of modern data infrastructure. Cillian will share actionable steps to implement Privacy by Design and offer just one example of what it could look like in action, with open-source devtools for automated privacy checks in the CI pipeline.

Enable Production ML with Databricks Feature Store

2022-07-19 · Watch video

Productionalizing ML models is hard. In fact, very few ML projects make it to production, and one of the hardest problems is data! Most AI platforms are disconnected from the data platform, making it challenging to keep features constantly updated and available in real-time. Offline/online skew prevents models from being used in real-time or, worse, introduces bugs and biases in production. Building systems to enable real-time inference requires valuable production engineering resources. As a result of these challenges, most ML models do not see the light of day.

Learn how you can simplify production ML using Databricks Feature Store, the first feature store built on the data lakehouse. Data sources for features are drawn from a central data lakehouse, and the feature tables themselves are tables in the lakehouse, accessible in Spark and SQL for both machine learning and analytics use cases. Features, data pipelines, source data, and models can all be co-governed in a central platform. Feature Store is seamlessly integrated with Apache Spark™, enabling automatic lineage tracking, and with MLflow, enabling models to look up feature values at inference time automatically. See these capabilities in action and how you can use them for your ML projects.
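As a hedged illustration of that workflow (the table, key, feature, and label names are hypothetical, and the databricks.feature_store client running in a Databricks workspace is assumed), creating a feature table and building a training set might look roughly like this:

```python
# Rough Databricks Feature Store sketch (hypothetical names/data);
# assumes a Databricks workspace where `spark` is available.
from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Toy inputs standing in for real lakehouse tables.
user_features_df = spark.createDataFrame(
    [("u1", 12, 3.5), ("u2", 4, 1.0)], ["user_id", "visits_30d", "avg_session_min"]
)
labels_df = spark.createDataFrame([("u1", 0), ("u2", 1)], ["user_id", "churned"])

# Publish a feature table backed by the lakehouse.
fs.create_table(
    name="ml.user_features",
    primary_keys=["user_id"],
    df=user_features_df,
    description="Aggregated user activity features.",
)

# Build a training set; models logged with this metadata can look up features at inference time.
training_set = fs.create_training_set(
    df=labels_df,
    feature_lookups=[FeatureLookup(table_name="ml.user_features", lookup_key="user_id")],
    label="churned",
)
training_df = training_set.load_df()
```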

Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code

2022-07-19 · Watch video

In this talk, we'll first introduce Sigma Workbooks along with its technical design motivations and architectural details. Sigma Workbooks is an interactive visual data analytics system that enables business users to easily perform complex ad-hoc analysis over data in cloud data warehouses (CDWs). We'll then demonstrate the expressivity, scalability, and ease-of-use of Sigma Workbooks through real-life use cases over datasets stored in Delta Lake. We’ll conclude the talk by sharing the lessons that we have learned throughout the design and implementation iterations of Sigma Workbooks.

Evolution of Data Architectures and How to Build a Lakehouse

2022-07-19 · Watch video

Data architectures are a key part of the larger picture of building robust analytical and AI applications. One must take a holistic view of the entire data analytics realm when planning data science initiatives.

Through this talk, learn about the evolution of the data landscape and why lakehouses are becoming the de facto choice for organizations building scalable data architectures. A lakehouse architecture combines the data management capabilities of the data warehouse - reliability, integrity, and quality - with the low cost and open approach of data lakes, and supports all data workloads, including BI and AI.

Data practitioners will also learn some core concepts of building an efficient lakehouse with Delta Lake.
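As one hedged example of those core concepts, the snippet below writes a small Delta table, appends to it, and reads an earlier version via time travel; the path and data are made up, and a Spark session with the delta-spark package on the classpath is assumed.

```python
# Minimal Delta Lake sketch (hypothetical path/data); assumes delta-spark is installed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-basics")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

path = "/tmp/demo/events_delta"

# Create the table (version 0).
spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"]) \
    .write.format("delta").mode("overwrite").save(path)

# ACID append (version 1) - readers always see a consistent snapshot.
spark.createDataFrame([(3, "purchase")], ["id", "event"]) \
    .write.format("delta").mode("append").save(path)

# Time travel back to the first version of the table.
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```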

Fugue Tune: Distributed Hybrid Hyperparameter Tuning

2022-07-19 · Watch video

Hyperparameter optimization on Spark is commonly memory-bound, where model training is done on data that doesn’t fit on a single machine. We introduce Fugue Tune, an intuitive interface focused on compute-bound hyperparameter tuning that scales Hyperopt and Optuna by allowing them to leverage Spark and Dask without code changes.
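For context, here is a hedged sketch of the kind of compute-bound, single-machine tuning loop (a plain Optuna study over a scikit-learn model) that the talk describes scaling out; Fugue Tune's own API is not shown, and the dataset and search space are illustrative only.

```python
# Illustrative single-machine, compute-bound tuning loop with Optuna
# (the kind of workload Fugue Tune aims to distribute; Fugue-specific APIs not shown).
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

def objective(trial):
    model = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        random_state=0,
    )
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```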

FutureMetrics: Using Deep Learning to Create a Multivariate Time Series Forecasting Platform

2022-07-19 · Watch video

Liquidity forecasting is one of the most essential activities at any bank. TD Bank, the largest of the Big Five, has to provide liquidity for half a trillion dollars in products, and to forecast it to remain within a $5BN buffer.

The use case was to predict liquidity growth over short to moderate time horizons: 90 days to 18 months. The model must perform reliably in a strict regulatory framework, and as such, validating such a model to the required standards is a key area of focus for this talk. While univariate models are widely used for this reason, their performance is capped, preventing future improvements for these types of problems.

The most challenging aspect of this problem is that the data is shallow (P ≫ N): the primary cadence is monthly, and the chaotic nature of economic systems results in poor connectivity of behavior across transitions. The goal is to create an MLOps platform for these types of time series forecasting metrics across the enterprise.

Measuring the Success of Your Algorithm Using a Shadow System

2022-07-19 · Watch video

How do you determine whether your new data product is a success if you cannot use A/B testing techniques?

At Gousto we recently implemented our newest algorithm for routing orders to sites. Comparing this to the previous algorithm using classic A/B testing techniques was not possible, because the algorithm requires a full set of orders to optimise and to ensure the volume we send to sites remains stable. A routing algorithm is a high-impact product. To ensure confidence in our algorithm before go-live, we came up with a different experimentation strategy. This included building a full-blown shadow system. To measure its performance, we built a set of data pipelines (including ETL) using Databricks.

Sometimes an A/B test cannot do the job. This talk will outline the challenges and benefits of building a shadow system, providing the audience with an A/B testing alternative and an overview of the relevant considerations when choosing and building this experiment design.

ÀLaSpark: Gousto's Recipe for Building Scalable PySpark Pipelines

2022-07-19 · Watch video

Find out how Gousto is developing its data pipelines at scale in a repeatable manner. At Gousto, we’ve developed Goustospark - a wrapper around PySpark that allows us to quickly and easily build data pipelines that are deployed into our Databricks environment.

This wrapper abstracts the repetitive components of all data pipelines, such as Spark configurations and metastore interactions. This allows a developer to simply specify the blueprint of the pipeline before turning their attention to more pressing issues, such as data quality and data governance, whilst enjoying a high level of performance and reliability.

In this session we will deep dive into the design patterns we followed and some unique approaches we’ve taken to structuring pipelines, and show a live demo of implementing a new Spark streaming pipeline in Databricks from scratch. We will even share some example Python code and snippets to help you build your own.
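Goustospark itself is internal, so purely as a hypothetical sketch of the wrapper idea described above (all class, method, and table names here are invented, not Gousto's actual code), a thin abstraction over Spark configuration and pipeline blueprints might look like:

```python
# Hypothetical sketch of a thin PySpark wrapper (names invented; not Gousto's actual code).
from dataclasses import dataclass
from typing import Callable, Dict, Optional
from pyspark.sql import DataFrame, SparkSession


@dataclass
class PipelineBlueprint:
    """Declares a pipeline: a source table, a transform, and a destination table."""
    source: str
    destination: str
    transform: Callable[[DataFrame], DataFrame]


class SparkPipelineRunner:
    """Centralises Spark configuration and metastore reads/writes for all pipelines."""

    def __init__(self, app_name: str, conf: Optional[Dict[str, str]] = None):
        builder = SparkSession.builder.appName(app_name)
        for key, value in (conf or {}).items():
            builder = builder.config(key, value)
        self.spark = builder.getOrCreate()

    def run(self, blueprint: PipelineBlueprint) -> None:
        df = self.spark.table(blueprint.source)
        result = blueprint.transform(df)
        result.write.mode("overwrite").saveAsTable(blueprint.destination)


# Example usage: the developer only supplies the blueprint.
if __name__ == "__main__":
    runner = SparkPipelineRunner("orders_daily", {"spark.sql.shuffle.partitions": "64"})
    runner.run(PipelineBlueprint(
        source="raw.orders",
        destination="curated.orders_clean",
        transform=lambda df: df.dropDuplicates(["order_id"]),
    ))
```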

How AARP Services, Inc. automated SAS transformation to Databricks using LeapLogic

2022-07-19 · Watch video

While SAS has been a standard in analytics and data science use cases, it is not cloud-native and does not scale well. Join us to learn how AARP automated the conversion of hundreds of complex data processing, model scoring, and campaign workloads to Databricks using LeapLogic, an intelligent code transformation accelerator that can transform any and all legacy ETL, analytics, data warehouse and Hadoop to modern data platforms.

In this session, experts from AARP and Impetus will share how they collaborated with Databricks and how they were able to:
• Automate modernization of SAS marketing analytics based on coding best practices
• Establish a rich library of Spark and Python equivalent functions on Databricks with the same capabilities as SAS procedures, DATA step operations, macros, and functions
• Leverage Databricks-native services like Delta Live Tables to implement waterfall techniques for campaign execution and simplify pipeline monitoring

How to Automate the Modernization and Migration of Your Data Warehousing Workloads to Databricks

2022-07-19 · Watch video

The logic in your data is the heartbeat of your organization’s reports, analytics, dashboards and applications. But that logic is often trapped in antiquated technologies that can’t take advantage of the massive scalability in the Databricks Lakehouse.

In this session BladeBridge will show how to automate the conversion of this metadata and code into Databricks PySpark and DBSQL. BladeBridge will demonstrate the flexibility of configuring for N legacy technologies to facilitate an automated path for not just a single modernization project but a factory approach to corporate-wide modernization.

BladeBridge will also present how you can empirically size your migration project to determine the level of effort required.

In this session you will learn:
What BladeBridge Converter is
What BladeBridge Analyzer is
How BladeBridge configures Readers and Writers
How to size a conversion effort
How to accelerate adoption of Databricks

Welcome & Destination Lakehouse | Ali Ghodsi | Keynote Data + AI Summit 2022

2022-07-19 · Watch video
Ali Ghodsi (Databricks) , Reynold Xin (Databricks) , Matei Zaharia (Databricks)

Join the Day 1 keynote to hear from Databricks co-founders - and original creators of Apache Spark and Delta Lake - Ali Ghodsi, Matei Zaharia, and Reynold Xin on how Databricks and the open source community are taking on the biggest challenges in data. The talks will address the latest updates on the Apache Spark and Delta Lake projects, the evolution of data lakehouse architecture, and how companies like Adobe and Amgen are using lakehouse architecture to advance their data goals.

Adversarial AI—The Nature of the Threat, Impacts, and Mitigation Strategies

2022-07-19 · Watch video

Adversarial AI/ML is an emerging research area focused on the vulnerabilities of Artificial Intelligence (AI)/Machine Learning (ML) models to adversarial exploitation such as data poisoning, adversarial perturbations, and inference and extraction attacks. This research area is of particular interest to domains where AI/ML models play an essential role in mission-critical decision-making processes. In this presentation, we will give a review of the four principal categories of Adversarial AI. We will discuss each of these, supported by relevant and interesting examples, and we will discuss the future implications. We will present in greater depth our research in Adversarial NLP, backed by specific data poisoning and adversarial perturbation attacks on NLP classifiers. We will conclude the presentation by discussing current mitigation approaches and methods, and offer some general recommendations for how best to address Adversarial AI exploits.
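To make one of those categories concrete, here is a hedged, toy sketch of an adversarial perturbation (an FGSM-style attack on a fixed logistic classifier); it is illustrative only and not drawn from the presenters' research, and the weights and input are made up.

```python
# Toy FGSM-style adversarial perturbation against a fixed logistic classifier
# (illustrative only; weights, bias, and input are made up).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A "trained" logistic model and a correctly classified input.
w = np.array([2.0, -1.5, 0.5])
b = 0.1
x = np.array([0.3, -0.2, 0.8])
y = 1.0  # true label

# Gradient of the cross-entropy loss with respect to the input: (p - y) * w
p = sigmoid(w @ x + b)
grad_x = (p - y) * w

# FGSM: step in the sign of the gradient to maximise the loss.
epsilon = 0.4
x_adv = x + epsilon * np.sign(grad_x)

print(f"clean prediction:       {sigmoid(w @ x + b):.3f}")      # confidently class 1
print(f"adversarial prediction: {sigmoid(w @ x_adv + b):.3f}")  # pushed below 0.5
```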

AI-Fueled Forecasting: The Next Generation of Financial Planning

2022-07-19 · Watch video

In an age of data abundance and digital disruption, CFOs are adopting next-generation planning capabilities to drive strategic decision making in real time. The future of forecasting is AI-driven. PrecisionView™, Deloitte's proprietary forecasting solution, leverages data aggregation technologies with predictive analytics and machine-learning capabilities to allow businesses to achieve improved forecasting accuracy.

Attend this webinar to hear about:
• AI-powered financial planning that helps generate high-impact insights by incorporating the organization's internal data and a myriad of external macroeconomic factors
• Examples of how companies have achieved success using scenario modelling
• Databricks' compute capabilities that allow for parallel processing, which helps generate near-real-time forecasts at the most granular levels

Apache Spark Community Update | Reynold Xin Streaming Lakehouse | Karthik Ramasamy

2022-07-19 · Watch video
Karthik Ramasamy (Databricks) , Reynold Xin (Databricks)

Data + AI Summit Keynote talks from Reynold Xin and Karthik Ramasamy

Best Practices of Maintaining High-Quality Data

2022-07-19 · Watch video

Data sits at the heart of machine learning algorithms, and your model is only as good as the data governance policies at your organization. The talk will cover multiple data governance frameworks. We will also talk in depth about one of the key areas of data governance policy: data quality. The session will cover the significance of data quality, the definition of "goodness", and the key benefits and impact of maintaining high-quality data and processes. Rather than staying merely theoretical, the talk focuses on practical techniques and guidelines for maintaining data quality.
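As a small, hedged example of the kind of practical checks such guidelines tend to include (the DataFrame, columns, and threshold here are made up), basic completeness and uniqueness checks in PySpark might look like:

```python
# Illustrative data quality checks in PySpark (hypothetical data, columns, and threshold).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()

df = spark.createDataFrame(
    [(1, "a@x.com"), (2, None), (3, "c@x.com"), (3, "c@x.com")],
    ["customer_id", "email"],
)
total = df.count()

# Completeness: share of non-null emails against a minimum threshold.
non_null_ratio = df.filter(F.col("email").isNotNull()).count() / total
print(f"email completeness: {non_null_ratio:.2f} (threshold 0.70)")

# Uniqueness: customer_id is expected to behave like a primary key.
duplicates = df.groupBy("customer_id").count().filter(F.col("count") > 1).count()
print(f"duplicated customer_id values: {duplicates} (expected 0)")
```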

Big Data in the Age of Moneyball

2022-07-19 · Watch video

Data and predictions have permeated sports and our conversations around them since the beginning. Who will win the big game this weekend? How many points will your favorite player score? How much money will be guaranteed in the next free agent contract? One could argue that data-driven decisions in sports started with Moneyball in baseball, in 2003. In the two decades since, data and technology have exploded onto the scene. The Texas Rangers are using modern cloud software, such as Databricks, to help make sense of this data and provide actionable information to create a World Series team on the field. From computer vision, pose analytics, and player tracking to pitch design, base-stealing likelihood, and more, come see how the Texas Rangers are using innovative cloud technologies to create action-driven reports from the current sea of Big Data. Finally, this talk will demonstrate how the Texas Rangers use MLflow and the Model Registry inside Databricks to organize their predictive models.
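As a hedged sketch of that last point (the run, metric, and registered model names are hypothetical, and registering requires an MLflow tracking server with a Model Registry, such as Databricks), logging and registering a model looks roughly like this:

```python
# Minimal MLflow tracking + Model Registry sketch (hypothetical names/data);
# registering requires a tracking server with a Model Registry (e.g. Databricks).
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, random_state=0)

with mlflow.start_run(run_name="steal_likelihood_baseline"):
    model = LogisticRegression(max_iter=200).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="base_stealing_likelihood",  # hypothetical registry name
    )
```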

Building a Data Science as a Service Platform in Azure with Databricks

2022-07-19 · Watch video

Machine learning in the enterprise is rarely delivered by a single team. In order to enable machine learning across an organisation you need to target a variety of different skills, processes, technologies, and maturities. Doing this is incredibly hard and requires a composite of different techniques to deliver a single platform which empowers all users to build and deploy machine learning models.

In this session we discuss how Databricks enabled a data science as a service platform for one of the UK's largest household insurers. We look at how this platform is empowering users of all abilities to build models, deploy models, and realise a return on investment earlier.

Building Recommendation Systems Using Graph Neural Networks

2022-07-19 · Watch video

RECKON (RECommendation systems using KnOwledge Networks) is a machine learning project centred around improving entity intelligence.

We represent the dataset of our site interactions as a heterogeneous graph. The nodes represent various entities in the underlying data (Users, Articles, Authors, etc.). Edges between nodes represent interactions between these entities (User u has read Article v, Article u was written by Author v, etc.).

RECKON uses a GNN-based encoder-decoder architecture to learn representations for important entities in our data by leveraging both their individual features and the interactions between them, through repeated graph convolutions.

Personalized recommendations play an important role in improving our users' experience and retaining them. We would like to take this opportunity to walk through some of the techniques that we have incorporated in RECKON and the end-to-end build of this product on Databricks, along with a demo.
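As a generic, hedged illustration of "repeated graph convolutions" (this is not RECKON's actual architecture; the toy graph, feature sizes, and layer count are invented), a basic message-passing layer over a normalised adjacency matrix can be written as:

```python
# Toy graph convolution layers in PyTorch (generic sketch, not RECKON's architecture).
import torch
import torch.nn as nn

# A tiny graph: 4 nodes (e.g. users/articles), edges in a symmetric adjacency matrix.
A = torch.tensor([
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
], dtype=torch.float32)

# Add self-loops and symmetrically normalise: D^-1/2 (A + I) D^-1/2.
A_hat = A + torch.eye(A.shape[0])
deg_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
A_norm = deg_inv_sqrt.unsqueeze(1) * A_hat * deg_inv_sqrt.unsqueeze(0)

class GraphConv(nn.Module):
    """One graph convolution: aggregate neighbour features, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, features):
        return torch.relu(self.linear(adj @ features))

H = torch.randn(4, 8)                              # initial node features
layer1, layer2 = GraphConv(8, 16), GraphConv(16, 4)
embeddings = layer2(A_norm, layer1(A_norm, H))     # repeated graph convolutions
print(embeddings.shape)                            # torch.Size([4, 4])
```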
