talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked: 582

Sessions & talks

Showing 501–525 of 582 · Newest first

Cloud Fetch: High-bandwidth Connectivity With BI Tools

2022-07-19 · Watch video

Business Intelligence (BI) tools such as Tableau and Microsoft Power BI are notoriously slow at extracting large query results from traditional data warehouses because they typically fetch the data in a single thread through a SQL endpoint that becomes a data transfer bottleneck. Data analysts can connect their BI tools to Databricks SQL endpoints to query data in tables through an ODBC/JDBC protocol integrated in our Simba drivers. With Cloud Fetch, released in Databricks Runtime 8.3 and the Simba ODBC 2.6.17 driver, we introduce a new mechanism that fetches data in parallel via cloud storage such as AWS S3 and Azure Data Lake Storage to bring data to BI tools faster. In our experiments using Cloud Fetch, we observed a 10x speed-up in extract performance due to parallelism.
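
As a rough illustration of the fetch path described above, here is a minimal sketch (not from the talk) of pulling a large result set from a Databricks SQL endpoint with the open-source databricks-sql-connector, which can use Cloud Fetch to download result chunks in parallel from cloud storage; the hostname, HTTP path, token, and table are placeholders.

```python
# Minimal sketch (not from the talk): fetching a large result from a
# Databricks SQL endpoint. All connection values below are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder workspace
    http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder SQL warehouse
    access_token="dapiXXXXXXXXXXXXXXXX",                           # placeholder token
) as connection:
    with connection.cursor() as cursor:
        # Large results come back as compressed Arrow chunks; with Cloud Fetch
        # the chunks are downloaded in parallel from cloud storage instead of
        # being streamed through the endpoint in a single thread.
        cursor.execute("SELECT * FROM samples.nyctaxi.trips")
        table = cursor.fetchall_arrow()  # returns a pyarrow.Table
        print(table.num_rows)
```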

Computational Data Governance at Scale

2022-07-19 · Watch video

This talk is about the implementation of a Data Mesh at Fozzy Group. In our experience, the biggest bottleneck in the transition to Data Mesh is unclear data ownership. This and other issues can be solved with (federated) computational data governance. We will go through the process of building a global data lineage with 200k tables, 40k table replications, and 70k SQL stored procedures. We will also cover our lessons from building a data product culture with explicit and automated tracking of ownership and data quality. Fozzy Group is a holding company that comprises about 40 different businesses with 60k employees across various domains: retail, banking, insurance, logistics, agriculture, HoReCa, e-commerce, etc.

Databricks and Enterprise Observability with Overwatch

2022-07-19 · Watch video

Join us for a quick discussion and demonstration of how Overwatch can help you understand cost, utilization, workloads, and much more across the Databricks platform today. You will learn what Overwatch is and how to get started, and we will explore several common observability questions it helps customers answer every day.

Databricks Meets Power BI

2022-07-19 · Watch video

Databricks and Spark are becoming increasingly popular and are now used as a modern data platform to analyze real-time or batch data. In addition, Databricks offers great integration for machine learning developers.

Power BI, on the other hand, is a great platform for easy graphical analysis of data, and it's a great way to bring hundreds of different data sources together, analyze them, and make them accessible on any device.

So let's just bring both worlds together and see how well Databricks works with Power BI.

Data On-Board: The Aerospace Revolution

2022-07-19 · Watch video

From airplanes to satellites, through mission systems, the aerospace industry generates a huge amount of data waiting to be exploited. All this information is shaping new concepts and capabilities that will forever change the industry thanks to artificial intelligence: autonomous flight, fault prediction, automatic problem detection, and energy efficiency, among many others. To get there, we face countless challenges, such as rigorous AI certification, trustworthiness, safety, data integrity, and security, all of which must be addressed on this exciting Airbus journey: welcome to the aerospace revolution!

Data Policy in the Past, Present, and Future

2022-07-19 · Watch video
Jacob Pasner, PhD (Senator Ron Wyden's technology policy team, Office of Senator Ron Wyden)

Federal, State, and Local governments are doing their best to play catch-up as the modern world is revolutionized by the collection and utilization of vast data repositories. This leaves them in the awkward position of implementing their own data modernization projects while simultaneously cleaning up the regulatory mess created by the widely accepted "Go fast and break things" mentality of industry. This talk draws upon Jacob Pasner's data science background and his year-long experience as a Science and Technology Policy Fellow on Senator Ron Wyden's technology policy team. The discussion will center on his work navigating the complex inner workings of Federal executive and legislative branch bureaucracies while drafting first-of-its-kind government data legislation, advocating for congressional modernization projects, and leading data privacy oversight of industry. Note: The opinions presented here are Jacob Pasner's alone and do not represent the views of Senator Ron Wyden or his office.

Deep-Dive into Delta Lake

2022-07-19 · Watch video

Delta Lake is becoming a de facto standard for storing large amounts of data for analytical purposes in a data lake. But what is behind it? How does it work under the hood? In this session we will dive deep into the internals of Delta Lake by unpacking the transaction log, and we will also highlight some common pitfalls when working with Delta Lake (and show how to avoid them).
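
For readers who want to poke at the transaction log themselves, here is a minimal sketch (not part of the session) of inspecting a Delta table's _delta_log with PySpark; the table path is a placeholder.

```python
# Minimal sketch (not from the talk): peeking at a Delta table's transaction log.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
path = "/tmp/delta/events"  # placeholder Delta table location

# Every commit is a JSON file under _delta_log/ listing the data files added
# or removed, plus metadata and protocol actions.
spark.read.json(f"{path}/_delta_log/*.json").printSchema()

# DESCRIBE HISTORY exposes the same commits as a table-level audit trail.
spark.sql(f"DESCRIBE HISTORY delta.`{path}`") \
     .select("version", "timestamp", "operation", "operationParameters") \
     .show(truncate=False)
```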

Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach

2022-07-19 · Watch video

As gatekeepers of the financial system, banks play a crucial role in reporting possible instances of financial crime. At the same time, criminals continuously reinvent their approaches to hide their activities among dense transaction data. In this talk, we describe the challenges of detecting money laundering and outline why employing machine learning via MLOps is critically important to identify complex and ever-changing patterns.

In anti-money laundering, machine learning answers a dire need for vigilance and efficiency where previous-generation systems fall short. We will demonstrate how our open platform facilitates a gradual migration towards a model-driven landscape, using the example of transaction monitoring to showcase applications of supervised and unsupervised learning, human explainability, and model monitoring. This environment enables us to drive change from the ground up in how the bank understands its clients to detect financial crime.

Disrupting the Prescription Drug Market with AI and Data

2022-07-19 · Watch video

The US prescription drug market has known issues: overpriced, unaffordable drugs; razor-thin margins for pharmacies; widespread inefficiencies; and an endemic lack of transparency.

AI and Data are key technologies to fix these issues and benefit patients, pharmacies, and providers. AI-driven price optimization brings price transparency and removes inefficiencies, making prescription drugs more affordable. Drug recommendations and personalization empower consumers and providers with better choices, knowledge, and control.

To support all these solutions, we have built our intelligent Pharma AI and Data Platform. Built on the Databricks platform, we delivered our AI and Data platform into production in 2.5 months, deploying our innovative AI Optimized Pricing models, supporting tens of thousands of pharmacies, and connecting millions of consumers. Continuing on our journey, we are building an AI prescription recommender, a medication adherence improver, and healthcare personalization.

Elixir: The Wickedly Awesome Batch and Stream Processing Language You Should Have in Your Toolbox

2022-07-19 · Watch video

Elixir is an Erlang-VM bytecode-compatible programming language that is growing in popularity.

In this session I will show how you can apply Elixir towards solving data engineering problems in novel ways.

Examples include:
• How to leverage Erlang's lightweight distributed process coordination to run clusters of workers across Docker containers and perform data ingestion.
• A framework that hooks Elixir functions as steps into Airflow graphs.
• How to consume and process Kafka events directly within Elixir microservices.

For each of the above I'll show real system examples and walk through the key elements step by step. No prior familiarity with Erlang or Elixir will be required.

From PostGIS to Spark SQL: The History and Future of Spatial SQL

2022-07-19 · Watch video

In this talk, we'll review the major milestones that have defined Spatial SQL as the powerful tool for geospatial analytics that it is today.

From the early foundations of the JTS Topology Suite and GEOS and their application in the PostGIS extension for PostgreSQL, to the latest implementations in Spark SQL using libraries such as the CARTO Analytics Toolbox for Databricks, Spatial SQL has been a key component of many geospatial analytics products and solutions. It leverages the computing power of different databases with SQL as a lingua franca, allowing easy adoption by data scientists, analysts, and engineers.

The latest innovation in this area is the CARTO Spatial Extension for Databricks, which makes the most of the near-unlimited scalability provided by Spark and the cutting-edge geospatial capabilities that CARTO offers.

GIS Pipeline Acceleration with Apache Sedona

2022-07-19 · Watch video

At CKDelta, we ingest and process massive amounts of geospatial data. Using Apache Sedona together with Databricks has accelerated our data pipelines many times over.

In this talk, we'll cover migrating our existing pipelines to Sedona + PySpark and the pitfalls we encountered along the way.
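
As a flavor of what Sedona + PySpark pipelines look like, here is a minimal sketch (not CKDelta's actual code) of a point-in-polygon join; the table names are placeholders and the registration call varies slightly between Sedona versions.

```python
# Minimal sketch (not from the talk): a point-in-polygon aggregation with
# Apache Sedona on PySpark. `points` and `regions` are placeholder tables.
from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator

spark = SparkSession.builder.getOrCreate()
SedonaRegistrator.registerAll(spark)  # registers the ST_* SQL functions

# Assume `points` has lon/lat columns and `regions` has a WKT geometry column.
joined = spark.sql("""
    SELECT r.region_id, COUNT(*) AS n_points
    FROM points p
    JOIN regions r
      ON ST_Contains(ST_GeomFromWKT(r.wkt), ST_Point(p.lon, p.lat))
    GROUP BY r.region_id
""")
joined.show()
```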

Git for Data Lakes—How lakeFS Scales Data Versioning to Billions of Objects

2022-07-19 · Watch video

Modern data lake architectures rely on object storage as the single source of truth. We use them to store an increasing amount of data, which is increasingly complex and interconnected. While scalable, these object stores provide few safety guarantees: they lack the semantics for atomicity, rollbacks, and reproducibility that are needed for data quality and resiliency.

lakeFS, an open source data version control system designed for data lakes, solves these problems by introducing concepts borrowed from Git: branching, committing, merging, and rolling back changes to data.

In this talk you'll learn about the challenges with using object storage for data lakes and how lakeFS enables you to solve them.

By the end of the session you’ll understand how lakeFS scales its Git-like data model to petabytes of data across billions of objects, without affecting throughput or performance. We will also demo branching, writing data using Spark, and merging it on a billion-object repository.
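
To make the branch-write-merge flow concrete, here is a minimal sketch (not the talk's demo code) of writing to a lakeFS branch through its S3-compatible gateway from Spark and then merging with lakectl; the repository, branch, and paths are placeholders.

```python
# Minimal sketch (not from the talk): writing data to an isolated lakeFS
# branch, then merging it back. Repository, branch, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# lakeFS exposes objects as s3a://<repository>/<branch>/<path>, so pointing a
# Spark write at an experiment branch keeps main untouched until you merge.
df = spark.range(1_000_000)
df.write.format("delta").mode("overwrite").save("s3a://my-repo/experiment-1/tables/events")

# Committing and merging happen against the lakeFS API, e.g. via lakectl:
#   lakectl commit lakefs://my-repo/experiment-1 -m "load events"
#   lakectl merge  lakefs://my-repo/experiment-1 lakefs://my-repo/main
```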

Graph-based stream processing

2022-07-19 · Watch video

The understanding of complex relationships and interdependencies between different data points is crucial to many decision-making processes.

Graph analytics have found their way into every major industry, from marketing and financial services to transportation. Fraud detection, recommendation engines and process optimization are some of the use cases where real-time decisions are mission-critical, and the underlying domain can be easily modeled as a graph.

Hassle-Free Data Ingestion into the Lakehouse

2022-07-19 · Watch video

Ingesting data from hundreds of different data sources is critical before organizations can execute advanced analytics, data science, and machine learning. Unfortunately, ingesting and unifying this data to create a reliable single source of truth is usually extremely time-consuming and costly. In this session, discover how Databricks simplifies data ingestion, at low latency, with SQL-only ingestion capabilities. We will discuss and demonstrate how you can easily and quickly ingest any data into the lakehouse. The session will also cover newly-released features and tools that make data ingestion even simpler on the Databricks Lakehouse Platform.
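
The session focuses on SQL-only ingestion, but as a related illustration, here is a minimal sketch (not from the session) of incremental file ingestion with Databricks Auto Loader; the paths and target table name are placeholders.

```python
# Minimal sketch (not from the talk): incremental file ingestion into a Delta
# table with Auto Loader. All paths and the table name are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/ingest/_schemas")  # schema tracking/evolution
    .load("s3://my-bucket/raw/events/")                           # placeholder source path
)

(
    raw.writeStream
    .option("checkpointLocation", "/tmp/ingest/_checkpoints")
    .trigger(availableNow=True)   # process what's there, then stop
    .toTable("bronze.events")     # placeholder target table
)
```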

How AT&T Data Science Team Solved an Insurmountable Big Data Challenge on Databricks

2022-07-19 · Watch video

Data-driven personalization has been a seemingly insurmountable challenge for AT&T’s data science team because of the size of the datasets and the complexity of the data engineering involved. These data preparation tasks often take several hours or days to complete, and some of them fail to complete at all, affecting productivity.

In this session, the AT&T Data Science team will talk about how the RAPIDS Accelerator for Apache Spark and the Photon runtime on Databricks can be leveraged to process these extremely large datasets, resulting in improved content recommendation, classification, and more, while reducing infrastructure costs. The team will compare speedups and costs against the regular Databricks Runtime Apache Spark environment. The tested datasets range from 2 TB to 50 TB and consist of data collected over periods of 1 to 31 days.

The talk will showcase the results from both RAPIDS accelerator for Apache Spark and Databricks Photon runtime.
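
As background, here is a minimal sketch (not AT&T's actual configuration) of the Spark settings typically involved in enabling the RAPIDS Accelerator for Apache Spark on a GPU cluster; the values shown are illustrative only.

```python
# Minimal sketch (not from the talk): illustrative configuration for the
# RAPIDS Accelerator for Apache Spark. Values are placeholders, not AT&T's.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # load the RAPIDS SQL plugin
    .config("spark.rapids.sql.enabled", "true")             # allow GPU-accelerated SQL plans
    .config("spark.task.resource.gpu.amount", "0.25")       # 4 concurrent tasks per GPU
    .getOrCreate()
)

# Queries on this session can be planned onto the GPU where the plugin
# supports the operators involved; unsupported operators fall back to the CPU.
df = spark.range(10_000_000).selectExpr("id % 100 AS k", "id AS v")
df.groupBy("k").count().show()
```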

How Databricks is driving disruptive digital transformation in the airline industry

2022-07-19 · Watch video

In today’s business climate, leaders need to embrace continual change to stay ahead of potential impacts to their business. For Wizz Air, the fastest-growing airline in Europe, embracing digital transformation has been key to overcoming critical challenges: COVID-related travel restrictions, unpredictable fuel costs, and even war. By migrating and modernizing data with Databricks, Wizz Air is accelerating its transformation at massive scale, driving substantial operational efficiencies and better customer experiences. Join a conversation between leaders from Avanade and Wizz Air to learn how Databricks is helping take the airline to new heights.

How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities

2022-07-19 · Watch video

For this talk, Labelbox has invited the Electric Power Research Institute (EPRI) to share information about how it is using computer vision, drone technology, and Labelbox’s training data platform to reduce wildfire risks innate to electricity delivery. This talk is a great starting point for any data teams tackling difficult computer vision projects. The Labelbox team will demonstrate how teams can produce their own annotated datasets like EPRI did, and import them into the Lakehouse for AI with the Labelbox Connector for Databricks.

Mechanical failures from overhead electrical infrastructure, in certain environments, are described in utility wildfire mitigation plans as potential ignition concerns. The utility industry is evaluating drones and new inspection technologies that may support more efficient and timely identification of such at-risk assets. EPRI will present several of its AI initiatives and their impact on wildfire prevention and proper maintenance of power lines.

How McAfee Leverages Databricks on AWS at Scale

2022-07-19 · Watch video

McAfee, a global leader in online protection, enables home users and businesses to stay ahead of fileless attacks, viruses, malware, and other online threats. Learn how McAfee leverages Databricks on AWS to create a centralized data platform as a single source of truth to power customer insights. We will also describe how McAfee uses additional AWS services, specifically Kinesis and CloudWatch, to provide real-time data streaming and to monitor and optimize its Databricks on AWS deployment. Finally, we’ll discuss business benefits and lessons learned during McAfee’s petabyte-scale migration to Databricks on AWS using Databricks Delta clone technology coupled with network, compute, and storage optimizations on AWS.

How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins

2022-07-19 · Watch video

Robinhood’s data lake is the bedrock foundation that powers business analytics, product experimentation, and other machine learning applications throughout our organization. Come join this session where we will share our journey of building a scalable streaming data lakehouse with Spark, Postgres and other leading open source technologies.

We will lay out our architecture in depth and describe how we perform CDC streaming ingestion and incremental processing of thousands of Postgres tables into our data lake.
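
As one common shape for this kind of pipeline, here is a minimal sketch (not Robinhood's actual implementation) of upserting CDC events from Kafka into a Delta table with Structured Streaming; the topic, schema, and table names are placeholders.

```python
# Minimal sketch (not from the talk): upserting Debezium-style CDC events from
# Kafka into a Delta table. Broker, topic, schema, and table are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json, row_number
from pyspark.sql.types import LongType, StringType, StructField, StructType
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# Simplified change-event schema (a real Debezium payload is richer).
schema = StructType([
    StructField("id", LongType()),
    StructField("status", StringType()),
    StructField("op", StringType()),    # c = insert, u = update, d = delete
    StructField("ts_ms", LongType()),
])

changes = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
    .option("subscribe", "postgres.public.orders")      # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("c"))
    .select("c.*")
)

def upsert_batch(batch_df, _batch_id):
    # Keep only the newest event per key within the micro-batch, then MERGE.
    latest = (
        batch_df.withColumn(
            "rn", row_number().over(Window.partitionBy("id").orderBy(col("ts_ms").desc()))
        )
        .where("rn = 1")
        .drop("rn")
    )
    latest.createOrReplaceTempView("updates")
    latest.sparkSession.sql("""
        MERGE INTO lake.orders AS t
        USING updates AS s
        ON t.id = s.id
        WHEN MATCHED AND s.op = 'd' THEN DELETE
        WHEN MATCHED THEN UPDATE SET t.status = s.status
        WHEN NOT MATCHED AND s.op != 'd' THEN INSERT (id, status) VALUES (s.id, s.status)
    """)

(
    changes.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/cdc/_checkpoints/orders")  # placeholder
    .start()
)
```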

How socat and UNIX Pipes Can Help Data Integration

2022-07-19 · Watch video

Nearly every developer is familiar with creating a CLI. Containerized CLIs provide a flexible, cross-language standard with a low barrier to entry for open-source contributors. The ETL process can be reduced to two CLIs: one that reads data and one that writes data. While this interface is simple enough to implement from the contributor’s side, Kubernetes’ distributed nature means orchestrating data transfer between the CLIs on Kubernetes presents an unsolved problem.

This talk describes a novel approach to reliably orchestrate CLIs on Kubernetes for data integration. Through this lens, we go through the evaluation of strategies and describe the pros and cons of each architecture for horizontally scaling containerized data integration workflows on Kubernetes. We also cover the journey of implementing a TCP-based “process” abstraction over CLIs using socat and UNIX pipes. This same approach powers all of Airbyte’s Kubernetes deployments and helps sync TBs of data daily.
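
To illustrate the "two CLIs" idea, here is a minimal sketch (not Airbyte's implementation) of piping a source connector's stdout into a destination connector's stdin from Python, with a comment showing how the same contract could be carried over TCP with socat; the connector commands are hypothetical.

```python
# Minimal sketch (not from the talk): ETL as two CLIs connected by an OS pipe.
# The connector command names and config files are hypothetical placeholders.
import subprocess
import sys

source = subprocess.Popen(
    ["source-connector", "read", "--config", "source.json"],      # placeholder CLI
    stdout=subprocess.PIPE,
)
destination = subprocess.Popen(
    ["destination-connector", "write", "--config", "dest.json"],  # placeholder CLI
    stdin=source.stdout,
)

source.stdout.close()   # let the destination own the read end of the pipe
destination.wait()
source.wait()
if source.returncode or destination.returncode:
    sys.exit("sync failed")

# When the two containers run in different Kubernetes pods, the same
# stdout-to-stdin contract can be carried over TCP, e.g. by wrapping each end
# with socat (illustrative commands only):
#   socat -u EXEC:"source-connector read --config source.json" TCP-LISTEN:9000
#   socat -u TCP:source-pod:9000 EXEC:"destination-connector write --config dest.json"
```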

How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse

2022-07-19 · Watch video

Los Angeles County’s Department of Human Resources (DHR) is responsible for attracting a diverse workforce for the 37 departments it supports. Each year, DHR processes upwards of 400,000 applications for job opportunities, making the County one of the largest employers in the nation. Managing a hiring process of this scale is complex, with many complicating factors such as background checks and skills examinations. These processes, if not managed properly, can create bottlenecks and a poor experience for both candidates and hiring managers.

In order to identify areas for improvement, DHR set out to build detailed operational metrics across each stage of the hiring process. DHR used to conduct high-level analysis manually using Excel and other disparate tools. The data itself was limited and difficult to obtain and analyze. In addition, it took analysts weeks to manually pull data from half a dozen siloed systems into Excel for cleansing and analysis. This process was labor-intensive, inefficient, and prone to human error.

To overcome these challenges, DHR, in partnership with the Internal Services Department (ISD), adopted a modern data architecture in the cloud. Powered by the Azure Databricks Lakehouse, DHR was able to bring together its diverse volumes of data into a single platform for data analytics. Manual ETL processes that took weeks could now be automated in 10 minutes or less. With this new architecture, DHR has built business intelligence dashboards that unpack the hiring process, give a clear picture of where the bottlenecks are, and track the speed with which candidates move through the process. The dashboards allow County departments to innovate and make changes that enhance the experience of potential job seekers and improve the timeliness of securing highly qualified and diverse County personnel at all employment levels.

In this talk, we’ll discuss DHR’s journey towards building a data-driven hiring process, the architecture decisions that enabled this transformation and the types of analytics that we’ve deployed to improve hiring efforts.

How to Build a Complete Security and Governance Solution Using Unity Catalog

2022-07-19 · Watch video

Unity Catalog unifies governance and security for Databricks in one place. It can store data classifications and privileges and enforce them.

This talk will go into the details of Unity Catalog and explain its core building blocks for security and governance. I will also explain how Privacera translates Apache Ranger policies into native Unity Catalog policies, how audits are collected from Unity Catalog and imported into the centralized audit store of Apache Ranger, and how Privacera can extend Unity Catalog.
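
As a small illustration of the privilege model, here is a minimal sketch (not from the talk) of granting and reviewing Unity Catalog privileges with SQL from a Spark session; the catalog, schema, table, and group names are placeholders.

```python
# Minimal sketch (not from the talk): Unity Catalog privileges via SQL.
# Catalog, schema, table, and group names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("GRANT USE CATALOG ON CATALOG main TO `data_analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data_analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data_analysts`")

# Privileges can be reviewed (and audited) the same way.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show(truncate=False)
```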

Ingesting data into Lakehouse with COPY INTO

2022-07-19 · Watch video

COPY INTO is a popular data ingestion SQL command for Databricks users, especially customers using Databricks SQL. In this talk, we discuss the data ingestion use cases in Databricks and how COPY INTO fits your data ingestion needs. We will cover a few new COPY INTO features and how to achieve the following use cases: 1. loading data into a Delta table incrementally; 2. fixing errors in already-loaded data and helping with data cleansing; 3. evolving your schema over time; 4. previewing data before ingesting; 5. loading data from a third-party data source. In this session, we will demo the new features, discuss the architecture of the implementation, and show how other Databricks features use COPY INTO under the hood.
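
For reference, here is a minimal sketch (not the session's demo) of an incremental COPY INTO load issued from a Spark session; the path, table name, and options are placeholders, and the available options depend on your runtime version.

```python
# Minimal sketch (not from the talk): an idempotent, incremental COPY INTO
# load. The target table, source path, and options are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Re-running this command only loads files that have not been ingested yet,
# which is what makes it safe to schedule repeatedly.
spark.sql("""
    COPY INTO main.bronze.events
    FROM 's3://my-bucket/landing/events/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
    COPY_OPTIONS ('mergeSchema' = 'true')
""")
```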

Integrating Apache Superset into a B2B Platform: Why and How

2022-07-19 · Watch video

Our IT team builds a portal for managing a pizzeria franchise business. This portal is a rather large and unwieldy B2B system that has been in development for more than 10 years.

Our partners need dashboards to manage their business. These dashboards must be fully integrated into the portal. This is the job for our data engineers!

In this talk, I will tell you how and why we chose Apache Superset, what difficulties we encountered during integration and what refinements we had to make to achieve this goal.
