Analytics

Live from the Lakehouse: LLMs, LangChain, and analytics engineering workflow with dbt Labs

2023-07-14 · Databricks DATA + AI Summit 2023 Watch

video

by Drew Banin (Fishtown Analytics) , Nicolas Palaez (Databricks) , Harrison Chase (LangChain)

AI/ML Analytics Engineering Data Lakehouse Databricks dbt GenAI LLM Marketing

Hear from three guests. Harrison Chase (CEO, LangChain) and Nicolas Palaez (Sr. Technical Marketing Manager, Databricks) on LLMs and generative AI. Third guest, Drew Banin (co-founder, dbt Labs), discusses analytics engineering workflow with his company dbt Labs, how he started the company, and how they provide value with the Databricks partnership. Hosted by Ari Kaplan (Head of Evangelism, Databricks) and Pearl Ubaru (Sr Technical Marketing Engineer, Databricks)

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile

2022-07-22 · Databricks DATA + AI Summit 2023 Watch

video

Data Lakehouse Databricks Delta Scala Cyber Security Spark

Traditional cybersecurity Security Information and Event Management (SIEM) ways do not scale well for data sources with 30TiB per day, leading HSBC to create a Cybersecurity Lakehouse with Delta and Spark. Creating a platform to overcome several conventional technical constraints, the limitation in the amount of data for long-term analytics available in traditional platforms and query languages is difficult to scale and time-consuming to run. In this talk, we’ll learn how to implement (or actually reverse-engineer) a language with Scala and translate it into what Apache Spark understands, the Catalyst engine. We’ll guide you through the technical journey of building equivalents of a query language into Spark. We’ll learn how HSBC business benefited from this cutting-edge innovation, like decreasing time and resources for Cyber data processing migration, improving Cyber threat Incident Response, and fast onboarding of HSBC Cyber Analysts on Spark with Cybersecurity Lakehouse platform.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Applied Predictive Maintenance in Aviation: Without Sensor Data

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Azure Data Lakehouse Databricks

We will show how using Azure Databricks Lakehouse is modernizing our data & analytics environment which has given us new capability to create custom predictive models for hundreds of families of aircraft components without sensor data. We currently have over 95% success rate with over $1.3 million in avoided operational impact costs in FY21.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databri

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

AI/ML Azure Cloud Computing Dashboard Data Analytics Data Science Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

Office of Community Care (OCC) Financial Time Series Forecast - Financial forecasting enhancements to predict claims CFO Dashboard - Productivity and capability enhancements for financial and audit analytics Datasets Migrated to the Cloud - Migration of on-prem datasets to the cloud for down-stream analytics (includes a supply chain proof-of-concept) Data Science Hackathon - A hackathon to predict bad claims codes and demonstrate DAS abilities to accelerate a ML use case using Databricks AutoML

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Technical and Tactical Football Analysis Through Data

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Databricks

How LaLiga uses and combines eventing and tracking data to implement novel analytics and metrics, thus helping analysts to better understand the technical and tactical aspects of their clubs.

This presentation will explain the treatment of these data and its subsequent use to create metrics and analytical models.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

A Low-Code Approach to 10x Data Engineering

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Data Engineering Databricks DevOps Git PySpark Spark

Can we take Data Engineering on Spark 10x beyond where it is today?

Yes, we can enable 10x more users on Spark, and make them 10x more productive from day 1. Data engineering can run at scale, and it can still be 10x simpler and faster to develop, deploy, and manage pipelines.

Low code is the key. A modern data engineering platform built on low code will enable all data users, from new graduates to experts, to visually develop high-quality pipelines. With Visual = Code, the visual elements will be stored as PySpark code on Git and deployed using the best software practices taken from DevOps. Search and lineage help data engineers and their customers in analytics understand how each column value was produced, when it was updated, and the associated quality metric.

See how a complete, low-code data engineering platform can reduce complexity and effort, enabling you to rapidly deploy, scale, and use Spark, making data and analytics a strategic asset in your company.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

How to Implement a Semantic Layer for Your Lakehouse

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Data Lakehouse Databricks

Learn how semantic layers are becoming a critical component of a modern analytics stack.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

How To Use Databricks SQL for Analytics on Your Lakehouse

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

BI Cloud Computing Data Lake Data Lakehouse Databricks DWH Power BI SQL Tableau

Most organizations run complex cloud data architectures that silo applications, users, and data. As a result, most analysis is performed with stale data and there isn’t a single source of truth of data for analytics.

Join this interactive follow-along deep dive demo to learn how Databricks SQL allows you to operate a multicloud lakehouse architecture that delivers data warehouse performance at data lake economics — with up to 12x better price/performance than traditional cloud data warehouses. Now data analysts and scientists can work with the freshest and most complete data and quickly derive new insights for accurate decision-making.

Here’s what we’ll cover: • Managing data access and permissions and monitoring how the data is being used and accessed in real time across your entire lakehouse infrastructure • Configuring and managing compute resources for fast performance, low latency, and high user concurrency to your data lake • Creating and working with queries, dashboards, query refresh, troubleshooting features and alerts • Creating connections to third-party BI and database tools (Power BI, Tableau, DbVisualizer, etc.) so that you can query your lakehouse without making changes to your analytical and dashboarding workflows

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Migrate and Modernize your Data Platform with Confluent and Databricks

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Cloud Computing Databricks Netezza Oracle

Moving and building in the cloud to accelerate analytics development requires enterprises to rethink their data infrastructure. Whether you are moving from an on-prem legacy system or you were born in the cloud, businesses are turning to Confluent and Databricks to help them unlock new real-time customer experiences and intelligence for their backend operations.

Join us to see how Confluent and Databricks enable companies to set data in motion across any system, at any scale, in near real-time. Connecting Confluent with Databricks allows companies to migrate and connect data from on-prem databases and data warehouses like Netezza, Oracle, and Cloudera to Databricks in the cloud to power real-time analytics.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Operational Analytics: Expanding the Reach of Data in the Lakehouse Era

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Data Lake Data Lakehouse Databricks ETL/ELT

Organizations want data lakes to be the source of truth for analytics. But operational teams rarely recognize the power the data lake, shortening the reach of all the valuable data within it. Instead, these business users often treat operational tools like Salesforce, Marketo, and NetSuite as their source of truth.

The reality is lakehouses and operational tools alike have missed critical pieces of data and don’t provide the full customer picture. Operational Analytics solves this last mile problem by making it possible to sync transformed data directly from your data lake back into these systems to expand the reach of your data.

In this talk you’ll learn: - What Operational Analytics & Reverse ETL are and why they're taking off - How Operational Analytics helps companies today activate and expand the reach of their data - Real-life use cases from companies using Operational Analytics to empower their data teams & give them the seat at the table they deserve

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Power to the (SQL) People: Python UDFs in DBSQL

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

BI Cloud Computing Data Lakehouse Databricks dbt Python SQL

Databricks SQL (DB SQL) allows customers to leverage the simple and powerful Lakehouse architecture with up to 12x better price/performance compared to traditional cloud data warehouses. Analysts can use standard SQL to easily query data and share insights using a query editor, dashboards or a BI tool of their choice, and analytics engineers can build and maintain efficient data pipelines, including with tools like dbt.

While SQL is great at querying and transforming data, sometimes you need to extend its capabilities with the power of Python, a full programming language. Users of Databricks notebooks already enjoy seamlessly mixing SQL, Python and several other programming languages. Use cases include masking or encrypting and decrypting sensitive data, complex transformation logic, using popular open source libraries or simply reusing code that has already been written elsewhere in Databricks. In many cases, it is simply prohibitive or even impossible to rewrite the logic in SQL.

Up to now, there was no way to use Python from within DBSQL. We are removing this restriction with the introduction of Python User Defined Functions (UDFs). DBSQL users can now create, manage and use Python UDFs using standard SQL. UDFs are registered in Unity Catalog, which means they can be governed and used throughout Databricks, including in notebooks.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Agile/Scrum AI/ML AWS Cloud Computing Data Engineering Data Lakehouse Data Quality Databricks Delta DevOps Git Spark +1 more

Serving patients in over 100 countries, Amgen is a leading global biotech company focused on developing therapies that have the power to save lives. Delivering on this mission requires our commercial teams to regularly meet with healthcare providers to discuss new treatments that can help patients in need. With the onset of the pandemic, where face-to-face interactions with doctors and other Healthcare Providers (HCPs) were severely impacted, Amgen had to rethink these interactions. With that in mind, the Amgen Commercial Data and Analytics team leveraged a modern data and AI architecture built on the Databricks Lakehouse to help accelerate its digital and data insights capabilities. This foundation enabled Amgen’s teams to develop a comprehensive, customer-centric view to support flexible go-to-market models and provide personalized experiences to our customers. In this presentation, we will share our recent journey of how we took an agile approach to bringing together over 2.2 petabytes of internally generated and externally sourced vendor data , and onboard into our AWS Cloud and Databricks environments to enable a standardized, scalable and robust capabilities to meet the business requirements in our fast-changing life sciences environment. We will share use cases of how we harmonized and managed our diverse sets of data to deliver efficiency, simplification, and performance outcomes for the business. We will cover the following aspects of our journey along with best practices we learned over time: • Our architecture to support Amgen’s Commercial Data & Analytics constant processing around the globe • Engineering best practices for building large scale Data Lakes and Analytics platforms such as Team organization, Data Ingestion and Data Quality Frameworks, DevOps Toolkit and Maturity Frameworks, and more • Databricks capabilities adopted such as Delta Lake, Workspace policies, SQL workspace endpoints, and MLflow for model registry and deployment. Also, various tools were built for Databricks workspace administration • Databricks capabilities being explored for future, such as Multi-task Orchestration, Container-based Apache Spark Processing, Feature Store, Repos for Git integration, etc. • The types of commercial analytics use cases we are building on the Databricks Lakehouse platform Attendees building global and Enterprise scale data engineering solutions to meet diverse sets of business requirements will benefit from learning about our journey. Technologists will learn how we addressed specific Business problems via reusable capabilities built to maximize value.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Introduction to Flux and OSS Replication

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Cloud Computing Databricks

In this breakout session we’ll learn about Flux, the data scripting and query language for InfluxDB. InfluxDB is the leading time series database platform. With Flux you can perform time series lifecycle management tasks, data preparation and analytics, alert tasks, and more. InfluxDB has two offerings: InfluxDB Cloud and InfluxDB OSS. Finally, we’ll learn about how you can use Flux and the replication tool to consolidate data from your OSS instances running at the edge to InfluxDB Cloud.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Leveraging ML-Powered Analytics for Rapid Insights and Action (a demonstration)

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

AI/ML Data Lakehouse Databricks Modern Data Stack

The modern data stack makes it possible to query high-volume data with extremely high granularity, dimensionality, and cardinality. Operationalized machine learning is a great way to address this complex data, focusing the scope of analyst inquiry and quickly exposing dimensions, groups, and sub-groups of data with the greatest impact on key metrics.

This session will discuss how to leverage operationalized AI/ML to automatically define millions of features and perform billions of simultaneous hypothesis tests across a wide dataset to identify key drivers of metric change. A technical demonstration will include an overview of leveraging the Databricks Lakehouse using Sisu’s AI/ML-powered decision intelligence platform: connecting to Databricks, defining metrics, automated AI/ML-powered analysis, and exposing actionable business insights.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Live Analytics: The next user engagement frontier

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Databricks Thoughtspot

The last couple years have put a new lens on how organizations approach analytics - day-old data became useless, and only in-the-moment-insights became relevant, pushing data and analytics teams to their breaking point. The results: everyone has fast forwarded in their transformation and modernization plans, and it's also made us look differently at who engages with data and how.

At ThoughtSpot, we believe analytics is not just for data people. It’s for everyone - everywhere. Join us in this session to: Learn how to transform the user experience with self-service, interactive analytics Get real-life tips on implementing a modern analytics strategy See a demo of Live Analytics in ThoughtSpot Hear how Norwegian airline Flyr is resetting analytics in their industry by putting data first

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Low-Code Machine Learning on Databricks with AutoML

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

AI/ML Databricks

Teams across an organization should be able to use predictive analytics for their business. While there are data scientists and data engineers who can leverage code to build ML models, there are domain experts and analysts who can benefit from low-code tools to build ML solutions.

Join this session to learn how you can leverage Databricks AutoML and other low-code tools to build, train and deploy ML models into production. Additionally, Databricks takes a unique glass-box approach, so you can take the code behind ML model and tweak further to fine-tune performance and integrate into production systems. See these capabilities in action and learn how Databricks empowers users of varying levels of expertise to build ML solutions.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Migrating Complex SAS Processes to Databricks - Case Study

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Cloud Computing Data Management Data Science Databricks ETL/ELT SAS Spark SQL

Many federal agencies use SAS software for critical operational data processes. While SAS has historically been a leader in analytics, it has often been used by data analysts for ETL purposes as well. However, modern data science demands on ever-increasing volumes and types of data require a shift to modern, cloud architectures and data management tools and paradigms for ETL/ELT. In this presentation, we will provide a case study at Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology - Spark SQL & Databricks – to produce results ~75% faster without reliance on proprietary constructs of the SAS language, with more scalability, and in a manner that can more easily ingest old rules and better govern the inclusion of new rules and data definitions. Significant technical and business benefits derived from this modernization effort are described in this session.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Modern Architecture of a Cloud-Enabled Data and Analytics Platform

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

Agile/Scrum Cloud Computing Databricks

In today’s modern IT organization whether it is the delivery of a sophisticated analytical model or a product advancement decision or understanding the behavior of a customer, the fact remains that in every instance we rely on data to make good, informed decisions. Given this backdrop, having an architecture which supports the ability to efficiently collect data from a wide range of sources within the company is still an important goal of all data organizations.

In this session we will explain how Bayer has deployed a hybrid data platform which strives to integrate key existing legacy data systems of the past while taking full advantage of what a modern cloud data platform has to offer in terms of scalability and flexibility. It will elaborate the use of its most significant component, Databricks, which serves to provide not only a very sophisticated data pipelining solution but also a complete ecosystem for teams to create data and analytical solutions in a flexible and agile way.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Monitoring and Quality Assurance of Complex ML Deployments via Assertions

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

by Daniel Kang

AI/ML Databricks

Machine Learning (ML) is increasingly being deployed in complex situations by teams. While much research effort has focused on the training and validation stages, other parts have been neglected by the research community.

In this talk, Daniel Kang will describe two abstractions (model assertions and learned observation assertions) that allow users to input domain knowledge to find errors at deployment time and in labeling pipelines. He will show real-world errors in labels and ML models deployed in autonomous vehicles, visual analytics, and ECG classification that these abstractions can find. I'll further describe how they can be used to improve model quality by up to 2x at a fixed labeling budget. This work is being conducted jointly with researchers from Stanford University and Toyota Research Institute.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Mosaic: A Framework for Geospatial Analytics at Scale

2022-07-19 · Databricks DATA + AI Summit 2023 Watch

video

API Databricks HTML Java PySpark Spark

In this session we’ll present Mosaic, a new Databricks Labs project with a geospatial flavour.

Mosaic provides users of Spark and Databricks with a unified framework for distributing geospatial analytics. Users can choose to employ existing Java-based tools such as JTS or Esri's Geometry API for Java and Mosaic will handle the task of parallelizing these tools' operations: e.g. efficiently reading and writing geospatial data and performing spatial functions on geometries. Mosaic helps users scale these operations by providing spatial indexing capabilities (using, for example, Uber's H3 library) and advanced techniques for optimising common point-in-polygon and polygon-polygon intersection operations.

The development of Mosaic builds upon techniques developed with Ordnance Survey (the central hub for geospatial data across UK Government) and described in this blog post: https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

talk-data.com

Activity Trend

Top Events

Top Speakers

Live from the Lakehouse: LLMs, LangChain, and analytics engineering workflow with dbt Labs

Cutting the Edge in Fighting Cybercrime: Reverse-Engineering a Search Language to Cross-Compile

Applied Predictive Maintenance in Aviation: Without Sensor Data

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databri

Technical and Tactical Football Analysis Through Data

A Low-Code Approach to 10x Data Engineering

How to Implement a Semantic Layer for Your Lakehouse

How To Use Databricks SQL for Analytics on Your Lakehouse

Migrate and Modernize your Data Platform with Confluent and Databricks

Operational Analytics: Expanding the Reach of Data in the Lakehouse Era

Power to the (SQL) People: Python UDFs in DBSQL

Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse

Introduction to Flux and OSS Replication

Leveraging ML-Powered Analytics for Rapid Insights and Action (a demonstration)

Live Analytics: The next user engagement frontier

Low-Code Machine Learning on Databricks with AutoML

Migrating Complex SAS Processes to Databricks - Case Study

Modern Architecture of a Cloud-Enabled Data and Analytics Platform

Monitoring and Quality Assurance of Complex ML Deployments via Assertions

Mosaic: A Framework for Geospatial Analytics at Scale