talk-data.com

Topic: Databricks

Tags: big_data, analytics, spark

1286 tagged activities

Activity Trend: peak of 515 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1286 activities · Newest first

If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

YES! "Duck posting" has become an internet meme for praising DuckDB on Twitter. Nearly every quack using DuckDB has done it once or twice. But why all the fuss? With advances in CPUs, memory, SSDs, and the software that enables it all, our personal machines are powerful beasts relegated to handling a few Chrome tabs and sitting 90% idle. To us as data engineers and data analysts, this seems like a waste that is not only expensive but also harmful to the environment.

In this session, you will see how DuckDB delivers, in a 2 MB standalone executable on your laptop, SQL analytics capabilities that only recently required a large cluster. The session will explain the architecture that enables high-performance analytics on a laptop: great query optimization, vectorized execution, continuous improvements in compression, and more. We will show its capabilities using live demos, from the pandas library to WASM to the command line. We'll demonstrate performance on large datasets and talk about how we're exploring using the laptop to augment cloud analytics workloads.
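
To give a flavor of the kind of demo described above, here is a minimal sketch (not taken from the talk) of DuckDB querying a pandas DataFrame in-process from Python; the data and column names are invented for illustration.

```python
import duckdb
import pandas as pd

# Hypothetical sales data; DuckDB can scan the DataFrame in place, no cluster needed.
df = pd.DataFrame({
    "region": ["EMEA", "EMEA", "AMER", "APAC"],
    "amount": [120.0, 75.5, 310.0, 42.0],
})

con = duckdb.connect()      # in-process database, nothing to deploy
con.register("df", df)      # expose the DataFrame as a queryable view

result = con.execute("""
    SELECT region, SUM(amount) AS total
    FROM df
    GROUP BY region
    ORDER BY total DESC
""").df()

print(result)
```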

Talk by: Ryan Boyd


Using Lakehouse to Fight Cancer: Ontada's Journey to Establish an RWD Platform on Databricks Lakehouse

Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education, and technology provider dedicated to transforming the fight against cancer. Core to Ontada's mission is using real-world data (RWD) and evidence generation to improve patient health outcomes and to accelerate life science research.

To support its mission, Ontada embarked on a journey to migrate its enterprise data warehouse (EDW) from an on-premises Oracle database to Databricks Lakehouse. This move allows Ontada to now consume data from any source, including structured and unstructured data from its own EHR and genomics lab results, and realize faster time to insight. In addition, using the Lakehouse has helped Ontada eliminate data silos, enabling the organization to realize the full potential of RWD – from running traditional descriptive analytics to extracting biomarkers from unstructured data. The session will cover the following topics:

  • Oracle to Databricks: migration best practices and lessons learned
  • People, process, and tools: expediting innovation while protecting patient information using Unity Catalog
  • Getting the most out of the Databricks Lakehouse: from BI to genomics, running all analytics under one platform
  • Hyperscale biomarker abstraction: reducing the manual effort needed to extract biomarkers from large unstructured data (medical notes, scanned/faxed documents) using spaCy and John Snow Labs NLP libraries
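
For context, the spaCy portion of such a pipeline might look roughly like the sketch below; the model name and the example note are illustrative rather than Ontada's actual code, and a production biomarker pipeline would use domain-specific clinical models and entity rules.

```python
import spacy

# Illustrative only: a general-purpose English model. A real biomarker pipeline
# would load a clinical/biomedical model and custom entity rules instead.
nlp = spacy.load("en_core_web_sm")

note = "Patient is EGFR exon 19 deletion positive; PD-L1 expression 60%."
doc = nlp(note)

# Inspect whatever entities the model finds in the free-text note.
for ent in doc.ents:
    print(ent.text, ent.label_)
```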

Join this session to hear how Ontada is transforming RWD to deliver safe and effective cancer treatment.

Talk by: Donghwa Kim


AI Regulation is Coming: The EU AI Act and How Databricks Can Help with Compliance

With the heightened attention on LLMs and what they can do, and the widening impact of AI on day-to-day life, the push by regulators across the globe to regulate AI is intensifying. As with GDPR in the privacy realm, the EU is leading the way with the EU Artificial Intelligence Act (AIA). Regulators everywhere will be looking to the AIA as precedent, and understanding the requirements imposed by the AIA is important for all players in the AI value chain. Although not finalized, the basic framework for how the AIA will work is becoming clearer. The impact on developers and deployers of AI (‘providers’ and ‘users’ under the AIA) will be substantial. Although the AIA will probably not go into effect until early 2025, AI applications developed today will likely be affected, and design and development decisions made now should take the future regulations into account. In this session, Matteo Quattrocchi, Brussels-based Director, Policy – EMEA, for BSA (the Software Alliance, the leading advocacy organization representing the enterprise software sector), will present an overview of the currently proposed requirements under the AIA and give an update on the ongoing deliberations and likely timing for enactment. We will also highlight some of the ways the Lakehouse platform, including Managed MLflow, can help providers and users of ML-based applications meet the requirements of the AIA and other upcoming AI regulations.

Talk by: Matteo Quattrocchi and Scott Starbird


Data Architecture: IQVIA's Migration to Databricks Lakehouse for High-Performance Analytics

As the healthcare and life science (HLS) industry has grown and evolved, a need has emerged for scalable and cost-effective ETL solutions capable of processing billions of records at terabyte scale. IQVIA has the largest global healthcare data network in the world, with over one million data sources providing access to 1.2 billion non-identified patient records and 100 billion healthcare records processed annually in over 100 countries. IQVIA's ability to combine, centralize, and integrate various sources of HLS data enables clinical-to-commercial operational intelligence and omnichannel analytics for its clients. Databricks Lakehouse allows IQVIA to onboard a rapidly growing number of clients while delivering strong business value to customers, cost-efficiently and at scale.

During this session, you will learn more about how IQVIA is leveraging Databricks Lakehouse, as well as how HLS organizations can soon access IQVIA data assets through the Databricks Marketplace for quick and secure data sharing.

Talk by: Venkat Dasari and William Zanine


Data Caching Strategies for Data Analytics and AI

The increasing popularity of data analytics and artificial intelligence (AI) has led to a dramatic increase in the volume of data being used in these fields, creating a growing need for enhanced computational capability. Cache plays a crucial role as an accelerator for data and AI computations, but it is important to note that these domains have different data access patterns, requiring different cache strategies. In this session, you will see our observations on data access patterns in the analytical SQL and AI training domains, based on practical experience with large-scale systems. We will discuss the evaluation results of various caching strategies for analytical SQL and AI and provide caching recommendations for different use cases. Over the years, we have learned some best practices from big internet companies about the following aspects of our journey:

  1. Traffic pattern for analytical SQL and cache strategy recommendation
  2. Traffic patterns for AI training and how we can measure cache efficiency for different AI training processes
  3. Cache capacity planning based on real-time metrics of the working set
  4. Adaptive caching admission and eviction for uncertain traffic patterns
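
To make the admission/eviction vocabulary in item 4 concrete, here is a toy, capacity-bounded LRU cache with a naive "seen at least twice" admission filter. It is purely illustrative and not the policy evaluated in the talk.

```python
from collections import Counter, OrderedDict

class TinyCache:
    """Toy LRU cache with a naive admission filter (illustrative only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # key -> value, ordered by recency
        self.requests = Counter()      # how often each key has been requested

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            return self.entries[key]
        return None

    def put(self, key, value):
        self.requests[key] += 1
        # Admission: only cache keys requested more than once, so one-off
        # scans do not evict the hot working set.
        if self.requests[key] < 2:
            return
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```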

Talk by: Chunxu Tang and Beinan Wang


Data Democratization at Michelin

Too often, business decisions in large organizations are based on time-consuming and labor-intensive data extracts and fragile Excel or Access sheets that require significant manual intervention. The teams that prepare these manual reports have invaluable heuristic knowledge that, when combined with meaningful data and tools, can drive smart business decisions. Imagine a world where these business teams are empowered with tools that help them build meaningful reports despite their limited technical expertise.

In this session, we will discuss:

  • The value derived from investing in developing citizen data personas within a business organization
  • How we successfully built a citizen data analytics culture within Michelin
  • Real examples of the impact of this initiative on the business and on the people themselves

The audience will walk away with some convincing arguments for building a citizen data culture in their organization and a how-to cookbook that they can use to cultivate citizen data personas. Finally, they can interactively uncover key success factors in the case of Michelin that can help drive a similar initiative in their respective companies.

Talk by: Philippe Leonhart and Fabien Cochet


Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

Rust is a unique language whose traits make it very appealing for data engineering. In this session, we'll walk through the aspects of the language that make it such a good fit for big data processing, including how it improves performance, how it provides greater safety guarantees, and how its compatibility with a wide range of existing tools positions it to become a major building block for the future of analytics.

We will also take a hands-on look through real code examples at a few emerging technologies built on top of Rust that utilize these capabilities, and learn how to apply them to our modern lakehouse architecture.
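
As one concrete example of the Rust-backed ecosystem the talk covers, the Python bindings of delta-rs and Polars can read a Delta table via Apache Arrow without any JVM. This is a generic sketch, not code from the session; the table path and column names are hypothetical, and the API calls assume a recent Polars release.

```python
import polars as pl
from deltalake import DeltaTable  # delta-rs Python bindings (Rust under the hood)

# Hypothetical table path; could also be an s3:// or abfss:// URI.
dt = DeltaTable("./data/events")

# delta-rs hands the data over as Arrow, which Polars consumes without copying.
events = pl.from_arrow(dt.to_pyarrow_table())

summary = (
    events
    .group_by("event_type")              # recent Polars API; older versions use groupby
    .agg(pl.len().alias("n_events"))
    .sort("n_events", descending=True)
)
print(summary)
```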

Talk by: Oz Katz


Democratize AI & ML in a Large Company: The Importance of User Enablement & Technical Training

The most critical factor for success in a cloud transformation is people. As such, having a change management process in place to manage the impact of the transformation, together with user enablement, is foundational to any large program. In this session, we will dive into how TD Bank democratizes data, how it mobilizes a community of over 2,000 analytics users, and the tactics we used to successfully enable new use cases in the cloud. The session will focus on the following:

To democratize data:

  • Centralize a data platform that is accessible to all employees and allows for easy data sharing
  • Implement privacy and security to protect data and use it ethically
  • Apply compliance and governance so data is used in a responsible and compliant way
  • Simplify processes and procedures to reduce redundancy and speed up adoption

To mobilize end users:

  • Increase data literacy: provide training and resources for employees to grow their abilities and skills
  • Foster a culture of collaboration and openness: cross-functional teams collaborate and share ideas
  • Encourage exploration of innovative ideas that impact the organization's values and customers

Technical enablement and adoption tactics we've used at TD Bank:

  1. Hands-on training for over 1,300 analytics users, with an emphasis on learning by doing and relating to real-life situations
  2. Online tutorials and documentation to be used for self-paced study
  3. Workshops and office hours on specific topics to empower business users
  4. Coaching to work with teams on a specific use case or complex issue and provide recommendations for faster, cost-effective solutions
  5. Certification offerings and encouragement of continuous education so employees keep up to date with the latest developments
  6. A feedback loop: gather user feedback on training and user experience to improve future sessions

Talk by: Ellie Hajarian


Five Things You Didn't Know You Could Do with Databricks Workflows

Databricks Workflows has come a long way since the initial days of orchestrating simple notebooks and JAR/wheel files. Now we can orchestrate multi-task jobs, create chains of tasks with lineage as a DAG using fan-in, fan-out, and many other patterns, or even run one Databricks job directly inside another.

Databricks Workflows takes its tagline, "orchestrate anything anywhere," seriously: it is a truly fully managed, cloud-native orchestrator for diverse workloads such as Delta Live Tables, SQL, notebooks, JARs, Python wheels, dbt, Apache Spark™, and ML pipelines, with excellent monitoring, alerting, and observability capabilities. It is essentially a one-stop product for all orchestration needs of an efficient lakehouse. Even better, it gives you full flexibility to run your jobs in a cloud-agnostic way and is available across AWS, Azure, and GCP.
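
As a rough illustration of a fan-out/fan-in multi-task job, here is a Jobs API 2.1-style payload expressed as a Python dict. The job name and notebook paths are hypothetical, and cluster settings are omitted for brevity.

```python
# Hypothetical multi-task job: "ingest" fans out to two transforms, which fan
# back in to "publish". A real payload would also specify job clusters or
# existing cluster IDs for each task.
job_spec = {
    "name": "fan-out-fan-in-demo",
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Repos/demo/ingest"}},
        {"task_key": "transform_a",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Repos/demo/transform_a"}},
        {"task_key": "transform_b",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Repos/demo/transform_b"}},
        {"task_key": "publish",
         "depends_on": [{"task_key": "transform_a"},
                        {"task_key": "transform_b"}],
         "notebook_task": {"notebook_path": "/Repos/demo/publish"}},
    ],
}
# This dict could be submitted to the Jobs API (POST /api/2.1/jobs/create)
# or adapted for the Databricks SDK or Terraform provider.
```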

In this session, we will take a deep dive into some of the most interesting of these features and showcase end-to-end demos that will let you take full advantage of Databricks Workflows for orchestrating the lakehouse.

Talk by: Prashanth Babu


Improving Hospital Operations with Streaming Data and Real Time AI/ML

Over the past two years, Providence has developed a robust streaming data platform (SDP) leveraging Databricks in Azure. The SDP enables us to ingest and process real-time data reflecting clinical operations across our 52 hospitals and roughly 1,000 ambulatory clinics. The HL7 messages generated by Epic are parsed using Databricks in our secure cloud environment and used to generate an up-to-the-minute picture of exactly what is happening at the point of care.
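
As a rough sketch of the ingestion side of such a platform (not Providence's actual code; the broker, topic, and column names are assumptions), an HL7 message feed landed on a Kafka-compatible bus can be read with Spark Structured Streaming like this:

```python
from pyspark.sql import functions as F

# Assumes a Databricks/Spark session named `spark`; real HL7v2 parsing would
# use a dedicated parser library downstream of this ingestion step.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical endpoint
    .option("subscribe", "hl7-adt")                     # hypothetical topic
    .load()
)

messages = raw.select(
    F.col("timestamp"),
    F.col("value").cast("string").alias("hl7_message"),
)

# Downstream steps (not shown) would parse message segments and maintain
# near-real-time views of arrivals, bed occupancy, and discharges.
```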

We are already leveraging this information to minimize hospital overcrowding and have been actively integrating AI/ML to accurately forecast future conditions (e.g., arrivals, length of stay, acuity, and discharge requirements). This allows us both to improve resource utilization (e.g., nurse staffing levels) and to optimize patient throughput. The result is both improved patient care and operational efficiency.

In this session, we will share how these outcomes are only possible with the power and elegance afforded by our investments in Azure, Databricks, and increasingly Lakehouse. We will demonstrate Providence's blueprint for enabling real-time analytics which can be generalized to other healthcare providers.

Talk by: Lindsay Mico and Deylo Woo


Jet Streaming Data & Predictive Analytics: How Collins Aerospace Keeps Aircraft Flying

Most have experienced the frustration and disappointment of a flight delay or cancellation due to aircraft issues. The Collins Aerospace business unit at Raytheon Technologies is committed to redefining aerospace by using data to deliver a more reliable, sustainable, efficient, and enjoyable aviation industry.

Ascentia is one product example of this, focused on helping airlines make smarter and more sustainable decisions by anticipating aircraft maintenance issues in advance, leading to more reliable flight schedules and fewer delays. Over the past five years, a variety of products from the Databricks technology suite have been employed to achieve this. Leveraging cloud infrastructure and harnessing the Databricks Lakehouse, Apache Spark™ development, and Databricks' dynamic platform, Collins has been able to accelerate development and deployment of the predictive health monitoring (PHM) analytics that generate Ascentia's aircraft maintenance recommendations.

Labcorp Data Platform Journey: From Selection to Go-Live in Six Months

Join this session to learn about the Labcorp data platform transformation from on-premises Hadoop to AWS Databricks Lakehouse. We will share best practices and lessons learned from cloud-native data platform selection, implementation, and migration from Hadoop (within six months) with Unity Catalog.

We will share the steps taken to retire several legacy on-premises technologies and leverage Databricks-native features like Spark Streaming, Workflows, job pools, cluster policies, and Spark JDBC within the Databricks platform, along with lessons learned in implementing Unity Catalog and building a security and governance model that scales across applications. We will show demos that walk you through the batch frameworks, streaming frameworks, and data-comparison tools used across several applications to improve data quality and speed of delivery.
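
For reference, the Spark JDBC access mentioned above generally takes the following shape. This is a generic sketch with hypothetical connection details and secret names, not Labcorp's configuration.

```python
# Assumes a Databricks/Spark session named `spark`; the URL, table, and secret
# scope/key names below are made up for illustration.
jdbc_url = "jdbc:postgresql://db.example.com:5432/reference"

lab_orders = (
    spark.read
    .format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "public.lab_orders")
    .option("user", dbutils.secrets.get("jdbc", "user"))        # Databricks secret scope
    .option("password", dbutils.secrets.get("jdbc", "password"))
    .load()
)

# Land the pulled data as a Delta table for downstream pipelines.
lab_orders.write.format("delta").mode("overwrite").saveAsTable("bronze.lab_orders")
```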

Discover how we have improved operational efficiency and resiliency, reduced TCO, and scaled the creation of workspaces and associated cloud infrastructure using the Databricks Terraform provider.

Talk by: Mohan Kolli and Sreekanth Ratakonda


Making Travel More Accessible for Customers Bringing Mobility Devices

American Airlines takes great pride in caring for customers' travel and recognizes the importance of supporting the dignity and independence of everyone who travels with us. As we work to improve the customer experience, we're committed to making our airline more accessible to everyone. Our work to ensure that travel is accessible to all is well underway. We have been particularly focused on making the journey smoother for customers who rely on wheelchairs or other mobility devices. We have implemented a bag tag specifically for wheelchairs and scooters that gives team members more information, like the mobility device's weight and battery type, or whether it needs to be returned to a customer before a connecting flight.

As a data engineering and analytics team at American Airlines, we are building a passenger service request data product that will provide timely insights on expected mobility device traffic at each airport, so that front-line team members can provide a seamless travel experience to passengers.

Talk by: Teja Tangeda and Madhan Venkatesan


Managing Data Encryption in Apache Spark™

Sensitive data sets can be encrypted directly by new Apache Spark™ versions (3.2 and higher). Setting several configuration parameters and DataFrame options will trigger the Apache Parquet modular encryption mechanism that protects select columns with column-specific keys. The upcoming Spark 3.4 version will also support uniform encryption, where all DataFrame columns are encrypted with the same key.

Spark data encryption is already leveraged by a number of companies to protect personal or business-confidential data in their production environments. The main integration effort is focused on key access control and on building Spark/Parquet plug-in code that can interact with a company's key management service (KMS).

In this session, we will briefly cover the basics of Spark/Parquet encryption usage and dive into the details of encryption key management that will help you integrate this Spark data protection mechanism in your deployment. You will learn how to run a HelloWorld encryption sample and how to extend it into real-world production code integrated with your organization's KMS and access control policies. We will talk about the standard envelope encryption approach to big data protection, the performance-vs-security trade-offs between single and double envelope wrapping, and internal versus external key metadata storage. We will see a demo and discuss new features such as uniform encryption and two-tier management of encryption keys.
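
To give a sense of what such a "HelloWorld" configuration looks like, here is a rough sketch along the lines of the Parquet modular encryption example in the Spark documentation. The key material and column names are placeholders, and a production deployment would plug in a real KMS client class instead of the in-memory mock.

```python
import base64, os

# Demo-only key material; in production, keys are managed by your KMS and are
# never hard-coded or generated in application code.
footer_key = base64.b64encode(os.urandom(16)).decode()
column_key = base64.b64encode(os.urandom(16)).decode()

hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("parquet.crypto.factory.class",
          "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory")
# In-memory mock KMS for experimentation; a real deployment plugs in its own KMS client here.
hconf.set("parquet.encryption.kms.client.class",
          "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS")
hconf.set("parquet.encryption.key.list",
          f"footerKey:{footer_key}, colKey:{column_key}")

df = spark.range(100).withColumnRenamed("id", "ssn")  # pretend 'ssn' is sensitive

(df.write
   .option("parquet.encryption.footer.key", "footerKey")
   .option("parquet.encryption.column.keys", "colKey:ssn")  # protect 'ssn' with colKey
   .parquet("/tmp/table.parquet.encrypted"))
```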

Talk by: Gidon Gershinsky


Multicloud Data Governance on the Databricks Lakehouse

Across industries, a multicloud setup has quickly become the reality for large organizations. Multicloud introduces new governance challenges, as permissions models often do not translate from one cloud to the other and, where they do, are insufficiently granular to accommodate privacy requirements and the principle of least privilege. This problem can be especially acute for data and AI workloads that rely on sharing and aggregating large and diverse data sources across business unit boundaries, and where governance models need to incorporate assets such as table rows/columns and ML features and models.

In this session, we will provide guidelines on how best to overcome these challenges for companies that have adopted the Databricks Lakehouse as their collaborative space for data teams across the organization, by exploiting some of the unique product features of the Databricks platform. We will focus on a common scenario: a data platform team providing data assets to two different ML teams, one using the same cloud and the other one using a different cloud.

We will explain the step-by-step setup of a unified governance model by leveraging the following components and conventions:

  • Unity Catalog for implementing fine-grained access control across all data assets: files in cloud storage, rows and columns in tables and ML features and models
  • The Databricks Terraform provider to automatically enforce guardrails and permissions across clouds
  • Account-level SSO integration and identity federation to centrally administer access across workspaces
  • Delta Sharing to seamlessly propagate changes in provider data sets to consumers in near real time
  • Centralized audit logging for a unified view on what asset was accessed by whom
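
As a small illustration of the Unity Catalog piece above, fine-grained grants are plain SQL and can be issued from a notebook; the catalog, schema, table, and group names below are hypothetical.

```python
# Assumes a Databricks notebook with Unity Catalog enabled; names are made up.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `ml-team-other-cloud`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.features TO `ml-team-other-cloud`")
spark.sql("GRANT SELECT ON TABLE main.features.customer_features TO `ml-team-other-cloud`")
```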

Talk by: Ioannis Papadopoulos and Volker Tjaden


Nebula: The Journey of Scaling Instacart’s Data Pipelines with Apache Spark™ and Lakehouse

Instacart has gone through immense growth during the pandemic, and the trend continues. Instacart Ads is no exception in this growth story. We have launched many new product lines, including display and video ads, covering the full advertising funnel to address the increasing demands of our retail partners. We have built advanced models to auto-suggest optimal bidding to increase ROI for our CPG partners. Advertisers' trust is the utmost priority, hence the quest to build a top-class ads measurement platform.

Ads data processing requires complex data verifications to update ads-serving stats. In our ETL pipelines, these were implemented through files containing thousands of lines of raw SQL, which were hard to scale, test, and iterate upon. Our data engineers used to spend hours testing small changes due to a lack of local testing mechanisms. These pain points underscored our need for better tools. After some research, we chose Apache Spark™ as our preferred tool to rebuild the ETLs, and the Databricks platform made this move easier. In this session, we'll share our journey of moving our pipelines to Spark and Delta Lake on Databricks. With Spark, Scala, and Delta we solved many problems that were slowing the team's productivity. Some key areas that will be covered include:

  • Modular and composable code
  • Unit testing framework
  • Incremental event processing with Spark Structured Streaming (see the sketch after this list)
  • Granular resource tuning for better performance and cost efficiency
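
The sketch below gives a rough idea of what incremental event processing with Structured Streaming and Delta Lake looks like; it is expressed in PySpark for brevity (the talk's pipelines are in Scala), and the table names, paths, and schema are hypothetical rather than Instacart's actual pipeline.

```python
from pyspark.sql import functions as F

# Read new ad events incrementally from a Delta table (hypothetical name),
# aggregate impressions per campaign per hour, and append the results downstream.
events = spark.readStream.table("ads.bronze_impressions")

hourly = (
    events
    .withWatermark("event_time", "1 hour")
    .groupBy(F.window("event_time", "1 hour"), "campaign_id")
    .agg(F.count("*").alias("impressions"))
)

(hourly.writeStream
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/hourly_impressions")
    .trigger(availableNow=True)        # Spark 3.3+: process whatever is new, then stop
    .toTable("ads.silver_hourly_impressions"))
```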

Other than the domain business logic, the problems discussed here are quite common for performing data processing at scale. We hope that sharing our learnings will benefit others who are going through similar growth challenges or migrating to Lakehouse.

Talk by: Devlina Das and Arthur Li


Practical Pipelines: A Houseplant Alerting System with ksqlDB

Taking care of houseplants can be difficult; in many cases, over-watering and under-watering can have the same symptoms. Remove the guesswork involved in caring for your houseplants while also gaining valuable experience in building a practical, event-driven pipeline in your own home! This session explores the process of building a houseplant monitoring and alerting system using a Raspberry Pi and Apache Kafka. Moisture and temperature readings are captured from sensors in the soil and streamed into Kafka. From there, we use stream processing to transform the data, create a summary view of the current state, and drive real-time push alerts through Telegram.

In this session, we will talk about how to ingest the data, followed by the tools, including ksqlDB and Kafka Connect, that help transform the raw data into useful information. Finally, you'll see how to use Kafka producers and consumers to make the entire application more interactive. By the end of this session, you'll have everything you need to start building practical streaming pipelines in your own home. Roll up your sleeves – let's get our hands dirty!
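
To make the producer side concrete, here is a minimal sketch (not taken from the talk) of a Raspberry Pi script publishing a sensor reading to Kafka with the confluent-kafka Python client; the broker address, topic name, and reading fields are assumptions.

```python
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # hypothetical broker

reading = {
    "plant_id": "monstera-01",
    "moisture_pct": 23.5,     # pretend these values came from the soil sensor
    "temperature_c": 21.8,
    "ts": int(time.time()),
}

# Key by plant so all readings for one plant land in the same partition.
producer.produce("houseplant-readings",
                 key=reading["plant_id"],
                 value=json.dumps(reading))
producer.flush()
```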

Talk by: Danica Fine


Sponsored by: Alation | Unlocking the Power of Real-Time Data to Maximize Data Insights

It’s no secret that access to the right data at the right time is critical for data-driven decision making. In fact, as data culture becomes more and more ingrained in the enterprise, business users increasingly demand real-time, actionable data. But what happens when it takes up to 24 hours to access your point-of-sale data? RaceTrac faced many of these data accessibility challenges as it sought to derive intelligence from its retail transaction data, specifically the data from its stores, information from its fuel purchasing arms, and delivery data for its fleet.

Through a combination of the Databricks Lakehouse and the lineage and self-discovery capabilities of the Alation Data Intelligence Platform, RaceTrac rose to the challenge. Hear from Raghu Jayachandran, Senior Manager of Enterprise Data at RaceTrac, and discover how RaceTrac gained real-time access to its transaction data in Databricks and uses Alation to identify which data can drive the business insights it needs.

Talk by: Diby Malakar and Raghu Jayachandran


Sponsored by: Labelbox | Unlocking Enterprise AI with Your Proprietary Data and Foundation Models

We are starting to see a paradigm shift in how AI systems are built across enterprises. In 2023 and beyond, this shift is being propelled by the era of foundation models. Foundation models can be seen as the next evolution in using "pre-trained" models and transfer learning. In order to fully leverage these breakthrough models, we've seen a common formula for success: leading AI teams within enterprises need to be able to successfully harness their own store of unstructured data and pair it with the right model in order to ship intelligent applications that deliver next-generation experiences to their customers.

In this session you will learn how to incorporate foundation models into your data and machine learning workflows so that anyone can build AI faster and, in many cases, achieve the business outcome without needing to build AI models at all. We will cover which foundation models can be used to pre-label and enrich data, what specific data pipeline (data engine) enables this, and real-world use cases of when to incorporate large language models and fine-tuning to improve machine learning models in real time. Discover the power of leveraging both Labelbox and Databricks to streamline this data management and model deployment process.

Talk by: Manu Sharma


Sponsored: Impetus | Accelerating ADP’s Business Transformation w/ a Modern Enterprise Data Platform

Learn how ADP's enterprise data platform is used to drive direct monetization opportunities, differentiate its solutions, and improve operations. ADP is continuously searching for ways to increase innovation velocity, shorten time-to-market, and improve overall enterprise efficiency. Making data and tools available to teams across the enterprise while reducing data governance risk is the key to making progress on all fronts. Learn about ADP's enterprise data platform, which created a single source of truth with centralized tools, data assets, and services, allowing teams to innovate and gain insights by leveraging cross-enterprise data and centralized machine learning operations.

Explore how ADP accelerated creation of the data platform on Databricks and AWS, achieved faster business outcomes, and improved overall business operations. The session will also cover how ADP significantly reduced its data governance risk, elevated the brand by amplifying data and insights as a differentiator, increased data monetization, and leveraged data to drive human capital management differentiation.

Talk by: Chetan Kalanki and Zaf Babin
