talk-data.com

Topic: Databricks
Tags: big_data, analytics, spark
561 tagged activities

Activity Trend: 515 peak/qtr (2020-Q1 to 2026-Q1)

Activities

Showing filtered results

Filtering by: Databricks DATA + AI Summit 2023
Sponsored by: Striim | Powering a Delightful Travel Experience with a Real-Time Operational Data Hub

American Airlines champions operational excellence in airline operations to provide the most delightful experience to our customers with on-time flights and meticulously maintained aircraft. To modernize and scale technical operations with real-time, data-driven processes, we delivered a DataHub that connects data from multiple sources and delivers it to analytics engines and systems of engagement in real time. This enables operational teams to use any kind of aircraft data from almost any source imaginable and turn it into meaningful and actionable insights with speed and ease. This empowers maintenance hubs to choose the best service and determine the most effective ways to utilize resources that can impact maintenance outcomes and costs. The end product is a smooth and scalable operation that results in a better experience for travelers. In this session, you will learn how we combine an operational data store (MongoDB) and a fully managed streaming engine (Striim) to enable analytics teams using Databricks with real-time operational data.

Talk by: John Kutay and Ganesh Deivarayan

Here’s more to explore:
Big Book of Data Engineering, 2nd Edition: https://dbricks.co/3XpPgNV
The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc

Sponsored by: Toptal | Enable Data Streaming within Multicloud Strategies

Join Toptal as we discuss how we can help organizations handle their data streaming needs in an environment utilizing multiple cloud providers. We will delve into the data scientist and data engineering perspective on this challenge. Embracing an open format, utilizing open source technologies while managing the solution through code are the keys to success.

Talk by: Christina Taylor and Matt Kroon

Streamlining API Deployment of ML Models Across Multiple Brands: Ahold Delhaize's Experience on Serverless

At Ahold Delhaize, we have 19 local brands. Most of our brands have common goals, such as providing personalized offers to their customers, a better search engine on e-commerce websites, and forecasting models to reduce food waste and ensure availability. As a central team, our goal is to standardize the way of working across all of these brands, including the deployment of machine learning models. To this end, we have adopted Databricks as our standard platform for our batch inference models.

However, API deployment for real-time inference models remained challenging due to the varying capabilities of our brands. Our attempts to standardize API deployments with different tools failed because of the complexity of our organization. Fortunately, Databricks has recently introduced a new feature: serverless API deployment. Since all our brands already use Databricks, this feature was easy to adopt. It allows us to reuse API deployments across all of our brands, significantly reducing time to market (from 6-12 months to one month), increasing efficiency, and reducing costs. In this session, you will see the solution architecture, a sample use case (a cross-sell model deployed to four different brands), and API deployment using the Databricks serverless API with a custom model.
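
Once a model sits behind a serverless endpoint, every brand calls it the same way: an HTTPS POST with a JSON body. Below is a minimal sketch of building that request; the endpoint name, workspace URL, and feature fields are hypothetical illustrations, not Ahold Delhaize's actual setup.

```python
import json

def build_invocation_payload(records):
    """Build the JSON body for a Databricks model-serving invocation.

    Serving endpoints accept a `dataframe_records` payload: a list of
    {feature_name: value} dicts, one per row to score.
    """
    return {"dataframe_records": records}

# Hypothetical cross-sell scoring request.
payload = build_invocation_payload(
    [{"customer_id": 42, "basket_size": 3, "brand": "brand_a"}]
)
body = json.dumps(payload)

# The request would then be sent with a bearer token, e.g.:
#   POST https://<workspace>/serving-endpoints/<endpoint>/invocations
#   requests.post(url, headers={"Authorization": f"Bearer {token}"}, data=body)
```

Because the payload shape is identical everywhere, each brand only needs its own endpoint name and token to reuse the same client code.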

Talk by: Maria Vechtomova and Basak Eskili

Unlock the Next Evolution of the Modern Data Stack With the Lakehouse Revolution -- with Live Demos

As the data landscape evolves, organizations are seeking innovative solutions that provide enhanced value and scalability without exploding costs. In this session, we will explore the exciting frontier of the Modern Data Stack on Databricks Lakehouse, a game-changing alternative to traditional Data Cloud offerings. Learn how Databricks Lakehouse empowers you to harness the full potential of Fivetran, dbt, and Tableau, while optimizing your data investments and delivering unmatched performance.

We will showcase real-world demos that highlight the seamless integration of these modern data tools on the Databricks Lakehouse platform, enabling you to unlock faster and more efficient insights. Witness firsthand how the synergy of Lakehouse and the Modern Data Stack outperforms traditional solutions, propelling your organization into the future of data-driven innovation. Don't miss this opportunity to revolutionize your data strategy and unleash unparalleled value with the lakehouse revolution.

Talk by: Kyle Hale and Roberto Salcido

US Army Corps of Engineers Enhances Commerce & National Security Through Data-Driven Geospatial Insight

The US Army Corps of Engineers (USACE) is responsible for maintaining and improving nearly 12,000 miles of shallow-draft (9'-14') inland and intracoastal waterways, 13,000 miles of deep-draft (14' and greater) coastal channels, and 400 ports, harbors, and turning basins throughout the United States. Because these components of the national waterway network are considered assets to both US commerce and national security, they must be carefully managed to keep marine traffic operating safely and efficiently.

The National DQM Program is tasked with providing USACE a nationally standardized remote monitoring and documentation system across multiple vessel types with timely data access, reporting, dredge certifications, data quality control, and data management. Government systems have often lagged commercial systems in modernization efforts, but the emergence of the cloud and Data Lakehouse architectures has empowered USACE to successfully move into the modern data era.

This session incorporates aspects of these topics:

  • Data Lakehouse Architecture: Delta Lake, platform security and privacy, serverless, administration, data warehouse, data lake, Apache Iceberg, Data Mesh
  • GIS: H3, MOSAIC, spatial analysis
  • Data Engineering: data pipelines, orchestration, CDC, medallion architecture, Databricks Workflows, data munging, ETL/ELT, lakehouses, data lakes, Parquet, Data Mesh, Apache Spark™ internals
  • Data Streaming: Apache Spark Structured Streaming, real-time ingestion, real-time ETL, real-time ML, real-time analytics, real-time applications, Delta Live Tables
  • ML: PyTorch, TensorFlow, Keras, scikit-learn, Python and R ecosystems
  • Data Governance: security, compliance, RMF, NIST
  • Data Sharing: sharing and collaboration, Delta Sharing, data cleanliness, APIs

Talk by: Jeff Mroz

Accelerating the Development of Viewership Personas with a Unified Feature Store

With the proliferation of video content and flourishing consumer demand, there is an enormous opportunity for customer-centric video entertainment companies to use data and analytics to understand what their viewers want and deliver more of the content that meets their needs.

At DIRECTV, our Data Science Center of Excellence is constantly looking to push the boundary of innovation in how we can better and more quickly understand the needs of our customers and leverage those actionable insights to deliver business impact. One way in which we do so is through the development of Viewership Personas, using cluster analysis at scale to group our customers by the types of content they enjoy watching. This process is significantly accelerated by a unified feature store, which contains a wide array of features that capture key information on viewing preferences.
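
The clustering step can be illustrated with a toy example: a minimal pure-Python sketch of 1-D k-means over a single fabricated viewing feature. DIRECTV's actual pipeline runs cluster analysis at scale on Databricks; this only shows the idea.

```python
def kmeans_1d(values, k, iters=20):
    """Toy 1-D k-means: cluster viewers by one viewing feature,
    e.g. the share of hours spent on a given content type."""
    # Initialize centroids spread across the observed range.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centroids[i]))
            groups[idx].append(v)
        # Recompute centroids; keep the old one if a group is empty.
        centroids = [
            sum(g) / len(g) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Hypothetical "share of viewing hours on sports" per customer.
sports_share = [0.05, 0.1, 0.08, 0.5, 0.55, 0.9, 0.95, 0.92]
personas = sorted(kmeans_1d(sports_share, k=3))
# Three persona centroids emerge: light, moderate, and heavy sports viewers.
```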

This talk will focus on how the DIRECTV Data Science team uses Databricks to develop a unified feature store, and how we leverage that feature store to accelerate the process of running machine learning algorithms to find meaningful viewership clusters.

Talk by: Malav Shah and Taylor Hosbach

Advanced Governance with Collibra on Databricks

A data lake is only as good as its governance. Understanding what data you have, performing classification, defining and applying security policies, and auditing how data is used make up the data governance lifecycle. Unity Catalog, with its rich ecosystem of supported tools, simplifies all stages of the data governance lifecycle. Learn how metadata can be hydrated into Collibra directly from Unity Catalog. Once the metadata is available in Collibra, we will demonstrate classification, defining security policies on the data, and pushing those policies into Databricks. All access and usage of data is automatically audited, with real-time lineage provided in the data explorer as well as system tables.

Talk by: Leon Eller and Antonio Castelo

A Fireside Chat: Building Your Startup on Databricks

Are you interested in learning how leading startups build applications on Databricks and leverage the power of the lakehouse? Join us for a fireside chat with cutting-edge startups as we discuss real-world insights and best practices for building on the Databricks Lakehouse, as well as successes and challenges encountered along the way. This conversation will provide an opportunity to learn and ask questions of panelists spanning all sectors.

Talk by: Chris Hecht, Derek Slager, Uri May, and Edward Chiu

AI-Accelerated Delta Tables: Faster, Easier, Cheaper

In this session, learn about recent releases for Delta Tables and the upcoming roadmap. Learn how to leverage AI to get blazing fast performance from Delta, without requiring users to do time-consuming and complicated tuning themselves. Recent releases like Predictive I/O and Auto Tuning for Optimal File Sizes will be covered, as well as the exciting roadmap of even more intelligent capabilities.

Talk by: Sirui Sun and Vijayan Prabhakaran

Best Practices for Setting Up Databricks SQL at Enterprise Scale

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

In this session, we will talk about the best practices for setting up Databricks to run at large enterprise scale with thousands of users, departmental security and governance, and end-to-end lineage from ingestion to BI tools. We’ll showcase the power of Unity Catalog and Databricks SQL as the core of your modern data stack, and how to achieve data, environment, and financial governance while empowering your users to quickly find and access the data they need.

Talk by: Siddharth Bhai, Paul Roome, Jeremy Lewallen, and Samrat Ray

Bridging the Production Gap: Develop and Deploy Code Easily With IDEs

Hear from customers how they are using software development best practices to combine the best of Integrated Development Environments (IDEs) with Databricks. See the latest developments that unlock key productivity gains from IDEs like code linters, AI code assistants and integrations with CI/CD tools to make going to production smoother and more reliable.

Attend this session to learn how to use IDEs with Databricks and take advantage of:

  • Native development - Write code, edit files and run on Databricks with the familiarity of your favorite IDE, using DB Connect
  • Interactive debugging - Step through code in a cluster to quickly pinpoint and fix errors so that code is more robust and easily maintained
  • CI/CD pipelines - Set up and manage your CI/CD pipelines using the new CLI
  • IDE ecosystems - Use familiar integrations to streamline code reviews and deploy code faster

Sign up today to boost your productivity by combining your favorite IDE with the scale of Databricks.
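
The workflow those bullets describe can be sketched as follows: keep transformation logic in plain, unit-testable functions, and attach a remote session only at the edge. The `DatabricksSession` snippet in the comments follows the Databricks Connect API; the host and cluster values are placeholders, and the flight-delay example is invented for illustration.

```python
# Pattern: transformation logic lives in plain functions, so the same
# code runs in local unit tests and, unchanged, on a remote cluster
# via Databricks Connect.

def flag_delayed(flights, threshold_minutes=15):
    """Label each flight record as delayed or on-time."""
    return [
        {**f, "delayed": f["delay_minutes"] > threshold_minutes}
        for f in flights
    ]

# Locally, exercise the logic on plain dicts from your IDE's debugger:
sample = [{"flight": "AA100", "delay_minutes": 30},
          {"flight": "AA200", "delay_minutes": 5}]
labeled = flag_delayed(sample)

# On Databricks, the same function can back a UDF or drive a DataFrame
# pipeline through a remote session, e.g. (placeholder values):
#   from databricks.connect import DatabricksSession
#   spark = DatabricksSession.builder.remote(
#       host="https://<workspace>", cluster_id="<cluster>").getOrCreate()
```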

Talk by: Saad Ansari and Fabian Jakobs

Data & AI Products on Databricks: Making Data Engineering & Consumption Self-Service Data Platforms

Our client, a large IT and business consulting firm, embarked on a journey to create “Data As a Product” for both their internal and external stakeholders. In this project, Infosys took a data platform approach and leveraged Delta Sharing, API endpoints, and Unity Catalog to effectively create a realization of Data and AI Products (Data Mesh) architecture. This session presents the three primary design patterns used, providing valuable insights for your evolution toward a no-code/low-code approach.

Talk by: Ankit Sharma

Databricks Cost Management: Tips and Tools to Stay Under Budget

How do you prevent surprise bills at the end of the month? Join us as we discuss best practices for cost management. You'll learn how to analyze and break down costs and hear best practices for keeping your budget in check. This session will:

  • Walk through cost reporting across various surfaces
  • Discuss best practices for cost optimization on Databricks
  • Highlight how tagging and budgets can give you the confidence you seek
  • Share news about upcoming features related to cost management
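
The tagging point above can be made concrete with a small sketch: roll up usage records by a cluster tag so spend maps to an owner. The record fields here are illustrative, not the exact schema of Databricks billing exports or system tables.

```python
from collections import defaultdict

def cost_by_tag(usage_records, tag_key):
    """Roll up usage cost by a cluster tag (e.g. 'team' or 'project'),
    so surprise spend can be traced to an owner."""
    totals = defaultdict(float)
    for rec in usage_records:
        tag = rec.get("tags", {}).get(tag_key, "untagged")
        totals[tag] += rec["cost_usd"]
    return dict(totals)

# Hypothetical usage export rows.
usage = [
    {"cost_usd": 120.0, "tags": {"team": "analytics"}},
    {"cost_usd": 80.0, "tags": {"team": "ml"}},
    {"cost_usd": 40.0, "tags": {}},
]
totals = cost_by_tag(usage, "team")
```

A large "untagged" bucket in such a rollup is itself a useful signal: it shows which clusters still need tagging policies before budgets can be enforced.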

Talk by: Greg Kroleski and Thorsten Jacobs

Databricks Marketplace: Going Beyond Data and Applications

The demand for third-party data has never been greater, but existing marketplaces simply aren't cutting it. You deserve more than being locked into a walled garden of just data sets and simple applications. You deserve an open marketplace to exchange ML models, notebooks, datasets and more. The Databricks Marketplace is the ultimate solution for your data, AI and analytics needs, powered by open source Delta Sharing. Databricks is revolutionizing the data marketplace space.

Join us for a demo-filled session and learn how Databricks Marketplace is exactly what you need in today’s AI-driven innovation ecosystem. Hear from customers on how Databricks is empowering organizations to leverage shared knowledge and take their analytics and AI to new heights. Take advantage of this rare opportunity to ask questions of the Databricks product team that is building the Databricks Marketplace.

Talk by: Mengxi Chen and Darshana Sivakumar

Fair Data or Foul Data…Lakehouse for Public Sector as a FAIR platform

FAIR (findable, accessible, interoperable, reusable) data and data platforms are becoming more and more important in the public sector. The Lakehouse platform is strongly aligned with these principles: it provides the tools required both to adhere to FAIR and to FAIRify data that isn't FAIR-compliant. In this session, we will cover the parts of the lakehouse that enable end users to FAIRify data products, how to build robust data products, and which parts of the Lakehouse align with which FAIR principles.

We'll demonstrate how DLT is crucial for data transformations on non-FAIR data, how Unity Catalog unlocks discoverability (F) and governed data access (A), and how Marketplace, clean rooms, and Delta Sharing unlock interoperability and data exchange (I and R). These concepts are massive enablers for highly regulated industries such as the public sector. It is undeniably important to align the Lakehouse with standards that are widely adopted by standards bodies, policymakers, and regulators. These principles transcend all industries and all use cases.

Talk by: Milos Colic and Pritesh Patel

High Volume Intelligent Streaming with Sub-Minute SLA for Near Real-Time Data Replication

Attend this session and learn about an innovative solution built around Databricks Structured Streaming and Delta Live Tables (DLT) to replicate thousands of tables from on-premises to cloud-based relational databases. Replicating on-premises data to cloud-based data lakes and data stores in near real time for consumption is a highly desirable pattern for enterprises across industries.

This powerful architecture can offload legacy platform workloads and accelerate the cloud journey. The intelligent, cost-efficient solution leverages thread pools, multi-task jobs, Kafka, Apache Spark™ Structured Streaming, and DLT. This session will go into detail about problems, solutions, lessons learned, and best practices.
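
The thread-pool idea can be sketched in a few lines: submit one replication task per table so that many small streams progress concurrently inside a single job. The `replicate_table` body below is a stub; in the session's actual architecture that step would be a Kafka/CDC read feeding Structured Streaming or DLT.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def replicate_table(table_name):
    """Stub for the per-table replication step; the real pipeline would
    run a streaming read from Kafka/CDC and a write to the target DB."""
    return (table_name, "ok")

def replicate_all(tables, max_workers=8):
    """Fan thousands of table replications across a thread pool so each
    stream's I/O wait overlaps with the others'."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(replicate_table, t) for t in tables]
        for fut in as_completed(futures):
            name, status = fut.result()
            results[name] = status
    return results

statuses = replicate_all([f"schema.table_{i}" for i in range(20)])
```

Bounding `max_workers` is the knob that trades throughput against cluster and target-database load; sub-minute SLAs depend on keeping each per-table task small.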

Talk by: Suneel Konidala and Murali Madireddi

Introducing Universal Format: Iceberg and Hudi Support in Delta Lake

In this session, we will talk about how Delta Lake plans to integrate with Iceberg and Hudi. Customers are being forced to choose storage formats based on the tools that support them rather than choosing the most performant and functional format for their lakehouse architecture. With Universal Format (“UniForm”), Delta removes the need to make this compromise and makes Delta tables compatible with Iceberg and Hudi query engines. We will do a technical deep dive of the technology, demo it, and discuss the roadmap.

Talk by: Himanshu Raja and Ryan Johnson

Journey to Real-Time ML: A Look at Feature Platforms & Modern RT ML Architectures Using Tecton

Are you struggling to keep up with the demands of real-time machine learning? Like most organizations building real-time ML, you’re probably looking for a better way to:

  • Manage the lifecycle of ML models and features
  • Implement batch, streaming, and real-time data pipelines
  • Generate accurate training datasets and serve models and data online with strict SLAs, supporting millisecond latencies and high query volumes

Look no further. In this session, we will unveil a modern technical architecture that simplifies the process of managing real-time ML models and features.
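
One of those needs, generating accurate training datasets, hinges on point-in-time correctness: each training row may only use feature values that were available at the label's timestamp. Here is a minimal sketch of that join with fabricated data; feature platforms like Tecton perform this at scale.

```python
def point_in_time_join(feature_rows, label_rows):
    """For each training label, pick the latest feature value observed
    at or before the label's timestamp, so training data matches what
    would have been available for online inference at that moment."""
    out = []
    for label in label_rows:
        eligible = [
            f for f in feature_rows
            if f["entity"] == label["entity"] and f["ts"] <= label["ts"]
        ]
        latest = max(eligible, key=lambda f: f["ts"]) if eligible else None
        out.append({**label, "feature": latest["value"] if latest else None})
    return out

# Fabricated feature history and labels for one entity.
features = [
    {"entity": "user_1", "ts": 1, "value": 0.2},
    {"entity": "user_1", "ts": 5, "value": 0.7},
]
labels = [{"entity": "user_1", "ts": 3, "y": 1},
          {"entity": "user_1", "ts": 9, "y": 0}]
training = point_in_time_join(features, labels)
```

The label at ts=3 sees only the ts=1 feature value; joining on the latest value regardless of timestamp would leak future information into training.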

Using MLflow and Tecton, we’ll show you how to build a robust MLOps platform on Databricks that can easily handle the unique challenges of real-time data processing. Join us to discover how to streamline the lifecycle of ML models and features, implement data pipelines with ease, and generate accurate training datasets with minimal effort. See how to serve models and data online with mission-critical speed and reliability, supporting millisecond latencies and high query volumes.

Take a firsthand look at how FanDuel uses this solution to power their real-time ML applications, from responsible gaming to content recommendations and marketing optimization. See for yourself how this system can be used to define features, train models, process streaming data, and serve both models and features online for real-time inference with a live demo. Join us to learn how to build a modern MLOps platform for your real-time ML use cases.

Talk by: Mike Del Balso and Morgan Hsu

Lakehouses: The Best Start to Your Graph Data and Analytics Journey

Data architects and IT executives are continually looking for the best ways to integrate graph data and analytics into their organizations to improve business outcomes. This session outlines how the Data Lakehouse provides the perfect starting point for a successful journey. We will explore how the Data Lakehouse offers the unique combination of scalability, flexibility, and speed to quickly and effectively ingest, pre-process, curate, and analyze graph data to create powerful analytics. Additionally, we will discuss the benefits of using the Data Lakehouse over traditional graph databases and how it can help improve time to insight, time to production, and overall satisfaction. At the end of this presentation, attendees will:

  • Understand the benefits of using a Data Lakehouse for graph data and analytics
  • Learn how to get started with a successful Lakehouse implementation (demo)
  • Discover the advantages of using a Data Lakehouse over graph databases
  • Learn specifically where graph databases integrate and perform better together

Key Takeaways:

  • Data lakehouses provide the perfect starting point for a successful graph data and analytics journey
  • Data lakehouses offer scalability, flexibility, and speed to quickly and effectively analyze graph data
  • The data lakehouse is a cost-effective alternative to traditional graph databases, shortening your time to insight and de-risking your project
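
As a small illustration of graph analytics straight off lakehouse tables: an edge list stored as two columns is enough to answer neighborhood queries without a dedicated graph database. This is a stdlib sketch over toy data; at lakehouse scale the same query would run on Spark (e.g. with GraphFrames).

```python
from collections import defaultdict, deque

def neighbors_within(edges, start, hops):
    """Build an adjacency map from a tabular edge list (two columns, as
    it would sit in a lakehouse table) and collect all nodes reachable
    within `hops` hops of `start` via breadth-first search."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

# Toy edge table: a simple chain a-b-c-d-e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
reachable = neighbors_within(edges, "a", hops=2)
```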

Talk by: Douglas Moore

Leveraging Machine Learning on Databricks to Deliver Best in Class Customer Engagement

In today's competitive business environment, customer engagement is a top priority for organizations looking to retain and grow their customer base. In this session, we will showcase how we used Databricks, a powerful machine learning platform, to build and deploy distributed deep learning models using Apache Spark™ and Horovod for best-in-class customer engagement. We will discuss the challenges we faced and the solutions we implemented, including data preparation, model training, and model deployment. We will also share the results of our efforts, including increased customer retention and improved customer satisfaction. Attendees will walk away with practical tips and best practices for using Databricks to drive customer engagement for their own organizations. In this session we will:

  • Explore Morgan Stanley’s approach to best-in-class customer engagement
  • Discuss how data and technology were leveraged to help solve the business problem
  • Share our experience using Databricks to build and deploy machine learning models for customer engagement
  • Provide practical tips and best practices for using Databricks in a production environment
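
The core of the Horovod approach mentioned above is data-parallel gradient averaging: each worker computes gradients on its own data shard, an allreduce averages them, and every worker applies the same mean update. The toy simulation below shows only that averaging step; real training would use Horovod's Spark integration, and the numbers are fabricated.

```python
def allreduce_mean(worker_grads):
    """Average gradients element-wise across workers, mimicking the
    allreduce at the heart of Horovod-style data-parallel training."""
    n = len(worker_grads)
    return [sum(vals) / n for vals in zip(*worker_grads)]

# Hypothetical gradients from 3 workers for a 4-parameter model.
grads = [
    [0.1, 0.2, 0.3, 0.4],
    [0.3, 0.2, 0.1, 0.0],
    [0.2, 0.2, 0.2, 0.2],
]
avg = allreduce_mean(grads)
# Every worker then applies this same averaged gradient, keeping the
# replicated model weights in sync after each step.
```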

Talk by: Raja Lanka and Ryan Kennedy
