
Topic: Databricks
Tags: big_data, analytics, spark
1286 activities tagged

[Activity trend chart: peak of 515 activities/quarter, 2020-Q1 to 2026-Q1]

Activities (1286 · newest first)

Bridging the Production Gap: Develop and Deploy Code Easily With IDEs

Hear from customers how they are using software development best practices to combine the best of Integrated Development Environments (IDEs) with Databricks. See the latest developments, such as code linters, AI code assistants, and CI/CD integrations, that unlock key productivity gains from IDEs and make going to production smoother and more reliable.

Attend this session to learn how to use IDEs with Databricks and take advantage of:

  • Native development - Write code, edit files, and run it on Databricks from the familiarity of your favorite IDE via Databricks Connect (see the sketch below)
  • Interactive debugging - Step through code in a cluster to quickly pinpoint and fix errors so that code is more robust and easily maintained
  • CI/CD pipelines - Set up and manage your CI/CD pipelines using the new CLI
  • IDE ecosystems - Use familiar integrations to streamline code reviews and deploy code faster

Sign up today to boost your productivity by combining your favorite IDE with the scale of Databricks.
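
For a concrete flavor of the native-development bullet above, here is a minimal, hypothetical sketch of running a query on a remote cluster from a local IDE via Databricks Connect (it assumes the databricks-connect package is installed and a cluster profile is configured; all names are illustrative):

    # Run a DataFrame query on a remote Databricks cluster from a local IDE.
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()  # attaches to the configured cluster

    df = spark.range(10).toDF("n")
    print(df.filter(df.n % 2 == 0).count())  # executes remotely, result returns locally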

Talk by: Saad Ansari and Fabian Jakobs


Data & AI Products on Databricks: Making Data Engineering & Consumption Self-Service Data Platforms

Our client, a large IT and business consulting firm, embarked on a journey to create “Data as a Product” for both their internal and external stakeholders. In this project, Infosys took a data platform approach and leveraged Delta Sharing, API endpoints, and Unity Catalog to realize a Data and AI Products (data mesh) architecture. This session presents the three primary design patterns used, providing valuable insights for your evolution toward a no-code/low-code approach.

Talk by: Ankit Sharma


Databricks Cost Management: Tips and Tools to Stay Under Budget

How do you prevent surprise bills at the end of the month? Join us as we discuss best practices for cost management. You'll learn how to analyze and break down costs and hear best practices for keeping your budget in check. This session will:

  • Walk through cost reporting across various surfaces
  • Discuss best practices for cost optimization on Databricks
  • Highlight how tagging and budgets can give you confidence in your spend (see the sketch after this list)
  • Share news about upcoming features related to cost management
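
As a hedged illustration of the tagging bullet above (run from a Databricks notebook, where spark is predefined), a query along these lines against the system billing tables can break DBU usage down by a custom tag; table and column names follow the public documentation, and the 'team' tag is invented for the example:

    # Aggregate DBU usage per day and per 'team' tag from the system tables.
    spark.sql("""
        SELECT usage_date,
               custom_tags['team'] AS team,
               SUM(usage_quantity) AS dbus
        FROM system.billing.usage
        GROUP BY usage_date, custom_tags['team']
        ORDER BY usage_date
    """).show()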

Talk by: Greg Kroleski and Thorsten Jacobs


Databricks Marketplace: Going Beyond Data and Applications

The demand for third-party data has never been greater, but existing marketplaces simply aren't cutting it. You deserve more than being locked into a walled garden of just data sets and simple applications. You deserve an open marketplace to exchange ML models, notebooks, datasets and more. The Databricks Marketplace is the ultimate solution for your data, AI and analytics needs, powered by open source Delta Sharing. Databricks is revolutionizing the data marketplace space.

Join us for a demo-filled session and learn how Databricks Marketplace is exactly what you need in today’s AI-driven innovation ecosystem. Hear from customers on how Databricks is empowering organizations to leverage shared knowledge and take their analytics and AI to new heights. Take advantage of this rare opportunity to ask questions of the Databricks product team that is building the Databricks Marketplace.

Talk by: Mengxi Chen and Darshana Sivakumar


Fair Data or Foul Data…Lakehouse for Public Sector as a FAIR platform

FAIR (findable, accessible, interoperable, reusable) data and data platforms are becoming increasingly important in the public sector, and the lakehouse platform is strongly aligned with these principles. The lakehouse provides the tools required both to adhere to FAIR and to FAIRify data that isn't yet compliant. In this session, we will cover the parts of the lakehouse that enable end users to FAIRify data products, how to build robust data products, and which parts of the lakehouse align with which FAIR principles.

We'll demonstrate how DLT is crucial for transforming non-FAIR data, how Unity Catalog unlocks discoverability (F) and governed data access (A), and how Marketplace, clean rooms, and Delta Sharing unlock interoperability and data exchange (I and R). These capabilities are massive enablers for highly regulated industries such as the public sector. It is undeniably important to align the lakehouse with standards that are widely adopted by policymakers and regulators; these principles transcend all industries and all use cases.
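
As a sketch of the FAIRification role DLT plays here, a pipeline step like the following documents, validates, and enriches a raw table so the resulting data product is easier to find and reuse (table and column names are illustrative; the code runs inside a DLT pipeline):

    # DLT step: attach a description, enforce an expectation, add lineage metadata.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="FAIRified records: documented, validated, discoverable")
    @dlt.expect_or_drop("valid_id", "id IS NOT NULL")
    def records_clean():
        return dlt.read("records_raw").withColumn("ingested_at", F.current_timestamp())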

Talk by: Milos Colic and Pritesh Patel


High Volume Intelligent Streaming with Sub-Minute SLA for Near Real-Time Data Replication

Attend this session and learn about an innovative solution built around Databricks structured streaming and Delta Live Tables (DLT) to replicate thousands of tables from on-premises to cloud-based relational databases. This is a highly desirable pattern for many enterprises across industries: replicating on-premises data to cloud-based data lakes and data stores in near real time for consumption.

This powerful architecture can offload legacy platform workloads and accelerate the cloud journey. The intelligent, cost-efficient solution leverages thread pools, multi-task jobs, Kafka, Apache Spark™ structured streaming, and DLT. This session will go into detail about problems, solutions, lessons learned, and best practices.
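
A minimal sketch of the core pattern, assuming a Kafka CDC topic feeding a Delta table on a sub-minute trigger (broker address, topic, checkpoint path, and table names are all illustrative):

    # Continuously replicate CDC events from Kafka into a Delta table.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    stream = (spark.readStream.format("kafka")
              .option("kafka.bootstrap.servers", "broker:9092")
              .option("subscribe", "cdc.orders")
              .load())

    (stream.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
           .writeStream.format("delta")
           .option("checkpointLocation", "/chk/orders")
           .trigger(processingTime="30 seconds")  # keeps latency under the minute SLA
           .toTable("bronze.orders"))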

Talk by: Suneel Konidala and Murali Madireddi


Introducing Universal Format: Iceberg and Hudi Support in Delta Lake

In this session, we will talk about how Delta Lake plans to integrate with Iceberg and Hudi. Customers are being forced to choose storage formats based on the tools that support them rather than choosing the most performant and functional format for their lakehouse architecture. With Universal Format (“UniForm”), Delta removes the need to make this compromise and makes Delta tables compatible with Iceberg and Hudi query engines. We will do a technical deep dive of the technology, demo it, and discuss the roadmap.
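
As a hedged sketch (property names follow the public UniForm documentation and may evolve), enabling Iceberg metadata generation on a Delta table looks roughly like this on Databricks, where CREATE TABLE defaults to Delta:

    # Create a Delta table that also maintains Iceberg-compatible metadata.
    spark.sql("""
        CREATE TABLE main.default.events (id BIGINT, ts TIMESTAMP)
        TBLPROPERTIES (
            'delta.enableIcebergCompatV2' = 'true',
            'delta.universalFormat.enabledFormats' = 'iceberg'
        )
    """)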

Talk by: Himanshu Raja and Ryan Johnson


Journey to Real-Time ML: A Look at Feature Platforms & Modern RT ML Architectures Using Tecton

Are you struggling to keep up with the demands of real-time machine learning? Like most organizations building real-time ML, you’re probably looking for a better way to:

  • Manage the lifecycle of ML models and features
  • Implement batch, streaming, and real-time data pipelines
  • Generate accurate training datasets
  • Serve models and data online with strict SLAs, supporting millisecond latencies and high query volumes

Look no further. In this session, we will unveil a modern technical architecture that simplifies the process of managing real-time ML models and features.

Using MLflow and Tecton, we’ll show you how to build a robust MLOps platform on Databricks that can easily handle the unique challenges of real-time data processing. Join us to discover how to streamline the lifecycle of ML models and features, implement data pipelines with ease, and generate accurate training datasets with minimal effort. See how to serve models and data online with mission-critical speed and reliability, supporting millisecond latencies and high query volumes.
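
As a small illustration of the MLflow side of that lifecycle (the toy model, data, and registry name are invented for the example):

    # Train a toy model, then log and register it with MLflow.
    import mlflow
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.random.rand(200, 4)
    y = (X[:, 0] > 0.5).astype(int)

    with mlflow.start_run():
        model = LogisticRegression().fit(X, y)
        mlflow.sklearn.log_model(model, "model", registered_model_name="rt_ranker")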

Take a firsthand look at how FanDuel uses this solution to power their real-time ML applications, from responsible gaming to content recommendations and marketing optimization. See for yourself how this system can be used to define features, train models, process streaming data, and serve both models and features online for real-time inference with a live demo. Join us to learn how to build a modern MLOps platform for your real-time ML use cases.

Talk by: Mike Del Balso and Morgan Hsu


Lakehouses: The Best Start to Your Graph Data and Analytics Journey

Data architects and IT executives are continually looking for the best ways to integrate graph data and analytics into their organizations to improve business outcomes. This session outlines how the Data Lakehouse provides the perfect starting point for a successful journey. We will explore how the Data Lakehouse offers the unique combination of scalability, flexibility, and speed needed to quickly and effectively ingest, pre-process, curate, and analyze graph data to create powerful analytics. Additionally, we will discuss the benefits of using the Data Lakehouse over traditional graph databases and how it can improve time to insight, time to production, and overall satisfaction. At the end of this presentation, attendees will:

  • Understand the benefits of using a Data Lakehouse for graph data and analytics
  • Learn how to get started with a successful Lakehouse implementation (demo)
  • Discover the advantages of using a Data Lakehouse over graph databases
  • Learn specifically where graph databases integrate and perform better together

Key takeaways:

  • Data lakehouses provide the perfect starting point for a successful graph data and analytics journey
  • Data lakehouses offer the scalability, flexibility, and speed to quickly and effectively analyze graph data
  • The data lakehouse is a cost-effective alternative to a traditional graph database, shortening your time to insight and de-risking your project
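
To make the starting point concrete, here is a minimal, hypothetical sketch of graph analytics on the lakehouse using the GraphFrames library (it assumes the graphframes package is attached to the cluster; the three-node graph is illustrative):

    # Build a small graph from DataFrames and run PageRank on it.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.getOrCreate()
    vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
    edges = spark.createDataFrame([("a", "b"), ("b", "c")], ["src", "dst"])

    g = GraphFrame(vertices, edges)
    g.pageRank(resetProbability=0.15, maxIter=5).vertices.show()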

Talk by: Douglas Moore


Leveraging Machine Learning on Databricks to Deliver Best in Class Customer Engagement

In today's competitive business environment, customer engagement is a top priority for organizations looking to retain and grow their customer base. In this session, we will showcase how we used Databricks, a powerful machine learning platform, to build and deploy distributed deep learning models using Apache Spark™ and Horovod for best-in-class customer engagement. We will discuss the challenges we faced and the solutions we implemented, including data preparation, model training, and model deployment. We will also share the results of our efforts, including increased customer retention and improved customer satisfaction. Attendees will walk away with practical tips and best practices for using Databricks to drive customer engagement for their own organizations. In this session we will:

  • Explore Morgan Stanley’s approach to best-in-class customer engagement
  • Discuss how data and technology were leveraged to help solve the business problem
  • Share our experience using Databricks to build and deploy machine learning models for customer engagement
  • Provide practical tips and best practices for using Databricks in a production environment
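
As a hedged sketch of the distributed-training setup described above, using the HorovodRunner API available in the Databricks ML runtime (the training body and the two-worker process count are illustrative):

    # Launch a Horovod training function across two workers on Databricks.
    from sparkdl import HorovodRunner

    def train():
        import horovod.torch as hvd
        hvd.init()
        # Build the model, wrap the optimizer with hvd.DistributedOptimizer,
        # and run the training loop here (omitted for brevity).

    HorovodRunner(np=2).run(train)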

Talk by: Raja Lanka and Ryan Kennedy


Monetizing Data Assets: Sharing Data, Models and Features

Data is an asset. Selling and sharing data has largely been solved, and hosted models exist (for example, ChatGPT), but moving sensitive data across the public internet or across clouds is problematic. Sharing features (the results of feature engineering) can be monetized as new potential revenue streams, and sharing models can be monetized while avoiding the transfer of sensitive data.

This session will walk through a few examples of how to share models and features to generate new revenue streams using Delta Sharing, MLflow, and Databricks.
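
On the consumer side, reading a shared table with the open source delta-sharing client is roughly this simple (the profile file and the three-part table name are illustrative):

    # Load a table shared via Delta Sharing into a pandas DataFrame.
    import delta_sharing

    profile = "config.share"  # credentials file issued by the data provider
    df = delta_sharing.load_as_pandas(f"{profile}#share_name.schema_name.table_name")
    print(df.head())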

Talk by: Keith Anderson and Avinash Sooriyarachchi


Planning and Executing a Snowflake Data Warehouse Migration to Databricks

Organizations are going through a critical phase of data infrastructure modernization, laying the foundation for the future and adapting to support growing data and AI needs. Organizations that embraced cloud data warehouses (CDWs) such as Snowflake have ended up trying to use a data warehousing tool for ETL pipelines and data science. This created unnecessary complexity and resulted in poor performance, since data warehouses are optimized for SQL-based analytics only.

Realizing the limitation and pain with cloud data warehouses, organizations are turning to a lakehouse-first architecture. Though a cloud platform to cloud platform migration should be relatively easy, the breadth of the Databricks platform provides flexibility and hence requires careful planning and execution. In this session, we present the migration methodology, technical approaches, automation tools, product/feature mapping, a technical demo and best practices using real-world case studies for migrating data, ELT pipelines and warehouses from Snowflake to Databricks.

Talk by: Satish Garla and Ramachandran Venkat


Post-Merger: Implementing Unity Catalog Across Multiple Accounts

Warner Media and Discovery have recently merged to form Warner Bros Discovery. Owning two Databricks accounts and wanting to maintain their separation, our data governance team has successfully implemented Unity Catalog as our data governance solution across both accounts, allowing our teams to use the data assets of the two organizations collaboratively and securely.

This session is aimed at sharing that success story, including initial challenges, our approach, our architecture, the actual implementation, and user success post-implementation.
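
As a hedged sketch of the kind of cross-organization access Unity Catalog makes possible (catalog and group names are invented for the example):

    # Grant one organization's analyst group read access to the other's catalog.
    spark.sql(
        "GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG wm_prod TO `wbd-analysts`"
    )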

Talk by: Ramprasad Koya and Susheel Lakshmipathi


Scaling AI Applications with Databricks, HuggingFace and Pinecone

The production and management of large-scale vector embeddings can be a challenging problem, and the integration of Databricks, Hugging Face, and Pinecone offers a powerful solution. Vector embeddings have become an essential tool in the development of AI-powered applications. Embeddings are representations of data learned by machine learning models, and high-quality embeddings are unlocking use cases like semantic search, recommendation engines, and anomaly detection. Databricks' Apache Spark™ ecosystem together with Hugging Face's Transformers library enables large-scale embedding production using GPU processing, while Pinecone's vector database provides ultra-low-latency querying and upserting of billions of embeddings, allowing for high-quality embeddings at scale for real-time AI apps.

In this session, we will present a concrete use case of this integration in the context of a natural language processing application. We will demonstrate how Pinecone's vector database can be integrated with Databricks and Hugging Face to produce large-scale vector embeddings of text data and how these embeddings can be used to improve the performance of various AI applications. You will see the benefits of this integration in terms of speed, scalability, and cost efficiency. By leveraging the GPU processing capabilities of Databricks and the ultra low-latency querying capabilities of Pinecone, we can significantly improve the performance of NLP tasks while reducing the cost and complexity of managing large-scale vector embeddings. You will learn about the technical details of this integration and how it can be implemented in your own AI projects, and gain insights into the speed, scalability, and cost efficiency benefits of using this solution.
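
A minimal sketch of the embed-and-upsert loop, assuming a sentence-transformers model for embedding and the current Pinecone Python client for storage (index name, model choice, and API-key handling are illustrative):

    # Embed a batch of texts and upsert the vectors into a Pinecone index.
    from sentence_transformers import SentenceTransformer
    from pinecone import Pinecone

    model = SentenceTransformer("all-MiniLM-L6-v2")
    index = Pinecone(api_key="YOUR_API_KEY").Index("talks")  # placeholder key

    texts = ["lakehouse architectures", "vector search at scale"]
    vectors = model.encode(texts).tolist()
    index.upsert(vectors=[(str(i), v) for i, v in enumerate(vectors)])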

Talk by: Roie Schwaber-Cohen


Sponsored: Ascend.io | Publish a Data Mesh Product in Under 10 Minutes w/ Delta Sharing & Ascend

Learn how to quickly ingest, transform and share data in Delta Lake with intelligent data pipelines on Ascend. Using live data, we'll cover everything you need to know to get your first data products up and running fast. We'll talk about first principles for building a scalable mesh and tips for reducing maintenance work as you grow. And you'll see how Ascend applies patented fingerprinting technology to manage change across your interconnected pipelines as you build out the mesh.

Talk by: Jon Osborn

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1


Sponsored: AWS | Build a Generative AI Solution with Open Source Databricks Dolly 2.0 on Amazon SageMaker

Create a custom chat-based solution to query and summarize your data within your VPC using Dolly 2.0 and Amazon SageMaker. In this talk, you will learn about Dolly 2.0, Databricks' state-of-the-art open source LLM available for commercial use, and Amazon SageMaker, AWS’s premier toolkit for ML builders. You will learn how to deploy and customize models to reference your data using retrieval augmented generation (RAG) and additional fine-tuning techniques…all using open-source components available today.
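
As a hedged sketch of deploying Dolly 2.0 from the Hugging Face Hub to a real-time SageMaker endpoint (the IAM role, framework versions, and instance type are illustrative and should be checked against current SageMaker documentation):

    # Deploy databricks/dolly-v2-7b as a real-time SageMaker endpoint.
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
        env={"HF_MODEL_ID": "databricks/dolly-v2-7b", "HF_TASK": "text-generation"},
    )
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")
    print(predictor.predict({"inputs": "Summarize: Delta Lake is an open table format..."}))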

Talk by: Venkat Viswanathan and Karl Albertsen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz


Sponsored by: Immuta | Building an End-to-End MLOps Workflow with Automated Data Access Controls

WorldQuant Predictive’s customers rely on our predictions to understand how changing world and market conditions will impact the decisions they need to make. Speed is critical, and so are accuracy and resilience. To that end, our data team built a modern, automated MLOps data flow using Databricks as a key part of our data science tooling, integrated with Immuta to provide automated data security and access control.

In this session, we will share details of how we used policy-as-code to support our globally distributed data science team with secure data sharing, testing, validation and other model quality requirements. We will also discuss our data science workflow that uses Databricks-hosted MLflow together with an Immuta-backed custom feature store to maximize speed and quality of model development through automation. Finally, we will discuss how we deploy the models into our customized serverless inference environment, and how that powers our industry solutions.

Talk by: Tyler Ditto


The C-Level Guide to Data Strategy Success with the Lakehouse

Join us for a practical session on implementing a data strategy leveraging people, process, and technology to meet the growing demands of your business stakeholders for faster innovation at lower cost. In this session we will share real-world examples of best practices and things to avoid as you drive your strategy from the board to the business units in your organization.

Talk by: Robin Sutara and Dael Williamson


The First Sports & Ent Data Market Powered by Pumpjack Dataworks, Revelate, Immuta & Databricks

Creating a secure and easily actionable marketplace is no simple task. Add to this the governance requirements of privacy frameworks and the responsibility of protecting consumer data, and things get harder. By partnering with Databricks, Immuta, and Revelate, Pumpjack Dataworks brings secure, privacy-focused data products directly to data consumers.

Talk by: Corey Zwart and Tom Tercek


Use Apache Spark™ from Anywhere: Remote Connectivity with Spark Connect

Over the past decade, developers, researchers, and the community at large have successfully built tens of thousands of data applications using Apache Spark™. Since then, use cases and requirements of data applications have evolved. Today, every application, from web services that run in application servers and interactive environments such as notebooks and IDEs to phones and edge devices such as smart home devices, wants to leverage the power of data. However, Spark's driver architecture is monolithic, running client applications on top of a scheduler, optimizer, and analyzer. This architecture makes it hard to address these new requirements: there is no built-in capability to remotely connect to a Spark cluster from languages other than SQL.

Spark Connect introduces a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere. It can be embedded in modern data applications, in IDEs, notebooks and programming languages. This session highlights how simple it is to connect to Spark using Spark Connect from any data applications or IDEs. We will do a deep dive into the architecture of Spark Connect and provide an outlook on how the community can participate in the extension of Spark Connect for new programming languages and frameworks bringing the power of Spark everywhere.
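
To show how thin the client side is, here is a minimal sketch of connecting over Spark Connect (requires PySpark 3.4+ with the connect extras; the endpoint is illustrative):

    # Connect to a remote Spark Connect server and run a query; only the
    # unresolved plan is sent to the server, and only result rows come back.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.remote("sc://spark-cluster:15002").getOrCreate()
    spark.range(5).filter("id % 2 = 0").show()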

Talk by: Martin Grund and Stefania Leone
