talk-data.com

Topic: Java
Tags: programming_language, object_oriented, enterprise
Tagged activities: 5

Activity Trend (chart omitted): 2020-Q1 through 2026-Q1, peak of 25 activities/quarter

Activities

Showing filtered results

Filtering by: Data + AI Summit 2025
What’s New in Apache Spark™ 4.0?

Join this session for a concise tour of Apache Spark™ 4.0’s most notable enhancements:

- SQL features: ANSI mode by default, SQL scripting, SQL pipe syntax, SQL UDFs, session variables, view schema evolution, etc.
- Data types: VARIANT type, string collation
- Python features: Python data sources, plotting API, etc.
- Streaming improvements: state store data source, state store checkpoint v2, arbitrary state v2, etc.
- Spark Connect improvements: more API coverage, thin client, unified Scala interface, etc.
- Infrastructure: better error messages, structured logging, new Java/Scala version support, etc.

Whether you’re a seasoned Spark user or new to the ecosystem, this talk will prepare you to leverage Spark 4.0’s latest innovations for modern data and AI pipelines.
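As a rough illustration of two of the features listed above (not an excerpt from the session), the sketch below uses PySpark to try the VARIANT type and SQL pipe syntax; the JSON payload and column names are made up, and exact syntax should be checked against the Spark 4.0 documentation.

```python
# Minimal sketch of two Spark 4.0 SQL features: VARIANT and pipe syntax.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark4-features").getOrCreate()

# VARIANT: parse semi-structured JSON once, then extract typed fields later.
spark.sql("""
    SELECT parse_json('{"device": 7, "metrics": {"temp": 21.5}}') AS payload
""").createOrReplaceTempView("events")

spark.sql("""
    SELECT variant_get(payload, '$.metrics.temp', 'double') AS temp
    FROM events
""").show()

# SQL pipe syntax: each |> step applies one operator to the preceding result.
spark.sql("""
    FROM range(10)
    |> WHERE id % 2 = 0
    |> SELECT id * 2 AS doubled
    |> ORDER BY doubled
""").show()
```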

Creating a Custom PySpark Stream Reader with PySpark 4.0

PySpark supports many data sources out of the box, such as Apache Kafka, JDBC, ODBC, Delta Lake, etc. However, some older systems, such as those that use the JMS protocol, are not supported by default and require considerable extra work for developers to read from. One such example is ActiveMQ for streaming. Traditionally, ActiveMQ users have had to go through an intermediary in order to read the stream with Spark (for example, writing to a MySQL database with Java code and reading that table back via Spark JDBC). With PySpark 4.0’s custom data sources (supported in DBR 15.3+), we can cut out that intermediary and consume the queues directly from PySpark, in batch or with Spark Streaming, saving developers considerable time and complexity in getting source data into Delta Lake, governed by Unity Catalog, and orchestrated with Databricks Workflows.
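To make the shape of such a reader concrete, here is a hedged sketch using the PySpark 4.0 Python data source API. The queue is simulated with counters only; a real ActiveMQ connector would poll the broker inside read(), and the format name "jms_queue" is hypothetical.

```python
# Sketch of a custom streaming data source (PySpark 4.0 Python data source API).
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceStreamReader, InputPartition


class QueueStreamReader(DataSourceStreamReader):
    def initialOffset(self):
        # Offsets are plain dicts that Spark checkpoints between microbatches.
        return {"position": 0}

    def latestOffset(self):
        # A real implementation would ask the broker how far it can read;
        # here we simply advance by a fixed amount per microbatch.
        self._latest = getattr(self, "_latest", 0) + 10
        return {"position": self._latest}

    def partitions(self, start, end):
        # One partition covering the (start, end] offset range.
        return [InputPartition((start["position"], end["position"]))]

    def read(self, partition):
        # Yield rows matching the declared schema; replace with broker reads.
        lo, hi = partition.value
        for i in range(lo, hi):
            yield (i, f"message-{i}")


class QueueDataSource(DataSource):
    @classmethod
    def name(cls):
        return "jms_queue"  # hypothetical short name used with format(...)

    def schema(self):
        return "offset BIGINT, body STRING"

    def streamReader(self, schema):
        return QueueStreamReader()


spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(QueueDataSource)
stream = spark.readStream.format("jms_queue").load()  # no intermediary database
```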

Breaking Barriers: Building Custom Spark 4.0 Data Connectors with Python

Building a custom Spark data source connector once required Java or Scala expertise, making it complex and limiting. This left many proprietary data sources without public SDKs disconnected from Spark. Additionally, data sources with Python SDKs couldn't harness Spark’s distributed power. Spark 4.0 changes this with a new Python API for data source connectors, allowing developers to build fully functional connectors without Java or Scala. This unlocks new possibilities, from integrating proprietary systems to leveraging untapped data sources. Supporting both batch and streaming, this API makes data ingestion more flexible than ever. In this talk, we’ll demonstrate how to build a Spark connector for Excel using Python, showcasing schema inference, data reads/writes and streaming support. Whether you're a data engineer or Spark enthusiast, you’ll gain the knowledge to integrate Spark with any data source — entirely in Python.
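The batch side of the same API is equally small. The sketch below is not the connector from the talk: the Excel parsing is replaced with a hard-coded placeholder (a real reader might use a library such as openpyxl), and the format name "excel_sketch" and its options are made up.

```python
# Sketch of a batch Python data source with a schema() stand-in for inference.
from pyspark.sql import SparkSession
from pyspark.sql.datasource import DataSource, DataSourceReader


class ExcelSketchReader(DataSourceReader):
    def __init__(self, options):
        self.path = options.get("path")

    def read(self, partition):
        # A real reader would open self.path and yield one tuple per row.
        yield ("alice", 34)
        yield ("bob", 29)


class ExcelSketchDataSource(DataSource):
    @classmethod
    def name(cls):
        return "excel_sketch"

    def schema(self):
        # Schema "inference" stand-in: a real connector would peek at the
        # first worksheet rows and build this string dynamically.
        return "name STRING, age INT"

    def reader(self, schema):
        return ExcelSketchReader(self.options)


spark = SparkSession.builder.getOrCreate()
spark.dataSource.register(ExcelSketchDataSource)
df = spark.read.format("excel_sketch").option("path", "/tmp/example.xlsx").load()
df.show()
```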

Barclays Post Trade's real-time trade monitoring platform was historically built on a complex set of legacy technologies, including Java, Solace, and custom microservices. This session will demonstrate how the power of Lakeflow Declarative Pipelines' new real-time mode, in conjunction with the foreach_batch_sink, can enable simple, cost-effective streaming pipelines that load high volumes of data into Databricks' new serverless OLTP database with very low latency. Once in the OLTP database, this data can be used to update real-time trading dashboards, securely hosted in Databricks Apps, with the latest stock trades - enabling better, more responsive decision-making and alerting. The session will walk through the architecture and demonstrate how simple it is to create and manage the pipelines and apps within the Databricks environment.
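For readers unfamiliar with the pattern, here is a hedged sketch of the underlying idea: landing each streaming microbatch into an OLTP database that a dashboard reads. This uses plain Structured Streaming foreachBatch plus JDBC, not the Lakeflow Declarative Pipelines real-time mode API itself, and the source, JDBC URL, and table names are placeholders.

```python
# Sketch: append each microbatch of trade events to an OLTP serving table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

trades = (
    spark.readStream
    .format("rate")                     # placeholder source for trade events
    .option("rowsPerSecond", 100)
    .load()
)


def write_to_oltp(batch_df, batch_id):
    # Each microbatch is appended to the table the dashboard queries.
    (
        batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://oltp-host:5432/trades")  # placeholder
        .option("dbtable", "latest_trades")
        .option("user", "app")
        .option("password", "***")
        .mode("append")
        .save()
    )


query = (
    trades.writeStream
    .foreachBatch(write_to_oltp)
    .option("checkpointLocation", "/tmp/checkpoints/trades")
    .start()
)
```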

Delta Kernel for Rust and Java

Delta Kernel makes it easy for engines and connectors to read and write Delta tables. It supports many Delta features and robust connectors, including DuckDB, ClickHouse, Spice AI, and delta-dotnet. In this session, we'll cover lessons learned about how to build a high-performance library that lets engines integrate the way they want, without having to worry about the details of the Delta protocol. We'll talk through how we streamlined the API, as well as its changes and the motivations behind them. We'll discuss some new highlight features, such as write support and the ability to do CDF scans. Finally, we'll cover the future roadmap for the Kernel project and what you can expect from the project over the coming year.