talk-data.com

Topic: Java

Tags: programming_language, object_oriented, enterprise

14 tagged activities

Activity Trend

[Trend chart: peak of 25 activities per quarter, 2020-Q1 through 2026-Q1]

Activities

14 activities · Newest first

Use Azure Migrate for AI-assisted insights and cloud transformation

Discover how you can make the most of your IT estate migrations and modernizations with the newest AI capabilities. This session guides IT teams through assessing current environments, setting goals, and creating a business case with Azure Migrate for all of your workload types like Windows Server, SQL Server, .NET, Linux, PostgreSQL, Java, and more. We’ll explore tools to inventory workloads, map dependencies, and create actionable migration roadmaps.

What’s New in Apache Spark™ 4.0?

Join this session for a concise tour of Apache Spark™ 4.0’s most notable enhancements:

  • SQL features: ANSI mode by default, SQL scripting, SQL pipe syntax, SQL UDFs, session variables, view schema evolution, etc.
  • Data types: the VARIANT type, string collation
  • Python features: Python data sources, plotting API, etc.
  • Streaming improvements: state store data source, state store checkpoint v2, arbitrary state v2, etc.
  • Spark Connect improvements: broader API coverage, thin client, unified Scala interface, etc.
  • Infrastructure: better error messages, structured logging, new Java/Scala version support, etc.

Whether you’re a seasoned Spark user or new to the ecosystem, this talk will prepare you to leverage Spark 4.0’s latest innovations for modern data and AI pipelines.
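As a taste of the list above, here is a minimal sketch of two of the additions, the VARIANT type and SQL pipe syntax, assuming a PySpark 4.x session (the data and names are illustrative, not from the talk):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark4-tour").getOrCreate()

    # VARIANT: parse semi-structured JSON once, then pull out typed fields.
    spark.sql("""
        SELECT variant_get(parse_json('{"user": {"id": 42}}'),
                           '$.user.id', 'int') AS user_id
    """).show()

    # SQL pipe syntax: chain relational operators left to right.
    spark.sql("""
        FROM range(10)
        |> WHERE id % 2 = 0
        |> AGGREGATE count(*) AS even_count
    """).show()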

Creating a Custom PySpark Stream Reader with PySpark 4.0

PySpark supports many data sources out of the box, such as Apache Kafka, JDBC, ODBC, Delta Lake, etc. However, some older systems, such as those that use the JMS protocol, are not supported by default and require considerable extra work for developers to read from. One such example is ActiveMQ for streaming: traditionally, ActiveMQ users have had to put a middleman between the queue and Spark (for example, writing to a MySQL database with Java code and reading that table via Spark JDBC). With PySpark 4.0’s custom data sources (supported in DBR 15.3+), we can cut out the middleman and consume the queues directly from PySpark, in batch or with Spark Streaming, saving developers considerable time and complexity in getting source data into Delta Lake, governed by Unity Catalog, and orchestrated with Databricks Workflows.
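As a rough illustration, here is a minimal sketch of a custom streaming source with the PySpark 4.0 Python data source API; the queue is faked with generated messages, whereas a real ActiveMQ reader would pull them through a JMS/STOMP client (all names below are hypothetical):

    from pyspark.sql.datasource import (
        DataSource, DataSourceStreamReader, InputPartition)


    class QueueStreamReader(DataSourceStreamReader):
        def initialOffset(self):
            # Offsets are plain dicts; Spark checkpoints them for you.
            return {"position": 0}

        def latestOffset(self):
            # Hypothetical: a real source would ask the broker for the
            # current end of the queue.
            return {"position": 100}

        def partitions(self, start, end):
            # One partition covering this micro-batch's new messages.
            return [InputPartition((start["position"], end["position"]))]

        def read(self, partition):
            # Yield one row per message; payloads are fabricated here.
            lo, hi = partition.value
            for i in range(lo, hi):
                yield (i, f"message-{i}")


    class QueueDataSource(DataSource):
        @classmethod
        def name(cls):
            return "demo_queue"

        def schema(self):
            return "id INT, payload STRING"

        def streamReader(self, schema):
            return QueueStreamReader()

After registering with spark.dataSource.register(QueueDataSource), the queue can be consumed with spark.readStream.format("demo_queue").load().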

Breaking Barriers: Building Custom Spark 4.0 Data Connectors with Python

Building a custom Spark data source connector once required Java or Scala expertise, making it complex and limiting. As a result, many proprietary data sources without public SDKs remained disconnected from Spark, and data sources with Python SDKs couldn't harness Spark’s distributed power. Spark 4.0 changes this with a new Python API for data source connectors, allowing developers to build fully functional connectors without Java or Scala. This unlocks new possibilities, from integrating proprietary systems to leveraging untapped data sources. Supporting both batch and streaming, this API makes data ingestion more flexible than ever. In this talk, we’ll demonstrate how to build a Spark connector for Excel using Python, showcasing schema inference, data reads/writes, and streaming support. Whether you're a data engineer or Spark enthusiast, you’ll gain the knowledge to integrate Spark with any data source — entirely in Python.
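To make that concrete, here is a minimal batch-side sketch with the same Python API, a toy stand-in rather than the Excel connector from the talk (the source name and rows are made up):

    from pyspark.sql import SparkSession
    from pyspark.sql.datasource import DataSource, DataSourceReader


    class RowsReader(DataSourceReader):
        def read(self, partition):
            # A real connector would pull rows from a Python SDK here;
            # fixed rows keep the sketch self-contained.
            yield (1, "alpha")
            yield (2, "beta")


    class RowsDataSource(DataSource):
        @classmethod
        def name(cls):
            return "demo_rows"

        def schema(self):
            return "id INT, label STRING"

        def reader(self, schema):
            return RowsReader()


    spark = SparkSession.builder.getOrCreate()
    spark.dataSource.register(RowsDataSource)
    spark.read.format("demo_rows").load().show()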

Delta Kernel for Rust and Java

Delta Kernel makes it easy for engines and connectors to read and write Delta tables. It supports many Delta features and robust connectors, including DuckDB, ClickHouse, Spice AI, and delta-dotnet. In this session, we'll cover lessons learned about how to build a high-performance library that lets engines integrate the way they want, without having to worry about the details of the Delta protocol. We'll talk through how we streamlined the API, as well as its changes and their underlying motivations. We'll discuss new highlight features such as write support and the ability to do CDF scans. Finally, we'll cover the future roadmap for the Kernel project and what you can expect over the coming year.

Christian Tzolov: Spring AI: Integrating Generative AI in Java Enterprise

🌟 Session Overview 🌟

Session Name: Spring AI: Integrating Generative AI in Java Enterprise
Speaker: Christian Tzolov
Session Description: This session explores Spring AI, a new framework enabling Java developers to integrate AI seamlessly into enterprise applications. Spring AI was born from the realization that using Generative AI is primarily an integration problem: it boils down to integrating your enterprise data and APIs with the AI models.

In this talk, the Spring AI project lead will introduce you to the essential GenAI concepts and provide a hands-on guide to kick-start your AI application development journey. Spring AI offers a comprehensive suite of components required for building an AI software stack, upholding Spring's renowned design principles, such as portability and modular design.

This session will introduce many Spring AI features, starting with a portable client API to interact with AI models. You will learn how to create effective AI prompts, convert AI responses into POJOs, and use function calling to integrate your existing APIs with the AI model.

Use cases like “query over your docs” are demonstrated through Spring AI features such as creating embeddings and storing them in a vector database. The session also covers the popular RAG pattern and ways to effectively evaluate how your AI application is performing.

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about how these conferences have evolved? Check out our archive of past Big Data & RPA sessions and watch the strategies and technologies evolve in our videos! 🚀
🔗 Find Other Years' Videos:
2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/, https://www.linkedin.com/company/75464753/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT

Unlock innovation with AI by migrating enterprise apps to App Service | BRK207H

Discover why Azure App Service is fast becoming the managed platform of choice for migrating on-premises .NET and Java apps to the cloud. Learn how to deploy your web applications with ease, using built-in support for containers and integrations with tools like GitHub and Azure DevOps. Secure your apps with SSL, authentication, and firewall features. We'll also share the latest tools and innovations that lower the cost and time of completing your migration projects.

To learn more, please check out these resources: * https://aka.ms/Ignite23CollectionsBRK207H * https://info.microsoft.com/ww-landing-contact-me-for-events-m365-in-person-events.html?LCID=en-us&ls=407628-contactme-formfill * https://aka.ms/azure-ignite2023-dataaiblog

𝗦𝗽𝗲𝗮𝗸𝗲𝗿𝘀: * Gaurav Seth * Scott Hunter * Tulika Chaudharie * Byron Tardif * Ed Donahue * Michael YenChi Ho * Stefan Schackow * Yutang Lin

𝗦𝗲𝘀𝘀𝗶𝗼𝗻 𝗜𝗻𝗳𝗼𝗿𝗺𝗮𝘁𝗶𝗼𝗻: This video is one of many sessions delivered for the Microsoft Ignite 2023 event. View sessions on-demand and learn more about Microsoft Ignite at https://ignite.microsoft.com

BRK207H | English (US) | AI & Apps

MSIgnite

Unlocking the Power of Databricks SDKs: The Power to Integrate, Streamline, and Automate

In today's data-driven landscape, the demands placed on data engineers are diverse and multifaceted. For teams building Java, Python, or Go microservices, the Databricks SDKs provide a powerful bridge between those established ecosystems and Databricks. They allow data engineers to unlock new levels of integration and collaboration, as well as to bring Unity Catalog into their processes and create advanced workflows straight from notebooks.

In this session, learn best practices for when and how to use the SDKs, the command-line interface, or the Terraform integration to connect seamlessly with Databricks, and rethink how you integrate with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations, improving scalability.
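For a flavor of the SDK route, here is a minimal sketch with the Databricks SDK for Python (pip install databricks-sdk), assuming credentials are already configured via environment variables or a .databrickscfg profile:

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    # Confirm who we are authenticated as.
    me = w.current_user.me()
    print(f"Authenticated as {me.user_name}")

    # Enumerate clusters in the workspace.
    for cluster in w.clusters.list():
        print(cluster.cluster_name, cluster.state)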

Talk by: Serge Smertin

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Photon for Dummies: How Does this New Execution Engine Actually Work?

Did you finish the Photon whitepaper and think, wait, what? I know I did; it’s my job to understand it, explain it, and then use it. If your role involves using Apache Spark™ on Databricks, then you need to know about Photon and where to use it. Join me, chief dummy, nay "supreme" dummy, as I break down this whitepaper into easy-to-understand explanations that don’t require a computer science degree. Together we will unravel mysteries such as:

  • Why is a Java Virtual Machine the current bottleneck for Spark enhancements?
  • What does vectorized even mean? And how was it done before?
  • Why is the relationship status between Spark and Photon "complicated?"

In this session, we’ll start with the basics of Apache Spark: the details we pretend to know and where the performance cracks are starting to show through. Only then will we look at Photon: how it’s different, where the clever design choices are, and how you can make the most of it in your own workloads. I’ve spent over 50 hours going over the paper in excruciating detail, every reference and, in some instances, the references of the references, so that you don’t have to.
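As a back-of-the-envelope illustration of the "vectorized" question above (not from the talk), compare processing one value per interpreter step with operating on whole columns at once, which is, loosely, the model Photon adopts:

    import numpy as np

    prices = np.random.rand(100_000)
    qty = np.random.randint(1, 10, size=100_000)

    # Row-at-a-time: one value per loop iteration, heavy per-row overhead.
    total_rows = 0.0
    for p, q in zip(prices, qty):
        total_rows += p * q

    # Vectorized: one operation over entire columns in tight native loops.
    total_vec = float(np.dot(prices, qty))

    assert abs(total_rows - total_vec) < 1e-3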

Talk by: Holly Smith


Change Data Streaming Patterns With Debezium & Apache Flink | Decodable

ABOUT THE TALK: Microservices are one of the big trends in software engineering of the last few years.

In this session we'll discuss and showcase how open-source change data capture (CDC) with Debezium can help developers with challenges they often face when working on microservices.

Learn how to:

  • Employ the outbox pattern for reliable, eventually consistent data exchange between microservices, without incurring unsafe dual writes or tight coupling (see the sketch after this list)
  • Gradually extract microservices from existing monolithic applications, using CDC, the strangler fig pattern and Apache Flink
  • Coordinate long-running business transactions across multiple services using CDC-based saga orchestration, ensuring such activity gets consistently applied or aborted by all participating services.
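As a rough sketch of the outbox pattern (not from the talk), here the business change and the outbox event are written in one local transaction, with SQLite standing in for the service database; a CDC tool such as Debezium would then stream the outbox table to downstream consumers:

    import json
    import sqlite3
    import uuid

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total REAL)")
    conn.execute("CREATE TABLE outbox (id TEXT PRIMARY KEY, "
                 "aggregate_id TEXT, event_type TEXT, payload TEXT)")

    def place_order(order_id: str, total: float) -> None:
        # One transaction: the order row and its event commit or roll back
        # together, so there is no unsafe dual write to DB plus broker.
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?)",
                         (order_id, total))
            conn.execute(
                "INSERT INTO outbox VALUES (?, ?, ?, ?)",
                (str(uuid.uuid4()), order_id, "OrderPlaced",
                 json.dumps({"order_id": order_id, "total": total})))

    place_order("order-1", 99.90)
    print(conn.execute("SELECT event_type, payload FROM outbox").fetchall())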

ABOUT THE SPEAKER: Gunnar Morling is a software engineer and open-source enthusiast at heart, currently working at Decodable on stream processing based on Apache Flink. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open-source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at a wide range of conferences like QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data-related topics, including data infrastructure, data engineering, ML systems, analytics, and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Improving Apache Spark Application Processing Time by Configurations, Code Optimizations, etc.

In this session, we'll go over several use cases and describe the process of improving our Spark Structured Streaming application's micro-batch time from ~55 to ~30 seconds in several steps.

Our app processes ~700 MB/s of compressed data, has very strict KPIs, and uses several technologies and frameworks, including Spark 3.1, Kafka, Azure Blob Storage, AKS, and Java 11.

We'll share our work and experience in those areas and go over a few tips for creating better Spark Structured Streaming applications.

The main areas discussed are Spark configuration changes, code optimizations, and the implementation of a custom Spark data source.
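The speakers' exact changes aren't listed here, but a sketch of the kind of tuning involved might look like this, assuming a Kafka-backed Structured Streaming job with the Kafka connector on the classpath (broker, topic, and values are hypothetical):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("tuned-streaming")
        # Match shuffle parallelism to the cluster, not the 200 default.
        .config("spark.sql.shuffle.partitions", "64")
        .getOrCreate()
    )

    stream = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        # Cap records per micro-batch so batch time stays predictable.
        .option("maxOffsetsPerTrigger", "500000")
        .load()
    )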


Mosaic: A Framework for Geospatial Analytics at Scale

In this session we’ll present Mosaic, a new Databricks Labs project with a geospatial flavour.

Mosaic provides users of Spark and Databricks with a unified framework for distributing geospatial analytics. Users can choose to employ existing Java-based tools such as JTS or Esri's Geometry API for Java, and Mosaic will handle the task of parallelizing these tools' operations, e.g. efficiently reading and writing geospatial data and performing spatial functions on geometries. Mosaic helps users scale these operations by providing spatial indexing capabilities (using, for example, Uber's H3 library) and advanced techniques for optimising common point-in-polygon and polygon-polygon intersection operations.
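The indexing idea is simple to sketch outside of Mosaic: bucket points into grid cells once (H3 plays this role in Mosaic), then run the exact point-in-polygon test only against points in cells the query region touches. A pure-Python toy version, with a hypothetical 1-degree grid:

    from collections import defaultdict

    CELL = 1.0  # hypothetical cell size in degrees

    def cell_of(lon, lat):
        return (int(lon // CELL), int(lat // CELL))

    # Index the points once...
    points = [(0.3, 0.7), (5.2, 1.1), (0.9, 0.2)]
    index = defaultdict(list)
    for lon, lat in points:
        index[cell_of(lon, lat)].append((lon, lat))

    # ...then probe only cells the query region overlaps.
    candidate_cells = [(0, 0)]
    candidates = [p for c in candidate_cells for p in index[c]]
    print(candidates)  # only these two points need the exact geometry test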

The development of Mosaic builds upon techniques developed with Ordnance Survey (the central hub for geospatial data across UK Government) and described in this blog post: https://databricks.com/blog/2021/10/11/efficient-point-in-polygon-joins-via-pyspark-and-bng-geospatial-indexing.html


Delta Lake 2.0 Overview

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running it in production was no small feat, and we have the ever-expanding Delta community to thank! Join this session to learn how the wider Delta community collaborated to bring together:

  • Integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more
  • Features such as OPTIMIZE ZORDER, data skipping using column stats, S3 multi-cluster writes, Change Data Feed, and more
  • Language APIs including Rust, Python, Ruby, GoLang, Scala, and Java
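As a small taste of the language APIs (not from the session), the delta-rs Python bindings (pip install deltalake) can read a Delta table without a Spark cluster, assuming a table already exists at the (hypothetical) path:

    from deltalake import DeltaTable

    dt = DeltaTable("/tmp/events")  # hypothetical table path
    print(dt.version())             # current table version
    print(dt.to_pandas().head())    # read into pandas, no Spark needed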


Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark

In recent years, new privacy laws and regulations have brought a fundamental shift in the protection of data and privacy, posing new challenges for data applications. To resolve these privacy and security challenges in the big data ecosystem without impacting existing applications, several hardware TEE (Trusted Execution Environment) solutions have been proposed for Apache Spark, e.g., PySpark with SCONE, Opaque, etc. However, to the best of our knowledge, none of them provides full protection for data pipelines in Spark applications: an adversary may still obtain sensitive information from unprotected components and stages. Furthermore, some of them greatly narrow the range of supported applications, e.g., supporting only Spark SQL.

In this presentation, we will present a new PPMLA (privacy-preserving machine learning and analytics) solution built on top of Apache Spark, BigDL, Occlum, and Intel SGX. It ensures all Spark components and pipelines are fully protected by Intel SGX, and existing Spark applications written in Scala, Java, or Python can be migrated to our platform without any code changes. We will demonstrate how to build distributed end-to-end SparkML/Spark SQL workloads with our solution in an untrusted cloud environment and share real-world use cases for PPMLA.
