talk-data.com

Topic

GitHub

version_control collaboration code_hosting

Activity Trend: peak of 79 per quarter (2020-Q1 to 2026-Q1)

Top Events

ADSP: Algorithms + Data Structures = Programs · 154
Data Engineering Podcast · 123
DataTalks.Club · 104
Microsoft Ignite 2025 · 50
Data Skeptic · 13
O'Reilly Data Engineering Books · 12
DataTopics: All Things Data, AI & Tech · 11
O'Reilly Data Science Books · 11
Data + AI Summit 2025 · 10
SciPy 2025 · 10
Airflow Summit 2025 · 8
The Pragmatic Engineer · 8

Top Speakers

Conor Hoekstra · 154
Bryce Adelstein Lelbach (NVIDIA) · 148
Tobias Macey · 123
Ben Deane · 40
Sean Parent (Adobe) · 15
Kyle Polich · 13
Tristan Brindle (C++ London Uni) · 11
Gergely Orosz · 7
Zach Laine · 6
Mukundan Sankar · 6
Paulo Vasconcellos · 5
Michael YenChi Ho (Microsoft) · 5

Activities

Filtering by: Data + AI Summit 2025

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals (repeat)

2025-06-12 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Engineering Data Governance Databricks GenAI SQL Data Streaming

This session is repeated. This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:

- Introduction to Lakeflow and the Data Intelligence Platform
- Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views
- Mastering Databricks Workflows with advanced control flow and triggers
- Understanding serverless compute
- Data governance and lineage with Unity Catalog
- Generative AI for Data Engineers: Genie and Databricks Assistant

We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

American Airlines Flies to New Heights with Data Intelligence

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Saimahesh Chava (American Airlines), Yash Joshi (Accenture)

API Data Governance Databricks Hive

American Airlines migrated from Hive Metastore to Unity Catalog using automated processes with Databricks APIs and GitHub Actions. This automation streamlined the migration for many applications within AA, ensuring consistency, efficiency and minimal disruption while enhancing data governance and disaster recovery capabilities.

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals

2025-06-11 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Engineering Data Governance Databricks GenAI SQL Data Streaming

This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:

- Introduction to Lakeflow and the Data Intelligence Platform
- Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views
- Mastering Databricks Workflows with advanced control flow and triggers
- Understanding serverless compute
- Data governance and lineage with Unity Catalog
- Generative AI for Data Engineers: Genie and Databricks Assistant

We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

Meet Goose, an Open Source AI Agent

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Bradley Axen (Block)

AI/ML Jira

goose is an open source AI agent framework that allows anyone to connect language model output to real world action. Released in January by Block (the company made up of Square, Cash App, Afterpay, and TIDAL), its use cases range from vibe coding to connecting all of the internal apps and services an enterprise uses. It can be powered by any language model that has tool calling capabilities.

goose's modular design allows it to connect with any system through simple extensions. Built on the open Model Context Protocol (developed with Anthropic), goose transforms natural language into actions across various tools and services. Whether integrating with platforms like Jira and GitHub, or executing system commands and scripts, its plug-and-play architecture means anyone can extend Goose's capabilities to suit their needs.

Finally, goose has both a command line interface and desktop app; it isn't limited to an IDE to start connecting to MCP servers and building powerful agentic workflows.

Automated Deployment with Databricks Asset Bundles

2025-06-10 · Data + AI Summit 2025

talk

API CI/CD Data Engineering Databricks DataOps Delta DevOps Spark

This course provides a comprehensive review of DevOps principles and their application to Databricks projects. It begins with an overview of core DevOps, DataOps, continuous integration (CI), continuous deployment (CD), and testing, and explores how these principles can be applied to data engineering pipelines. The course then focuses on continuous deployment within the CI/CD process, examining tools like the Databricks REST API, SDK, and CLI for project deployment. You will learn about Databricks Asset Bundles (DABs) and how they fit into the CI/CD process. You’ll dive into their key components, folder structure, and how they streamline deployment across various target environments in Databricks. You will also learn how to add variables, modify, validate, deploy, and execute Databricks Asset Bundles for multiple environments with different configurations using the Databricks CLI. Finally, the course introduces Visual Studio Code as an Integrated Development Environment (IDE) for building, testing, and deploying Databricks Asset Bundles locally, optimizing your development process. The course concludes with an introduction to automating deployment pipelines using GitHub Actions to enhance the CI/CD workflow with Databricks Asset Bundles. By the end of this course, you will be equipped to automate Databricks project deployments with Databricks Asset Bundles, improving efficiency through DevOps practices.

Pre-requisites: Strong knowledge of the Databricks platform, including experience with Databricks Workspaces, Apache Spark, Delta Lake, the Medallion Architecture, Unity Catalog, Delta Live Tables, and Workflows. In particular, knowledge of leveraging Expectations with Lakeflow Declarative Pipelines.

Labs: Yes

Certification Path: Databricks Certified Data Engineer Professional
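The validate, deploy and execute steps that the course abstract above mentions could be sketched roughly as follows. This is a minimal illustration, not course material: the helper `bundle_commands`, the target name `dev`, the job key `example_job` and the variable `catalog` are all hypothetical. It only builds the argument lists for the Databricks CLI `bundle` subcommands and does not run them.

```python
from typing import Dict, List, Optional


def bundle_commands(
    target: str,
    job_key: Optional[str] = None,
    variables: Optional[Dict[str, str]] = None,
) -> List[List[str]]:
    """Return Databricks CLI invocations for one target, in the order they would run.

    Hypothetical helper for illustration; it builds argument lists only.
    """
    # Each bundle variable is passed as one --var=name=value argument.
    var_args = [f"--var={name}={value}" for name, value in (variables or {}).items()]

    commands = [
        # Check the bundle configuration for the chosen target.
        ["databricks", "bundle", "validate", "-t", target, *var_args],
        # Deploy the bundle's resources to the chosen target.
        ["databricks", "bundle", "deploy", "-t", target, *var_args],
    ]
    if job_key is not None:
        # Run a job or pipeline defined in the bundle, identified by its resource key.
        commands.append(["databricks", "bundle", "run", "-t", target, *var_args, job_key])
    return commands


if __name__ == "__main__":
    # Assumed names: target "dev", job key "example_job", variable "catalog".
    for cmd in bundle_commands("dev", job_key="example_job", variables={"catalog": "dev_catalog"}):
        print(" ".join(cmd))
```

Each returned list could be handed to `subprocess.run(cmd, check=True)` in a local script or a CI job, so that a failed validation stops the sequence before anything is deployed.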

Deploying Databricks Asset Bundles (DABs) at Scale

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Saad Ansari (Databricks), Pieter Noordhuis (Databricks)

AI/ML Azure Azure DevOps BI Dashboard Databricks DevOps Git

This session is repeated. Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more.

Join the DABs Team for a Deep Dive and learn about:

- The Basics: Understanding Databricks asset bundles
- Declare, define and deploy assets, follow best practices, use templates and manage dependencies
- CI/CD & Governance: Automate deployments with GitHub Actions/Azure DevOps, manage Dev vs. Prod differences, and ensure reproducibility
- What’s new and what's coming up! AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment

If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.

Building AI Models In Health Care Using Semi-Synthetic Data

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Holden Karau (Fight Health Insurance)

AI/ML

Regulated or restricted fields like Health Care make collecting training data complicated. We all want to do the right thing, but how? This talk will look at how Fight Health Insurance used de-identified public and proprietary information to create a semi-synthetic training set for use in fine-tuning machine learning models to power Fight Paperwork. We'll explore how to incorporate the latest "reasoning" techniques in fine tuning as well as how to make models that you can afford to serve — think single GPU inference instead of a cluster of A100s. In addition to the talk we have the code used in a public GitHub repo — although it is a little rough, so you might want to use it more as a source of inspiration rather than directly forking it.

CI/CD for Databricks: Advanced Asset Bundles and GitHub Actions

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Dustin Vannoy (Databricks)

CI/CD Databricks Python

This session is repeated. Databricks Asset Bundles (DABs) provide a way to use the command line to deploy and run a set of Databricks assets, like notebooks, Python code, Lakeflow Declarative Pipelines and workflows. To automate deployments, you create a deployment pipeline that uses the power of DABs along with other validation steps to ensure high quality deployments.

In this session you will learn how to automate CI/CD processes for Databricks while following best practices to keep deployments easy to scale and maintain. After a brief explanation of why Databricks Asset Bundles are a good option for CI/CD, we will walk through a working project including advanced variables, target-specific overrides, linting, integration testing and automatic deployment upon code review approval. You will leave the session clear on how to build your first GitHub Action using DABs.

From Code Completion to Autonomous Software Engineering Agents

2025-06-10 · Data + AI Summit 2025 Watch

talk

by Kilian Lieret (Princeton University)

AI/ML

As language models have advanced, they have moved beyond code completion and are beginning to tackle software engineering tasks in a more autonomous, agentic way. However, evaluating agentic capabilities is challenging. To address this, we first introduce SWE-bench, a benchmark built from real GitHub issues that has become the standard for assessing AI’s ability to resolve complex software tasks in large codebases. We will discuss the current state of the field, the limitations of today’s models, and how far we still are from truly autonomous AI developers. Next, we will explore the fundamentals of agents based on hands-on demonstrations with SWE-agent, a simple yet powerful agent framework designed for software engineering but adaptable to a variety of domains. By the end of this session, you will have a clear understanding of the current frontier of agentic AI in software engineering, the challenges ahead and how you can experiment with AI agents in your own workflows.

Advanced Machine Learning Operations

2025-06-09 · Data + AI Summit 2025

talk

AI/ML CI/CD Data Lakehouse Databricks Git Python

The course is designed to cover advanced concepts and workflows in machine learning operations. It starts by introducing participants to continuous integration (CI) and continuous deployment (CD) workflows within machine learning projects, guiding them through the deployment of a sample CI/CD workflow using Databricks in the first section. Moving on to the second part, participants delve into data and model testing, where they actively create tests and automate CI/CD workflows. Finally, the course concludes with an exploration of model monitoring concepts, demonstrating the use of Lakehouse Monitoring to oversee machine learning models in production settings.

Pre-requisites: Familiarity with Databricks workspace and notebooks; knowledge of machine learning model development and deployment with MLflow (e.g. intermediate-level knowledge of traditional ML concepts, development with CI/CD, the use of Python and Git for ML projects with popular platforms like GitHub)

Labs: Yes

Certification Path: Databricks Certified Machine Learning Professional