talk-data.com | Topic: Git (136 tagged)
Top Events
Demonstration of Git-based source control for Fabric items, enabling collaboration and rollback through version history.
Source control for Fabric items, and WHY Git enables collaboration and rollback (version history for everything!)
Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools.

Key Features
- Build scalable data pipelines using Apache Spark and Delta Lake
- Automate workflows and manage data governance with Unity Catalog
- Learn real-time processing and structured streaming with practical use cases
- Implement CI/CD, DevOps, and security for production-ready data solutions
- Explore Databricks-native ML, AutoML, and Generative AI integration

Book Description
"Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

What you will learn
- Set up a full-featured Azure Databricks environment
- Implement batch and streaming ingestion using Auto Loader
- Optimize Spark jobs with partitioning and caching
- Build real-time pipelines with structured streaming and DLT
- Manage data governance using Unity Catalog
- Orchestrate production workflows with jobs and ADF
- Apply CI/CD best practices with Azure DevOps and Git
- Secure data with RBAC, encryption, and compliance standards
- Use MLflow and Feature Store for ML pipelines
- Build generative AI applications in Databricks

Who this book is for
This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.
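A minimal sketch of the Auto Loader ingestion pattern the book covers, assuming a Databricks environment where a Spark session is available (the cloudFiles source is Databricks-specific); the volume paths and target table name are placeholders, not examples from the book.

```python
# Minimal sketch: streaming ingestion with Databricks Auto Loader into a Delta table.
# Assumes a Databricks notebook/job; paths and the table name below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw_events = (
    spark.readStream.format("cloudFiles")                 # Auto Loader source
    .option("cloudFiles.format", "json")                   # incoming file format
    .option("cloudFiles.schemaLocation", "/Volumes/demo/bronze/_schemas/events")
    .load("/Volumes/demo/landing/events/")                 # landing zone being monitored
)

(
    raw_events.writeStream
    .option("checkpointLocation", "/Volumes/demo/bronze/_checkpoints/events")
    .trigger(availableNow=True)                            # incremental, batch-style run
    .toTable("demo.bronze.events")                         # Delta table target
)
```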
Gain cutting-edge skills in building a full-stack web application with AI assistance. This book will guide you in creating your own travel application using React and Node.js, with MongoDB as the database, while emphasizing the use of Gen AI platforms like Perplexity.ai and Claude for quicker development and more accurate debugging. The book’s step-by-step approach will help you bridge the gap between traditional web development methods and modern AI-assisted techniques, making it both accessible and insightful. It provides valuable lessons on professional web application development practices. By focusing on a practical example, the book offers hands-on experience that mirrors real-world scenarios, equipping you with relevant and in-demand skills that can be easily transferred to other projects.

The book emphasizes the principles of responsive design, teaching you how to create web applications that adapt seamlessly to different screen sizes and devices. This includes using fluid grids, media queries, and optimizing layouts for usability across various platforms. You will also learn how to design, manage, and query databases using MongoDB, ensuring you can effectively handle data storage and retrieval in your applications. Most significantly, the book will introduce you to generative AI tools and prompt engineering techniques that can accelerate coding and debugging processes. This modern approach will streamline development workflows and enhance productivity. By the end of this book, you will not only have learned how to create a complete web application from backend to frontend, along with database management, but you will also have gained invaluable associated skills such as using IDEs, version control, and deploying applications efficiently and effectively with AI.

What You Will Learn
- How to build a full-stack web application from scratch
- How to use generative AI tools to enhance coding efficiency and streamline the development process
- How to create user-friendly interfaces that enhance the overall experience of your web applications
- How to design, manage, and query databases using MongoDB

Who This Book Is For
Frontend developers, backend developers, and full-stack developers.
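As a rough illustration of the kind of database query work the book teaches, here is a small sketch using pymongo in Python rather than the book's Node.js stack; the connection string, database, collection, and fields are hypothetical.

```python
# Minimal sketch of MongoDB access for a travel app, using pymongo.
# Connection string, database, collection, and fields are hypothetical placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
trips = client["travel_app"]["trips"]

# Insert a document describing one planned trip
trips.insert_one({"destination": "Lisbon", "nights": 4, "budget": 900})

# Query: all trips under a budget, longest stays first
for trip in trips.find({"budget": {"$lt": 1000}}).sort("nights", -1):
    print(trip["destination"], trip["nights"])
```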
In this episode, Ciro Greco (Co-founder & CEO, Bauplan) joins me to discuss why the future of data infrastructure must be "Code-First" and how this philosophy accidentally created the perfect environment for AI Agents.
We explore why the "Modern Data Stack" isn't ready for autonomous agents and why a programmable lakehouse is the solution. Ciro explains that while we trust agents to write code (because we can roll it back), allowing them to write data requires strict safety rails.
He breaks down how Bauplan uses "Git for Data" semantics - branching, isolation, and transactionality - to provide an air-gapped sandbox where agents can safely operate without corrupting production data. Welcome to the future of the lakehouse.
Bauplan: https://www.bauplanlabs.com/
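The branch, isolate, and merge semantics Ciro describes can be sketched roughly as follows. This is a hypothetical illustration of the pattern only, not Bauplan's actual SDK; every function name here is invented for the sketch.

```python
# Hypothetical illustration of "Git for Data" semantics for an agent sandbox:
# branch, write in isolation, validate, then merge or discard.
# NOT Bauplan's actual API; all names are invented for this sketch.

def run_agent_on_branch(catalog, agent, main_branch="main"):
    branch = catalog.create_branch(f"agent/{agent.name}", from_ref=main_branch)  # zero-copy branch
    try:
        agent.write_tables(branch)                    # agent mutates data only on its branch
        if catalog.run_quality_checks(branch):        # validation gate before anything lands
            catalog.merge(branch, into=main_branch)   # transactional merge into production
        else:
            catalog.delete_branch(branch)             # discard: production never saw the writes
    except Exception:
        catalog.delete_branch(branch)                 # failures stay air-gapped from main
        raise
```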
Summary
In this crossover episode, Max Beauchemin explores how multiplayer, multi-agent engineering is transforming the way individuals and teams build data and AI systems. He digs into the shifting boundary between data and AI engineering, the rise of “context as code,” and how just-in-time retrieval via MCP and CLIs lets agents gather what they need without bloating context windows. Max shares hard-won practices from going “AI-first” for most tasks, where humans focus on orchestration and taste, and the new bottlenecks that appear — code review, QA, async coordination — when execution accelerates 2–10x. He also dives deep into Agor, his open-source agent orchestration platform: a spatial, multiplayer workspace that manages Git worktrees and live dev environments, templatizes prompts by workflow zones, supports session forking and sub-sessions, and exposes an internal MCP so agents can schedule, monitor, and even coordinate other agents.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Maxime Beauchemin about the impact of multi-player multi-agent engineering on individual and team velocity for building better data systems.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by giving an overview of the types of work that you are relying on AI development agents for?
- As you bring agents into the mix for software engineering, what are the bottlenecks that start to show up?
- In my own experience there are a finite number of agents that I can manage in parallel. How does Agor help to increase that limit?
- How does making multi-agent management a multi-player experience change the dynamics of how you apply agentic engineering workflows?

Contact Info
- LinkedIn

Links
- Agor
- Apache Airflow
- Apache Superset
- Preset
- Claude Code
- Codex
- Playwright MCP
- Tmux
- Git Worktrees
- Opencode.ai
- GitHub Codespaces
- Ona

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
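Agor builds on Git worktrees to give each agent session its own checkout. A rough sketch of that underlying primitive using GitPython (not Agor's own code) might look like this; the repository path, branch naming, and worktree location are placeholder assumptions.

```python
# Minimal sketch: one worktree per agent session, so parallel agents never share a checkout.
# Uses GitPython's passthrough to the git CLI; paths and branch names are placeholders, not Agor code.
from git import Repo

repo = Repo("/srv/repos/my-project")

def add_agent_worktree(session_id: str, base_branch: str = "main") -> str:
    branch = f"agent/{session_id}"
    path = f"/srv/worktrees/{session_id}"
    # Equivalent to: git worktree add -b agent/<id> <path> main
    repo.git.worktree("add", "-b", branch, path, base_branch)
    return path

workdir = add_agent_worktree("session-42")
print("agent works in", workdir)
```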
This talk explores how AI-powered tools are revolutionizing the way developers interact with Git and GitHub. We'll dive into GitHub Copilot's capabilities beyond code completion, demonstrating how it can assist with Git workflows, commit message generation, and repository management tasks that traditionally required memorizing complex command-line syntax.
This comprehensive session covers the complete lifecycle of intelligent data agents in Microsoft Fabric, from initial configuration to enterprise-wide deployment. Learn proven best practices for building context-aware agents with curated data sources and effective instructions. Discover how to implement robust CI/CD pipelines powered by Git for managing configurations and deployments. The session also demonstrates consumption patterns of data agents across the AI ecosystem.
Bring software-engineering discipline to your data. Learn how Microsoft Fabric integrates Git and deployment pipelines to take a branch from validation to production—fast and safe. Learn how to use parameterized deployments for tables and views, run automated checks, and avoid common pitfalls like broken shortcuts. Walk away with practical patterns to operationalize your Lakehouse with confidence and speed.
Ditch the hand-cranked Word specs and kill your documentation debt for good. In this 45-minute demo you’ll see the Power Platform Documentation Extension turn every pipeline run into living, version-controlled docs—complete with ER-diagrams, data dictionaries, security-role matrices, option-set tables and workflow summaries. We’ll wire the extension into Azure DevOps and commit Markdown and branded Word document artefacts back to Git. By the end of the session you’ll have a reusable YAML snippet that can be added to any Power Platform CI/CD flow.
Git is the backbone of modern software development — but mastering real-world workflows goes far beyond basic commits and pulls. This session dives deep into how Visual Studio and Visual Studio Code streamline the Git experience while still giving you full control over advanced operations and branching strategies. We’ll start with the everyday developer workflow: staging, committing, branching, merging, and synchronizing repositories directly inside the IDE. From there, we’ll tackle the challenges that arise in real projects — merge conflicts, rebase vs. merge, squash commits, and rewriting history when sensitive data accidentally enters your repository. You’ll see how Visual Studio’s graphical interface, combined with Git Bash and posh-git, provides the flexibility of the command line without losing the visual context developers rely on. The session concludes with branching and release management practices in Azure DevOps, demonstrating how to align Git workflows with CI/CD pipelines for clean, auditable, and collaborative development. Attendees will leave with practical strategies for keeping Git repositories clean, secure, and consistent, and a clear understanding of how to manage even complex workflows efficiently using Visual Studio, VS Code, and Azure DevOps.
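As a rough companion to the squash-versus-rebase discussion, the sketch below drives the same operations through GitPython's command passthrough; the repository path and branch names are placeholders, and conflict handling is reduced to a simple abort.

```python
# Minimal sketch of the two integration strategies the session contrasts,
# driven through GitPython. Repo path and branch names are placeholders.
from git import Repo, GitCommandError

repo = Repo("/work/my-app")

# Option A: squash-merge a feature branch into main (one clean commit, no branch history)
repo.git.checkout("main")
repo.git.merge("feature/login", squash=True)        # git merge --squash feature/login
repo.git.commit(m="feature: add login flow (squashed)")

# Option B: rebase the feature branch onto main (linear history, rewritten commits)
repo.git.checkout("feature/payments")
try:
    repo.git.rebase("main")                          # git rebase main
except GitCommandError:
    repo.git.rebase(abort=True)                      # bail out if conflicts need manual resolution
```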
In this twofold session, I'll cover how we've used dbt to bring order to heaps of SQL statements used to manage a data warehouse. I'd like to share how dbt made our team more efficient and our data warehouse more resilient. Secondly, I'll highlight why dbt enabled a way forward for supporting low-code applications: by leveraging our data warehouse as a backend. I'll dive into systemic design, application architecture & data modelling. Tools/tech covered will be SQL, Trino, OutSystems, Git, Airflow and of course dbt! Expect practical insights, architectural patterns, and lessons learned from a real-world implementation.
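The session's toolchain pairs dbt with Airflow; one common way to wire them together is a DAG that shells out to dbt, sketched below. The project path, schedule, and target are assumptions, not the speaker's actual setup.

```python
# Minimal sketch of orchestrating dbt from Airflow 2.x.
# Project path, schedule, target, and DAG id are placeholders, not the speaker's setup.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_warehouse_refresh",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/warehouse && dbt run --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/warehouse && dbt test --target prod",
    )
    dbt_run >> dbt_test   # only test after models have built
```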
This practical, in-depth guide shows you how to build modern, sophisticated data processes using the Snowflake platform and DataOps.live—the only platform that enables seamless DataOps integration with Snowflake. Designed for data engineers, architects, and technical leaders, it bridges the gap between DataOps theory and real-world implementation, helping you take control of your data pipelines to deliver more efficient, automated solutions. You’ll explore the core principles of DataOps and how they differ from traditional DevOps, while gaining a solid foundation in the tools and technologies that power modern data management—including Git, dbt, and Snowflake. Through hands-on examples and detailed walkthroughs, you’ll learn how to implement your own DataOps strategy within Snowflake and maximize the power of DataOps.live to scale and refine your DataOps processes. Whether you're just starting with DataOps or looking to refine and scale your existing strategies, this book—complete with practical code examples and starter projects—provides the knowledge and tools you need to streamline data operations, integrate DataOps into your Snowflake infrastructure, and stay ahead of the curve in the rapidly evolving world of data management.

What You Will Learn
- Explore the fundamentals of DataOps, its differences from DevOps, and its significance in modern data management
- Understand Git’s role in DataOps and how to use it effectively
- Know why dbt is preferred for DataOps and how to apply it
- Set up and manage DataOps.live within the Snowflake ecosystem
- Apply advanced techniques to scale and evolve your DataOps strategy

Who This Book Is For
Snowflake practitioners—including data engineers, platform architects, and technical managers—who are ready to implement DataOps principles and streamline complex data workflows using DataOps.live.
With 700+ monthly BI users, Cribl scales self-service through governance, SDLC workflows, and smart AI practices. Join Priya Gupta and Chris Merrick to see how Git, Omni’s dbt integration, and a $20 auto-doc hack that enriches 100+ models help deliver fast and trusted AI-powered insights.
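The abstract doesn't explain the "$20 auto-doc hack", so the sketch below is only a guess at the general idea: spend a small LLM budget to draft descriptions for dbt models and hand them to a human for review before they land in Git. The model choice, prompt, and file layout are all assumptions.

```python
# Hypothetical sketch of an LLM "auto-doc" pass over dbt models; the abstract does not
# specify Cribl's approach, so the prompt, model, and file handling here are assumptions.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_description(sql_path: Path) -> str:
    sql = sql_path.read_text()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Write a one-paragraph dbt model description."},
            {"role": "user", "content": sql},
        ],
    )
    return response.choices[0].message.content.strip()

for model_file in Path("models").rglob("*.sql"):
    doc = draft_description(model_file)
    model_file.with_suffix(".md").write_text(doc)   # a human reviews before committing to Git
```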
Talk on distributed version control and how data projects can leverage Git and open formats like Apache Iceberg to enable multi-user data pipelines with snapshotting, time-travel, and branching.
Distributed version control systems - such as Git - unlock software development in multi-player mode: devs can safely work on the same code base, with standard (albeit perhaps not user-friendly!) abstractions for snapshotting, time-travel, and branching. Data folks have rarely been so lucky, as their projects crucially depend on data, whose life-cycle management is often cumbersome and custom. In this talk, we present open formats - such as Apache Iceberg - to practitioners with limited exposure to modern cloud infrastructure. In particular, we show how moving from datasets to tables unlocks a similar multi-player mode when building data pipelines, with equivalent abstractions for snapshotting, time-travel, branching, and a unified backbone for pipelines, data science, and AI use cases.
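A minimal sketch of those table-level abstractions using Apache Iceberg's Spark integration follows, assuming a Spark session already configured with an Iceberg catalog named demo and the Iceberg SQL extensions enabled; the table, branch, and snapshot identifiers are placeholders.

```python
# Minimal sketch of Iceberg snapshotting, time-travel, and branching from PySpark.
# Assumes `spark` is configured with an Iceberg catalog named `demo` and the Iceberg
# SQL extensions on the classpath; table, branch, and snapshot ids are placeholders.

# Every committed write produces a snapshot we can inspect
spark.sql("SELECT snapshot_id, committed_at FROM demo.sales.orders.snapshots").show()

# Time-travel: read the table as of an earlier snapshot (placeholder id)
spark.sql("SELECT count(*) FROM demo.sales.orders VERSION AS OF 8156783447503287000").show()

# Branching: experiment on an `audit` branch without touching the main table history
spark.sql("ALTER TABLE demo.sales.orders CREATE BRANCH audit")
spark.sql("INSERT INTO demo.sales.orders.branch_audit SELECT * FROM demo.sales.fixups")

# Read the branch in isolation (merge or fast-forward only once it is validated)
spark.read.option("branch", "audit").table("demo.sales.orders").show()
```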
In this session we will introduce Snowflake’s latest developer tools for secure authentication, Git integration, collaborative notebooks, code workspaces, unified data engineering, and workflow automation - all aimed at streamlining product development and deployment.
We will also demonstrate how Snowflake enables engineers to query, aggregate, and extract insights from structured and unstructured data—including sales, support transcripts, and images—using AISQL functions and Cortex AI models. The demo highlights Snowflake Intelligence’s Data Agents, which let you interact with multimodal data sources and external apps, generate visualizations, and produce direct answers via natural language.
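As a rough illustration of querying transcripts with Cortex functions from Python, a sketch follows; the connection parameters, table, columns, and choice of Cortex functions are assumptions, since the session's exact demo is not described in the abstract.

```python
# Minimal sketch of calling Snowflake Cortex functions from Python; connection parameters,
# table, and column names are placeholders, not the session's actual demo.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="ANALYTICS_WH", database="SUPPORT", schema="PUBLIC",
)

query = """
    SELECT ticket_id,
           SNOWFLAKE.CORTEX.SENTIMENT(transcript) AS sentiment,
           SNOWFLAKE.CORTEX.COMPLETE('mistral-large',
               'Summarize this support call in one sentence: ' || transcript) AS summary
    FROM support_transcripts
    LIMIT 10
"""

cur = conn.cursor()
for ticket_id, sentiment, summary in cur.execute(query):
    print(ticket_id, sentiment, summary)
cur.close()
conn.close()
```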
This is the introductory course for dbt Architects. Learn foundational concepts covered in the dbt Architect exam, including setting up dbt projects according to best practices, managing dbt connections and environments, and leveraging dbt features to enhance security, observability, and cross-department collaboration.

After this course, you will be able to:
- Explain how dbt integrates with data platforms and Git providers
- Configure environments for different Git promotion strategies
- Manage a multi-project account in dbt

Prerequisites for this course include: dbt Fundamentals

What to Bring: You will need to bring your own laptop to complete the hands-on exercises. We will provide all the other sandbox environments for dbt and the data platform.

Duration: 4 hours
Fee: $400

Trainings and certifications are not offered separately and must be purchased with a Coalesce pass. Trainings and certifications are not available for Coalesce Online passes.