talk-data.com

Topic

Databricks

big_data analytics spark

1286

tagged

Activity Trend: peak of 515 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1286 activities · Newest first

The Data Engineer's Guide to Microsoft Fabric

Modern data engineering is evolving, and with Microsoft Fabric, the entire data platform experience is being redefined. This essential book offers a fresh, hands-on approach to navigating this shift. Rather than being an introduction to features, this guide explains how Fabric's key components (Lakehouse, Warehouse, and Real-Time Intelligence) work under the hood and how to put them to use in realistic workflows. Written by Christian Henrik Reich, a data engineering expert with experience that extends from Databricks to Fabric, this book is a blend of foundational theory and practical implementation of lakehouse solutions in Fabric. You'll explore how engines like Apache Spark and Fabric Warehouse collaborate with Fabric's Real-Time Intelligence solution in an integrated platform, and how to build ETL/ELT pipelines that deliver on speed, accuracy, and scale. Ideal for both new and practicing data engineers, this is your entry point into the fabric of the modern data platform.

  • Acquire a working knowledge of lakehouses, warehouses, and streaming in Fabric
  • Build resilient data pipelines across real-time and batch workloads
  • Apply Python, Spark SQL, T-SQL, and KQL within a unified platform
  • Gain insight into architectural decisions that scale with data needs
  • Learn actionable best practices for engineering clean, efficient, governed solutions

Generative AI on Microsoft Azure

Companies are now moving generative AI projects from the lab to production environments. To support these increasingly sophisticated applications, they're turning to advanced practices such as multiagent architectures and complex code-based frameworks. This practical handbook shows you how to leverage cutting-edge techniques using Microsoft's powerful ecosystem of tools to deploy trustworthy AI systems tailored to your organization's needs. Written for and by AI professionals, Generative AI on Microsoft Azure goes beyond the core technical aspects, examining underlying principles, tools, and practices in depth, from the art of prompt engineering to strategies for fine-tuning models to advanced techniques like retrieval-augmented generation (RAG) and agentic AI. Through real-world case studies and insights from top experts, you'll learn how to harness AI's full potential on Azure, paving the way for groundbreaking solutions and sustainable success in today's AI-driven landscape.

  • Understand the technical foundations of generative AI and how the technology has evolved over the last few years
  • Implement advanced GenAI applications using Microsoft services like Azure AI Foundry, Copilot, GitHub Models, Azure Databricks, and Snowflake on Azure
  • Leverage patterns, tools, frameworks, and platforms to customize AI projects
  • Manage, govern, and secure your AI-enabled systems with responsible AI practices
  • Build upon expert guidance to avoid common pitfalls, future-proof your applications, and more

Data Engineering with Azure Databricks

Master end-to-end data engineering on Azure Databricks. From data ingestion and Delta Lake to CI/CD and real-time streaming, build secure, scalable, and performant data solutions with Spark, Unity Catalog, and ML tools.

Key Features

  • Build scalable data pipelines using Apache Spark and Delta Lake
  • Automate workflows and manage data governance with Unity Catalog
  • Learn real-time processing and structured streaming with practical use cases
  • Implement CI/CD, DevOps, and security for production-ready data solutions
  • Explore Databricks-native ML, AutoML, and Generative AI integration

Book Description

"Data Engineering with Azure Databricks" is your essential guide to building scalable, secure, and high-performing data pipelines using the powerful Databricks platform on Azure. Designed for data engineers, architects, and developers, this book demystifies the complexities of Spark-based workloads, Delta Lake, Unity Catalog, and real-time data processing. Beginning with the foundational role of Azure Databricks in modern data engineering, you’ll explore how to set up robust environments, manage data ingestion with Auto Loader, optimize Spark performance, and orchestrate complex workflows using tools like Azure Data Factory and Airflow. The book offers deep dives into structured streaming, Delta Live Tables, and Delta Lake’s ACID features for data reliability and schema evolution. You’ll also learn how to manage security, compliance, and access controls using Unity Catalog, and gain insights into managing CI/CD pipelines with Azure DevOps and Terraform. With a special focus on machine learning and generative AI, the final chapters guide you in automating model workflows, leveraging MLflow, and fine-tuning large language models on Databricks. Whether you're building a modern data lakehouse or operationalizing analytics at scale, this book provides the tools and insights you need.

What you will learn

  • Set up a full-featured Azure Databricks environment
  • Implement batch and streaming ingestion using Auto Loader
  • Optimize Spark jobs with partitioning and caching
  • Build real-time pipelines with structured streaming and DLT
  • Manage data governance using Unity Catalog
  • Orchestrate production workflows with jobs and ADF
  • Apply CI/CD best practices with Azure DevOps and Git
  • Secure data with RBAC, encryption, and compliance standards
  • Use MLflow and Feature Store for ML pipelines
  • Build generative AI applications in Databricks

Who this book is for

This book is for data engineers, solution architects, cloud professionals, and software engineers seeking to build robust and scalable data pipelines using Azure Databricks. Whether you're migrating legacy systems, implementing a modern lakehouse architecture, or optimizing data workflows for performance, this guide will help you leverage the full power of Databricks on Azure. A basic understanding of Python, Spark, and cloud infrastructure is recommended.

Snowflake and Databricks have become significant vendors to many enterprises, with increasing product offerings and a broad array of use cases. This session provides strategies and tactics to negotiate the best contractual deal and pose critical questions. D&A leaders will learn how their organizations can best estimate future usage needs, mitigate risk when negotiating with the vendor, attain leverage, and recognize what is and is not leverage in the vendor's view.

Snowflake and Databricks have become significant vendors to many enterprises, with increasing product offerings and a broad array of use cases. This session provides attendees with strategies and tactics to negotiate the best contractual deal with Snowflake or Databricks and to pose the right questions when dealing with them. D&A leaders need to understand how their organization can best estimate future usage needs, mitigate risk when negotiating with the vendor, attain leverage, and recognize what is and is not leverage in the vendor's view.

ML and Generative AI in the Data Lakehouse

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions.

  • Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks
  • Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses
  • Discover how to integrate traditional ML and GenAI models for customized, scalable solutions
  • Utilize open source models to control costs while maintaining model performance and efficiency
  • Implement best practices for optimizing ML and GenAI models within the Databricks platform

A session for Delivery Managers, Product Owners, and Leaders on moving beyond PoC to scalable, production-ready Databricks deployments. Topics include structuring projects from discovery through production with clear milestones, managing stakeholder expectations, establishing delivery governance, deploying Databricks infrastructure (multi-environment Dev/Test/Prod, Unity Catalog, storage configuration, and CI/CD with Databricks Asset Bundles), and creating a clear path to productionisation. Real-world delivery examples illustrate how to accelerate time to value, de-risk deployments, and build stakeholder confidence in data & AI programs.

The rise of AI has sparked excitement, disruption and a fair bit of existential dread among data engineers. With automation encroaching on traditional workflows and generative models promising to write code, build pipelines and even architect solutions, where does that leave the humble data engineer?

In this engaging and thought-provoking session, Simon Whiteley (Databricks Champion, YouTube creator, and CTO of Advancing Analytics) cuts through the hype to explore what survival really looks like in this new AI-driven landscape. Drawing on real-world experience delivering production-grade AI solutions with Databricks, Simon will unpack:

  • The evolving role of the data engineer in AI-centric projects
  • Which skills are becoming obsolete and which are more vital than ever
  • How tools like Databricks are reshaping the engineering workflow
  • Practical strategies to stay relevant, valuable, and ahead of the curve

Whether you're a seasoned engineer or just starting out, this session will leave you with a clearer view of the future, a few laughs, and a toolkit for thriving, not just surviving, in the age of AI.

Efficient Time-Series Forecasting with Thousands of Local Models on Databricks

In industries like energy and retail, forecasting often requires local models, because each time series has its own behavior. Training and managing thousands of such models, however, presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the Pandas API on Spark, and shares practical lessons on storage, reuse, and scaling challenges to make this approach efficient when it’s truly needed.
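The abstract does not show the speakers' implementation, so the following is only a minimal sketch of what "local models" means: one independent model fitted per time series. It uses plain Python rather than the Pandas API on Spark, and the linear-trend model, the function names (`fit_local_model`, `fit_all`, `forecast`), and the sample series IDs are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_local_model(values):
    """Fit y = intercept + slope * t by least squares for ONE series.

    The linear trend is an illustrative assumption, not the talk's model.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = mean(values)
    denom = sum((t - t_mean) ** 2 for t in range(n))
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = num / denom if denom else 0.0
    return {"intercept": y_mean - slope * t_mean, "slope": slope, "n": n}


def forecast(model, steps):
    """Extrapolate the fitted trend for the next `steps` time points."""
    start = model["n"]
    return [model["intercept"] + model["slope"] * (start + h) for h in range(steps)]


def fit_all(rows):
    """Group (series_id, value) rows by series and fit one model per group."""
    groups = defaultdict(list)
    for series_id, value in rows:  # rows are assumed to arrive in time order
        groups[series_id].append(value)
    return {sid: fit_local_model(vals) for sid, vals in groups.items()}


# Hypothetical sample data: two series with different behavior.
rows = [
    ("store_a", 10.0), ("store_b", 5.0),
    ("store_a", 12.0), ("store_b", 5.0),
    ("store_a", 14.0), ("store_b", 5.0),
]
models = fit_all(rows)
print(forecast(models["store_a"], 2))
print(forecast(models["store_b"], 2))
```

The per-series fit function is the unit of work. On Spark, a function of this shape would typically be handed to a grouped apply so that each series is fitted in parallel; how the talk stores and reuses the resulting models is not described in the abstract.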

AWS re:Invent 2025 - Keynote Customer - Condé Nast

Sanjay Bhakta details Condé Nast's complete digital reinvention by migrating 800+ properties to AWS infrastructure with partners like Databricks and Snowplow, transforming from data-rich/insights-poor to cloud-native, personalized content delivery.

Accelerate Data and AI transformation with Azure Databricks

As organizations aim to be more data-driven, platforms that are integrated, scalable, and collaborative become vital. Azure Databricks delivers unified data analytics for processing, AI, and real-time insights. Its full potential emerges through integration with the Microsoft ecosystem. This session shows how Azure Databricks serves as the data and AI backbone while empowering users to leverage Microsoft solutions like Power BI, Power Apps and Microsoft Foundry for advanced, real-time decision-making.

Join this hands-on lab to design and deploy a modern, cloud-native analytics and AI solution using Azure Databricks, Microsoft Foundry, and Microsoft Copilot Studio. Work with the Zava-Litware scenario to perform data ingestion, orchestration with Lakeflow, AI-driven insights via Genie, mirrored catalog in Microsoft Fabric, Copilot Studio low-code automation, and advanced reporting in Power BI. Build a scalable, cost-efficient solution showcasing AI-powered analytics for business transformation.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Discover how to supercharge analytics and AI workflows using Azure Databricks and Microsoft Fabric. This hands-on lab explores native AI/BI features in Azure Databricks, including ML-powered insights and real-time analytics. Learn multiple ways to serve data to Power BI, with a deep dive into Direct Lake mode with Fabric. Ideal for developers, data scientists, data analysts, and engineers modernizing BI with lakehouse architecture in the AI era.

Learn how partners can build scalable, secure AI solutions with Microsoft Foundry. Integrate models from OpenAI, Cohere, Mistral, Hugging Face, and Meta Llama using Azure Databricks, Cosmos DB, Snowflake, and SQL. Foundry enables orchestration of agents, model customization, and secure data workflows—all within environments like GitHub, Visual Studio, and Copilot Studio.
