In this course, you’ll learn how to apply patterns to securely store and delete personal information for data governance and compliance on the Data Intelligence Platform. We’ll cover topics like storing sensitive data appropriately to simplify granting access and processing deletes, processing deletes to ensure compliance with the right to be forgotten, performing data masking, and configuring fine-grained access control to configure appropriate privileges to sensitive data.Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions etc.). Beginner experience with Lakeflow Declarative Pipelines and streaming workloads.Labs: YesCertification Path: Databricks Certified Data Engineer Professional
talk-data.com
Topic
Data Streaming
64
tagged
Activity Trend
Top Events
In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake and and analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, photons, and more. Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions etc.). Labs: Yes Certification Path: Databricks Certified Data Engineer Professional
In this course, you’ll learn how to Incrementally process data to power analytic insights with Structured Streaming and Auto Loader, and how to apply design patterns for designing workloads to perform ETL on the Data Intelligence Platform with Lakeflow Declarative Pipelines. First, we’ll cover topics including ingesting raw streaming data, enforcing data quality, implementing CDC, and exploring and tuning state information. Then, we’ll cover options to perform a streaming read on a source, requirements for end-to-end fault tolerance, options to perform a streaming write to a sink, and creating an aggregation and watermark on a streaming dataset. Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions etc.). Beginner experience with streaming workloads and familiarity with Lakeflow Declarative Pipelines. Labs: No Certification Path: Databricks Certified Data Engineer Professional
In this course, you’ll learn how to have efficient data ingestion with Lakeflow Connect and manage that data. Topics include ingestion with built-in connectors for SaaS applications, databases and file sources, as well as ingestion from cloud object storage, and batch and streaming ingestion. We'll cover the new connector components, setting up the pipeline, validating the source and mapping to the destination for each type of connector. We'll also cover how to ingest data with Batch to Streaming ingestion into Delta tables, using the UI with Auto Loader, automating ETL with Lakeflow Declarative Pipelines or using the API.This will prepare you to deliver the high-quality, timely data required for AI-driven applications by enabling scalable, reliable, and real-time data ingestion pipelines. Whether you're supporting ML model training or powering real-time AI insights, these ingestion workflows form a critical foundation for successful AI implementation.Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (Configure DataFrameReader and DataFrameWriter to read and write data, Express query transformations using DataFrame methods and Column expressions, etc.Labs: NoCertification Path: Databricks Certified Data Engineer Associate