In this course, you'll learn concepts and perform labs that showcase workflows using Unity Catalog - Databricks' unified and open governance solution for data and AI. We'll start with a brief introduction to Unity Catalog, discuss fundamental data governance concepts, and then dive into a variety of topics including using Unity Catalog for data access control, managing external storage and tables, data segregation, and more. Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, group by, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.) Labs: Yes Certification Path: Databricks Certified Data Engineer Associate
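For a flavor of the access-control workflows covered here, the sketch below grants a group read access to a table with Unity Catalog's SQL GRANT statements, run from a Python notebook. The catalog, schema, table, and group names are hypothetical; `spark` and `display` are the globals Databricks notebooks provide.

```python
# Minimal sketch of Unity Catalog access control from a Databricks notebook.
# The catalog, schema, table, and group names below are hypothetical examples.

# Grant a group permission to browse the catalog and schema.
spark.sql("GRANT USE CATALOG ON CATALOG sales_catalog TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA sales_catalog.retail TO `data-analysts`")

# Grant read-only access to a single table.
spark.sql("GRANT SELECT ON TABLE sales_catalog.retail.orders TO `data-analysts`")

# Inspect the resulting grants.
display(spark.sql("SHOW GRANTS ON TABLE sales_catalog.retail.orders"))
```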
In this course, you’ll learn how to orchestrate data pipelines with Lakeflow Jobs (previously Databricks Workflows) and schedule dashboard updates to keep analytics up to date. We’ll cover topics like getting started with Lakeflow Jobs, how to use Databricks SQL for on-demand queries, and how to configure and schedule dashboards and alerts to reflect updates to production data pipelines. Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, group by, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.) Labs: No Certification Path: Databricks Certified Data Engineer Associate
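As a hedged illustration of the kind of job this course configures, here is a minimal sketch using the Databricks SDK for Python (databricks-sdk) to create a scheduled notebook job; the notebook path, cluster ID, and cron expression are hypothetical, and everything shown can equally be configured in the Lakeflow Jobs UI.

```python
# A minimal sketch of creating a scheduled job with the Databricks SDK for
# Python (databricks-sdk). The notebook path, cluster id, and cron expression
# are hypothetical placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up workspace credentials from the environment

job = w.jobs.create(
    name="nightly-dashboard-refresh",
    tasks=[
        jobs.Task(
            task_key="refresh",
            existing_cluster_id="1234-567890-abcde123",  # hypothetical cluster
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/etl/refresh"),
        )
    ],
    # Quartz cron syntax: run every day at 06:00 UTC.
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```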
Join us for an introductory session on Databricks DQX, a Python-based framework designed to validate the quality of PySpark DataFrames. Discover how DQX can empower you to proactively tackle data quality challenges, enhance pipeline reliability, and make more informed business decisions with confidence. Traditional data quality tools often fall short by providing limited actionable insight, relying heavily on after-the-fact monitoring, and being restricted to batch processing. DQX overcomes these limitations by enabling real-time quality checks at the point of data entry, supporting both batch and streaming data validation, and delivering granular insights at the row and column level. If you’re seeking a simple yet powerful data quality framework that integrates seamlessly with Databricks, this session is for you.
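To make the row-level idea concrete, here is a minimal sketch assuming the databricks-labs-dqx package: checks are declared as metadata and applied to a DataFrame, splitting rows into valid and quarantined sets. The table and column names are hypothetical, and exact check function names and argument keys may differ between DQX versions.

```python
# A minimal sketch of DQX-style checks, assuming the databricks-labs-dqx
# package. Check names and argument keys may differ between DQX versions;
# the table and column names are hypothetical.
import yaml
from databricks.sdk import WorkspaceClient
from databricks.labs.dqx.engine import DQEngine

checks = yaml.safe_load("""
- criticality: error
  check:
    function: is_not_null
    arguments:
      col_name: order_id
- criticality: warn
  check:
    function: is_in_range
    arguments:
      col_name: quantity
      min_limit: 1
      max_limit: 1000
""")

dq_engine = DQEngine(WorkspaceClient())
df = spark.table("sales_catalog.retail.orders")  # hypothetical table

# Split rows into those that pass all checks and a quarantine set
# annotated with row-level failure details.
valid_df, quarantined_df = dq_engine.apply_checks_by_metadata_and_split(df, checks)
```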
This course is designed to introduce three primary machine learning deployment strategies and illustrate how to implement each on Databricks. Following an exploration of the fundamentals of model deployment, the course delves into batch inference, offering hands-on demonstrations and labs on using a model for batch scoring, along with considerations for performance optimization. The second part of the course comprehensively covers pipeline deployment, while the final segment focuses on real-time deployment. Participants will engage in hands-on demonstrations and labs, deploying models with Model Serving and using the serving endpoint for real-time inference. By mastering deployment strategies for a variety of use cases, learners will gain the practical skills needed to move machine learning models from experimentation to production. This course shows you how to operationalize AI solutions efficiently, whether that means automating decisions in real time or integrating intelligent insights into data pipelines. Pre-requisites: Familiarity with Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python (e.g. common Python libraries for DS/ML like Scikit-Learn, awareness of model deployment strategies) Labs: Yes Certification Path: Databricks Certified Machine Learning Associate
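As a hedged sketch of the batch inference pattern the course demonstrates, the snippet below wraps a registered MLflow model as a Spark UDF and scores a table; the model URI, table, and feature columns are hypothetical.

```python
# A minimal sketch of batch inference with an MLflow model wrapped as a
# Spark UDF. The model URI, table, and feature column names are hypothetical.
import mlflow.pyfunc

model_uri = "models:/churn_model/Production"  # hypothetical registered model
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri)

features = ["tenure", "monthly_charges", "num_support_calls"]
scored = (
    spark.table("sales_catalog.ml.customers")      # hypothetical input table
         .withColumn("prediction", predict_udf(*features))
)

# Persist predictions for downstream consumers.
scored.write.mode("overwrite").saveAsTable("sales_catalog.ml.customer_scores")
```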
The course is designed to cover advanced concepts and workflows in machine learning operations. It starts by introducing participants to continuous integration (CI) and continuous delivery (CD) workflows within machine learning projects, guiding them through the deployment of a sample CI/CD workflow using Databricks in the first section. In the second part, participants delve into data and model testing, where they actively create tests and automate CI/CD workflows. Finally, the course concludes with an exploration of model monitoring concepts, demonstrating the use of Lakehouse Monitoring to oversee machine learning models in production settings. Pre-requisites: Familiarity with Databricks workspace and notebooks; knowledge of machine learning model development and deployment with MLflow (e.g. intermediate-level knowledge of traditional ML concepts, development with CI/CD, and the use of Python and Git for ML projects with popular platforms like GitHub) Labs: Yes Certification Path: Databricks Certified Machine Learning Professional
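For a sense of the tests participants automate, here is a minimal pytest-style sketch covering both a model check and a data check; the model URI, table name, and thresholds are hypothetical placeholders.

```python
# A minimal sketch of the kind of pytest checks a CI/CD workflow automates.
# The model URI, table name, and thresholds are hypothetical; the data test
# assumes a `spark` session is available (e.g. in a Databricks job context).
import mlflow.pyfunc
import pandas as pd

MODEL_URI = "models:/churn_model/Staging"  # hypothetical registered model

def test_model_outputs_probabilities():
    model = mlflow.pyfunc.load_model(MODEL_URI)
    sample = pd.DataFrame({
        "tenure": [1, 24],
        "monthly_charges": [29.99, 89.50],
        "num_support_calls": [0, 5],
    })
    preds = model.predict(sample)
    # Predictions should be one value per row, bounded like a probability.
    assert len(preds) == len(sample)
    assert all(0.0 <= p <= 1.0 for p in preds)

def test_training_data_has_no_null_labels():
    df = spark.table("sales_catalog.ml.training_data")  # hypothetical table
    assert df.filter(df.label.isNull()).count() == 0
```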
In this course, you’ll learn how to define and schedule data pipelines that incrementally ingest and process data through multiple tables on the Data Intelligence Platform, using Lakeflow Declarative Pipelines in Spark SQL and Python. We’ll cover topics like how to get started with Lakeflow Declarative Pipelines, how Lakeflow Declarative Pipelines tracks data dependencies in data pipelines, how to configure and run data pipelines using the Lakeflow Declarative Pipelines UI, how to use Python or Spark SQL to define data pipelines that ingest and process data through multiple tables using Auto Loader and Lakeflow Declarative Pipelines, how to use APPLY CHANGES INTO syntax to process Change Data Capture feeds, and how to review event logs and data artifacts created by pipelines and troubleshoot syntax issues. By streamlining and automating reliable data ingestion and transformation workflows, this course equips you with the foundational data engineering skills needed to help kickstart AI use cases. Whether you're preparing high-quality training data or enabling real-time AI-driven insights, this course is a key step in advancing your AI journey. Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, group by, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.) Labs: No Certification Path: Databricks Certified Data Engineer Associate
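As a hedged example of the pipeline definitions covered here, the sketch below declares two dependent tables in Python, using Auto Loader for incremental ingestion and an expectation for basic data quality; the storage path and table names are hypothetical.

```python
# A minimal sketch of a Lakeflow Declarative Pipelines (formerly DLT)
# definition in Python, combining Auto Loader ingestion with a dependent
# table. The storage path and table names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader.")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "json")
             .load("/Volumes/sales/landing/orders")  # hypothetical path
    )

@dlt.table(comment="Cleaned orders; the pipeline tracks the dependency.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
           .withColumn("ingested_at", F.current_timestamp())
    )
```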
In this course, you’ll learn how to develop traditional machine learning models on Databricks. We’ll cover topics like using popular ML libraries, executing common tasks efficiently with AutoML and MLflow, harnessing Databricks' capabilities to track model training, leveraging feature stores for model development, and implementing hyperparameter tuning. The course also covers AutoML in more depth for rapid, low-code model training, ensuring that participants gain practical, real-world skills for streamlined and effective machine learning model development in the Databricks environment. Pre-requisites: Familiarity with Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python (e.g. common Python libraries for DS/ML like Scikit-Learn, fundamental ML algorithms like regression and classification, model evaluation with common metrics) Labs: Yes Certification Path: Databricks Certified Machine Learning Associate
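To illustrate the tracking workflow, here is a minimal sketch of MLflow autologging around a scikit-learn model; the synthetic dataset and hyperparameters are arbitrary placeholders.

```python
# A minimal sketch of MLflow experiment tracking with autologging around a
# scikit-learn model. The synthetic dataset and hyperparameters are arbitrary.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

mlflow.autolog()  # log params, metrics, and the model automatically

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    model = RandomForestClassifier(n_estimators=200, max_depth=5)
    model.fit(X_train, y_train)
    mlflow.log_metric("test_accuracy", model.score(X_test, y_test))
```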
In this course, you’ll learn how to ingest data efficiently with Lakeflow Connect and manage that data. Topics include ingestion with built-in connectors for SaaS applications, databases, and file sources, as well as ingestion from cloud object storage, and batch and streaming ingestion. We'll cover the new connector components, setting up the pipeline, validating the source, and mapping to the destination for each type of connector. We'll also cover how to ingest data, from batch to streaming, into Delta tables, using the UI with Auto Loader, automating ETL with Lakeflow Declarative Pipelines, or using the API. This will prepare you to deliver the high-quality, timely data required for AI-driven applications by enabling scalable, reliable, and real-time data ingestion pipelines. Whether you're supporting ML model training or powering real-time AI insights, these ingestion workflows form a critical foundation for successful AI implementation. Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, group by, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.) Labs: No Certification Path: Databricks Certified Data Engineer Associate
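As a hedged sketch of the Auto Loader path mentioned above, the snippet below incrementally ingests files from cloud object storage into a Delta table; the volume paths and table name are hypothetical, and trigger(availableNow=True) lets the same stream run as a scheduled batch job.

```python
# A minimal sketch of incremental ingestion from cloud object storage into a
# Delta table with Auto Loader. Paths and the table name are hypothetical.
raw = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "csv")
         .option("cloudFiles.schemaLocation", "/Volumes/sales/_schemas/orders")
         .load("/Volumes/sales/landing/orders")
)

# trigger(availableNow=True) processes all pending files and then stops,
# so the same stream can run as a scheduled batch job.
(raw.writeStream
    .option("checkpointLocation", "/Volumes/sales/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("sales_catalog.retail.orders_bronze"))
```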
In this course, you’ll learn the fundamentals of preparing data for machine learning using Databricks. We’ll cover topics like exploring, cleaning, and organizing data tailored for traditional machine learning applications. We’ll also cover data visualization, feature engineering, and optimal feature storage strategies. By building a strong foundation in data preparation, this course equips you with the essential skills to create high-quality datasets that can power accurate and reliable machine learning and AI models. Whether you're developing predictive models or enabling downstream AI applications, these capabilities are critical for delivering impactful, data-driven solutions. Pre-requisites: Familiarity with the Databricks workspace, notebooks, and Unity Catalog; intermediate-level knowledge of Python (scikit-learn, Matplotlib), Pandas, and PySpark; and familiarity with concepts such as exploratory data analysis, feature engineering, standardization, and imputation methods. Labs: Yes Certification Path: Databricks Certified Machine Learning Associate
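For a concrete taste of the imputation and standardization steps the course covers, here is a minimal scikit-learn sketch; the tiny feature matrix is a made-up example.

```python
# A minimal sketch of imputation and standardization with a scikit-learn
# pipeline. The feature matrix is a made-up example.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, np.nan],   # missing value to impute
              [3.0, 240.0]])

prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill gaps with the median
    ("scale", StandardScaler()),                   # zero mean, unit variance
])

X_prepared = prep.fit_transform(X)
print(X_prepared)
```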
This course equips professional-level machine learning practitioners with knowledge and hands-on experience in utilizing Apache Spark™ for machine learning purposes, including model fine-tuning. Additionally, the course covers using the Pandas library for scalable machine learning tasks. The initial section of the course focuses on the fundamentals of Apache Spark™ along with its machine learning capabilities. The second section delves into fine-tuning models using the hyperopt library. The final segment covers the Pandas API on Apache Spark™, including guidance on Pandas UDFs (User-Defined Functions) and the Functions API for model inference. Pre-requisites: Familiarity with Databricks workspace and notebooks; knowledge of machine learning model development and deployment with MLflow (e.g. basic understanding of DS/ML concepts, common model metrics and Python libraries, as well as a basic understanding of scaling workloads with Spark) Labs: Yes Certification Path: Databricks Certified Machine Learning Professional
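As a hedged illustration of the fine-tuning section, the sketch below distributes a hyperopt search across a cluster with SparkTrials; the objective function and search space are toy placeholders standing in for real model training.

```python
# A minimal sketch of distributed hyperparameter tuning with hyperopt and
# SparkTrials. The objective function and search space are toy placeholders.
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK

def objective(params):
    # Stand-in for training a model and returning a validation loss.
    loss = (params["x"] - 3.0) ** 2 + params["reg"]
    return {"loss": loss, "status": STATUS_OK}

search_space = {
    "x": hp.uniform("x", -10.0, 10.0),
    "reg": hp.loguniform("reg", -5, 0),
}

# SparkTrials fans trials out across the cluster's workers.
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=32,
    trials=SparkTrials(parallelism=4),
)
print(best)
```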