talk-data.com

Speaker

Bennie Haelen

2 talks · author

Talks & appearances

2 activities · Newest first

ML and Generative AI in the Data Lakehouse

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions.

- Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks
- Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses
- Discover how to integrate traditional ML and GenAI models for customized, scalable solutions
- Utilize open source models to control costs while maintaining model performance and efficiency
- Implement best practices for optimizing ML and GenAI models within the Databricks platform

Delta Lake: Up and Running

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running. The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights.

You'll learn how to:

- Use modern data management and data engineering techniques
- Understand how ACID transactions bring reliability to data lakes at scale
- Run streaming and batch jobs against your data lake concurrently
- Execute update, delete, and merge commands against your data lake
- Use time travel to roll back and examine previous data versions
- Build a streaming data quality pipeline following the medallion architecture