talk-data.com

Topic: AI/ML (Artificial Intelligence/Machine Learning)
Tags: data_science, algorithms, predictive_analytics
9014 tagged activities

Activity Trend: peak of 1532 activities/quarter (2020-Q1 to 2026-Q1)

Activities

9014 activities · Newest first

Sponsored by: Atlan | How Fox & Atlan are Partnering to Make Metadata a Common System of Trust, Context, and Governance

With hundreds of millions viewing broadcasts from news to sports, Fox relies on a sophisticated and trusted architecture ingesting 100+ data sources, carefully governed to improve UX across products, drive sales and marketing, and ensure KPI tracking. Join Oliver Gomes, VP of Enterprise and Data Platform at Fox, and Prukalpa Sankar of Atlan to learn how true partnership helps their team navigate opportunities from Governance to AI. To govern and democratize their multi-cloud data platform, Fox chose Atlan to make data accessible and understandable for more users than ever before. Their team then used a data product approach to create a shared language using context from sources like Unity Catalog at a single point of access, no matter the underlying technology. Now, Fox is defining an ambitious future for Metadata. With Atlan and Iceberg driving interoperability, their team prepares to build a “control plane”, creating a common system of trust and governance.

Sponsored by: dbt Labs | Empowering the Enterprise for the Next Era of AI and BI

The next era of data transformation has arrived. AI is enhancing developer workflows, enabling downstream teams to collaborate effectively through governed self-service. Additionally, SQL comprehension is producing detailed metadata that boosts developer efficiency while ensuring data quality and cost optimization. Experience this firsthand with dbt’s data control plane, a centralized platform that provides organizations with repeatable, scalable, and governed methods to succeed with Databricks in the modern age.

Sponsored by: EY | Navigating the Future: Knowledge-Powered Insights on AI, Information Governance, Real-Time Analytics

In an era where data drives strategic decision-making, organizations must adapt to the evolving landscape of business analytics. This session will focus on three pivotal themes shaping the future of data management and analytics in 2025. Join our panel of experts, including a Business Analytics Leader, Head of Information Governance, and Data Science Leader, as they explore:
- Knowledge-Powered AI: Discover trends in Knowledge-Powered AI and how these initiatives can revolutionize business analytics, with real-world examples of successful implementations.
- Information Governance: Explore the role of information governance in ensuring data integrity and compliance. Our experts will discuss strategies for establishing robust frameworks that protect organizational assets.
- Real-Time Analytics: Understand the importance of real-time analytics in today’s fast-paced environment. The panel will highlight how organizations can leverage real-time data for agile decision-making.

Unity Catalog Managed Tables: Faster Queries, Lower Costs, Effortless Data Management

What if you could simplify data management, boost performance, and cut costs, all at once? Join us to discover how Unity Catalog managed tables can slash your storage costs, supercharge query speeds, and automate optimizations with AI on the Data Intelligence Platform. Experience seamless interoperability with third-party clients, and be among the first to preview our new game-changing tool that makes moving to UC managed tables effortless. Don’t miss this exciting session that will redefine your data strategy!
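For a concrete feel for the workflow this session describes, here is a minimal PySpark sketch of creating and writing to a Unity Catalog managed table; the catalog, schema, table, and column names are hypothetical, and the CLUSTER BY clause assumes a runtime that supports liquid clustering.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# A managed table: no LOCATION clause, so Unity Catalog owns and
# optimizes the underlying storage. Names are hypothetical.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.analytics.events (
        event_id BIGINT,
        event_type STRING,
        event_ts TIMESTAMP
    )
    CLUSTER BY (event_type)
""")

# Reads and writes go through the three-level namespace; storage
# paths never appear in user code.
spark.sql("""
    INSERT INTO main.analytics.events
    VALUES (1, 'page_view', current_timestamp())
""")
```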

AI-Powered Marketing Data Management: Solving the Dirty Data Problem with Databricks

Marketing teams struggle with ‘dirty data’ — incomplete, inconsistent, and inaccurate information that limits campaign effectiveness and reduces the accuracy of AI agents. Our AI-powered marketing data management platform, built on Databricks, solves this with anomaly detection, ML-driven transformations and the built-in Acxiom Referential Real ID Graph with Data Hygiene. We’ll showcase how Delta Lake, Unity Catalog and Lakeflow Declarative Pipelines power our multi-tenant architecture, enabling secure governance and 75% faster data processing. Our privacy-first design ensures compliance with GDPR, CCPA and HIPAA through role-based access, encryption key management and fine-grained data controls. Join us for a live demo and Q&A, where we’ll share real-world results and lessons learned in building a scalable, AI-driven marketing data solution with Databricks.
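The abstract does not spell out its detection logic, so the following is a generic illustration only, not the presenters' method: a simple z-score screen for anomalous marketing spend in PySpark, with hypothetical table and column names.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical campaign-spend table; substitute your own source.
df = spark.table("main.marketing.campaign_spend")

# Flag rows whose daily spend sits more than 3 standard deviations
# from the per-campaign mean -- a crude statistical anomaly screen.
stats = df.groupBy("campaign_id").agg(
    F.mean("daily_spend").alias("mu"),
    F.stddev("daily_spend").alias("sigma"),
)
flagged = df.join(stats, "campaign_id").withColumn(
    "is_anomaly",
    F.abs(F.col("daily_spend") - F.col("mu")) > 3 * F.col("sigma"),
)
flagged.filter("is_anomaly").show()
```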

Boosting Data Science and AI Productivity With Databricks Notebooks

This session is repeated. Want to accelerate your team's data science workflow? This session reveals how Databricks Notebooks can transform your productivity through an optimized environment designed specifically for data science and AI work. Discover how notebooks serve as a central collaboration hub where code, visualizations, documentation and results coexist seamlessly, enabling faster iteration and development. Key takeaways:
- Leveraging interactive coding features, including multi-language support, command-mode shortcuts and magic commands
- Implementing version control best practices through Git integration and notebook revision history
- Maximizing collaboration through commenting, sharing and real-time co-editing capabilities
- Streamlining ML workflows with built-in MLflow tracking and experiment management
You'll leave with practical techniques to enhance your notebook-based workflow and deliver AI projects faster with higher-quality results.
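The MLflow-tracking takeaway is easy to preview on your own; here is a minimal sketch using the core MLflow tracking API (the parameter, metric, and run names are illustrative):

```python
import mlflow

# Track a toy experiment run; in a Databricks notebook this logs to
# the workspace's managed MLflow tracking server automatically.
with mlflow.start_run(run_name="notebook-demo"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("n_estimators", 200)

    # ... train a model here ...
    accuracy = 0.93  # placeholder metric for illustration

    mlflow.log_metric("accuracy", accuracy)
```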

In this course, you'll learn concepts and perform labs that showcase workflows using Unity Catalog, Databricks' unified and open governance solution for data and AI. We'll start off with a brief introduction to Unity Catalog, discuss fundamental data governance concepts, and then dive into a variety of topics including using Unity Catalog for data access control, managing external storage and tables, data segregation, and more.
Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, group by, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), and beginner programming experience with the Spark DataFrame API (configuring DataFrameReader and DataFrameWriter to read and write data, expressing query transformations using DataFrame methods and Column expressions, etc.).
Labs: Yes
Certification Path: Databricks Certified Data Engineer Associate
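As a taste of the access-control portion of the course, a minimal sketch issuing standard Unity Catalog GRANT statements through spark.sql; the catalog, schema, table, and group names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Unity Catalog privileges are hierarchical: USE CATALOG and USE SCHEMA
# must be granted before SELECT on a table takes effect.
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`")

# Review the grants now in place on the table.
spark.sql("SHOW GRANTS ON TABLE main.sales.orders").show(truncate=False)
```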

Easy Ways to Optimize Your Databricks Costs

In this session, we will explore effective strategies for optimizing costs on the Databricks platform, a leading solution for handling large-scale data workloads. Databricks, known for its open and unified approach, offers several tools and methodologies to ensure users can maximize their return on investment (ROI) while managing expenses efficiently. Key points:
- Understanding usage with AI/BI tools
- Organizing costs with tagging
- Setting up budgets
- Leveraging System Tables
By the end of this session, you will have a comprehensive understanding of how to leverage Databricks' built-in tools for cost optimization, ensuring that your data and AI projects not only deliver value but do so in a cost-effective manner. This session is ideal for data engineers, financial analysts, and decision-makers looking to enhance their organization’s efficiency and financial performance through strategic cost management on Databricks.
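A small sketch of the System Tables approach mentioned above, summarizing recent spend by SKU; it assumes system tables are enabled in your workspace and SELECT has been granted on the billing schema.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Summarize the last 30 days of usage (in DBUs) by SKU from the
# billing system table.
spark.sql("""
    SELECT sku_name,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY sku_name
    ORDER BY dbus DESC
""").show(truncate=False)
```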

From Code Completion to Autonomous Software Engineering Agents

As language models have advanced, they have moved beyond code completion and are beginning to tackle software engineering tasks in a more autonomous, agentic way. However, evaluating agentic capabilities is challenging. To address this, we first introduce SWE-bench, a benchmark built from real GitHub issues that has become the standard for assessing AI’s ability to resolve complex software tasks in large codebases. We will discuss the current state of the field, the limitations of today’s models, and how far we still are from truly autonomous AI developers. Next, we will explore the fundamentals of agents based on hands-on demonstrations with SWE-agent, a simple yet powerful agent framework designed for software engineering but adaptable to a variety of domains. By the end of this session, you will have a clear understanding of the current frontier of agentic AI in software engineering, the challenges ahead and how you can experiment with AI agents in your own workflows.
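SWE-bench is publicly released; a minimal sketch of loading it with the Hugging Face datasets library (dataset id as published by the authors; field names reflect the public dataset schema):

```python
from datasets import load_dataset  # pip install datasets

# Load the SWE-bench benchmark of real GitHub issues; each instance
# pairs an issue with the repository state it must be resolved in.
swe_bench = load_dataset("princeton-nlp/SWE-bench", split="test")

example = swe_bench[0]
print(example["repo"])                     # source repository
print(example["problem_statement"][:300])  # the GitHub issue text
```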

This course introduces learners to evaluating and governing GenAI (generative artificial intelligence) systems. First, learners will explore the meaning behind and motivation for building evaluation and governance/security systems. Next, the course will connect evaluation and governance systems to the Databricks Data Intelligence Platform. Third, learners will be introduced to a variety of evaluation techniques for specific components and types of applications. Finally, the course will conclude with an analysis of evaluating entire AI systems with respect to performance and cost.
Pre-requisites: Familiarity with prompt engineering and experience with the Databricks Data Intelligence Platform; additionally, knowledge of retrieval-augmented generation (RAG) techniques, including data preparation, embeddings, vectors, and vector databases.
Labs: Yes
Certification Path: Databricks Certified Generative AI Engineer Associate

Getting Started With Lakeflow Connect

Hundreds of customers are already ingesting data with Lakeflow Connect from SQL Server, Salesforce, ServiceNow, Google Analytics, SharePoint, PostgreSQL and more to unlock the full power of their data. Lakeflow Connect introduces built-in, no-code ingestion connectors from SaaS applications, databases and file sources to help unlock data intelligence. In this demo-packed session, you’ll learn how to ingest ready-to-use data for analytics and AI with a few clicks in the UI or a few lines of code. We’ll also demonstrate how Lakeflow Connect is fully integrated with the Databricks Data Intelligence Platform for built-in governance, observability, CI/CD, automated pipeline maintenance and more. Finally, we’ll explain how to use Lakeflow Connect in combination with downstream analytics and AI tools to tackle common business challenges and drive business impact.
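Connector setup happens in the UI or through pipeline definitions, but data landed by a connector is queryable like any other Unity Catalog table; a minimal downstream sketch with hypothetical names:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Tables landed by an ingestion connector are ordinary Unity Catalog
# tables; the catalog/schema/table names here are hypothetical.
accounts = spark.table("main.salesforce.accounts")

# Ready for downstream analytics with no extra plumbing.
(accounts
    .groupBy("industry")
    .agg(F.count("*").alias("n_accounts"))
    .orderBy(F.desc("n_accounts"))
    .show())
```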

Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions

A big challenge in LLM development and synthetic data generation is ensuring data quality and diversity. While data incorporating varied perspectives and reasoning traces consistently improves model performance, procuring such data remains impossible for most enterprises. Human-annotated data struggles to scale, while purely LLM-based generation often suffers from distribution clipping and low entropy. In a novel compound AI approach, we combine LLMs with probabilistic graphical models and other tools to generate synthetic personas grounded in real demographic statistics. The approach allows us to address major limitations in bias, licensing, and persona skew of existing methods. We release the first open-source dataset aligned with real-world distributions and show how enterprises can leverage it with Gretel Data Designer (now part of NVIDIA) to bring diversity and quality to model training on the Databricks platform, all while addressing model collapse and data provenance concerns head-on.

This course is designed to introduce three primary machine learning deployment strategies and illustrate the implementation of each strategy on Databricks. Following an exploration of the fundamentals of model deployment, the course delves into batch inference, offering hands-on demonstrations and labs for utilizing a model in batch inference scenarios, along with considerations for performance optimization. The second part of the course comprehensively covers pipeline deployment, while the final segment focuses on real-time deployment. Participants will engage in hands-on demonstrations and labs, deploying models with Model Serving and utilizing the serving endpoint for real-time inference. By mastering deployment strategies for a variety of use cases, learners will gain the practical skills needed to move machine learning models from experimentation to production. This course shows you how to operationalize AI solutions efficiently, whether it's automating decisions in real time or integrating intelligent insights into data pipelines.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python (e.g., common Python libraries for DS/ML like scikit-learn), and awareness of model deployment strategies.
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
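A minimal sketch of the batch-inference pattern from the first part of the course, using MLflow's Spark UDF helper for distributed scoring; the model URI and table names are hypothetical.

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Wrap a registered model as a Spark UDF for distributed batch scoring.
# The "models:/..." URI and table names are hypothetical.
predict = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/main.ml.churn_model/1",
    result_type="double",
)

# Assumes the feature table's columns match the model's inputs.
features = spark.table("main.ml.customer_features")
scored = features.withColumn("churn_score", predict(*features.columns))
scored.write.mode("overwrite").saveAsTable("main.ml.churn_scores")
```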

Scaling Sales Excellence: How Databricks Uses Its Own Tech to Train GTM Teams

In this session, discover how Databricks leverages the power of Gen AI, MosaicML, Model Serving and Databricks Apps to revolutionize sales enablement. We’ll showcase how we built an advanced chatbot that equips our go-to-market team with the tools and knowledge needed to excel in customer-facing interactions. This AI-driven solution not only trains our salespeople but also enhances their confidence and effectiveness in demonstrating the transformative potential of Databricks to future customers. Attendees will gain insights into the architecture, development process and practical applications of this innovative approach. The session will conclude with an interactive demo, offering a firsthand look at the chatbot in action. Join us to explore how Databricks is using its own platform to drive sales excellence through cutting-edge AI solutions.

Self-Improving Agents and Agent Evaluation With Arize & Databricks MLflow

As autonomous agents become increasingly sophisticated and widely deployed, the ability for these agents to evaluate their own performance and continuously self-improve is essential. However, the growing complexity of these agents amplifies potential risks, including exposure to malicious inputs and generation of undesirable outputs. In this talk, we'll explore how to build resilient, self-improving agents. To drive self-improvement effectively, both the agent and the evaluation techniques must simultaneously improve with a continuously iterating feedback loop. Drawing from extensive real-world experiences across numerous productionized use cases, we will demonstrate practical strategies for combining tools from Arize, Databricks MLflow and Mosaic AI to evaluate and improve high-performing agents.
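Library specifics aside, the evaluate-then-improve cycle the talk describes can be sketched in plain Python; every function below is a hypothetical placeholder standing in for your agent, judge, and prompt-refinement logic, not a real Arize or MLflow API.

```python
# Schematic evaluate-then-improve loop; all functions are hypothetical
# placeholders.

def run_agent(task: str, prompt: str) -> str:
    # Placeholder: call your agent (e.g., an LLM serving endpoint) here.
    return f"answer to {task}"

def evaluate(task: str, output: str) -> tuple[bool, str]:
    # Placeholder judge; in practice an LLM judge whose traces you log
    # to an observability tool for inspection.
    passed = task in output
    critique = "" if passed else f"output ignored the task: {task}"
    return passed, critique

def refine(prompt: str, failures: list[tuple[str, str]]) -> str:
    # Fold evaluator critiques back into the agent's instructions.
    notes = "\n".join(critique for _, critique in failures)
    return prompt + "\nAvoid these mistakes:\n" + notes

def improvement_loop(tasks: list[str], prompt: str, rounds: int = 3) -> str:
    for _ in range(rounds):
        failures = []
        for task in tasks:
            passed, critique = evaluate(task, run_agent(task, prompt))
            if not passed:
                failures.append((task, critique))
        if not failures:          # every task passed: stop early
            return prompt
        prompt = refine(prompt, failures)
    return prompt

print(improvement_loop(["summarize Q3 revenue"], "You are a helpful analyst."))
```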

Sponsored by: Qlik | Turning Data into Business Impact: How to Build AI-Ready, Trusted Data Products on Databricks

Explore how to build use case-specific data products designed to power everything from traditional BI dashboards to machine learning and LLM-enabled applications. Gain an understanding of what data products are and why they are essential for delivering AI-ready data that is integrated, timely, high-quality, secure, contextual, and easily consumable. Discover strategies for unlocking business data from source systems to enable analytics and AI use cases, with a deep dive into the three-tiered data product architecture: the Data Product Engineering Plane (where data engineers ingest, integrate, and transform data), the Data Product Management Plane (where teams manage the full lifecycle of data products), and the Data Product Marketplace Plane (where consumers search for and use data products). Discover how a flexible, composable data architecture can support organizations at any stage of their data journey and drive impactful business outcomes.

The Future of Real-Time Insights with Databricks and SAP

Tired of waiting on SAP data? Join this session to see how Databricks and SAP make it easy to query business-ready data—no ETL. With Databricks SQL, you’ll get instant scale, automatic optimizations, and built-in governance across all your enterprise analytics data. Fast and AI-powered insights from SAP data are finally possible—and this is how.

The Hitchhiker's Guide to Delta Lake Streaming in an Agentic Universe

As data engineering continues to evolve, the shift from batch-oriented to streaming-first processing has become standard across the enterprise. The reality is these changes have been taking shape for the past decade — we just now also happen to be standing on the precipice of true disruption through automation, the likes of which we could only dream about before. Yes, AI Agents and LLMs are already a large part of our daily lives, but we (as data engineers) are ultimately on the frontlines ensuring that the future of AI is powered by consistent, just-in-time data — and Delta Lake is critical to help us get there. This session will provide you with best practices learned the hard way by one of the authors of The Delta Lake Definitive Guide, including:
- A guide to writing generic applications as components
- Workflow automation tips and tricks
- Tips and tricks for Delta clustering (liquid, z-order, and classic)
- Future facing: leveraging metadata for agentic pipelines and workflow automation
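A minimal sketch of the streaming-first pattern at the heart of this session: reading a Delta table incrementally and writing just-in-time results downstream; the table and checkpoint names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Treat a Delta table as an incremental source...
events = spark.readStream.table("main.raw.events")

# ...apply a transformation, and deliver just-in-time data downstream.
cleaned = events.filter(F.col("event_type").isNotNull())

(cleaned.writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/events")
    .trigger(availableNow=True)  # process all available data, then stop
    .toTable("main.curated.events"))
```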