talk-data.com

Topic: AI/ML (Artificial Intelligence/Machine Learning)

Tags: data_science, algorithms, predictive_analytics

9014 activities tagged

Activity Trend: peak of 1532 activities/quarter, 2020-Q1 through 2026-Q1

Activities

9014 activities · Newest first

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we moved away from code generation and into config generation: how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a clear idea of where AI is going to disrupt the data engineering workload, practical tips for accelerating your own workflows, and an impending sense of doom about the future of the industry!
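To make the code-generation vs. config-generation idea concrete, here is a minimal sketch in Python of the pattern the abstract alludes to: the LLM emits a declarative pipeline config that a metadata-driven framework validates and executes, rather than emitting code directly. The config shape, keys and names are all illustrative assumptions, not the speaker's actual framework.

```python
# Minimal sketch: config-generation instead of code-generation.
# `llm_output` stands in for what an LLM might return when asked to
# describe an ingestion job as configuration; all fields are illustrative.
import json

llm_output = """
{
  "source": {"type": "postgres", "table": "orders"},
  "target": {"type": "delta", "path": "/lake/bronze/orders"},
  "schedule": "0 2 * * *",
  "quality_checks": [{"column": "order_id", "rule": "not_null"}]
}
"""

REQUIRED_KEYS = {"source", "target", "schedule"}

def load_pipeline_config(raw: str) -> dict:
    """Validate LLM-generated config before the framework acts on it."""
    config = json.loads(raw)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"config is missing keys: {sorted(missing)}")
    return config

config = load_pipeline_config(llm_output)
print(f"Ingest {config['source']['table']} -> {config['target']['path']}")
```

The appeal of this pattern is that a malformed config fails validation before anything runs, whereas generated code has to be reviewed or executed to be trusted.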

Sponsored by: Airbyte | How Data Movement Powers GenAI

In this session, discover how effective data movement is foundational to successful GenAI implementations. As organizations rush to adopt AI technologies, many struggle with the infrastructure needed to manage the massive influx of unstructured data these systems require. Jim Kutz, Head of Data at Airbyte, draws from 20+ years of experience leading data teams at companies like Grafana, CircleCI, and BlackRock to demonstrate how modern data movement architectures can enable secure, compliant GenAI applications. Learn practical approaches to data sovereignty, metadata management, and privacy controls that transform data governance into an enabler for AI innovation. This session will explore how you can securely leverage your most valuable asset—first-party data—for GenAI applications while maintaining complete control over sensitive information. Walk away with actionable strategies for building an AI-ready data infrastructure that balances innovation with governance requirements.

Sponsored by: IBM | How to leverage unstructured data to build more accurate, trustworthy AI agents

As AI adoption accelerates, unstructured data has emerged as a critical—yet often overlooked—asset for building accurate, trustworthy AI agents. But preparing and governing this data at scale remains a challenge. Traditional data integration and RAG approaches fall short. In this session, discover how IBM enables AI agents grounded in governed, high-quality unstructured data. Learn how our unified data platform streamlines integration across batch, streaming, replication, and unstructured sources—while accelerating data intelligence through built-in governance, quality, lineage, and data sharing. But governance doesn’t stop at data. We’ll explore how AI governance extends oversight to the models and agents themselves. Walk away with practical strategies to simplify your stack, strengthen trust in AI outputs, and deliver AI-ready data at scale.

Founder discussion: Matei on UC, Data Intelligence and AI Governance

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation covering the latest and greatest in UC, Data Intelligence, AI Governance, and more.

Summit Live: Data Sharing and Collaboration

Hear more on the latest in data collaboration, which is paramount to unlocking business success. Delta Sharing is an open source approach to sharing and governing data, AI models, dashboards, and notebooks across clouds and platforms, without the costly need for replication. Databricks Clean Rooms provide safe hosting environments for data collaboration across companies, also without the costly duplication of data. And the Databricks Marketplace is the open marketplace for all your data, analytics, and AI needs.
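As a quick illustration of the "no replication" point, here is a minimal sketch using the open source delta-sharing Python client. The profile file is issued by the data provider; the share, schema and table names below are placeholders.

```python
# Read a Delta Sharing table as a consumer, without copying the source data.
import delta_sharing

profile = "config.share"  # credential file provided by the data provider

# Discover which tables have been shared with you.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table directly into pandas.
# URL format: <profile>#<share>.<schema>.<table> (names here are placeholders).
table_url = f"{profile}#my_share.my_schema.my_table"
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```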

Nora Szentivanyi is joined by Michael Hanson and Raphael Brun-Aguerre to discuss key takeaways from the May CPI reports and the outlook for the rest of the year. While May inflation surprised softer in both the US and Euro area, pass-through from tariffs is still expected to push US core inflation higher with core PCE rising to 3.4% on a 4q/4q basis. At the same time, the Euro area’s path to 2% core HICP looks more assured after the unwind of the April Easter effect. We still think US trade policy will be net disinflationary for Europe and see a positive gap opening up between core inflation in the US and the rest of the world.

This podcast was recorded on 12 June 2025.

This communication is provided for information purposes only. Institutional clients can view the related reports at

https://www.jpmm.com/research/content/GPS-5006121-0

https://www.jpmm.com/research/content/GPS-4991721-0

https://www.jpmm.com/research/content/GPS-5000178-0

https://www.jpmm.com/research/content/GPS-4993783-0

for more information; please visit www.jpmm.com/research/disclosures for important disclosures.

© 2025 JPMorgan Chase & Co. All rights reserved. This material or any portion hereof may not be reprinted, sold or redistributed without the written consent of J.P. Morgan. It is strictly prohibited to use or share without prior written consent from J.P. Morgan any research material received from J.P. Morgan or an authorized third-party (“J.P. Morgan Data”) in any third-party artificial intelligence (“AI”) systems or models when such J.P. Morgan Data is accessible by a third-party. It is permissible to use J.P. Morgan Data for internal business purposes only in an AI system or model that protects the confidentiality of J.P. Morgan Data so as to prevent any and all access to or use of such J.P. Morgan Data by any third-party.

Zhamak Dehghani (creator of Data Mesh, CEO of NextData) joins me to chat about what she thinks is next in data - autonomous data products, decentralized data and AI, and much more.

Zhamak is one of the people I most respect in our industry. She's a once-in-a-generation phenomenon who will change the trajectory of our industry.

Try Keboola 👉 https://www.keboola.com/mcp?utm_campaign=FY25_Q2_RoW_Marketing_Events_Webinar_Keboola_MCP_Server_Launch_June&utm_source=Youtube&utm_medium=Avery

Today, we'll create an entire data pipeline from scratch without writing a single line of code! Using the Keboola MCP server and Claude AI, we’ll extract data from my FindADataJob.com RSS feed, transform it, load it into Google BigQuery, and visualize it with Streamlit. This is the future of data engineering! (A rough sketch of the same steps in plain Python follows the timestamps below.)

Keboola MCP Integration: https://mcp.connection.us-east4.gcp.keboola.com/sse

I Analyzed Data Analyst Jobs to Find Out What Skills You ACTUALLY Need: https://www.youtube.com/watch?v=lo3VU1srV1E&t=212s

💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter
🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training
👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa
👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS
00:00 - Introduction
00:54 - Definition of Basic Data Engineering Terms
02:26 - Keboola MCP and Its Capabilities
07:48 - Extracting Data from RSS Feed
12:43 - Transforming and Cleaning the Data
19:19 - Aggregating and Analyzing Data
23:19 - Scheduling and Automating the Pipeline
25:04 - Visualizing Data with Streamlit
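For readers who want to see what the no-code MCP pipeline is doing under the hood, here is a hedged Python sketch of the extract/transform/aggregate steps using the feedparser library. The feed URL, field names and skill list are illustrative assumptions; the actual FindADataJob.com schema may differ.

```python
# Roughly what the no-code MCP pipeline automates: pull an RSS feed,
# clean the records, and aggregate skill mentions. Names are illustrative.
from collections import Counter
import feedparser

FEED_URL = "https://findadatajob.com/feed"  # hypothetical RSS endpoint

feed = feedparser.parse(FEED_URL)

# Transform: keep one clean record per job posting.
jobs = [
    {"title": entry.title.strip(), "link": entry.link}
    for entry in feed.entries
    if getattr(entry, "title", None)
]

# Aggregate: count postings mentioning common tools in the title.
skills = Counter()
for job in jobs:
    for skill in ("sql", "python", "excel", "tableau"):
        if skill in job["title"].lower():
            skills[skill] += 1

print(f"{len(jobs)} postings; skill mentions: {dict(skills)}")
```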

🔗 CONNECT WITH AVERY
🎥 YouTube Channel: https://www.youtube.com/@averysmith
🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/
📸 Instagram: https://instagram.com/datacareerjumpstart
🎵 TikTok: https://www.tiktok.com/@verydata
💻 Website: https://www.datacareerjumpstart.com/

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th, and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get:
✅ A discount on your enrollment
🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

In this episode of Hub & Spoken, Jason Foster, CEO & Founder of Cynozure, speaks with Lisa Allen, Director of Data at The Pensions Regulator (TPR), about the role of data in protecting savers and shaping a more resilient pensions industry. Lisa shares the story behind TPR's new data strategy and how it's helping to modernise an ecosystem that oversees more than £2 trillion in savings across 38 million members. Drawing on her experience at organisations including the Ordnance Survey and the Open Data Institute, she explains why strong data foundations, industry collaboration, and adaptive thinking are essential to success. The conversation explores how the regulator is building a data marketplace, adopting open standards, and applying AI to enable risk-based regulation, while reducing unnecessary burdens on the industry. Lisa also discusses the value of working transparently, co-designing with stakeholders, and staying agile in the face of rapid change. This episode is a must-listen for business leaders, regulators, and data professionals thinking about strategy, innovation, and sector-wide impact.

Cynozure is a leading data, analytics and AI company that helps organisations reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023, and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.

AI/BI Genie: A Look Under the Hood of Everyone's Friendly Neighborhood GenAI Product

Go beyond the user interface and explore the cutting-edge technology driving AI/BI Genie. This session breaks down the AI/BI Genie architecture, showcasing how LLMs, retrieval-augmented generation (RAG) and finely tuned knowledge bases work together to deliver fast, accurate responses. We’ll also explore how AI agents orchestrate workflows, optimize query performance and continuously refine their understanding. Ideal for those who want to geek out about the tech stack behind Genie, this session offers a rare look at the magic under the hood.
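To ground the retrieval-augmented generation pattern the abstract describes, here is a minimal, self-contained Python sketch. This is emphatically not Genie's actual implementation: the keyword-overlap retriever, the toy knowledge base and the prompt format are simplified stand-ins (production systems use vector similarity over embeddings).

```python
# Generic RAG loop: retrieve relevant knowledge, then ground the LLM
# prompt in it. All contents and scoring here are illustrative.

KNOWLEDGE_BASE = {
    "revenue_table": "fact_revenue holds daily revenue by region and product.",
    "churn_metric": "churn_rate = customers_lost / customers_at_period_start.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Toy retriever: rank snippets by word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE.values(),
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Ground the model in retrieved context before it answers."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("How is churn rate defined?"))
```

The point of the pattern is that answer quality becomes a retrieval problem as much as a generation problem, which is why the session pairs LLMs with tuned knowledge bases.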

AI-Powered Profits: Smarter Order and Inventory Management

Join this session to hear from two incredible companies, Xylem and Joby Aviation. Xylem shares their successful journey from fragmented legacy systems to a unified Enterprise Data Platform, demonstrating how they integrated complex ERP data across four business segments to achieve breakthrough improvements in parts management and operational efficiency. Following Xylem's story, learn how Joby Aviation leveraged Databricks to automate and accelerate flight test data checks, cutting processing times from over two hours to under thirty minutes. This session highlights how advanced cloud tools empower engineers to quickly build and run custom data checks, improving both speed and safety in flight test operations.

Beyond AI Accuracy: Building Trustworthy and Responsible AI Applications Through the Mosaic AI Framework

Generic LLM metrics are of little use until they reflect your business needs. In this session we will dive deep into creating bespoke, state-of-the-art AI metrics that matter to you, and discuss best practices for LLM evaluation strategies, including when to use an LLM judge versus statistical metrics. Through a live demo using the Mosaic AI Framework, we will showcase how to:
- Build your own custom AI metric tailored to the needs of your GenAI application
- Implement an autonomous AI evaluation suite for complex, multi-agent systems
- Generate ground truth data at scale, alongside production monitoring strategies

Drawing on extensive experience working with customers on real-world use cases, we will share actionable insights on building a robust AI evaluation framework. By the end of this session, you'll be equipped to create AI solutions that are not only powerful but also relevant to your organization's needs. Join us to transform your AI strategy and make a tangible impact on your business!
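For a concrete sense of the LLM-as-judge approach the session covers, here is a framework-agnostic Python sketch. The `call_llm` function is a hypothetical stand-in for a model serving endpoint, and the prompt and 1-5 scale are illustrative; Mosaic AI and MLflow expose their own APIs for this pattern.

```python
# Minimal LLM-as-judge metric: a judge model scores an answer for
# groundedness against retrieved context. All names are illustrative.

JUDGE_PROMPT = """You are grading an answer for groundedness.
Context: {context}
Answer: {answer}
Reply with a single integer from 1 (ungrounded) to 5 (fully grounded)."""

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; wire this to your serving endpoint."""
    raise NotImplementedError

def groundedness_score(context: str, answer: str) -> int:
    """Ask the judge model for a score and validate its output."""
    reply = call_llm(JUDGE_PROMPT.format(context=context, answer=answer))
    score = int(reply.strip())
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return score
```

A statistical metric (exact match, BLEU, latency) is cheaper and deterministic; a judge like this captures qualities with no closed-form definition, which is the trade-off the session discusses.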

Bridging BI Tools: Deep Dive Into AI/BI Dashboards for Power BI Practitioners

In the rapidly evolving field of data analytics, AI/BI dashboards and Power BI stand out as two formidable approaches, each offering unique strengths and catering to specific use cases. Power BI has earned its reputation for delivering user-friendly, highly customisable visualisations and reports for data analysis. AI/BI dashboards, on the other hand, have gained strong traction due to their seamless integration with the Databricks platform, making them an attractive option for data practitioners. This session will provide a comparison of these two tools, highlighting their respective features, strengths and potential limitations. Understanding the nuances between these tools is crucial for organizations aiming to make informed decisions about their data analytics strategy. This session will equip participants with the knowledge needed to select the most appropriate tool, or combination of tools, to meet their data analysis requirements and drive data-informed decision-making processes.

Building Responsible AI Agents on Databricks

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AI/BI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.
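As a hedged illustration of the SHAP step mentioned above, here is a short Python sketch computing per-feature contributions for a tree-based classifier. The synthetic data and feature semantics are invented for the example; a real credit model would use governed features and a proper training pipeline.

```python
# Sketch: SHAP values for a toy "credit" model, the kind of per-decision
# feature attribution used in explainability audits. Data is synthetic.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))            # e.g. income, utilization, tenure
y = (X[:, 0] - X[:, 1] > 0).astype(int)  # toy approve/decline label

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer yields each feature's contribution to each prediction,
# which is what reviewers inspect when auditing individual decisions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(np.asarray(shap_values).shape)
```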

Databricks in Action: Azure’s Blueprint for Secure and Cost-Effective Operations

Erste Group's transition to Azure Databricks marked a significant upgrade from a legacy system to a secure, scalable and cost-effective cloud platform. The initial architecture, characterized by a complex hub-spoke design and stringent compliance regulations, was replaced with a more efficient solution. The phased migration addressed high network costs and operational inefficiencies, resulting in a 60% reduction in networking costs and a 30% reduction in compute costs for the central team. This transformation, completed over a year, now supports real-time analytics, advanced machine learning and GenAI while ensuring compliance with European regulations. The new platform features Unity Catalog, separate data catalogs and dedicated workspaces, demonstrating a successful shift to a cloud-based machine learning environment with significant improvements in cost, performance and security.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

The direct Redox and Databricks integration can streamline your interoperability workflows, from responding in record time to preauthorization requests to letting attending physicians know about a change in risk for sepsis and readmission in near real time from ADTs. Data engineers will learn how to create fully streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once available in the Lakehouse, AI/BI Dashboards and Agentic Frameworks help write FHIR messages back to Redox for direct push down to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use the Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox-integrated EMRs, and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.
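To make the VARIANT-based parsing concrete, here is a hedged sketch run through PySpark. The table name, Unity Catalog volume path and JSON paths are illustrative assumptions about a FHIR bundle layout, not the session's actual pipeline (dbignite and redoxwrite wrap much of this in practice).

```python
# Sketch: land raw FHIR bundles as VARIANT, then extract fields by path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Bronze layer: each line of text becomes one VARIANT-typed bundle.
spark.sql("""
  CREATE OR REPLACE TABLE bronze_fhir AS
  SELECT parse_json(value) AS bundle       -- raw JSON -> VARIANT
  FROM read_files(
    '/Volumes/main/redox/fhir_bundles',    -- illustrative UC volume path
    format => 'text'
  )
""")

# VARIANT supports ':' path extraction and '::' casting, so individual
# FHIR resources can be pulled out without defining a rigid schema first.
patients = spark.sql("""
  SELECT
    bundle:entry[0]:resource:resourceType::string AS resource_type,
    bundle:entry[0]:resource:id::string           AS resource_id
  FROM bronze_fhir
""")
patients.show()
```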

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance

Navy Federal Credit Union has 200+ enterprise data sources in the enterprise data lake. These data assets are used for training 100+ machine learning models and hydrating a semantic layer that serves an average of 4,000 business users daily across the credit union. Previously, the only option for extracting data from the analytic semantic layer was to allow consuming applications to access it via an already-overloaded cloud data warehouse. Visualizing data lineage for 1,000+ data pipelines and their associated metadata was impossible, and understanding the granular cost of running data pipelines was a challenge. Implementing Unity Catalog opened an alternate path for accessing analytic semantic data from the lake. It also opened the doors to removing duplicate data assets stored across multiple lakes, which will save hundreds of thousands of dollars in data engineering effort, compute and storage costs.
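As a small illustration of the governance pattern described, here is a hedged sketch of Unity Catalog's three-level namespace plus SQL grants, issued through PySpark. The catalog, schema, table and principal names are placeholders, and the table is assumed to already exist.

```python
# Sketch: govern one copy of a semantic-layer table in Unity Catalog
# and grant consumers direct read access, bypassing the warehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Three-level namespace: catalog.schema.table (names are placeholders).
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.semantic")

# Grant consuming applications read access to the governed table
# (assumed to exist) instead of routing them through the warehouse.
spark.sql("""
  GRANT SELECT ON TABLE analytics.semantic.member_metrics
  TO `consumer-apps`
""")
```

Because access flows through one catalog, lineage and per-table usage become queryable metadata rather than something reconstructed pipeline by pipeline.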

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:
- Cross-platform data unification: a unified view of player actions across web, mobile and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations

This session explores Supercell’s data journey and AI-driven player engagement strategies.
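For a concrete feel of the streaming leg of such a platform, here is a hedged Structured Streaming sketch that reads enriched events from Kafka and writes them to Delta. The topic name, event schema and paths are illustrative; real Snowplow deployments typically use its own loaders and a much richer event model.

```python
# Sketch: stream behavioral events from Kafka into a Delta table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Illustrative subset of an enriched event payload.
event_schema = StructType([
    StructField("event_name", StringType()),
    StructField("user_id", StringType()),
    StructField("platform", StringType()),      # web / mobile / in-game
    StructField("collector_tstamp", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "snowplow-enriched")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Continuous append into Delta; the checkpoint makes the stream restartable.
(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/lake/_checkpoints/player_events")
    .start("/lake/silver/player_events"))
```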