talk-data.com

Topic

AI/ML

Artificial Intelligence/Machine Learning

data_science algorithms predictive_analytics

9014 tagged

Activity Trend (chart): activities per quarter, 2020-Q1 to 2026-Q1, peaking at 1532 per quarter.

Activities

9014 activities · Newest first

Small models don’t need more parameters, they need better data. I’ll share how my team built the xLAM family of small action models that punch far above their weight, enabling fast and accurate AI agents deployable anywhere. We’ll explore why high-quality, task-specific data is the ultimate performance driver and how it turns small models into powerful, real-world solutions. You’ll leave with a practical playbook for creating small models that are fast, efficient, and ready to deploy from the edge to the enterprise.

For years, data engineering was a story of predictable "pipelines": move data from point A to point B. But AI just hit the reset button on our entire field. Now, we're all staring into the void, wondering what's next. The fundamentals haven't changed: data governance, data management, and data modeling remain as challenging as ever. Everything else is up for grabs. This talk will cut through the noise and explore the future of data engineering in an AI-driven world. We'll examine how team structures will evolve, why agentic workflows and real-time systems are becoming non-negotiable, and how our focus must shift from building dashboards and analytics to architecting for automated action. The reset button has been pushed. It's time for us to invent the future of our industry.

Brought to You By:
• Statsig: The unified platform for flags, analytics, experiments, and more. Companies like Graphite, Notion, and Brex rely on Statsig to measure the impact of the pace they ship. Get a 30-day enterprise trial.
• Linear: The system for modern product development. Linear is a heavy user of Swift: they just redesigned their native iOS app using their own take on Apple’s Liquid Glass design language. The new app is about speed and performance, just like Linear is.

Chris Lattner is one of the most influential engineers of the past two decades. He created the LLVM compiler infrastructure and the Swift programming language, and Swift opened iOS development to a broader group of engineers. With Mojo, he’s now aiming to do the same for AI, by lowering the barrier to programming AI applications. I sat down with Chris in San Francisco, to talk language design, lessons on designing Swift and Mojo, and, of course, compilers. It’s hard to find someone who is as enthusiastic and knowledgeable about compilers as Chris is! We also discussed why experts often resist change even when current tools slow them down, what he learned about AI and hardware from his time across both large and small engineering teams, and why compiler engineering remains one of the best ways to understand how software really works.

Timestamps
(00:00) Intro
(02:35) Compilers in the early 2000s
(04:48) Why Chris built LLVM
(08:24) GCC vs. LLVM
(09:47) LLVM at Apple
(19:25) How Chris got support to go open source at Apple
(20:28) The story of Swift
(24:32) The process for designing a language
(31:00) Learnings from launching Swift
(35:48) Swift Playgrounds: making coding accessible
(40:23) What Swift solved and the technical debt it created
(47:28) AI learnings from Google and Tesla
(51:23) SiFive: learning about hardware engineering
(52:24) Mojo’s origin story
(57:15) Modular’s bet on a two-level stack
(1:01:49) Compiler shortcomings
(1:09:11) Getting started with Mojo
(1:15:44) How big is Modular, as a company?
(1:19:00) AI coding tools the Modular team uses
(1:22:59) What kind of software engineers Modular hires
(1:25:22) A programming language for LLMs? No thanks
(1:29:06) Why you should study and understand compilers

The Pragmatic Engineer deepdives relevant for this episode:
• AI Engineering in the real world
• The AI Engineering stack
• Uber's crazy YOLO app rewrite, from the front seat
• Python, Go, Rust, TypeScript and AI with Armin Ronacher
• Microsoft’s developer tools roots

Production and marketing by https://penname.co/.


In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone. The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources. Aditya addresses detection through machine learning techniques that analyze behavioral patterns using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency. 
However, this remains an evolving challenge as attackers adapt strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers advantages, academic research continues advancing both offensive and defensive strategies in recommender systems security.
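
To make the mechanics described above concrete, here is a minimal sketch of a bandwagon-style shilling attack against user-user collaborative filtering. It is an illustration under stated assumptions, not the method or data from the episode: the ratings are made up, the user and item names (`alice`, `shill_0`, `target`) are hypothetical, and the similarity measure is plain cosine similarity over co-rated items.

```python
import math


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, profile in ratings.items():
        if other == user or item not in profile:
            continue
        sim = cosine(ratings[user], profile)
        num += sim * profile[item]
        den += abs(sim)
    return num / den if den else None


# Toy, made-up ratings on a 1-5 scale; "target" is the item being promoted.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 5, "B": 5, "target": 2},
    "carol": {"A": 4, "B": 4, "C": 2, "target": 1},
}
before = predict(ratings, "alice", "target")

# Fake profiles rate the popular items A and B highly so they look similar
# to genuine users, then give the target item the maximum rating.
attacked = dict(ratings)
for n in range(3):
    attacked[f"shill_{n}"] = {"A": 5, "B": 5, "target": 5}
after = predict(attacked, "alice", "target")

print(f"before: {before:.2f}, after: {after:.2f}")
```

With these toy numbers, three fake profiles lift the predicted rating for `alice` from roughly 1.5 to roughly 3.6. Real systems have far more users and typically add neighbor selection and rating normalization, which is part of why attacks at scale need many more profiles.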

Sujay Dutta and Sidd Rajagopal, authors of "Data as the Fourth Pillar," join the show to make the compelling case that for C-suite leaders obsessed with AI, data must be elevated to the same level as people, process, and technology. They provide a practical playbook for Chief Data Officers (CDOs) to escape the "cost center" trap by focusing on the "demand side" (business value) instead of just the "supply side" (technology). They also introduce frameworks like "Data Intensity" and "Total Addressable Value (TAV)" for data. We also tackle the reality of AI "slopware" and the "Great Pacific garbage patch" of junk data, explaining how to build the critical "context" (or "Data Intelligence Layer") that most GenAI projects are missing. Finally, they explain why the CDO must report directly to the CEO to play "offense," not defense.

AI in healthcare is advancing fast—supporting clinicians, improving patient outcomes, and reshaping how care is delivered. But making these solutions work in the real world demands tight collaboration across disciplines. This roundtable brings together leaders from hospitals, academia, and industry to share what’s driving real progress today: smarter workflows, better data use, and products built with clinicians in mind. A grounded look at where medical AI is already making a difference and what’s coming next. We’re fortunate to have a diverse lineup spanning academia, clinical practice, and industry innovation: Arlindo Oliveira, Daniel Rodrigues, Nuno André da Silva, and Suelen Cristina.

Start with a dataset in MotherDuck and build a production-ready analytics app using Omni’s semantic model and APIs. We’ll cover practical data modeling techniques, share lessons learned from building AI features, and walk through how to give AI the context it needs to answer questions accurately. You’ll leave with a working app and the skills to build your next one.

Get ready to ingest data and transform it into ready-to-use datasets using Python. We'll share a no-nonsense approach for developing and testing data connectors and transformations locally. Moving to production will be a matter of tweaking your configuration. In the end, you get a simple dataset interface to build dashboards & applications, train predictive models, or create agentic workflows. This workshop includes two guest speakers. Brian will teach how to leverage AI IDEs, MCP servers, and LLM scaffolding to create ingestion pipelines. Elvis will show how to interactively define transformations and data quality checks.

Learn to build an autonomous data science agent from scratch using open-source models and modern AI tools. This hands-on workshop will guide you through implementing a ReAct-based agent that can perform end-to-end data analysis tasks, from data cleaning to model training, using natural language reasoning and Python code generation. We'll explore the CodeAct framework, where the agent "thinks" through problems and then generates executable Python code as actions. You'll discover how to safely execute AI-generated code using Together Code Interpreter, creating a modular and maintainable system that can handle complex analytical workflows. Perfect for data scientists, ML engineers, and developers interested in agentic AI, this workshop combines practical implementation with best practices for building reasoning-driven AI assistants. By the end, you'll have a working data science agent and understand the fundamentals of agent architecture design.

What you'll learn:
• ReAct framework implementation
• Safe code execution in AI systems
• Agent evaluation and optimization techniques
• Building transparent, "hackable" AI agents

No advanced AI background required, just familiarity with Python and data science concepts.
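
The think-then-act loop described in this abstract can be sketched in a few lines. This is a rough illustration, not the workshop's implementation: `scripted_model` is a hypothetical stand-in for a real LLM call, and `run_code` uses Python's built-in `exec`, which is not a sandbox. In practice, generated code would go to an isolated executor such as the hosted code interpreter the workshop mentions.

```python
import contextlib
import io


def scripted_model(history):
    """Hypothetical stand-in for an LLM; returns (thought, (kind, payload))."""
    observations = [text for role, text in history if role == "observation"]
    if not observations:
        code = "values = [3, 5, 10]\nprint(sum(values) / len(values))"
        return "I should compute the mean with code.", ("code", code)
    return "The observation holds the answer.", ("final", observations[-1].strip())


def run_code(source):
    """Run code and capture stdout. NOT safe for untrusted code."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


def react_loop(task, model, max_steps=5):
    """Alternate thought -> code action -> observation until a final answer."""
    history = [("task", task)]
    for _ in range(max_steps):
        thought, (kind, payload) = model(history)
        history.append(("thought", thought))
        if kind == "final":
            return payload, history
        history.append(("action", payload))
        history.append(("observation", run_code(payload)))
    return None, history  # step budget exhausted without a final answer


answer, trace = react_loop("What is the mean of 3, 5, and 10?", scripted_model)
for role, text in trace:
    print(f"[{role}] {text}")
print("answer:", answer)
```

Keeping the full trace of thoughts, actions, and observations as plain data is one simple way to get the transparent, "hackable" behavior the abstract mentions: each step can be logged, inspected, or replayed.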