talk-data.com

Topic: AI/ML (Artificial Intelligence/Machine Learning)

Tags: data_science, algorithms, predictive_analytics

9014 tagged activities

Activity Trend

Peak of 1532 activities per quarter, 2020-Q1 to 2026-Q1

Activities

9014 activities · Newest first

How does a worm know what’s good for dinner? In this episode, we uncover how C. elegans can distinguish between helpful and harmful microbes — and it’s all down to polyamines. These microbe-produced metabolites act like scent beacons, guiding worms to nutritious bacteria like E. coli while steering them away from pathogens.

We explore:

• How chemosensory neurons detect polyamines like cadaverine and putrescine
• Why ADF and AWC neurons are tuned to sniff out E. coli-enriched scents
• How the AIB interneuron acts as a decision hub for foraging
• Why worms lose interest in mutant E. coli strains lacking polyamines
• What this tells us about host-microbe interactions and innate sensory coding

📖 Based on the research article: “Chemosensory detection of polyamine metabolites guides C. elegans to nutritive microbes” Benjamin Brissette, Lia Ficaro, Chenguang Li, et al. Published in Science Advances (2024) 🔗 https://doi.org/10.1126/sciadv.adj4387

🎧 Subscribe to the WOrM Podcast for more full-organism discoveries in behaviour, sensory biology, and microbe-host interactions.

This podcast is generated with artificial intelligence and curated by Veeren. If you’d like your publication featured on the show, please get in touch.

📩 More info: 🔗 www.veerenchauhan.com 📧 [email protected]

Cognee organizes your data into AI memory. It builds structured AI memory by transforming raw data into a modular, queryable knowledge graph powered by embeddings. Like any complex system, it depends on many hyperparameters that shape performance in subtle ways. This talk shows how systematic tuning can improve AI memory, what current evaluation methods reveal (and miss), and why future progress will depend as much on better evaluation and optimization as on new architectures.
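To make the idea of systematic tuning concrete, here is a minimal sketch; the parameter names and the scoring stub are hypothetical illustrations, not Cognee's actual API.

```python
# Hypothetical sketch: sweep two memory-pipeline hyperparameters and keep the
# configuration that scores best on held-out questions. The parameter names
# and the scoring stub are illustrative, not Cognee's API.
from itertools import product

def evaluate_memory(chunk_size: int, top_k: int) -> float:
    # Placeholder: in a real run, rebuild the memory/knowledge graph with
    # these settings and score retrieval answers against references.
    return -abs(chunk_size - 512) / 512 - abs(top_k - 5) / 5  # dummy score

best = max(product([256, 512, 1024], [3, 5, 10]),
           key=lambda cfg: evaluate_memory(*cfg))
print("best (chunk_size, top_k):", best)
```

Even a toy sweep like this makes the talk's point visible: knobs such as chunk size and retrieval depth interact, so tuning them jointly tends to beat adjusting them one at a time.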

From Manual to LLMs: Scaling Product Categorization

How do you use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust multi-step process that uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners lessons learned and best practices for putting such a complex LLM-based system into production, including prompt development, balancing cost vs. accuracy via model selection, testing multi-case vs. single-case prompts, and saving costs by using the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.
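As a hedged illustration of that multi-step process (the category list, model names, and prompt are assumptions, not the production setup), the shortlist-then-decide pattern might look like this:

```python
# Two-stage categorization sketch: cheap embeddings shortlist candidate
# categories, then an LLM makes the final call on the shortlist only.
import numpy as np
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["Kitchen > Cookware", "Garden > Tools", "Toys > Puzzles"]  # ~1,000 in practice

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

cat_vecs = embed(CATEGORIES)

def categorize(product_title: str, shortlist_size: int = 5) -> str:
    vec = embed([product_title])[0]
    sims = cat_vecs @ vec / (np.linalg.norm(cat_vecs, axis=1) * np.linalg.norm(vec))
    shortlist = [CATEGORIES[i] for i in np.argsort(sims)[-shortlist_size:][::-1]]
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"Pick the single best category for '{product_title}' "
                   f"from this list and answer with the category only:\n"
                   + "\n".join(shortlist)}],
    )
    return answer.choices[0].message.content.strip()
```

Shortlisting with embeddings keeps the expensive LLM call small: the model only ever sees a handful of candidate categories instead of all 1,000.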

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents that query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing the challenges of preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying the Python and LLM ecosystem to the healthcare domain.
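To sketch the routing idea in isolation (this is an illustration, not Owkin-K Navigator's implementation), one can separate structured questions from open-ended ones and keep a source attached to every answer fragment:

```python
# Illustrative sketch only: route a question to SQL for structured facts or to
# vector retrieval for literature, and keep source attribution on every result.
from dataclasses import dataclass

@dataclass
class Evidence:
    text: str
    source: str  # e.g., a table name or paper DOI, so answers stay attributable

def route(question: str) -> str:
    structured_cues = ("how many", "count", "average", "list all")
    return "sql" if question.lower().startswith(structured_cues) else "vector"

def answer(question: str) -> list[Evidence]:
    if route(question) == "sql":
        # Placeholder for text-to-SQL against a curated biomedical schema.
        return [Evidence("42 trials match", source="clinical_trials table")]
    # Placeholder for embedding search over indexed publications.
    return [Evidence("EGFR mutations confer ...", source="doi:10.1000/example")]

for ev in answer("How many trials target EGFR?"):
    print(ev.text, "| source:", ev.source)
```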

Beyond Benchmarks: Practical Evaluation Strategies for Compound AI Systems

Evaluating large language models (LLMs) in real-world applications goes far beyond standard benchmarks. When LLMs are embedded in complex pipelines, choosing the right models, prompts, and parameters becomes an ongoing challenge.

In this talk, we will present a practical, human-in-the-loop evaluation framework that enables systematic improvement of LLM-powered systems based on expert feedback. By combining domain expert insights and automated evaluation methods, it is possible to iteratively refine these systems while building transparency and trust.

This talk will be valuable for anyone who wants to ensure their LLM applications can handle real-world complexity - not just perform well on generic benchmarks.
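A minimal sketch of such a feedback loop, with hypothetical field names: record the automated judge's score next to the expert's verdict, and surface disagreements for review.

```python
# Human-in-the-loop evaluation sketch: compare an automated judge's scores
# against expert verdicts and flag disagreements so both the judge and the
# pipeline can be refined. Field names are illustrative.
cases = [
    {"id": 1, "auto_score": 0.92, "expert_ok": True},
    {"id": 2, "auto_score": 0.88, "expert_ok": False},  # judge missed a failure
    {"id": 3, "auto_score": 0.41, "expert_ok": False},
]

THRESHOLD = 0.7
disagreements = [c for c in cases if (c["auto_score"] >= THRESHOLD) != c["expert_ok"]]
agreement = 1 - len(disagreements) / len(cases)

print(f"judge/expert agreement: {agreement:.0%}")
for c in disagreements:
    print(f"review case {c['id']}: auto={c['auto_score']}, expert_ok={c['expert_ok']}")
```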

How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData

Using AI agents and automation, PyCon DE & PyData volunteers have transformed chaos into streamlined conference ops. From YAML files to LLM-powered assistants, they automate speaker logistics, FAQs, video processing, and more while keeping humans focused on creativity. This case study reveals practical lessons on making AI work in real-world scenarios: structured workflows, validation, and clear context beat hype. Live demos and open-source tools included.

AI, data, numbers—without uploads. Hash, mask, and redact PII, then run data analytics locally for time savings and privacy. In this episode, we build a No-Upload AI Analyst that keeps your PII safe: HMAC SHA-256 hashing, masking, and redaction using policy presets and client-side transforms. We’ll:
• Reframe the problem (insights > risk)
• Set four hard constraints (no uploads, local preferred, policy presets, human-readable audit)
• Use rules-first privacy + schema semantics
• Walk the 5-step workflow (paste headers → pick preset → set secret → transform → analyze)
• Show real-world cases (HIPAA/HITECH-aware analytics, FERPA contexts, product analytics)
• Share a checklist + quiz + local Streamlit approach
Perfect for data teams in healthcare, finance, education, and privacy-sensitive orgs.
Key takeaways:
• Stop uploading customer data. Transform it client-side first.
• Use HMAC hashing to keep joins without exposing raw emails/IDs.
• Mask for human-readable UI; redact when you don’t need the field.
• Ship a data-handling report with every analysis.
• Run the app locally for maximum privacy.
Affiliate note: I record with Riverside (affiliate) and host on RSS.com (affiliate). Links in show notes.
Links:
• Blog version (free): https://mukundansankar.substack.com/p/the-no-upload-ai-analyst-v4-secure
• Join the discussion (comments hub): https://mukundansankar.substack.com/notes
Tools I use for my podcast and affiliate partners:
• Recording partner: Riverside → sign up here (affiliate)
• Host your podcast: RSS.com (affiliate)
• Research tools: Sider.ai (affiliate)
• Sourcetable AI: join here (affiliate)
🔗 Connect with Me:
• Free email newsletter
• Website: Data & AI with Mukundan
• GitHub: https://github.com/mukund14
• Twitter/X: @sankarmukund475
• LinkedIn: Mukundan Sankar
• YouTube: Subscribe
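For readers who want the transforms in code, here is a minimal sketch using only the Python standard library; the secret and the sample email are placeholders.

```python
# Client-side PII transforms: HMAC-SHA-256 hashing preserves joinability
# without exposing raw identifiers; masking keeps values human-readable;
# redaction drops the field entirely.
import hmac
import hashlib

SECRET = b"rotate-me-and-keep-me-local"  # placeholder; never leaves your machine

def hash_pii(value: str) -> str:
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()

def mask_email(email: str) -> str:
    local, _, domain = email.partition("@")
    return f"{local[:2]}***@{domain}"

email = "jane.doe@example.com"  # illustrative sample value
print(hash_pii(email))    # stable token: safe join key across tables
print(mask_email(email))  # "ja***@example.com": readable in a UI
print("[REDACTED]")       # when the field isn't needed at all
```

Because HMAC uses a keyed hash, the same email always maps to the same token for your analyses, but nobody without the secret can reverse or regenerate it.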

Here are 5 exciting and unique data analyst projects that will build your skills and impress hiring managers! These range from beginner to advanced and are designed to enhance your data storytelling abilities. ✨ Try Julius today at https://landadatajob.com/Julius-YT Where I Go To Find Datasets (as a data analyst) 👉 https://youtu.be/DHfuvMyBofE?si=ABsdUfzgG7Nsbl89 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator

⌚ TIMESTAMPS 00:00 - Introduction 00:24 - Project 1: Stock Price Analysis 03:46 - Project 2: Real Estate Data Analysis (SQL) 07:52 - Project 3: Personal Finance Dashboard (Tableau or Power BI) 11:20 - Project 4: Pokemon Analysis (Python) 14:16 - Project 5: Football Data Analysis (any tool)

🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th, and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Most AI Agents Are Useless. Let’s Fix That

AI agents are having a moment, but most of them are little more than fragile prototypes that break under pressure. Together, we’ll explore why so many agentic systems fail in practice, and how to fix that with real engineering principles. In this talk, you’ll learn how to build agents that are modular, observable, and ready for production. If you’re tired of LLM demos that don’t deliver, this talk is your blueprint for building agents that actually work.

Probably Fun: Games to teach Machine Learning

In this tutorial, you will play several games that can be used to teach machine learning concepts. Each game can be played in large and small groups. Some involve hands-on material such as cards; others involve an electronic app. All games contain one or more concepts from machine learning.

As an outcome, you will take away multiple ideas that make complex topics more understandable – and enjoyable. In doing so, we would like to demonstrate that machine learning does not require computers: the core ideas can be exemplified in a clear and memorable way without them. We also would like to demonstrate that gamification is not limited to online quiz questions, but offers ways for learners to bond.

We will bring a set of carefully selected games that have been proven in a big classroom setting and contain useful abstractions of linear models, decision trees, LLMs and several other Machine Learning concepts. We also believe that it is probably fun to participate in this tutorial.

Training Specialized Language Models with Less Data: An End-to-End Practical Guide

Small Language Models (SLMs) offer an efficient and cost-effective alternative to LLMs—especially when latency, privacy, inference costs or deployment constraints matter. However, training them typically requires large labeled datasets and is time-consuming, even if it isn't your first rodeo.

This talk presents an end-to-end approach for curating high-quality synthetic data using LLMs to train domain-specific SLMs. Using a real-world use case, we’ll demonstrate how to reduce manual labeling time, cut costs, and maintain performance—making SLMs viable for production applications.

Whether you are a seasoned machine learning engineer or someone just getting started with building AI features, you will come away with the inspiration to build more performant, secure, and environmentally friendly AI systems.
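As a hedged sketch of that curation loop: an LLM stands in as a weak labeler over unlabeled domain text (stubbed here), and a scikit-learn classifier stands in for the small language model being trained.

```python
# Synthetic-data curation sketch: label unlabeled domain text with an LLM,
# then train a small model on the synthetic labels. The labeling call is a
# stub; the small model is a stand-in for fine-tuning an actual SLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def llm_label(text: str) -> str:
    # Placeholder for a real LLM call with a careful labeling prompt plus
    # quality filters (confidence checks, deduplication, etc.).
    return "complaint" if "refund" in text.lower() else "other"

unlabeled = ["I want a refund for my order", "Great product, thanks!"]
synthetic = [(t, llm_label(t)) for t in unlabeled]

texts, labels = zip(*synthetic)
slm = make_pipeline(TfidfVectorizer(), LogisticRegression())
slm.fit(texts, labels)  # stand-in for fine-tuning a small language model
print(slm.predict(["Please refund me"]))
```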

Summary

In this episode of the Data Engineering Podcast, Serge Gershkovich, head of product at SqlDBM, talks about the socio-technical aspects of data modeling. Serge shares his background in data modeling and highlights its importance as a collaborative process between business stakeholders and data teams. He debunks common misconceptions that data modeling is optional or secondary, emphasizing its crucial role in ensuring alignment between business requirements and data structures. The conversation covers challenges in complex environments, the impact of technical decisions on data strategy, and the evolving role of AI in data management. Serge stresses the need for business stakeholders' involvement in data initiatives and a systematic approach to data modeling, warning against relying solely on technical expertise without considering business alignment.

Announcements
• Hello and welcome to the Data Engineering Podcast, the show about modern data management.
• Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
• Enterprises today face an enormous challenge: they’re investing billions into Snowflake and Databricks, but without strong foundations, those investments risk becoming fragmented, expensive, and hard to govern. And that’s especially evident in large, complex enterprise data environments. That’s why companies like DirecTV and Pfizer rely on SqlDBM. Data modeling may be one of the most traditional practices in IT, but it remains the backbone of enterprise data strategy. In today’s cloud era, that backbone needs a modern approach built natively for the cloud, with direct connections to the very platforms driving your business forward. Without strong modeling, data management becomes chaotic, analytics lose trust, and AI initiatives fail to scale. SqlDBM ensures enterprises don’t just move to the cloud—they maximize their ROI by creating governed, scalable, and business-aligned data environments. If global enterprises are using SqlDBM to tackle the biggest challenges in data management, analytics, and AI, isn’t it worth exploring what it can do for yours? Visit dataengineeringpodcast.com/sqldbm to learn more.
• Your host is Tobias Macey and today I'm interviewing Serge Gershkovich about how and why data modeling is a sociotechnical endeavor.

Interview
• Introduction
• How did you get involved in the area of data management?
• Can you start by describing the activities that you think of when someone says the term "data modeling"?
• What are the main groupings of incomplete or inaccurate definitions that you typically encounter in conversation on the topic?
• How do those conceptions of the problem lead to challenges and bottlenecks in execution?
• Data modeling is often associated with data warehouse design, but it also extends to source systems and unstructured/semi-structured assets. How does the inclusion of other data localities help in the overall success of a data/domain modeling effort?
• Another aspect of data modeling that often consumes a substantial amount of debate is which pattern to adhere to (star/snowflake, data vault, one big table, anchor modeling, etc.). What are some of the ways that you have found effective to remove that as a stumbling block when first developing an organizational domain representation?
• While the overall purpose of data modeling is to provide a digital representation of the business processes, there are inevitable technical decisions to be made. What are the most significant ways that the underlying technical systems can help or hinder the goals of building a digital twin of the business?
• What impact (positive and negative) are you seeing from the introduction of LLMs into the workflow of data modeling?
• How does tool use (e.g. MCP connection to warehouse/lakehouse) help when developing the transformation logic for achieving a given domain representation?
• What are the most interesting, innovative, or unexpected ways that you have seen organizations address the data modeling lifecycle?
• What are the most interesting, unexpected, or challenging lessons that you have learned while working with organizations implementing a data modeling effort?
• What are the overall trends in the ecosystem that you are monitoring related to data modeling practices?

Contact Info
• LinkedIn

Parting Question
• From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
• sqlDBM
• SAP
• Joe Reis
• ERD == Entity Relation Diagram
• Master Data Management
• dbt
• Data Contracts
• Data Modeling With Snowflake book by Serge (affiliate link)
• Type 2 Dimension
• Data Vault
• Star Schema
• Anchor Modeling
• Ralph Kimball
• Bill Inmon
• Sixth Normal Form
• MCP == Model Context Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Beyond the Black Box: Interpreting ML models with SHAP

As machine learning models become more accurate and complex, explainability remains essential. Explainability helps not just with trust and transparency but also with generating actionable insights and guiding decision-making. One way of interpreting model outputs is SHapley Additive exPlanations (SHAP). In this talk, I will go through the concept of Shapley values and their mathematical intuition, and then walk through a few real-world examples for different ML models. Attendees will gain a practical understanding of SHAP's strengths and limitations and how to use it to explain model predictions in their projects effectively.
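For readers who want to try this before the talk, here is a short, standard SHAP example; the dataset and model are illustrative, and a regressor is used so the SHAP output stays two-dimensional.

```python
# Minimal SHAP example on an illustrative model: TreeExplainer computes
# Shapley values efficiently for tree ensembles.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:200])  # shape: (200, n_features)

# Global view: which features drive predictions, and in which direction
shap.summary_plot(shap_values, X.iloc[:200])
```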

AI-Ready Data in Action: Powering Smarter Agents

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into production-useful applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data with best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will also touch on multi-modal data and taking this end-to-end use case to production.
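As a hedged sketch of the LanceDB step in that workflow (toy two-dimensional vectors stand in for real embeddings):

```python
# Store embedded chunks in LanceDB, then run a vector search combined with a
# structured metadata filter. Vectors here are toy values; in the workshop
# flow they would come from a real embedding model.
import lancedb

db = lancedb.connect("./lancedb")  # embedded, file-based: no server needed
table = db.create_table(
    "docs",
    data=[
        {"vector": [0.1, 0.9], "text": "invoice handling", "source": "wiki"},
        {"vector": [0.8, 0.2], "text": "sensor telemetry", "source": "api"},
    ],
    mode="overwrite",
)

hits = (
    table.search([0.15, 0.85])   # query vector from the same embedding space
    .where("source = 'wiki'")    # structured filter alongside vector search
    .limit(1)
    .to_list()
)
print(hits[0]["text"])
```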

Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.
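To give a flavor of that Python-first style, here is a minimal Kubeflow Pipelines v2 sketch; the component bodies are stubs, not the actual ISS pipeline from the talk.

```python
# Plain Python functions become containerized pipeline steps via the kfp SDK.
from kfp import dsl, compiler

@dsl.component
def preprocess(telemetry_path: str) -> str:
    # Placeholder for distributed preprocessing (Dask in the talk).
    return telemetry_path + ".cleaned"

@dsl.component
def train(data_path: str) -> str:
    # Placeholder for PyTorch training tracked with MLFlow.
    return "model-v1"

@dsl.pipeline(name="iss-anomaly-detection")
def pipeline(telemetry_path: str = "s3://iss/telemetry"):
    cleaned = preprocess(telemetry_path=telemetry_path)
    train(data_path=cleaned.output)

compiler.Compiler().compile(pipeline, "pipeline.yaml")
```

The compiled YAML is what actually runs on the cluster, so data scientists stay in Python while Kubernetes details remain behind the SDK.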

What’s Really Going On in Your Model? A Python Guide to Explainable AI

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional; it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models, from decision trees to deep neural networks, and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.
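As one concrete example from that toolkit, here is a minimal LIME snippet (dataset and model are illustrative stand-ins); LIME fits a simple local surrogate around a single prediction:

```python
# Explain one prediction of a tabular classifier with LIME.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=data.feature_names,
    class_names=data.target_names,
    mode="classification",
)
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=4)
print(exp.as_list())  # (feature condition, local weight) pairs for this prediction
```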

Automating Content Creation with LLMs: A Journey from Manual to AI-Driven Excellence

In the fast-paced realm of travel experiences, GetYourGuide encountered the challenge of maintaining consistent, high-quality content across its global marketplace. Manual content creation by suppliers often resulted in inconsistencies and errors, negatively impacting conversion rates. To address this, we leveraged large language models (LLMs) to automate content generation, ensuring uniformity and accuracy. This talk will explore our innovative approach, including the development of fine-tuned models for generating key text sections and the use of Function Calling GPT API for structured data. A pivotal aspect of our solution was the creation of an LLM evaluator to detect and correct hallucinations, thereby improving factual accuracy. Through A/B testing, we demonstrated that AI-driven content led to fewer defects and increased bookings. Attendees will gain insights into training data refinement, prompt engineering, and deploying AI at scale, offering valuable lessons for automating content creation across industries.
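As a hedged sketch of the structured-data piece (the schema, function name, and model are illustrative, not GetYourGuide's production configuration), function calling constrains the model to return JSON matching a fixed schema:

```python
# Function calling for structured content generation: the model must return
# arguments conforming to the declared JSON schema.
import json
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "submit_listing",
        "description": "Return structured marketplace copy.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "highlights": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "highlights"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write copy for a Berlin walking tour."}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "submit_listing"}},
)
listing = json.loads(resp.choices[0].message.tool_calls[0].function.arguments)
print(listing["title"])
```

Because the arguments must parse against the declared schema, downstream checks, such as the hallucination-detecting evaluator the talk describes, receive structured input rather than free text.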

Podcast episode by Richie (DataCamp) and Klaus Kleinfeld (K2Elevation (Founder/CEO); Chairman of KONUX and FERNRIDE; advisory/board roles at NEOM, GreyOrange, Fero Labs, EMH Partners; former CEO of NEOM, Alcoa/Arconic, and Siemens AG)

The modern workplace often glorifies constant productivity and hustle culture, but at what cost? More professionals are burning out earlier in their careers, while elite athletes are extending their peak performance years. What can business leaders learn from high-performance sports about energy management and sustainable success? How do you distinguish between your 'inner game'—managing your energy and purpose—and your 'outer game' of business skills and execution? Could simple techniques like compartmentalization, breathing exercises, and finding deeper purpose transform your professional effectiveness? What if the key to avoiding burnout isn't working less, but working differently?

Dr. Klaus Kleinfeld is an international executive, investor, and entrepreneur. He is the Founder and CEO of K2Elevation, which develops and invests in technology and biotech ventures across Germany, Austria, and the U.S. He serves as Chairman of KONUX and FERNRIDE, sits on the supervisory boards of GreyOrange, Fero Labs, and NEOM, and is an Advisory Partner at EMH Partners. Previously, he was the first CEO of NEOM, where he remains on the board and advises the Kingdom of Saudi Arabia on economic development. Earlier in his career, Dr. Kleinfeld was Chairman and CEO of Alcoa/Arconic, leading the company through a major transformation and successful split, and spent two decades at Siemens, ultimately becoming CEO of Siemens AG. He has also served on numerous global boards and advisory councils, including the Brookings Institution, Council on Foreign Relations, and World Economic Forum, and advised U.S. Presidents and international leaders. Born in Bremen, Germany, he holds an MBA from the University of Göttingen, a PhD from the University of Würzburg, and dual U.S.-German citizenship.

In the episode, Richie and Klaus explore the causes of workplace burnout, the parallels between high-performing workers and athletes, the importance of managing energy and purpose, practical techniques for emotional and mental control, the role of downtime in productivity, strategies for creating a supportive work culture, and much more.

Links Mentioned in the Show:
• Klaus’ Book - Leading to Thrive
• Connect with Klaus
• Course: Understanding Prompt Engineering
• Related Episode: Becoming Remarkable with Guy Kawasaki, Author and Chief Evangelist at Canva
• Rewatch RADAR AI

New to DataCamp?
• Learn on the go using the DataCamp mobile app
• Empower your business with world-class data and AI skills with DataCamp for business