talk-data.com talk-data.com

Event

DataTalks.Club

2020-11-21 – 2025-11-28 Podcasts Visit website ↗

Activities tracked

31

DataTalks.Club - the place to talk about data!

Filtering by: LLM ×

Sessions & talks

Showing 1–25 of 31 · Newest first

Search within this event →

From Black-Box Systems to Augmented Decision-Making - Anusha Akkina

2025-11-28 Listen
podcast_episode
Alexey , Anusha Akkina (Auralytix)

In this talk, Anusha Akkina, co-founder of Auralytix, shares her journey from working as a Chartered Accountant and Auditor at Deloitte to building an AI-powered finance intelligence platform designed to augment, not replace, human decision-making. Together with host Alexey from DataTalks.Club, she explores how AI is transforming finance operations beyond spreadsheets—from tackling ERP limitations to creating real-time insights that drive strategic business outcomes.

TIMECODES: 00:00 Building trust in AI finance and introducing Auralytix 02:22 From accounting roots to auditing at Deloitte and Paraxel 08:20 Moving to Germany and pivoting into corporate finance 11:50 The data struggle in strategic finance and the need for change 13:23 How Auralytix was born: bridging AI and financial compliance 17:15 Why ERP systems fail finance teams and how spreadsheets fill the gap 24:31 The real cost of ERP rigidity and lessons from failed transformations 29:10 The hidden risks of spreadsheet dependency and knowledge loss 37:30 Experimenting with ChatGPT and coding the first AI finance prototype 43:34 Identifying finance’s biggest pain points through user research 47:24 Empowering finance teams with AI-driven, real-time decision insights 50:59 Developing an entrepreneurial mindset through strategy and learning 54:31 Essential resources and finding the right AI co-founder

Connect with Anusha - Linkedin - https://www.linkedin.com/in/anusha-akkina-acma-cgma-56154547/ - Website - https://aurelytix.com/

Connect with DataTalks.Club: - Join the community - https://datatalks.club/slack.html - Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ - Check other upcoming events - https://lu.ma/dtc-events - GitHub: https://github.com/DataTalksClub - LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Qdrant 2025 Conference Interviews

2025-11-28 Listen
podcast_episode

At Qdrant Conference, builders, researchers, and industry practitioners shared how vector search, retrieval infrastructure, and LLM-driven workflows are evolving across developer tooling, AI platforms, analytics teams, and modern search research.

Andrey Vasnetsov (Qdrant) explained how Qdrant was born from the need to combine database-style querying with vector similarity search—something he first built during the COVID lockdowns. He highlighted how vector search has shifted from an ML specialty to a standard developer tool and why hosting an in-person conference matters for gathering honest, real-time feedback from the growing community.

Slava Dubrov (HubSpot) described how his team uses Qdrant to power AI Signals, a platform for embeddings, similarity search, and contextual recommendations that support HubSpot’s AI agents. He shared practical use cases like look-alike company search, reflected on evaluating agentic frameworks, and offered career advice for engineers moving toward technical leadership.

Marina Ariamnova (SumUp) presented her internally built LLM analytics assistant that turns natural-language questions into SQL, executes queries, and returns clean summaries—cutting request times from days to minutes. She discussed balancing analytics and engineering work, learning through real projects, and how LLM tools help analysts scale routine workflows without replacing human expertise.

Evgeniya (Jenny) Sukhodolskaya (Qdrant) discussed the multi-disciplinary nature of DevRel and her focus on retrieval research. She shared her work on sparse neural retrieval, relevance feedback, and hybrid search models that blend lexical precision with semantic understanding—contributing methods like Mini-COIL and shaping Qdrant’s search quality roadmap through end-to-end experimentation and community education.

Speakers

Andrey Vasnetsov Co-founder & CTO of Qdrant, leading the engineering and platform vision behind a developer-focused vector database and vector-native infrastructure. Connect: https://www.linkedin.com/in/andrey-vasnetsov-75268897/

Slava Dubrov Technical Lead at HubSpot working on AI Signals—embedding models, similarity search, and context systems for AI agents. Connect: https://www.linkedin.com/in/slavadubrov/

Marina Ariamnova Data Lead at SumUp, managing analytics and financial data workflows while prototyping LLM tools that automate routine analysis. Connect: https://www.linkedin.com/in/marina-ariamnova/

Evgeniya (Jenny) Sukhodolskaya Developer Relations Engineer at Qdrant specializing in retrieval research, sparse neural methods, and educational ML content. Connect: https://www.linkedin.com/in/evgeniya-sukhodolskaya/

How to Build and Evaluate AI systems in the Age of LLMs - Hugo Bowne-Anderson

2025-10-24 Listen
podcast_episode
Hugo Bowne-Anderson (DataCamp)

In this talk, Hugo Bowne-Anderson, an independent data and AI consultant, educator, and host of the podcasts Vanishing Gradients and High Signal, shares his journey from academic research and curriculum design at DataCamp to advising teams at Netflix, Meta, and the US Air Force. Together, we explore how to build reliable, production-ready AI systems—from prompt evaluation and dataset design to embedding agents into everyday workflows.

You’ll learn about: How to structure teams and incentives for successful AI adoptionPractical prompting techniques for accurate timestamp and data generationBuilding and maintaining evaluation sets to avoid “prompt overfitting”- Cost-effective methods for LLM evaluation and monitoringTools and frameworks for debugging and observing AI behavior (Logfire, Braintrust, Phoenix Arise)The evolution of AI agents—from simple RAG systems to proactive, embedded assistantsHow to escape “proof of concept purgatory” and prioritize AI projects that drive business valueStep-by-step guidance for building reliable, evaluable AI agents This session is ideal for AI engineers, data scientists, ML product managers, and startup founders looking to move beyond experimentation into robust, scalable AI systems. Whether you’re optimizing RAG pipelines, evaluating prompts, or embedding AI into products, this talk offers actionable frameworks to guide you from concept to production.

LINKS Escaping POC Purgatory: Evaluation-Driven Development for AI Systems - https://www.oreilly.com/radar/escaping-poc-purgatory-evaluation-driven-development-for-ai-systems/Stop Building AI Agents - https://www.decodingai.com/p/stop-building-ai-agentsHow to Evaluate LLM Apps Before You Launch - https://www.youtube.com/watch?si=90fXJJQThSwGCaYv&v=TTr7zPLoTJI&feature=youtu.beMy Vanishing Gradients Substack - https://hugobowne.substack.com/Building LLM Applications for Data Scientists and Software Engineers https://maven.com/hugo-stefan/building-ai-apps-ds-and-swe-from-first-principles?promoCode=datatalksclub TIMECODES: 00:00 Introduction and Expertise 04:04 Transition to Freelance Consulting and Advising 08:49 Restructuring Teams and Incentivizing AI Adoption 12:22 Improving Prompting for Timestamp Generation 17:38 Evaluation Sets and Failure Analysis for Reliable Software 23:00 Evaluating Prompts: The Cost and Size of Gold Test Sets 27:38 Software Tools for Evaluation and Monitoring 33:14 Evolution of AI Tools: Proactivity and Embedded Agents 40:12 The Future of AI is Not Just Chat 44:38 Avoiding Proof of Concept Purgatory: Prioritizing RAG for Business Value 50:19 RAG vs. Agents: Complexity and Power Trade-Offs 56:21 Recommended Steps for Building Agents 59:57 Defining Memory in Multi-Turn Conversations

Connect with Hugo Twitter - https://x.com/hugobowneLinkedin - https://www.linkedin.com/in/hugo-bowne-anderson-045939a5/Github - https://github.com/hugobowneWebsite - https://hugobowne.github.io/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQCheck other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClub- LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

From Biotechnology to Bioinformatics Software - Sebastian Ayala Ruano

2025-10-24 Listen
podcast_episode

In this talk, Sebastian, a bioinformatics researcher and software engineer, shares his inspiring journey from wet lab biotechnology to computational bioinformatics. Hosted by Data Talks Club, this session explores how data science, AI, and open-source tools are transforming modern biological research — from DNA sequencing to metagenomics and protein structure prediction.

You’ll learn about: - The difference between wet lab and dry lab workflows in biotechnology - How bioinformatics enables faster insights through data-driven modeling - The MCW2 Graph Project and its role in studying wastewater microbiomes - Using co-abundance networks and the CC Lasso algorithm to map microbial interactions - How AlphaFold revolutionized protein structure prediction - Building scientific knowledge graphs to integrate biological metadata - Open-source tools like VueGen and VueCore for automating reports and visualizations - The growing impact of AI and large language models (LLMs) in research and documentation - Key differences between R (BioConductor) and Python ecosystems for bioinformatics

This talk is ideal for data scientists, bioinformaticians, biotech researchers, and AI enthusiasts who want to understand how data science, AI, and biology intersect. Whether you work in genomics, computational biology, or scientific software, you’ll gain insights into real-world tools and workflows shaping the future of bioinformatics.

Links: - MicW2Graph: https://zenodo.org/records/12507444 - VueGen: https://github.com/Multiomics-Analytics-Group/vuegen - Awesome-Bioinformatics: https://github.com/danielecook/Awesome-Bioinformatics

TIMECODES00:00 Sebastian’s Journey into Bioinformatics06:02 From Wet Lab to Computational Biology08:23 Wet Lab vs Dry Lab Explained12:35 Bioinformatics as Data Science for Biology15:30 How DNA Sequencing Works19:29 MCW2 Graph and Wastewater Microbiomes23:10 Building Microbial Networks with CC Lasso26:54 Protein–Ligand Simulation Basics29:58 Predicting Protein Folding in 3D33:30 AlphaFold Revolution in Protein Prediction36:45 Inside the MCW2 Knowledge Graph39:54 VueGen: Automating Scientific Reports43:56 VueCore: Visualizing OMIX Data47:50 Using AI and LLMs in Bioinformatics50:25 R vs Python in Bioinformatics Tools53:17 Closing Thoughts from Ecuador Connect with Sebastian Twitter - https://twitter.com/sayalaruanoLinkedin - https://linkedin.com/in/sayalaruano Github - https://github.com/sayalaruanoWebsite - https://sayalaruano.github.io/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQCheck other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClubLinkedIn - https://www.linkedin.com/company/datatalks-club/Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Lessons from Applied AI: Tesla, Waymo, and Beyond - Aishwarya Jadhav

2025-10-10 Listen
podcast_episode

In this episode, we talked with Aishwarya Jadhav, a machine learning engineer whose career has spanned Morgan Stanley, Tesla, and now Waymo. Aishwarya shares her journey from big data in finance to applied AI in self-driving, gesture understanding, and computer vision. She discusses building an AI guide dog for the visually impaired, contributing to malaria mapping in Africa, and the challenges of deploying safe autonomous systems. We also explore the intersection of computer vision, NLP, and LLMs, and what it takes to break into the self-driving AI industry.TIMECODES00:51 Aishwarya’s career journey from finance to self-driving AI05:45 Building AI guide dog for the visually impaired12:03 Exploring LiDAR, radar, and Tesla’s camera-based approach16:24 Trust, regulation, and challenges in self-driving adoption19:39 Waymo, ride-hailing, and gesture recognition for traffic control24:18 Malaria mapping in Africa and AI for social good29:40 Deployment, safety, and testing in self-driving systems37:00 Transition from NLP to computer vision and deep learning43:37 Reinforcement learning, robotics, and self-driving constraints51:28 Testing processes, evaluations, and staged rollouts for autonomous driving52:53 Can multimodal LLMs be applied to self-driving?55:33 How to get started in self-driving AI careersConnect with Aishwarya- Linkedin - https://www.linkedin.com/in/aishwaryajadhav8/Connect with DataTalks.Club:- Join the community - https://datatalks.club/slack.html- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ- Check other upcoming events - https://lu.ma/dtc-events- GitHub: https://github.com/DataTalksClub- LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Building reliable AI products in the era of Gen AI and Agents - Ranjitha Kulkarni

2025-10-10 Listen
podcast_episode
Ranjitha Kulkarni (NeuBird AI (past: Microsoft, Dropbox))

In this episode, we talked with Ranjitha Kulkarni, a machine learning engineer with a rich career spanning Microsoft, Dropbox, and now NeuBird AI. Ranjitha shares her journey into ML and NLP, her work building recommendation systems, early AI agents, and cutting-edge LLM-powered products. She offers insights into designing reliable AI systems in the new era of generative AI and agents, and how context engineering and dynamic planning shape the future of AI products.TIMECODES00:00 Career journey and early curiosity04:25 Speech recognition at Microsoft05:52 Recommendation systems and early agents at Dropbox07:44 Joining NewBird AI12:01 Defining agents and LLM orchestration16:11 Agent planning strategies18:23 Agent implementation approaches22:50 Context engineering essentials30:27 RAG evolution in agent systems37:39 RAG vs agent use cases40:30 Dynamic planning in AI assistants43:00 AI productivity tools at Dropbox46:00 Evaluating AI agents53:20 Reliable tool usage challenges58:17 Future of agents in engineering Connect with Ranjitha- Linkedin - https://www.linkedin.com/in/ranjitha-gurunath-kulkarniConnect with DataTalks.Club:- Join the community - https://datatalks.club/slack.html- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ- Check other upcoming events - https://lu.ma/dtc-events- GitHub: https://github.com/DataTalksClub- LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Lessons from Two Decades of AI - Micheal Lanham

2025-09-26 Listen
podcast_episode

In this episode, we talk with Michael Lanham, an AI and software innovator with over two decades of experience spanning game development, fintech, oil and gas, and agricultural tech. Michael shares his journey from building neural network-based games and evolutionary algorithms to writing influential books on AI agents and deep learning. He offers insights into the evolving AI landscape, practical uses of AI agents, and the future of generative AI in gaming and beyond.

TIMECODES 00:00 Micheal Lanham’s career journey and AI agent books 05:45 Publishing journey: AR, Pokémon Go, sound design, and reinforcement learning 10:00 Evolution of AI: evolutionary algorithms, deep learning, and agents 13:33 Evolutionary algorithms in prompt engineering and LLMs 18:13 AI agent books second edition and practical applications 20:57 AI agent workflows: minimalism, task breakdown, and collaboration 26:25 Collaboration and orchestration among AI agents 31:24 Tools and reasoning servers for agent communication 35:17 AI agents in game development and generative AI impact 38:57 Future of generative AI in gaming and immersive content 41:42 Coding agents, new LLMs, and local deployment 45:40 AI model trends and data scientist career advice 53:36 Cognitive testing, evaluation, and monitoring in AI 58:50 Publishing details and closing remarks

Connect with Micheal Linkedin - https://www.linkedin.com/in/micheal-lanham-189693123/ Connect with DataTalks.Club: Join the community - https://datatalks.club/slack.htmlSubscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...Check other upcoming events - https://lu.ma/dtc-eventsGitHub: https://github.com/DataTalksClubLinkedIn -   / datatalks-club   Twitter -   / datatalksclub   Website - https://datatalks.club/

Berlin PyData 2025 Conference Interviews

2025-09-26 Listen
podcast_episode
Yashasvi Misra (Pure Storage) , Igor Kvachenok (Leuphana University of Lüneburg) , Selim Nowicki (Distill Labs) , Mehdi Ouazza , Gülsah Durmaz

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

Berlin Buzzwords 2025 Conference Interviews

2025-09-12 Listen
podcast_episode
Kacper Łukawski (Qdrant) , Filip Makraduli (Superlinked) , Atita Arora , Brian Goldin (Voyager Search) , André Charton (Kleinanzeigen) , Manish Gill (ClickHouse)

At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.

  • Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
  • Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
  • André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
  • Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
  • Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
  • Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.

Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.

Kacper Łukawski Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps. Connect: https://www.linkedin.com/in/kacperlukawski/

Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and stateful sets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/

André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/

Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
Connect: https://www.linkedin.com/in/filipmakraduli/

Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/

Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both learning hub and professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/

Build a Strong Career in Data - Lavanya Gupta

2025-05-09 Listen
podcast_episode
Lavanya Gupta (JPMorgan Chase)

In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data. About the Speaker: Lavanya is a Carnegie Mellon University (CMU) alumni of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published in EMNLP 2024.

In addition to having a strong industrial research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, like Anita BOrg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.

In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.

🕒 TIMECODES 00:00 Lavanya’s journey from software engineer to AI researcher 10:15 Benchmarking long context language models 12:36 Limitations of large context models in real domains 14:54 Handling large documents and publishing research in industry 19:45 Building a data science career: publications, motivation, and mentorship 25:01 Self-learning, hackathons, and networking 33:24 Community work and Kaggle projects 37:32 Mentorship and open-ended guidance 51:28 Building a strong data science portfolio 🔗 CONNECT WITH LAVANYALinkedIn -   / lgupta18  🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn -   / datatalks-club   Twitter -   / datatalksclub   Website - https://datatalks.club/

Data Intensive AI - Bartosz Mikulski

2025-03-21 Listen
podcast_episode

In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.

About the Speaker: Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building a testing infrastructure and fixing the issues detected by tests. On top of that, he teaches programmers and non-programmers how to use AI. He contributed one chapter to the book 97 Things Every Data Engineer Should Know, and he was a speaker at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days. 

In this episode, we discuss Bartosz’s career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working in data-intensive AI projects. Whether you’re a data engineer, AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.

0:00 Introduction to Bartosz and his background 4:00 Bartosz’s career journey from Java development to AI engineering 9:05 The importance of testing in data engineering 11:19 How to create tests for data pipelines 13:14 Tools and approaches for testing data pipelines 17:10 Choosing Spark for data engineering projects 19:05 The connection between data engineering and AI tools 21:39 Use cases of AI in data engineering and MLOps 25:13 Prompt engineering techniques and best practices 31:45 Prompt compression and caching in AI models 33:35 Thoughts on DeepSeek and open-source AI models 35:54 Using AI for lead classification and LinkedIn automation 41:04 Building Chrome extensions with AI integration 43:51 Comparing Cursor and GitHub Copilot for coding 47:11 Using ChatGPT and Perplexity for AI-assisted tasks 52:09 Hosting static websites and using AI for development 54:27 How blogging helps attract clients and share knowledge 58:15 Using AI to assist with writing and content creation

🔗 CONNECT WITH Bartosz LinkedIn: https://www.linkedin.com/in/mikulskibartosz/ Github: https://github.com/mikulskibartosz Website: https://mikulskibartosz.name/blog/

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/

MLOps in Corporations and Startups - Nemanja Radojkovic

2025-03-14 Listen
podcast_episode

In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.

About the Speaker: Nemanja Radojkovic is Senior Machine Learning Engineer at Euroclear.

In this event,we’re diving into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget, the trade-offs between corporate stability and startup agility, and practical advice for engineers deciding between these two career paths. Whether you’re navigating legacy frameworks or experimenting with cutting-edge tools.

1:00 MLOps in corporations versus startups 6:03 The agility and pace of startups 7:54 MLOps on a shoestring budget 12:54 Cloud solutions for startups 15:06 Challenges of cloud complexity versus on-premise 19:19 Selecting tools and avoiding vendor lock-in 22:22 Choosing between a startup and a corporation 27:30 Flexibility and risks in startups 29:37 Bureaucracy and processes in corporations 33:17 The role of frameworks in corporations 34:32 Advantages of large teams in corporations 40:01 Challenges of technical debt in startups 43:12 Career advice for junior data scientists 44:10 Tools and frameworks for MLOps projects 49:00 Balancing new and old technologies in skill development 55:43 Data engineering challenges and reliability in LLMs 57:09 On-premise vs. cloud solutions in data-sensitive industries 59:29 Alternatives like Dask for distributed systems

🔗 CONNECT WITH NEMANJA LinkedIn -   / radojkovic   Github - https://github.com/baskervilski

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events  LinkedIn -   / datatalks-club    Twitter -   / datatalksclub    Website - https://datatalks.club/ 

Redefining AI Infrastructure: Open-Source, Chips, and the Future Beyond Kubernetes – Andrey Cheptsov

2025-01-31 Listen
podcast_episode
Andrey Cheptsov (dstack)

In this podcast episode, we talked with Andrey Cheptsov about ​The future of AI infrastructure.

About the Speaker: Andrey Cheptsov is the founder and CEO of dstack, an open-source alternative to Kubernetes and Slurm, built to simplify the orchestration of AI infrastructure. Before dstack, Andrey worked at JetBrains for over a decade helping different teams make the best developer tools. During the event, the guest, Andrey Cheptsov, founder and CEO of dstack, discussed the complexities of AI infrastructure. We explore topics like the challenges of using Kubernetes for AI workloads, the need to rethink container orchestration, and the future of hybrid and cloud-only infrastructures. Andrey also shares insights into the role of on-premise and bare-metal solutions, edge computing, and federated learning. 00:00 Andrey's Career Journey: From JetBrains to DStack 5:00 The Motivation Behind DStack 7:00 Challenges in Machine Learning Infrastructure 10:00 Transitioning from Cloud to On-Prem Solutions 14:30 Reflections on OpenAI's Evolution 17:30 Open Source vs Proprietary Models: A Balanced Perspective 21:01 Monolithic vs. Decentralized AI businesses 22:05 The role of privacy and control in AI for industries like banking and healthcare 30:00 Challenges in training large AI models: GPUs and distributed systems 37:03 DeepSpeed's efficient training approach vs. brute force methods 39:00 Challenges for small and medium businesses: hosting and fine-tuning models 47:01 Managing Kubernetes challenges for AI teams 52:00 Hybrid vs. cloud-only infrastructure 56:03 On-premise vs. bare-metal solutions 58:05 Exploring edge computing and its challenges

🔗 CONNECT WITH ANDREY CHEPTSOV Twitter -  / andrey_cheptsov   Linkedin -  / andrey-cheptsov   GitHub - https://github.com/dstackai/dstack/ Website - https://dstack.ai/

🔗 CONNECT WITH DataTalksClub Join DataTalks.Club:⁠⁠⁠https://datatalks.club/slack.html⁠⁠⁠ Our events:⁠⁠⁠https://datatalks.club/events.html⁠⁠⁠ Datalike Substack -⁠⁠⁠https://datalike.substack.com/⁠⁠⁠ LinkedIn:⁠⁠⁠  / datatalks-club  ⁠

AI in Industry: Trust, Return on Investment and Future - Maria Sukhareva

2024-12-06 Listen
podcast_episode
Maria Sukhareva (Siemens)

Reflection on an Almost Two-Year Journey of Generative AI in Industry – Maria Sukhareva

​About the speaker:

​Maria Sukhareva is a principal key expert in Artificial Intelligence in Siemens with over 15 years of experience at the forefront of generative AI technologies. Known for her keen eye for technological innovation, Maria excels at transforming cutting-edge AI research into practical, value-driven tools that address real-world needs. Her approach is both hands-on and results-focused, with a commitment to creating scalable, long-term solutions that improve communication, streamline complex processes, and empower smarter decision-making. Maria's work reflects a balanced vision, where the power of innovation is met with ethical responsibility, ensuring that her AI projects deliver impactful and production-ready outcomes.

We talked about:

00:00 DataTalks.Club intro

02:13 Career journey: From linguistics to AI

08:02 The Evolution of AI Expertise and its Future

13:10 AI vulnerabilities: Bypassing bot restrictions

17:00 Non-LLM classifiers as a more robust solution

22:56 Risks of chatbot deployment: Reputational and financial

27:13 The role of AI as a tool, not a replacement for human workers

31:41 The role of human translators in the age of AI

34:49 Evolution of English and its Germanic roots

38:44 Beowulf and Old English

39:43 Impact of the Norman occupation on English grammar

42:34 Identifying mushrooms with AI apps and safety precautions

45:08 Decoding ancient languages ​​like Sumerian

49:43 The evolution of machine translation and multilingual models

53:01 Challenges with low-resource languages ​​and inconsistent orthography

57:28 Transition from academia to industry in AI

Join our Slack: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

DataTalks.Club 4th Anniversary AMA Podcast – Alexey Grigorev and Johanna Bayer

2024-10-26 Listen
podcast_episode

We talked about:

00:00 DataTalks.Club intro

00:00 DataTalks.Club anniversary "Ask Me Anything" event with Alexey Grigorev

02:29 The founding of DataTalks .Club

03:52 Alexey's transition from Java work to DataTalks.Club

04:58 Growth and success of DataTalks.Club courses

12:04 Motivation behind creating a free-to-learn community

24:03 Staying updated in data science through pet projects

26 :37 Hosting a second podcast and maintaining programming skills

28:56 Skepticism about LLMs and their relevance

31:53 Transitioning to DataTalks.Club and personal reflections

33:32 Memorable moments and the first event's success

36:19 Community building during the pandemic

38:31 AI's impact on data analysts and future roles

42:24 Discussion on AI in healthcare

44:37 Age and reflections on personal milestones

47:54 Building communities and personal connections

49:34 Future goals for the community and courses

51:18 Community involvement and engagement strategies

53:46 Ideas for competitions and hackathons

54:20 Inviting guests to the podcast

55:29 Course updates and future workshops

56:27 Podcast preparation and research process

58:30 Career opportunities in data science and transitioning fields

1:01 :10 Book recommendations and personal reading experiences

About the speaker:

Alexey Grigorev is the founder of DataTalks.Club.

Join our slack: https://datatalks.club/slack.html

Working as a Core Developer in the Scikit-Learn Universe - Guillaume Lemaître

2024-07-26 Listen
podcast_episode

In this podcast episode, we talked with Guillaume Lemaître about navigating scikit-learn and imbalanced-learn.

🔗 CONNECT WITH Guillaume Lemaître LinkedIn - https://www.linkedin.com/in/guillaume-lemaitre-b9404939/ Twitter - https://x.com/glemaitre58 Github - https://github.com/glemaitre Website - https://glemaitre.github.io/

🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks-club.slack.com/join/shared_invite/zt-2hu0sjeic-ESN7uHt~aVWc8tD3PefSlA#/shared-invite/email Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/u/0/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/

🔗 CONNECT WITH ALEXEY Twitter - https://twitter.com/Al_Grigor Linkedin - https://www.linkedin.com/in/agrigorev/

🎙 ABOUT THE PODCAST At DataTalksClub, we organize live podcasts that feature a diverse range of guests from the data field. Each podcast is a free-form conversation guided by a prepared set of questions, designed to learn about the guests’ career trajectories, life experiences, and practical advice. These insightful discussions draw on the expertise of data practitioners from various backgrounds.

We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links.

You can access all the podcast episodes here - https://datatalks.club/podcast.html

📚Check our free online courses ML Engineering course - http://mlzoomcamp.com Data Engineering course - https://github.com/DataTalksClub/data-engineering-zoomcamp MLOps course - https://github.com/DataTalksClub/mlops-zoomcamp Analytics in Stock Markets - https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp LLM course - https://github.com/DataTalksClub/llm-zoomcamp Read about all our courses in one place - https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html

👋🏼 GET IN TOUCH If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev

If you're a company and want to support us, contact at [email protected]

Building a Domestic Risk Assessment Tool - Sabina Firtala

2024-07-13 Listen
podcast_episode
Ba Linh Le (Frontline100) , Sabina Firtala

Links:

LinkedIn:https://www.linkedin.com/company/frontline100/ Ba Linh Le's LinkedIn: https://www.linkedin.com/in/ba-linh-le-/ Sabrina's LinkedIn: https://www.linkedin.com/in/sabina-firtala/ Twitter: https://x.com/frontline_100?mx=2 Website: https://www.frontline100.com/

Free LLM course: https://github.com/DataTalksClub/llm-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Berlin Buzzwords 2024

2024-07-06 Listen
podcast_episode

We stream the podcasts on YouTube, where each session is also recorded and published on our channel, complete with timestamps, a transcript, and important links.

You can access all the podcast episodes here - https://datatalks.club/podcast.html

📚Check our free online courses ML Engineering course - http://mlzoomcamp.com Data Engineering course - https://github.com/DataTalksClub/data-engineering-zoomcamp MLOps course - https://github.com/DataTalksClub/mlops-zoomcamp Analytics in Stock Markets - https://github.com/DataTalksClub/stock-markets-analytics-zoomcamp LLM course - https://github.com/DataTalksClub/llm-zoomcamp Read about all our courses in one place - https://datatalks.club/blog/guide-to-free-online-courses-at-datatalks-club.html

👋🏼 GET IN TOUCH If you want to support our community, use this link - https://github.com/sponsors/alexeygrigorev

If you’re a company, support us at [email protected]

Knowledge Graphs and LLMs Across Academia and Industry - Anahita Pakiman

2024-04-05 Listen
podcast_episode

We talked about:

Anahita's Background Mechanical Engineering and Applied Mechanics Finite Element Analysis vs. Machine Learning Optimization and Semantic Reporting Application of Knowledge Graphs in Research Graphs vs Tabular Data Computational graphs Graph Data Science and Graph Machine Learning Combining Knowledge Graphs and Large Language Models (LLMs) Practical Applications and Projects Challenges and Learnings Anahita’s Recommendations

Links:

GitHub repo: https://github.com/antahiap/ADPT-LRN-PHYS/tree/main

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Building Machine Learning Products - Reem Mahmoud

2024-03-16 Listen
podcast_episode

We talked about:

Reem’s background Context-aware sensing and transfer learning Shifting focus from PhD to industry Reem’s experience with startups and dealing with prejudices towards PhDs AI interviewing solution How candidates react to getting interviewed by an AI avatar End-to-end overview of a machine learning project The pitfalls of using LLMs in your process Mitigating biases Addressing specific requirements for specific roles Reem’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/reemmahmoud/recent-activity/all/ Website: https://topmate.io/reem_mahmoud

Free Data Engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Navigating Challenges and Innovations in Search Technologies - Atita Arora

2023-12-27 Listen
podcast_episode

We talked about:

Atita’s background How NLP relates to search Atita’s experience with Lucidworks and OpenSource Connections Atita’s experience with Qdrant and vector databases Utilizing vector search Major changes to search Atita has noticed throughout her career RAG (Retrieval-Augmented Generation) Building a chatbot out of transcripts with LLMs Ingesting the data and evaluating the results Keeping humans in the loop Application of vector databases for machine learning Collaborative filtering Atita’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/atitaarora/
Twitter: https://x.com/atitaarora Github: https://github.com/atarora Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning Relevant Search: https://www.manning.com/books/relevant-search Let's learn about Vectors: https://hub.superlinked.com/ Langchain: https://python.langchain.com/docs/get_started/introduction Qdrant blog: https://blog.qdrant.tech/ OpenSource Connections Blog: https://opensourceconnections.com/blog/

Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Pragmatic and Standardized MLOps - Maria Vechtomova

2023-09-08 Listen
podcast_episode
Maria Vechtomova (Marvelous MLOps)

We talked about:

Maria's background Marvelous MLOps Maria's definition of MLOps Alternate team setups without a central MLOps team Pragmatic vs non-pragmatic MLOps Must-have ML tools (categories) Maturity assessment What to start with in MLOps Standardized MLOps Convincing DevOps to implement Understanding what the tools are used for instead of knowing all the tools Maria's next project plans Is LLM Ops a thing? What Ahold Delhaize does Resource recommendations to learn more about MLOps The importance of data engineering knowledge for ML engineers

Links:

LinkedIn: https://www.linkedin.com/company/marvelous-mlops/

Website: https://marvelousmlops.substack.com/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Democratizing Causality - Aleksander Molak

2023-08-25 Listen
podcast_episode

We talked about:

Aleksander's background Aleksander as a Causal Ambassador Using causality to make decisions Counterfactuals and and Judea Pearl Meta-learners vs classical ML models Average treatment effect Reducing causal bias, the super efficient estimator, and model uplifting Metrics for evaluating a causal model vs a traditional ML model Is the added complexity of a causal model worth implementing? Utilizing LLMs in causal models (text as outcome) Text as treatment and style extraction The viability of A/B tests in causal models Graphical structures and nonparametric identification Aleksander's resource recommendations

Links:

The Book of Why: https://amzn.to/3OZpvBk Causal Inference and Discovery in Python: https://amzn.to/46Pperr Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

The Good, the Bad and the Ugly of GPT - Sandra Kublik

2023-08-04 Listen
podcast_episode

We talked about:

Sandra's background Making a YouTube channel to break into the LLM space The business cases for LLMs LLMs as amplifiers The befits of keeping a human in the loop when using LLMs (AI limitations) Using LLMs as assistants Building an app that uses an LLM Prompt whisperers and how to improve your prompts Sandra's 7-day LLM experiment Sandra's LLM content recommendations Finding Sandra online

Links:

LinkedIn: https://www.linkedin.com/in/sandrakublik/ Twitter: https://twitter.com/sandra_kublik Youtube: https://www.youtube.com/@sandra_kublik

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

LLMs for Everyone - Meryem Arik

2023-07-28 Listen
podcast_episode
Meryem Arik (TitanML)

We talked about:

Meryam's background The constant evolution of startups How Meryam became interested in LLMs What is an LLM (generative vs non-generative models)? Why LLMs are important Open source models vs API models What TitanML does How fine-tuning a model helps in LLM use cases Fine-tuning generative models How generative models change the landscape of human work How to adjust models over time Vector databases and LLMs How to choose an open source LLM or an API Measuring input data quality Meryam's resource recommendations

Links:

Website: https://www.titanml.co/ Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml... Using llama2.0 in TitanML Blog: https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57 Discord: https://discord.gg/83RmHTjZgf Meryem LinkedIn: https://www.linkedin.com/in/meryemarik/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html