In this episode, we talked with Aishwarya Jadhav, a machine learning engineer whose career has spanned Morgan Stanley, Tesla, and now Waymo. Aishwarya shares her journey from big data in finance to applied AI in self-driving, gesture understanding, and computer vision. She discusses building an AI guide dog for the visually impaired, contributing to malaria mapping in Africa, and the challenges of deploying safe autonomous systems. We also explore the intersection of computer vision, NLP, and LLMs, and what it takes to break into the self-driving AI industry.
TIMECODES
00:51 Aishwarya’s career journey from finance to self-driving AI
05:45 Building an AI guide dog for the visually impaired
12:03 Exploring LiDAR, radar, and Tesla’s camera-based approach
16:24 Trust, regulation, and challenges in self-driving adoption
19:39 Waymo, ride-hailing, and gesture recognition for traffic control
24:18 Malaria mapping in Africa and AI for social good
29:40 Deployment, safety, and testing in self-driving systems
37:00 Transition from NLP to computer vision and deep learning
43:37 Reinforcement learning, robotics, and self-driving constraints
51:28 Testing processes, evaluations, and staged rollouts for autonomous driving
52:53 Can multimodal LLMs be applied to self-driving?
55:33 How to get started in self-driving AI careers
Connect with Aishwarya:
- LinkedIn - https://www.linkedin.com/in/aishwaryajadhav8/
Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub - https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/
DataTalks.Club - the place to talk about data!
In this episode, we talked with Ranjitha Kulkarni, a machine learning engineer with a rich career spanning Microsoft, Dropbox, and now NeuBird AI. Ranjitha shares her journey into ML and NLP, her work building recommendation systems, early AI agents, and cutting-edge LLM-powered products. She offers insights into designing reliable AI systems in the new era of generative AI and agents, and how context engineering and dynamic planning shape the future of AI products.
TIMECODES
00:00 Career journey and early curiosity
04:25 Speech recognition at Microsoft
05:52 Recommendation systems and early agents at Dropbox
07:44 Joining NeuBird AI
12:01 Defining agents and LLM orchestration
16:11 Agent planning strategies
18:23 Agent implementation approaches
22:50 Context engineering essentials
30:27 RAG evolution in agent systems
37:39 RAG vs agent use cases
40:30 Dynamic planning in AI assistants
43:00 AI productivity tools at Dropbox
46:00 Evaluating AI agents
53:20 Reliable tool usage challenges
58:17 Future of agents in engineering
Connect with Ranjitha:
- LinkedIn - https://www.linkedin.com/in/ranjitha-gurunath-kulkarni
Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub - https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/
Berlin Buzzwords 2025 Conference Interviews
At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.
- Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
- Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
- André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
- Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
- Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
- Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.
Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.
Kacper Łukawski
Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps.
Connect: https://www.linkedin.com/in/kacperlukawski/
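Hybrid search has to merge a keyword (sparse) result list with a semantic (dense) one. Reciprocal rank fusion (RRF) is one widely used way to do that. The sketch below assumes each search has already returned a ranked list of document IDs. The IDs and the constant `k=60` are illustrative, and this is plain Python showing the fusion idea, not Qdrant's client API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a keyword (sparse) and a semantic (dense) search.
keyword_hits = ["doc3", "doc1", "doc7"]
semantic_hits = ["doc1", "doc5", "doc3"]

print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['doc1', 'doc3', 'doc5', 'doc7']
```

RRF uses only ranks, not raw scores. This means BM25-style keyword scores and cosine similarities never need to be put on a common scale. Documents found by both searches, such as `doc1` and `doc3` here, rise to the top.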
Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and stateful sets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/
André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/
Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
Connect: https://www.linkedin.com/in/filipmakraduli/
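The composite-vector idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Superlinked's API or its actual method. Each field (here, hypothetical text and image embeddings) is assumed to come from its own encoder, and each field vector is normalized. The query side carries per-field weights, so a single dot product over the concatenated vectors scores all fields at once.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so each field contributes comparably."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def composite(field_vectors, weights=None):
    """Concatenate per-field embeddings into one vector.

    If weights are given (query side), each normalized field vector is scaled
    by its weight, so a dot product with an unweighted item composite equals
    the weighted sum of per-field cosine similarities.
    """
    out = []
    for name in sorted(field_vectors):  # same field order on both sides
        w = 1.0 if weights is None else weights.get(name, 0.0)
        out.extend(w * x for x in normalize(field_vectors[name]))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical toy embeddings standing in for separate text and image encoders.
items = {
    "red_shoe": {"text": [1.0, 0.0], "image": [0.9, 0.1]},
    "blue_bag": {"text": [0.0, 1.0], "image": [0.1, 0.9]},
}
query = {"text": [1.0, 0.1], "image": [1.0, 0.0]}
weights = {"text": 1.0, "image": 0.5}

q_vec = composite(query, weights)
ranked = sorted(items, key=lambda name: dot(q_vec, composite(items[name])), reverse=True)
print(ranked)  # ['red_shoe', 'blue_bag']
```

Because the weights live only on the query side, the same indexed item vectors can serve queries that emphasize different fields.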
Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/
Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both learning hub and professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/
Build a Strong Career in Data - Lavanya Gupta
In this podcast episode, we talked with Lavanya Gupta about building a strong career in data. About the Speaker: Lavanya is a Carnegie Mellon University (CMU) alumna of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in its specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published at EMNLP 2024.
In addition to a strong industrial research background of 5+ years, she is an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier ML and NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, such as Anita Borg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.
In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.
🕒 TIMECODES
00:00 Lavanya’s journey from software engineer to AI researcher
10:15 Benchmarking long-context language models
12:36 Limitations of large context models in real domains
14:54 Handling large documents and publishing research in industry
19:45 Building a data science career: publications, motivation, and mentorship
25:01 Self-learning, hackathons, and networking
33:24 Community work and Kaggle projects
37:32 Mentorship and open-ended guidance
51:28 Building a strong data science portfolio
🔗 CONNECT WITH LAVANYA
LinkedIn - / lgupta18
🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/
Linguistics and Fairness - Tamara Atanasoska
In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.
About the Speaker: Tamara works on ML explainability, interpretability, and fairness as an Open Source Software Engineer at Probabl. She is a maintainer of Fairlearn and a contributor to scikit-learn and skops. Tamara has a background in both computer science/software engineering and computational linguistics (NLP).
During the event, the guest discussed their career journey from software engineering to open-source contributions, focusing on explainability in AI through scikit-learn and Fairlearn. They explored fairness in AI, including challenges in credit loans, hiring, and decision-making, and emphasized the importance of tools, human judgment, and collaboration. The guest also shared their involvement with PyLadies and encouraged contributions to Fairlearn.
00:00 Introduction to the event and the community
01:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI
02:37 Guest introduction: Tamara’s background and career
03:18 Tamara’s career journey: Software engineering, music tech, and computational linguistics
09:53 Tamara’s background in language and computer science
14:52 Exploring fairness in AI and its impact on society
21:20 Fairness in AI models
26:21 Automating fairness analysis in models
32:32 Balancing technical and domain expertise in decision-making
37:13 The role of humans in the loop for fairness
40:02 Joining Probabl and working on open-source projects
46:20 The skops library and its integration with Hugging Face
50:48 PyLadies and community involvement
55:41 The ethos of scikit-learn and Fairlearn
🔗 CONNECT WITH TAMARA ATANASOSKA
LinkedIn - https://www.linkedin.com/in/tamaraatanasoska
GitHub - https://github.com/TamaraAtanasoska
🔗 CONNECT WITH DataTalksClub
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Datalike Substack - https://datalike.substack.com/
LinkedIn: / datatalks-club
Large Hadron Collider and Mentorship – Anastasia Karavdina
We talked about:
00:00 DataTalks.Club intro
00:00 Large Hadron Collider and Mentorship
02:35 Career overview and transition from physics to data science
07:02 Working at the Large Hadron Collider
09:19 How particles collide and the role of detectors
11:03 Data analysis challenges in particle physics and data science similarities
13:32 Team structure at the Large Hadron Collider
20:05 Explaining the connection between particle physics and data science
23:21 Software engineering practices in particle physics
26:11 Challenges during interviews for data science roles
29:30 Mentoring and offering advice to job seekers
40:03 The STAR method and its value in interviews
50:32 Paid vs unpaid mentorship and finding the right fit
About the speaker:
Anastasia is a particle physicist turned data scientist, with experience in large-scale experiments like those at the Large Hadron Collider. She also worked at Blue Yonder, scaling AI-driven solutions for global supply chain giants, and at Kaufland e-commerce, focusing on NLP and search. Anastasia is an ML/AI mentor, dedicated to helping her mentees achieve their goals. She is passionate about growing the next generation of data science elite in Germany, from data analysts up to ML engineers.
Join our Slack: https://datatalks.club/slack.html
Human-Centered AI for Disordered Speech Recognition - Katarzyna Foremniak
We talked about:
00:00 DataTalks.Club intro
08:06 Background and career journey of Katarzyna
09:06 Transition from linguistics to computational linguistics
11:38 Merging linguistics and computer science
15:25 Understanding phonetics and morpho-syntax
17:28 Exploring morpho-syntax and its relation to grammar
20:33 Connection between phonetics and speech disorders
24:41 Improvement of voice recognition systems
27:31 Overview of speech recognition technology
30:24 Challenges of ASR systems with atypical speech
30:53 Strategies for improving recognition of disordered speech
37:07 Data augmentation for training models
40:17 Transfer learning in speech recognition
42:18 Challenges of collecting data for various speech disorders
44:31 Stammering and its connection to fluency issues
45:16 Polish consonant combinations and pronunciation challenges
46:17 Use of Amazon Transcribe for generating podcast transcripts
47:28 Role of language models in speech recognition
49:19 Contextual understanding in speech recognition
51:27 How voice recognition systems analyze utterances
54:05 Personalization of ASR models for individuals
56:25 Language disorders and their impact on communication
58:00 Applications of speech recognition technology
1:00:34 Challenges of personalized and universal models
1:01:23 Voice recognition in automotive applications
1:03:27 Humorous voice recognition failures in cars
1:04:13 Closing remarks and reflections on the discussion
About the speaker:
Katarzyna is a computational linguist with over 10 years of experience in NLP and speech recognition. She has developed language models for automotive brands like Audi and Porsche and specializes in phonetics, morpho-syntax, and sentiment analysis.
Kasia also teaches at the University of Warsaw and is passionate about human-centered AI and multilingual NLP.
Join our Slack: https://datatalks.club/slack.html
We talked about:
Atita’s background
How NLP relates to search
Atita’s experience with Lucidworks and OpenSource Connections
Atita’s experience with Qdrant and vector databases
Utilizing vector search
Major changes to search Atita has noticed throughout her career
RAG (Retrieval-Augmented Generation)
Building a chatbot out of transcripts with LLMs
Ingesting the data and evaluating the results
Keeping humans in the loop
Application of vector databases for machine learning
Collaborative filtering
Atita’s resource recommendations
Links:
LinkedIn: https://www.linkedin.com/in/atitaarora/
Twitter: https://x.com/atitaarora
GitHub: https://github.com/atarora
Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning
Relevant Search: https://www.manning.com/books/relevant-search
Let's learn about Vectors: https://hub.superlinked.com/
Langchain: https://python.langchain.com/docs/get_started/introduction
Qdrant blog: https://blog.qdrant.tech/
OpenSource Connections Blog: https://opensourceconnections.com/blog/
Free ML Engineering course: http://mlzoomcamp.com
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
We talked about:
Aleksander's background
Aleksander as a Causal Ambassador
Using causality to make decisions
Counterfactuals and Judea Pearl
Meta-learners vs classical ML models
Average treatment effect
Reducing causal bias, the super efficient estimator, and model uplifting
Metrics for evaluating a causal model vs a traditional ML model
Is the added complexity of a causal model worth implementing?
Utilizing LLMs in causal models (text as outcome)
Text as treatment and style extraction
The viability of A/B tests in causal models
Graphical structures and nonparametric identification
Aleksander's resource recommendations
Links:
The Book of Why: https://amzn.to/3OZpvBk
Causal Inference and Discovery in Python: https://amzn.to/46Pperr
Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python
The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw
New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL
Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Building an Open-Source NLP Tool - Johannes Hötter
We talked about:
Johannes’s background
Johannes’s Open Source Spotlight demos – Refinery and Bricks
The difficulties of working with natural language processing (NLP)
Incorporating ChatGPT into a process as a heuristic
What is Bricks?
The process of starting a startup – Kern
Making the decision to go with open source
Pros and cons of launching as open source
Kern’s business model
Working with enterprises
Johannes as a salesperson
The team at Kern
Johannes’s role at Kern
How Johannes and Henrik separate responsibilities at Kern
Working with very niche use cases
The short story of how Kern got its funding
Johannes’s resource recommendation
Links:
Refinery's GitHub repo: https://github.com/code-kern-ai/refinery
Bricks' GitHub repo: https://github.com/code-kern-ai/bricks
Bricks Open Source Spotlight demo: https://www.youtube.com/watch?v=r3rXzoLQy2U
Refinery Open Source Spotlight demo: https://www.youtube.com/watch?v=LlMhN2f7YDg
Discord: https://discord.com/invite/qf4rGCEphW
Kern's Website: https://www.kern.ai
Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
From Testing Phones to Managing NLP Projects - Alvaro Navas Peire
We talked about:
Alvaro’s background
Working as a QA (Quality Assurance) engineer
Transitioning from QA to Machine Learning
Gathering knowledge about the ML field
Searching for an ML job (improving soft skills and CV)
Data science interview skills
Zoomcamp projects
Zoomcamp project deployment
How to not undersell yourself during interviews
Alvaro’s experience with interviews during his transition
Alvaro’s Zoomcamp notes
Alvaro’s coach
The importance of mathematical knowledge to a transition into ML
Preparing for technical interviews
Alvaro’s typical workday
Alvaro’s team’s tech stack
The importance of a technical background to transitioning into ML
Links:
Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
GitHub profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting the buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human-level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Developer Advocacy Engineer for Open-Source - Merve Noyan
We talked about:
Merve’s background
Merve’s first contributions to open source
What Merve currently does at Hugging Face (Hub, Spaces)
What it means to be a developer advocacy engineer at Hugging Face
The best way to get open source experience (Google Summer of Code, Hacktoberfest, and sprints)
The peculiarities of hiring as it relates to code contributions
Best resources to learn about NLP besides Hugging Face
Good first projects for NLP
The most important topics in NLP right now
NLP ML Engineer vs NLP Data Scientist
Project recommendations and other advice to catch the eye of recruiters
Merve on Twitch and her podcast
Finding Merve online
Merve and Mario Kart
Links:
Hugging Face Course: https://hf.co/course
Natural Language Processing in TensorFlow: https://www.coursera.org/learn/natural-language-processing-tensorflow
GitHub ML Poetry: https://github.com/merveenoyan/ML-poetry
Tackling multiple tasks with a single visual language model: https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model
Hugging Face BigScience T0pp: https://huggingface.co/bigscience/T0pp
Pathways Language Model (PaLM) blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html
MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
We talked about:
Ivan’s role at Personio
Ivan’s background
Studying technical management
Managing a software team
NLP teams
NLP engineers
Becoming an NLP engineer
Computer vision
NLP engineer vs ML engineer
Conversational designers
Linguistics outside of chatbots
When does a team need an NLP engineer or a linguist?
The future of NLP
NLP pipelines
GPT-3
Problems of GPT-3
Does GPT-3 make everything obsolete?
What is NLP, actually?
Does NLP solve problems better than humans?
State of language translation
NLP Pandect
Links:
https://github.com/ivan-bilan/The-NLP-Pandect
https://github.com/ivan-bilan/The-Engineering-Manager-Pandect
https://github.com/ivan-bilan/The-Microservices-Pandect
Ivan's presentation about NLP: https://www.youtube.com/watch?v=VRur3xey31s
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
What Researchers and Engineers Can Learn from Each Other - Mihail Eric
We talked about:
Mihail’s background
NLP and self-driving vehicles
Transitioning from academia to the industry
Machine learning researchers
Finding open-ended problems
Machine learning engineers
Is data science more engineering or research?
What can engineers and researchers learn from one another?
Bridging the disconnect between researchers and engineers
Breaking down silos
Fluid roles
Full-stack data scientists
Advice to machine learning researchers
Advice to machine learning engineers
Reading papers
Choosing between engineering or research if you’re just starting
Confetti.ai
Links:
https://twitter.com/mihail_eric
http://confetti.ai/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
What Data Scientists Don’t Mention in Their LinkedIn Profiles - Yury Kashnitsky
We talked about:
Yury’s background
Failing fast: Grammarly for science
Not failing fast: Keyword recommender
Four steps to epiphany
Lesson learned when bringing XGBoost into production
When data scientists try to be engineers
Joining a fintech startup: Doing NLP with thousands of GPUs
Working at a Telco company
Having too much freedom
The importance of digital presence
Work-life balance
Quantifying impact of failing projects on our CVs
Business trips to Perm: don’t work on the weekend
What doesn’t kill you makes you stronger
Links:
Yury's course: https://mlcourse.ai/
Yury's Twitter: https://twitter.com/ykashnitsky
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html