NLP

Lessons from Applied AI: Tesla, Waymo, and Beyond - Aishwarya Jadhav

2025-10-10 · DataTalks.Club Listen

podcast_episode

by Aishwarya Jadhav (Waymo)

AI/ML Big Data GitHub HTML LLM

In this episode, we talked with Aishwarya Jadhav, a machine learning engineer whose career has spanned Morgan Stanley, Tesla, and now Waymo. Aishwarya shares her journey from big data in finance to applied AI in self-driving, gesture understanding, and computer vision. She discusses building an AI guide dog for the visually impaired, contributing to malaria mapping in Africa, and the challenges of deploying safe autonomous systems. We also explore the intersection of computer vision, NLP, and LLMs, and what it takes to break into the self-driving AI industry.TIMECODES00:51 Aishwarya’s career journey from finance to self-driving AI05:45 Building AI guide dog for the visually impaired12:03 Exploring LiDAR, radar, and Tesla’s camera-based approach16:24 Trust, regulation, and challenges in self-driving adoption19:39 Waymo, ride-hailing, and gesture recognition for traffic control24:18 Malaria mapping in Africa and AI for social good29:40 Deployment, safety, and testing in self-driving systems37:00 Transition from NLP to computer vision and deep learning43:37 Reinforcement learning, robotics, and self-driving constraints51:28 Testing processes, evaluations, and staged rollouts for autonomous driving52:53 Can multimodal LLMs be applied to self-driving?55:33 How to get started in self-driving AI careersConnect with Aishwarya- Linkedin - https://www.linkedin.com/in/aishwaryajadhav8/Connect with DataTalks.Club:- Join the community - https://datatalks.club/slack.html- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ- Check other upcoming events - https://lu.ma/dtc-events- GitHub: https://github.com/DataTalksClub- LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Building reliable AI products in the era of Gen AI and Agents - Ranjitha Kulkarni

2025-10-10 · DataTalks.Club Listen

podcast_episode

by Ranjitha Kulkarni (NeuBird AI (past: Microsoft, Dropbox))

AI/ML GenAI GitHub HTML LLM Microsoft RAG

In this episode, we talked with Ranjitha Kulkarni, a machine learning engineer with a rich career spanning Microsoft, Dropbox, and now NeuBird AI. Ranjitha shares her journey into ML and NLP, her work building recommendation systems, early AI agents, and cutting-edge LLM-powered products. She offers insights into designing reliable AI systems in the new era of generative AI and agents, and how context engineering and dynamic planning shape the future of AI products.TIMECODES00:00 Career journey and early curiosity04:25 Speech recognition at Microsoft05:52 Recommendation systems and early agents at Dropbox07:44 Joining NewBird AI12:01 Defining agents and LLM orchestration16:11 Agent planning strategies18:23 Agent implementation approaches22:50 Context engineering essentials30:27 RAG evolution in agent systems37:39 RAG vs agent use cases40:30 Dynamic planning in AI assistants43:00 AI productivity tools at Dropbox46:00 Evaluating AI agents53:20 Reliable tool usage challenges58:17 Future of agents in engineering Connect with Ranjitha- Linkedin - https://www.linkedin.com/in/ranjitha-gurunath-kulkarniConnect with DataTalks.Club:- Join the community - https://datatalks.club/slack.html- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ- Check other upcoming events - https://lu.ma/dtc-events- GitHub: https://github.com/DataTalksClub- LinkedIn - https://www.linkedin.com/company/datatalks-club/ - Twitter - https://twitter.com/DataTalksClub - Website - https://datatalks.club/

Berlin Buzzwords 2025 Conference Interviews

2025-09-12 · DataTalks.Club Listen

podcast_episode

by Kacper Łukawski (Qdrant) , Filip Makraduli (Superlinked) , Brian Goldin (Voyager Search) , André Charton (Kleinanzeigen) , Manish Gill (ClickHouse) , Atita Arora

AI/ML ClickHouse Cloud Computing ELK Kubernetes LLM RAG

At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.

Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.

Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.

Kacper Łukawski Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps. Connect: https://www.linkedin.com/in/kacperlukawski/

Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and stateful sets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/

André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/

Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
Connect: https://www.linkedin.com/in/filipmakraduli/

Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/

Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both learning hub and professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/

EPISODE 26: Eat, Inhibit, Lay – How Worms Tune Reproduction to Food

2025-08-27 · WOrM Podcast: Whole Organism Analytics Podcast Listen

podcast_episode

by Veeren M Chauhan (University of Nottingham) , Yen-Chih Chen , Kara E. Zang , Niels Ringstad , Hassan Ahamed

AI/ML

Why do C. elegans lay eggs only when food is around? In this episode, we explore a newly uncovered neuromodulatory circuit that links food detection to reproductive behaviour using a clever form of disinhibition. At the heart of this is the AVK interneuron — silenced by dopamine when food is present — which normally blocks egg-laying until conditions are right.

We unpack:

How AVK neurons act as gatekeepers for egg-laying behaviour Dopamine from food-sensing neurons inhibits AVKs via DOP-3 receptors AVKs release a cocktail of neuropeptides (PDF-1, NLP-10, NLP-21) that modulate downstream AIY neurons Functional imaging, CRISPR mutants, and optogenetics map the full food-to-egg pathway How this reveals general principles of neuromodulation and disinhibition

📖 Based on the research article: “Food sensing controls C. elegans reproductive behavior by neuromodulatory disinhibition” Yen-Chih Chen, Kara E. Zang, Hassan Ahamed, Niels Ringstad Published in Science Advances (2025) 🔗 https://doi.org/10.1126/sciadv.adu5829

🎧 Subscribe to the WOrM Podcast for more full-organism insights at the interface of environment, brain, and behaviour.

This podcast is generated with artificial intelligence and curated by Veeren. If you’d like your publication featured on the show, please get in touch.

📩 More info: 🔗 ⁠⁠www.veerenchauhan.com⁠⁠ 📧 [email protected]

#305 RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI, Adjunct Professor at Stanford University, Inventor of RAG

2025-06-09 · DataFramed Listen

podcast_episode

by Richie (DataCamp) , Douwe Kiela (Contextual AI)

AI/ML Data Governance GenAI Marketing RAG

Retrieval Augmented Generation (RAG) continues to be a foundational approach in AI despite claims of its demise. While some marketing narratives suggest RAG is being replaced by fine-tuning or long context windows, these technologies are actually complementary rather than competitive. But how do you build a truly effective RAG system that delivers accurate results in high-stakes environments? What separates a basic RAG implementation from an enterprise-grade solution that can handle complex queries across disparate data sources? And with the rise of AI agents, how will RAG evolve to support more dynamic reasoning capabilities? Douwe Kiela is the CEO and co-founder of Contextual AI, a company at the forefront of next-generation language model development. He also serves as an Adjunct Professor in Symbolic Systems at Stanford University, where he contributes to advancing the theoretical and practical understanding of AI systems. Before founding Contextual AI, Douwe was the Head of Research at Hugging Face, where he led groundbreaking efforts in natural language processing and machine learning. Prior to that, he was a Research Scientist and Research Lead at Meta’s FAIR (Fundamental AI Research) team, where he played a pivotal role in developing Retrieval-Augmented Generation (RAG)—a paradigm-shifting innovation in AI that combines retrieval systems with generative models for more grounded and contextually aware responses. In the episode, Richie and Douwe explore the misconceptions around the death of Retrieval Augmented Generation (RAG), the evolution to RAG 2.0, its applications in high-stakes industries, the importance of metadata and entitlements in data governance, the potential of agentic systems in enterprise settings, and much more. Links Mentioned in the Show: Contextual AIConnect with DouweCourse: Retrieval Augmented Generation (RAG) with LangChainRelated Episode: High Performance Generative AI Applications with Ram Sriharsha, CTO at PineconeRegister for RADAR AI - June 26 New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Build a Strong Career in Data - Lavanya Gupta

2025-05-09 · DataTalks.Club Listen

podcast_episode

by Lavanya Gupta (JPMorgan Chase)

AI/ML Data Science HTML LLM TensorFlow

In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data. About the Speaker: Lavanya is a Carnegie Mellon University (CMU) alumni of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published in EMNLP 2024.

In addition to having a strong industrial research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, like Anita BOrg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.

In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.

🕒 TIMECODES 00:00 Lavanya’s journey from software engineer to AI researcher 10:15 Benchmarking long context language models 12:36 Limitations of large context models in real domains 14:54 Handling large documents and publishing research in industry 19:45 Building a data science career: publications, motivation, and mentorship 25:01 Self-learning, hackathons, and networking 33:24 Community work and Kaggle projects 37:32 Mentorship and open-ended guidance 51:28 Building a strong data science portfolio 🔗 CONNECT WITH LAVANYALinkedIn - / lgupta18 🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/

#297 The Past and Future of Language Models with Andriy Burkov, Author of The Hundred-Page Machine Learning Book

2025-04-14 · DataFramed Listen

podcast_episode

by Andriy Burkov (TalentNeuron) , Richie (DataCamp)

AI/ML LLM RNNs

Misconceptions about AI's capabilities and the role of data are everywhere. Many believe AI is a singular, all-knowing entity, when in reality, it's a collection of algorithms producing intelligence-like outputs. Navigating and understanding the history and evolution of AI, from its origins to today's advanced language models is crucial. How do these developments, and misconceptions, impact your daily work? Are you leveraging the right tools for your needs, or are you caught up in the allure of cutting-edge technology without considering its practical application? Andriy Burkov is the author of three widely recognized books, The Hundred-Page Machine Learning Book, The Machine Learning Engineering Book, and recently The Hundred-Page Language Models book. His books have been translated into a dozen languages and are used as textbooks in many universities worldwide. His work has impacted millions of machine learning practitioners and researchers. He holds a Ph.D. in Artificial Intelligence and is a recognized expert in machine learning and natural language processing. As a machine learning expert and leader, Andriy has successfully led dozens of production-grade AI projects in different business domains at Fujitsu and Gartner. Andriy is currently Machine Learning Lead at TalentNeuron. In the episode, Richie and Andriy explore misconceptions about AI, the evolution of AI from the 1950s, the relevance of 20th-century AI research, the role of linear algebra in AI, the resurgence of recurrent neural networks, advancements in large language model architectures, the significance of reinforcement learning, the reality of AI agents, and much more. Links Mentioned in the Show: Andriy’s books: The Hundred-page Machine Learning Book, The Hundred-page Language Models BookTalentNeuronConnect with AndriySkill Track: AI FundamentalsRelated Episode: Unlocking Humanity in the Age of AI with Faisal Hoque, Founder and CEO of SHADOKARewatch sessions from RADAR: Skills Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

From 4GL to GenAI: How SQL Automation has Evolved - Audio Blog

2025-03-28 · Secrets of Data Analytics Leaders Listen

podcast_episode

GenAI SQL

This blog discusses how GenAI and natural language processing are transforming SQL query generation, allowing non-technical users to access and query data easily. Published at: https://www.eckerson.com/articles/from-4gl-to-genai-how-sql-automation-has-evolved

Linguistics and Fairness - Tamara Atanasoska

2025-01-17 · DataTalks.Club Listen

podcast_episode

by Tamara Atanasoska (:probably..)

AI/ML Computer Science GitHub HTML Scikit-learn

In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.

About the Speaker:Tamara works on ML explainability, interpretability and fairness as Open Source Software Engineer at probable. She is a maintainer of fairlearn, contributor to scikit-learn and skops. Tamara has both computer science/ software engineering and a computational linguistics(NLP) background.During the event, the guest discussed their career journey from software engineering to open-source contributions, focusing on explainability in AI through Scikit-learn and Fairlearn. They explored fairness in AI, including challenges in credit loans, hiring, and decision-making, and emphasized the importance of tools, human judgment, and collaboration. The guest also shared their involvement with PyLadies and encouraged contributions to Fairlearn. 00:00 Introduction to the event and the community 01:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI 02:37 Guest introduction: Tamara’s background and career 03:18 Tamara’s career journey: Software engineering, music tech, and computational linguistics 09:53 Tamara’s background in language and computer science 14:52 Exploring fairness in AI and its impact on society 21:20 Fairness in AI models26:21 Automating fairness analysis in models 32:32 Balancing technical and domain expertise in decision-making 37:13 The role of humans in the loop for fairness 40:02 Joining Probable and working on open-source projects 46:20 Scopes library and its integration with Hugging Face 50:48 PyLadies and community involvement 55:41 The ethos of Scikit-learn and Fairlearn

🔗 CONNECT WITH TAMARA ATANASOSKA Linkedin - https://www.linkedin.com/in/tamaraatanasoska GitHub- https://github.com/TamaraAtanasoska

🔗 CONNECT WITH DataTalksClub Join DataTalks.Club:⁠⁠https://datatalks.club/slack.html⁠⁠ Our events:⁠⁠https://datatalks.club/events.html⁠⁠ Datalike Substack -⁠⁠https://datalike.substack.com/⁠⁠ LinkedIn:⁠⁠ / datatalks-club

#275 Did Gen AI Kill NLP? with Meri Nova, Technical Founder at Break into Data

2025-01-16 · DataFramed Listen

podcast_episode

by Meri Nova (Break Into Data) , Richie (DataCamp)

AI/ML GenAI LLM RAG

As AI continues to advance, natural language processing (NLP) is at the forefront, transforming how businesses interact with data. From chatbots to document analysis, NLP offers numerous applications. But with the advent of generative AI, professionals face new challenges: When is it appropriate to use traditional NLP techniques versus more advanced models? How do you balance the costs and benefits of these technologies? Explore the strategic decisions and practical applications of NLP in the modern business world. Meri Nova is the founder of Break Into Data, a data careers company. Her work focuses on helping people switch to a career in data, and using machine learning to improve community engagement. Previously, she was a data scientist and machine learning engineer at Hyloc. Meri is the instructor of DataCamp's 'Retrieval Augmented Generation with LangChain' course. In the episode, Richie and Meri explore the evolution of natural language processing, the impact of generative AI on business applications, the balance between traditional NLP techniques and modern LLMs, the role of vector stores and knowledge graphs, and the exciting potential of AI in automating tasks and decision-making, and much more. Links Mentioned in the Show: Meri’s Breaking Into Data Handbook on GitHubBreak Into Data Discord GroupConnect with MeriSkill Track: Artificial Intelligence (AI) LeadershipRelated Episode: Industry Roundup #2: AI Agents for Data Work, The Return of the Full-Stack Data Scientist and Old languages Make a ComebackRewatch sessions from RADAR: Forward Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Synthetic Data: The AI Gold Rush You Can't Afford to Miss

2025-01-15 · Data & AI with Mukundan | Learn AI by Building Listen

podcast_episode

by Mukundan Sankar

AI/ML TensorFlow

Episode Summary In this episode, we dive into the transformative power of synthetic data and its ability to bypass privacy barriers while accelerating AI innovation. Learn how industries like healthcare, finance, and retail leverage synthetic data to fuel progress and discover actionable steps to implement this game-changing technology. Key Topics Covered What Is Synthetic Data?Definition and importance.How it solves privacy and data scarcity challenges.Top 5 Breakthroughs in Synthetic Data:SafeSynthDP: Differential privacy for secure synthetic data generation.GANs for Healthcare: Generating synthetic patient records.CaPS: Collaborative synthetic data sharing across organizations.Private Text Data: Privacy-safe NLP dataset generation.Vertical Federated Learning: Secure synthetic data creation for tabular datasets.Applications Across Industries:Healthcare: HIPAA-compliant AI for diagnostics.Finance: Risk modeling with synthetic transaction data.Retail: Personalization using synthetic customer profiles.Action Plan:Learn and apply differential privacy techniques.Experiment with large language models for synthetic data.Use federated learning for collaborative data sharing.Build synthetic datasets for complex, messy data.Market privacy-first solutions to build customer trust.Resources Mentioned Research Papers:SafeSynthDP: Privacy-Preserving Data GenerationGANs for Healthcare DataCaPS: Collaborative Synthetic Data PlatformPrivate Predictions for NLPVertical Federated Learning for Tabular DataTools and Frameworks:TensorFlow Privacy LibraryPyTorch GAN ZooFlower Framework for Federated LearningTakeaways Synthetic data is not just a workaround—it’s a key enabler of privacy-compliant AI innovation.Industries across the board are adopting synthetic data to overcome regulatory and privacy challenges.You can start leveraging synthetic data today with available tools and frameworks.Ready to explore the power of synthetic data? Dive into the resources mentioned and start experimenting with synthetic data generation to give your AI strategy a competitive edge. Subscribe to our podcast for more cutting-edge insights into the world of AI and data innovation.

Website: https://mukundansankar.substack.com/

Large Hadron Collider and Mentorship – Anastasia Karavdina

2024-11-22 · DataTalks.Club Listen

podcast_episode

by Anastasia Karavdina (Large Hadron Collider; Blue Yonder; Kaufland e-commerce)

AI/ML Data Science HTML

We talked about:

00:00 DataTalks.Club intro

00:00 Large Hadron Collider and Mentorship

02:35 Career overview and transition from physics to data science

07:02 Working at the Large Hadron Collider

09:19 How particles collide and the role of detectors

11:03 Data analysis challenges in particle physics and data science similarities

13:32 Team structure at the Large Hadron Collider

20:05 Explaining the connection between particle physics and data science

23:21 Software engineering practices in particle physics

26:11 Challenges during interviews for data science roles

29:30 Mentoring and offering advice to job seekers

40:03 The STAR method and its value in interviews

50:32 Paid vs unpaid mentorship and finding the right fit

About the speaker:

Anastasia is a particle physicist turned data scientist, with experience in large-scale experiments like those at the Large Hadron Collider. She also worked at Blue Yonder, scaling AI-driven solutions for global supply chain giants, and at Kaufland e-commerce, focusing on NLP and search. Anastasia is a mentor for Ml/AI, dedicated to helping her mentees achieve their goals. She is passionate about growing the next generation of data science elite in Germany: from Data Analysts up to ML Engineers.

Join our Slack: https://datatalks .club/slack.html

5 Game-Changing Applications of Retrieval-Augmented Generation: Unlocking AI's Full Potential

2024-10-31 · Data & AI with Mukundan | Learn AI by Building Listen

podcast_episode

by Mukundan Sankar

AI/ML RAG

In this episode of The Deep Dive, we explore Retrieval-Augmented Generation, or RAG, and its revolutionary impact on AI. We break down five game-changing applications of RAG, each transforming how AI interacts with real-time data and complex information. Discover how RAG is enhancing everything from customer service to academic research, by tackling challenges like outdated information and static AI models. Key Highlights: Real-time Q&A Systems: How RAG ensures that AI provides the most up-to-date answers, making customer support smarter and more reliable.Dynamic Content Creation: No more stale reports—learn how RAG allows for content that updates in real-time.Multi-Source Summarization: Summarizing complex, often conflicting information from multiple sources for balanced insights.Intelligent Chatbots: Discover how RAG-driven chatbots bring up-to-the-minute responses, improving user experience in real-time.Specialized Knowledge Integration: From medical diagnoses to legal precedents, see how RAG is revolutionizing fields requiring precise, specialized knowledge.Tune in to see how RAG is shaping the future of AI, making it more adaptable, intelligent, and responsive to our world’s ever-changing landscape! Resources: Article: "5 Game-Changing Techniques to Boost Your NLP Projects with Retrieval Augmented Generation"Explore hands-on with RAG at Hugging FaceResearch and community forums for deeper learning and discussions on RAG

Human-Centered AI for Disordered Speech Recognition - Katarzyna Foremniak

2024-10-10 · DataTalks.Club Listen

podcast_episode

by Katarzyna Foremniak (University of Warsaw)

AI/ML Computer Science HTML

We talked about:

00:00 DataTalks.Club intro

08:06 Background and career journey of Katarzyna

09:06 Transition from linguistics to computational linguistics

11:38 Merging linguistics and computer science

15:25 Understanding phonetics and morpho-syntax

17:28 Exploring morpho-syntax and its relation to grammar

20:33 Connection between phonetics and speech disorders

24:41 Improvement of voice recognition systems

27:31 Overview of speech recognition technology

30:24 Challenges of ASR systems with atypical speech

30:53 Strategies for improving recognition of disordered speech

37:07 Data augmentation for training models

40:17 Transfer learning in speech recognition

42:18 Challenges of collecting data for various speech disorders

44:31 Stammering and its connection to fluency issues

45:16 Polish consonant combinations and pronunciation challenges

46:17 Use of Amazon Transcribe for generating podcast transcripts

47:28 Role of language models in speech recognition

49:19 Contextual understanding in speech recognition

51:27 How voice recognition systems analyze utterances

54:05 Personalization of ASR models for individuals

56:25 Language disorders and their impact on communication

58:00 Applications of speech recognition technology

1:00:34 Challenges of personalized and universal models

1:01:23 Voice recognition in automotive applications

1:03:27 Humorous voice recognition failures in cars

1:04:13 Closing remarks and reflections on the discussion

About the speaker:

Katarzyna is a computational linguist with over 10 years of experience in NLP and speech recognition. She has developed language models for automotive brands like Audi and Porsche and specializes in phonetics, morpho-syntax, and sentiment analysis.

Kasia also teaches at the University of Warsaw and is passionate about human-centered AI and multilingual NLP.

Join our slack: https://datatalks.club/slack.html

#245 Can We Make Generative AI Cheaper? With Natalia Vassilieva, Senior VP & Field CTO & Andy Hock, VP, Product & Strategy at Cerebras Systems

2024-09-19 · DataFramed Listen

podcast_episode

by Andy Hock (Cerebras Systems) , Richie (DataCamp) , Natalia Vassilieva (Cerebras Systems)

AI/ML Analytics Computer Science GenAI

With AI tools constantly evolving, the potential for innovation seems limitless. But with great potential comes significant costs, and the question of efficiency and scalability becomes crucial. How can you ensure that your AI models are not only pushing boundaries but also delivering results in a cost-effective way? What strategies can help reduce the financial burden of training and deploying models, while still driving meaningful business outcomes? Natalia Vassilieva is the VP & Field CTO of ML at Cerebras Systems. Natalia has a wealth of experience in research and development in natural language processing, computer vision, machine learning, and information retrieval. As Field CTO, she helps drive product adoption and customer engagement for Cerebras Systems' wafer-scale AI chips. Previously, Natalia was a Senior Research Manager at Hewlett Packard Labs, leading the Software and AI group. She also served as the head of HP Labs Russia leading research teams focused on developing algorithms and applications for text, image, and time-series analysis and modeling. Natalia has an academic background, having been a part-time Associate Professor at St. Petersburg State University and a lecturer at the Computer Science Center in St. Petersburg, Russia. She holds a PhD in Computer Science from St. Petersburg State University. Andy Hock is the Senior VP, Product & Strategy at Cerebras Systems. Andy runs the product strategy and roadmap for Cerebras Systems, focusing on integrating AI research, hardware, and software to accelerate the development and deployment of AI models. He has 15 years of experience in product management, technical program management, and enterprise business development; over 20 years of experience in research, algorithm development, and data analysis for image processing; and 9 years of experience in applied machine learning and AI. Previously he was Product Management lead for Data and Analytics for Terra Bella at Google, where he led the development of machine learning-powered data products from satellite imagery. Earlier, he was Senior Director for Advanced Technology Programs at Skybox Imaging (which became Terra Bella following its acquisition by Google in 2014), and before that was a Senior Program Manager and Senior Scientist at Arete Associates. He has a Ph.D. in Geophysics and Space Physics from the University of California, Los Angeles. In the episode, Richie, Natalia and Andy explore the dramatic recent progress in generative AI, cost and infrastructure challenges in AI, Cerebras’ custom AI chips and other hardware innovations, quantization in AI models, mixture of experts, RLHF, relevant AI use-cases, centralized vs decentralized AI compute, the future of AI and much more. Links Mentioned in the Show: CerebrasCerebras Launches the World’s Fastest AI InferenceConnect with Natalia and AndyCourse: Implementing AI Solutions in BusinessRewatch sessions from RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills witha...

#243 No-Code LLMs In Practice with Birago Jones & Karthik Dinakar, CEO & CTO at Pienso

2024-09-12 · DataFramed Listen

podcast_episode

by Karthik Dinakar (Pienso) , Richie (DataCamp) , Birago Jones (Pienso)

AI/ML LLM Microsoft

As AI becomes more accessible, a growing question is: should machine learning experts always be the ones training models, or is there a better way to leverage other subject matter experts in the business who know the use-case best? What if getting started building AI apps required no coding skills? As businesses look to implement AI at scale, what part can no-code AI apps play in getting projects off the ground, and how feasible are smaller, tailored solutions for department specific use-cases? Birago Jones is the CEO at Pienso. Pienso is an AI platform that empowers subject matter experts in various enterprises, such as business analysts, to create and fine-tune AI models without coding skills. Prior to Pienso, Birago was a Venture Partner at Indicator Ventures and a Research Assistant at MIT Media Lab where he also founded the Media Lab Alumni Association. Karthik Dinakar is a computer scientist specializing in machine learning, natural language processing, and human-computer interaction. He is the Chief Technology Officer and co-founder at Pienso. Prior to founding Pienso, Karthik held positions at Microsoft and Deutsche Bank. Karthik holds a doctoral degree from MIT in Machine Learning. In the episode, Richie, Birago and Karthik explore why no-code AI apps are becoming more prominent, uses-cases of no-code AI apps, the steps involved in creating an LLM, the benefits of small tailored models, how no-code can impact workflows, cost in AI projects, AI interfaces and the rise of the chat interface, privacy and customization, excitement about the future of AI, and much more. Links Mentioned in the Show: PiensoGoogle Gemini for BusinessConnect with Birago and KarthikAndreesen Horowitz Report: Navigating the High Cost of AI ComputeCourse: Artificial Intelligence (AI) StrategyRelated Episode: Designing AI Applications with Robb Wilson, Co-Founder & CEO at Onereach.aiRewatch sessions from RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

#238 Data & AI for Improving Patient Outcomes with Terry Myerson, CEO at Truveta

2024-08-26 · DataFramed Listen

podcast_episode

by Richie (DataCamp) , Terry Myerson (Truveta)

AI/ML Data Quality Microsoft

One of the prerequisites for being able to do great data analyses is that the data is well structured and clean and high quality. For individual projects, this is often annoying to get right. On a corporate level, it’s often a huge blocker to productivity. And then there’s healthcare data. When you consider all the healthcare records across the USA, or any other country for that matter, there are so many data formats created by so many different organizations, it’s frankly a horrendous mess. This is a big problem because there’s a treasure trove of data that researchers and analysts can’t make use of to answer questions about which medical interventions work or not. Bad data is holding back progress on improving everyone’s health. Terry Myerson is the CEO and Co-Founder of Truveta. Truveta enables scientifically rigorous research on more than 18% of the clinical care in the U.S. from a growing collective of more than 30 health systems. Previously, Terry enjoyed a 21-year career at Microsoft. As Executive Vice President, he led the development of Windows, Surface, Xbox, and the early days of Office 365, while serving on the Senior Leadership Team of the company. Prior to Microsoft, he co-founded Intersé, one of the earliest Internet companies, which Microsoft acquired in 1997. In the episode, Richie and Terry explore the current state of health records, challenges when working with health records, data challenges including privacy and accessibility, data silos and fragmentation, AI and NLP for fragmented data, regulatory grade AI, ongoing data integration efforts in healthcare, the future of healthcare and much more. Links Mentioned in the Show: TruvetaConnect with TerryHIPAACourse - Introduction to Data PrivacyRelated Episode: Using AI to Improve Data Quality in HealthcareRewatch sessions from RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile app Empower your business with world-class data and AI skills with DataCamp for business

#216 Perplexity & the Future of AI with Denis Yarats, Co-Founder and CTO at Perplexity AI

2024-06-17 · DataFramed Listen

podcast_episode

by Denis Yarats (Perplexity AI) , Adel (DataFramed)

AI/ML GenAI Vector DB

Arguably one of the verticals that is both at the same time most ripe for disruption by AI and the hardest to disrupt is search. We've seen many attempts at reimagining search using AI, and many are trying to usurp Google from its throne as the top search engine on the planet, but I think no one is laying the case better for AI assisted search than perplexity. AI. Perplexity doesn't need an introduction. It is an AI powered search engine that lets you get the information you need as fast as possible. Denis Yarats is the Co-Founder and Chief Technology Officer of Perplexity AI. He previously worked at Facebook as an AI Research Scientist. Denis Yarats attended New York University. His previous research interests broadly involved Reinforcement Learning, Deep Learning, NLP, robotics and investigating ways of semi-supervising Hierarchical Reinforcement Learning using natural language. In the episode, Adel and Denis explore Denis’ role at Perplexity.ai, key differentiators of Perplexity.ai when compared to other chatbot-powered tools, culture at perplexity, competition in the AI space, building genAI products, the future of AI and search, open-source vs closed-source AI and much more. Links Mentioned in the Show: Perplexity.aiNeurIPS Conference[Course] Artificial Intelligence (AI) StrategyRelated Episode: The Power of Vector Databases and Semantic Search with Elan Dekel, VP of Product at PineconeSign up to RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

[AI and the Modern Data Stack] #183 Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake

2024-02-21 · DataFramed Listen

podcast_episode

by Richie (DataCamp) , Sridhar Ramaswamy (Snowflake)

AI/ML Analytics Cloud Computing Data Management Data Quality Databricks DWH GenAI Marketing Modern Data Stack Snowflake Thoughtspot

Snowflake has been foundational in the data space for years. In the mid-2010s, the platform was a major driver of moving data to the cloud. More recently, it's become apparent that combining data and AI in the cloud is key to accelerating innovation. Snowflake has been rapidly adding AI features to provide value to the modern data stack, but what’s really been going on under the hood? At the time of recording, Sridhar Ramaswamy was the SVP of AI at Snowflake, being appointed CEO at Snowflake in February 2024. Sridhar was formerly Co-Founder of Neeva, acquired in 2023 by Snowflake. Before founding Neeva, Ramaswamy oversaw Google's advertising products, including search, display, video advertising, analytics, shopping, payments, and travel. He joined Google in 2003 and was part of the growth of AdWords and Google's overall advertising business. He spent more than 15 years at Google, where he started as a software engineer and rose to SVP of Ads & Commerce. In the episode, Richie and Sridhar explore Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, how NLP and AI have impacted enterprise business operations as well as new applications of AI in an enterprise environment, the challenges of enterprise search, the importance of data quality, management and the role of semantic layers in the effective use of AI, a look into Snowflakes products including Snowpilot and Cortex, the collaboration required for successful data and AI projects, advice for organizations looking to improve their data management and much more. About the AI and the Modern Data Stack DataFramed Series This week we’re releasing 4 episodes focused on how AI is changing the modern data stack and the analytics profession at large. The modern data stack is often an ambiguous and all-encompassing term, so we intentionally wanted to cover the impact of AI on the modern data stack from different angles. Here’s what you can expect: Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot — Covering how AI will change analytics workflows and tools How Databricks is Transforming Data Warehousing and AI with Ari Kaplan, Head Evangelist & Robin Sutara, Field CTO at Databricks — Covering Databricks, data intelligence and how AI tools are changing data democratizationAdding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake — Covering Snowflake and its uses, how generative AI is changing the attitudes of leaders towards data, and how to improve your data managementAccelerating AI Workflows with Nuri Cankaya, VP of AI Marketing & La Tiffaney Santucci, AI Marketing Director at Intel — Covering AI’s impact on marketing analytics, how AI is being integrated into existing products, and the democratization of AI Links Mentioned in the Show: SnowflakeSnowflake acquires Neeva to accelerate search in the Data Cloud through generative AIUse AI in Seconds with Snowflake Cortex[Course] Introduction to SnowflakeRelated Episode: Why AI will Change Everything—with Former Snowflake CEO, Bob MugliaSign up to a...

Navigating Challenges and Innovations in Search Technologies - Atita Arora

2023-12-27 · DataTalks.Club Listen

podcast_episode

by Atita Arora

AI/ML GitHub HTML LLM Python RAG Vector DB

We talked about:

Atita’s background How NLP relates to search Atita’s experience with Lucidworks and OpenSource Connections Atita’s experience with Qdrant and vector databases Utilizing vector search Major changes to search Atita has noticed throughout her career RAG (Retrieval-Augmented Generation) Building a chatbot out of transcripts with LLMs Ingesting the data and evaluating the results Keeping humans in the loop Application of vector databases for machine learning Collaborative filtering Atita’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/atitaarora/
Twitter: https://x.com/atitaarora Github: https://github.com/atarora Human-in-the-Loop Machine Learning: https://www.manning.com/books/human-in-the-loop-machine-learning Relevant Search: https://www.manning.com/books/relevant-search Let's learn about Vectors: https://hub.superlinked.com/ Langchain: https://python.langchain.com/docs/get_started/introduction Qdrant blog: https://blog.qdrant.tech/ OpenSource Connections Blog: https://opensourceconnections.com/blog/

Free ML Engineering course: http://mlzoomcamp.com Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

talk-data.com

Activity Trend

Top Events

Top Speakers

Lessons from Applied AI: Tesla, Waymo, and Beyond - Aishwarya Jadhav

Building reliable AI products in the era of Gen AI and Agents - Ranjitha Kulkarni

Berlin Buzzwords 2025 Conference Interviews

EPISODE 26: Eat, Inhibit, Lay – How Worms Tune Reproduction to Food

#305 RAG 2.0 and The New Era of RAG Agents with Douwe Kiela, CEO at Contextual AI, Adjunct Professor at Stanford University, Inventor of RAG

Build a Strong Career in Data - Lavanya Gupta

#297 The Past and Future of Language Models with Andriy Burkov, Author of The Hundred-Page Machine Learning Book

From 4GL to GenAI: How SQL Automation has Evolved - Audio Blog

Linguistics and Fairness - Tamara Atanasoska

#275 Did Gen AI Kill NLP? with Meri Nova, Technical Founder at Break into Data

Synthetic Data: The AI Gold Rush You Can't Afford to Miss

Large Hadron Collider and Mentorship – Anastasia Karavdina

5 Game-Changing Applications of Retrieval-Augmented Generation: Unlocking AI's Full Potential

Human-Centered AI for Disordered Speech Recognition - Katarzyna Foremniak

#245 Can We Make Generative AI Cheaper? With Natalia Vassilieva, Senior VP & Field CTO & Andy Hock, VP, Product & Strategy at Cerebras Systems

#243 No-Code LLMs In Practice with Birago Jones & Karthik Dinakar, CEO & CTO at Pienso

#238 Data & AI for Improving Patient Outcomes with Terry Myerson, CEO at Truveta

#216 Perplexity & the Future of AI with Denis Yarats, Co-Founder and CTO at Perplexity AI

[AI and the Modern Data Stack] #183 Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at Snowflake

Navigating Challenges and Innovations in Search Technologies - Atita Arora