talk-data.com

Topic

NLP

Natural Language Processing (NLP)

ai machine_learning text_analysis

252 tagged

Activity Trend: peak of 24 activities per quarter, 2020-Q1 to 2026-Q1

Activities

252 activities · Newest first

In this podcast episode, we talked with Lavanya Gupta about Building a Strong Career in Data.

About the Speaker: Lavanya is a Carnegie Mellon University (CMU) alumna of the Language Technologies Institute (LTI). She works as a Sr. AI/ML Applied Associate at JPMorgan Chase in their specialized Machine Learning Center of Excellence (MLCOE) vertical. Her latest research on long-context evaluation of LLMs was published in EMNLP 2024.

In addition to having a strong industrial research background of 5+ years, she is also an enthusiastic technical speaker. She has delivered talks at events such as Women in Data Science (WiDS) 2021, PyData, Illuminate AI 2021, TensorFlow User Group (TFUG), and MindHack! Summit. She also serves as a reviewer at top-tier NLP conferences (NeurIPS 2024, ICLR 2025, NAACL 2025). Additionally, through her collaborations with various prestigious organizations, like Anita Borg and Women in Coding and Data Science (WiCDS), she is committed to mentoring aspiring machine learning enthusiasts.

In this episode, we talk about Lavanya Gupta’s journey from software engineer to AI researcher. She shares how hackathons sparked her passion for machine learning, her transition into NLP, and her current work benchmarking large language models in finance. Tune in for practical insights on building a strong data career and navigating the evolving AI landscape.

🕒 TIMECODES
00:00 Lavanya’s journey from software engineer to AI researcher
10:15 Benchmarking long context language models
12:36 Limitations of large context models in real domains
14:54 Handling large documents and publishing research in industry
19:45 Building a data science career: publications, motivation, and mentorship
25:01 Self-learning, hackathons, and networking
33:24 Community work and Kaggle projects
37:32 Mentorship and open-ended guidance
51:28 Building a strong data science portfolio

🔗 CONNECT WITH LAVANYA
LinkedIn - / lgupta18

🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/

Misconceptions about AI's capabilities and the role of data are everywhere. Many believe AI is a singular, all-knowing entity, when in reality, it's a collection of algorithms producing intelligence-like outputs. Navigating and understanding the history and evolution of AI, from its origins to today's advanced language models, is crucial. How do these developments, and misconceptions, impact your daily work? Are you leveraging the right tools for your needs, or are you caught up in the allure of cutting-edge technology without considering its practical application?

Andriy Burkov is the author of three widely recognized books, The Hundred-Page Machine Learning Book, The Machine Learning Engineering Book, and recently The Hundred-Page Language Models book. His books have been translated into a dozen languages and are used as textbooks in many universities worldwide. His work has impacted millions of machine learning practitioners and researchers. He holds a Ph.D. in Artificial Intelligence and is a recognized expert in machine learning and natural language processing. As a machine learning expert and leader, Andriy has successfully led dozens of production-grade AI projects in different business domains at Fujitsu and Gartner. Andriy is currently Machine Learning Lead at TalentNeuron.

In the episode, Richie and Andriy explore misconceptions about AI, the evolution of AI from the 1950s, the relevance of 20th-century AI research, the role of linear algebra in AI, the resurgence of recurrent neural networks, advancements in large language model architectures, the significance of reinforcement learning, the reality of AI agents, and much more.

Links Mentioned in the Show:
- Andriy’s books: The Hundred-page Machine Learning Book, The Hundred-page Language Models Book
- TalentNeuron
- Connect with Andriy
- Skill Track: AI Fundamentals
- Related Episode: Unlocking Humanity in the Age of AI with Faisal Hoque, Founder and CEO of SHADOKA
- Rewatch sessions from RADAR: Skills Edition

New to DataCamp?
- Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business

In this hands-on lab, you’ll build a practical Gmail add-on powered by Gemini and Vertex AI that performs sentiment analysis on your emails. Learn how to integrate powerful Natural Language Processing capabilities to automatically classify emails as positive, neutral, or negative. This tool will help you prioritize customer service responses, flag potentially sensitive messages, and streamline your email workflow for improved productivity.

If you register for a Learning Center lab, please ensure that you sign up for a Google Cloud Skills Boost account for both your work domain and personal email address. You will need to authenticate your account as well (be sure to check your spam folder!). This will ensure you can arrive and access your labs quickly onsite. You can follow this link to sign up!

We will present what is at stake with AI in carrying out our mission: increasing knowledge about radio and TV content and improving its discoverability. We will illustrate this with the various projects under way: automatic chaptering of podcasts, improving transcription quality, semantic search, etc.

Machine Learning Algorithms in Depth

Learn how machine learning algorithms work from the ground up so you can effectively troubleshoot your models and improve their performance. Fully understanding how machine learning algorithms function is essential for any serious ML engineer.

In Machine Learning Algorithms in Depth you’ll explore practical implementations of dozens of ML algorithms including:
- Monte Carlo Stock Price Simulation
- Image Denoising using Mean-Field Variational Inference
- EM algorithm for Hidden Markov Models
- Imbalanced Learning, Active Learning and Ensemble Learning
- Bayesian Optimization for Hyperparameter Tuning
- Dirichlet Process K-Means for Clustering Applications
- Stock Clusters based on Inverse Covariance Estimation
- Energy Minimization using Simulated Annealing
- Image Search based on ResNet Convolutional Neural Network
- Anomaly Detection in Time-Series using Variational Autoencoders

Machine Learning Algorithms in Depth dives into the design and underlying principles of some of the most exciting machine learning (ML) algorithms in the world today. With a particular emphasis on probabilistic algorithms, you’ll learn the fundamentals of Bayesian inference and deep learning. You’ll also explore the core data structures and algorithmic paradigms for machine learning. Each algorithm is fully explored with both math and practical implementations so you can see how they work and how they’re put into action.

About the Technology
Learn how machine learning algorithms work from the ground up so you can effectively troubleshoot your models and improve their performance. This book guides you from the core mathematical foundations of the most important ML algorithms to their Python implementations, with a particular focus on probability-based methods.

About the Book
Machine Learning Algorithms in Depth dissects and explains dozens of algorithms across a variety of applications, including finance, computer vision, and NLP. Each algorithm is mathematically derived, followed by its hands-on Python implementation along with insightful code annotations and informative graphics. You’ll especially appreciate author Vadim Smolyakov’s clear interpretations of Bayesian algorithms for Monte Carlo and Markov models.

What's Inside
- Monte Carlo stock price simulation
- EM algorithm for hidden Markov models
- Imbalanced learning, active learning, and ensemble learning
- Bayesian optimization for hyperparameter tuning
- Anomaly detection in time-series

About the Reader
For machine learning practitioners familiar with linear algebra, probability, and basic calculus.

About the Author
Vadim Smolyakov is a data scientist in the Enterprise & Security DI R&D team at Microsoft.

Quotes
- I love this book! It shows you how to implement common ML algorithms in plain Python with only the essential libraries, so you can see how the computation and math works in practice. - Junpeng Lao, Senior Data Scientist at Google
- I highly recommend this book. In the era of ChatGPT real knowledge of algorithms is invaluable. - Vatsal Desai, InfoDesk
- Explains algorithms so well that even a novice can digest it. - Harsh Raval, Zymr

Deep Learning and AI Superhero

"Deep Learning and AI Superhero" is an extensive resource for mastering the core concepts and advanced techniques in AI and deep learning using TensorFlow, Keras, and PyTorch. This comprehensive guide walks you through topics from foundational neural network concepts to implementing real-world machine learning solutions. You will gain hands-on experience and theoretical knowledge to elevate your AI development skills.

What this Book will help me do
- Develop a solid foundation in neural networks, their structure, and their training methodologies.
- Understand and implement deep learning models using TensorFlow and Keras effectively.
- Gain experience using PyTorch for creating, training, and optimizing advanced machine learning models.
- Learn advanced applications such as CNNs for computer vision, RNNs for sequential data, and Transformers for natural language processing.
- Deploy AI models on cloud and edge platforms through practical examples and optimized workflows.

Author(s)
Cuantum Technologies LLC has established itself as a pioneer in creating educational resources for advanced AI technologies. Their team consists of experts and practitioners in the field, combining years of industry and academic experience. Their books are crafted to ensure readers can practically apply cutting-edge AI techniques with clarity and confidence.

Who is it for?
This book is ideally suited for software developers, AI enthusiasts, and data scientists who have a basic understanding of programming and machine learning concepts. It's perfect for those seeking to enhance their skills and tackle real-world AI challenges. Whether your goals are professional development, research, or personal learning, you'll find practical and detailed guidance throughout this book.

In this podcast episode, we talked with Tamara Atanasoska about building fair AI systems.

About the Speaker: Tamara works on ML explainability, interpretability and fairness as an Open Source Software Engineer at probable. She is a maintainer of fairlearn and a contributor to scikit-learn and skops. Tamara has both a computer science/software engineering and a computational linguistics (NLP) background.

During the event, the guest discussed their career journey from software engineering to open-source contributions, focusing on explainability in AI through Scikit-learn and Fairlearn. They explored fairness in AI, including challenges in credit loans, hiring, and decision-making, and emphasized the importance of tools, human judgment, and collaboration. The guest also shared their involvement with PyLadies and encouraged contributions to Fairlearn.

00:00 Introduction to the event and the community
01:51 Topic introduction: Linguistic fairness and socio-technical perspectives in AI
02:37 Guest introduction: Tamara’s background and career
03:18 Tamara’s career journey: Software engineering, music tech, and computational linguistics
09:53 Tamara’s background in language and computer science
14:52 Exploring fairness in AI and its impact on society
21:20 Fairness in AI models
26:21 Automating fairness analysis in models
32:32 Balancing technical and domain expertise in decision-making
37:13 The role of humans in the loop for fairness
40:02 Joining Probable and working on open-source projects
46:20 Skops library and its integration with Hugging Face
50:48 PyLadies and community involvement
55:41 The ethos of Scikit-learn and Fairlearn

🔗 CONNECT WITH TAMARA ATANASOSKA
LinkedIn - https://www.linkedin.com/in/tamaraatanasoska
GitHub - https://github.com/TamaraAtanasoska

🔗 CONNECT WITH DataTalksClub
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Datalike Substack - https://datalike.substack.com/
LinkedIn: / datatalks-club

As AI continues to advance, natural language processing (NLP) is at the forefront, transforming how businesses interact with data. From chatbots to document analysis, NLP offers numerous applications. But with the advent of generative AI, professionals face new challenges: When is it appropriate to use traditional NLP techniques versus more advanced models? How do you balance the costs and benefits of these technologies? Explore the strategic decisions and practical applications of NLP in the modern business world.

Meri Nova is the founder of Break Into Data, a data careers company. Her work focuses on helping people switch to a career in data, and using machine learning to improve community engagement. Previously, she was a data scientist and machine learning engineer at Hyloc. Meri is the instructor of DataCamp's 'Retrieval Augmented Generation with LangChain' course.

In the episode, Richie and Meri explore the evolution of natural language processing, the impact of generative AI on business applications, the balance between traditional NLP techniques and modern LLMs, the role of vector stores and knowledge graphs, the exciting potential of AI in automating tasks and decision-making, and much more.

Links Mentioned in the Show:
- Meri’s Breaking Into Data Handbook on GitHub
- Break Into Data Discord Group
- Connect with Meri
- Skill Track: Artificial Intelligence (AI) Leadership
- Related Episode: Industry Roundup #2: AI Agents for Data Work, The Return of the Full-Stack Data Scientist and Old languages Make a Comeback
- Rewatch sessions from RADAR: Forward Edition

New to DataCamp?
- Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business

Episode Summary
In this episode, we dive into the transformative power of synthetic data and its ability to bypass privacy barriers while accelerating AI innovation. Learn how industries like healthcare, finance, and retail leverage synthetic data to fuel progress and discover actionable steps to implement this game-changing technology.

Key Topics Covered

What Is Synthetic Data?
- Definition and importance.
- How it solves privacy and data scarcity challenges.

Top 5 Breakthroughs in Synthetic Data:
- SafeSynthDP: Differential privacy for secure synthetic data generation.
- GANs for Healthcare: Generating synthetic patient records.
- CaPS: Collaborative synthetic data sharing across organizations.
- Private Text Data: Privacy-safe NLP dataset generation.
- Vertical Federated Learning: Secure synthetic data creation for tabular datasets.

Applications Across Industries:
- Healthcare: HIPAA-compliant AI for diagnostics.
- Finance: Risk modeling with synthetic transaction data.
- Retail: Personalization using synthetic customer profiles.

Action Plan:
- Learn and apply differential privacy techniques.
- Experiment with large language models for synthetic data.
- Use federated learning for collaborative data sharing.
- Build synthetic datasets for complex, messy data.
- Market privacy-first solutions to build customer trust.

Resources Mentioned

Research Papers:
- SafeSynthDP: Privacy-Preserving Data Generation
- GANs for Healthcare Data
- CaPS: Collaborative Synthetic Data Platform
- Private Predictions for NLP
- Vertical Federated Learning for Tabular Data

Tools and Frameworks:
- TensorFlow Privacy Library
- PyTorch GAN Zoo
- Flower Framework for Federated Learning

Takeaways
- Synthetic data is not just a workaround; it’s a key enabler of privacy-compliant AI innovation.
- Industries across the board are adopting synthetic data to overcome regulatory and privacy challenges.
- You can start leveraging synthetic data today with available tools and frameworks.

Ready to explore the power of synthetic data? Dive into the resources mentioned and start experimenting with synthetic data generation to give your AI strategy a competitive edge. Subscribe to our podcast for more cutting-edge insights into the world of AI and data innovation.

Website: https://mukundansankar.substack.com/

Anna Semjen: From Quick Wins to Revolutionising Productivity & CX with GenAI

🌟 Session Overview 🌟

Session Name: From Quick Wins to Revolutionising Productivity & CX with GenAI: Utilising Real-time and Open Source AI with Semantic Search

Speaker: Anna Semjen

Session Description: Join this session to discover how DataStax Astra DB can boost productivity, enable rapid deployment of GenAI applications, and transform customer experience. We’ll showcase an advanced semantic search use case, demonstrating how to vectorize entire videos with specific timestamps and use natural language processing to find precise moments from events like the Olympics. Learn about an open-source model that runs locally, making this powerful tool accessible and cost-effective. Additionally, explore hybrid search capabilities that integrate multiple videos into a single collection, streamlining processes by loading only embeddings and metadata. Perfect for enhancing content management and delivering exceptional user experiences.
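As a rough illustration of the timestamped semantic search idea described in this session, the sketch below stores only an embedding plus metadata (video id and start time) per transcript segment and ranks segments by cosine similarity to a query. It is not the session's implementation and does not use Astra DB: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and the segment data, `collection`, and `search` names are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical transcript segments: (video_id, start_seconds, text)
segments = [
    ("olympics_day1", 125, "the swimmer wins gold in the 100m freestyle final"),
    ("olympics_day1", 940, "gymnast sticks the landing on the vault"),
    ("olympics_day2", 310, "sprinter breaks the record in the 100m dash"),
]

# One collection for several videos; it holds embeddings and metadata only.
collection = [
    {"video_id": vid, "start": start, "embedding": embed(text)}
    for vid, start, text in segments
]


def search(query, top_k=1):
    """Return (video_id, start_seconds) for the segments closest to the query."""
    q = embed(query)
    ranked = sorted(
        collection, key=lambda r: cosine(q, r["embedding"]), reverse=True
    )
    return [(r["video_id"], r["start"]) for r in ranked[:top_k]]


print(search("gymnast vault landing"))
```

In a real system the toy `embed` would be replaced by a learned embedding model and the list by a vector database; the returned timestamp is what lets a player jump to the matching moment.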

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀

🔗 Find Other Years' Videos:
2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT


podcast_episode
by Anastasia Karavdina (Large Hadron Collider; Blue Yonder; Kaufland e-commerce)

We talked about:

00:00 DataTalks.Club intro

00:00 Large Hadron Collider and Mentorship

02:35 Career overview and transition from physics to data science

07:02 Working at the Large Hadron Collider

09:19 How particles collide and the role of detectors

11:03 Data analysis challenges in particle physics and data science similarities

13:32 Team structure at the Large Hadron Collider

20:05 Explaining the connection between particle physics and data science

23:21 Software engineering practices in particle physics

26:11 Challenges during interviews for data science roles

29:30 Mentoring and offering advice to job seekers

40:03 The STAR method and its value in interviews

50:32 Paid vs unpaid mentorship and finding the right fit

About the speaker:

Anastasia is a particle physicist turned data scientist, with experience in large-scale experiments like those at the Large Hadron Collider. She also worked at Blue Yonder, scaling AI-driven solutions for global supply chain giants, and at Kaufland e-commerce, focusing on NLP and search. Anastasia is a mentor for ML/AI, dedicated to helping her mentees achieve their goals. She is passionate about growing the next generation of data science elite in Germany: from Data Analysts up to ML Engineers.

Join our Slack: https://datatalks.club/slack.html

This talk will provide a practical introduction to Gemma, Google's versatile family of open-source language models. We'll explore the various Gemma models available, discuss their strengths and ideal use cases, and guide you through the process of using them effectively. Whether you're interested in text generation, question answering, or other language-based tasks, this talk will equip you with the knowledge to harness the power of Gemma for your own projects.

In this episode of The Deep Dive, we explore Retrieval-Augmented Generation, or RAG, and its revolutionary impact on AI. We break down five game-changing applications of RAG, each transforming how AI interacts with real-time data and complex information. Discover how RAG is enhancing everything from customer service to academic research, by tackling challenges like outdated information and static AI models.

Key Highlights:
- Real-time Q&A Systems: How RAG ensures that AI provides the most up-to-date answers, making customer support smarter and more reliable.
- Dynamic Content Creation: No more stale reports. Learn how RAG allows for content that updates in real-time.
- Multi-Source Summarization: Summarizing complex, often conflicting information from multiple sources for balanced insights.
- Intelligent Chatbots: Discover how RAG-driven chatbots bring up-to-the-minute responses, improving user experience in real-time.
- Specialized Knowledge Integration: From medical diagnoses to legal precedents, see how RAG is revolutionizing fields requiring precise, specialized knowledge.

Tune in to see how RAG is shaping the future of AI, making it more adaptable, intelligent, and responsive to our world’s ever-changing landscape!

Resources:
- Article: "5 Game-Changing Techniques to Boost Your NLP Projects with Retrieval Augmented Generation"
- Explore hands-on with RAG at Hugging Face
- Research and community forums for deeper learning and discussions on RAG
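For readers new to the pattern, the retrieve-then-generate flow behind RAG can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the episode's method: retrieval here is plain word overlap rather than embedding search, the `documents` list is invented sample data, and the final call to a language model is left out, so `build_prompt` only shows how retrieved text would be placed in front of the question.

```python
def tokenize(text):
    """Lowercase, drop basic punctuation, and split into a set of words."""
    cleaned = text.lower().replace("?", "").replace(".", "").replace(",", "")
    return set(cleaned.split())


# Hypothetical knowledge base; a real system would keep this current.
documents = [
    "Refunds are processed within 5 business days.",
    "Support is available by chat from 9am to 5pm.",
    "Shipping to Europe takes 7 to 10 days.",
]


def retrieve(question, docs, k=2):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Assemble the prompt a language model would receive."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


print(build_prompt("How long do refunds take?", documents, k=1))
```

Because the answer is grounded in whatever `retrieve` returns, updating the document store changes the answers without retraining the model, which is the property the highlights above rely on.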

LLMs have unlocked new opportunities in NLP with their possible applications. Features that used to take months to be planned and developed now require a day to be prototyped. But how can we make sure that a successful prototype will turn into a high-quality feature useful for millions of customers? In this talk, we will explore real examples of the challenges that arise when ensuring the quality of LLM outputs and how we address them at Grammarly.