talk-data.com

Topic: Natural Language Processing (NLP)

Tags: ai, machine_learning, text_analysis

252 activities tagged

Activity Trend: peak of 24 activities/quarter, 2020-Q1 to 2026-Q1

Activities

252 activities · Newest first

At Berlin Buzzwords, industry voices highlighted how search is evolving with AI and LLMs.

  • Kacper Łukawski (Qdrant) stressed hybrid search (semantic + keyword) as core for RAG systems and promoted efficient embedding models for smaller-scale use.
  • Manish Gill (ClickHouse) discussed auto-scaling OLAP databases on Kubernetes, combining infrastructure and database knowledge.
  • André Charton (Kleinanzeigen) reflected on scaling search for millions of classifieds, moving from Solr/Elasticsearch toward vector search, while returning to a hands-on technical role.
  • Filip Makraduli (Superlinked) introduced a vector-first framework that fuses multiple encoders into one representation for nuanced e-commerce and recommendation search.
  • Brian Goldin (Voyager Search) emphasized spatial context in retrieval, combining geospatial data with AI enrichment to add the “where” to search.
  • Atita Arora (Voyager Search) highlighted geospatial AI models, the renewed importance of retrieval in RAG, and the cautious but promising rise of AI agents.

Together, their perspectives show a common thread: search is regaining center stage in AI—scaling, hybridization, multimodality, and domain-specific enrichment are shaping the next generation of retrieval systems.
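Hybrid retrieval came up repeatedly above. As a concrete illustration, here is a minimal sketch of one common way to fuse a keyword result list with a semantic result list, Reciprocal Rank Fusion; the document ids are invented and `rrf_fuse` is a hypothetical helper, not Qdrant's API:

```python
# Minimal sketch of hybrid-search fusion via Reciprocal Rank Fusion (RRF).
# Doc ids and rankings below are made-up illustrations; a real system would
# take them from a dense vector index and a BM25/keyword index.

def rrf_fuse(rankings, k=60):
    """Combine several ranked lists of doc ids into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in; summing
    rewards documents that rank well under BOTH semantic and keyword search.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: one from dense (semantic) retrieval,
# one from sparse (keyword/BM25) retrieval.
semantic = ["doc_a", "doc_b", "doc_c"]
keyword = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([semantic, keyword])
print(fused)  # doc_b ranks first: it places near the top of both lists
```

The constant `k=60` is the conventional default from the RRF literature; it damps the influence of top ranks so no single list dominates.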

Kacper Łukawski
Senior Developer Advocate at Qdrant, he educates users on vector and hybrid search. He highlighted Qdrant’s support for dense and sparse vectors, the role of search with LLMs, and his interest in cost-effective models like static embeddings for smaller companies and edge apps.
Connect: https://www.linkedin.com/in/kacperlukawski/

Manish Gill
Engineering Manager at ClickHouse, he spoke about running ClickHouse on Kubernetes, tackling auto-scaling and StatefulSets. His team focuses on making ClickHouse scale automatically in the cloud. He credited its speed to careful engineering and reflected on the shift from IC to manager.
Connect: https://www.linkedin.com/in/manishgill/

André Charton
Head of Search at Kleinanzeigen, he discussed shaping the company’s search tech—moving from Solr to Elasticsearch and now vector search with Vespa. Kleinanzeigen handles 60M items, 1M new listings daily, and 50k requests/sec. André explained his career shift back to hands-on engineering.
Connect: https://www.linkedin.com/in/andrecharton/

Filip Makraduli
Founding ML DevRel engineer at Superlinked, an open-source framework for AI search and recommendations. Its vector-first approach fuses multiple encoders (text, images, structured fields) into composite vectors for single-shot retrieval. His Berlin Buzzwords demo showed e-commerce search with natural-language queries and filters.
Connect: https://www.linkedin.com/in/filipmakraduli/

Brian Goldin
Founder and CEO of Voyager Search, which began with geospatial search and expanded into documents and metadata enrichment. Voyager indexes spatial data and enriches pipelines with NLP, OCR, and AI models to detect entities like oil spills or windmills. He stressed adding spatial context (“the where”) as critical for search and highlighted Voyager’s 12 years of enterprise experience.
Connect: https://www.linkedin.com/in/brian-goldin-04170a1/

Atita Arora
Director of AI at Voyager Search, with nearly 20 years in retrieval systems, now focused on geospatial AI for Earth observation data. At Berlin Buzzwords she hosted sessions, attended talks on Lucene, GPUs, and Solr, and emphasized retrieval quality in RAG systems. She is cautiously optimistic about AI agents and values the event as both learning hub and professional reunion.
Connect: https://www.linkedin.com/in/atitaarora/

Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval

With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents that query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing the challenges of preventing hallucinations and ensuring proper source attribution. This session is ideal for data scientists, ML engineers, and anyone interested in applying Python and the LLM ecosystem to the healthcare domain.

Why do C. elegans lay eggs only when food is around? In this episode, we explore a newly uncovered neuromodulatory circuit that links food detection to reproductive behaviour using a clever form of disinhibition. At the heart of this is the AVK interneuron — silenced by dopamine when food is present — which normally blocks egg-laying until conditions are right.

We unpack:

  • How AVK neurons act as gatekeepers for egg-laying behaviour
  • Dopamine from food-sensing neurons inhibits AVKs via DOP-3 receptors
  • AVKs release a cocktail of neuropeptides (PDF-1, NLP-10, NLP-21) that modulate downstream AIY neurons
  • Functional imaging, CRISPR mutants, and optogenetics map the full food-to-egg pathway
  • How this reveals general principles of neuromodulation and disinhibition

📖 Based on the research article: “Food sensing controls C. elegans reproductive behavior by neuromodulatory disinhibition” Yen-Chih Chen, Kara E. Zang, Hassan Ahamed, Niels Ringstad Published in Science Advances (2025) 🔗 https://doi.org/10.1126/sciadv.adu5829

🎧 Subscribe to the WOrM Podcast for more full-organism insights at the interface of environment, brain, and behaviour.

This podcast is generated with artificial intelligence and curated by Veeren. If you’d like your publication featured on the show, please get in touch.

📩 More info: 🔗 ⁠⁠www.veerenchauhan.com⁠⁠ 📧 [email protected]

In this talk, we will examine how LLM outputs are evaluated by potential end users versus professional linguist-annotators, as two ways of ensuring alignment with real-world user needs and expectations. We will compare the two approaches, highlight the advantages and recurring pitfalls of user-driven annotation, and share the mitigation techniques we have developed from our own experience.

Machine Learning and AI for Absolute Beginners

Explore AI and Machine Learning fundamentals, tools, and applications in this beginner-friendly guide. Learn to build models in Python and understand AI ethics.

Key Features
  • Covers AI fundamentals, Machine Learning, and Python model-building
  • Provides a clear, step-by-step guide to learning AI techniques
  • Explains ethical considerations and the future role of AI in society

Book Description
This book is an ideal starting point for anyone interested in Artificial Intelligence and Machine Learning. It begins with the foundational principles of AI, offering a deep dive into its history, building blocks, and the stages of development. Readers will explore key AI concepts and gradually transition to practical applications, starting with machine learning algorithms such as linear regression and k-nearest neighbors. Through step-by-step Python tutorials, the book helps readers build and implement models with hands-on experience. As the book progresses, readers will dive into advanced AI topics like deep learning, natural language processing (NLP), and generative AI. Topics such as recommender systems and computer vision demonstrate the real-world applications of AI technologies. Ethical considerations and privacy concerns are also addressed, providing insight into the societal impact of these technologies. By the end of the book, readers will have a solid understanding of both the theory and practice of AI and Machine Learning. The final chapters provide resources for continued learning, ensuring that readers can continue to grow their AI expertise beyond the book.

What you will learn
  • Understand key AI and ML concepts and how they work together
  • Build and apply machine learning models from scratch
  • Use Python to implement AI techniques and improve model performance
  • Explore essential AI tools and frameworks used in the industry
  • Learn the importance of data and data preparation in AI development
  • Grasp the ethical considerations and the future of AI in work

Who this book is for
This book is ideal for beginners with no prior knowledge of AI or Machine Learning. It is tailored to those who wish to dive into these topics but are not yet familiar with the terminology or techniques. There are no prerequisites, though basic programming knowledge can be helpful. The book caters to a wide audience, from students and hobbyists to professionals seeking to transition into AI roles. Readers should be enthusiastic about learning and exploring AI applications for the future.

Handbook of Intelligent Automation Systems Using Computer Vision and Artificial Intelligence

The book is essential for anyone seeking to understand and leverage the transformative power of intelligent automation technologies, providing crucial insights into current trends, challenges, and effective solutions that can significantly enhance operational efficiency and decision-making within organizations. Intelligent automation systems, also called cognitive automation, use automation technologies such as artificial intelligence, business process management, and robotic process automation to streamline and scale decision-making across organizations. Intelligent automation simplifies processes, frees up resources, improves operational efficiencies, and has a variety of applications. Intelligent automation systems aim to reduce costs by augmenting the workforce and improving productivity and accuracy through consistent processes and approaches, which enhance quality, improve customer experience, and address compliance and regulations with confidence.

Handbook of Intelligent Automation Systems Using Computer Vision and Artificial Intelligence explores the significant role, current trends, challenges, and potential solutions to existing challenges in the field of intelligent automation systems, making it an invaluable guide for researchers, industry professionals, and students looking to apply these innovative technologies. Readers will find the volume:
  • Offers comprehensive coverage of intelligent automation systems using computer vision and AI, covering everything from foundational concepts to real-world applications and ethical considerations;
  • Provides actionable knowledge with case studies and best practices for intelligent automation systems, computer vision, and AI;
  • Explores the integration of various techniques, including facial recognition, natural language processing, neuroscience, and neuromarketing.

Audience
The book is designed for AI and data scientists, software developers and engineers in industry and academia, as well as business leaders and entrepreneurs who are interested in the applications of intelligent automation systems.

AI and ML for Coders in PyTorch

Eager to learn AI and machine learning but unsure where to start? Laurence Moroney's hands-on, code-first guide demystifies complex AI concepts without relying on advanced mathematics. Designed for programmers, it focuses on practical applications using PyTorch, helping you build real-world models without feeling overwhelmed. From computer vision and natural language processing (NLP) to generative AI with Hugging Face Transformers, this book equips you with the skills most in demand for AI development today. You'll also learn how to deploy your models across the web and cloud confidently.
  • Gain the confidence to apply AI without needing advanced math or theory expertise
  • Discover how to build AI models for computer vision, NLP, and sequence modeling with PyTorch
  • Learn generative AI techniques with Hugging Face Diffusers and Transformers

Large Language Models (LLMs) have revolutionized natural language processing, but they come with limitations such as hallucinations and outdated knowledge. Retrieval-Augmented Generation (RAG) is a practical approach to mitigating these issues by integrating external knowledge retrieval into the LLM generation process.

This tutorial will introduce the core concepts of RAG, walk through its key components, and provide a hands-on session for building a complete RAG pipeline. We will also cover advanced techniques, such as hybrid search, re-ranking, ensemble retrieval, and benchmarking. By the end of this tutorial, participants will be equipped with both the theoretical understanding and practical skills needed to build a robust RAG pipeline.
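The retrieve-then-generate flow described above can be sketched end to end. The toy corpus, word-overlap retriever, and `build_prompt` helper below are illustrative stand-ins, not the tutorial's code; a real pipeline would use an embedding index for retrieval and an actual LLM call for the final generation step:

```python
# Minimal RAG sketch: retrieve top-k passages, then assemble a grounded
# prompt. Corpus and query are invented examples for illustration.

def retrieve(query, corpus, k=2):
    """Rank passages by simple word overlap with the query
    (a stand-in for dense or hybrid retrieval)."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query, passages):
    """Ground the LLM: answer only from retrieved context, citing sources."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the context below and cite sources like [1].\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

corpus = [
    "RAG combines retrieval with generation to ground LLM answers.",
    "GPUs accelerate clustering workloads.",
    "Hybrid search mixes keyword and semantic retrieval.",
]
passages = retrieve("how does RAG ground generation", corpus)
prompt = build_prompt("How does RAG ground generation?", passages)
# `prompt` now carries the top passages; pass it to the LLM of your choice.
```

The instruction to answer only from the numbered context is the hallucination-mitigation step the surrounding text refers to: the model is steered toward retrieved evidence it can cite rather than parametric memory.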

This tutorial will explore GPU-accelerated clustering techniques using RAPIDS cuML, optimizing algorithms like K-Means, DBSCAN, and HDBSCAN for large datasets. Traditional clustering methods struggle with scalability, but GPU acceleration significantly enhances performance and efficiency.

Participants will learn to leverage dimensionality reduction techniques (PCA, t-SNE, UMAP) for better data visualization and apply hyperparameter tuning with Optuna and cuML. The session also includes real-world applications like topic modeling in NLP and customer segmentation. By the end, attendees will be equipped to implement, optimize, and scale clustering algorithms effectively, unlocking faster and more powerful insights in machine learning workflows.
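cuML deliberately mirrors scikit-learn's estimator API, so moving to the GPU is close to a drop-in swap. As a rough illustration of the algorithm being accelerated, here is Lloyd's k-means loop in plain NumPy; the `kmeans` helper and the two toy blobs are assumptions for illustration, not cuML's implementation:

```python
import numpy as np

# Plain-NumPy sketch of Lloyd's k-means loop -- the algorithm that RAPIDS
# cuML's KMeans runs on the GPU. Toy data: two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # blob around (0, 0)
               rng.normal(5.0, 0.3, (50, 2))])  # blob around (5, 5)

def kmeans(X, k=2, iters=20):
    # farthest-point init: start at X[0], then repeatedly add the point
    # farthest from all chosen centers (avoids two seeds in one blob)
    centers = [X[0]]
    while len(centers) < k:
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        # assignment step: each point goes to its nearest center
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        # update step: each center moves to the mean of its points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(X)
```

The assignment step is the bottleneck on large datasets (an all-pairs distance computation), which is exactly the embarrassingly parallel work a GPU accelerates.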

KP Division of Research uses Airflow as a central technology for integrating diverse technologies in an agile setting. We wish to present a set of use-cases for AI/ML workloads, including imaging analysis (tissue segmentation, mammography), NLP (early identification of psychosis), LLM processing (identification of vessel diameter from radiological impressions), and other large data processing tasks. We create these “short-lived” project workflows to accomplish specific aims, and then may never run the job again, so leveraging generalized patterns is crucial to quickly implementing these jobs. Our Advanced Computational Infrastructure comprises multiple Kubernetes clusters, and we use Airflow to democratize the use of our batch-level resources in those clusters. We use Airflow form-based parameters to deploy pods running R and Python scripts, where generalized parameters are injected into scripts that follow internal programming patterns. Finally, we also leverage Airflow to create headless services inside Kubernetes for large computational workloads (Spark & H2O) that subsequent pods consume ephemerally.

Unlock the power of AI agents—even if you’re just starting out. In this hands-on, beginner-friendly workshop, you'll go from understanding how Large Language Models (LLMs) work to building a real AI agent using Python, LangChain, and LangGraph. Live Demo: Your First AI Agent — follow along as we build an AI agent that retrieves, reasons, and responds using LangChain and LangGraph.

Retail Genie: No-Code AI Apps for Empowering BI Users to be Self-Sufficient

Explore how Databricks AI/BI Genie revolutionizes retail analytics, empowering business users to become self-reliant data explorers. This session highlights no-code AI apps that create a conversational interface for retail data analysis. Genie spaces harness NLP and generative AI to convert business questions into actionable insights, bypassing complex SQL queries. We'll showcase retail teams effortlessly analyzing sales trends, inventory and customer behavior through Genie's intuitive interface. Witness real-world examples of AI/BI Genie's adaptive learning, enhancing accuracy and relevance over time. Learn how this technology democratizes data access while maintaining governance via Unity Catalog integration. Discover Retail Genie's impact on decision-making, accelerating insights and cultivating a data-driven retail culture. Join us to see the future of accessible, intelligent retail analytics in action.

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data

This session is repeated. Integrating AI into existing data workflows can be challenging, often requiring specialized knowledge and complex infrastructure. In this session, we'll share how SQL users can leverage AI/ML to access large language models (LLMs) and traditional machine learning directly from within SQL, simplifying the process of incorporating AI into data workflows. We will demonstrate how to use Databricks SQL for natural language processing, traditional machine learning, retrieval augmented generation and more. You'll learn about best practices and see examples of solving common use cases such as opinion mining, sentiment analysis, forecasting and other common AI/ML tasks.

AT&T AutoClassify: Unified Multi-Head Binary Classification From Unlabeled Text

We present AT&T AutoClassify, built jointly by AT&T's Chief Data Office (CDO) and Databricks professional services: a novel end-to-end system for automatic multi-head binary classification from unlabeled text data. Our approach automates the challenge of creating labeled datasets and training multi-head binary classifiers with minimal human intervention. Starting only from a corpus of unlabeled text and a list of desired labels, AT&T AutoClassify leverages advanced natural language processing techniques to automatically mine relevant examples from raw text, fine-tune embedding models, and train individual classifier heads for multiple true/false labels. This solution can reduce LLM classification costs by 1,000x, making it highly efficient in operational cost. The end result is a highly optimized, low-cost model servable in Databricks, capable of taking raw text and producing multiple binary classifications. An example use case using call transcripts will be examined.
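The multi-head structure can be sketched as one shared embedding feeding several independent binary heads. Everything below — the label names, the hash-based `embed` stand-in, and the random head weights — is invented for illustration and is not AT&T's actual system, which fine-tunes a real embedding model and trains each head on automatically mined examples:

```python
import numpy as np

# Sketch of multi-head binary classification: one shared text embedding is
# scored by several tiny logistic "heads", one per label, in a single pass.
rng = np.random.default_rng(1)
EMB_DIM = 8
LABELS = ["billing_issue", "cancellation_intent", "escalation"]  # hypothetical

def embed(text):
    """Stand-in for a fine-tuned sentence-embedding model."""
    local = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return local.normal(size=EMB_DIM)

# One (weights, bias) pair per label -- each head is an independent
# logistic classifier over the SAME shared embedding.
heads = {label: (rng.normal(size=EMB_DIM), 0.0) for label in LABELS}

def classify(text, threshold=0.5):
    """Run every head over the shared embedding -> {label: True/False}."""
    e = embed(text)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return {label: bool(sigmoid(w @ e + b) > threshold)
            for label, (w, b) in heads.items()}

result = classify("customer asked to cancel and dispute a charge")
# result maps every label to a boolean from a single embedding pass
```

The cost advantage comes from this shape: the expensive embedding is computed once per document, while each additional label adds only a dot product, instead of another LLM call.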

This course provides participants with information and practical experience in building advanced LLM (Large Language Model) applications using multi-stage reasoning LLM chains and agents. In the initial section, participants will learn how to decompose a problem into its components and select the most suitable model for each step to enhance business use cases. Following this, participants will construct a multi-stage reasoning chain utilizing LangChain and HuggingFace transformers. Finally, participants will be introduced to agents and will design an autonomous agent using generative models on Databricks.
Pre-requisites: Solid understanding of natural language processing (NLP) concepts; familiarity with prompt engineering and its best practices; experience with the Databricks Data Intelligence Platform; experience with retrieval-augmented generation (RAG) techniques, including data preparation, building RAG architectures, and concepts like embeddings, vectors, and vector databases.
Labs: Yes
Certification Path: Databricks Certified Generative AI Engineer Associate

Retrieval Augmented Generation (RAG) continues to be a foundational approach in AI despite claims of its demise. While some marketing narratives suggest RAG is being replaced by fine-tuning or long context windows, these technologies are actually complementary rather than competitive. But how do you build a truly effective RAG system that delivers accurate results in high-stakes environments? What separates a basic RAG implementation from an enterprise-grade solution that can handle complex queries across disparate data sources? And with the rise of AI agents, how will RAG evolve to support more dynamic reasoning capabilities? Douwe Kiela is the CEO and co-founder of Contextual AI, a company at the forefront of next-generation language model development. He also serves as an Adjunct Professor in Symbolic Systems at Stanford University, where he contributes to advancing the theoretical and practical understanding of AI systems. Before founding Contextual AI, Douwe was the Head of Research at Hugging Face, where he led groundbreaking efforts in natural language processing and machine learning. Prior to that, he was a Research Scientist and Research Lead at Meta’s FAIR (Fundamental AI Research) team, where he played a pivotal role in developing Retrieval-Augmented Generation (RAG)—a paradigm-shifting innovation in AI that combines retrieval systems with generative models for more grounded and contextually aware responses. In the episode, Richie and Douwe explore the misconceptions around the death of Retrieval Augmented Generation (RAG), the evolution to RAG 2.0, its applications in high-stakes industries, the importance of metadata and entitlements in data governance, the potential of agentic systems in enterprise settings, and much more. 
Links Mentioned in the Show:
  • Contextual AI
  • Connect with Douwe
  • Course: Retrieval Augmented Generation (RAG) with LangChain
  • Related Episode: High Performance Generative AI Applications with Ram Sriharsha, CTO at Pinecone
  • Register for RADAR AI - June 26
  • New to DataCamp? Learn on the go using the DataCamp mobile app
  • Empower your business with world-class data and AI skills with DataCamp for business

Conquering PDFs: document understanding beyond plain text

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

This talk explores how leveraging Large Language Models (LLMs) to generate structured customer profile summaries improved both compliance analyst workflows and fraud scoring models at a financial institution. Attendees will learn how embeddings derived from LLM-generated narratives outperformed traditional manual feature engineering and raw text embeddings, offering insights into practical applications of NLP in fraud detection.

Graph Theory for Multi-Agent Integration: Showcase Clinical Use Cases

Graph theory is a well-known concept for algorithms and can be used to orchestrate the building of multi-model pipelines. By translating tasks and dependencies into a Directed Acyclic Graph, we can orchestrate diverse AI models, including NLP, vision, and recommendation capabilities. This tutorial provides a step-by-step approach to designing graph-based AI model pipelines, focusing on clinical use cases from the field.
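The DAG idea can be sketched with Python's standard-library `graphlib`; the clinical task names and their dependencies below are invented examples, not the tutorial's pipeline:

```python
from graphlib import TopologicalSorter

# Orchestrating a multi-model pipeline as a DAG: each key is a stage, each
# value is the set of stages it depends on (hypothetical clinical example).
dag = {
    "ingest_notes": set(),
    "ocr_scans": set(),
    "nlp_entities": {"ingest_notes", "ocr_scans"},      # NLP model
    "vision_findings": {"ocr_scans"},                   # vision model
    "recommend_care": {"nlp_entities", "vision_findings"},  # recommender
}

# A topological order is a valid execution schedule: every stage runs only
# after all of its dependencies have completed.
order = list(TopologicalSorter(dag).static_order())
for stage in order:
    print("running", stage)
```

`TopologicalSorter` also exposes `get_ready()`/`done()` for running independent stages concurrently, which is how orchestration frameworks parallelize branches like the NLP and vision paths above.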

AstraZeneca has implemented a "platform" approach, which serves as a centralized repository of standardized, enterprise-grade, reusable services and capabilities accessible to AI factories. The platform includes user interfaces, APIs that integrate AI services with enterprise systems, and supporting resources like data import tools and agent orchestration services. AstraZeneca will share how, starting with a few generative AI use cases, they successfully identified common services and capabilities, subsequently standardizing these elements to maximize their applicability through the platform. These solutions leverage technologies like GPT models, Natural Language Processing, and Retrieval-Augmented Generation (RAG) architecture.