talk-data.com

Topic: Natural Language Processing (NLP)

Tags: ai, machine_learning, text_analysis

252 tagged activities

Activity Trend: 24 peak/qtr, 2020-Q1 to 2026-Q1

Activities

252 activities · Newest first

Graph Theory for Computer Science

This book is a vital resource for anyone looking to understand the role of graph theory as the unifying thread that connects, and provides innovative solutions across, a wide spectrum of modern computer science disciplines. Graph theory is a classical mathematical discipline that has evolved into a basic tool for modeling and analyzing complex relationships across technological domains. In natural language processing, for example, it helps explain the semantic and syntactic relationships that underpin many business applications. As disciplines and industries shift towards more interconnected, data-driven decision-making, graph theory will facilitate that transition. Fields such as parallel and distributed computing gain insights into how graph theory supports resource optimization and job scheduling, shaping the design and development of scalable systems. Using a multi-faceted approach, the book covers the fundamentals of graph theory and its role in modeling complex computational processes across computer science.
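To make the NLP connection concrete, here is a minimal illustrative sketch, not taken from the book, that models word co-occurrence in a sentence as a graph with networkx; the sentence and window size are arbitrary assumptions.

```python
# Illustrative sketch (not from the book): a word co-occurrence graph,
# a common way graph theory is applied to text in NLP.
# Assumes networkx is installed; the sentence and window size are arbitrary.
import networkx as nx

sentence = "graph theory models semantic and syntactic relationships between words".split()
window = 2  # co-occurrence window, a hypothetical choice

G = nx.Graph()
for i, word in enumerate(sentence):
    for neighbor in sentence[i + 1 : i + 1 + window]:
        G.add_edge(word, neighbor)

# Degree centrality hints at which words are most connected in this toy graph.
print(sorted(nx.degree_centrality(G).items(), key=lambda kv: -kv[1])[:3])
```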

AWS re:Invent 2025 - High-performance NLP & geospatial analysis with Amazon Redshift (ANT334)

This session explores how Cambridge Mobile Telematics (CMT) uses Amazon Redshift for large-scale geospatial and text data analysis. We'll discuss why CMT chose AWS and Redshift based on scalability and integration needs, highlighting how Redshift's geospatial solution using H3 functions enables efficient processing of billions of location records. The presentation demonstrates CMT's use of Redshift's NLP capabilities, featuring QEv2 and Vega for advanced ad-hoc reporting. Finally, we examine how Redshift's architecture delivers the optimal price-performance balance for CMT's demanding data operations.
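As a rough illustration of the H3 idea behind that geospatial workload (independent of Redshift's own SQL functions), the sketch below buckets raw latitude/longitude records into H3 cells with the open-source h3 Python package; the coordinates and resolution are assumptions, and the function name shown is from the v4 API.

```python
# Minimal sketch: aggregating location records by H3 hexagonal cell.
# Assumptions: the `h3` Python package (v4 API) is installed;
# the coordinates and resolution below are made up for illustration.
from collections import Counter
import h3

records = [
    (42.3601, -71.0589),  # (lat, lon) - hypothetical points
    (42.3605, -71.0592),
    (40.7128, -74.0060),
]
resolution = 8  # roughly neighborhood-sized hexagons

cells = [h3.latlng_to_cell(lat, lon, resolution) for lat, lon in records]
print(Counter(cells).most_common())  # record counts per hexagonal cell
```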

This session presents the paper 'Talking to Patient Records,' which describes an advanced Retrieval-Augmented Generation (RAG) chatbot designed to enhance healthcare information retrieval. By integrating natural language processing with domain-specific knowledge bases, the chatbot allows clinicians, researchers, and administrators to query patient records conversationally. Combining large language models with RAG techniques, it delivers accurate, context-aware, and secure responses, reducing the time required to locate critical patient information. The study outlines the system’s architecture, implementation, and potential applications in clinical decision support, patient engagement, and healthcare data management.
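The paper itself is not quoted here; as a hypothetical sketch of the RAG pattern the abstract describes, the snippet below retrieves the most similar record snippets by embedding similarity and passes them to a placeholder LLM call. The toy embedding and the `generate` stub stand in for a real model and API.

```python
# Hypothetical RAG sketch: retrieve relevant snippets, then ask an LLM.
# The hashed bag-of-words "embedding" and the `generate` stub are placeholders
# for a real embedding model and LLM API; they are not from the paper.
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hashed bag-of-words (stand-in for a real model)."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    return v

def generate(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return f"[LLM would answer here given {len(prompt)} chars of context]"

def answer(question: str, record_snippets: list[str], k: int = 2) -> str:
    q = embed(question)
    scored = sorted(
        record_snippets,
        key=lambda s: float(q @ embed(s)) / (np.linalg.norm(q) * np.linalg.norm(embed(s)) + 1e-9),
        reverse=True,
    )
    context = "\n".join(scored[:k])
    return generate(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

print(answer("What medication is the patient on?",
             ["Patient prescribed metformin 500mg.", "Follow-up visit scheduled."]))
```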

Join Kostia Omelianchuk and Lukas Beisteiner as they unpack the full scope of Grammatical Error Correction (GEC), from task framing, evaluation, and training to inference optimization and serving high-performance production systems at Grammarly. They will discuss: the modern GEC recipe (the shift from heavily human-annotated corpora to semi-synthetic data generation), LLM-as-a-judge techniques for scalable evaluation, and techniques to make deployment fast and affordable, including Speculative Decoding.
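As a hedged illustration of the LLM-as-a-judge idea mentioned above (not Grammarly's actual pipeline), the sketch below asks a judge model to score a correction against its source; `call_judge_model` and the prompt format are assumptions.

```python
# Hypothetical LLM-as-a-judge sketch for GEC evaluation.
# `call_judge_model` is a placeholder, not a real API; the prompt format is an assumption.
import json

def call_judge_model(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a hosted model."""
    return json.dumps({"fluency": 5, "preserves_meaning": True})

def judge_correction(source: str, correction: str) -> dict:
    prompt = (
        "You are grading a grammatical error correction.\n"
        f"Source: {source}\nCorrection: {correction}\n"
        'Reply as JSON: {"fluency": 1-5, "preserves_meaning": true/false}'
    )
    return json.loads(call_judge_model(prompt))

print(judge_correction("She go to school yesterday.", "She went to school yesterday."))
```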

This session showcases the seamless integration of agents in Microsoft Foundry with Microsoft SharePoint, enabling secure and intelligent interactions with enterprise documents through role-based access control (RBAC). By leveraging advanced natural language processing capabilities using Microsoft 365 Copilot alongside robust access controls, users can engage in contextual conversations with their SharePoint content while ensuring data integrity and compliance.

talk
by Tito Osadebey (Keele University; Synectics Solutions; Unify)

Fairness and inclusivity are critical challenges as AI systems influence decisions in healthcare, finance, and everyday life. Yet, most fairness frameworks are developed in limited contexts, often overlooking the data diversity needed for global reliability.

In this talk, Tito Osadebey shares lessons from his research on bias in computer vision models to highlight where fairness efforts often fall short and how data professionals can address these gaps. He’ll outline practical principles for building and evaluating inclusive AI systems, discuss pitfalls that lead to hidden biases, and explore what “fairness” really means in practice.

Tito Osadebey is an AI researcher and data scientist whose work focuses on fairness, inclusivity, and ethical representation in AI systems. He recently published a paper on bias in computer vision models using Nigerian food images, which examines how underrepresentation of the Global South affects model performance and trust.

Tito has contributed to research and industry projects spanning computer vision, NLP, GenAI and data science with organisations including Keele University, Synectics Solutions, and Unify. His work has been featured on BBC Radio, and he led a team from Keele University which secured 3rd place globally at the 2025 IEEE MetroXraine Forensic Handwritten Document Analysis Challenge.

He is passionate about making AI systems more inclusive, context-aware, and equitable, bridging the gap between technical innovation and human understanding.

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic and semantic. In the first type, the meaning is the same but the way the addresses are written differs; for example, one can find "Street" vs "St". In the second type, the meaning is not exactly the same; for example, one can find "Road" instead of "Street". To solve this problem and match addresses, we have a couple of approaches. The first, and simpler, uses similarity metrics. The second uses natural language processing and transformers. This is a hands-on talk intended for data process analysts. We are going to go through these solutions implemented in a Jupyter notebook using Python.
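To give a flavor of the similarity-metric approach, here is a minimal sketch using only the Python standard library; the abbreviation table and example addresses are illustrative assumptions, not the speaker's implementation.

```python
# Minimal address-matching sketch using similarity metrics (standard library only).
# The abbreviation map and example addresses are illustrative assumptions.
from difflib import SequenceMatcher

ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}

def normalize(address: str) -> str:
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

print(similarity("221B Baker St.", "221B Baker Street"))   # high: syntactic difference
print(similarity("221B Baker Road", "221B Baker Street"))  # lower: semantic difference
```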

AI agents are rapidly evolving, from the early efforts to simulate human cognition and social behavior, to sophisticated multi-agent systems that can act as practical collaborators in everyday work. Building on nearly 75 years of AI and NLP research, today’s AI agents can plan, reason, communicate, and even coordinate. This talk explores the growing capabilities of AI agents as both mirrors and extensions of human cognition and behavior. I will discuss advances in multi-agent frameworks that simulate social reasoning, cooperation, and emergent communication, and explore how these capabilities are being translated into workplace applications supporting brainstorming, narrative development, and research design.

In this episode, we talked with Aishwarya Jadhav, a machine learning engineer whose career has spanned Morgan Stanley, Tesla, and now Waymo. Aishwarya shares her journey from big data in finance to applied AI in self-driving, gesture understanding, and computer vision. She discusses building an AI guide dog for the visually impaired, contributing to malaria mapping in Africa, and the challenges of deploying safe autonomous systems. We also explore the intersection of computer vision, NLP, and LLMs, and what it takes to break into the self-driving AI industry.

TIMECODES
00:51 Aishwarya’s career journey from finance to self-driving AI
05:45 Building AI guide dog for the visually impaired
12:03 Exploring LiDAR, radar, and Tesla’s camera-based approach
16:24 Trust, regulation, and challenges in self-driving adoption
19:39 Waymo, ride-hailing, and gesture recognition for traffic control
24:18 Malaria mapping in Africa and AI for social good
29:40 Deployment, safety, and testing in self-driving systems
37:00 Transition from NLP to computer vision and deep learning
43:37 Reinforcement learning, robotics, and self-driving constraints
51:28 Testing processes, evaluations, and staged rollouts for autonomous driving
52:53 Can multimodal LLMs be applied to self-driving?
55:33 How to get started in self-driving AI careers

Connect with Aishwarya
- Linkedin - https://www.linkedin.com/in/aishwaryajadhav8/

Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub: https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/

In this episode, we talked with Ranjitha Kulkarni, a machine learning engineer with a rich career spanning Microsoft, Dropbox, and now NeuBird AI. Ranjitha shares her journey into ML and NLP, her work building recommendation systems, early AI agents, and cutting-edge LLM-powered products. She offers insights into designing reliable AI systems in the new era of generative AI and agents, and how context engineering and dynamic planning shape the future of AI products.

TIMECODES
00:00 Career journey and early curiosity
04:25 Speech recognition at Microsoft
05:52 Recommendation systems and early agents at Dropbox
07:44 Joining NeuBird AI
12:01 Defining agents and LLM orchestration
16:11 Agent planning strategies
18:23 Agent implementation approaches
22:50 Context engineering essentials
30:27 RAG evolution in agent systems
37:39 RAG vs agent use cases
40:30 Dynamic planning in AI assistants
43:00 AI productivity tools at Dropbox
46:00 Evaluating AI agents
53:20 Reliable tool usage challenges
58:17 Future of agents in engineering

Connect with Ranjitha
- Linkedin - https://www.linkedin.com/in/ranjitha-gurunath-kulkarni

Connect with DataTalks.Club:
- Join the community - https://datatalks.club/slack.html
- Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
- Check other upcoming events - https://lu.ma/dtc-events
- GitHub: https://github.com/DataTalksClub
- LinkedIn - https://www.linkedin.com/company/datatalks-club/
- Twitter - https://twitter.com/DataTalksClub
- Website - https://datatalks.club/

Balancing Privacy and Utility: Efficient PII Detection and Replacement in Textual Data

Anonymizing free-text data is harder than it seems. While structured databases have well-established anonymization techniques, textual data — like invoices, resumes, or medical records — poses unique challenges. Personally identifiable information (PII) can appear anywhere, in unpredictable formats, so how do you modify it while preserving the dataset's usefulness?

Let's explore a practical, open-source 2-step approach to text anonymization: (1) detecting PII using NER models and (2) replacing it while preserving key dataset characteristics (e.g. document formatting, statistical distributions). We will demonstrate how to build a robust pipeline leveraging tools such as pre-trained PII detection models, gliner for fine-tuning, or Faker for generating meaningful replacements.

Ideal for those with a basic understanding of NLP, this session offers practical insights for anyone working with sensitive textual data.
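As a rough sketch of the two-step idea (detect, then replace), the snippet below pairs a toy regex detector with Faker-generated replacements; a real pipeline would swap the detector for a pre-trained NER model such as GLiNER, and the sample text is invented.

```python
# Hypothetical two-step PII anonymization sketch: (1) detect spans, (2) replace them.
# The regex-based detector is a stand-in for a real NER model (e.g. GLiNER);
# Faker generates realistic replacement values.
import re
from faker import Faker

fake = Faker()

def detect_pii(text: str) -> list[tuple[int, int, str]]:
    """Toy detector: emails and phone-like numbers. A real pipeline would use NER."""
    spans = []
    for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        spans.append((m.start(), m.end(), "email"))
    for m in re.finditer(r"\+?\d[\d\s-]{7,}\d", text):
        spans.append((m.start(), m.end(), "phone"))
    return spans

def replace_pii(text: str) -> str:
    generators = {"email": fake.email, "phone": fake.phone_number}
    # Replace right-to-left so earlier offsets stay valid.
    for start, end, label in sorted(detect_pii(text), reverse=True):
        text = text[:start] + generators[label]() + text[end:]
    return text

print(replace_pii("Contact Jane at jane.doe@example.com or +1 555 123 4567."))
```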

talk
by Dr. Rui Li (NYU Tandon School of Engineering)

In this talk, Dr. Rui 'Ray' Li presents groundbreaking work on multi-agent AI systems for Educational AI. Educational AI refers to the application of AI to improve teaching, learning, and educational management. It includes personalized learning systems that adapt to individual student needs, intelligent tutoring systems that provide real-time feedback, automated grading tools, and predictive analytics that help educators identify learning gaps. By leveraging NLP, ML, and data-driven insights, educational AI supports more engaging learning experiences, reduces administrative burdens, and enables equitable access to knowledge across diverse student populations. The talk discusses the most recent developments in using AI agents in classroom learning, such as assisting student group projects.

This project develops an enterprise-grade AI platform that automates the extraction of ESG data, regulatory compliance checks, and peer benchmarking for companies. Utilizing NLP and machine learning, the system converts unstructured sustainability reports into standardized metrics, facilitating real-time compliance monitoring and competitive intelligence across various industries. Business Impact: Targets the rapidly growing ESG software market, serving investment firms, consulting companies, and institutional investors requiring automated analysis for portfolio decisions and regulatory compliance.

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, in 2023 we began developing ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and outside the social sciences. Already used by an active community of social scientists, the stable version is planned for early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embeddings computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible both through a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
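To illustrate the human-annotation-with-active-learning step in generic terms (this is a hypothetical sketch, not ActiveTigger's API), the loop below uses uncertainty sampling with scikit-learn to pick the next text to annotate.

```python
# Generic active-learning loop with uncertainty sampling (a sketch, not ActiveTigger's API).
# The texts, seed labels, and the stand-in "annotator" are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great policy speech", "terrible budget decision", "parliament meets today",
         "awful outcome for voters", "wonderful reform announced", "committee hearing scheduled"]
labels = {0: 1, 1: 0}  # index -> class for texts annotated so far (hypothetical seed)

X = TfidfVectorizer().fit_transform(texts)

for _ in range(3):  # a few annotation rounds
    annotated = sorted(labels)
    clf = LogisticRegression().fit(X[annotated], [labels[i] for i in annotated])
    uncertainty = 1 - clf.predict_proba(X).max(axis=1)
    next_idx = max((i for i in range(len(texts)) if i not in labels),
                   key=lambda i: uncertainty[i])
    # A human would label this text in a real tool; here we fake the annotation.
    labels[next_idx] = 1 if "wonderful" in texts[next_idx] or "great" in texts[next_idx] else 0
    print(f"Annotated: {texts[next_idx]!r} -> {labels[next_idx]}")
```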

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

The project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.
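As a generic sketch of the modular, structured-output pattern described above (assumed names throughout, not Sieves' actual API), the snippet below validates a zero-shot model's JSON output against a Pydantic v2 schema before routing it to the next task.

```python
# Generic structured-output pipeline sketch (not Sieves' API).
# `zero_shot_classify` is a stand-in for a real zero-shot model call;
# assumes Pydantic v2 is installed.
import json
from pydantic import BaseModel

class Classification(BaseModel):
    label: str
    confidence: float

def zero_shot_classify(text: str, labels: list[str]) -> str:
    """Placeholder for a zero-shot model returning JSON as text."""
    return json.dumps({"label": labels[0], "confidence": 0.72})

def classify_task(doc: str, labels: list[str]) -> Classification:
    raw = zero_shot_classify(doc, labels)
    return Classification.model_validate_json(raw)  # enforce the schema

def route_task(result: Classification) -> str:
    # Next pipeline step: send low-confidence results to human review.
    return "needs_review" if result.confidence < 0.8 else result.label

doc = "The invoice total is 4,200 EUR, due on 2025-01-31."
print(route_task(classify_task(doc, ["invoice", "contract", "resume"])))
```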

In the world of investment, inflation indicators play a pivotal role in planning for the future. Hedge funds in particular must grapple with text-based signals that provide deep insight into the future of stocks and industries. This talk will showcase how a combination of natural language processing, semantic embeddings, and cutting-edge large language models can help transform those signals into bankable success.

Advances in Artificial Intelligence Applications in Industrial and Systems Engineering

Comprehensive guide offering actionable strategies for enhancing human-centered AI, efficiency, and productivity in industrial and systems engineering through the power of AI.

Advances in Artificial Intelligence Applications in Industrial and Systems Engineering is the first book in the Advances in Industrial and Systems Engineering series, offering insights into AI techniques, challenges, and applications across various industrial and systems engineering (ISE) domains. Not only does the book chart current AI trends and tools for effective integration, but it also raises pivotal ethical concerns and explores the latest methodologies, tools, and real-world examples relevant to today’s dynamic ISE landscape. Readers will gain a practical toolkit for effective integration and utilization of AI in system design and operation. The book also presents the current state of AI across big data analytics, machine learning, artificial intelligence tools, cloud-based AI applications, neural-based technologies, modeling and simulation in the metaverse, intelligent systems engineering, and more, and discusses future trends.

Written by renowned international contributors for an international audience, Advances in Artificial Intelligence Applications in Industrial and Systems Engineering includes information on:
- Reinforcement learning, computer vision and perception, and safety considerations for autonomous systems (AS)
- Natural language processing (NLP) topics including language understanding and generation, sentiment analysis and text classification, and machine translation
- AI in healthcare, covering medical imaging and diagnostics, drug discovery and personalized medicine, and patient monitoring and predictive analysis
- Cybersecurity, covering threat detection and intrusion prevention, fraud detection and risk management, and network security
- Social good applications including poverty alleviation and education, environmental sustainability, and disaster response and humanitarian aid

Advances in Artificial Intelligence Applications in Industrial and Systems Engineering is a timely, essential reference for engineering, computer science, and business professionals worldwide.