talk-data.com

Topic: Large Language Models (LLM)

Tags: nlp, ai, machine_learning

1405 tagged activities

Activity Trend: peak of 158 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1405 activities · Newest first

In this session, we’ll walk through why we decided to abandon Redshift, driven by the need for elasticity, cost efficiency, and faster iteration. We'll also discuss how we designed a structured migration blueprint, including a vendor evaluation, proof of concept, and a four-pillared framework to guide our journey.

We’ll then dive into the areas where things got turbulent: why off-the-shelf translation tools failed, why LLMs didn’t help, and the specific technical challenges we encountered, from surrogate key issues to Stitch's unflattened data and Snowplow’s monolithic event tables.

We’ll explain how we kept the migration on track by turning our analysts into co-navigators, embedding them into validation loops to leverage domain knowledge, and how a parallel ingestion strategy helped stabilize progress during high-risk phases.

Finally, we’ll share what this migration unlocked for us: faster ingestion, better permission management and more. Expect real-world pitfalls, lessons learned, and actionable insights for your own migration journey.
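As a rough illustration of the kind of validation loop and parallel-ingestion check described above (not the team's actual tooling), here is a minimal Python sketch that compares row counts between the legacy and target warehouses; the query-runner callables and table names are placeholders.

```python
from typing import Callable, Iterable

# Placeholder query runners: in practice these would wrap the Redshift and
# Snowflake Python connectors; here they simply take SQL and return a scalar.
RunScalar = Callable[[str], int]

def compare_tables(tables: Iterable[str],
                   run_on_legacy: RunScalar,
                   run_on_target: RunScalar) -> list[str]:
    """Return human-readable mismatches between the two warehouses."""
    mismatches = []
    for table in tables:
        sql = f"SELECT COUNT(*) FROM {table}"
        legacy_count = run_on_legacy(sql)
        target_count = run_on_target(sql)
        if legacy_count != target_count:
            mismatches.append(f"{table}: legacy={legacy_count} target={target_count}")
    return mismatches

# Analysts review the mismatch report as part of the validation loop, e.g.:
# report = compare_tables(["orders", "events"], redshift_scalar, snowflake_scalar)
```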

For this demo, starting from a blank page and several data sources, we will go all the way to deploying a Data Analytics application augmented with LLMs, using these two products launched by OVHcloud in 2025.

OVHcloud DataPlatform: a unified solution that lets your teams manage Data & Analytics projects end to end in self-service mode: from collecting all types of data, exploring, storing, and transforming them, to building dashboards shared through dedicated applications. A pay-as-you-go service to speed up deployment and simplify the management of data projects.

AI Endpoints: a serverless solution that lets developers easily integrate advanced AI features into their applications. With more than 40 state-of-the-art open-source models, including LLMs and generative AI, for use cases such as conversational agents, voice models, code assistants, and more, AI Endpoints democratizes the use of AI, whatever the size or sector of the organization.

All of this builds on the best open-source data standards (Apache Iceberg, Spark, SuperSet, Trino, Jupyter Notebooks…) in environments that respect your technological sovereignty.
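A minimal sketch of how an application might call a hosted open-source LLM, assuming an OpenAI-compatible chat completions endpoint; the base URL, token, and model name below are placeholders to be checked against the AI Endpoints documentation, not confirmed values.

```python
from openai import OpenAI

# Hypothetical configuration: replace the base_url, api_key, and model with
# the values from your AI Endpoints catalog and credentials.
client = OpenAI(
    base_url="https://<your-ai-endpoints-host>/v1",  # placeholder
    api_key="YOUR_AI_ENDPOINTS_TOKEN",               # placeholder
)

response = client.chat.completions.create(
    model="an-open-source-llm",  # placeholder model name
    messages=[
        {"role": "system", "content": "You answer questions about a sales dashboard."},
        {"role": "user", "content": "Which region grew fastest last quarter?"},
    ],
)
print(response.choices[0].message.content)
```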

The goal of this workshop is to present an LLM-based tool, developed jointly by Unédic and Artik Consulting, that assesses the compliance of an AI application with the AI Act. The designers of this solution will present the stakes, the technical choices made and the reasons behind them, as well as the results obtained, in particular the evaluation form.

Building Resilient (ML) Pipelines for MLOps

This talk explores the disconnect between the fundamental principles of MLOps and their practical application in designing, operating, and maintaining machine learning pipelines. We’ll break down these principles, examine their influence on pipeline architecture, and conclude with a straightforward, vendor-agnostic mind map offering a roadmap to build resilient MLOps systems for any project or technology stack. Despite the surge in tools and platforms, many teams still struggle with the same underlying issues: brittle data dependencies, poor observability, unclear ownership, and pipelines that silently break once deployed. Architecture alone isn't the answer — systems thinking is.

We'll use concrete examples to walk through common failure modes in ML pipelines, highlight where analogies fall apart, and show how to build systems that tolerate failure, adapt to change, and support iteration without regressions.

Topics covered include:
- Common failure modes in ML pipelines
- Modular design: feature, training, inference
- Built-in observability, versioning, reuse
- Orchestration across batch, real-time, LLMs
- Platform-agnostic patterns that scale

Key takeaways:
- Resilience > diagrams
- Separate concerns, embrace change
- Metadata is your backbone
- Infra should support iteration, not block it
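To make "metadata is your backbone" and the feature/training/inference separation concrete, here is a minimal, vendor-agnostic sketch; the step names, versions, and dataclass are illustrative assumptions, not the speaker's framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class StepResult:
    """Metadata captured for every pipeline step: the backbone for observability."""
    name: str
    version: str
    started_at: str
    succeeded: bool
    outputs: dict = field(default_factory=dict)

def run_step(name: str, version: str, fn: Callable[[], dict]) -> StepResult:
    started = datetime.now(timezone.utc).isoformat()
    try:
        outputs = fn()
        return StepResult(name, version, started, True, outputs)
    except Exception as exc:  # a failed step is recorded, not silently dropped
        return StepResult(name, version, started, False, {"error": str(exc)})

# Feature, training, and inference stay separate steps with their own versions,
# so each can change (or fail) without taking down the others.
results = [
    run_step("features", "v3", lambda: {"rows": 10_000}),
    run_step("training", "v12", lambda: {"model_uri": "models/churn/v12"}),
    run_step("inference", "v12", lambda: {"batch": "2025-06-01"}),
]
for r in results:
    print(r.name, r.version, "ok" if r.succeeded else "failed", r.outputs)
```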

Repetita Non Iuvant: Why Generative AI Models Cannot Feed Themselves

As AI floods the digital landscape with content, what happens when it starts repeating itself? This talk explores model collapse, a progressive erosion where LLMs and image generators loop on their own results, hindering the creation of novel output.

We will show how self-training leads to bias and loss of diversity, examine the causes of this degradation, and quantify its impact on model creativity. Finally, we will also present concrete strategies to safeguard the future of generative AI, emphasizing the critical need to preserve innovation and originality.

By the end of this talk, attendees will gain insights into the practical implications of model collapse, understanding its impact on content diversity and the long-term viability of AI.
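As a toy illustration of the collapse dynamic described above (not the speakers' experiment): repeatedly fitting a trivial generative model to samples drawn from the previous generation's fit lets the distribution drift and its spread shrink, a miniature version of the diversity loss seen in LLMs and image generators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=1_000)

for generation in range(1, 11):
    # Fit a trivial generative model (a Gaussian) to the current data...
    mu, sigma = data.mean(), data.std()
    # ...then train the next generation only on its own synthetic output.
    data = rng.normal(loc=mu, scale=sigma, size=1_000)
    print(f"generation {generation}: mean={mu:+.3f} std={sigma:.3f}")

# The standard deviation tends to drift downward across generations: the model
# progressively loses the diversity of the original distribution.
```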

Documents Meet LLMs: Tales from the Trenches

Processing documents with LLMs comes with unexpected challenges: handling long inputs, enforcing structured outputs, catching hallucinations, and recovering from partial failures. In this talk, we’ll cover why large context windows are not a silver bullet, why chunking is deceptively hard, and how to design inputs and outputs that allow for intelligent retries. We'll also share practical prompting strategies, discuss OCR and parsing tools, compare different LLMs (and their cloud APIs), and highlight real-world insights from our experience developing production GenAI applications across multiple document processing scenarios.
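A minimal sketch of two of the patterns mentioned above: chunking with overlap, and output validation that makes a retry targeted instead of blind. The chunk sizes, the `call_llm` callable, and the expected JSON keys are illustrative assumptions.

```python
import json

def chunk_text(text: str, max_chars: int = 4_000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping chunks so context isn't lost at boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

def parse_or_retry(call_llm, prompt: str, required_keys=("invoice_id", "total"), max_tries: int = 3):
    """Ask for JSON, validate it, and retry with an error hint only when needed."""
    hint = ""
    for _ in range(max_tries):
        raw = call_llm(prompt + hint)
        try:
            data = json.loads(raw)
            missing = [k for k in required_keys if k not in data]
            if not missing:
                return data
            hint = f"\nYour previous answer was missing: {missing}. Return valid JSON only."
        except json.JSONDecodeError:
            hint = "\nYour previous answer was not valid JSON. Return valid JSON only."
    raise ValueError("Could not get structured output after retries")
```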

What if your data became truly intelligent?

At the intersection of generative AI, autonomous agents, and knowledge graphs, Neo4j unlocks a new dimension of performance by delivering precise, contextualized answers. By structuring the relationships between your data and integrating RAG (Retrieval-Augmented Generation), Neo4j reduces LLM hallucinations, strengthens the relevance of answers, and multiplies your decision-making capabilities.

Come and discover how this combination is revolutionizing AI workflows, and why Neo4j should be the foundation of your AI strategy.
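An illustrative sketch of the graph-backed RAG pattern described above (the schema, query, and prompt are assumptions, not Neo4j's official recipe): retrieve explicit relationships with Cypher and hand them to the LLM as grounded context.

```python
from neo4j import GraphDatabase

# Placeholder connection details and a hypothetical (:Product)-[:SUPPLIED_BY]->(:Supplier) schema.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def graph_context(product_name: str) -> str:
    query = (
        "MATCH (p:Product {name: $name})-[:SUPPLIED_BY]->(s:Supplier) "
        "RETURN p.name AS product, s.name AS supplier, s.country AS country"
    )
    with driver.session() as session:
        rows = session.run(query, name=product_name).data()
    # Serialize explicit, verifiable relationships instead of fuzzy vector matches.
    return "\n".join(f"{r['product']} is supplied by {r['supplier']} ({r['country']})" for r in rows)

prompt = (
    "Answer using only the facts below.\n"
    f"Facts:\n{graph_context('Widget A')}\n"
    "Question: Who supplies Widget A and from where?"
)
# `prompt` is then sent to the LLM of your choice; the graph keeps the answer grounded.
```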

The cost allocation process in Alteryx relies on automated workflows to distribute FTE costs precisely across activity centers, according to predefined criteria such as headcount or volumes, while respecting specific cost allocation rules and KPI calculations.

1. Data ingestion and preparation

Alteryx connects to several sources (for example ERP, CRM, cloud storage) to extract data related to FTEs, costs, and volumes. The process aggregates, prepares, and aligns these disparate datasets to create a unified cost base.

2. Data quality improvement

Dynamic transformation rules are applied to ensure consistency, remove duplicates, handle missing values, and standardize data types. Data profiling tools provide visibility into anomalies and outliers that could affect the allocation logic.

3. Cost allocation logic

This step defines flexible allocation rules and validation checks, ranging from simple ratios to dynamic rules driven by business needs, based on cost drivers, FTEs, and volumes, to guarantee accurate KPI calculations (a minimal pro-rata example is sketched after step 5).

4. Generative AI integration

Generative AI features (for example via OpenAI or Alteryx's Gen AI tools) enhance the workflow by enabling:

Automatic generation of data schemas adapted to a target format.

Copilot-style assistance for building transformations from natural-language instructions.

Creation of dynamic allocation rules.

5. Output and visualization

Final allocations can be exported to reporting tools, dashboards, or data lakes. Users can review allocation summaries, variances, and detailed views to support decision-making via custom analytic applications.
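To make the driver-based allocation in step 3 concrete outside of Alteryx, here is a minimal pandas sketch of a pro-rata split of FTE costs across activity centers; the column names, centers, and figures are purely illustrative.

```python
import pandas as pd

# Illustrative inputs: a total cost to distribute and a driver table (FTEs per activity center).
total_cost = 1_200_000.0
drivers = pd.DataFrame({
    "activity_center": ["Claims", "Underwriting", "IT", "Support"],
    "fte": [40, 25, 20, 15],
})

# Pro-rata allocation: each center receives a share proportional to its driver value.
drivers["share"] = drivers["fte"] / drivers["fte"].sum()
drivers["allocated_cost"] = (drivers["share"] * total_cost).round(2)

# Simple validation step: the allocation must add back up to the original cost.
assert abs(drivers["allocated_cost"].sum() - total_cost) < 1.0
print(drivers[["activity_center", "fte", "allocated_cost"]])
```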

In a context where 50% of the vehicles on the road in 2040 will still be combustion or hybrid, system robustness remains a key requirement.

The complexity of hybrid powertrains multiplies the potential failure modes. To guarantee a high level of quality, an AI solution developed with Altair accelerates problem analysis and doubles the number of cases handled. It automatically classifies known failures and assists experts through an LLM for complex cases; the data comes from garages (parts, customer feedback, diagnostics). Thanks to the low-code/no-code Altair RapidMiner solution, quality experts can adapt the algorithm without programming.

Horse Technologies is a company born from Renault, with more than 125 years of expertise in powertrain systems. It is now part of Horse Powertrain, the joint venture created by Renault, Geely, and Aramco.

As Germany's largest insurance groups embrace AI transformation, the challenge isn't just implementing large language models—it's building scalable, compliant infrastructure for enterprise-grade AI workflows. This session explores how Inverso, as a specialized Snowflake MSP, is revolutionizing insurance operations by combining Snowflake's data platform with cutting-edge AI to create production-ready solutions for automated claims processing and policy interpretation.

In a context of heavy operational pressure, the French administration has found ways to innovate in order to improve quality of service while optimizing its resources. Thanks to the AI platform developed by ATHENA Decision Systems, which combines an LLM, a rules engine, and intelligent orchestration, the DELPHES project processes requests from foreign nationals at prefectures more quickly and more fairly.

This solution, built on IBM's trusted AI, demonstrates how responsible automation under human supervision can transform complex processes, reduce delays, improve customer satisfaction, and contain costs.

A concrete, inspiring case study for public and private organizations alike that face similar challenges of volume, quality of service, and compliance. Come and discover how this approach can be adapted to your own business challenges.

In this episode, I sit down with Saket Saurabh (CEO of Nexla) to discuss the fundamental shift happening in the AI landscape. The conversation is moving beyond the race to build the biggest foundational models and towards a new battleground: context. We explore what it means to be a "model company" versus a "context company" and how this changes everything for data strategy and enterprise AI.

Join us as we cover:
- Model vs. Context Companies: The emerging divide between companies building models (like OpenAI) and those whose advantage lies in their unique data and integrations.
- The Limits of Current Models: Why we might be hitting an asymptote with the current transformer architecture for solving complex, reliable business processes.
- "Context Engineering": What this term really means, from RAG to stitching together tools, data, and memory to feed AI systems.
- The Resurgence of Knowledge Graphs: Why graph databases are becoming critical for providing deterministic, reliable information to probabilistic AI models, moving beyond simple vector similarity.
- AI's Impact on Tooling: How tools like Lovable and Cursor are changing workflows for prototyping and coding, and the risk of creating the "-10x engineer."
- The Future of Data Engineering: How the field is expanding as AI becomes the primary consumer of data, requiring a new focus on architecture, semantics, and managing complexity at scale.

Deep Research architectures have become increasingly popular over the past three months as users want more complex tasks to be handled by LLMs. The LastMile AI team has gone through a journey to understand what core components are needed for a Deep Research Agent and what qualities enable it to be ready for production. We'll be sharing the core architecture for a Deep Research Agent and how these components influence the behavior of the agent.
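Not the LastMile AI architecture itself, but a hedged sketch of the loop most Deep Research agents share: plan sub-questions, gather evidence for each, then synthesize. The `call_llm` and `search` callables are placeholders.

```python
def deep_research(question: str, call_llm, search, max_subquestions: int = 4) -> str:
    """A bare-bones plan -> gather -> synthesize loop; production agents add memory,
    source ranking, citation tracking, and stopping criteria on top of this."""
    plan = call_llm(
        f"Break the research question into at most {max_subquestions} "
        f"sub-questions, one per line: {question}"
    ).splitlines()[:max_subquestions]

    evidence = []
    for sub in plan:
        docs = search(sub)  # placeholder: web search, vector store, internal APIs...
        summary = call_llm(f"Summarize what these sources say about '{sub}':\n{docs}")
        evidence.append(f"{sub}\n{summary}")

    return call_llm(
        "Write a structured research report answering the question below, "
        f"citing only the gathered evidence.\nQuestion: {question}\n\n" + "\n\n".join(evidence)
    )
```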

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, in 2023 we began developing ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and beyond the social sciences. Already used by an active community of social scientists, ActiveTigger is scheduled for a stable release in early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embedding computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible through both a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
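The workflow above is ActiveTigger's own; the snippet below is only a generic sketch of the active-learning step it automates, using least-confident sampling with a scikit-learn stand-in for the fine-tuned model. All names and the commented usage are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_round(model, X_pool, labeled_idx, batch_size=20):
    """Pick the pool examples the current model is least sure about for human annotation."""
    unlabeled_idx = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    probs = model.predict_proba(X_pool[unlabeled_idx])
    uncertainty = 1.0 - probs.max(axis=1)           # least-confident sampling
    to_annotate = unlabeled_idx[np.argsort(-uncertainty)[:batch_size]]
    return to_annotate                               # hand these to the human annotators

# Usage sketch: embeddings X (e.g. from a BERT-like encoder), a few seed labels y_seed,
# then alternate: fit -> select -> annotate -> extend labeled set -> repeat.
# model = LogisticRegression(max_iter=1000).fit(X[seed_idx], y_seed)
# next_batch = active_learning_round(model, X, seed_idx)
```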

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

The project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Investing for Programmers

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI.

Investing for Programmers shows you how to turn your existing skills as a programmer into a knack for making sharper investment choices. You’ll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success.

In Investing for Programmers you’ll learn how to:
- Build stock analysis tools and predictive models
- Identify market-beating investment opportunities
- Design and evaluate algorithmic trading strategies
- Use AI to automate investment research
- Analyze market sentiment with media data mining

You'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buying and selling, and develop a systematic approach to risk management. Don’t worry—there’s no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets and make better decisions for retirement, financial independence, and getting more from your hard-earned money.

About the Technology
A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step by step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions.

About the Book
Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. You’ll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process.

What's Inside
- Build stock analysis tools and predictive models
- Design algorithmic trading strategies
- Use AI to automate investment research
- Analyze market sentiment with media data mining

About the Reader
For professional and hobbyist Python programmers with basic personal finance experience.

About the Author
Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant.

Quotes
Especially valuable for anyone looking to improve their investing. - Armen Kherlopian, Covenant Venture Capital
A great breadth of topics—from basic finance concepts to cutting-edge technology. - Ilya Kipnis, Quantstrat Trader
A top tip for people who want to leverage development skills to improve their investment possibilities. - Michael Zambiasi, Raiffeisen Digital Bank
Brilliantly bridges the worlds of coding and finance. - Thomas Wiecki, PyMC Labs
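In the spirit of the book's Pandas/NumPy/Matplotlib toolchain (not an excerpt from it), a minimal sketch that computes daily returns and a moving average from a CSV of closing prices; the file name and column names are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: a CSV with 'date' and 'close' columns for a single ticker.
prices = pd.read_csv("prices.csv", parse_dates=["date"]).set_index("date")

prices["daily_return"] = prices["close"].pct_change()
prices["sma_50"] = prices["close"].rolling(window=50).mean()  # 50-day simple moving average

# Annualized volatility from daily returns (assuming ~252 trading days per year).
print("Annualized volatility:", prices["daily_return"].std() * (252 ** 0.5))

prices[["close", "sma_50"]].plot(title="Close vs. 50-day SMA")
plt.show()
```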