talk-data.com

Topic: Large Language Models (LLM)

Tags: nlp, ai, machine_learning

1405 tagged activities

Activity Trend: peak of 158 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1405 activities · Newest first

In this session, we’ll walk through why we decided to abandon Redshift, driven by the need for elasticity, cost efficiency, and faster iteration. We'll also discuss how we designed a structured migration blueprint, including a vendor evaluation, proof of concept, and a four-pillared framework to guide our journey.

We’ll then dive into the areas where things got turbulent: why off-the-shelf translation tools failed, why LLMs didn’t help, and the specific technical challenges we encountered, from surrogate key issues to Stitch's unflattened data and Snowplow’s monolithic event tables.

We’ll explain how we kept the migration on track by turning our analysts into co-navigators, embedding them into validation loops to leverage domain knowledge, and how a parallel ingestion strategy helped stabilize progress during high-risk phases.

Finally, we’ll share what this migration unlocked for us: faster ingestion, better permission management and more. Expect real-world pitfalls, lessons learned, and actionable insights for your own migration journey.
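As a rough illustration of the kind of validation loop and parallel-ingestion check described above (not the team's actual tooling), here is a minimal Python sketch that compares row counts between the legacy and target warehouses; the query-runner callables and table names are placeholders.

```python
from typing import Callable, Iterable

# Placeholder query runners: in practice these would wrap the Redshift and
# Snowflake Python connectors; here they simply take SQL and return a scalar.
RunScalar = Callable[[str], int]

def compare_tables(tables: Iterable[str],
                   run_on_legacy: RunScalar,
                   run_on_target: RunScalar) -> list[str]:
    """Return human-readable mismatches between the two warehouses."""
    mismatches = []
    for table in tables:
        sql = f"SELECT COUNT(*) FROM {table}"
        legacy_count = run_on_legacy(sql)
        target_count = run_on_target(sql)
        if legacy_count != target_count:
            mismatches.append(f"{table}: legacy={legacy_count} target={target_count}")
    return mismatches

# Analysts review the mismatch report as part of the validation loop, e.g.:
# report = compare_tables(["orders", "events"], redshift_scalar, snowflake_scalar)
```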

For this demo, starting from a blank page and several data sources, we will go all the way to deploying a Data Analytics application augmented with LLMs, using these two products launched by OVHcloud in 2025.

OVHcloud DataPlatform: a unified solution that lets your teams manage Data & Analytics projects end to end in self-service mode: from collecting all types of data, exploring, storing, and transforming them, to building dashboards shared through dedicated applications. A pay-as-you-go service to speed up deployment and simplify the management of data projects.

AI Endpoints: a serverless solution that lets developers easily integrate advanced AI features into their applications. With more than 40 state-of-the-art open-source models, including LLMs and generative AI, for use cases such as conversational agents, voice models, code assistants, and more, AI Endpoints democratizes the use of AI, whatever the size or sector of the organization.

All of this builds on the best open-source data standards (Apache Iceberg, Spark, SuperSet, Trino, Jupyter Notebooks…) in environments that respect your technological sovereignty.
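A minimal sketch of how an application might call a hosted open-source LLM, assuming an OpenAI-compatible chat completions endpoint; the base URL, token, and model name below are placeholders to be checked against the AI Endpoints documentation, not confirmed values.

```python
from openai import OpenAI

# Hypothetical configuration: replace the base_url, api_key, and model with
# the values from your AI Endpoints catalog and credentials.
client = OpenAI(
    base_url="https://<your-ai-endpoints-host>/v1",  # placeholder
    api_key="YOUR_AI_ENDPOINTS_TOKEN",               # placeholder
)

response = client.chat.completions.create(
    model="an-open-source-llm",  # placeholder model name
    messages=[
        {"role": "system", "content": "You answer questions about a sales dashboard."},
        {"role": "user", "content": "Which region grew fastest last quarter?"},
    ],
)
print(response.choices[0].message.content)
```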

The goal of this workshop is to present an LLM-based tool, developed jointly by Unédic and Artik Consulting, that assesses the compliance of an AI application with the AI Act. The designers of this solution will present the stakes, the technical choices made and the reasons behind them, as well as the results obtained, in particular the evaluation form.

Building Resilient (ML) Pipelines for MLOps

This talk explores the disconnect between the fundamental principles of MLOps and their practical application in designing, operating, and maintaining machine learning pipelines. We’ll break down these principles, examine their influence on pipeline architecture, and conclude with a straightforward, vendor-agnostic mind map offering a roadmap to build resilient MLOps systems for any project or technology stack. Despite the surge in tools and platforms, many teams still struggle with the same underlying issues: brittle data dependencies, poor observability, unclear ownership, and pipelines that silently break once deployed. Architecture alone isn't the answer — systems thinking is.

We'll use concrete examples to walk through common failure modes in ML pipelines, highlight where analogies fall apart, and show how to build systems that tolerate failure, adapt to change, and support iteration without regressions.

Topics covered include:
- Common failure modes in ML pipelines
- Modular design: feature, training, inference
- Built-in observability, versioning, reuse
- Orchestration across batch, real-time, LLMs
- Platform-agnostic patterns that scale

Key takeaways:
- Resilience > diagrams
- Separate concerns, embrace change
- Metadata is your backbone
- Infra should support iteration, not block it
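To make "metadata is your backbone" and the feature/training/inference separation concrete, here is a minimal, vendor-agnostic sketch; the step names, versions, and dataclass are illustrative assumptions, not the speaker's framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable

@dataclass
class StepResult:
    """Metadata captured for every pipeline step: the backbone for observability."""
    name: str
    version: str
    started_at: str
    succeeded: bool
    outputs: dict = field(default_factory=dict)

def run_step(name: str, version: str, fn: Callable[[], dict]) -> StepResult:
    started = datetime.now(timezone.utc).isoformat()
    try:
        outputs = fn()
        return StepResult(name, version, started, True, outputs)
    except Exception as exc:  # a failed step is recorded, not silently dropped
        return StepResult(name, version, started, False, {"error": str(exc)})

# Feature, training, and inference stay separate steps with their own versions,
# so each can change (or fail) without taking down the others.
results = [
    run_step("features", "v3", lambda: {"rows": 10_000}),
    run_step("training", "v12", lambda: {"model_uri": "models/churn/v12"}),
    run_step("inference", "v12", lambda: {"batch": "2025-06-01"}),
]
for r in results:
    print(r.name, r.version, "ok" if r.succeeded else "failed", r.outputs)
```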

Repetita Non Iuvant: Why Generative AI Models Cannot Feed Themselves

As AI floods the digital landscape with content, what happens when it starts repeating itself? This talk explores model collapse, a progressive erosion where LLMs and image generators loop on their own results, hindering the creation of novel output.

We will show how self-training leads to bias and loss of diversity, examine the causes of this degradation, and quantify its impact on model creativity. Finally, we will also present concrete strategies to safeguard the future of generative AI, emphasizing the critical need to preserve innovation and originality.

By the end of this talk, attendees will gain insights into the practical implications of model collapse, understanding its impact on content diversity and the long-term viability of AI.
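As a toy illustration of the collapse dynamic described above (not the speakers' experiment): repeatedly fitting a trivial generative model to samples drawn from the previous generation's fit lets the distribution drift and its spread shrink, a miniature version of the diversity loss seen in LLMs and image generators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=1_000)

for generation in range(1, 11):
    # Fit a trivial generative model (a Gaussian) to the current data...
    mu, sigma = data.mean(), data.std()
    # ...then train the next generation only on its own synthetic output.
    data = rng.normal(loc=mu, scale=sigma, size=1_000)
    print(f"generation {generation}: mean={mu:+.3f} std={sigma:.3f}")

# The standard deviation tends to drift downward across generations: the model
# progressively loses the diversity of the original distribution.
```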

Documents Meet LLMs: Tales from the Trenches

Processing documents with LLMs comes with unexpected challenges: handling long inputs, enforcing structured outputs, catching hallucinations, and recovering from partial failures. In this talk, we’ll cover why large context windows are not a silver bullet, why chunking is deceptively hard, and how to design inputs and outputs that allow for intelligent retries. We'll also share practical prompting strategies, discuss OCR and parsing tools, compare different LLMs (and their cloud APIs), and highlight real-world insights from our experience developing production GenAI applications across multiple document processing scenarios.
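A minimal sketch of two of the patterns mentioned above: chunking with overlap, and output validation that makes a retry targeted instead of blind. The chunk sizes, the `call_llm` callable, and the expected JSON keys are illustrative assumptions.

```python
import json

def chunk_text(text: str, max_chars: int = 4_000, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping chunks so context isn't lost at boundaries."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks

def parse_or_retry(call_llm, prompt: str, required_keys=("invoice_id", "total"), max_tries: int = 3):
    """Ask for JSON, validate it, and retry with an error hint only when needed."""
    hint = ""
    for _ in range(max_tries):
        raw = call_llm(prompt + hint)
        try:
            data = json.loads(raw)
            missing = [k for k in required_keys if k not in data]
            if not missing:
                return data
            hint = f"\nYour previous answer was missing: {missing}. Return valid JSON only."
        except json.JSONDecodeError:
            hint = "\nYour previous answer was not valid JSON. Return valid JSON only."
    raise ValueError("Could not get structured output after retries")
```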

What if your data became truly intelligent?

At the intersection of generative AI, autonomous agents, and knowledge graphs, Neo4j unlocks a new dimension of performance by delivering precise, contextualized answers. By structuring the relationships between your data and integrating RAG (Retrieval-Augmented Generation), Neo4j reduces LLM hallucinations, strengthens the relevance of answers, and multiplies your decision-making capabilities.

Come and discover how this combination is revolutionizing AI workflows, and why Neo4j should be the foundation of your AI strategy.
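An illustrative sketch of the graph-backed RAG pattern described above (the schema, query, and prompt are assumptions, not Neo4j's official recipe): retrieve explicit relationships with Cypher and hand them to the LLM as grounded context.

```python
from neo4j import GraphDatabase

# Placeholder connection details and a hypothetical (:Product)-[:SUPPLIED_BY]->(:Supplier) schema.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def graph_context(product_name: str) -> str:
    query = (
        "MATCH (p:Product {name: $name})-[:SUPPLIED_BY]->(s:Supplier) "
        "RETURN p.name AS product, s.name AS supplier, s.country AS country"
    )
    with driver.session() as session:
        rows = session.run(query, name=product_name).data()
    # Serialize explicit, verifiable relationships instead of fuzzy vector matches.
    return "\n".join(f"{r['product']} is supplied by {r['supplier']} ({r['country']})" for r in rows)

prompt = (
    "Answer using only the facts below.\n"
    f"Facts:\n{graph_context('Widget A')}\n"
    "Question: Who supplies Widget A and from where?"
)
# `prompt` is then sent to the LLM of your choice; the graph keeps the answer grounded.
```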

The cost allocation process in Alteryx relies on automated workflows to distribute FTE costs precisely across activity centers, according to predefined criteria such as headcount or volumes, while respecting specific cost allocation rules and KPI calculations.

1. Data ingestion and preparation

Alteryx connects to several sources (for example ERP, CRM, cloud storage) to extract data related to FTEs, costs, and volumes. The process aggregates, prepares, and aligns these disparate datasets to create a unified cost base.

2. Data quality improvement

Dynamic transformation rules are applied to ensure consistency, remove duplicates, handle missing values, and standardize data types. Data profiling tools provide visibility into anomalies and outliers that could affect the allocation logic.

3. Cost allocation logic

This step defines flexible allocation rules and validation checks, ranging from simple ratios to dynamic rules driven by business needs, based on cost drivers, FTEs, and volumes, to guarantee accurate KPI calculations (a minimal pro-rata example is sketched after step 5).

4. Generative AI integration

Generative AI features (for example via OpenAI or Alteryx's Gen AI tools) enhance the workflow by enabling:

Automatic generation of data schemas adapted to a target format.

Copilot-style assistance for building transformations from natural-language instructions.

Creation of dynamic allocation rules.

5. Output and visualization

Final allocations can be exported to reporting tools, dashboards, or data lakes. Users can review allocation summaries, variances, and detailed views to support decision-making via custom analytic applications.
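To make the driver-based allocation in step 3 concrete outside of Alteryx, here is a minimal pandas sketch of a pro-rata split of FTE costs across activity centers; the column names, centers, and figures are purely illustrative.

```python
import pandas as pd

# Illustrative inputs: a total cost to distribute and a driver table (FTEs per activity center).
total_cost = 1_200_000.0
drivers = pd.DataFrame({
    "activity_center": ["Claims", "Underwriting", "IT", "Support"],
    "fte": [40, 25, 20, 15],
})

# Pro-rata allocation: each center receives a share proportional to its driver value.
drivers["share"] = drivers["fte"] / drivers["fte"].sum()
drivers["allocated_cost"] = (drivers["share"] * total_cost).round(2)

# Simple validation step: the allocation must add back up to the original cost.
assert abs(drivers["allocated_cost"].sum() - total_cost) < 1.0
print(drivers[["activity_center", "fte", "allocated_cost"]])
```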

In a context where 50% of the vehicles on the road in 2040 will still be combustion or hybrid, system robustness remains a key requirement.

The complexity of hybrid powertrains multiplies the potential failure modes. To guarantee a high level of quality, an AI solution developed with Altair accelerates problem analysis and doubles the number of cases handled. It automatically classifies known failures and assists experts through an LLM for complex cases; the data comes from garages (parts, customer feedback, diagnostics). Thanks to the low-code/no-code Altair RapidMiner solution, quality experts can adapt the algorithm without programming.

Horse Technologies is a company born from Renault, with more than 125 years of expertise in powertrain systems. It is now part of Horse Powertrain, the joint venture created by Renault, Geely, and Aramco.

As Germany's largest insurance groups embrace AI transformation, the challenge isn't just implementing large language models—it's building scalable, compliant infrastructure for enterprise-grade AI workflows. This session explores how Inverso, as a specialized Snowflake MSP, is revolutionizing insurance operations by combining Snowflake's data platform with cutting-edge AI to create production-ready solutions for automated claims processing and policy interpretation.

In a context of heavy operational pressure, the French administration has found ways to innovate in order to improve quality of service while optimizing its resources. Thanks to the AI platform developed by ATHENA Decision Systems, which combines an LLM, a rules engine, and intelligent orchestration, the DELPHES project processes requests from foreign nationals at prefectures more quickly and more fairly.

This solution, built on IBM's trusted AI, demonstrates how responsible automation under human supervision can transform complex processes, reduce delays, improve customer satisfaction, and contain costs.

A concrete, inspiring case study for public and private organizations alike that face similar challenges of volume, quality of service, and compliance. Come and discover how this approach can be adapted to your own business challenges.

In this episode, I sit down with Saket Saurabh (CEO of Nexla) to discuss the fundamental shift happening in the AI landscape. The conversation is moving beyond the race to build the biggest foundational models and towards a new battleground: context. We explore what it means to be a "model company" versus a "context company" and how this changes everything for data strategy and enterprise AI.

Join us as we cover:
- Model vs. Context Companies: The emerging divide between companies building models (like OpenAI) and those whose advantage lies in their unique data and integrations.
- The Limits of Current Models: Why we might be hitting an asymptote with the current transformer architecture for solving complex, reliable business processes.
- "Context Engineering": What this term really means, from RAG to stitching together tools, data, and memory to feed AI systems.
- The Resurgence of Knowledge Graphs: Why graph databases are becoming critical for providing deterministic, reliable information to probabilistic AI models, moving beyond simple vector similarity.
- AI's Impact on Tooling: How tools like Lovable and Cursor are changing workflows for prototyping and coding, and the risk of creating the "-10x engineer."
- The Future of Data Engineering: How the field is expanding as AI becomes the primary consumer of data, requiring a new focus on architecture, semantics, and managing complexity at scale.

Deep Research architectures have become increasingly popular over the past three months as users want more complex tasks to be handled by LLMs. The LastMile AI team has gone through a journey to understand what core components are needed for a Deep Research Agent and what qualities enable it to be ready for production. We'll be sharing the core architecture for a Deep Research Agent and how these components influence the behavior of the agent.
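Not the LastMile AI architecture itself, but a hedged sketch of the loop most Deep Research agents share: plan sub-questions, gather evidence for each, then synthesize. The `call_llm` and `search` callables are placeholders.

```python
def deep_research(question: str, call_llm, search, max_subquestions: int = 4) -> str:
    """A bare-bones plan -> gather -> synthesize loop; production agents add memory,
    source ranking, citation tracking, and stopping criteria on top of this."""
    plan = call_llm(
        f"Break the research question into at most {max_subquestions} "
        f"sub-questions, one per line: {question}"
    ).splitlines()[:max_subquestions]

    evidence = []
    for sub in plan:
        docs = search(sub)  # placeholder: web search, vector store, internal APIs...
        summary = call_llm(f"Summarize what these sources say about '{sub}':\n{docs}")
        evidence.append(f"{sub}\n{summary}")

    return call_llm(
        "Write a structured research report answering the question below, "
        f"citing only the gathered evidence.\nQuestion: {question}\n\n" + "\n\n".join(evidence)
    )
```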

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, in 2023 we began developing ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and beyond the social sciences. Already used by an active community of social scientists, ActiveTigger is scheduled for a stable release in early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embedding computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible through both a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
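The workflow above is ActiveTigger's own; the snippet below is only a generic sketch of the active-learning step it automates, using least-confident sampling with a scikit-learn stand-in for the fine-tuned model. All names and the commented usage are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_round(model, X_pool, labeled_idx, batch_size=20):
    """Pick the pool examples the current model is least sure about for human annotation."""
    unlabeled_idx = np.setdiff1d(np.arange(len(X_pool)), labeled_idx)
    probs = model.predict_proba(X_pool[unlabeled_idx])
    uncertainty = 1.0 - probs.max(axis=1)           # least-confident sampling
    to_annotate = unlabeled_idx[np.argsort(-uncertainty)[:batch_size]]
    return to_annotate                               # hand these to the human annotators

# Usage sketch: embeddings X (e.g. from a BERT-like encoder), a few seed labels y_seed,
# then alternate: fit -> select -> annotate -> extend labeled set -> repeat.
# model = LogisticRegression(max_iter=1000).fit(X[seed_idx], y_seed)
# next_batch = active_learning_round(model, X, seed_idx)
```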

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

The project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Investing for Programmers

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI.

Investing for Programmers shows you how to turn your existing skills as a programmer into a knack for making sharper investment choices. You’ll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success.

In Investing for Programmers you’ll learn how to:
- Build stock analysis tools and predictive models
- Identify market-beating investment opportunities
- Design and evaluate algorithmic trading strategies
- Use AI to automate investment research
- Analyze market sentiment with media data mining

You'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buying and selling, and develop a systematic approach to risk management. Don’t worry—there’s no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets and make better decisions for retirement, financial independence, and getting more from your hard-earned money.

About the Technology
A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step by step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions.

About the Book
Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. You’ll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process.

What's Inside
- Build stock analysis tools and predictive models
- Design algorithmic trading strategies
- Use AI to automate investment research
- Analyze market sentiment with media data mining

About the Reader
For professional and hobbyist Python programmers with basic personal finance experience.

About the Author
Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant.

Quotes
Especially valuable for anyone looking to improve their investing. - Armen Kherlopian, Covenant Venture Capital
A great breadth of topics—from basic finance concepts to cutting-edge technology. - Ilya Kipnis, Quantstrat Trader
A top tip for people who want to leverage development skills to improve their investment possibilities. - Michael Zambiasi, Raiffeisen Digital Bank
Brilliantly bridges the worlds of coding and finance. - Thomas Wiecki, PyMC Labs
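In the spirit of the book's Pandas/NumPy/Matplotlib toolchain (not an excerpt from it), a minimal sketch that computes daily returns and a moving average from a CSV of closing prices; the file name and column names are assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: a CSV with 'date' and 'close' columns for a single ticker.
prices = pd.read_csv("prices.csv", parse_dates=["date"]).set_index("date")

prices["daily_return"] = prices["close"].pct_change()
prices["sma_50"] = prices["close"].rolling(window=50).mean()  # 50-day simple moving average

# Annualized volatility from daily returns (assuming ~252 trading days per year).
print("Annualized volatility:", prices["daily_return"].std() * (252 ** 0.5))

prices[["close", "sma_50"]].plot(title="Close vs. 50-day SMA")
plt.show()
```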