talk-data.com

Topic

API

Application Programming Interface (API)

integration software_development data_exchange

856 tagged activities

Activity Trend: peak of 65 activities per quarter, 2020-Q1 to 2026-Q1

Activities

856 activities · Newest first

Deploying AI agents in production raises three challenges: data security, infrastructure sovereignty, and reliability of results.

With the IONOS AI Model Hub, you access open-source models through an OpenAI-compatible, stateless API operated in Europe, guaranteeing compliance and sovereignty.
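
Because the endpoint is OpenAI-compatible, the standard OpenAI Python client can usually be pointed at it by overriding the base URL; a minimal sketch, where the base URL, model name, and environment variable are placeholders rather than IONOS-specific settings:

```python
import os
from openai import OpenAI

# Base URL, model name, and env var are placeholders, not official IONOS values.
client = OpenAI(
    base_url="https://<your-model-hub-endpoint>/v1",
    api_key=os.environ["MODEL_HUB_API_KEY"],
)

response = client.chat.completions.create(
    model="<open-source-model-name>",
    messages=[{"role": "user", "content": "Summarize our data-sovereignty requirements."}],
)
print(response.choices[0].message.content)
```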

Discover a concrete example of a multi-agent system that combines LLMs, access to varied sources, tool use, processing steps, and selection mechanisms to deliver reliable, contextualized answers.

The NORMA platform, available as open source, evaluates each step (extraction, classification, generation) to detect weaknesses or regressions and to guarantee safe behavior.

Its batch-testing and continuous-integration capabilities let you compare versions, track quality over time, and block any regression before it reaches production.

By combining IONOS's sovereign infrastructure with NORMA's continuous evaluation, you get a robust pipeline for turning your PoCs into reliable, secure AI solutions!

Discover how we transformed enterprise data interaction with SiemensGPT, which serves over 50,000 active users through Snowflake's Cortex Analyst API. Our plugin architecture, powered by the ReAct agent model, converts natural language into SQL queries and dynamic visualizations, orchestrating everything through a unified interface. Beyond productivity gains, this solution democratizes data access across Siemens, enabling employees at all levels to derive business insights through simple conversations.
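
As an illustration of the underlying pattern (not Siemens' implementation), a natural-language question can be posted to Cortex Analyst's REST endpoint and the returned SQL executed by the plugin; the endpoint path, payload fields, and semantic model reference below are assumptions to be checked against Snowflake's documentation.

```python
import requests

ACCOUNT_URL = "https://<account>.snowflakecomputing.com"  # placeholder
TOKEN = "<oauth-or-keypair-jwt>"                          # placeholder

# Endpoint path and payload shape are assumptions, not a documented excerpt.
payload = {
    "messages": [
        {"role": "user",
         "content": [{"type": "text", "text": "Top 5 plants by output last quarter?"}]}
    ],
    "semantic_model_file": "@ANALYTICS.PUBLIC.MODELS/manufacturing.yaml",
}

resp = requests.post(
    f"{ACCOUNT_URL}/api/v2/cortex/analyst/message",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=60,
)
resp.raise_for_status()
# The response typically includes generated SQL, which a plugin would run and visualize.
print(resp.json())
```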

CoSApp: an open-source library to design complex systems

CoSApp, for Collaborative System Approach, is a Python library dedicated to the simulation and design of multi-disciplinary systems. It is primarily intended for engineers and system architects during the early stages of industrial product design. The API of CoSApp focuses on simplicity and the explicit declaration of design problems. Special attention is given to modularity: a very flexible solver-assembly mechanism allows users to construct complex, customized simulation workflows. This presentation introduces the key features of the framework.

https://cosapp.readthedocs.io https://gitlab.com/cosapp/cosapp
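
A minimal sketch of what a CoSApp model can look like, assuming the System/driver pattern described in the documentation linked above; attribute and driver names may differ in the current release.

```python
from cosapp.base import System
from cosapp.drivers import NonLinearSolver

class Resistor(System):
    """Toy component: Ohm's law expressed as a CoSApp System."""
    def setup(self):
        self.add_inward("R", 100.0, desc="Resistance [ohm]")
        self.add_inward("U", 1.0, desc="Voltage drop [V]")
        self.add_outward("I", 0.0, desc="Current [A]")

    def compute(self):
        self.I = self.U / self.R

r = Resistor("r")
solver = r.add_driver(NonLinearSolver("solver"))
# Declare a design problem: find the voltage that yields a 10 mA current.
solver.add_unknown("U")
solver.add_equation("I == 0.01")
r.run_drivers()
print(r.U, r.I)
```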

Modern data engineering leverages Python to build robust, scalable, end-to-end workflows. In this talk, we will cover how Snowflake offers a flexible development environment for developing Python data pipelines, performing transformations at scale, and orchestrating and deploying your pipelines. Topics we'll cover include:

Ingest: data source APIs, plus reading and ingesting files of any format as they arrive, including from sources outside Snowflake

Develop: packaging (artifact repository), Python runtimes, IDEs (Notebook, VS Code)

Transform: Snowpark pandas, UDFs, UDAFs

Deploy: Tasks, Notebook scheduling
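
As a flavour of the Transform step, here is a small Snowpark sketch, assuming an existing Snowflake account; the connection parameters, table, and column names are illustrative.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

# Connection parameters are placeholders for an existing account.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Illustrative transformation: aggregate raw orders into a daily revenue table.
orders = session.table("RAW_ORDERS")
daily = (
    orders.group_by(col("ORDER_DATE"))
          .agg(sum_(col("AMOUNT")).alias("DAILY_REVENUE"))
)
daily.write.save_as_table("DAILY_REVENUE_BY_DAY", mode="overwrite")
```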

Documents Meet LLMs: Tales from the Trenches

Processing documents with LLMs comes with unexpected challenges: handling long inputs, enforcing structured outputs, catching hallucinations, and recovering from partial failures. In this talk, we'll cover why large context windows are not a silver bullet, why chunking is deceptively hard, and how to design inputs and outputs that allow for intelligent retries. We'll also share practical prompting strategies, discuss OCR and parsing tools, compare different LLMs (and their cloud APIs), and highlight real-world insights from our experience developing production GenAI applications with multiple document processing scenarios.
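
One common way to make outputs retriable, sketched here with Pydantic validation and a bounded retry loop; the schema and model name are illustrative choices, not the speakers' implementation.

```python
from openai import OpenAI
from pydantic import BaseModel, ValidationError

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str

client = OpenAI()  # assumes OPENAI_API_KEY is set; any OpenAI-compatible API works similarly

def extract_invoice(text: str, max_attempts: int = 3) -> Invoice:
    prompt = f"Extract vendor, total and currency as JSON from:\n{text}"
    for attempt in range(max_attempts):
        raw = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        ).choices[0].message.content
        try:
            return Invoice.model_validate_json(raw)
        except ValidationError as err:
            # Feed the validation error back so the next attempt can self-correct.
            prompt += f"\nPrevious answer was invalid: {err}. Return only valid JSON."
    raise RuntimeError("Extraction failed after retries")
```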

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, in 2023 we began developing ActiveTigger, a lightweight, open-source Python application (with a web frontend in React) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and outside the social sciences. ActiveTigger is already used by an active community of social scientists, and the stable version is planned for early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embedding computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible both through a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
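
To give a flavour of such a workflow, here is a purely hypothetical sketch of driving the annotation loop over HTTP; every endpoint path and field name below is invented for illustration and is not ActiveTigger's actual API (see the repository below for the real interface).

```python
import requests

BASE = "http://localhost:8000"                  # hypothetical local ActiveTigger server
HEADERS = {"Authorization": "Bearer <token>"}   # hypothetical auth scheme

# All endpoint paths and payload fields below are invented for illustration only.
requests.post(f"{BASE}/projects", json={"name": "press-stances", "language": "fr"}, headers=HEADERS)
requests.post(f"{BASE}/projects/press-stances/embeddings", json={"model": "sentence-transformers"}, headers=HEADERS)

# Active-learning loop: fetch the most informative texts, annotate, retrain.
for _ in range(10):
    batch = requests.get(f"{BASE}/projects/press-stances/next", headers=HEADERS).json()
    labels = [{"id": item["id"], "label": "pro"} for item in batch]   # human labels in practice
    requests.post(f"{BASE}/projects/press-stances/annotations", json=labels, headers=HEADERS)

requests.post(f"{BASE}/projects/press-stances/train", json={"base_model": "camembert-base"}, headers=HEADERS)
```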

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

The repository of the project: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

A Hitchhiker's Guide to the Array API Standard Ecosystem

The array API standard is unifying the ecosystem of Python array computing, facilitating greater interoperability between code written for different array libraries, including NumPy, CuPy, PyTorch, JAX, and Dask.

But what are all of these "array-api-" libraries for? How can you use them to 'future-proof' your own libraries, and provide support for GPU and distributed arrays to your users? Find out in this talk, where I'll guide you through every corner of the array API standard ecosystem, explaining how SciPy and scikit-learn are using all of these tools to adopt the standard. I'll also be sharing progress updates from the past year, to give you a clear picture of where we are now, and what the future holds.
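
A minimal sketch of array-library-agnostic code using the array-api-compat helper, similar in spirit to how SciPy and scikit-learn adopt the standard.

```python
import numpy as np
from array_api_compat import array_namespace  # dispatches to the array's own namespace

def standardize(x):
    """Center and scale an array from any standard-compliant library."""
    xp = array_namespace(x)          # NumPy, CuPy, PyTorch, JAX, ... depending on the input
    mean = xp.mean(x, axis=0)
    std = xp.std(x, axis=0)
    return (x - mean) / std

print(standardize(np.asarray([[1.0, 2.0], [3.0, 4.0]])))

# The same function works unchanged on, e.g., a torch.Tensor if PyTorch is installed:
# import torch; standardize(torch.tensor([[1.0, 2.0], [3.0, 4.0]]))
```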

Investing for Programmers

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI. Investing for Programmers shows you how you can turn your existing skills as a programmer into a knack for making sharper investment choices. You'll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success.

In Investing for Programmers you'll learn how to:

Build stock analysis tools and predictive models

Identify market-beating investment opportunities

Design and evaluate algorithmic trading strategies

Use AI to automate investment research

Analyze market sentiments with media data mining

In Investing for Programmers you'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buy-sell, and develop a systematic approach to risk management. Don't worry—there's no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets, and make better decisions for retirement, financial independence, and getting more from your hard-earned money.

About the Technology

A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step-by-step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions.

About the Book

Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. You'll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process.

What's Inside

Build stock analysis tools and predictive models

Design algorithmic trading strategies

Use AI to automate investment research

Analyze market sentiment with media data mining

About the Reader

For professional and hobbyist Python programmers with basic personal finance experience.

About the Author

Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant.

Quotes

Especially valuable for anyone looking to improve their investing. - Armen Kherlopian, Covenant Venture Capital

A great breadth of topics—from basic finance concepts to cutting-edge technology. - Ilya Kipnis, Quantstrat Trader

A top tip for people who want to leverage development skills to improve their investment possibilities. - Michael Zambiasi, Raiffeisen Digital Bank

Brilliantly bridges the worlds of coding and finance. - Thomas Wiecki, PyMC Labs
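
As a flavour of the kind of analysis the book describes (not an excerpt from it), a short pandas sketch computing moving averages; the ticker and the use of yfinance as a data source are illustrative choices.

```python
import pandas as pd
import yfinance as yf  # illustrative data source; any OHLC price feed works

# Fetch daily closes for an example ticker and compute two moving averages.
prices = yf.Ticker("AAPL").history(start="2022-01-01", end="2024-01-01")["Close"]
signals = pd.DataFrame({"close": prices})
signals["sma_50"] = signals["close"].rolling(50).mean()
signals["sma_200"] = signals["close"].rolling(200).mean()

# A naive "golden cross" flag: short-term average above the long-term one.
signals["long"] = (signals["sma_50"] > signals["sma_200"]).astype(int)
print(signals.dropna().tail())
```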

Minus Three Tier: Data Architecture Turned Upside Down

Every data architecture diagram out there makes it abundantly clear who's in charge: at the bottom sits the analyst, above that is an API server, and at the very top sits the mighty data warehouse. This pattern is so ingrained that we never question its necessity, despite problems like slow response times, multi-level scaling issues, and massive cost.

But there is another way: decoupling storage from compute lets query processing move closer to the people who use it, leading to much snappier responses, natural scaling through client-side query processing, and much lower cost.

This talk discusses how modern data engineering paradigms like decomposed storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture we can put users' needs first, relying on commoditised components like object stores to deliver fast, scalable, and cost-effective solutions.
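
To make the inversion concrete, one possible embodiment (my illustration, not necessarily the speaker's stack) is a single-node engine such as DuckDB querying Parquet files in object storage directly from the client; bucket, path, and credentials are placeholders.

```python
import duckdb

con = duckdb.connect()
# httpfs lets DuckDB read directly from S3-compatible object storage.
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='eu-west-1'")  # placeholder region; credentials set similarly

# The "warehouse" is just files in a bucket; the query runs locally, next to the analyst.
result = con.execute("""
    SELECT customer_id, SUM(amount) AS revenue
    FROM read_parquet('s3://my-lakehouse/orders/*.parquet')
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 10
""").fetch_df()
print(result)
```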

In this talk, you will learn about some of the common challenges that you might encounter while developing meaningful applications with large language models.

Using non-deterministic systems as the basis for important applications is certainly an 'interesting' new frontier for software development, but hope is not lost. In this session, we will explore some of the well-known (and less well-known) issues in building applications on the APIs provided by LLM providers, and on 'open' LLMs such as Mistral, Llama, or DeepSeek.

We will also (of course) dive into some of the approaches that you can take to address these challenges, and mitigate some of the inherent behaviors that are present within LLMs, enabling you to build more reliable and robust systems on top of LLMs, unlocking the potential of this new development paradigm.

Face To Face
by Roberto Flores (Magnum Ice Cream Company (a division of Unilever))

In this session, we will explore the world of small language models, focusing on their unique advantages and practical applications. We will cover the basics of language models, the benefits of using smaller models, and provide hands-on examples to help beginners get started. By the end of the session, attendees will have a solid understanding of how to leverage small language models in their projects. The session will highlight the efficiency, customization, and adaptability of small models, making them ideal for edge devices and real-time applications.

We will introduce attendees to two widely used Small Language Models: Qwen3 and SmolLM3. Specifically, we will cover:

1. Accessing Models: How to navigate HuggingFace to explore and select available models. How to view model documentation and determine its usefulness for specific tasks

2. Deployment: How to get started using

(a) Inference Provider - using HuggingFace inference API or Google CLI

(b) On-Tenant - using Databricks Model Serving

(c) Running the Model Locally - using Ollama and LM Studio (see the sketch after this list)

3. We also examine the tradeoffs of each route
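
As a companion to route (c), a minimal sketch of local inference with the Ollama Python client; the model tag is an assumption, and the model must be pulled locally first (e.g. ollama pull qwen3:0.6b).

```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# Model tag is an assumption; pull it first with `ollama pull qwen3:0.6b`.
response = ollama.chat(
    model="qwen3:0.6b",
    messages=[{"role": "user", "content": "In one sentence, what is a small language model?"}],
)
print(response["message"]["content"])
```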

This session will provide a Maia demo with roadmap teasers. The demo will showcase Maia's core capabilities: authoring pipelines in business language, multiplying productivity by accelerating tasks, and enabling self-service. It demonstrates how Maia takes natural language prompts and translates them into YAML-based, human-readable Data Pipeline Language (DPL), generating graphical pipelines. Expect to see Maia interacting with Snowflake metadata to sample data and suggest transformations, as well as its ability to troubleshoot and debug pipelines in real time. The session will also cover how Maia can create custom connectors from REST API documentation in seconds, a task that traditionally takes days. Roadmap teasers will likely include the upcoming Semantic Layer, a Pipeline Reviewing Agent, and enhanced file type support for various legacy ETL tools and code conversions.

In this 20-minute session, you'll learn how to build a custom Fivetran connector using the Fivetran Connector SDK and the Anthropic Workbench (AI Assistant) to integrate data from a custom REST API into Snowflake. 
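
A rough sketch of the shape of such a connector, based on the Connector SDK's published examples; the REST endpoint and field names are placeholders, and the SDK surface should be verified against current Fivetran documentation.

```python
import requests
from fivetran_connector_sdk import Connector, Operations as op

# Placeholder REST API; in practice the URL and auth come from the connector configuration.
API_URL = "https://api.example.com/v1/orders"

def update(configuration: dict, state: dict):
    cursor = state.get("cursor")
    resp = requests.get(API_URL, params={"since": cursor}, timeout=30)
    resp.raise_for_status()
    for row in resp.json()["orders"]:
        # Each upsert lands as a row in the destination (here, Snowflake).
        yield op.upsert(table="orders", data=row)
        cursor = row["updated_at"]
    # Persist incremental progress so the next sync resumes where this one stopped.
    yield op.checkpoint({"cursor": cursor})

connector = Connector(update=update)

if __name__ == "__main__":
    connector.debug()  # local test run
```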

You'll then learn how to create a Streamlit in Snowflake data application powering metrics and Snowflake Cortex AI-driven applications.

In this session, Omni CEO Colin Zima and VP of Product Arielle Strong will share how early experiments led to AI features our customers actually use and love: from natural language chat, to embeddable AI products, to APIs and an MCP server. 

They’ll walk through what worked, what didn’t, and how AI has reshaped our product roadmap. Expect real-world examples of AI analytics in production, along with best practices for getting your data AI-ready.

As AI adoption accelerates across industries, many organisations are realising that building a model is only the beginning. Real-world deployment of AI demands robust infrastructure, clean and connected data, and secure, scalable MLOps pipelines. In this panel, experts from across the AI ecosystem share lessons from the frontlines of operationalising AI at scale.

We’ll dig into the tough questions:

• What are the biggest blockers to AI adoption in large enterprises — and how can we overcome them?

• Why does bad data still derail even the most advanced models, and how can we fix the data quality gap?

• Where does synthetic data fit into real-world AI pipelines — and how do we define “real” data?

• Is Agentic AI the next evolution, or just noise — and how should MLOps prepare?

• What does a modern, secure AI stack look like when using external partners and APIs?

Expect sharp perspectives on data integration, model lifecycle management, and the cyber-physical infrastructure needed to make AI more than just a POC.

Unlock the true potential of your data with the Qlik Open Lakehouse, a revolutionary approach to Iceberg integration designed for the enterprise. Many organizations face the pain points of managing multiple, costly data platforms and struggling with low-latency ingestion. While Apache Iceberg offers robust features like ACID transactions and schema evolution, achieving optimal performance isn't automatic; it requires sophisticated maintenance. Introducing the Qlik Open Lakehouse, a fully managed and optimized solution built on Apache Iceberg, powered by Qlik's Adaptive Iceberg Optimizer. Discover how you can do data differently and achieve 10x faster queries, a 33-42% reduction in file API overhead, and ultimately, a 50% reduction in costs through streamlined operations and compute savings.

Is your analytics workflow stuck in fragmented chaos? AlphaSights, the global leader in expert knowledge on demand, used to juggle queries, scripts, spreadsheets, and dashboards across different tools just to get one analysis out the door. Manual updates slowed their teams, stakeholders waited too long for insights, and opportunities slipped through the cracks. With Hex, AlphaSights built a fully integrated Research Hub that unifies data queries, API calls, ML-powered enrichment, and reporting — all in one place. They eliminated manual work, automated updates, and empowered business teams to act faster on opportunities. The result: faster reaction times, broader coverage, and measurable commercial impact. Join this session to see how AlphaSights turned fragmented workflows into a seamless, automated pipeline — and learn how your team can build faster, smarter insights too.

In the age of agentic AI, competitive advantage lies not only in AI models, but in the quality of the data agents reason on and the agility of the tools that feed them. To fully realize the ROI of agentic AI, organizations need a platform that enables high-quality data pipelines and provides scalable, enterprise-grade tools. In this session, discover how a unified platform for integration, data management, MCP server management, API management, and agent orchestration can help you to bring cohesion and control to how data and agents are used across your organization.

When we launched our new GraphQL API at Netflix, it felt perfect—destined to power hundreds of millions of devices. Yet, change is inevitable. Even if your schema seems flawless today (which it isn't), requirements will shift, new features will emerge, and regrets will follow. GraphQL promises evolvability, allowing us to move forward without multiple API versions. But how does this hold up in practice? We mark fields as @deprecated, but what happens next? How can we embrace experimentation without entombing technical debt in the API? Does federation complicate things? Evolving your schema without breaking clients is easy, right? Right??? Drawing from experience with the Netflix API, this talk explores techniques for evolving your schema safely and painlessly. We'll cover the schema lifecycle—from experimentation to design, deprecation, and deletion. Attendees will leave with:

- Schema design principles that facilitate change

- Practical techniques for evolving GraphQL schemas

- Strategies for managing a deprecation workflow

Join us as we learn to face the inevitability of change with confidence and serenity.