talk-data.com

Topic: Python

Tags: programming_language, data_science, web_development

1446 tagged activities

Activity trend: peak of 185 activities per quarter, 2020-Q1 to 2026-Q1

Activities (1446, newest first)

Do your predictive models age badly? A single package update (pandas, scikit-learn, lightgbm…) and a production outage is guaranteed…

With Scoring.AI, regain full control and guarantee their longevity. Our innovative tool builds high-performing scores and automatically translates their deployment into pure Python code based solely on Pandas and NumPy.

The result?

Full portability: your models run in production independently of the packages and tools used to build them

Simpler maintenance: IT teams can update their technical stack without risk of breakage

Ownership and greater transparency: readable, auditable code that is easy to deploy, even in constrained environments

Through concrete cases and a live demo, explore how to free your models from software dependencies and ensure their long-term survival. Because a good model is one that lasts!
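
The core idea is to export a trained score as code with minimal dependencies. The sketch below shows this for the simplest case, a logistic-regression score. It is not Scoring.AI's generator. `generate_scoring_code` and the feature names are hypothetical, and the coefficients are assumed to have been copied from whatever library trained the model. The emitted function needs only NumPy and a pandas DataFrame at scoring time.

```python
import numpy as np
import pandas as pd


def generate_scoring_code(feature_names, coefs, intercept, func_name="score"):
    """Emit standalone Python source for a logistic score (hypothetical helper).

    The generated function only uses NumPy and a DataFrame passed in at call time;
    it contains no reference to the library that trained the model.
    """
    lines = [
        "import numpy as np",
        "",
        "",
        f"def {func_name}(df):",
        f"    z = np.full(len(df), {float(intercept)!r})",
    ]
    for name, weight in zip(feature_names, coefs):
        # One explicit, auditable line per feature.
        lines.append(f"    z = z + {float(weight)!r} * df[{name!r}].to_numpy(dtype=float)")
    lines.append("    return 1.0 / (1.0 + np.exp(-z))")
    return "\n".join(lines) + "\n"


# Hypothetical coefficients, e.g. copied from a fitted model's coef_ / intercept_.
source = generate_scoring_code(["age", "income"], [0.5, -0.25], 0.1)
print(source)

namespace = {}
exec(source, namespace)  # in practice, write `source` to a .py file and import it
df = pd.DataFrame({"age": [1.0, 2.0], "income": [4.0, 0.0]})
print(namespace["score"](df))
```

Because the generated file only contains arithmetic on columns, IT teams can upgrade the training libraries without touching the deployed scoring code.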

CoSApp: an open-source library to design complex systems

CoSApp, for Collaborative System Approach, is a Python library dedicated to the simulation and design of multi-disciplinary systems. It is primarily intended for engineers and system architects during the early stages of industrial product design. CoSApp's API focuses on simplicity and on the explicit declaration of design problems. Special attention is given to modularity: a very flexible solver-assembly mechanism allows users to construct complex, customized simulation workflows. This presentation introduces the key features of the framework.

https://cosapp.readthedocs.io https://gitlab.com/cosapp/cosapp

Parallel processing using CRDTs

Beyond embarrassingly parallel problems, data must be shared between workers for them to do something useful. This can be done by:
- sharing memory between threads, which requires guarding access to shared data to avoid race conditions;
- copying memory to subprocesses, with the challenge of synchronizing data whenever it is mutated.

In Python, threads are not an option for true parallelism because of the GIL (global interpreter lock). This might change in the future with the removal of the GIL, but then the usual problems of multithreading will appear, such as using locks and managing their complexity. Subprocesses don't suffer from the GIL, but they usually need to go through a database to share data, which is often too slow. Data structures such as the HAMT (hash array mapped trie) have been used to share data efficiently and safely by storing it immutably, removing the need for locks. In this talk we will show how CRDTs (conflict-free replicated data types) can be used for the same purpose.
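
A grow-only counter (G-Counter) shows why CRDTs remove the need for locks. It is one of the simplest state-based CRDTs. This is a sketch, not the talk's implementation:
- Each worker, for example a subprocess, increments only its own slot.
- Replicas exchange whole states by message passing.
- `merge` takes the element-wise maximum of two states.

Merging is commutative, associative and idempotent. As a result, replicas converge to the same value whatever the order or number of times states are delivered.

```python
from typing import Dict

State = Dict[str, int]  # replica id -> number of increments made by that replica


def increment(state: State, replica_id: str, n: int = 1) -> State:
    """Return a new state where `replica_id` has counted `n` more events."""
    if n < 0:
        raise ValueError("a G-Counter can only grow")
    new_state = dict(state)
    new_state[replica_id] = new_state.get(replica_id, 0) + n
    return new_state


def merge(a: State, b: State) -> State:
    """Join two replica states with an element-wise maximum; no locks needed."""
    return {rid: max(a.get(rid, 0), b.get(rid, 0)) for rid in a.keys() | b.keys()}


def value(state: State) -> int:
    """The counter's value is the sum over all replicas."""
    return sum(state.values())


# Two workers update their own replicas independently...
w1 = increment({}, "w1", 3)
w2 = increment(increment({}, "w2", 5), "w2")
# ...and can exchange states in any order, any number of times.
print(value(merge(w1, w2)))  # 9
```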

Modern data engineering leverages Python to build robust, scalable, end-to-end workflows. In this talk, we will cover how Snowflake offers a flexible environment for developing Python data pipelines, performing transformations at scale, and orchestrating and deploying those pipelines. Topics we'll cover include:
- Ingest: data source APIs; reading and ingesting files of any format as they arrive, including from sources outside Snowflake
- Develop: packaging (artifact repository), Python runtimes, IDEs (Notebooks, VS Code)
- Transform: Snowpark pandas, UDFs, UDAFs
- Deploy: Tasks, Notebook scheduling

How to do real TDD in data science? A journey from pandas to polars with pelage!

In the world of data, inconsistencies and inaccuracies often present a major challenge to extracting valuable insights. Yet robust tools and practices to address those issues remain limited. In particular, test-driven development (TDD), a standard practice in classic software development, remains quite difficult to apply in data science, partly because of poorly adapted tools and frameworks.

To address this issue, we released Pelage, an open-source Python package that facilitates data exploration and testing and relies on Polars' intuitive syntax and speed. Pelage helps data scientists and analysts streamline data transformations, enhance data quality, and improve code clarity.

We will demonstrate, with a test-first approach, how you can use this library in a meaningful data science workflow to gain greater confidence in your data transformations.

See website: https://alixtc.github.io/pelage/

PyPI in the face: running jokes that PyPI download stats can play on you

We all love to tell stories with data and we all love to listen to them. Wouldn't it be great if we could also draw actionable insights from these nice stories?

As scikit-learn maintainers, we would love to use PyPI download stats and other proxy metrics (website analytics, GitHub repository statistics, etc.) to help inform some of our decisions, such as:
- how do we increase user awareness of best practices (please use Pipeline and cross-validation)?
- how do we advertise our recent improvements (use HistGradientBoosting rather than GradientBoosting, try TunedThresholdClassifier, run PCA and a few other models on GPU)?
- do users care more about new features from recent releases or about consolidation of what already exists?
- how long should we support older versions of Python, NumPy, or SciPy?

In this talk we will highlight a number of lessons learned while trying to understand the complex reality behind these seemingly simple metrics.

Telling nice stories is not always hard, but grasping the reality behind these metrics is often tricky.

Sharing computational course material at larger scale: a French multi-tenant attempt

With the rise of computation and data as pillars of science, institutions are struggling to provide large-scale training to their students and staff. Often, this leads to redundant, fragmented efforts, with each organization producing its own bespoke training material. In this talk, we report on a collaborative multi-tenant initiative to produce a shared corpus of interactive training resources in the Python language, designed as a digital common that can be adapted to diverse contexts and formats in French higher education and beyond.

Replay Episode: Python, Anaconda, and the AI Frontier with Peter Wang

Peter Wang, Chief AI & Innovation Officer and Co-founder of Anaconda, is back on Making Data Simple! Known for shaping the open-source ecosystem and making Python a powerhouse, Peter dives into Anaconda’s new AI incubator, the future of GenAI, and why Python isn’t just “still a thing”… it’s the thing. From branding and security to leadership and philosophy, this episode is a wild ride through the biggest opportunities (and risks) shaping AI today.

Timestamps:
01:27 Meet Peter Wang
05:10 Python or R?
05:51 Anaconda’s Differentiation
07:08 Why the Name Anaconda
08:24 The AI Incubator
11:40 GenAI
14:39 Enter Python
16:08 Anaconda Commercial Services
18:40 Security
20:57 Common Points of Failure
22:53 Branding
24:50 watsonx Partnership
28:40 AI Risks
34:13 Getting Philosophical
36:13 China
44:52 Leadership Style

LinkedIn: linkedin.com/in/pzwang
Website: https://www.linkedin.com/company/anacondainc/, https://www.anaconda.com/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Coding with AI

Practical techniques to accelerate software development using generative AI.

Let’s get real. You’d like to hand off a lot of tedious software development tasks to an assistant, and now you can! AI-powered coding tools like Copilot can accelerate research, design, code creation, testing, troubleshooting, documentation, refactoring, and more. Coding with AI shows you how. Written for working developers, this book fast-tracks you to AI-powered productivity with bite-size projects, tested prompts, and techniques for getting the most out of AI.

In Coding with AI you’ll learn how to:
- Incorporate AI tools into your development workflow
- Create pro-quality documentation and tests
- Debug and refactor software efficiently
- Create and organize reusable prompts

Coding with AI takes you through several small Python projects with the help of AI tools, showing you exactly how to use AI to create and refine real software. This book skips the baby steps and goes straight to the techniques you’ll use on the job, every day. You’ll learn to sidestep AI inefficiencies like hallucination and identify the places where AI can save you the most time and effort.

About the Technology
Taking a systematic approach to coding with AI will deliver the clarity, consistency, and scalability you need for production-grade applications. With practice, you can use AI tools to break down complex problems, generate maintainable code, enhance your models, and streamline debugging, testing, and collaboration. As you learn to work with AI’s strengths and recognize its limitations, you’ll build more reliable software and find that the quality of your generated code improves significantly.

About the Book
Coding with AI shows you how to gain massive benefits from a powerful array of AI-driven development tools and techniques. It also shares the insights and methods you need to use them effectively in professional projects. Following realistic examples, you’ll learn AI coding for database integration, designing a UI, and establishing an automated testing suite. You’ll even vibe code a game, but only after you’ve built a rock-solid foundation.

What's Inside
- Incorporate AI into your development workflow
- Create pro-quality documentation and tests
- Debug and refactor software efficiently
- Create and organize reusable prompts

About the Reader
For professional software developers. Examples in Python.

About the Author
Jeremy C. Morgan has two decades of experience as an engineer building software for everything from Fortune 100 companies to tiny startups.

Quotes
"Delivers exactly what working developers need: practical techniques that actually work." - Scott Hanselman, Microsoft
"You’ll be writing prompt engineering poetry." - Lars Klint, Atlassian
"Blends years of software experience with hands-on knowledge of top AI coding techniques. Essential." - Steve Buchanan, Jamf
"Detailed use of AI in real-world applications. A great job!" - Santosh Yadav, Celonis

Biological datasets often suffer from high sparsity and noise. In this talk, Alex and Brent describe how Python not only meets these challenges but also drives innovation through the development of novel bioinformatics tools like CITEgeist. CITEgeist harnesses Python’s robust ecosystem to provide an efficient, scalable pipeline that deconvolves messy spatial signals into actionable, clinically relevant features.
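
Here is one way to make "deconvolution" concrete, using a common generic formulation:
- Each spatial spot's measured profile is modeled as a non-negative mixture of reference cell-type signatures.
- The mixing proportions are recovered with non-negative least squares.

The sketch below implements that textbook formulation with SciPy. It is not CITEgeist's algorithm. The signature matrix and spot profile are synthetic, generated only for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_spot(signatures: np.ndarray, spot: np.ndarray) -> np.ndarray:
    """Estimate cell-type proportions for one spot via non-negative least squares.

    signatures: (n_features, n_cell_types) reference profiles, one column per cell type.
    spot:       (n_features,) observed profile for the spot.
    Returns non-negative proportions that sum to 1.
    """
    weights, _residual = nnls(signatures, spot)
    total = weights.sum()
    return weights / total if total > 0 else weights


rng = np.random.default_rng(0)
signatures = rng.gamma(shape=2.0, scale=1.0, size=(50, 3))  # synthetic references
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props + rng.normal(scale=0.01, size=50)  # noisy synthetic mixture
props = deconvolve_spot(signatures, spot)
print(props.round(2))
```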

Big ideas shaping scientific Python: the quest for performance and usability

Behind every technical leap in scientific Python lies a human ecosystem of volunteers, companies, and institutions working in tension and collaboration. This keynote explores how innovation actually happens in open source, through the lens of recent and ongoing initiatives that aim to move the needle on performance and usability: from the ideas that went into NumPy 2.0 and its relatively smooth rollout to the ongoing efforts to harness the performance GPUs offer without sacrificing maintainability and usability.

Takeaways for the audience: Whether you’re an ML engineer tired of debugging GPU-CPU inconsistencies, a researcher pushing Python to its limits, or an open-source maintainer seeking sustainable funding, this keynote will equip you with both practical solutions and a clear vision of where scientific Python is headed next.
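
One widely discussed approach to GPU support without duplicated code is the Python array API standard. NumPy 2.0 adopted it in its main namespace. Code that looks up functions through an array's `__array_namespace__()` can run unchanged on other conforming libraries, including GPU ones. Whether the keynote covers this mechanism is an assumption. The sketch relies only on NumPy and falls back to plain NumPy for arrays that predate the standard.

```python
import numpy as np


def get_namespace(x):
    """Return the array library `x` belongs to (array API standard), defaulting to NumPy."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np  # e.g. NumPy < 2.0, whose arrays lack __array_namespace__


def standardize(x):
    """Zero-mean, unit-variance columns, written only against array API functions."""
    xp = get_namespace(x)
    mean = xp.mean(x, axis=0)
    std = xp.std(x, axis=0)
    return (x - mean) / std


x = np.asarray([[1.0, 10.0], [3.0, 30.0]])
print(standardize(x))
# A GPU array library that implements the standard would receive its own arrays here
# and dispatch mean/std to its own kernels, with no change to `standardize`.
```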

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, in 2023 we began developing ActiveTigger, a lightweight, open-source Python application (with a React web frontend) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience, both within and outside the social sciences. ActiveTigger is already used by a dynamic community in the social sciences, and its stable version is planned for early June 2025.

From a more technical perspective, the API is designed to manage the complete workflow: project creation, embedding computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pretrained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible through both a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
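
In an active-learning annotation loop, the tool typically asks annotators to label the texts the current model is least sure about. Below is a minimal NumPy sketch of one such selection rule, least-confidence sampling. It is a generic illustration, not ActiveTigger's code. The function name and the predicted probabilities are hypothetical.

```python
import numpy as np


def select_for_annotation(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabeled texts the model is least confident about.

    probs: (n_texts, n_classes) predicted class probabilities.
    Returns indices ordered from most to least uncertain.
    """
    confidence = probs.max(axis=1)  # probability of the predicted label
    return np.argsort(confidence, kind="stable")[:k]  # lowest confidence first


# Hypothetical predictions for five unlabeled texts (e.g. stance: for / against).
probs = np.array([
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.48, 0.52],
    [0.70, 0.30],
])
print(select_for_annotation(probs, k=2))
```

After the selected texts are annotated, the model is fine-tuned again and the selection repeats. Each round thus spends human effort where the model gains the most.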

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

Project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Optimal Transport in Python: A Practical Introduction with POT

Optimal Transport (OT) is a powerful mathematical framework with applications in machine learning, statistics, and data science. This talk introduces the Python Optimal Transport toolbox (POT), an open-source library designed to efficiently solve OT problems. Attendees will learn the basics of OT, explore real-world use cases, and gain hands-on experience with POT (https://pythonot.github.io/).
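
For readers new to OT, the entropy-regularized problem can be solved with Sinkhorn's matrix-scaling iterations. The NumPy sketch below spells them out on two small 1-D point clouds. POT exposes the same computation as `ot.sinkhorn(a, b, M, reg)`, with more solvers and numerical safeguards. It also provides the exact unregularized plan as `ot.emd(a, b, M)`.

```python
import numpy as np


def sinkhorn(a, b, M, reg, n_iter=1000, tol=1e-9):
    """Entropy-regularized OT plan between histograms a and b for cost matrix M."""
    K = np.exp(-M / reg)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u_prev = u
        u = a / (K @ v)    # rescale rows to match a
        v = b / (K.T @ u)  # rescale columns to match b
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]  # plan P = diag(u) K diag(v)


x_source = np.array([0.0, 1.0, 2.0])
x_target = np.array([0.5, 1.5])
a = np.full(3, 1 / 3)  # uniform weights on source points
b = np.full(2, 1 / 2)  # uniform weights on target points
M = (x_source[:, None] - x_target[None, :]) ** 2  # squared Euclidean cost

P = sinkhorn(a, b, M, reg=0.1)
print(P.round(3))
print("transport cost:", float(np.sum(P * M)))
```

Smaller `reg` values bring the plan closer to the exact OT solution but make `exp(-M / reg)` underflow sooner. POT's log-domain variants exist to address this.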

Reproducible software provisioning for high performance computing (HPC) and research software engineering (RSE) using Spack

In this talk we focus on installing software (stacks) beyond just the Python ecosystem. In the first part of the talk we give an introduction to using the package manager Spack (https://spack.readthedocs.io). In the second part we explain how we use Spack at our institute to manage the software stack on the local HPC.

How to make public data more accessible with "baked" data and DuckDB

Publicly available data is rarely analysis-ready, hampering researchers, organizations, and the public from easily accessing the information these datasets contain. One way to address this shortcoming is to "bake" the data into a structured format and ship it alongside code that can be used for analysis. For analytical work in particular, DuckDB provides a performant way to query the structured data in a variety of contexts.

This talk will explore the benefits and tradeoffs of this architectural pattern using the design of scipeds, an open-source Python package for analyzing higher-education data in the US, as a case study.

No DuckDB experience is required; beginner-level Python and programming experience is recommended. This talk is aimed at data practitioners, especially those who work with public datasets.

Tackling Domain Shift with SKADA: A Hands-On Guide to Domain Adaptation

Domain adaptation addresses the challenge of applying ML models to data that differs from the training distribution, a common issue in real-world applications. SKADA is a new Python library that brings domain adaptation tools to the scikit-learn and PyTorch ecosystems. This talk covers SKADA’s design, its integration with standard ML workflows, and how it helps practitioners build models that generalize better across domains.
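
CORAL (correlation alignment) is one of the simplest domain adaptation methods and illustrates the idea. It re-colors source features so that their covariance matches the target domain's. Any standard estimator can then be trained on the transformed source data. The NumPy sketch below is written from the method's definition and is not SKADA's API. SKADA's documentation lists the methods it actually provides and how they plug into scikit-learn pipelines.

```python
import numpy as np


def _sym_power(C: np.ndarray, power: float) -> np.ndarray:
    """C**power for a symmetric positive-definite matrix, via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(C)
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against tiny negative round-off
    return (eigvecs * eigvals**power) @ eigvecs.T


def coral(X_source: np.ndarray, X_target: np.ndarray, reg: float = 1.0) -> np.ndarray:
    """Whiten the source covariance, then re-color it with the target covariance."""
    d = X_source.shape[1]
    C_s = np.cov(X_source, rowvar=False) + reg * np.eye(d)
    C_t = np.cov(X_target, rowvar=False) + reg * np.eye(d)
    return X_source @ _sym_power(C_s, -0.5) @ _sym_power(C_t, 0.5)


rng = np.random.default_rng(0)
X_source = rng.normal(size=(500, 3))
mixing = np.array([[2.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 3.0]])
X_target = rng.normal(size=(500, 3)) @ mixing  # same features, shifted covariance

X_adapted = coral(X_source, X_target, reg=1e-8)
print(np.round(np.cov(X_adapted, rowvar=False), 2))
print(np.round(np.cov(X_target, rowvar=False), 2))
```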