talk-data.com

Topic: Computer Science

Tags: programming, algorithms, data_structures (7 tagged)

Activity Trend: peak of 9 per quarter, 2020-Q1 through 2026-Q1

Activities

7 activities · Newest first

How to Optimize your Python Program for Slowness: Inspired by New Turing Machine Results

Many talks show how to make Python code faster. This one flips the script: what if we try to make our Python as slow as possible? By exploring deliberately inefficient programs — from infinite loops to Turing machines that halt only after an astronomically long time — we’ll discover surprising lessons about computation, large numbers, and the limits of programming languages. Inspired by new Turing machine results, this talk will connect Python experiments with deep questions in theoretical computer science.

We don't dataframe shame: A love letter to dataframes

This lighthearted educational talk explores the wild west of dataframes. We discuss where dataframes got their origin (it wasn't R), how dataframes have evolved over time, and why dataframe is such a confusing term (what even is a dataframe?). We will look at what makes dataframes special from both a theoretical computer science perspective (the math is brief, I promise!) and from a technology landscape perspective. This talk doesn't advocate for any specific tool or technology, but instead surveys the broad field of dataframes as a whole.

ActiveTigger: A Collaborative Text Annotation Research Tool for Computational Social Sciences

The exponential growth of textual data—ranging from social media posts and digital news archives to speech-to-text transcripts—has opened new frontiers for research in the social sciences. Tasks such as stance detection, topic classification, and information extraction have become increasingly common. At the same time, the rapid evolution of Natural Language Processing, especially pretrained language models and generative AI, has largely been led by the computer science community, often leaving a gap in accessibility for social scientists.

To address this, we began developing ActiveTigger in 2023: a lightweight, open-source Python application (with a React web frontend) designed to accelerate the annotation process and manage large-scale datasets through the integration of fine-tuned models. It aims to support computational social science for a broad audience both within and outside the social sciences. Already used by an active community of social scientists, its stable version is planned for early June 2025.

From a more technical perspective, the API manages the complete workflow: project creation, embedding computation, exploration of the text corpus, human annotation with active learning, fine-tuning of pre-trained (BERT-like) models, prediction on a larger corpus, and export. It also integrates LLM-as-a-service capabilities for prompt-based annotation and information extraction, offering a flexible approach to hybrid manual/automatic labeling. Accessible through both a web frontend and a Python client, ActiveTigger encourages customization and adaptation to specific research contexts and practices.
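The active-learning step in that workflow follows a standard pattern: surface the documents the current model is least sure about, so human labels go where they help most. A minimal sketch of uncertainty sampling (a generic illustration only, not ActiveTigger's actual API):

```python
# Generic uncertainty sampling: NOT ActiveTigger's API, just the
# workflow pattern such tools automate.

def uncertainty(prob: float) -> float:
    """Distance from a confident binary prediction; 0.5 is maximally uncertain."""
    return 1.0 - abs(prob - 0.5) * 2

def pick_next(unlabeled, predict_proba, batch_size=2):
    """Return the batch_size documents the model is least confident about."""
    ranked = sorted(unlabeled,
                    key=lambda doc: uncertainty(predict_proba(doc)),
                    reverse=True)
    return ranked[:batch_size]

# Toy corpus: pretend the model's positive-class probability is a stored score.
corpus = {"doc_a": 0.95, "doc_b": 0.52, "doc_c": 0.48, "doc_d": 0.10}
batch = pick_next(list(corpus), corpus.get)
print(batch)  # the two most ambiguous documents go to the annotator first
```

After each annotation round, the model is re-fine-tuned and the ranking recomputed, which is what makes the loop "active" rather than a fixed sample.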

In this talk, we will delve into the motivations behind the creation of ActiveTigger, outline its technical architecture, and walk through its core functionalities. Drawing on several ongoing research projects within the Computational Social Science (CSS) group at CREST, we will illustrate concrete use cases where ActiveTigger has accelerated data annotation, enabled scalable workflows, and fostered collaborations. Beyond the technical demonstration, the talk will also open a broader reflection on the challenges and opportunities brought by generative AI in academic research—especially in terms of reliability, transparency, and methodological adaptation for qualitative and quantitative inquiries.

Project repository: https://github.com/emilienschultz/activetigger/

The development of this software is funded by the DRARI Ile-de-France and supported by Progédo.

Searching for Meaning in the Age of AI

Bryan McCann, You.com’s co-founder and CTO, shares his journey from studying philosophy and meaning to working on groundbreaking AI research alongside Richard Socher in the Stanford Computer Science Department. Right now, AI is reshaping everything we hold dear: our jobs, creativity, and identities. It is also our greatest source of inspiration. The Age of AI is simultaneously a Renaissance, an Enlightenment, an Industrial Revolution, and a likely source of humanity’s greatest existential crisis. To surmount this, Bryan will discuss how he uses AI responses as new starting points rather than answers, how he builds teams like neural networks optimized for learning, and how the answer to our meaning crisis may be for humans to become more like AI. Exploring AI’s impact on politics, economics, healthcare, education, and culture, Bryan asserts that we must all take part in authoring humanity’s new story: AI can inspire us to become something new, rather than merely replace what we are now.

Photon for Dummies: How Does this New Execution Engine Actually Work?

Did you finish the Photon whitepaper and think, wait, what? I know I did; it’s my job to understand it, explain it, and then use it. If your role involves using Apache Spark™ on Databricks, then you need to know about Photon and where to use it. Join me, chief dummy, nay "supreme" dummy, as I break down this whitepaper into easy-to-understand explanations that don’t require a computer science degree. Together we will unravel mysteries such as:

  • Why is the Java Virtual Machine the current bottleneck for Spark enhancements?
  • What does "vectorized" even mean? And how was it done before?
  • Why is the relationship status between Spark and Photon "complicated"?

In this session, we’ll start with the basics of Apache Spark: the details we pretend to know, and where the performance cracks are starting to show through. Only then will we look at Photon: how it’s different, where the clever design choices are, and how you can make the most of it in your own workloads. I’ve spent over 50 hours going over the paper in excruciating detail, covering every reference and, in some cases, the references of the references, so that you don’t have to.
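For the second bullet, here is a rough intuition for "vectorized" (my own sketch in plain Python, not Photon's implementation): instead of paying interpreter dispatch overhead once per row, a vectorized engine invokes each operator once per batch of column values, so the tight inner loop runs over contiguous data that SIMD hardware can accelerate.

```python
# Row-at-a-time (classic "Volcano" iterator model) vs. batch-at-a-time
# ("vectorized") execution, sketched in plain Python. Photon does this over
# columnar memory with native SIMD code; this only shows the control flow.

def row_at_a_time(rows):
    # One operator invocation per row: dispatch overhead on every tuple.
    total = 0
    for row in rows:
        total += row["price"] * row["qty"]
    return total

def vectorized(prices, qtys, batch_size=1024):
    # One operator invocation per batch of column values: the inner loop
    # runs over contiguous slices, which is what SIMD can accelerate.
    total = 0
    for i in range(0, len(prices), batch_size):
        chunk = zip(prices[i:i + batch_size], qtys[i:i + batch_size])
        total += sum(p * q for p, q in chunk)
    return total

rows = [{"price": 10, "qty": 2}, {"price": 5, "qty": 4}, {"price": 3, "qty": 1}]
prices = [r["price"] for r in rows]
qtys = [r["qty"] for r in rows]
assert row_at_a_time(rows) == vectorized(prices, qtys) == 43
```

Both functions compute the same aggregate; the difference is purely in how often control passes between the engine and the data, which is where the JVM-versus-native gap the talk discusses comes in.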

Talk by: Holly Smith

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

ML in Production: What Does Production Even Mean | Dagshub

ABOUT THE TALK: While giving a talk to a group of up-and-coming data scientists, Dean Pleban was surprised by a question: "When you say 'production', what exactly do you mean?"

In this talk, Dean defines what production actually means. He presents a first-principles, step-by-step approach to thinking about deploying a model to production, discusses the challenges you might face at each step, and provides further reading if you want to dive deeper into each one.

ABOUT THE SPEAKER: Dean Pleban has a background combining physics and computer science. He’s worked on quantum optics and communication, computer vision, software development and design. He’s currently CEO at DagsHub, where he builds products that enable data scientists to work together and get their models to production, using popular open source tools. He’s also the host of the MLOps Podcast, where he speaks with industry experts about ML in production.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data-related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

How to Build a Streaming Database in Three Challenging Steps | Materialize

ABOUT THE TALK: A streaming database is a potentially intimidating product to build. Frank McSherry, Chief Scientist at Materialize, breaks it down into manageable parts through three foundational choices that fit together well. Frank also discusses the trade-offs involved, and how these simplifications lead to a much more manageable streaming database.

ABOUT THE SPEAKER: Frank McSherry is Chief Scientist at Materialize, where he (and others) convert SQL into scale-out, streaming, and interactive dataflows. Before this, he developed the timely and differential dataflow Rust libraries (with colleagues at ETHZ), and led the Naiad research project and co-invented differential privacy while at MSR Silicon Valley. He has a PhD in computer science from the University of Washington.
