talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

191

Sessions & talks

Have your cake and eat it too: Rapid model development and stable, high-performance deployments

2023-04-17
talk

At the boundary of model development and MLOps lies the balance between the speed of deploying new models and meeting operational constraints. These include low-latency prediction, the absence of vulnerabilities in dependencies, and the need for model behavior to stay reproducible for years. The longer the list of constraints, the longer it usually takes to move a model from its development environment into production. In this talk, we present how we seemingly managed to square the circle and have both rapid, highly dynamic model development and a stable, high-performance deployment.

Performing Root Cause Analysis with DoWhy, a Causal Machine-Learning Library

2023-04-17
talk

In this talk, we will introduce the audience to DoWhy, a library for causal machine learning (ML). We will introduce typical problems where causal ML can be applied and take a deep dive into root cause analysis using DoWhy. To do this, we will lay out what typical problem spaces for causal ML look like and what kinds of problems we're trying to solve, and then show how to use DoWhy's API to solve them. Expect to see a lot of code with a hands-on example. We will close the session by zooming out a bit and talking about the PyWhy organization governing DoWhy.

Polars - make the switch to lightning-fast dataframes

2023-04-17
talk

In this talk, we will report on our experiences switching from pandas to Polars in a real-world ML project. Polars is a new high-performance dataframe library for Python based on Apache Arrow and written in Rust. We will compare the performance of Polars with the popular pandas library and show how Polars can provide significant speed improvements for data manipulation and analysis tasks. We will also discuss the unique features of Polars, such as its ability to handle large datasets that do not fit into memory, and how it feels in practice to make the switch from pandas. This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python.

WALD: A Modern & Sustainable Analytics Stack

2023-04-17
talk

The name WALD-stack stems from the four technologies it is composed of: a cloud-computing Warehouse such as Snowflake or Google BigQuery, the open-source data integration engine Airbyte, the open-source full-stack BI platform Lightdash, and the open-source data transformation tool dbt.

Using a Formula 1 Grand Prix dataset, I will give an overview of how these four tools complement each other for analytics tasks in an ELT approach. You will learn the specific uses of each tool as well as their particular features. My talk is based on a full tutorial, which you can find at waldstack.org.

Common issues with Time Series data and how to solve them

2023-04-17
talk

Time series data is all around us: from logistics to digital marketing, from pricing to stock markets. It’s hard to imagine a modern business that has no time series data to forecast. However, mastering such forecasting is not an easy task. For this talk, together with other domain experts, I have collected a list of time series issues that data professionals commonly run into. After this talk, you will be able to identify, understand, and resolve such issues. This will include stabilising divergent time series, organising delayed or irregular data, handling missing values without anomaly propagation, and reducing the impact of noise and outliers on your forecasting models.
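
As a rough illustration of two of the issues named in this abstract (missing values and outliers), here is a minimal Python sketch using only the standard library. The function names `fill_gaps` and `clip_outliers`, the `max_gap` limit, and the threshold of 3.5 are assumptions chosen for illustration; they are not taken from the talk.

```python
from statistics import median


def fill_gaps(values, max_gap=2):
    """Forward-fill None entries, but only for runs of at most `max_gap`.

    Longer runs stay None so that one stale value is not carried
    across a long outage (a hypothetical policy for this sketch).
    """
    filled = list(values)
    last_seen = None
    i = 0
    while i < len(filled):
        if filled[i] is not None:
            last_seen = filled[i]
            i += 1
            continue
        # Find the end of the current run of missing values.
        j = i
        while j < len(filled) and filled[j] is None:
            j += 1
        if last_seen is not None and (j - i) <= max_gap:
            for k in range(i, j):
                filled[k] = last_seen
        i = j
    return filled


def clip_outliers(values, threshold=3.5):
    """Clip points that lie far from the median, measured in scaled MAD units."""
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return list(values)
    # 1.4826 scales the MAD so it is comparable to a standard deviation.
    limit = threshold * 1.4826 * mad
    low, high = med - limit, med + limit
    return [v if v is None else min(max(v, low), high) for v in values]


series = [10, 11, None, 12, None, None, None, 13, 500]
filled = fill_gaps(series, max_gap=2)
clipped = clip_outliers(filled)
```

Here the single missing point is filled from its predecessor, the three-point gap is left open, and the spike of 500 is pulled back toward the rest of the series instead of being deleted.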

Exploring the Power of Cyclic Boosting: A Pure-Python, Explainable, and Efficient ML Method

2023-04-17
talk

We have recently open-sourced a pure-Python implementation of Cyclic Boosting, a family of general-purpose, supervised machine learning algorithms. Its predictions are fully explainable at the level of individual samples, and yet Cyclic Boosting can deliver highly accurate and robust models. It requires little hyperparameter tuning and minimal data pre-processing (including support for missing information and categorical variables of high cardinality), making it an ideal off-the-shelf method for structured, heterogeneous data sets. Furthermore, it is computationally inexpensive and fast, allowing for rapid improvement iterations. The modeling process, especially the infamous but unavoidable feature engineering, is facilitated by the automatic creation of an extensive set of visualizations for data dependencies and training results. In this presentation, we will provide an overview of the inner workings of Cyclic Boosting, along with a few sample use cases, and demonstrate the usage of the new Python library.

You can find Cyclic Boosting on GitHub: https://github.com/Blue-Yonder-OSS/cyclic-boosting

How to baseline in NLP and where to go from there

2023-04-17
talk
NLP

In this talk, we will explore the build-measure-learn paradigm and the role of baselines in natural language processing (NLP). We will cover the common NLP tasks of classification, clustering, search, and named entity recognition, and describe the baseline approaches that can be used for each task. We will also discuss how to move beyond these baselines through weak learning and transfer learning. By the end of this talk, attendees will have a better understanding of how to establish and improve upon baselines in NLP.
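
To make the idea of a baseline concrete for the classification task, here is a minimal pure-Python sketch: a majority-class predictor and a small multinomial naive Bayes model over whitespace tokens. The choice of these two baselines, the function names, and the toy data are assumptions for illustration only; the talk may use different approaches.

```python
import math
from collections import Counter, defaultdict


def majority_baseline(train_labels):
    """Return a predictor that always answers with the most frequent label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda text: most_common


def train_naive_bayes(texts, labels):
    """Fit a multinomial naive Bayes model on lowercase whitespace tokens."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    def predict(text):
        tokens = text.lower().split()
        best_label, best_score = None, -math.inf
        for label, n_docs in label_counts.items():
            total = sum(word_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(n_docs / len(labels))
            for tok in tokens:
                score += math.log(
                    (word_counts[label][tok] + 1) / (total + len(vocab))
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def accuracy(predict, texts, labels):
    """Share of examples for which the predictor returns the true label."""
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Toy data, invented for this sketch.
train_texts = [
    "great talk loved it",
    "really great speaker",
    "boring and slow",
    "slow boring slides",
    "great examples",
]
train_labels = ["pos", "pos", "neg", "neg", "pos"]

majority = majority_baseline(train_labels)
naive_bayes = train_naive_bayes(train_texts, train_labels)
```

Comparing `accuracy(majority, ...)` with `accuracy(naive_bayes, ...)` on held-out data gives the first rung of the build-measure-learn loop: any more elaborate model has to beat both numbers to justify its cost.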

Practical Session: Learning on Heterogeneous Graphs with PyG

2023-04-17
talk

Learn how to build and analyze heterogeneous graphs using PyG, a graph machine learning library for Python. This workshop will provide a practical introduction to the concept of heterogeneous graphs and their applications, including their ability to capture the complexity and diversity of real-world systems. Participants will gain experience in creating a heterogeneous graph from multiple data tables, preparing a dataset, and implementing and training a model using PyG.
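
The step of creating a heterogeneous graph from multiple data tables can be sketched in plain Python. This is not PyG code: the dictionary layout, the function name `build_hetero_graph`, and the toy user/item/purchase tables are assumptions made for illustration. The sketch only mirrors the general idea of separate node types plus edge types keyed by a (source type, relation, destination type) triple, with edges stored as two parallel index lists.

```python
def build_hetero_graph(users, items, purchases):
    """Turn three tables (lists of dicts) into a small heterogeneous graph."""
    # Map raw table ids to consecutive indices, one index space per node type.
    user_index = {row["user_id"]: i for i, row in enumerate(users)}
    item_index = {row["item_id"]: i for i, row in enumerate(items)}

    graph = {
        "nodes": {
            # One feature vector per node; each type has its own features.
            "user": [[row["age"]] for row in users],
            "item": [[row["price"]] for row in items],
        },
        "edges": {},
    }

    src = [user_index[p["user_id"]] for p in purchases]
    dst = [item_index[p["item_id"]] for p in purchases]
    graph["edges"][("user", "buys", "item")] = [src, dst]
    # Reverse edges let information flow from items back to users.
    graph["edges"][("item", "rev_buys", "user")] = [dst, src]
    return graph


# Toy tables, invented for this sketch.
users = [{"user_id": "u1", "age": 30}, {"user_id": "u2", "age": 41}]
items = [{"item_id": "a", "price": 9.5}, {"item_id": "b", "price": 20.0}]
purchases = [
    {"user_id": "u1", "item_id": "b"},
    {"user_id": "u2", "item_id": "a"},
    {"user_id": "u2", "item_id": "b"},
]

graph = build_hetero_graph(users, items, purchases)
```

Keeping a separate index space per node type is the design choice that matters here: user 0 and item 0 are different nodes, and only the edge type says which space each index refers to.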

Raised by Pandas, striving for more: An opinionated introduction to Polars

2023-04-17
talk

Pandas is the de facto standard for data manipulation in Python, which I personally love for its flexible syntax and interoperability. But Pandas has well-known drawbacks such as memory inefficiency, inconsistent missing data handling, and a lack of multicore support. Multiple open-source projects aim to solve those issues; the most interesting is Polars.

Polars uses Rust and Apache Arrow to win in all kinds of performance benchmarks and evolves fast. But is it already stable enough to migrate an existing Pandas codebase? And does it meet the high expectations of long-time Pandas lovers regarding query language flexibility?

In this talk, I will explain how Polars can be that fast and present my insights on where Polars shines and in which scenarios I stay with Pandas (at least for now!).

Staying Alert: How to Implement Continuous Testing for Machine Learning Models

2023-04-17
talk

Proper monitoring of machine learning models in production is essential to avoid performance issues. Setting up monitoring can be easy for a single model, but it often becomes challenging at scale or when you face alert fatigue based on many metrics and dashboards.

In this talk, I will introduce the concept of test-based ML monitoring. I will explore how to prioritize metrics based on risks and model use cases, integrate checks into the prediction pipeline, and standardize them across similar models and the model lifecycle. I will also take an in-depth look at batch model monitoring architecture and the use of open-source tools for setup and analysis.
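
One way to picture test-based monitoring is as a list of named checks, each with a metric and a pass/fail threshold, run against every new batch. The sketch below uses only the Python standard library. The `Check` class, the two metrics, and the thresholds are hypothetical and are not the API of any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Batch = Sequence[Optional[float]]


@dataclass
class Check:
    name: str
    metric: Callable[[Batch, Batch], float]
    max_value: float  # the check fails when the metric exceeds this value


def share_missing(reference: Batch, current: Batch) -> float:
    """Fraction of missing values in the current batch."""
    return sum(v is None for v in current) / len(current)


def mean_shift(reference: Batch, current: Batch) -> float:
    """Relative change of the mean between reference and current batch."""
    ref = [v for v in reference if v is not None]
    cur = [v for v in current if v is not None]
    ref_mean = sum(ref) / len(ref)
    return abs(sum(cur) / len(cur) - ref_mean) / (abs(ref_mean) or 1.0)


def run_checks(checks, reference, current):
    """Return {check name: (metric value, passed)} for one batch."""
    results = {}
    for check in checks:
        value = check.metric(reference, current)
        results[check.name] = (value, value <= check.max_value)
    return results


# Thresholds are illustrative; in practice they would follow from model risk.
checks = [
    Check("share_missing", share_missing, max_value=0.3),
    Check("mean_shift", mean_shift, max_value=0.2),
]
reference = [10.0, 12.0, 11.0, 9.0]
current = [15.0, None, 16.0, 14.0]

results = run_checks(checks, reference, current)
failed = [name for name, (_, passed) in results.items() if not passed]
```

Because each check reduces to a single pass/fail result, the same list can be reused across similar models, and an alert only needs to fire when `failed` is non-empty.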

The CPU in your browser: WebAssembly demystified

2023-04-17
talk

In recent years we have seen an explosion in the use of Python in the browser: Pyodide, CPython on WASM, PyScript, etc. All of this is possible thanks to the powerful functionality of the underlying platform, WebAssembly, which is essentially a virtual CPU inside the browser.
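
The "virtual CPU" idea can be illustrated with a toy stack machine in Python. WebAssembly is a stack-based virtual machine, and the opcode names below are borrowed from its text format, but everything else is a simplification assumed for this sketch: there is no binary format, no validation, no memory, and only three instructions.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a value stack."""
    stack = []
    for op, arg in program:
        if op == "i32.const":
            # Push a constant, wrapped to 32 bits.
            stack.append(arg & 0xFFFFFFFF)
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & 0xFFFFFFFF)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4, written as stack instructions.
program = [
    ("i32.const", 2),
    ("i32.const", 3),
    ("i32.add", None),
    ("i32.const", 4),
    ("i32.mul", None),
]
result = run(program)
```

Each instruction pops its inputs from the stack and pushes its output back, so a whole expression can be encoded as a flat instruction sequence that a host interprets or compiles.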

Coffee Break

2023-04-17
talk

Keynote - A journey through 4 industries with Python: Python's versatile problem-solving toolkit

2023-04-17
talk

In this keynote, I will share the lessons learned from using Python in 4 industries. Apart from the machine learning applications that I build in my day-to-day work as a data scientist and machine learning engineer, I also use Python to develop games for my own gaming company, Quill Game Studios. There is a lot of versatility in Python, and it's been my pleasure to use it to solve many interesting problems. I hope that this talk can inspire various types of applications in your own industry as well.

Announcements 15min

2023-04-17
talk

Lunch

2023-04-17
talk
