PyConDE & PyData Berlin 2023

Let's contribute to pandas (3 hours) #1

2023-04-18

talk

Patrick Hoefler , Noa Tamir

HTML Pandas Python

PyData Berlin are excited to bring you this open source workshop dedicated to contributing to pandas. This tutorial is 3 hours. We will have a break and continue with the same group of people.

pandas is a data wrangling platform for Python widely adopted in the scientific computing community. In this session, you will be guided on how you can make your own contributions to the project, no prior experience contributing required! Not only will this teach you new skills and boost your CV, you'll also likely get a nice adrenaline rush when your contribution is accepted!

If you don’t finish your contribution during the event, we hope you will continue to work on it after the tutorial. pandas offers regular new contributor meetings and has a slack space to provide ongoing support for new contributors. For more details, see our contributor community page: http://pandas.pydata.org/docs/dev/development/community.html .

How Python enables future computer chips

2023-04-18

talk

Tim Hoffmann

Python

At the semiconductor division of Carl Zeiss it's our mission to continuously make computer chips faster and more energy efficient. To do so, we go to the very limits of what is possible, both physically and technologically. This is only possible through massive research and development efforts.

In this talk, we tell the story how Python became a central tool for our R&D activities. This includes technical aspects as well as organization and culture. How do you make sure that hundreds of people work in consistent environments? – How do you get all people on board to work together with Python? – You have lots of domain experts without much software background. How do you prevent them from creating a mess when projects get larger?

Maps with Django

2023-04-18

talk

Paolo Melchiorre

Python

Keeping in mind the Pythonic principle that “simple is better than complex” we'll see how to create a web map with the Python based web framework Django using its GeoDjango module, storing geographic data in your local database on which to run geospatial queries.

Observability for Distributed Computing with Dask

2023-04-18

talk

Hendrik Makait

AI/ML Cloud Computing Data Engineering Data Science NumPy Pandas

Debugging is hard. Distributed debugging is hell.

Dask is a popular library for parallel and distributed computing in Python. Dask is commonly used in data science, actual science, data engineering, and machine learning to distribute workloads onto clusters of many hundreds of workers with ease.

However, when things go wrong life can become difficult due to all of the moving parts. These parts include your code, other PyData libraries like NumPy/pandas, the machines you’re running on, the network between them, storage, the cloud, and of course issues with Dask itself. It can be difficult to understand what is going on, especially when things seem slower than they should be or fail unexpectedly. Observability is the key to sanity and success.

In this talk, we describe the tools Dask offers to help you observe your distributed cluster, analyze performance, and monitor your cluster to react to unexpected changes quickly. We will dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling. Throughout the talk, we will leverage real-world use cases to show how these tools help to identify and solve problems for large-scale users in the wild.

This talk should be particularly insightful for Dask users, but the approaches to observing distributed systems should be relevant to anyone operating at scale in production.

BLE and Python: How to build a simple BLE project on Linux with Python

2023-04-18

talk

Bruno Vollmer

Linux Python

Bluetooth Low Energy (BLE) is a part of the Bluetooth standard aimed at bringing wireless technology to low-power devices, and it's getting into everything - lightbulbs, robots, personal health and fitness devices, and plenty more. One of the main advantages of BLE is that everybody can integrate those devices into their tools or projects.

However, BLE is not the most developer-friendly protocol and these devices most of the time don't come with good documentation. In addition, there are not a lot of good open-source tools, examples, and tutorials on how to use Python with BLE. Especially if one wants to build both sides of the communication.

In this talk, I will introduce the concepts and properties used in BLE interactions and look at how we can use the Linux Bluetooth Stack (Bluez) to communicate with other devices. We will look at a simple example and learn along the way about common pitfalls and debugging options while working with BLE and Python.

This talk is for everybody that has a basic understanding of Python and wants to have a deeper understanding of how BLE works and how one could use it in a private project.

Rusty Python: A Case Study

2023-04-18

talk

Robin Raymond

Python Rust

Python is a very expressive and powerful language, but it is not always the fastest option for performance-critical parts of an application. Rust, on the other hand, is known for its lightning-fast runtime and low-level control, making it an attractive option for speeding up performance-sensitive portions of Python programs.

In this talk, we will present a case study of using Rust to speed up a critical component of a Python application. We will cover the following topics:

An overview of Rust and its benefits for Python developers
Profiling and identifying performance bottlenecks in Python application
Implementing a solution in Rust and integrating it with the Python application using PyO3
Measuring the performance improvements and comparing them to other optimization techniques

Attendees will learn about the potential for using Rust to boost the performance of their Python programs and how to go about doing so in their own projects.

Aspect-oriented Programming - Diving deep into Decorators

2023-04-18

talk

Mike Müller

Python

The aspect-oriented programming paradigm can support the separation of cross-cutting concerns such as logging, caching, or checking of permissions. This can improve code modularity and maintainability. Python offers decorator to implement re-usable code for cross-cutting task.

This tutorial is an in-depth introduction to decorators. It covers the usage of decorators and how to implement simple and more advanced decorators. Use cases demonstrate how to work with decorators. In addition to showing how functions can use closures to create decorators, the tutorial introduces callable class instance as alternative. Class decorators can solve problems that use be to be tasks for metaclasses. The tutorial provides uses cases for class decorators.

While the focus is on best practices and practical applications, the tutorial also provides deeper insight into how Python works behind the scene. After the tutorial participants will feel comfortable with functions that take functions and return new functions.

Geospatial Data Processing with Python: A Comprehensive Tutorial

2023-04-18

talk

Martin Christen

Python

In this tutorial, you will learn about the various Python modules for processing geospatial data, including GDAL, Rasterio, Pyproj, Shapely, Folium, Fiona, OSMnx, Libpysal, Geopandas, Pydeck, Whitebox, ESDA, and Leaflet. You will gain hands-on experience working with real-world geospatial data and learn how to perform tasks such as reading and writing spatial data, reprojecting data, performing spatial analyses, and creating interactive maps. This tutorial is suitable for beginners as well as intermediate Python users who want to expand their knowledge in the field of geospatial data processing

The State of Production Machine Learning in 2023

2023-04-18

talk

Alejandro Saucedo

AI/ML Python

As the number of production machine learning use-cases increase, we find ourselves facing new and bigger challenges where more is at stake. Because of this, it's critical to identify the key areas to focus our efforts, so we can ensure our machine learning pipelines are reliable and scalable. In this talk we dive into the state of production machine learning in the Python Ecosystem, and we will cover the concepts that make production machine learning so challenging, as well as some of the recommended tools available to tackle these challenges.

This talk will cover key principles, patterns and frameworks around the open source frameworks powering single or multiple phases of the end-to-end ML lifecycle, incluing model training, deploying, monitoring, etc. We will be covering a high level overview of the production ML ecosystem and dive into best practices that have been abstracted from production use-cases of machine learning operations at scale, as well as how to leverage tools to that will allow us to deploy, explain, secure, monitor and scale production machine learning systems.

Specifying behavior with Protocols, Typeclasses or Traits. Who wears it better (Python, Scala 3, Rust)?

2023-04-17

talk

Kolja Maier

Python Rust Scala

In this talk, we will explore the use of Python's typing.Protocol, Scala's Typeclasses, and Rust's Traits. They all offer a very powerful & elegant mechanism for abstracting over various concepts (such as Serialization) in a modular manner. We will compare and contrast the syntax and implementation of these constructs in each language and discuss their strengths and weaknesses. We will also look at real-world examples of how these features are used in each language to specify behavior, and consider differences in terms of type system expressiveness and effectiveness. By the end of the talk, attendees will have a better understanding of the differences and similarities between these three language features, and will be able to make informed decisions about which one is best suited for their needs.

A concrete guide to time-series databases with Python

2023-04-17

talk

Heiner Tholen , Ellen König

Python

We evaluated time-series databases and complementary services to stream-process sensor data. In this talk, our evaluation will be presented. The final implementation will be shown, alongside python-tools we’ve built and lessons learned during the process.

Driving down the Memray lane - Profiling your data science work

2023-04-17

talk

Cheuk Ting Ho

Data Science Python

When handling a large amount of data, memory profiling the data science workflow becomes more important. It gives you insight into which process consumes lots of memory. In this talk, we will introduce Mamray, a Python memory profiling tool and its new Jupyter plugin.

Polars - make the switch to lightning-fast dataframes

2023-04-17

talk

Thomas Bierhance

AI/ML Arrow Pandas Polars Python Rust

In this talk, we will report on our experiences switching from Pandas to Polars in a real-world ML project. Polars is a new high-performance dataframe library for Python based on Apache Arrow and written in Rust. We will compare the performance of polars with the popular pandas library, and show how polars can provide significant speed improvements for data manipulation and analysis tasks. We will also discuss the unique features of polars, such as its ability to handle large datasets that do not fit into memory, and how it feels in practice to make the switch from Pandas. This talk is aimed at data scientists, analysts, and anyone interested in fast and efficient data processing in Python.

Exploring the Power of Cyclic Boosting: A Pure-Python, Explainable, and Efficient ML Method

2023-04-17

talk

Felix Wick

AI/ML GitHub Python

We have recently open-sourced a pure-Python implementation of Cyclic Boosting, a family of general-purpose, supervised machine learning algorithms. Its predictions are fully explainable on individual sample level, and yet Cyclic Boosting can deliver highly accurate and robust models. For this, it requires little hyperparameter tuning and minimal data pre-processing (including support for missing information and categorical variables of high cardinality), making it an ideal off-the-shelf method for structured, heterogeneous data sets. Furthermore, it is computationally inexpensive and fast, allowing for rapid improvement iterations. The modeling process, especially the infamous but unavoidable feature engineering, is facilitated by automatic creation of an extensive set of visualizations for data dependencies and training results. In this presentation, we will provide an overview of the inner workings of Cyclic Boosting, along with a few sample use cases, and demonstrate the usage of the new Python library.

You can find Cyclic Boosting on GitHub: https://github.com/Blue-Yonder-OSS/cyclic-boosting

Practical Session: Learning on Heterogeneous Graphs with PyG

2023-04-17

talk

Matthias Fey , Ramona Bendias

Python

Learn how to build and analyze heterogeneous graphs using PyG, a machine graph learning library in Python. This workshop will provide a practical introduction to the concept of heterogeneous graphs and their applications, including their ability to capture the complexity and diversity of real-world systems. Participants will gain experience in creating a heterogeneous graph from multiple data tables, preparing a dataset, and implementing and training a model using PyG.

Raised by Pandas, striving for more: An opinionated introduction to Polars

2023-04-17

talk

Nico Kreiling

Arrow Pandas Polars Python Rust

Pandas is the de-facto standard for data manipulation in python, which I personally love for its flexible syntax and interoperability. But Pandas has well-known drawbacks such as memory in-efficiency, inconsistent missing data handling and lacking multicore-support. Multiple open-source projects aim to solve those issues, the most interesting is Polars.

Polars uses Rust and Apache Arrow to win in all kinds of performance-benchmarks and evolves fast. But is it already stable enough to migrate an existing Pandas' codebase? And does it meet the high-expectations on query language flexibility of long-time Pandas-lovers?

In this talk, I will explain, how Polars can be that fast, and present my insights on where Polars shines and in which scenarios I stay with pandas (at least for now!)

The CPU in your browser: WebAssembly demystified

2023-04-17

talk

Antonio Cuni

Python

In the recent years we saw an explosion of usage of Python in the browser: Pyodide, CPython on WASM, PyScript, etc. All of this is possible thanks to the powerful functionalities of the underlying platform, WebAssembly, which is essentially a virtual CPU inside the browser.

Keynote - A journey through 4 industries with Python: Python's versatile problem-solving toolkit

2023-04-17

talk

Susan Shu Chang

AI/ML Python

In this keynote, I will share the lessons learned from using Python in 4 industries. Apart from machine learning applications that I build in my day to day as a data scientist and machine learning engineer, I also use Python to develop games for my own gaming company, Quill Game Studios. There is a lot of versatility in Python, and it's been my pleasure to use it to solve many interesting problems. I hope that this talk can give inspiration to various types of applications in your own industry as well.

An unbiased evaluation of environment management and packaging tools

2023-04-17

talk

Anna-Lena Popkes

Python

Python packaging is quickly evolving and new tools pop up on a regular basis. Lots of talks and posts on packaging exist but none of them give a structured, unbiased overview of the available tools.

This talk will shed light on the jungle of packaging and environment management tools, comparing them on a basis of predefined features.

Large Scale Feature Engineering and Datascience with Python & Snowflake

2023-04-17

talk

Michael Gorkow

Data Science GitHub Python Cyber Security Snowflake

Snowflake as a data platform is the core data repository of many large organizations.
With the introduction of Snowflake's Snowpark for Python, Python developers can now collaborate and build on one platform with a secure Python sandbox, providing developers with dynamic scalability & elasticity as well as security and compliance.

In this talk I'll explain the core concepts of Snowpark for Python and how they can be used for large scale feature engineering and data science.

Accelerate Python with Julia

2023-04-17

talk

Stephan Sahm

Python

Speeding up Python code has traditionally been achieved by writing C/C++ — an alien world for most Python users. Today, you can write high performance code in Julia instead, which is much much easier for Python users. This tutorial will give you hands-on experience writing a Python library that incorporates Julia for performance optimization.

Apache StreamPipes for Pythonistas: IIoT data handling made easy!

2023-04-17

talk

Tim Bossenmaier , Sven Oehler

AI/ML IoT Python

The industrial environment offers a lot of interesting use cases for data enthusiasts. There are myriads of interesting challenges that can be solved by data scientists. However, collecting industrial data in general and industrial IoT (IIoT) data in particular, is cumbersome and not really appealing for anyone who just wants to work with data. Apache StreamPipes addresses this pitfall and allows anyone to extract data from IIoT data sources without messing around with (old-fashioned) protocols. In addition, StreamPipes newly developed Python client now gives Pythonistas the ability to programmatically access and work with them in a Pythonic way.

This talk will provide a basic introduction into the functionality of Apache StreamPipes itself, followed by a deeper discussion of the Python client. Finally, a live demo will show how IIoT data can be easily derived in Python and used directly for visualization and ML model training.

From notebook to pipeline in no time with LineaPy

2023-04-17

talk

Thomas Fraunholz

Airflow Data Science MLOps Python

The nightmare before data science production: You found a working prototype for your problem using a Jupyter notebook and now it's time to build a production grade solution from that notebook. Unfortunately, your notebook looks anything but production grade. The good news is, there's finally a cure!

The open-source python package LineaPy aims to automate data science workflow generation and expediting the process of going from data science development to production. And truly, it transforms messy notebooks into data pipelines like Apache Airflow, DVC, Argo, Kubeflow, and many more. And if you can't find your favorite orchestration framework, you are welcome to work with the creators of LineaPy to contribute a plugin for it!

In this talk, you will learn the basic concepts of LineaPy and how it supports your everyday tasks as a data practitioner. For this purpose, we will transform a notebook step by step together to create a DVC pipeline. Finally, we will discuss what place LineaPy will take in the MLOps universe. Will you only have to check in your notebook in the future?

How to teach NLP to a newbie & get them started on their first project

2023-04-17

talk

Dr. Lisa Andreevna Chalaguine

NLP Python

The materials presented during this tutorial are open source and can be used by coaches and tutors who want to teach their students how to use Python for text processing and text classification. (A minimal understanding of programming (in any language) is required by the students)

talk-data.com

PyConDE & PyData Berlin 2023

Top Topics

Top Speakers

Let's contribute to pandas (3 hours) #1

How Python enables future computer chips

Maps with Django

Observability for Distributed Computing with Dask

BLE and Python: How to build a simple BLE project on Linux with Python

Rusty Python: A Case Study

Aspect-oriented Programming - Diving deep into Decorators

Geospatial Data Processing with Python: A Comprehensive Tutorial

The State of Production Machine Learning in 2023

Specifying behavior with Protocols, Typeclasses or Traits. Who wears it better (Python, Scala 3, Rust)?

A concrete guide to time-series databases with Python

Driving down the Memray lane - Profiling your data science work

Polars - make the switch to lightning-fast dataframes

Exploring the Power of Cyclic Boosting: A Pure-Python, Explainable, and Efficient ML Method

Practical Session: Learning on Heterogeneous Graphs with PyG

Raised by Pandas, striving for more: An opinionated introduction to Polars

The CPU in your browser: WebAssembly demystified

Keynote - A journey through 4 industries with Python: Python's versatile problem-solving toolkit

An unbiased evaluation of environment management and packaging tools

Large Scale Feature Engineering and Datascience with Python & Snowflake

Accelerate Python with Julia

Apache StreamPipes for Pythonistas: IIoT data handling made easy!

From notebook to pipeline in no time with LineaPy

How to teach NLP to a newbie & get them started on their first project