talk-data.com talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

191

Sessions & talks

Showing 176–191 of 191 · Newest first

Search within this event →

Lunch

2023-04-17
talk

Lunch

2023-04-17
talk

An unbiased evaluation of environment management and packaging tools

2023-04-17
talk

Python packaging is quickly evolving and new tools pop up on a regular basis. Lots of talks and posts on packaging exist but none of them give a structured, unbiased overview of the available tools.

This talk will shed light on the jungle of packaging and environment management tools, comparing them on a basis of predefined features.

AutoGluon: AutoML for Tabular, Multimodal and Time Series Data

2023-04-17
talk

AutoML, or automated machine learning, offers the promise of transforming raw data into accurate predictions with minimal human intervention, expertise, and manual experimentation. In this talk, we will introduce AutoGluon, a cutting-edge toolkit that enables AutoML for tabular, multimodal and time series data. AutoGluon emphasizes usability, enabling a wide variety of tasks from regression to time series forecasting and image classification through a unified and intuitive API. We will specifically focus on tasks on tabular and time series tasks where AutoGluon is the current state-of-the-art, and demonstrate how AutoGluon can be used to achieve competitive performance on tabular and time series competition data sets. We will also discuss the techniques used to automatically build and train these models, peeking under the hood of AutoGluon.

Hyperparameter optimization for the impatient

2023-04-17
talk

In the last years, Hyperparameter Optimization (HPO) became a fundamental step in the training of Machine Learning (ML) models and in the creation of automatic ML pipelines. Unfortunately, while HPO improves the predictive performance of the final model, it comes with a significant cost both in terms of computational resources and waiting time. This leads many practitioners to try to lower the cost of HPO by employing unreliable heuristics.

In this talk we will provide simple and practical algorithms for users that want to train models with almost-optimal predictive performance, while incurring in a significantly lower cost and waiting time. The presented algorithms are agnostic to the application and the model being trained so they can be useful in a wide range of scenarios.

We provide results from an extensive experimental activity on public benchmarks, including comparisons with well-known techniques like Bayesian Optimization (BO), ASHA, Successive Halving. We will describe in which scenarios the biggest gains are observed (up to 30x) and provide examples for how to use these algorithms in a real-world environment.

All the code used for this talk is available on (GitHub)[https://github.com/awslabs/syne-tune].

Incorporating GPT-3 into practical NLP workflows

2023-04-17
talk

In this talk, I'll show how large language models such as GPT-3 complement rather than replace existing machine learning workflows. Initial annotations are gathered from the OpenAI API via zero- or few-shot learning, and then corrected by a human decision maker using an annotation tool. The resulting annotations can then be used to train and evaluate models as normal. This process results in higher accuracy than can be achieved from the OpenAI API alone, with the added benefit that you'll own and control the model for runtime.

Large Scale Feature Engineering and Datascience with Python & Snowflake

2023-04-17
talk

Snowflake as a data platform is the core data repository of many large organizations.
With the introduction of Snowflake's Snowpark for Python, Python developers can now collaborate and build on one platform with a secure Python sandbox, providing developers with dynamic scalability & elasticity as well as security and compliance.

In this talk I'll explain the core concepts of Snowpark for Python and how they can be used for large scale feature engineering and data science.

Accelerate Python with Julia

2023-04-17
talk

Speeding up Python code has traditionally been achieved by writing C/C++ — an alien world for most Python users. Today, you can write high performance code in Julia instead, which is much much easier for Python users. This tutorial will give you hands-on experience writing a Python library that incorporates Julia for performance optimization.

Apache StreamPipes for Pythonistas: IIoT data handling made easy!

2023-04-17
talk

The industrial environment offers a lot of interesting use cases for data enthusiasts. There are myriads of interesting challenges that can be solved by data scientists. However, collecting industrial data in general and industrial IoT (IIoT) data in particular, is cumbersome and not really appealing for anyone who just wants to work with data. Apache StreamPipes addresses this pitfall and allows anyone to extract data from IIoT data sources without messing around with (old-fashioned) protocols. In addition, StreamPipes newly developed Python client now gives Pythonistas the ability to programmatically access and work with them in a Pythonic way.

This talk will provide a basic introduction into the functionality of Apache StreamPipes itself, followed by a deeper discussion of the Python client. Finally, a live demo will show how IIoT data can be easily derived in Python and used directly for visualization and ML model training.

Cooking up a ML Platform: Growing pains and lessons learned

2023-04-17
talk

What is a ML platform and do you even need one? When should you consider investing in your own ML platform? What challenges can you expect building and maintaining one? Tune in and discover (some) answers to these questions and more! I will share a first-hand account of our ongoing journey towards becoming a ML platform team within Delivery Hero's Logistics department, including how we got here, how we structure our work, and what challenges and tools we are focussing on next.

From notebook to pipeline in no time with LineaPy

2023-04-17
talk

The nightmare before data science production: You found a working prototype for your problem using a Jupyter notebook and now it's time to build a production grade solution from that notebook. Unfortunately, your notebook looks anything but production grade. The good news is, there's finally a cure!

The open-source python package LineaPy aims to automate data science workflow generation and expediting the process of going from data science development to production. And truly, it transforms messy notebooks into data pipelines like Apache Airflow, DVC, Argo, Kubeflow, and many more. And if you can't find your favorite orchestration framework, you are welcome to work with the creators of LineaPy to contribute a plugin for it!

In this talk, you will learn the basic concepts of LineaPy and how it supports your everyday tasks as a data practitioner. For this purpose, we will transform a notebook step by step together to create a DVC pipeline. Finally, we will discuss what place LineaPy will take in the MLOps universe. Will you only have to check in your notebook in the future?

Honey, I broke the PyTorch model >.< - Debugging custom PyTorch models in a structured manner

2023-04-17
talk

When building PyTorch models for custom applications from scratch there's usually one problem: The model does not learn anything. In a complex project, it can be tricky to identify the cause: Is it the data? A bug in the model? Choosing the wrong loss function at 3 am after an 8-hour coding session?

In this talk, we will build a toolbox to find the culprits in a structured manner. We will focus on simple ways to ensure a training loop is correct, generate synthetic training data to determine whether we have a model bug or problematic real-world data, and leverage pytest to safely refactor PyTorch models.

After this talk, visitors will be well equipped to take the right steps when a model is not learning, quickly identify the underlying reasons, and prevent bugs in the future.

How to teach NLP to a newbie & get them started on their first project

2023-04-17
talk

The materials presented during this tutorial are open source and can be used by coaches and tutors who want to teach their students how to use Python for text processing and text classification. (A minimal understanding of programming (in any language) is required by the students)

Pandas 2.0 and beyond

2023-04-17
talk

Pandas has reached a 2.0 milestone in 2023. But what does that mean? And what is coming after 2.0? This talk will give an overview of what happened in the latest releases of pandas and highlight some topics and major new features the pandas project is working on.

Opening Session

2023-04-17
talk

Registration

2023-04-17
talk