talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

11

Filtering by: Data Science

Sessions & talks

Showing 1–11 of 11 · Newest first

Code Cleanup: A Data Scientist's Guide to Sparkling Code

2023-04-19
talk

Does your production code look like it’s been copied from Untitled12.ipynb? Are your engineers complaining about the code but you can’t find the time to work on improving the code base? This talk will go through some of the basics of clean coding and how to best implement them in a data science team.

The future of the Jupyter Notebook interface

2023-04-19
talk

Jupyter Notebooks have been a widely popular tool for data science in recent years due to their ability to combine code, text, and visualizations in a single document.

Despite this popularity, the core functionality and user experience of the Classic Jupyter Notebook interface have remained largely unchanged over the past years.

Recently, the Jupyter Notebook project decided to base its next major version, 7, on JupyterLab components and extensions, which means many JupyterLab features are also available to Jupyter Notebook users.

In this presentation, we will demo the new features coming in Jupyter Notebook version 7 and how they are relevant to existing users of the Classic Notebook.

Create interactive Jupyter websites with JupyterLite

2023-04-19
talk

Jupyter notebooks are a popular tool for data science and scientific computing, allowing users to mix code, text, and multimedia in a single document. However, sharing Jupyter notebooks can be challenging, as they require installing a specific software environment to be viewed and executed.

JupyterLite is a Jupyter distribution that runs entirely in the web browser without any server components. A significant benefit of this approach is the ease of deployment. With JupyterLite, the only requirement to provide a live computing environment is a collection of static assets. In this talk, we will show how you can create such a static website and deploy it to your users.
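
To illustrate what "a collection of static assets" implies for deployment, here is a minimal sketch that serves a directory of files with nothing but the Python standard library. It is not part of JupyterLite; the function names `make_static_server` and `serve_in_background` are hypothetical, and the directory passed in is assumed to be wherever your JupyterLite build placed its output.

```python
import functools
import http.server
import threading


def make_static_server(site_dir, port=0):
    """Create an HTTP server that only serves files found under site_dir.

    port=0 asks the operating system for any free port.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(site_dir)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server):
    """Run the server in a daemon thread so the caller keeps control."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling `make_static_server("path/to/build/output")` followed by `serve_in_background(server)` makes the site available at the address in `server.server_address`. Because no code runs on the server side, any static host should work equally well.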

Ask-A-Question: an FAQ-answering service for when there's little to no data

2023-04-18
talk

Doing data science in international development often means finding the right-sized solution in resource-constrained settings.

This talk walks you through how my team helped answer thousands of questions from pregnant folks and new parents on a South African maternal and child health helpline, which model we ended up choosing and why (hint: resource constraints!), and how we've packaged everything into a service that anyone can start for themselves.

By the end of the talk, I hope you'll know how to start your own FAQ-answering service and learn about one example of doing data science in international development.

Everybody knows our yellow vans, trucks and planes around the world. But do you know how data drives our business and how we leverage algorithms and technology in our core operations? We will share some “behind the scenes” insights on Deutsche Post DHL Group’s journey towards a Data-Driven Company.

• Large-Scale Use Cases: Challenging and high impact Use Cases in all major areas of logistics, including Computer Vision and NLP
• Fancy Algorithms: Deep-Neural Networks, TSP Solvers and the standard toolkit of a Data Scientist
• Modern Tooling: Cloud Platforms, Kubernetes, Kubeflow, Auto ML
• No rusty working mode: small, self-organized, agile project teams, combining state of the art Machine Learning with MLOps best practices
• A young, motivated and international team – German skills are only “nice to have”

But we have more to offer than slides filled with buzzwords. We will demonstrate our passion for our work, deep dive into our largest use cases that impact your everyday life and share our approach for a timeseries forecasting library, combining data science, software engineering and technology for efficient and easy to maintain machine learning projects.

Observability for Distributed Computing with Dask

2023-04-18
talk

Debugging is hard. Distributed debugging is hell.

Dask is a popular library for parallel and distributed computing in Python. Dask is commonly used in data science, actual science, data engineering, and machine learning to distribute workloads onto clusters of many hundreds of workers with ease.

However, when things go wrong life can become difficult due to all of the moving parts. These parts include your code, other PyData libraries like NumPy/pandas, the machines you’re running on, the network between them, storage, the cloud, and of course issues with Dask itself. It can be difficult to understand what is going on, especially when things seem slower than they should be or fail unexpectedly. Observability is the key to sanity and success.

In this talk, we describe the tools Dask offers to help you observe your distributed cluster, analyze performance, and monitor your cluster to react to unexpected changes quickly. We will dive into distributed logging, automated metrics, event-based monitoring, and root-causing problems with diagnostic tooling. Throughout the talk, we will leverage real-world use cases to show how these tools help to identify and solve problems for large-scale users in the wild.

This talk should be particularly insightful for Dask users, but the approaches to observing distributed systems should be relevant to anyone operating at scale in production.

Software Design Pattern for Data Science

2023-04-18
talk

Even if every data science project is special, a lot can be learned from similar problems solved in the past. In this talk, I will share some specific software design concepts that data scientists can use to build better data products.

Keynote - How Are We Managing? Data Teams Management IRL

2023-04-18
talk

The title “Data Scientist” has been in use for 15 years now. We have been attending PyData conferences for over 10 years as well. The hype around data science and AI seems higher than ever before. But how are we managing?

Driving down the Memray lane - Profiling your data science work

2023-04-17
talk

When handling a large amount of data, memory profiling the data science workflow becomes more important. It gives you insight into which process consumes lots of memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.
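
As a rough illustration of what memory profiling reports, the sketch below uses `tracemalloc` from the Python standard library. This is a swapped-in stand-in, not Memray and not Memray's API; the helper `top_allocations` and the workload `build_rows` are hypothetical names chosen for this example.

```python
import tracemalloc


def top_allocations(func, limit=3):
    """Run func() while tracing allocations.

    Returns (result, stats), where stats lists the source lines that
    allocated the most memory, largest first.
    """
    tracemalloc.start()
    try:
        result = func()
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return result, snapshot.statistics("lineno")[:limit]


def build_rows(n=100_000):
    # Hypothetical workload: a list of small lists, as a naive
    # in-memory table might be built.
    return [[i, i * 2.0] for i in range(n)]
```

Calling `rows, stats = top_allocations(build_rows)` and printing each entry of `stats` shows the file, line number, and allocated size per line, which is the kind of insight the abstract refers to. Dedicated profilers typically add more detail than this sketch provides.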

Large Scale Feature Engineering and Datascience with Python & Snowflake

2023-04-17
talk

Snowflake as a data platform is the core data repository of many large organizations.
With the introduction of Snowflake's Snowpark for Python, Python developers can now collaborate and build on one platform with a secure Python sandbox, providing developers with dynamic scalability & elasticity as well as security and compliance.

In this talk I'll explain the core concepts of Snowpark for Python and how they can be used for large scale feature engineering and data science.

From notebook to pipeline in no time with LineaPy

2023-04-17
talk

The nightmare before data science production: You found a working prototype for your problem using a Jupyter notebook and now it's time to build a production grade solution from that notebook. Unfortunately, your notebook looks anything but production grade. The good news is, there's finally a cure!

The open-source Python package LineaPy aims to automate data science workflow generation and expedite the process of going from data science development to production. And truly, it transforms messy notebooks into data pipelines for frameworks like Apache Airflow, DVC, Argo, Kubeflow, and many more. And if you can't find your favorite orchestration framework, you are welcome to work with the creators of LineaPy to contribute a plugin for it!

In this talk, you will learn the basic concepts of LineaPy and how it supports your everyday tasks as a data practitioner. For this purpose, we will transform a notebook step by step together to create a DVC pipeline. Finally, we will discuss what place LineaPy will take in the MLOps universe. Will you only have to check in your notebook in the future?