talk-data.com talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

191

Sessions & talks

Showing 1–25 of 191 · Newest first

Search within this event →

Closing Session

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Coffee Break

2023-04-19
talk

Fear the mutants. Love the mutants.

2023-04-19
talk

Developers often use code coverage as a target, which makes it a bad measure of test quality.

Mutation testing changes the game: create mutant versions of your code that break your tests, and you'll quickly start to write better tests!

Come and learn to use it as part of your CI/CD process. I promise, you'll never look at penguins the same way again!

Great Security Is One Question Away

2023-04-19
talk

After a decade of writing code, I joined the application security team. During the transition process, I discovered that there are many myths about security, and how difficult it is. Often devs choose to ignore it because they think that writing more secure code would take them ages. It is not true. Security doesn’t have to be scary. From my talk, you will learn the most useful piece from the Application Security theory. It will be practical and not boring at all.

How to increase diversity in open source communities

2023-04-19
talk

Today state of the art technology and scientific research strongly depend on open source libraries. The demographic of the contributors to these libraries is predominantly white and male [1][2][3][4]. This situation creates problems not only for individual contributors outside of this demographic but also for open source projects such as loss of career opportunities and less robust technologies, respectively [1][7]. In recent years there have been a number of various recommendations and initiatives to increase the participation in open source projects of groups who are underrepresented in this domain [1][3][5][6]. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities [2][3][7]. This talk highlights the underlying problems and explores how we can overcome them.

Postmodern Architecture: The Python Powered Modern Data Stack

2023-04-19
talk

The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.

Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows...clean code, tested code, data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, more not-quite-bugs ("but my number says this...").

This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.

In this talk, we will cover the shift from ETL to ELT, the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.

Rethinking codes of conduct

2023-04-19
talk

Did you know that the Python Software Foundation Code of Conduct is turning 10 years old in 2023? It was voted in as they felt they were “unbalanced and not seeing the true spectrum of the greater community”. Why is that a big thing? Come to my talk and find out!

Cloud Infrastructure From Python Code: How Far Could We Go?

2023-04-19
talk

Discover how Infrastructure From Code (IfC) can revolutionize Cloud DevOps automation by generating cloud deployment templates directly from Python code. Learn how this technology empowers Python developers to easily deploy and operate cost-effective, secure, reliable, and sustainable cloud software. Join us to explore the strategic potential of IfC.

evosax: JAX-Based Evolution Strategies

2023-04-19
talk

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage massive vectorization and high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation and auto-vectorization/parallelization to scale ES to your favorite accelerators. In this talk we will get to know the core API and how to solve distributed black-box optimization problems with evolution strategies.

Giving and Receiving Great Feedback through PRs

2023-04-19
talk

Do you struggle with PRs? Have you ever had to change code even though you disagreed with the change just to land the PR? Have you ever given feedback that would have improved the code only to get into a comment war? We'll discuss how to give and receive feedback to extract maximum value from it and avoid all the communication problems that come with PRs.

Introduction to Async programming

2023-04-19
talk

Asynchronous programming is a type of parallel programming in which a unit of work is allowed to run separately from the primary application thread. Post execution, it notifies the main thread about the completion or failure of the worker thread. There are numerous benefits to using it, such as improved application performance, enhanced responsiveness, and effective usage of CPU.

Asynchronicity seems to be a big reason why Node.js is so popular for server-side programming. Most of the code we write, especially in heavy IO applications like websites, depends on external resources. This could be anything from a remote database POST API call. As soon as you ask for any of these resources, your code is waiting around for process completion with nothing to do. With asynchronous programming, you allow your code to handle other tasks while waiting for these other resources to respond.

In this session, we are going to talk about asynchronous programming in Python. Its benefits and multiple ways to implement it.

The Beauty of Zarr

2023-04-19
talk

In this talk, I’d be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. This talk presents a systematic approach to understanding and implementing Zarr by showing how it works, the need for using it, and a hands-on session at the end. Zarr is based on an open technical specification, making implementations across several languages possible. I’d mainly talk about Zarr’s Python implementation and show how it beautifully interoperates with the existing libraries in the PyData stack.

Contributing to an open-source content library for NLP

2023-04-19
talk

Bricks is an open-source content library for natural language processing, which provides the building blocks to quickly and easily enrich, transform or analyze text data for machine learning projects. For many Pythonistas, contributing to an open-source project seems scary and intimidating. In this tutorial, we offer a hands-on experience in which programmers and data scientists learn how to code their own building blocks and share their creations with the community with ease.

The Battle of Giants: Causality vs NLP => From Theory to Practice

2023-04-19
talk

With an average of 3.2 new papers published on Arxiv every day in 2022, causal inference has exploded in popularity, attracting large amount of talent and interest from top researchers and institutions including industry giants like Amazon or Microsoft. Text data, with its high complexity, posits an exciting challenge for causal inference community. In the workshop, we'll review the latest advances in the field of Causal NLP and implement a causal Transformer model to demonstrate how to translate these developments into a practical solution that can bring real business value. All in Python!

Apache Arrow: connecting and accelerating dataframe libraries across the PyData ecosystem

2023-04-19
talk

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing, and is becoming the de facto standard for tabular data. This talk will give an overview of the recent developments both in Apache Arrow itself as how it is being adopted in the PyData ecosystem (and beyond) and can improve your day-to-day data analytics workflows.

Behind the Scenes of tox: The Journey of Rewriting a Python Tool with more than 10 Million Monthly Downloads

2023-04-19
talk

tox is a widely-used tool for automating testing in Python. In this talk, we will go behind the scenes of the creation of tox 4, the latest version of the tool. We will discuss the motivations for the rewrite, the challenges and lessons learned during the development process. We will have a look at the new features and improvements introduced in tox 4. But most importantly, you will get to know the maintainers.

Bringing NLP to Production (an end to end story about some multi-language NLP services)

2023-04-19
talk

Models in Natural Language Processing are fun to train but can be difficult to deploy. The size of their models, libraries and necessary files can be challenging, especially in a microservice environment. When services should be built as lightweight and slim as possible, large (language) models can lead to a lot of problems. With a recent real-world use case as an example, which runs productively for over a year and in 10 different languages, I will walk you through my experiences with deploying NLP models. What kind of pitfalls, shortcuts, and tricks are possible while bringing an NLP model to production?

In this talk, you will learn about different ways and possibilities to deploy NLP services. I will speak briefly about the way leading from data to model and a running service (without going into much detail) before I will focus on the MLOps part in the end. I will take you with me on my past journey of struggles and successes so that you don’t need to take these detours by yourselves.

Machine Learning Lifecycle for NLP Classification in E-Commerce

2023-04-19
talk

Running machine learning models in a production environment brings its own challenges. In this talk we would like to present our solution of a machine learning lifecycle for the text-based cataloging classification system from idealo.de. We will share lessons learned and talk about our experiences during the lifecycle migration from a hosted cluster to a cloud solution within the last 3 years. In addition, we will outline how we embedded our ML components as part of the overall idealo.de processing architecture.

You've got trust issues, we've got solutions: Differential Privacy

2023-04-19
talk

As we are in an era of big data where large groups of information are assimilated and analyzed, for insights into human behavior, data privacy has become a hot topic. Since there is a lot of private information which once leaked can be misused, all data cannot be released for research. This talk aims to discuss Differential Privacy, a cutting-edge technique of cybersecurity that claims to preserve an individual’s privacy, how it is employed to minimize the risks with private data, its applications in various domains, and how Python eases the task of employing it in our models with PyDP.