talk-data.com

Event

PyConDE & PyData Berlin 2023

2023-04-17 – 2023-04-19 PyData

Activities tracked

191

Sessions & talks

Showing 126–150 of 191 · Newest first

What could possibly go wrong? - An incomplete guide on how to prevent, detect & mitigate biases in data products

2023-04-18
talk

In this talk, I want to look at data ethics through a practical lens and facilitate a discussion about how we can embed ethical data practices in our day-to-day work. I will shed some light on the multiple sources of bias in data applications: where are the potential pitfalls, and how can we prevent, detect and mitigate them early so that they never become a risk for our data product? I will walk you through the different stages of a data product lifecycle and dive deeper into the questions we as data professionals have to ask ourselves throughout the process. Furthermore, I will present methods, tools and libraries that can support our work. There is no universal solution, since tools and strategies need to be chosen to address the specific requirements of the use case and models at hand, but my talk will provide a good starting point for your own data ethics journey.

Coffee Break

2023-04-18
talk

Keynote - How Are We Managing? Data Teams Management IRL

2023-04-18
talk

The title “Data Scientist” has been in use for 15 years now. We have been attending PyData conferences for over 10 years as well. The hype around data science and AI seems higher than ever before. But how are we managing?

Announcements 15Min

2023-04-18
talk

Lightning Talks 60Min

2023-04-17
talk

Coffee Break

2023-04-17
talk

BHAD: Explainable unsupervised anomaly detection using Bayesian histograms

2023-04-17
talk

The detection of outliers or anomalous data patterns is one of the most prominent machine learning use cases in industrial applications. I present a Bayesian histogram anomaly detector (BHAD), where the number of bins is treated as an additional unknown model parameter with an assigned prior distribution. BHAD scales linearly with the sample size and enables a straightforward explanation of individual scores, which makes it very suitable for industrial applications where model interpretability is crucial. I compare the predictive performance of the proposed BHAD algorithm with various state-of-the-art (SoA) anomaly detection approaches using simulated data and popular benchmark datasets for outlier detection. The reported results indicate that BHAD has very competitive predictive accuracy compared to other more complex and computationally more expensive algorithms, while being explainable and fast.
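
To make the idea of a histogram-based anomaly score concrete, here is a minimal sketch. It is not the BHAD algorithm from the talk: BHAD treats the number of bins as unknown with a prior, while this sketch fixes n_bins by hand and uses simple additive smoothing. All function names (fit_histogram, anomaly_score, explain) are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def fit_histogram(values, n_bins, alpha=1.0):
    """Fit an equal-width histogram and return (lo, width, log_probs).

    alpha is a pseudo-count added to every bin (additive smoothing), so
    empty bins keep a small non-zero probability. n_bins is fixed here;
    this is an assumption of the sketch, not how BHAD chooses it.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values) + alpha * n_bins
    log_probs = [
        math.log((counts.get(b, 0) + alpha) / total) for b in range(n_bins)
    ]
    return lo, width, log_probs


def anomaly_score(value, model):
    """Negative log-probability of the bin that value falls into."""
    lo, width, log_probs = model
    b = int((value - lo) / width)
    b = max(0, min(b, len(log_probs) - 1))  # clamp out-of-range values
    return -log_probs[b]


def explain(row, models):
    """Per-feature scores for one row; their sum is the row's total score."""
    return {name: anomaly_score(row[name], models[name]) for name in models}
```

Fitting is a single pass over each feature, so the cost grows linearly with the sample size, and because the total score is a sum of per-feature terms, the largest term points to the feature that makes a row look unusual.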

Building a Personal Assistant With GPT and Haystack: How to Feed Facts to Large Language Models and Reduce Hallucination.

2023-04-17
talk

Large Language Models (LLMs), like ChatGPT, have shown remarkable performance on various tasks. But there are still unsolved issues with these models: they can be confidently wrong and their knowledge becomes outdated. GPT also does not have any of the information that you have stored in your own data. In this talk, you'll learn how to use Haystack, an open source framework, to chain LLMs with other models and components to overcome these issues. We will build a practical application using these techniques, and you will walk away with a deeper understanding of how to use LLMs to build NLP products that work.

FastAPI and Celery: Building Reliable Web Applications with TDD

2023-04-17
talk

In this talk, we will explore how to use the FastAPI web framework and the Celery task queue to build reliable and scalable web applications in a test-driven manner. We will start by setting up a testing environment and writing unit tests for the core functionality of our application. Next, we will use FastAPI to create an API that performs a long-running task. Finally, we will see how Celery can help us offload long-running tasks and improve the performance of our application. By the end of this talk, you will have a strong understanding of TDD and how to apply it to your FastAPI and Celery projects, and you will be able to write tests that ensure the reliability and maintainability of your code.

How to build observability into a ML Platform

2023-04-17
talk

As machine learning becomes more prevalent across nearly every business and industry, making sure that these technologies are working and delivering quality is critical. In her talk, Alicia will discuss the importance of machine learning observability and why it should be a fundamental tool of modern machine learning architectures. Not only does it ensure models are accurate, but it also helps teams iterate on and improve models more quickly. Alicia will dive into how Shopify has been prototyping observability in different parts of its machine learning platform. This talk will provide insights on how to track model performance, how to catch any unexpected or erroneous behavior, what types of behavior to look for in your data (e.g. drift, quality metrics) and in your model/predictions, and how observability could work with large language models and chat AIs.
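
As one illustration of the kind of data drift check mentioned above, the sketch below computes a Population Stability Index (PSI) between a reference sample and a newer sample. This is a generic technique chosen for illustration, not something the abstract attributes to Shopify's platform; the function name psi and its parameters are assumptions of this sketch.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the range of the reference sample
    (expected); values outside that range are clamped to the edge bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant data

    def proportions(sample):
        counts = [0] * n_bins
        for v in sample:
            b = int((v - lo) / width)
            counts[max(0, min(b, n_bins - 1))] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]
print(psi(reference, reference), psi(reference, shifted))
```

Identical samples give a PSI of zero, and the value grows as the two distributions move apart, so it can be tracked over time as a single drift metric per feature.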

Specifying behavior with Protocols, Typeclasses or Traits. Who wears it better (Python, Scala 3, Rust)?

2023-04-17
talk

In this talk, we will explore the use of Python's typing.Protocol, Scala's Typeclasses, and Rust's Traits. They all offer a very powerful & elegant mechanism for abstracting over various concepts (such as Serialization) in a modular manner. We will compare and contrast the syntax and implementation of these constructs in each language and discuss their strengths and weaknesses. We will also look at real-world examples of how these features are used in each language to specify behavior, and consider differences in terms of type system expressiveness and effectiveness. By the end of the talk, attendees will have a better understanding of the differences and similarities between these three language features, and will be able to make informed decisions about which one is best suited for their needs.
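
On the Python side, a minimal sketch of specifying serialization behavior with typing.Protocol could look like the following. The names Serializable, Point, and dump_all are hypothetical and used only for illustration; the Scala and Rust counterparts are not shown.

```python
import json
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Serializable(Protocol):
    """Hypothetical protocol: any object with a to_json() method conforms."""

    def to_json(self) -> str: ...


@dataclass
class Point:
    x: int
    y: int

    # Point never inherits from Serializable; having the method is enough.
    def to_json(self) -> str:
        return json.dumps({"x": self.x, "y": self.y})


def dump_all(items: list[Serializable]) -> list[str]:
    """Serialize every item through the protocol's single method."""
    return [item.to_json() for item in items]


print(dump_all([Point(1, 2)]))
```

Conformance here is structural: a static type checker accepts Point wherever Serializable is expected because the method signatures match, and runtime_checkable additionally allows an isinstance check that only verifies the method is present.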

A concrete guide to time-series databases with Python

2023-04-17
talk

We evaluated time-series databases and complementary services for stream-processing sensor data. In this talk, we will present our evaluation and show the final implementation, alongside the Python tools we’ve built and the lessons learned during the process.

Driving down the Memray lane - Profiling your data science work

2023-04-17
talk

When handling large amounts of data, memory profiling of the data science workflow becomes more important. It gives you insight into which processes consume a lot of memory. In this talk, we will introduce Memray, a Python memory profiling tool, and its new Jupyter plugin.
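
To illustrate what measuring the memory of a single workflow step can look like, the sketch below uses tracemalloc from the Python standard library. This is a stand-in for illustration and not Memray's API; the helper peak_memory_bytes and the example function build_squares are hypothetical.

```python
import tracemalloc


def peak_memory_bytes(func, *args, **kwargs):
    """Run func and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def build_squares(n):
    """Example workload: materialize a list of n squares."""
    return [i * i for i in range(n)]


result, peak = peak_memory_bytes(build_squares, 100_000)
print(f"peak: {peak / 1e6:.1f} MB")
```

tracemalloc only sees allocations made through Python's memory allocator, so the numbers are a rough guide to where a workflow spends memory rather than a full picture of the process.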