PyConDE & PyData Berlin 2023

Coffee Break

2023-04-19

talk

Haystack for climate Q/A

2023-04-19

talk

Vibha Vikram Rao

NLP

How can NLP and Haystack help answer sustainability questions and fight climate change? In this talk we walk through our experience using Haystack to build question answering models for the climate change and sustainability domain. We discuss how we did it, some of the challenges we faced, and what we learnt along the way!

Shrinking gigabyte sized scikit-learn models for deployment

2023-04-19

talk

Yasin Tatar , Pavel Zwerschke

AI/ML Scikit-learn

We present an open source library to shrink pickled scikit-learn and lightgbm models. We will provide insights into how pickling ML models works and how to improve the disk representation. With this approach, we can reduce the deployment size of machine learning applications by up to 6x.

Teaching Neural Networks a Sense of Geometry

2023-04-19

talk

Jens Agerberg

AI/ML Python PyTorch Scikit-learn

By taking neural networks back to the school bench and teaching them some elements of geometry and topology, we can build algorithms that can reason about the shape of data. Surprisingly, these methods can be useful not only for computer vision – to model input data such as images or point clouds through global, robust properties – but in a wide range of applications, such as evaluating and improving the learning of embeddings, or the distribution of samples originating from generative models. This is the promise of the emerging field of Topological Data Analysis (TDA), which we will introduce while reviewing recent work at its intersection with machine learning. TDA can be seen as part of the increasingly popular movement of Geometric Deep Learning, which encourages us to go beyond seeing data only as vectors in Euclidean spaces and instead consider machine learning algorithms that encode other geometric priors. In the past couple of years TDA has started to take a step out of the academic bubble, to a large extent thanks to powerful Python libraries written as extensions to scikit-learn or PyTorch.

Thou Shall Judge But With Fairness: Methods to Ensure an Unbiased Model

2023-04-19

talk

Nandana Sreeraj

AI/ML

Is your model prejudiced? Is your model deviating from the predictions it ought to have made? Has your model misunderstood the concept? In the world of artificial intelligence and machine learning, the word "fairness" is particularly common. It is described as the quality of being impartial or fair. Fairness in ML is essential for contemporary businesses. It helps build consumer confidence and demonstrates to customers that their issues are important. Additionally, it aids in ensuring adherence to guidelines established by authorities, thereby upholding the idea of responsible AI. In this talk, let's explore how certain sensitive features influence the model and introduce bias into it. We'll also look at how we can make it better.

Unlocking Information - Creating Synthetic Data for Open Access.

2023-04-19

talk

Antonia Scherz

Data Vault

Many good project ideas fail before they even start due to the sensitive personal data required. The good news: a synthetic version of this data does not need protection. Synthetic data copies the actual data's structure and statistical properties without recreating personally identifiable information. The bad news: it is difficult to create synthetic data for open-access use without recreating an exact copy of the actual data. This talk will give hands-on insights into synthetic data creation and the challenges along its lifecycle. We will learn how to create and evaluate synthetic data for any use case using the open-source package Synthetic Data Vault. We will find answers to why it takes so long to synthesize the huge amount of data dormant in public administration. The talk addresses owners who want to create access to their private data as well as analysts looking to use synthetic data. After this session, listeners will know which steps to take to generate synthetic data for multi-purpose use and its limitations for real-world analyses.

Accelerating Python Code

2023-04-19

talk

Jens Nie

Python Cyber Security

Python is a beautiful language for fast prototyping and sketching ideas quickly. People often struggle to get their code into production, though, for various reasons. Besides the security and safety concerns that are usually not addressed from the very beginning when playing around with an algorithmic idea, performance concerns are quite frequently a reason for not taking the Python code to the next level.

We will look at the "missing performance" worries using a simple numerical problem and show how to speed the corresponding Python code up to top-notch performance.

Advanced Visual Search Engine with Self-Supervised Learning (SSL) Representations and Milvus

2023-04-19

talk

Antoine Toubhans , Noé Achache

Vector DB

Image retrieval is the process of searching for images in a large database that are similar to one or more query images. A classical approach is to transform the database images and the query images into embeddings via a feature extractor (e.g., a CNN or a ViT), so that they can be compared via a distance metric. Self-supervised learning (SSL) can be used to train a feature extractor without the need for expensive and time-consuming labeled training data. We will use DINO's SSL method to build a feature extractor and Milvus, an open-source vector database built for evolutionary similarity search, to index image representation vectors for efficient retrieval. We will compare the SSL approach with supervised and pre-trained feature extractors.
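The retrieval step described above (embed, then compare with a distance metric) can be sketched in a few lines. This is only an illustration under stated assumptions: the three-dimensional vectors and file names are made up, the embeddings would in practice come from a feature extractor such as DINO, and a vector database such as Milvus would replace the brute-force loop. None of the names below are taken from the talk.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, database, k=2):
    """Return the names of the k database entries most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical embeddings; real ones would have hundreds of dimensions.
database = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

result = top_k(query, database, k=2)
print(result)
```

Brute-force comparison scales linearly with the database size, which is the cost an indexed vector store is meant to avoid.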

Building Hexagonal Python Services

2023-04-19

talk

Shahriyar Rzayev

Python

The importance of enterprise architecture patterns is well known, and they are applicable to varied types of tasks. Thinking about the architecture from the beginning of the journey is crucial to having a maintainable, and therefore testable and flexible, code base. We are going to explore the Ports and Adapters (Hexagonal) pattern by showing a simple web app using the Repository, Unit of Work, and Services (Use Cases) patterns tied together with Dependency Injection. All those patterns are quite famous in other languages, but they are relatively new to the Python ecosystem, which is a crucial missing part. As a web framework, we are going to use FastAPI, which can be replaced with any other framework in little time because of the abstractions we have added.
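As a rough illustration of the Ports and Adapters idea, the sketch below wires a use case to a repository through constructor injection. It is an assumption-laden example, not the speaker's code: `User`, `UserRepository`, `InMemoryUserRepository`, and `RegisterUser` are hypothetical names, and the web framework and Unit of Work layers are left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass
class User:
    user_id: int
    name: str


class UserRepository(Protocol):
    """Port: the interface the core application depends on."""

    def add(self, user: User) -> None: ...

    def get(self, user_id: int) -> Optional[User]: ...


class InMemoryUserRepository:
    """Adapter: one concrete implementation of the port (hypothetical)."""

    def __init__(self) -> None:
        self._users: Dict[int, User] = {}

    def add(self, user: User) -> None:
        self._users[user.user_id] = user

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


class RegisterUser:
    """Use case: business logic that only knows about the port."""

    def __init__(self, repo: UserRepository) -> None:
        self._repo = repo  # dependency injected by the caller

    def execute(self, user_id: int, name: str) -> User:
        if self._repo.get(user_id) is not None:
            raise ValueError(f"user {user_id} already exists")
        user = User(user_id, name)
        self._repo.add(user)
        return user


repo = InMemoryUserRepository()
service = RegisterUser(repo)
created = service.execute(1, "Ada")
print(created)
```

Because `RegisterUser` depends only on the port, a database-backed adapter could be swapped in without touching the use case, and tests can keep using the in-memory one.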

Create interactive Jupyter websites with JupyterLite

2023-04-19

talk

Jérémy Tuloup

Data Science

Jupyter notebooks are a popular tool for data science and scientific computing, allowing users to mix code, text, and multimedia in a single document. However, sharing Jupyter notebooks can be challenging, as they require installing a specific software environment to be viewed and executed.

JupyterLite is a Jupyter distribution that runs entirely in the web browser without any server components. A significant benefit of this approach is the ease of deployment. With JupyterLite, the only requirement to provide a live computing environment is a collection of static assets. In this talk, we will show how you can create such a static website and deploy it to your users.

Monorepos with Python

2023-04-19

talk

AbdealiLoKo

CI/CD Python

Working with Python is fun. Managing Python packaging, linters, tests, CI, etc. is not as fun.

Every maintainer needs to worry about consistent styling, quality, speed of tests, etc. as the project grows.

Monorepos have been successful in other communities. How do they work in Python?

The Spark of Big Data: An Introduction to Apache Spark

2023-04-19

talk

Pasha Finkelshteyn

API Big Data PySpark Python Spark SQL

Get ready to level up your big data processing skills! Join us for an introductory talk on Apache Spark, the distributed computing system used by tech giants like Netflix and Amazon. We'll cover PySpark DataFrames and how to use them. Whether you're a Python developer new to big data or looking to explore new technologies, this talk is for you. You'll gain foundational knowledge about Apache Spark and its capabilities, and learn how to leverage DataFrames and SQL APIs to efficiently process large amounts of data. Don't miss out on this opportunity to up your big data game!

Why GPU Clusters Don't Need to Go Brrr? Leverage Compound Sparsity to Achieve the Fastest Inference Performance on CPUs

2023-04-19

talk

Damian Bogunowicz

NLP

Forget specialized hardware. Get GPU-class performance on your commodity CPUs with compound sparsity and sparsity-aware inference execution. This talk will demonstrate the power of compound sparsity for model compression and inference speedup for NLP and CV domains, with a special focus on the recently popular Large Language Models. The combination of structured + unstructured pruning (to 90%+ sparsity), quantization, and knowledge distillation can be used to create models that run an order of magnitude faster than their dense counterparts, without a noticeable drop in accuracy. The session participants will learn the theory behind compound sparsity, state-of-the-art techniques, and how to apply it in practice using the Neural Magic platform.
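Two of the ingredients named above, unstructured magnitude pruning and quantization, can be sketched on a flat list of weights. This is a toy illustration under stated assumptions: it does not use the Neural Magic platform or any of its APIs, the function names and weight values are hypothetical, the 70% sparsity is chosen only to keep the example small, and structured pruning and knowledge distillation are omitted.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights):
    """Symmetric quantization to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


# Hypothetical weights of a tiny layer.
weights = [0.02, -0.9, 0.05, 0.4, -0.01, 0.03, -0.07, 0.6, 0.001, -0.2]

pruned = magnitude_prune(weights, sparsity=0.7)
quantized, scale = quantize_int8(pruned)

# Dequantize to see how closely the compressed weights match the pruned ones.
restored = [q * scale for q in quantized]
print(pruned)
print(quantized)
```

A speedup only materializes when the inference runtime skips the zeroed weights and executes integer arithmetic, which is the sparsity-aware execution the abstract refers to.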

Keynote - Lorem ipsum dolor sit amet

2023-04-19

talk

Miroslav Šedivý

A life without joy is like software without meaningful test data - it's uncertain and unreliable. The search for the perfect test data is a challenge. Real data should not be too real. Random data should not be too random. This is a randomly real and a really random journey to discover the balance between these two, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

Announcements

2023-04-19

talk

Social Gathering @BCC

2023-04-18

talk

Lightning Talks 60Min

2023-04-18

talk

PyLadies Workshop

2023-04-18

talk

A workshop for PyLadies members with the Berlin Tech Workers Council discussing the legal frameworks on contracts and termination agreements, as well as how employees can defend themselves in situations where they are made redundant due to mass layoffs.

Coffee Break

2023-04-18

talk

Coffee Break

2023-04-18

talk

Coffee Break

2023-04-18

talk

Coffee Break

2023-04-18

talk

Coffee Break

2023-04-18

talk

Ask-A-Question: an FAQ-answering service for when there's little to no data

2023-04-18

talk

Suzin You

Data Science

Doing data science in international development often means finding the right-sized solution in resource-constrained settings.

This talk walks you through how my team helped answer thousands of questions from pregnant folks and new parents on a South African maternal and child health helpline, which model we ended up choosing and why (hint: resource constraints!), and how we've packaged everything into a service that anyone can start for themselves.

By the end of the talk, I hope you'll know how to start your own FAQ-answering service and learn about one example of doing data science in international development.

Neo4j graph databases for climate policy

2023-04-18

talk

Marcus Tedesco

Neo4j Python

In this talk we walk through our experience using Neo4j and Python to model climate policy as a graph database. We discuss how we did it, some of the challenges we faced, and what we learnt along the way!
