talk-data.com

If you like cool talks about 🧑‍🔬 Data Science, 🤖Artificial Intelligence, 🐍 coding or 🤗 community, the Big PyData BBQ is the place to be! 🔥

KÖNIGSWEG and hei_INNOVATION invite you to join the 5️⃣th edition of the annual big gathering of the PyData Südwest community. Besides talks, there will be a lot of time for networking over a delicious 🥦+🍖 BBQ.

This year's topic: Large Language Models 🔥🤩

Confirmed Speakers

  1. Ines Montani (ExplosionAI / spaCy)
  2. Alejandro Saucedo (Director of Eng, Science, Product & Analytics Zalando / Institute for Ethical AI & Machine Learning)
  3. Michael Gertz (Institute of Computer Science, Heidelberg University)
  4. Alina Lenhardt (Cerence)

The event will be live streamed and published on PyDataTV.

18:00 Welcome 👋 📺
18:20 Talk Ines Montani 📺
19:00 BBQ 🍖🥦
UPDATE: 20:30 🛋️ Panel: Alejandro, Alexander, Alina, Ines, Michael 📺
UPDATE: 21:30 ⚡ Lightning Talks. 📺
UPDATE: 21:30 Networking. 🍻
UPDATE: 22:00 End.

📺 = live stream; 🍖🥦, 🍻 = locally only

About our speakers:

Ines Montani, a renowned software developer, is a co-founder of Explosion AI, a digital laboratory specializing in artificial intelligence and machine learning. She is a lead developer of spaCy, a widely used open-source library for advanced natural language processing (NLP) in Python. Together with Matthew Honnibal, she also developed Prodigy, a machine learning annotation tool that aids in the efficient creation of training data. Montani is an advocate for OSS, working tirelessly to make the fields of AI and ML more accessible.

Alejandro Saucedo is a technology entrepreneur and software engineer known for his work in machine learning and artificial intelligence. He is Director of Eng, Science, Product & Analytics at Zalando and the Chief Scientist at The Institute for Ethical AI & Machine Learning, a London-based research organization focused on developing best practices for machine learning and artificial intelligence, among others. Saucedo has a strong technical background and has worked in software development, ML and data science. He has spoken at numerous events and promotes ethical practices in AI development.

Michael Gertz is a full professor at Heidelberg University, where he heads the Database Systems Research Group at the Faculty of Mathematics and Computer Science. He received his diploma in Computer Science from TU Dortmund University, and his Dr. rer. nat. from the Leibniz University of Hannover in 1996. From 1997 until 2008 he was a faculty member at the Department of Computer Science at the University of California at Davis. His interdisciplinary research interests include text analytics, data mining, complex networks, and scientific data management, with applications in the medical sciences, law, physics, political sciences, and economics.

Alina Lenhardt is a computational linguist at Cerence and Program Committee Chair at PyConDE & PyData Berlin 2023.

Alexander Hendorf is one of the organizers of PyData Südwest and is heavily involved in the Python & PyData community. For him, contributing to open source and the community means giving something back, as his company Königsweg uses open source to implement Data Science & AI for its customers.

⚡️ Lightning Talks (5 min. each)

  1. Alessandro Angioi - Supercharge your language learning journey with Python
  2. Bela Stoyan - Automatically transform complex python methods to polars expressions
  3. Irina Smirnova-Pinchukova - Croshapes - using graph to design a toy

😍 A big thank you to our sponsors:

Contact If you have any questions or suggestions, please feel free to contact us via:

Big PyData BBQ #5: Large Language Models
Closing Session 2023-04-19 · 14:10
Coffee Break 2023-04-19 · 13:40

Today, state-of-the-art technology and scientific research strongly depend on open source libraries. The demographic of the contributors to these libraries is predominantly white and male [1][2][3][4]. This situation creates problems not only for individual contributors outside of this demographic but also for open source projects, such as loss of career opportunities and less robust technologies, respectively [1][7]. In recent years there have been a number of recommendations and initiatives to increase the participation in open source projects of groups who are underrepresented in this domain [1][3][5][6]. While these efforts are valuable and much needed, contributor diversity remains a challenge in open source communities [2][3][7]. This talk highlights the underlying problems and explores how we can overcome them.

Rethinking codes of conduct 2023-04-19 · 13:10
Tereza Iofciu – guest

Did you know that the Python Software Foundation Code of Conduct is turning 10 years old in 2023? It was voted in as they felt they were “unbalanced and not seeing the true spectrum of the greater community”. Why is that a big thing? Come to my talk and find out!

Python

After a decade of writing code, I joined the application security team. During the transition, I discovered that there are many myths about security and how difficult it is. Devs often choose to ignore it because they think that writing more secure code would take them ages. That is not true. Security doesn’t have to be scary. From my talk, you will learn the most useful piece of application security theory. It will be practical and not boring at all.

Cyber Security

Developers often use code coverage as a target, which makes it a bad measure of test quality.

Mutation testing changes the game: create mutant versions of your code that break your tests, and you'll quickly start to write better tests!

Come and learn to use it as part of your CI/CD process. I promise, you'll never look at penguins the same way again!
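To make the idea concrete, here is a minimal hand-rolled sketch of what a mutation testing tool automates. It is not taken from the talk: the function `is_adult`, its mutant, and the two test helpers are hypothetical names invented for illustration. A mutant is "killed" when a test that passes on the original code fails on the mutated code.

```python
def is_adult(age):
    """Original code under test (hypothetical example)."""
    return age >= 18


def is_adult_mutant(age):
    """Mutant: the >= operator has been replaced with >."""
    return age > 18


def weak_test(fn):
    # Only checks values far away from the boundary.
    return fn(30) is True and fn(5) is False


def strong_test(fn):
    # Same checks as weak_test, plus the boundary value 18.
    return weak_test(fn) and fn(18) is True


# A test kills the mutant if it passes on the original and fails on the mutant.
weak_kills_mutant = weak_test(is_adult) and not weak_test(is_adult_mutant)
strong_kills_mutant = strong_test(is_adult) and not strong_test(is_adult_mutant)

print("weak test kills mutant:", weak_kills_mutant)
print("strong test kills mutant:", strong_kills_mutant)
```

Both tests execute every line of `is_adult`, so line coverage alone cannot tell them apart; only the boundary check notices the changed operator.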

CI/CD

The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.

Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows...clean code, tested code, data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, more not-quite-bugs ("but my number says this...").

This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.

In this talk, we will cover the shift from ETL to ELT, the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.

Analytics Analytics Engineering API Data Analytics Data Engineering dbt ETL/ELT Modern Data Stack Pandas Python

Discover how Infrastructure From Code (IfC) can revolutionize Cloud DevOps automation by generating cloud deployment templates directly from Python code. Learn how this technology empowers Python developers to easily deploy and operate cost-effective, secure, reliable, and sustainable cloud software. Join us to explore the strategic potential of IfC.

Cloud Computing DevOps Python

Do you struggle with PRs? Have you ever had to change code even though you disagreed with the change just to land the PR? Have you ever given feedback that would have improved the code only to get into a comment war? We'll discuss how to give and receive feedback to extract maximum value from it and avoid all the communication problems that come with PRs.

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage massive vectorization and high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation and auto-vectorization/parallelization to scale ES to your favorite accelerators. In this talk we will get to know the core API and how to solve distributed black-box optimization problems with evolution strategies.

API GitHub
The Beauty of Zarr 2023-04-19 · 12:35

In this talk, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. This talk presents a systematic approach to understanding and implementing Zarr by showing how it works and why you might need it, followed by a hands-on session at the end. Zarr is based on an open technical specification, making implementations across several languages possible. I’ll mainly talk about Zarr’s Python implementation and show how it beautifully interoperates with the existing libraries in the PyData stack.
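As a rough illustration of what "chunked, compressed" storage means, the sketch below uses only the Python standard library. It is not Zarr's API or on-disk format: the chunk size, the dict-based store, and the helper names `write_chunks` and `read_item` are assumptions made for this example, and it handles only a 1-D sequence of integers.

```python
import zlib
from array import array

CHUNK = 4  # elements per chunk (arbitrary size chosen for the example)


def write_chunks(values, chunk=CHUNK):
    """Split a 1-D list of ints into chunks and compress each chunk separately."""
    store = {}
    for start in range(0, len(values), chunk):
        raw = array("q", values[start:start + chunk]).tobytes()
        store[str(start // chunk)] = zlib.compress(raw)
    return store


def read_item(store, index, chunk=CHUNK):
    """Read one element by decompressing only the chunk that contains it."""
    raw = zlib.decompress(store[str(index // chunk)])
    block = array("q")
    block.frombytes(raw)
    return block[index % chunk]


data = list(range(10))
store = write_chunks(data)   # keys "0", "1", "2", one per chunk
value = read_item(store, 9)  # touches only chunk "2"
```

Because each chunk is compressed on its own, a reader can fetch a slice of a large array without loading or decompressing the rest, which is the property that makes chunked formats practical for large datasets.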

GitHub HTML Python

Asynchronous programming is a type of parallel programming in which a unit of work is allowed to run separately from the primary application thread. Post execution, it notifies the main thread about the completion or failure of the worker thread. There are numerous benefits to using it, such as improved application performance, enhanced responsiveness, and effective usage of CPU.

Asynchronicity seems to be a big reason why Node.js is so popular for server-side programming. Most of the code we write, especially in IO-heavy applications like websites, depends on external resources. This could be anything from a remote database to a POST API call. As soon as you ask for any of these resources, your code is waiting around for the request to complete with nothing to do. With asynchronous programming, you allow your code to handle other tasks while waiting for these other resources to respond.

In this session, we are going to talk about asynchronous programming in Python, its benefits, and multiple ways to implement it.
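One of those ways is the standard library's `asyncio` module. The sketch below is not taken from the session: the coroutine `fetch` and the resource names are hypothetical, and `asyncio.sleep` stands in for a slow external resource such as a database query or an API call.

```python
import asyncio
import time


async def fetch(name, delay):
    # asyncio.sleep stands in for waiting on an external resource.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both waits overlap, so the event loop is free to run other work meanwhile.
    return await asyncio.gather(fetch("database", 0.2), fetch("api", 0.2))


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

`asyncio.gather` returns results in the order the coroutines were passed in, and because the two waits overlap, the total time should be close to the longest single wait and not the sum of both.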

API JavaScript Python
Aleksander Molak – Causal Ambassador

With an average of 3.2 new papers published on arXiv every day in 2022, causal inference has exploded in popularity, attracting a large amount of talent and interest from top researchers and institutions, including industry giants like Amazon or Microsoft. Text data, with its high complexity, poses an exciting challenge for the causal inference community. In the workshop, we'll review the latest advances in the field of Causal NLP and implement a causal Transformer model to demonstrate how to translate these developments into a practical solution that can bring real business value. All in Python!

Microsoft NLP Python