Search – talk-data.com

Title & Speakers	Event
PyData Berlin 2025 February Meetup 2025-02-19 · 17:45 Welcome to the PyData Berlin February meetup! Please provide your first and last name, current role and organization name for the registration because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join as the space is limited. We would like to welcome you all starting from 18:45. There will be food and drinks. The talks begin around 19:30 and the doors will close at 19:30. Make sure to arrive on time! Host: Google is excited to welcome you to this month's version of PyData. ************************************************************************ The Lineup for the evening Talk 1: Vector Streaming: Memory-efficient Indexing to VectorDB Abstract: Embedding creation is mostly done synchronously; a lot of time is wasted while the chunks are being created, as chunking is not a compute-heavy operation. Passing the chunks to the embedding model as they are produced would be more efficient. This problem intensifies further with late-interaction embeddings like ColBERT or ColPali. The solution is to run chunking and embedding as asynchronous tasks. We can effectively spawn threads to handle these tasks using Rust's concurrency patterns and thread safety. This is done using Rust's MPSC (Multi-producer Single Consumer) module, which passes messages between threads. This creates a stream of chunks passed into the embedding thread through a buffer. Once the buffer is full, the thread embeds the chunks and sends the embeddings back to the main thread, where they are sent to the vector database. This ensures that no time is wasted waiting on a single operation and that no bottleneck forms. Moreover, only the chunks and embeddings in the buffer are stored in the system memory. 
They are erased from memory once moved to the vector database. All this is then bound into Python using pyo3 and maturin, so it's easily accessible from Python, while the core remains asynchronous Rust. Speaker: Sonam Pankaj Bio: Sonam is the Generative AI Evangelist at Articul8. She is also the co-creator of EmbedAnything, a Rust-based library that streamlines ingestion, inference, and indexing and gives you blazing speed in your GenAI pipeline. She has previously worked as an AI researcher at Saama and has worked in clinical trials. Her work has been published in the ACL Anthology, and she is passionate about speaking at developer conferences and mentoring the tech community. Talk 2: Fine-Tuning SLMs like Phi-3x-Vision: Insights from Scene Analysis in ADAS Abstract: Fine-tuning AI models isn’t just for big research labs; it’s a powerful way to address specific real-world problems and improve model performance. In this talk, Amit will discuss the key aspects of fine-tuning small language models (SLMs) like Phi-3x-vision and how one can set up a fine-tuning workflow using Hugging Face libraries. He’ll outline key considerations, approaches, essential components, and tools that simplify the process. Using an Advanced Driver Assistance Systems (ADAS) use case, he will demonstrate how fine-tuning helped improve anomaly detection in road scene analytics. The session will cover practical challenges, lessons learned, and broader takeaways, providing a clear understanding of how fine-tuning can be applied across different domains. Whether you’re a data scientist, ML engineer, developer, or just curious about AI customization, this talk will offer valuable insights into making models more effective for specialized tasks. Speaker: Amit Tyagi Bio: Amit is a Lead Applied Scientist at Microsoft Healthcare and Life Sciences, leading the EU expansion of DAX Copilot. 
He focuses on developing and optimizing AI models for healthcare, ensuring they align with regulatory and operational needs in the region. Before this, he worked in the customer-facing Enterprise team on various AI-driven solutions, including an autonomous driving project where he fine-tuned vision and language models for scene analytics. He also contributed to AI projects in forecasting, retrieval-augmented generation (RAG) chatbots, and other domain-specific SLM fine-tuning efforts. Lightning talks There will be slots for 2-3 Lightning Talks (3-5 minutes each) between the two main talks. Kindly let us know if you would like to present something :) NumFOCUS Code of Conduct THE SHORT VERSION Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS. All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate. NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form. Thank you for helping make this a welcoming, friendly community for all. If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct	PyData Berlin 2025 February Meetup
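The streaming pattern described in the Talk 1 abstract (chunks flow through a bounded buffer into an embedding worker, and embeddings flow back to the main thread for indexing) can be sketched roughly as follows. This is only an illustration of the data flow, not the speaker's implementation: it uses Python's `queue.Queue` as a stand-in for Rust's MPSC channel, and `chunk_text`, `fake_embed`, `stream_index` and the in-memory `vector_db` list are hypothetical names invented for this sketch.

```python
import queue
import threading

SENTINEL = None  # marks the end of a stream


def chunk_text(text, size):
    """Yield chunks of `size` words (a stand-in for a real chunker)."""
    words = text.split()
    for i in range(0, len(words), size):
        yield " ".join(words[i:i + size])


def fake_embed(batch):
    """Hypothetical embedding model: maps each chunk to a tiny vector."""
    return [[float(len(chunk)), float(len(chunk.split()))] for chunk in batch]


def chunker(text, size, chunk_q):
    """Producer: push chunks as soon as they exist; blocks when the queue is full."""
    for chunk in chunk_text(text, size):
        chunk_q.put(chunk)
    chunk_q.put(SENTINEL)


def embedder(chunk_q, result_q, buffer_size):
    """Consumer: collect chunks in a buffer, embed it when full or at end of stream."""
    buffer = []
    while True:
        chunk = chunk_q.get()
        if chunk is not SENTINEL:
            buffer.append(chunk)
        if buffer and (chunk is SENTINEL or len(buffer) == buffer_size):
            result_q.put(list(zip(buffer, fake_embed(buffer))))
            buffer = []  # embedded chunks are no longer held by the worker
        if chunk is SENTINEL:
            result_q.put(SENTINEL)
            return


def stream_index(text, chunk_size=4, buffer_size=2):
    """Main thread: receive embedded batches and 'upsert' them into a list."""
    chunk_q = queue.Queue(maxsize=buffer_size)  # bounded, so memory stays small
    result_q = queue.Queue()
    workers = [
        threading.Thread(target=chunker, args=(text, chunk_size, chunk_q)),
        threading.Thread(target=embedder, args=(chunk_q, result_q, buffer_size)),
    ]
    for worker in workers:
        worker.start()
    vector_db = []  # hypothetical stand-in for a vector database client
    while True:
        batch = result_q.get()
        if batch is SENTINEL:
            break
        vector_db.extend(batch)
    for worker in workers:
        worker.join()
    return vector_db
```

Calling `stream_index("a b c d e f g h i j")` returns a list of `(chunk, embedding)` pairs in the original chunk order. Because the chunk queue is bounded, the chunker cannot run far ahead of the embedder, which is what keeps memory use tied to the buffer size in this sketch.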
PyData Berlin 2025 January Meetup 2025-01-15 · 17:45 Welcome to the PyData Berlin January meetup! We would like to welcome you all starting from 18:45. There will be food and drinks. The talks begin around 19:30 and the doors will close at 19:30. Make sure to arrive on time! Please provide your first and last name for the registration because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join as the space is limited. Host: GetYourGuide is excited to welcome you to this month's version of PyData. ************************************************************************ The Lineup for the evening Talk 1: Building Delivery Hero’s Product Semantic Similarity Using Real-Time Vector Search Abstract: Delivery Hero's Quick Commerce service offers a convenient way for customers to order items from grocery stores for delivery. However, the Marketplace has thousands of stores and millions of products. The company risks revenue loss and churn if customers cannot easily find products that match their needs and preferences. In this talk, we will explore how Delivery Hero developed a Product Semantic Similarity recommender to identify similar products at multiple touchpoints of the customer's purchasing journey, using Transformer-based product embeddings and Atlas Vector Search. Speaker: Fahad Yousaf Bio: Hailing from Pakistan, I am a Machine Learning Engineer at Delivery Hero. My previous roles include working at i2c Inc. and Turing. Over the course of my career, I have developed and productionized a range of machine learning applications, such as Call Transcription Analytics powered by Natural Language Processing, Mobile Remote Deposit Cheque processing using Computer Vision, and Semantic Search systems. Outside of work, I enjoy learning about the mysteries of the universe. Talk 2: Introduction to the open-source world: It's all just a series of gifts! 
Abstract: The Python community is an incredibly welcoming and diverse community, open to people from all backgrounds and of any experience level. With a vast variety of projects to get involved with, it can serve as your point of entry into the world of open-source software, whatever your interests might be. From PyGame for gamers interested in game engines to NumPy for data scientists and mathematicians, it's got something for everyone. In this talk, we'll set out on a journey through the different parts of the Python community and use that to discuss ways to get started with open-source software, get a better idea of what makes a good contributor and hopefully give everyone a better picture of what to expect when getting into this very exciting world. Speaker: Lysandros Nikolaou Bio: Lysandros works as a Senior Software Engineer at Quansight Labs. He is a CPython core developer, specializing in the parser, the tokenizer and the REPL. He recently worked on supercharging f-strings in Python 3.12, the new REPL for Python 3.13 and introducing fast string ufuncs in NumPy 2.0. Currently, he's mostly dealing with improving support for free-threaded Python in the PyData ecosystem. Lightning talks There will be slots for 2-3 Lightning Talks (3-5 minutes each) between the two main talks. Kindly let us know if you would like to present something :)	PyData Berlin 2025 January Meetup
Event January Members Talk evening 2025-01-14
Predicting Delivery Risks with Machine Learning: A TrustCourier Innovation 2025-01-14 · 20:35 Claudia Stangarone – Data analyst @ GLS Studio, Helen FitzGerald – Data analyst Claims in logistics, especially for lost parcels or delivery issues, can be a significant cost for companies. In this talk, we’ll present the framework and share some early results of a new feature within TrustCourier. This feature uses machine learning to predict and flag high-risk deliveries before they escalate into costly claims. machine learning
Writing a custom scikit-learn estimator 2025-01-14 · 19:25 Tamara Atanasoska – Open Source Software Engineer @ :probabl. Scikit-learn is a popular machine learning library. It currently has over 200 estimators ready to use for a vast array of use cases. What if you are working on something special that still hasn't found its way into the library? Scikit-learn offers a way to write new compatible estimators, which can be seamlessly integrated with the rest of the library. We will look into what an estimator is, what API scikit-learn estimators have, reasons why you might want to implement your own, and an example of how to do so. We will end with real-world examples of how other OSS projects use this for their needs. scikit-learn Python
Contributing to OpenSource - how to get started in 5 minutes! 2025-01-14 · 19:20 Stefanie Senger – Open source developer @ :probabl. This talk will introduce scikit-learn users to the new API for metadata routing, a feature introduced in the recent releases and almost fully available since version 1.5 (released in May 2024). Python scikit-learn open source

Activities & events