Activities & events
PyData Berlin 2025 February Meetup
2025-02-19 · 17:45

Welcome to the PyData Berlin February meetup! Please provide your first and last name, current role, and organization name when registering, as this is required by the venue's entry policy. If you cannot attend, please cancel your spot so others can join, as space is limited. We welcome you from 18:45, and there will be food and drinks. The talks begin around 19:30, at which point the doors close, so make sure to arrive on time!

Host: Google is excited to welcome you to this month's edition of PyData.

The lineup for the evening

Talk 1: Vector Streaming: Memory-efficient Indexing to VectorDB
Abstract: Embedding creation is mostly done synchronously, so a lot of time is wasted while chunks are being created, even though chunking is not a compute-heavy operation. Passing chunks to the embedding model as they are produced would be more efficient. The problem intensifies with late-interaction embeddings such as ColBERT or ColPali. The solution is to run chunking and embedding as asynchronous tasks. Using Rust's concurrency patterns and thread safety, threads are spawned to handle this work via Rust's MPSC (multi-producer, single-consumer) channels, which pass messages between threads. This creates a stream of chunks fed into the embedding thread through a buffer. Once the buffer is full, the chunks are embedded and the embeddings are sent back to the main thread, which forwards them to the vector database. No time is wasted on a single operation and there are no bottlenecks. Moreover, only the chunks and embeddings currently in the buffer are held in system memory; they are released once moved to the vector database. All of this is bound into Python using pyo3 and maturin, so it is easily accessible from Python while the core remains asynchronous Rust. (An illustrative sketch of this streaming pattern follows this listing.)
Speaker: Sonam Pankaj
Bio: Sonam is the Generative AI Evangelist at Articul8. She is also the co-creator of EmbedAnything, a Rust-based library that streamlines ingestion, inference, and indexing and brings substantial speed to your GenAI pipeline. She previously worked as an AI researcher at Saama, where she worked on clinical trials. Her work has been published in the ACL Anthology, and she is passionate about speaking at developer conferences and mentoring the tech community.

Talk 2: Fine-Tuning SLMs like Phi-3x-Vision: Insights from Scene Analysis in ADAS
Abstract: Fine-tuning AI models isn't just for big research labs; it's a powerful way to address specific real-world problems and improve model performance. In this talk, Amit will discuss the key aspects of fine-tuning small language models (SLMs) like Phi-3x-vision and how to set up a fine-tuning workflow using Hugging Face libraries. He'll outline key considerations, approaches, essential components, and tools that simplify the process. Using an Advanced Driver Assistance Systems (ADAS) use case, he will demonstrate how fine-tuning helped improve anomaly detection in road scene analytics. The session will cover practical challenges, lessons learned, and broader takeaways, providing a clear understanding of how fine-tuning can be applied across different domains.
Whether you're a data scientist, ML engineer, developer, or just curious about AI customization, this talk will offer valuable insights into making models more effective for specialized tasks. (A minimal fine-tuning sketch also follows this listing.)
Speaker: Amit Tyagi
Bio: Amit is a Lead Applied Scientist at Microsoft Healthcare and Life Sciences, leading the EU expansion of DAX Copilot. He focuses on developing and optimizing AI models for healthcare, ensuring they align with regulatory and operational needs in the region. Before this, he worked in the customer-facing Enterprise team on various AI-driven solutions, including an autonomous driving project where he fine-tuned vision and language models for scene analytics. He also contributed to AI projects in forecasting, retrieval-augmented generation (RAG) chatbots, and other domain-specific SLM fine-tuning efforts.

Lightning talks
There will be slots for 2-3 lightning talks (3-5 minutes each) between the two main talks. Kindly let us know if you would like to present something :)

NumFOCUS Code of Conduct (the short version)
Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS. All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate. NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form. Thank you for helping make this a welcoming, friendly community for all. If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct
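The vector streaming abstract above describes an asynchronous chunk-and-embed pipeline built on Rust MPSC channels. As a rough illustration of the same producer/consumer pattern, here is a minimal Python sketch using a bounded queue and a worker thread; the function names (`chunk_document`, `embed_batch`, `index_to_vectordb`) are placeholders for illustration, not the EmbedAnything API.

```python
import queue
import threading

BUFFER_SIZE = 32  # how many chunks the embedding worker batches at once

def chunk_document(text, size=512):
    """Cheap chunking: split the text into fixed-size pieces (placeholder)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed_batch(chunks):
    """Placeholder for the embedding model call (e.g. a ColBERT-style encoder)."""
    return [[float(len(c))] for c in chunks]  # dummy one-dimensional "embeddings"

def index_to_vectordb(chunks, embeddings):
    """Placeholder for the vector-database upsert."""
    print(f"indexed {len(embeddings)} vectors")

def embedding_worker(q: queue.Queue):
    """Consumer: drain chunks from the channel, embed in buffered batches, index, free memory."""
    buffer = []
    while True:
        chunk = q.get()
        if chunk is None:  # sentinel: producer is done
            break
        buffer.append(chunk)
        if len(buffer) >= BUFFER_SIZE:
            index_to_vectordb(buffer, embed_batch(buffer))
            buffer.clear()  # only the current buffer ever lives in memory
    if buffer:  # flush the final partial batch
        index_to_vectordb(buffer, embed_batch(buffer))

if __name__ == "__main__":
    channel = queue.Queue(maxsize=BUFFER_SIZE)        # bounded, like a buffered channel
    worker = threading.Thread(target=embedding_worker, args=(channel,))
    worker.start()
    for chunk in chunk_document("some very long document " * 1000):
        channel.put(chunk)                            # producer streams chunks as they are made
    channel.put(None)                                 # signal completion
    worker.join()
```

The design point of the talk is that chunking (cheap) and embedding (expensive) overlap instead of running back to back, and that only one buffer's worth of data is ever resident in memory.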
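The second PyData talk walks through setting up a fine-tuning workflow with Hugging Face libraries. Below is a minimal, generic sketch of such a workflow under stated assumptions: a small text-only stand-in model (`distilgpt2`) and a hypothetical local JSON-lines dataset; the talk's actual target, Phi-3-vision, would additionally require its processor and image inputs.

```python
# Minimal Hugging Face fine-tuning sketch (assumptions: small text-only model,
# local "scene_descriptions.jsonl" file with a "text" field -- both illustrative).
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_ID = "distilgpt2"  # small stand-in; the talk discusses SLMs like Phi-3-vision

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for batching with this tokenizer
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical dataset of road-scene descriptions, one JSON object per line
dataset = load_dataset("json", data_files="scene_descriptions.jsonl", split="train")

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="slm-finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # keeps memory modest on a single GPU
    num_train_epochs=1,
    learning_rate=2e-5,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```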
Nov 22 - Berlin AI, Machine Learning and Computer Vision Meetup
2024-11-22 · 16:30
Register to reserve your spot!

Date and time: Nov 22, 2024, from 5:30 PM to 8:30 PM
Location: MotionLab.Berlin, Bouchéstraße 12/Halle 20, Berlin

When the Medium is the Message: Addressing Input Biases in Multimodal/Multilingual Models
An embedding model is trained to produce outputs that preserve semantic similarity as distance in embedding space: like is near like and far from unlike. But models trained on diverse kinds of inputs, i.e. different media and different languages, learn to treat those input properties as if they were semantic properties. Two pictures end up more "semantically alike" than a picture and a descriptive text that matches it. Similar problems arise with multilingual models: two English sentences are more alike than an English sentence and its Chinese translation. This undermines the general utility of embedding models. This presentation shows evidence of where this comes from and offers approaches to mitigate the problem. (A short illustrative sketch of this modality gap appears below.)
About the Speaker: Scott Martens is a long-term veteran of AI and NLP research, having started working at AI start-ups in 1994, and a KU Leuven graduate with a doctorate in linguistics. His background includes machine translation development and the intersection of linguistics, philology, and modern AI. Dr. Martens is a Senior Content Manager and Evangelist at Jina AI in Berlin.

Vector Streaming: The Memory-Efficient Indexing for Vector Databases
Vector databases are everywhere, powering LLMs. Indexing vectors in bulk, especially multi-vector embeddings like ColPali and ColBERT, is memory intensive. Vector streaming solves this problem by parallelizing parsing, chunking, and embedding generation, and by indexing continuously chunk by chunk instead of in bulk. This not only increases speed but also makes the whole task more optimized and memory efficient. Weaviate, Elastic, and Pinecone are supported.
About the Speaker: Sonam Pankaj is a Generative AI Evangelist at Articul8 and the co-creator and maintainer of EmbedAnything, an open-source library that creates local dense, SPLADE, and multimodal embeddings and indexes them to vector databases; it is built in Rust for speed and efficiency. She previously worked at Qdrant and Rasa, and before that as an AI researcher at Saama, where she worked extensively on clinical trial analytics. She is passionate about topics like metric learning and biases in language models. She has also published a paper in COLING, in the ACL Anthology.

How to Unlock More Value from Self-Driving Datasets
AV/ADAS is one of the most advanced fields in visual AI. However, getting your hands on a high-quality dataset can be tough, let alone working with one to get a model to production. In this talk, I will show you the leading methods and tools for visualizing these datasets and taking them to the next level. I will demonstrate how to clean and curate AV datasets and how to perform state-of-the-art augmentations using diffusion models to create synthetic data that can empower the self-driving car models of the future. (A hedged dataset-curation sketch appears further below.)
About the Speaker: Daniel Gural is a seasoned Machine Learning Engineer at Voxel51 with a strong passion for empowering data scientists and ML engineers to unlock the full potential of their data.
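As a rough illustration of the modality gap described in the "When the Medium is the Message" abstract above, the sketch below compares text-to-text and image-to-text cosine similarities with a CLIP-style dual encoder via sentence-transformers; the model checkpoint ("clip-ViT-B-32") and the local image path are assumptions for illustration, not details from the talk.

```python
# Illustrative only: same-modality pairs often score higher cosine similarity
# than matching cross-modality pairs in a shared embedding space.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")  # dual encoder for text and images

texts = ["a photo of a cat", "a picture of a small cat"]
text_emb = model.encode(texts, convert_to_tensor=True)

# Assumed local file; any image matching the first caption works
image_emb = model.encode(Image.open("cat.jpg"), convert_to_tensor=True)

text_text = util.cos_sim(text_emb[0], text_emb[1]).item()
image_text = util.cos_sim(image_emb, text_emb[0]).item()

print(f"text vs. text similarity:  {text_text:.3f}")
print(f"image vs. text similarity: {image_text:.3f}")
# If the input bias the talk describes is present, the text/text score tends to
# exceed the image/text score even when the image matches the caption.
```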
Thanks to deep learning, autonomous cars equipped with cameras and LiDAR can accurately recognize common objects such as cars, streets, and pedestrians, significantly enhancing their understanding of the environment. However, these models often display overconfidence, which can result in misidentifications. For example, consider an exaggerated scenario where an elephant on the street is mistakenly identified as a truck because the model has not been trained to recognize elephants. This issue stems from the models' design to make decisions rather than acknowledge uncertainty by saying "I don't know." In this talk, we will discuss how models can recognize their limitations and avoid making uncertain decisions, particularly through the lens of an autonomous car. (A minimal sketch of one such rejection mechanism appears below.)
About the Speaker: Hanieh Shojaei is a PhD researcher at the Institute of Cartography and Geoinformatics (IKG) at Leibniz University Hannover, specializing in uncertainty estimation and reliability of AI models. Her research focuses on using deep learning for LiDAR scene segmentation to enhance environmental perception and assess prediction reliability for autonomous vehicles.
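The uncertainty talk above argues that perception models should be able to abstain rather than force a confident label. Below is a minimal sketch of one common rejection mechanism, thresholding the entropy of the softmax distribution; the class list and threshold are illustrative assumptions, not the speaker's method.

```python
import numpy as np

CLASSES = ["car", "pedestrian", "truck", "bicycle"]   # illustrative label set
ENTROPY_THRESHOLD = 1.0                                # illustrative cutoff, in nats

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def predict_or_abstain(logits):
    """Return a class label, or "I don't know" when the prediction is too uncertain."""
    probs = softmax(np.asarray(logits, dtype=float))
    entropy = -np.sum(probs * np.log(probs + 1e-12))   # higher entropy = less confident
    if entropy > ENTROPY_THRESHOLD:
        return "I don't know"
    return CLASSES[int(np.argmax(probs))]

# A peaked distribution yields a label; a flat one (e.g. an unseen elephant) abstains.
print(predict_or_abstain([8.0, 0.5, 0.2, 0.1]))   # -> "car"
print(predict_or_abstain([1.1, 1.0, 0.9, 1.0]))   # -> "I don't know"
```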
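The "How to Unlock More Value from Self-Driving Datasets" talk above does not name specific tools, but given the speaker's affiliation with Voxel51, a minimal FiftyOne sketch of loading a detection dataset and surfacing samples worth curating might look like the following; the dataset directory, format, and field names are assumptions for illustration.

```python
# Hedged sketch: load a driving dataset into FiftyOne, score sample uniqueness,
# and open the app to visually review the least unique samples for pruning.
import fiftyone as fo
import fiftyone.brain as fob

dataset = fo.Dataset.from_dir(
    dataset_dir="/data/av_frames",            # assumed local export of labeled frames
    dataset_type=fo.types.COCODetectionDataset,
    name="av-curation-demo",
)

# Rank samples by visual uniqueness to surface near-duplicates worth removing
fob.compute_uniqueness(dataset)
candidates = dataset.sort_by("uniqueness").limit(50)

session = fo.launch_app(candidates)           # review candidates interactively
session.wait()
```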
Computer Vision Technical Talks at Motionlab Berlin
2024-11-22 · 16:30
"Vector Streaming: Memory Efficient Indexing for Vector Databases" Speaker: Sonam Pankaj, Starlight / Embed-Anything An exploration of memory-efficient indexing techniques for vector databases, focusing on the use of vector streaming to optimize performance in high-dimensional data applications. "How to Unlock More Value from Self-Driving Datasets" Speaker: Dan Gural, Voxel51 A discussion on methods for handling self-driving datasets to enhance training and deployment in autonomous driving models. Registration Please register through Voxel51's page to confirm your attendance. |
Lightning Talk: Stephen Batifol
2023-12-07 · 20:30
Stephen Batifol – Machine Learning Engineer @ Wolt
Lightning Talk: Sonam Pankaj
2023-12-07 · 20:25
Sonam Pankaj – Developer Advocate @ Qdrant
Lightning Talk: Marina Zemskova
2023-12-07 · 20:20
Marina Zemskova – Senior Data Scientist @ GetYourGuide
Panel discussion on the current state of gender diversity in the field of machine learning
Speakers:
Demetrios Brinkmann – Chief Vibe Officer @ MLOps Community
Asma Zgolli – Machine Learning Engineer
Verena Weber – ML Scientist, Generative AI strategy
Janina J. Renk – Machine Learning Engineer in Medical AI @ Medical AI Analytics & Information GmbH
Making protein biomaterials with AI
2023-12-07 · 19:00
Pierre Salvy – Head of Engineering @ Cambrium
Lucile Bonnin – Head of R&D @ Cambrium
Main talk by Pierre Salvy and Lucile Bonnin.