talk-data.com

Topic: Python
Tags: programming_language, data_science, web_development
1446 activities tagged
Activity Trend: 185 peak/qtr (2020-Q1 to 2026-Q1)

Activities
1446 activities · Newest first

Talk: Programs that write prompts with DSPy, by Martín Quesada. Systems that interact with generative models and rely on hand-written prompts are often fragile and hard to maintain: any change to the metrics or model requires updating text blobs through trial and error. DSPy is an open-source framework that tackles this issue by allowing developers to build complete AI workflows with pure Python code. In this talk, we will learn how to use it to replace handcrafted prompts with compact modules that are automatically optimized. Martín Quesada is a Senior Data Scientist at Datamaran. You will be free to ask questions in Spanish or English during the Q&A. After the talk, we will enjoy some networking with appetizers, courtesy of Datamaran.
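As a taste of what the talk covers, here is a minimal sketch using DSPy's public API: a declarative signature replaces a hand-written prompt. The model name is a placeholder; any LM supported by dspy.LM works.

```python
# Minimal DSPy sketch: a signature instead of a hand-written prompt.
import dspy

# Placeholder model id; swap in whatever LM you have access to.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# "question -> answer" is a signature: DSPy compiles it into a prompt that
# its optimizers can later rewrite automatically, instead of hand-editing text.
qa = dspy.Predict("question -> answer")
print(qa(question="What does DSPy optimize?").answer)
```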

Time Series Forecasting Using Foundation Models

Make accurate time series predictions with powerful pretrained foundation models! You don’t need to spend weeks—or even months—coding and training your own models for time series forecasting. Time Series Forecasting Using Foundation Models shows you how to make accurate predictions using flexible pretrained models.

In Time Series Forecasting Using Foundation Models you will discover:

The inner workings of large time models
Zero-shot forecasting on custom datasets
Fine-tuning foundation forecasting models
Evaluating large time models

Time Series Forecasting Using Foundation Models teaches you how to do efficient forecasting using powerful time series models that have already been pretrained on billions of data points. You’ll appreciate the hands-on examples that show you what you can accomplish with these amazing models. Along the way, you’ll learn how time series foundation models work, how to fine-tune them, and how to use them with your own data.

About the Technology
Time-series forecasting is the art of analyzing historical, time-stamped data to predict future outcomes. Foundational time series models like TimeGPT and Chronos, pre-trained on billions of data points, can now effectively augment or replace painstakingly built custom time-series models.

About the Book
Time Series Forecasting Using Foundation Models explores the architecture of large time models and shows you how to use them to generate fast, accurate predictions. You’ll learn to fine-tune time models on your own data, execute zero-shot probabilistic forecasting, point forecasting, and more. You’ll even find out how to reprogram an LLM into a time series forecaster—all following examples that will run on an ordinary laptop.

What's Inside

How large time models work
Zero-shot forecasting on custom datasets
Fine-tuning and evaluating foundation models

About the Reader
For data scientists and machine learning engineers familiar with the basics of time series forecasting theory. Examples in Python.

About the Author
Marco Peixeiro builds cutting-edge open-source forecasting Python libraries at Nixtla. He is the author of Time Series Forecasting in Python.

Quotes
"Clear and hands-on, featuring both theory and easy-to-follow examples." - Eryk Lewinson, author of Python for Finance Cookbook
"Bridges the gap between classical forecasting methods and the new developments in the foundational models. A fantastic resource." - Juan Orduz, PyMC Labs
"A foundational guide to forecasting’s next chapter." - Tyler Blume, daybreak
"An immensely practical introduction to forecasting using foundation models." - Stephan Kolassa, SAP Switzerland
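As a hedged illustration of the zero-shot workflow the book teaches, the sketch below uses the open-source chronos-forecasting package (one of the foundation models named above). The checkpoint id and call signature follow the library's public README; the series is synthetic, and details should be treated as assumptions rather than the book's own code.

```python
# Zero-shot forecasting sketch with a pretrained Chronos checkpoint.
import torch
from chronos import ChronosPipeline

pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",  # small pretrained checkpoint; laptop-friendly
    device_map="cpu",
    torch_dtype=torch.float32,
)

history = torch.sin(torch.arange(120) / 6.0)  # toy univariate series
# No training step: the pretrained model forecasts the custom series directly,
# returning sample paths of shape (num_series, num_samples, prediction_length).
samples = pipeline.predict(history, prediction_length=12)
median = samples.quantile(0.5, dim=1)  # point forecast from the sample paths
print(median.shape)
```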

In the analysis of diverse omics data, a common and important preliminary step involves computing low-dimensional embeddings using techniques such as PCA, UMAP, t-SNE, or variational autoencoders. These embeddings provide a global overview of sample distributions and their relationships, often serving as the basis for formulating biological hypotheses. To facilitate rapid and intuitive exploration of such low-dimensional embeddings, we developed Yomix, an interactive, omics-agnostic visualisation and data exploration tool. Yomix enables users to flexibly define subsets of interest using a lasso selection tool, instantly compute their feature signatures, and compare their distributions. Yomix is a fast and efficient tool for interactive exploration of diverse omics datasets.
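For context, here is a minimal sketch of the preliminary embedding step described above, using scikit-learn's PCA on synthetic data; UMAP or t-SNE can be dropped in the same way, and this is not Yomix's own code.

```python
# Project a samples-by-features omics-style matrix to 2-D coordinates.
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for an omics matrix: 200 samples, 5000 features.
X = np.random.default_rng(0).normal(size=(200, 5000))

embedding = PCA(n_components=2).fit_transform(X)  # (200, 2) coordinates
print(embedding.shape)  # these 2-D points are what a tool like Yomix displays
```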

Build a multi-agent application leveraging MCP (Model Context Protocol) with the Microsoft Agent Framework in C# or LangGraph in Python, integrated with Azure Cosmos DB for scalable, high-performance data persistence and retrieval. You'll define agents, functions, and external service integrations, and implement memory, state management, and semantic search using Azure Cosmos DB. By the end, you’ll have a robust AI agent system designed for real-world applications.
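A minimal, hedged sketch of the LangGraph side of such a system: two toy agent nodes wired into a state graph. The node names and logic are illustrative, and the MCP and Cosmos DB integrations are omitted.

```python
# Two-agent LangGraph sketch with shared typed state.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    answer: str

def researcher(state: AgentState) -> dict:
    # Placeholder for an agent that would call an LLM or an MCP tool.
    return {"answer": f"Research notes for: {state['question']}"}

def writer(state: AgentState) -> dict:
    # Placeholder for an agent that turns notes into a final answer.
    return {"answer": state["answer"] + " (summarized)"}

graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.set_entry_point("researcher")
graph.add_edge("researcher", "writer")
graph.add_edge("writer", END)

app = graph.compile()
print(app.invoke({"question": "What is MCP?", "answer": ""}))
```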

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

Scikit-learn now makes it easier to explore estimators by displaying their parameter values and allowing them to be copied. In the next release, each parameter will also include a short documentation preview and a link to the full reference page. More enhancements are on the way to make model inspection even richer and more intuitive. This work blends front-end development with Python. Dea's path into open source and the PyData ecosystem started with a desire for a new career direction and a lifelong curiosity for technical challenges.
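For a quick look at the estimator display this work improves, the sketch below enables scikit-learn's diagram repr; in a notebook, the rendered estimator exposes its parameter values interactively (exact UI details vary by release).

```python
# Enable scikit-learn's rich estimator display.
from sklearn import set_config
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

set_config(display="diagram")  # HTML diagram repr (default in recent releases)
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=0.5))
# In Jupyter, evaluating `pipe` renders an expandable diagram where parameter
# values can be inspected and copied; in a script we fall back to the text repr.
print(pipe)
```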

This demo-heavy session highlights the enhanced MSSQL extension for Visual Studio Code, now more robust than ever with new AI-driven enhancements to streamline your SQL development experience. With GitHub Copilot, you can move faster from schema to code, generate sample data, explore relationships, and help your app and backend stay in sync. With our latest mssql-python driver, you can develop with ease, security, and performance, across SQL Server, Azure SQL and SQL database in Fabric.
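As a hedged sketch, the snippet below queries SQL Server with the mssql-python driver, assuming it follows the DB-API 2.0 convention; the connection string is a placeholder, so check the driver's documentation for the exact interface.

```python
# Hedged DB-API-style sketch for the mssql-python driver.
from mssql_python import connect  # assumes the driver exposes a DB-API connect()

conn = connect(
    "Server=tcp:myserver.database.windows.net,1433;"
    "Database=mydb;Encrypt=yes;"  # placeholder connection string; add your auth
)
cursor = conn.cursor()
cursor.execute("SELECT TOP 5 name FROM sys.tables")  # list a few tables
for row in cursor.fetchall():
    print(row)
conn.close()
```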

Pro Oracle GoldenGate 23ai for the DBA: Powering the Foundation of Data Integration and AI

Transform your data replication strategy into a competitive advantage with Oracle GoldenGate 23ai. This comprehensive guide delivers the practical knowledge DBAs and architects need to implement, optimize, and scale Oracle GoldenGate 23ai in production environments. Written by Oracle ACE Director Bobby Curtis, it blends deep technical expertise with real-world business insights from hundreds of implementations across manufacturing, financial services, and technology sectors.

Beyond traditional replication, this book explores the groundbreaking capabilities that make GoldenGate 23ai essential for modern AI initiatives. Learn how to implement real-time vector replication for RAG systems, integrate with cloud platforms like GCP and Snowflake, and automate deployments using REST APIs and Python. Each chapter offers proven strategies to deliver measurable ROI while reducing operational risk. Whether you're upgrading from Classic GoldenGate, deploying your first cloud data pipeline, or building AI-ready data architectures, this book provides the strategic guidance and technical depth to succeed. With Bobby's signature direct approach, you'll avoid common pitfalls and implement best practices that scale with your business.

What You Will Learn

Master the microservices architecture and new capabilities of Oracle GoldenGate 23ai
Implement secure, high-performance data replication across Oracle, PostgreSQL, and cloud databases
Configure vector replication for AI and machine learning workloads, including RAG systems
Design and build multi-master replication models with automatic conflict resolution
Automate deployments and management using RESTful APIs and Python
Optimize performance for sub-second replication lag in production environments
Secure your replication environment with enterprise-grade features and compliance
Upgrade from Classic to Microservices architecture with zero downtime
Integrate with cloud platforms including OCI, GCP, AWS, and Azure
Implement real-time data pipelines to BigQuery, Snowflake, and other cloud targets
Navigate Oracle licensing models and optimize costs

Who This Book Is For

Database administrators, architects, and IT leaders working with Oracle GoldenGate—whether deploying for the first time, migrating from Classic architecture, or enabling AI-driven replication—will find actionable guidance on implementation, performance tuning, automation, and cloud integration. Covers unidirectional and multi-master replication and is packed with real-world use cases.
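To give a flavor of the REST-driven automation the book describes, here is an illustrative sketch; the Service Manager URL, credentials, and response handling are assumptions for the example, not the documented API surface, so consult the GoldenGate 23ai REST API reference.

```python
# Illustrative REST automation sketch against a GoldenGate Microservices deployment.
import requests

BASE = "https://ogg.example.com"            # hypothetical Service Manager URL
resp = requests.get(
    f"{BASE}/services/v2/deployments",      # assumed deployments endpoint
    auth=("oggadmin", "change-me"),         # placeholder credentials
    timeout=30,
)
resp.raise_for_status()
print(resp.json())                          # inspect deployment metadata
```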

Summary

In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal's code-first programming model—workflows, activities, task queues, and replay—and how it eliminates hand-rolled retry, checkpoint, and error-handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. She also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service.
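To ground the workflow/activity split discussed in the episode, here is a minimal sketch using the Temporal Python SDK (temporalio); the activity name and logic are illustrative, and running it requires a Temporal server and worker.

```python
# Minimal Temporal workflow/activity sketch.
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def load_partition(partition: str) -> int:
    # Side-effecting work (I/O, API calls) lives in activities; Temporal
    # retries and records results durably, so no hand-rolled retry scaffolding.
    return len(partition)

@workflow.defn
class IngestWorkflow:
    @workflow.run
    async def run(self, partition: str) -> int:
        # Deterministic orchestration code; on failure the workflow is
        # replayed from its event history rather than restarted from scratch.
        return await workflow.execute_activity(
            load_partition,
            partition,
            start_to_close_timeout=timedelta(minutes=5),
        )
```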

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what durable execution is and how it impacts system architecture?
With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?
One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.?
What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline oriented projects?
Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?
AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?
What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?
What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?
When is Temporal/durable execution the wrong choice?
What do you have planned for the future of Temporal for data and AI systems?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Temporal
Durable Execution
Flink
Machine Learning Epoch
Spark Streaming
Airflow
Directed Acyclic Graph (DAG)
Temporal Nexus
TensorZero
AI Engineering Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Data scientists have the skills to model complex systems, work with messy data, and uncover hidden patterns. Quant scientists do all of that, but with the added thrill (and pressure) of putting real money on the line. In this episode, we sit down with Jason Strimpel, Founder of PyQuant News and Co-founder of Quant Science, to explore why data scientists are uniquely positioned to excel in algorithmic trading. Whether you're a data scientist curious about finance, or simply interested in seeing your models have a more personal impact, this show offers a fresh perspective on how your skills can translate into the world of algorithmic trading.

What You'll Learn:

How your Python, stats, and modeling skills transfer directly into the markets
The mindset shifts required
Why reproducibility, auditability, and backtesting discipline are the data scientist's secret weapon
Common pitfalls when transitioning into quant roles, and how to avoid them
The tools and workflows Jason recommends to get started fast

🤝 Follow Jason on LinkedIn!
Subscribe to PyQuant News

Register for free to be part of the next live session: https://bit.ly/3XB3A8b

Follow us on Socials: LinkedIn · YouTube · Instagram (Mavens of Data) · Instagram (Maven Analytics) · TikTok · Facebook · Medium · X/Twitter

Summary

In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
What are the key takeaways that were most significant to you?
The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?
How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Boomi
Data Management
Integration & Automation Demo
Agentstudio
Data Connector Agent Webinar
Survey Results
Data Governance
Shadow IT
Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Traditional subgraph isomorphism algorithms like VF2 rely on sequential tree-search that can't leverage parallel computing. This talk introduces Δ-Motif, a data-centric approach that transforms graph matching into data operations using Python's data science stack. Δ-Motif decomposes graphs into small "motifs" to reconstruct matches. By representing graphs as tabular data with RAPIDS cuDF and Pandas, we achieve 10-595X speedups over VF2 without custom GPU kernels. I'll demonstrate practical applications from social networks to quantum computing, and show when GPU acceleration provides the biggest benefits for graph analysis problems. Perfect for data scientists working with network analysis, recommendation systems, or pattern matching at scale.
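To illustrate the data-centric idea in miniature, the sketch below finds directed triangles with pandas self-joins; swapping pandas for RAPIDS cuDF moves the same joins to the GPU. This toy is far simpler than Δ-Motif's actual decomposition.

```python
# Triangle matching as table joins instead of tree search.
import pandas as pd

# A small directed graph as an edge table.
edges = pd.DataFrame({"src": [0, 0, 1, 1, 2], "dst": [1, 2, 2, 3, 3]})

# Two-hop paths a -> b -> c: join the edge table with itself on b.
paths = edges.merge(edges, left_on="dst", right_on="src", suffixes=("_ab", "_bc"))

# Close each path into a triangle: keep paths where the edge a -> c also exists.
triangles = paths.merge(edges, left_on=["src_ab", "dst_bc"], right_on=["src", "dst"])
print(triangles[["src_ab", "dst_ab", "dst_bc"]])  # vertices (a, b, c)
```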

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic and semantic. In the first type, the meaning is the same but the way the addresses are written differs. For example, one can find "Street" vs "St". In the second type, the meaning is not exactly the same. For example, one can find "Road" instead of "Street". To solve this problem and match addresses, we have a couple of approaches. The first and simpler one uses similarity metrics. The second uses natural language processing and transformers. This is a hands-on talk intended for data process analysts. We are going to go through these solutions implemented in a Jupyter notebook using Python.
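As a minimal example of the similarity-metric approach, the sketch below uses the standard library's difflib; production pipelines often prefer rapidfuzz or token-based measures instead.

```python
# String-similarity baseline for syntactic address matching.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Normalize case and whitespace, then compute a ratio in [0, 1].
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

print(similarity("123 Main Street", "123 Main St"))    # high: syntactic variant
print(similarity("123 Main Street", "123 Main Road"))  # lower: semantic variant
```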

Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.
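As a small taste of that ecosystem, here is a hedged sketch with CuPy, whose API mirrors NumPy; it assumes a CUDA-capable GPU and the cupy package, and the tutorial's own library lineup may differ.

```python
# NumPy-style GPU computing with CuPy.
import cupy as cp

x = cp.random.random((4096, 4096), dtype=cp.float32)  # allocated on the GPU
y = x @ x.T                    # matrix multiply runs on the GPU
result = cp.asnumpy(y.sum())   # move the scalar result back to the host
print(result)
```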

Looking to contribute to open source but not sure where to start? Want to level up your skills in debugging, programming, collaboration, and more? Curious about how to fix a bug or add a feature you're missing in your favorite software project? Come to our special newcomer sprint to learn how and try it for yourself! Newcomers to Python or open source are welcome and encouraged, as are attendees with open source experience who can help guide them!