talk-data.com

Topic

Rust

programming_language

104

tagged

Activity Trend: peak of 11 activities per quarter (2020-Q1 to 2026-Q1)

Activities

104 activities · Newest first

Notebooks struggle when data vastly exceeds RAM: pagination hacks, fragile sampling, and surprise OOMs. Buckaroo is a modern data table for notebooks built to quickly make sense of dataframes by providing search, summary stats, and scrolling with every view. This talk reviews how Buckaroo uses out‑of‑core design patterns, viewport streaming, lazy Polars pipelines, batched background stats, and a series cache to make interactive exploration fast and reliable on commodity laptops. We’ll walk through the lifecycle of opening a large Parquet/CSV file: detecting formats, avoiding full materialization, fetching only requested row/column ranges, and throttling UI updates for smoothness. We’ll show how column‑level hashing (via a lightweight Rust extension) enables stable cache keys, so warm loads render the first viewport and stats in under a second. CSV specifics and a practical CSV→Parquet streaming path round out the approach. The ideas are tool‑agnostic and reproducible with the open‑source PyData stack; Buckaroo serves as a concrete reference implementation. You’ll leave with guidelines and snippets to bring these patterns to your own workflows.
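To make the column-level hashing idea concrete, here is a minimal sketch in plain Rust. It is not Buckaroo's extension: the function names, the choice of FNV-1a, and the restriction to `i64` columns are all assumptions made for illustration. The point it shows is that a hash with fixed parameters and a fixed byte encoding yields the same key on every run, which is what a cache key needs.

```rust
/// 64-bit FNV-1a parameters (the published constants of the FNV hash).
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feeds bytes into a running FNV-1a state and returns the new state.
fn fnv1a_update(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= u64::from(b);
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}

/// Hypothetical cache key for one integer column: hashes the column name,
/// then every value in row order. The fixed little-endian encoding keeps
/// the key identical across runs and platforms.
fn column_cache_key(name: &str, values: &[i64]) -> u64 {
    let mut state = fnv1a_update(FNV_OFFSET_BASIS, name.as_bytes());
    // 0xff never appears in valid UTF-8, so the name cannot run into the values.
    state = fnv1a_update(state, &[0xff]);
    for v in values {
        state = fnv1a_update(state, &v.to_le_bytes());
    }
    state
}
```

Calling `column_cache_key("price", &[1, 2, 3])` twice returns the same `u64`, while changing any value or the column name changes the key. The sketch avoids the standard library's default hasher because its algorithm is not guaranteed to stay the same between Rust releases.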

Advancing Windows device security through Surface innovation

Discover how Surface and Microsoft boost Windows device security with memory-safe Rust firmware and drivers, reducing vulnerabilities and improving reliability. Learn about the open-source windows-drivers-rs project and how IT and engineering teams can help build safer, resilient devices through collaborative innovation and Microsoft’s commitment to secure, inclusive technology.

PySpark’s Arrow-based Python UDFs open the door to dramatically faster data processing by avoiding expensive serialization overhead. At the same time, Polars, a high-performance DataFrame library built on Rust, offers zero-copy interoperability with Apache Arrow. This talk shows how combining these two technologies unlocks new performance gains: writing Arrow UDFs with Polars in PySpark can deliver speedups compared to regular Python UDFs. Attendees will learn how Arrow UDFs work in PySpark, how they can be used with other data processing libraries, and how to apply this approach to real-world Spark pipelines for faster, more efficient workloads.

SQLite is the most deployed database in the world, and a crucial player in the small data movement. It powers everything we touch, from small wearables to server-side applications. But as the world changes, is it ready for the challenges that modern infrastructure demands? We believe the answer is "no": from its lack of support for concurrent writes, to its inability to work with complex data like vector embeddings, SQLite needs a fundamental overhaul. In this talk we will explore why a complete rewrite is the most practical path forward to bring SQLite into the modern era. We'll dive deep into how Turso, our full rewrite of SQLite in Rust, tackles these challenges head-on—delivering true concurrency, native vector support, and dramatic performance improvements. Expect concrete benchmarks, implementation details, and a clear roadmap for SQLite's future.

Brought to You By:

• Statsig: The unified platform for flags, analytics, experiments, and more. Companies like Graphite, Notion, and Brex rely on Statsig to measure the impact of the pace they ship. Get a 30-day enterprise trial here.

• Linear: The system for modern product development. Linear is a heavy user of Swift: they just redesigned their native iOS app using their own take on Apple’s Liquid Glass design language. The new app is about speed and performance, just like Linear is. Check it out.

Chris Lattner is one of the most influential engineers of the past two decades. He created the LLVM compiler infrastructure and the Swift programming language, and Swift opened iOS development to a broader group of engineers. With Mojo, he’s now aiming to do the same for AI, by lowering the barrier to programming AI applications. I sat down with Chris in San Francisco, to talk language design, lessons on designing Swift and Mojo, and (of course!) compilers. It’s hard to find someone who is as enthusiastic and knowledgeable about compilers as Chris is! We also discussed why experts often resist change even when current tools slow them down, what he learned about AI and hardware from his time across both large and small engineering teams, and why compiler engineering remains one of the best ways to understand how software really works.

Timestamps
(00:00) Intro
(02:35) Compilers in the early 2000s
(04:48) Why Chris built LLVM
(08:24) GCC vs. LLVM
(09:47) LLVM at Apple
(19:25) How Chris got support to go open source at Apple
(20:28) The story of Swift
(24:32) The process for designing a language
(31:00) Learnings from launching Swift
(35:48) Swift Playgrounds: making coding accessible
(40:23) What Swift solved and the technical debt it created
(47:28) AI learnings from Google and Tesla
(51:23) SiFive: learning about hardware engineering
(52:24) Mojo’s origin story
(57:15) Modular’s bet on a two-level stack
(1:01:49) Compiler shortcomings
(1:09:11) Getting started with Mojo
(1:15:44) How big is Modular, as a company?
(1:19:00) AI coding tools the Modular team uses
(1:22:59) What kind of software engineers Modular hires
(1:25:22) A programming language for LLMs? No thanks
(1:29:06) Why you should study and understand compilers

The Pragmatic Engineer deepdives relevant for this episode:
• AI Engineering in the real world
• The AI Engineering stack
• Uber's crazy YOLO app rewrite, from the front seat
• Python, Go, Rust, TypeScript and AI with Armin Ronacher
• Microsoft’s developer tools roots

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].

Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Discover how building a series of side projects revealed the need for a faster, more secure tool for interacting with Elasticsearch. Explore with us the process that led us to choose Rust for its potential in performance and security. This talk presents a proof of concept (POC) illustrating how these side projects inspired and shaped its creation. We will examine a rich ecosystem, the challenges we encountered, and the innovative solutions we implemented to arrive at a robust tool.

Brought to You By:

• Statsig: The unified platform for flags, analytics, experiments, and more. Most teams end up in this situation: ship a feature to 10% of users, wait a week, check three different tools, try to correlate the data, and you’re still unsure if it worked. The problem is that each tool has its own user identification and segmentation logic. Statsig solved this problem by building everything within a unified platform. Check out Statsig.

• Linear: The system for modern product development. In the episode, Armin talks about how he uses an army of “AI interns” at his startup. With Linear, you can easily do the same: Linear’s Cursor integration lets you add Cursor as an agent to your workspace. This agent then works alongside you and your team to make code changes or answer questions. You’ve got to try it out: give Linear a spin and see how it integrates with Cursor.

Armin Ronacher is the creator of the Flask framework for Python, was one of the first engineers hired at Sentry, and is now the co-founder of a new startup. He has spent his career thinking deeply about how tools shape the way we build software. In this episode of The Pragmatic Engineer Podcast, he joins me to talk about how programming languages compare, why Rust may not be ideal for early-stage startups, and how AI tools are transforming the way engineers work. Armin shares his view on what continues to make certain languages worth learning, and how agentic coding is driving people to work more, sometimes to their own detriment.

We also discuss:
• Why the Python 2 to 3 migration was more challenging than expected
• How Python, Go, Rust, and TypeScript stack up for different kinds of work
• How AI tools are changing the need for unified codebases
• What Armin learned about error handling from his time at Sentry
• And much more

Jump to interesting parts:
• (06:53) How Python, Go, and Rust stack up and when to use each one
• (30:08) Why Armin has changed his mind about AI tools
• (50:32) How important are language choices from an error-handling perspective?

Timestamps
(00:00) Intro
(01:34) Why the Python 2 to 3 migration created so many challenges
(06:53) How Python, Go, and Rust stack up and when to use each one
(08:35) The friction points that make Rust a bad fit for startups
(12:28) How Armin thinks about choosing a language for building a startup
(22:33) How AI is impacting the need for unified code bases
(24:19) The use cases where AI coding tools excel
(30:08) Why Armin has changed his mind about AI tools
(38:04) Why different programming languages still matter but may not in an AI-driven future
(42:13) Why agentic coding is driving people to work more and why that’s not always good
(47:41) Armin’s error-handling takeaways from working at Sentry
(50:32) How important is language choice from an error-handling perspective
(56:02) Why the current SDLC still doesn’t prioritize error handling
(1:04:18) The challenges language designers face
(1:05:40) What Armin learned from working in startups and who thrives in that environment
(1:11:39) Rapid fire round

AI copilots are reshaping how we interact with software—but what makes them really work under the hood? In this talk, we unpack the design patterns and architectural choices that go into building a reliable copilot, then show how Rust and Rig make it practical to implement. From streaming responses to modular agents, we'll cover the key techniques while demystifying systems that enhance and power many developers' workflows.

Wingfoil is a blazingly fast, highly scalable stream processing framework designed for ultra-low-latency use cases such as electronic trading and real-time AI. Embracing stream-oriented programming makes it simple to receive, process, and distribute streaming data. This talk will explore where stream-oriented techniques are most effective and demonstrate how Wingfoil can be leveraged to build robust, high-performance systems.

Advanced Polars: Lazy Queries and Streaming Mode

Do you find yourself struggling with Pandas' limitations when handling massive datasets or real-time data streams?

Discover Polars, the lightning-fast DataFrame library built in Rust. This talk presents two advanced features of the next-generation dataframe library: lazy queries and streaming mode.

Lazy evaluation in Polars allows you to build complex data pipelines without the performance bottlenecks of eager execution. By deferring computation, Polars optimises your queries using techniques like predicate and projection pushdown, reducing unnecessary computations and memory overhead. This leads to significant performance improvements, particularly with datasets larger than your system’s physical memory.
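The two optimisations named above can be illustrated with a toy model in plain Rust. This is not the Polars API: `Row`, `LazyQuery`, and every method name below are hypothetical, and the "plan" holds a single recorded filter. The sketch only shows the shape of the idea: building the query does no work, and when it finally runs, the filter is applied while scanning (predicate pushdown) and only the one requested column is copied out (projection pushdown).

```rust
/// One row of a toy three-column table (hypothetical schema).
#[allow(dead_code)] // `city` exists only to be a column the query never touches
#[derive(Debug, Clone, PartialEq)]
struct Row {
    city: String,
    year: u32,
    sales: f64,
}

/// A lazy query: it records what to do and runs nothing until collected.
struct LazyQuery<'a> {
    source: &'a [Row],
    min_year: Option<u32>, // recorded predicate: year >= min_year
}

impl<'a> LazyQuery<'a> {
    /// Starts a plan over the source without reading any rows.
    fn scan(source: &'a [Row]) -> Self {
        LazyQuery { source, min_year: None }
    }

    /// Records a filter in the plan; still no rows are read.
    fn filter_year_at_least(mut self, year: u32) -> Self {
        self.min_year = Some(year);
        self
    }

    /// Executes the plan in a single pass over the source.
    fn collect_sales(self) -> Vec<f64> {
        let min_year = self.min_year;
        self.source
            .iter()
            // Predicate pushdown: rows are rejected during the scan itself.
            .filter(|r| min_year.map_or(true, |y| r.year >= y))
            // Projection pushdown: only `sales` is copied; `city` is never cloned.
            .map(|r| r.sales)
            .collect()
    }
}
```

An eager version would first clone every row, then filter, then select the column, which allocates strings that are thrown away immediately. A real query engine applies the same reasoning at the file-reader level, so rejected rows and unused columns need not be decoded at all.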

Polars' LazyFrames form the foundation of the library’s streaming mode, enabling efficient streaming pipelines, real-time transformations, and seamless integration with various data sinks.

This session will explore use cases and technical implementations of both lazy queries and streaming mode. We’ll also include live-coding demonstrations to introduce the tool, showcase best practices, and highlight common pitfalls.

Attendees will walk away with practical knowledge of lazy queries and streaming mode, ready to apply these tools in their daily work as data engineers or data scientists.

I don't have a background in functional programming - and I never set out to write it. But somewhere between writing trait-based epidemiological pipelines, composing data transformations, and leaning hard on Result, enums, and pattern matching, I started hearing from others: 'That's pretty functional.' In this talk, I'll explore what it means to write functional-ish Rust as someone solving real-world scientific problems. I'll walk through the patterns I reach for - like chaining iterators, avoiding shared state, and embracing expressive types - and reflect on which functional programming ideas emerge naturally in Rust, even if you're not trying. I'll also share how designing for epidemiologists - most of whom are used to chaining functions in Python (like Pandas) or R - has pushed me toward creating ergonomic Rust APIs with Python and R bindings. These tools aim to feel familiar to scientists while leveraging Rust's power and safety under the hood. This is a talk for functional programmers curious about Rust, and for Rustaceans wondering if they've been functional all along. No formal theory required - just real code, real use cases, and a pragmatic perspective from someone building public health tools in Rust.
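The patterns the abstract lists (enums, `Result`, pattern matching, iterator chains, no shared state) fit in a short sketch. The record format, type names, and function names below are invented for illustration and are not taken from the speaker's code.

```rust
use std::collections::BTreeMap;

/// Possible outcomes in a hypothetical "region,outcome" case record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Recovered,
    Hospitalised,
    Deceased,
}

/// Errors are ordinary values, so callers must decide how to handle them.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    MissingField,
    UnknownOutcome(String),
}

/// Parses one line such as "north,hospitalised".
fn parse_record(line: &str) -> Result<(String, Outcome), ParseError> {
    let (region, outcome) = line.split_once(',').ok_or(ParseError::MissingField)?;
    let outcome = match outcome.trim() {
        "recovered" => Outcome::Recovered,
        "hospitalised" => Outcome::Hospitalised,
        "deceased" => Outcome::Deceased,
        other => return Err(ParseError::UnknownOutcome(other.to_string())),
    };
    Ok((region.trim().to_string(), outcome))
}

/// Counts hospitalisations per region. The first malformed line aborts the
/// whole computation, because collecting into `Result` short-circuits.
fn hospitalised_by_region(lines: &[&str]) -> Result<BTreeMap<String, usize>, ParseError> {
    let records = lines
        .iter()
        .map(|line| parse_record(line))
        .collect::<Result<Vec<_>, _>>()?;
    Ok(records
        .into_iter()
        .filter(|(_, outcome)| *outcome == Outcome::Hospitalised)
        // The fold threads the map through as a value; nothing is shared or global.
        .fold(BTreeMap::new(), |mut counts, (region, _)| {
            *counts.entry(region).or_insert(0) += 1;
            counts
        }))
}
```

Given the lines `"north,hospitalised"`, `"south,recovered"`, and `"north,hospitalised"`, the result maps `north` to 2 and has no entry for `south`. A line without a comma yields `Err(ParseError::MissingField)` instead of a partial count.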

As organizations increasingly adopt data lake architectures, analytics databases face significant integration challenges beyond simple data ingestion. This talk explores the complex technical hurdles encountered when building robust connections between analytics engines and modern data lake formats.

We'll examine critical implementation challenges, including the absence of native library support for formats like Delta Lake, which necessitates expansion into new programming languages such as Rust to achieve optimal performance. The session explores the complexities of managing stateful systems, addressing caching inconsistencies, and reconciling state across distributed environments.

A key focus will be on integrating with external catalogs while maintaining data consistency and performance - a challenge that requires careful architectural decisions around metadata management and query optimization. We'll explore how these technical constraints impact system design and the trade-offs involved in different implementation approaches.

Attendees will gain a practical understanding of the engineering complexity behind seamless data lake integration and actionable approaches to common implementation obstacles.

In a world where AI agents, complex workflows, and accelerating data demands are reshaping every enterprise, the challenge isn’t just managing data, it’s creating trusted context that connects people, processes, and technology. 

Join Rebecca O’Kill, Chief Data & Analytics Officer at Axis Capital, for an Honest No-BS conversation about how her team is transforming governance from a compliance checkbox into a strategic enabler of business value. 

Together, we’ll unpack: 

• Minimal Valuable Governance (MVG): why the old ivory tower “govern everything” mindset fails, and how focusing on just enough governance creates immediate business impact. 

• The ACTIVE framework, a practical approach for governance built on: Alignment, Clarity, Trust, Iterative, Value, Enablement 

• How Axis Capital is embedding governance across the organization by uniting the “front office” (what and why) with the “back office” (how). 

• Why context and knowledge are critical for the next era of agentic AI and multi-agent workflows, and how Axis is preparing for it today. 

By the end, you’ll see how Axis Capital is turning governance into a competitive advantage and why this approach is essential for any organization looking to thrive in a world of AI-driven automation and connected workflows. 

In this episode, Conor and Bryce chat with Sean Parent about Rust and AI!

Link to Episode 252 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)

Socials
• ADSP: The Podcast: Twitter
• Conor Hoekstra: Twitter | BlueSky | Mastodon
• Bryce Adelstein Lelbach: Twitter

About the Guest: Sean Parent is a senior principal scientist and software architect managing Adobe's Software Technology Lab. Sean first joined Adobe in 1993 working on Photoshop and is one of the creators of Photoshop Mobile, Lightroom Mobile, and Lightroom Web. In 2009 Sean spent a year at Google working on Chrome OS before returning to Adobe. From 1988 through 1993 Sean worked at Apple, where he was part of the system software team that developed the technologies allowing Apple’s successful transition to PowerPC.

Show Notes
Date Recorded: 2025-08-21
Date Released: 2025-09-19
• C++ Under the Sea
• Better code
• Adobe ASL Adam & Eve Architecture
• Adobe Software Technology Lab
• ASL Libraries
• Rust Programming Language

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

The practice of data science in genomics and computational biology is fraught with friction. This is largely due to a tight coupling of bioinformatic tools to file input/output. While omic data is specialized and the storage formats for high-throughput sequencing and related data are often standardized, the adoption of emerging open standards not tied to bioinformatics can help better integrate bioinformatic workflows into the wider data science, visualization, and AI/ML ecosystems. Here, we present two bridge libraries as short vignettes for composable bioinformatics. First, we present Anywidget, an architecture and toolkit based on modern web standards for sharing interactive widgets across all Jupyter-compatible runtimes, including JupyterLab, Google Colab, VSCode, and more. Second, we present Oxbow, a Rust and Python-based adapter library that unifies access to common genomic data formats by efficiently transforming queries into Apache Arrow, a standard in-memory columnar representation for tabular data analytics. Together, we demonstrate the composition of these libraries to build custom connected genomic analysis and visualization environments. We propose that components such as these, which leverage scientific domain-agnostic standards to unbundle specialized file manipulation, analytics, and web interactivity, can serve as reusable building blocks for composing flexible genomic data analysis and machine learning workflows as well as systems for exploratory data analysis and visualization.

Apache Airflow is a powerful workflow orchestrator, but as workloads grow, its Python-based components can become performance bottlenecks. This talk explores how Rust, with its speed, safety, and concurrency advantages, can enhance Airflow’s core components (e.g., the scheduler and the DAG processor). We’ll dive into the motivations behind using Rust, architectural trade-offs, and the challenges of bridging the gap between Python and Rust. A proof-of-concept showcasing an Airflow scheduler rewritten in Rust will demonstrate the potential benefits of this approach.