talk-data.com

Topic: Rust (programming_language) · 104 tagged

Activity Trend: peak of 11 per quarter, 2020-Q1 to 2026-Q1

Activities: 104 activities · Newest first

Pandas is the de facto standard for data manipulation in Python, and I personally love it for its flexible syntax and interoperability. But pandas has well-known drawbacks, such as memory inefficiency, inconsistent missing-data handling, and a lack of multicore support. Multiple open-source projects aim to solve these issues; the most interesting of them is Polars.

Polars is built on Rust and Apache Arrow, performs strongly across a wide range of benchmarks, and is evolving fast. But is it stable enough yet to migrate an existing pandas codebase? And does it meet the high expectations of long-time pandas lovers for query-language flexibility?

In this talk, I will explain how Polars can be that fast and share my insights on where Polars shines and in which scenarios I stick with pandas (at least for now!).
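To make the missing-data point concrete, here is a minimal sketch of the Arrow-style layout Polars builds on: values live in one buffer and nulls are tracked in a separate validity bitmap, so an integer column can hold nulls without being turned into a float column (as happens with NaN in classic NumPy-backed pandas). `Int64Column` and its methods are hypothetical and greatly simplified; this is not the Polars or arrow-rs API.

```rust
/// A simplified Arrow-style nullable Int64 column (hypothetical type, not the
/// Polars or arrow-rs API): a values buffer plus a packed validity bitmap.
struct Int64Column {
    values: Vec<i64>,
    validity: Vec<u8>, // bit i set => slot i is valid; least-significant bit first
}

impl Int64Column {
    /// Build a column from optional values; `None` becomes a null slot.
    fn from_options(items: &[Option<i64>]) -> Self {
        let mut values = Vec::with_capacity(items.len());
        let mut validity = vec![0u8; (items.len() + 7) / 8];
        for (i, item) in items.iter().enumerate() {
            match item {
                Some(v) => {
                    values.push(*v);
                    validity[i / 8] |= 1 << (i % 8);
                }
                // A null still occupies a slot; its value is never read.
                None => values.push(0),
            }
        }
        Int64Column { values, validity }
    }

    fn is_valid(&self, i: usize) -> bool {
        (self.validity[i / 8] >> (i % 8)) & 1 == 1
    }

    fn null_count(&self) -> usize {
        (0..self.values.len()).filter(|&i| !self.is_valid(i)).count()
    }

    /// Sum of the valid slots. The column stays Int64 throughout; no NaN
    /// sentinel and no cast to float is needed to represent the nulls.
    fn sum(&self) -> i64 {
        self.values
            .iter()
            .enumerate()
            .filter(|&(i, _)| self.is_valid(i))
            .map(|(_, v)| *v)
            .sum()
    }
}

fn main() {
    let col = Int64Column::from_options(&[Some(1), None, Some(3), None, Some(5)]);
    println!("sum = {}, nulls = {}", col.sum(), col.null_count());
}
```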

In this episode, Conor and Bryce talk to Barry Revzin about Rust, Val, Carbon, ChatGPT, error propagation in C++26, and more! (Episode 114)

Twitter: ADSP: The Podcast · Conor Hoekstra · Bryce Adelstein Lelbach

About the Guest: Barry Revzin is a senior C++ developer at Jump Trading in Chicago. After programming for many years, he got really into the nuances and intricacies of C++ by being unreasonably active on StackOverflow (where he is the top contributor in C++14, C++17, and C++20). He is also a C++ committee member, having written dozens of papers for C++20 and C++23.

Show Notes

Date Recorded: 2023-01-15 · Date Released: 2023-01-27

ADSP Episode 113: The C++26 Pipeline Operator with Barry Revzin! · P2011 A pipeline-rewrite operator · P2672 Exploring the Design Space for a Pipeline Operator · Rust Programming Language · Rust Traits · Swift Protocols · Rust std::iter::Iterator · Rust The Cargo Book · Val Programming Language · Carbon Programming Language · Carbon Operator Precedence · Epochs: a backward-compatible language evolution mechanism · ADSP Episode 97: C++ vs Carbon vs Circle vs CppFront with Sean Baxter · Circle Compiler · ChatGPT: Optimizing Language Models for Dialogue · GPTDuck · Oxide and Friends Podcast · Bryan Cantrill on Twitter · Bryan Cantrill: The Summer of RUST · On The Metal Podcast · Oxide and Friends: NeXT, Objective-C, and contrasting histories · Elixir Docs · Rust Docs · P2561 An error propagation operator · Sy Brand’s tl::expected · P0798R4 - Monadic operations for std::optional · C++23 std::expected · Chicago C++ Meetup: Defining Range Formatting
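For comparison with the error-propagation discussion (P2561, `std::expected`), the sketch below shows Rust's existing `?` operator on a `Result`. The function names are hypothetical. The second version spells out roughly what `?` does, minus the `From::from` conversion that the real operator applies to the error.

```rust
use std::num::ParseIntError;

/// Sum whitespace-separated integers. Each `?` either unwraps the `Ok`
/// value or returns the `Err` to the caller immediately.
fn sum_all(input: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for token in input.split_whitespace() {
        total += token.parse::<i64>()?;
    }
    Ok(total)
}

/// The same function without `?`, showing the early return it expands to
/// (roughly; the real operator also converts the error with `From::from`).
fn sum_all_explicit(input: &str) -> Result<i64, ParseIntError> {
    let mut total = 0;
    for token in input.split_whitespace() {
        match token.parse::<i64>() {
            Ok(v) => total += v,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}

fn main() {
    println!("{:?}", sum_all("1 2 3"));
    println!("{:?}", sum_all("1 two 3"));
}
```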

In this episode, Conor and Bryce talk to Barry Revzin about the pipeline operator |>, C++ Ranges, and more! (Episode 113)

About the Guest: Barry Revzin is a senior C++ developer at Jump Trading in Chicago, a research- and technology-driven trading firm. After programming for many years, he got really into the nuances and intricacies of C++ by being unreasonably active on StackOverflow (where he is the top contributor in C++14, C++17, and C++20). A lot of his C++ knowledge comes from just answering questions that he doesn’t know the answers to, especially when he answers them incorrectly at first.

His C++ involvement escalated when he started attending standards committee meetings in 2016, and he has since written dozens of papers for C++20 and now C++23. You might know him from such features as <=>, pack expansion in lambda init-capture, explicit(bool), conditionally trivial special member functions, and, recently approved for C++23, deducing this.

Outside of the C++ world, Barry is an obsessive swimming fan. He writes fun data articles for SwimSwam and also does analysis for the DC Trident, a professional swim team featuring Olympic Gold Medalists Zach Apple and Anna Hopkin, managed by two-time Olympian Kaitlin Sandeno.

Show Notes

Date Recorded: 2023-01-15 · Date Released: 2023-01-20

Iterators and Ranges: Comparing C++ to D to Rust - Barry Revzin - [CppNow 2021] · Keynote: Iterators and Ranges: Comparing C++ to D, Rust, and Others - Barry Revzin - CPPP 2021 · Kona Photo of Barry and Michael Swimming · CppCast Episode 237: Packs and Pipelines · P2011 A pipeline-rewrite operator · P2672 Exploring the Design Space for a Pipeline Operator · C++20/23 Ranges Library · Ranges-v3 Library · Boost.Lambda Library · Boost.Lambda2 Library · TC39 Pipe Operator (|>) for JavaScript

Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic · Creative Commons Attribution 3.0 Unported (CC BY 3.0) · Free Download / Stream: http://bit.ly/l-miss-you · Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
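Rust has no `|>` operator, but its iterator adaptors already give the left-to-right reading order that the pipeline proposals aim for. This sketch contrasts nested function calls with a method chain over the same data. All function names are hypothetical.

```rust
// Nested style: reads inside-out, as in total(squares(evens(xs))).
fn evens(xs: &[i64]) -> Vec<i64> {
    xs.iter().copied().filter(|x| *x % 2 == 0).collect()
}

fn squares(xs: Vec<i64>) -> Vec<i64> {
    xs.into_iter().map(|x| x * x).collect()
}

fn total(xs: Vec<i64>) -> i64 {
    xs.into_iter().sum()
}

fn sum_of_even_squares_nested(xs: &[i64]) -> i64 {
    total(squares(evens(xs)))
}

// Chained style: the steps read left to right in execution order. The
// adaptors are lazy, so no intermediate Vec is allocated.
fn sum_of_even_squares_chained(xs: &[i64]) -> i64 {
    xs.iter().copied().filter(|x| *x % 2 == 0).map(|x| x * x).sum()
}

fn main() {
    let data = [1, 2, 3, 4, 5, 6];
    println!(
        "{} {}",
        sum_of_even_squares_nested(&data),
        sum_of_even_squares_chained(&data)
    );
}
```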

In this episode, Conor and Bryce conclude their 2022 retro and talk about running! (Episode 112)

Show Notes

Date Recorded: 2023-01-04 · Date Released: 2023-01-13

NVIDIA/stdexec - Senders - A Standard Model for Asynchronous Execution in C++ · Rust Programming Language · Languish · Talk Python To Me · Lightning Talk: Runner’s Guide to C++ Conferences - Timur Doumler - CppNorth 2022 · Optic Flow · World Marathon Majors · code::dive Conference · Lambda Days Conference · Strange Loop Conference

In this episode, Conor and Bryce finish their conversation with Jane Losare-Lusby about the Rust programming language. (Episode 108)

About the Guest: Jane Losare-Lusby is currently on both the Rust Library Team and the Rust Library API Team. She is also the Error Handling Project Group Lead, the Rust Foundation Project Director of Collaboration, and a Principal Rust Open Source Engineer at Futurewei Technologies.

Show Notes

Date Recorded: 2022-11-02 · Date Released: 2022-12-16

https://cheats.rs/ · ADSP Episode 106: Jane Losare-Lusby on Rust! · ADSP Episode 107: Jane Losare-Lusby on Rust! (Part 2) · Rust Evangelism Strike Force · Rust Evangelism Strikeforce · Rust Governance Teams · A List of Companies that Use Array Languages (J, K, APL, q) · A List of companies that use Haskell · Hoogle · Roogle · Kotlin Programming Language · Carbon Language: An experimental successor to C++ - Chandler Carruth - CppNorth 2022 · Carbon Github · Awesome Rust Mentors · Clojure Bridge

In this episode, Conor and Bryce continue their conversation with Jane Losare-Lusby about the Rust programming language. (Episode 107)

Show Notes

Date Recorded: 2022-11-02 · Date Released: 2022-12-09

https://cheats.rs/ · ADSP Episode 106: Jane Losare-Lusby on Rust! · Rust std::slice::iter · Rust std::iter::IntoIterator::into_iter · C++20 Concepts · Rust Traits · C++ Pattern Matching Proposal · C++ Pattern matching using is and as · O3DCON 2022: Keynote: C++ Horizons - Bryce Adelstein Lelbach · www.crates.io · ADSP Episode 92: Special Guest Kate Gregory! · C++Club Episode 155: WG21 October mailing, Carbon, Cpp2, Safety
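A small illustration of two show-note topics: `iter` versus `into_iter`, and traits as constraints on generic code (loosely analogous to C++20 concepts). The functions are hypothetical examples, not code from the episode.

```rust
use std::fmt::Display;

/// A generic function whose type parameter is constrained by a trait bound,
/// loosely comparable to a C++20 template constrained by a concept.
fn bracket_all<T: Display>(items: &[T]) -> Vec<String> {
    items.iter().map(|item| format!("<{}>", item)).collect()
}

/// `iter()` borrows: it yields `&String`, so the caller keeps ownership.
fn lengths(words: &[String]) -> Vec<usize> {
    words.iter().map(|w| w.len()).collect()
}

/// `into_iter()` on a `Vec` consumes it: it yields `String` by value, so
/// each string can be extended without cloning and moved into the result.
fn shout(words: Vec<String>) -> Vec<String> {
    words.into_iter().map(|w| w + "!").collect()
}

fn main() {
    let words = vec![String::from("hi"), String::from("rust")];
    let lens = lengths(&words); // `words` is still usable after this
    let loud = shout(words); // `words` is moved here and cannot be used again
    println!("{:?} {:?} {:?}", lens, loud, bracket_all(&[1, 2, 3]));
}
```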

In this episode, Conor and Bryce talk to Jane Losare-Lusby about the Rust programming language. (Episode 106)

Show Notes

Date Recorded: 2022-11-02 · Date Released: 2022-12-02

https://cheats.rs/ · Rustacean Station: Error Handling in Rust with Jane Losare-Lusby · Are We Podcast Yet with Jane Losare-Lusby · ADSP poll about becoming a Rust podcast · Conor’s Tweet about /cpp vs /rust · ADSP Episode 101: C++ Developers Try Rust! · C++23 std::views::zip · Rust std::iter::Iterator::zip · Rust Clippy · Rust Traits · C++20 Concepts · Esteban Küber on Twitter · Rust unsafe · Rust miri · This Week in Rust · Rust Analyzer · Rust std::iter::Iterator::flat_map · Rust std::iter::Iterator::enumerate
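The iterator adaptors mentioned in the show notes (`zip`, `enumerate`, `flat_map`) can each be sketched in a line or two. The helper functions here are hypothetical and only demonstrate each adaptor's behavior.

```rust
/// Pairwise products; `zip` stops at the end of the shorter input,
/// much like C++23 std::views::zip.
fn pairwise_products(a: &[i32], b: &[i32]) -> Vec<i32> {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).collect()
}

/// Indices of the elements equal to `target`, using `enumerate`.
fn positions_of(xs: &[char], target: char) -> Vec<usize> {
    xs.iter()
        .enumerate()
        .filter(|&(_, &c)| c == target)
        .map(|(i, _)| i)
        .collect()
}

/// Expand (count, char) run-length pairs with `flat_map`: each pair becomes
/// a small iterator, and the results are flattened into one String.
fn expand_runs(runs: &[(usize, char)]) -> String {
    runs.iter()
        .flat_map(|&(n, c)| std::iter::repeat(c).take(n))
        .collect()
}

fn main() {
    println!("{:?}", pairwise_products(&[1, 2, 3], &[10, 20]));
    println!("{:?}", positions_of(&['a', 'b', 'a'], 'a'));
    println!("{}", expand_runs(&[(3, 'x'), (1, 'y')]));
}
```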

In this episode, Conor continues his conversation with Jason Turner! (Episode 104)

About the Guest: Jason is the host of the YouTube channel C++ Weekly, co-host emeritus of the podcast CppCast, author of C++ Best Practices, and author of the first casual puzzle books designed to teach C++ fundamentals while having fun!

A list of Jason’s content: C++ Weekly YouTube Channel · The [Fill in the Blank] Programmer YouTube Channel · C++ Books · Talk Playlist

Show Notes

Date Recorded: 2022-10-26 · Date Released: 2022-11-18

Final Episode of CppCast · A talk with Jason Turner: the history of CppCast, and why it was shut down · The [Fill in the Blank] Programmer YouTube Channel · C++ auto · Making C++ Fun, Safe, and Accessible - Jason Turner - C++ on Sea 2022 · C++ Weekly - Ep 347 - This PlayStation Jailbreak NEVER SHOULD HAVE HAPPENED · C++ std::unordered_map::operator= · Python defaultdict · C++Now 2019: Peter Sommerlad “How I learned to Stop Worrying and Love the C++ Type System” · C++ explicit specifier · Hoogle Haskell Function Search Engine · Roogle Rust Function Search Engine · CLion Code Completion · Denver C++ Meetup

In this episode, Bryce and Conor live-code some Rust and talk about scans! (Episode 102)

Show Notes

Date Recorded: 2022-10-27 · Date Released: 2022-11-04

Rust Programming Language · C++23 std::mdspan · C++98 std::partial_sum · Rust scan · Rust Option · C++17 std::optional · Swift optional · Haskell Maybe · BQN Under · GitHub scan line of code in rust-tx · C++17 std::inclusive_scan · C++17 std::exclusive_scan · Haskell scanl · Haskell scanl1 · Rust filter
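The scan discussion maps onto Rust's `Iterator::scan`, which threads mutable state through the iteration. This is a minimal sketch that assumes integer addition as the operation. The function names are hypothetical and were chosen to mirror C++'s `std::inclusive_scan` and `std::exclusive_scan`.

```rust
/// Inclusive scan: each output includes the current element, like
/// std::partial_sum / std::inclusive_scan or Haskell's scanl1 (+).
fn inclusive_scan(xs: &[i32]) -> Vec<i32> {
    xs.iter()
        .scan(0, |acc, &x| {
            *acc += x;
            Some(*acc) // returning None here would end the iteration early
        })
        .collect()
}

/// Exclusive scan with an initial value: output i is `init` plus the sum of
/// the elements before i, like std::exclusive_scan.
fn exclusive_scan(xs: &[i32], init: i32) -> Vec<i32> {
    xs.iter()
        .scan(init, |acc, &x| {
            let before = *acc;
            *acc += x;
            Some(before)
        })
        .collect()
}

fn main() {
    let xs = [3, 1, 4, 1, 5];
    println!("{:?} {:?}", inclusive_scan(&xs), exclusive_scan(&xs, 0));
}
```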

In this episode, Bryce and Conor live-code some Rust! (Episode 101)

Show Notes

Date Recorded: 2022-10-27 · Date Released: 2022-10-28

CityStrides.com · Rust Podcast Twitter Poll · Rust Programming Language · C++ std::vector::front · Rust std::iter::IntoIterator::into_iter · Rust std::option::Option::unwrap · Rust std::iter::Iterator::next · Accessing First Element of Vec in Rust (Godbolt) · JT on Twitter · Mara Bos on Twitter · Jakt Programming Language · Refactor of Rust vec[0] Commit
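The `vec[0]` versus iterator question from the episode comes down to how each option handles an empty vector. Below is a sketch of three options with hypothetical function names. It is not the refactor referenced in the show notes.

```rust
/// Indexing: concise, but panics if the slice is empty
/// (where calling front() on an empty std::vector is undefined behavior in C++).
fn first_by_index(v: &[i32]) -> i32 {
    v[0]
}

/// `first()` returns an Option, so the caller must handle the empty case.
fn first_checked(v: &[i32]) -> Option<i32> {
    v.first().copied()
}

/// `into_iter().next()` also yields an Option, but consumes the Vec.
/// Appending `.unwrap()` would bring back the panic on empty input.
fn first_by_consuming(v: Vec<i32>) -> Option<i32> {
    v.into_iter().next()
}

fn main() {
    let v = vec![7, 8, 9];
    println!(
        "{} {:?} {:?}",
        first_by_index(&v),
        first_checked(&v),
        first_by_consuming(v.clone())
    );
    println!("{:?}", first_checked(&[]));
}
```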

Sound Data Engineering in Rust—From Bits to DataFrames

Spark applications often need to query external data sources, such as file-based or relational data sources. To do this, Spark provides Data Source APIs for accessing structured data through Spark SQL.

The Data Source APIs include optimization rules, such as filter pushdown and column pruning, that improve query performance by reducing the amount of data that needs to be processed. As part of our ongoing project to provide generic Data Source V2 pushdown APIs, we have introduced partial aggregate pushdown, which significantly speeds up Spark jobs by dramatically reducing the amount of data transferred between data sources and Spark. We have implemented aggregate pushdown for both JDBC and Parquet.
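The idea behind partial aggregate pushdown can be sketched without Spark. Each source computes a small partial result per partition, and the engine merges those partials. For `AVG`, the partial has to be `(sum, count)` rather than an average, because averages of differently sized partitions cannot simply be averaged. This is a conceptual sketch only. `PartialAvg`, `partial_avg`, and `final_avg` are hypothetical and do not correspond to Spark's Data Source V2 API.

```rust
/// Partial state for AVG that a data source can compute per partition.
#[derive(Debug, Clone, Copy, PartialEq)]
struct PartialAvg {
    sum: f64,
    count: u64,
}

/// Runs at the source: reduce one partition to a two-field summary
/// instead of shipping every row to the engine.
fn partial_avg(partition: &[f64]) -> PartialAvg {
    PartialAvg { sum: partition.iter().sum(), count: partition.len() as u64 }
}

/// Runs in the engine: merge the partial summaries into the final AVG.
fn final_avg(partials: &[PartialAvg]) -> Option<f64> {
    let sum: f64 = partials.iter().map(|p| p.sum).sum();
    let count: u64 = partials.iter().map(|p| p.count).sum();
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}

fn main() {
    let partitions: [&[f64]; 2] = [&[1.0, 2.0, 3.0], &[4.0]];
    let partials: Vec<PartialAvg> = partitions.iter().map(|p| partial_avg(p)).collect();
    println!("{:?} -> {:?}", partials, final_avg(&partials));
}
```

Here, averaging the two partition averages (2.0 and 4.0) would give 3.0. Merging the `(sum, count)` pairs gives the correct 10.0 / 4 = 2.5.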

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Streaming Data into Delta Lake with Rust and Kafka

Scribd's data architecture was originally batch-oriented, but in the last couple of years we introduced streaming data ingestion to provide near-real-time ad hoc query capability, reduce the need for more batch processing tasks, and set the foundation for building real-time data applications.

Kafka and Delta Lake are the two key components of our streaming ingestion pipeline. Various applications and services write messages to Kafka as events are happening. We were tasked with getting these messages into Delta Lake quickly and efficiently.

Our first solution was to deploy Spark Structured Streaming jobs. This got us off the ground quickly, but had some downsides.

Since Delta Lake and the Delta transaction protocol are open source, we kicked off a project to implement our own Rust ingestion daemon. We were confident we could deliver a Rust implementation because our ingestion jobs are append-only. Rust offers high performance with a focus on code safety and modern syntax.

In this talk I will describe Scribd's unique approach to ingesting messages from Kafka topics into Delta Lake tables. I will describe the architecture, deployment model, and performance of our solution, which uses the kafka-delta-ingest Rust daemon and the delta-rs crate hosted in auto-scaling ECS services. I will discuss foundational design aspects for achieving data integrity, such as distributed locking with DynamoDB to overcome S3's lack of "PutIfAbsent" semantics, and avoiding duplicates or data loss when multiple concurrent tasks are handling the same stream. I'll highlight the reliability and performance characteristics we've observed so far. I'll also describe the Terraform deployment model we use to deliver our 70-and-growing production ingestion streams into AWS.
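Put-if-absent matters for a Delta-style log because each commit claims the next table version. A conditional write guarantees that two writers racing for the same version cannot both succeed. The sketch assumes an in-memory store in which a mutex provides that atomicity. `LogStore` and its methods are hypothetical and stand in for whatever coordination a real deployment uses, such as the DynamoDB lock described above.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// A toy commit log (hypothetical type) whose only write primitive is an
/// atomic put-if-absent keyed by table version.
struct LogStore {
    commits: Mutex<HashMap<u64, String>>,
}

impl LogStore {
    fn new() -> Self {
        LogStore { commits: Mutex::new(HashMap::new()) }
    }

    /// Write `action` as commit `version` only if nothing exists there yet.
    /// The check and the insert happen under one lock, so they are atomic.
    fn put_if_absent(&self, version: u64, action: &str) -> Result<(), String> {
        let mut commits = self.commits.lock().unwrap();
        if commits.contains_key(&version) {
            return Err(format!("version {} already exists", version));
        }
        commits.insert(version, action.to_string());
        Ok(())
    }

    fn latest_version(&self) -> Option<u64> {
        self.commits.lock().unwrap().keys().max().copied()
    }

    /// Optimistic commit: try the next version. On conflict, re-read the
    /// latest version and retry instead of overwriting another writer's commit.
    fn commit(&self, action: &str) -> u64 {
        loop {
            let next = self.latest_version().map_or(0, |v| v + 1);
            if self.put_if_absent(next, action).is_ok() {
                return next;
            }
        }
    }
}

fn main() {
    let store = LogStore::new();
    // Eight concurrent writers; each one ends up with a distinct version.
    let mut versions: Vec<u64> = thread::scope(|s| {
        let handles: Vec<_> = (0..8)
            .map(|i| {
                let store = &store;
                s.spawn(move || store.commit(&format!("append from writer {}", i)))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });
    versions.sort();
    println!("{:?}", versions);
}
```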

Ensuring Correct Distributed Writes to Delta Lake in Rust with Formal Verification

Rust guarantees that safe code is free of memory-access bugs once a program compiles. However, one can still introduce logic bugs in the implementation.

In this talk, I will first give a high-level overview of common formal verification methods used in distributed system design and implementation. Then I will talk about our experiences using TLA+ and Stateright to formally model the delta-rs multi-writer S3 backend implementation. By combining Rust with formal verification, we end up with an efficient native Delta Lake implementation that is both memory-safe and free of logic bugs!
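Model checkers such as TLA+'s TLC and Stateright work by exhaustively exploring every interleaving of concurrent steps. The sketch below hand-rolls that idea for a hypothetical two-step writer that reads the latest version and then writes the next one. It counts the executions that lose a commit, both with and without a put-if-absent write. It uses neither TLA+ nor the Stateright API, and it is far simpler than a real model of delta-rs.

```rust
use std::collections::BTreeMap;

/// One writer's program counter: 0 = read latest version, 1 = write, 2 = done.
#[derive(Clone)]
struct Writer {
    pc: u8,
    next_version: u64,
}

/// The whole system state that the explorer branches on.
#[derive(Clone)]
struct State {
    log: BTreeMap<u64, usize>, // version -> id of the writer whose commit is stored there
    writers: Vec<Writer>,
}

fn initial(writers: usize) -> State {
    State { log: BTreeMap::new(), writers: vec![Writer { pc: 0, next_version: 0 }; writers] }
}

/// Depth-first exploration of every interleaving of the writers' steps.
/// Returns how many complete executions break the invariant
/// "every writer's commit is still in the log at the end".
fn count_violations(state: State, put_if_absent: bool) -> usize {
    let runnable: Vec<usize> = (0..state.writers.len())
        .filter(|&i| state.writers[i].pc < 2)
        .collect();
    if runnable.is_empty() {
        let mut ids: Vec<usize> = state.log.values().copied().collect();
        ids.sort();
        let expected: Vec<usize> = (0..state.writers.len()).collect();
        return if ids == expected { 0 } else { 1 };
    }
    let mut violations = 0;
    for i in runnable {
        let mut s = state.clone();
        let w = &mut s.writers[i];
        if w.pc == 0 {
            // Step 1: read the newest version and plan to write the one after it.
            w.next_version = s.log.keys().next_back().map_or(0, |v| *v + 1);
            w.pc = 1;
        } else if put_if_absent && s.log.contains_key(&w.next_version) {
            // Conditional write failed: go back and re-read.
            w.pc = 0;
        } else {
            // Step 2: write. With a blind put this may overwrite another commit.
            s.log.insert(w.next_version, i);
            w.pc = 2;
        }
        violations += count_violations(s, put_if_absent);
    }
    violations
}

fn main() {
    println!("blind put: {} violating executions", count_violations(initial(2), false));
    println!("put-if-absent: {} violating executions", count_violations(initial(2), true));
}
```

With two writers and a blind put, four of the six possible interleavings lose a commit. These are the cases where both reads happen before either write.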

DataFusion and Arrow: Supercharge Your Data Analytical Tool with a Rusty Query Engine

Learn how Rust, the Apache Arrow project, and the DataFusion query engine are increasingly being used to accelerate the creation of modern data stacks.

Delta Lake 2.0 Overview

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running it in production was no small feat, and we have the ever-expanding Delta community to thank!

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together. These include integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more.

They also include features such as OPTIMIZE ZORDER, data skipping using column stats, S3 multi-cluster writes, Change Data Feed, and more.

The release also covers language APIs for Rust, Python, Ruby, Go, Scala, and Java.
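Data skipping with column stats can be sketched as a filter over per-file min/max values, where a file is read only if its range overlaps the query predicate. `FileStats` and `files_to_scan` are hypothetical. Delta Lake's actual statistics format and skipping logic are richer than this.

```rust
/// Min/max statistics for one column of one data file (hypothetical type;
/// the file names used below are made up).
struct FileStats {
    path: &'static str,
    min: i64,
    max: i64,
}

/// Files that might contain rows with `lo <= value <= hi`. A file is skipped
/// only when its [min, max] range cannot overlap the predicate range, so no
/// matching row is ever skipped. The surviving files still need a real scan.
fn files_to_scan(files: &[FileStats], lo: i64, hi: i64) -> Vec<&'static str> {
    files
        .iter()
        .filter(|f| f.max >= lo && f.min <= hi)
        .map(|f| f.path)
        .collect()
}

fn main() {
    let files = [
        FileStats { path: "part-0.parquet", min: 0, max: 9 },
        FileStats { path: "part-1.parquet", min: 10, max: 19 },
        FileStats { path: "part-2.parquet", min: 20, max: 29 },
    ];
    println!("{:?}", files_to_scan(&files, 12, 15));
}
```

Clustering techniques such as Z-ordering aim to make these per-file ranges narrower, so that more files can be skipped for a given predicate.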

Polars: Blazingly Fast DataFrames in Rust and Python

This talk will introduce Polars, a blazingly fast DataFrame library written in Rust on top of Apache Arrow. It is a DataFrame library that brings the lessons learned in database research to exploratory data analysis.

Today's CPUs come with many cores, and their superscalar designs and SIMD registers allow for even more parallelism. Polars is written from the ground up to fully utilize this generation of CPUs.

Besides blazingly fast algorithms, a cache-efficient memory layout, and multi-threading, Polars includes a lazy query engine, which allows it to apply several optimizations that can improve query time and memory usage.
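A lazy API lets the engine see the whole query before running it. As a minimal sketch that assumes a three-node logical plan, the rule below pushes a filter beneath a projection. This moves the filter toward the scan, where a real engine could apply it while reading. The `Plan` type and `push_down_filters` are hypothetical, and Polars' optimizer is far more complete.

```rust
/// A tiny logical plan. Building it does no work, so an optimizer can
/// rewrite it before execution. That is the point of a lazy API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Select { columns: Vec<String>, input: Box<Plan> },
    Filter { column: String, min: i64, input: Box<Plan> }, // keeps rows with column >= min
}

/// One rewrite rule, applied recursively: a filter directly above a
/// projection that keeps the filtered column is moved below the projection.
fn push_down_filters(plan: Plan) -> Plan {
    match plan {
        Plan::Filter { column, min, input } => match *input {
            Plan::Select { columns, input: inner } if columns.contains(&column) => Plan::Select {
                columns,
                input: Box::new(push_down_filters(Plan::Filter { column, min, input: inner })),
            },
            other => Plan::Filter { column, min, input: Box::new(push_down_filters(other)) },
        },
        Plan::Select { columns, input } => {
            Plan::Select { columns, input: Box::new(push_down_filters(*input)) }
        }
        scan @ Plan::Scan { .. } => scan,
    }
}

fn main() {
    let plan = Plan::Filter {
        column: "price".to_string(),
        min: 100,
        input: Box::new(Plan::Select {
            columns: vec!["price".to_string(), "qty".to_string()],
            input: Box::new(Plan::Scan { table: "orders".to_string() }),
        }),
    };
    println!("before: {:?}", plan);
    println!("after:  {:?}", push_down_filters(plan.clone()));
}
```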

Read more:

https://github.com/pola-rs/polars https://www.ritchievink.com/blog/2021/02/28/i-wrote-one-of-the-fastest-dataframe-libraries/

Join the talk to learn more.

In this episode, Bryce takes the programming language quiz!

Show Notes

Date Recorded: 2022-03-19 · Date Released: 2022-03-25

Guadalupe Trail · Bubly · OOKLA · EPOXY · Tony Van Eerd’s Tweet · memtest · Programming Language Dependency Graph · Python’s graphviz · DOT · The Programming Language Podcast · David Koontz on Twitter · Smalltalk · Pharo-Functional · Erik Meijer on Twitter · C# 3.0 LINQ · C++17 std::optional · C++23 std::expected · Rust Option · Rust Error · OCaml · Jared Roesch on Twitter · Haskell + C++ = Rust Tweet · Simula67 · Ada · StepanovPapers.com · Swift · Objective-C · Erlang · Elixir · Clojure · Scala · David Turner · KRC · SASL · Miranda · AWK · JavaScript · Self · Io · Seven Languages in Seven Weeks: Ch 5 - Scala Meetup

The Rust Language for a High-Load Network Service - Alexander Serbul (Александр Сербул)

Big Data Days, onsite and online, November 22-25, 2022. Learn more about the conference: https://bit.ly/30YNt99. Join our next conference, Big Data Days, on November 22-25, 2022. There you can learn from world-class experts giving technical talks and hands-on workshops on Big Data, High Load, Data Science, Machine Learning, and AI. This year the conference will be held in a hybrid format, so you can attend talks and workshops either onsite or online.

In this episode, Bryce and Conor talk about their favorite data structures.

Date Recorded: 2020-11-28 · Date Released: 2020-12-04

C++ | Containers · OCaml | Containers · Java | Collections · Python | Collections · Kotlin | Collections · Scala | Collections · Rust | Collections · Go | Collections · Haskell | Collections · TS | Collections · Ruby | Collections · JS | Collections · F# | Collection Types · Racket | Data Structures · Clojure | Data Structures · What do you mean by “cache friendly”? - Björn Fahller - code::dive 2019 · Alan J. Perlis’ Epigrams on Programming · std::vector · P1072 basic_string::resize_default_init · std::array · std::unique_ptr (Array Specialization) · P0316 allocate_unique and allocator_delete · thrust::allocate_unique

Summary

Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases there are varying levels of support for fast reads on quickly changing data. To address that need more completely, the team at Materialize has created an engine for building queryable views of your data that are continually updated from the stream of changes generated by your applications. In this episode Frank McSherry, chief scientist of Materialize, explains why it was created, what use cases it enables, and how it provides fast queries on continually updated data.
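Incremental view maintenance is the core idea here. A view is updated from each change instead of being recomputed from scratch. The sketch assumes that changes arrive as `(key, diff)` pairs, loosely in the spirit of the signed updates used by differential dataflow. `CountView` is hypothetical and is not Materialize's API.

```rust
use std::collections::HashMap;

/// An incrementally maintained view equivalent to
///   SELECT key, COUNT(*) FROM changes GROUP BY key
/// Each change is (key, diff): +1 for an insert, -1 for a delete. Every
/// change does a constant amount of (amortized) work; nothing is rescanned.
struct CountView {
    counts: HashMap<String, i64>,
}

impl CountView {
    fn new() -> Self {
        CountView { counts: HashMap::new() }
    }

    fn apply(&mut self, key: &str, diff: i64) {
        let count = self.counts.entry(key.to_string()).or_insert(0);
        *count += diff;
        if *count == 0 {
            // Keep only groups that currently have rows, as the SQL result would.
            self.counts.remove(key);
        }
    }

    fn get(&self, key: &str) -> i64 {
        self.counts.get(key).copied().unwrap_or(0)
    }
}

fn main() {
    let mut view = CountView::new();
    for (key, diff) in [("a", 1), ("b", 1), ("a", 1), ("b", -1)] {
        view.apply(key, diff);
    }
    println!("a = {}, b = {}", view.get("a"), view.get("b"));
}
```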

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Frank McSherry about Materialize, an engine for maintaining materialized views on incrementally updated data from change data captures.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what Materialize is and the problems that you are aiming to solve with it?

What was your motivation for creating it?

What use cases does Materialize enable?

What are some of the existing tools or systems that you have seen employed to address those needs which can be replaced by Materialize? How does it fit into the broader ecosystem of data tools and platforms?

What are some of the use cases that Materialize is uniquely able to support?

How is Materialize architected and how has the design evolved since you first began working on it?

Materialize is based on your timely-dataflow project, which itself is based on the work you did on Naiad. What was your reasoning for using Rust as the implementation target and what benefits has it provided?

What are some of the components or primitives that were missing in the Rust ecosystem as compared to what is available in Java or C/C++, which have been the dominant languages for distributed data systems?

In the list of features, you highlight full support for ANSI SQL 92. What were some of the edge cases that you faced in complying with that standard given the distributed execution context for Materialize?

A majority of SQL oriented platforms define custom extensions or built-in functions that are specific to their problem domain. What are some of the existing or