talk-data.com

Topic

GitHub

version_control collaboration code_hosting

661 tagged

Activity Trend: 79 peak/qtr, 2020-Q1 to 2026-Q1

Activities

661 activities · Newest first

In this episode, Conor and Bryce chat with Tristan Brindle about his new library Flux and its predecessor library Flow.

Link to Episode 126 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest

Tristan Brindle is a freelance programmer and trainer based in London, mostly focussing on C++. He is a member of the UK national body (BSI) and ISO WG21. He can occasionally be found at C++ conferences. He is also a director of C++ London Uni, a not-for-profit organisation offering free beginner programming classes in London and online. He has a few fun projects on GitHub that you can find out about here.

Show Notes

Date Recorded: 2023-04-05
Date Released: 2023-04-21

ADSP Episode 125: NanoRange with Tristan Brindle
Keynote: Iterators and Ranges: Comparing C++ to D, Rust, and Others - Barry Revzin - CPPP 2021
Rust Iterators
Flow
Flux
Swift Sequences
Episode 124: Vectorizing std::views::filter
C++ std::find_if
C++17 std::reduce
C++ std::accumulate
CppCon 2016: Ben Deane “std::accumulate: Exploring an Algorithmic Empire”

Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Tired of having to handle asynchronous processes for neuroevolution? Do you want to leverage massive vectorization and high-throughput accelerators for evolution strategies (ES)? evosax allows you to leverage JAX, XLA compilation and auto-vectorization/parallelization to scale ES to your favorite accelerators. In this talk we will get to know the core API and how to solve distributed black-box optimization problems with evolution strategies.
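As a rough illustration of the sample, evaluate, and update loop that evolution strategies are built around, here is a minimal pure-Python sketch. It does not use evosax or JAX; the names `sphere` and `simple_es` and every parameter value are assumptions made for this example only.

```python
import random


def sphere(x):
    """Toy black-box objective (lower is better); the minimum is 0 at the origin."""
    return sum(v * v for v in x)


def simple_es(objective, dim=5, popsize=20, elite=5, sigma=0.3,
              generations=50, seed=0):
    """Minimal (mu, lambda)-style evolution strategy with a fixed step size."""
    rng = random.Random(seed)
    mean = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    start_fitness = objective(mean)
    best_x, best_f = list(mean), start_fitness

    for _ in range(generations):
        # Sample a population of candidates around the current mean.
        population = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(popsize)]
        # Evaluate and rank every candidate. This per-candidate evaluation is
        # the step that a vectorized implementation would batch.
        scored = sorted(population, key=objective)
        # Move the mean to the average of the best `elite` candidates.
        top = scored[:elite]
        mean = [sum(c[i] for c in top) / elite for i in range(dim)]
        # Remember the best candidate seen so far.
        f = objective(scored[0])
        if f < best_f:
            best_x, best_f = scored[0], f

    return start_fitness, best_x, best_f


start_fitness, best_x, best_f = simple_es(sphere)
print(f"start={start_fitness:.4f} best={best_f:.6f}")
```

Because the candidates in one generation are independent of each other, the evaluation step is the natural place to apply vectorization or parallelization.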

In this talk, I’ll be talking about Zarr, an open-source data format for storing chunked, compressed N-dimensional arrays. This talk presents a systematic approach to understanding and implementing Zarr by showing how it works, the need for using it, and a hands-on session at the end. Zarr is based on an open technical specification, making implementations across several languages possible. I’ll mainly talk about Zarr’s Python implementation and show how it beautifully interoperates with the existing libraries in the PyData stack.
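To make the idea of a chunked N-dimensional array concrete, the sketch below maps an element index to the chunk that holds it and to the offset inside that chunk. This is plain Python arithmetic, not the Zarr library's API; `chunk_grid`, `locate`, and the shapes used are hypothetical.

```python
from math import ceil


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension (the last chunk may be partial)."""
    return tuple(ceil(s / c) for s, c in zip(shape, chunks))


def locate(index, chunks):
    """Split an element index into (chunk coordinates, offset within the chunk)."""
    chunk_coords = tuple(i // c for i, c in zip(index, chunks))
    offset = tuple(i % c for i, c in zip(index, chunks))
    return chunk_coords, offset


shape = (10_000, 10_000)   # hypothetical array shape
chunks = (1_000, 1_000)    # hypothetical chunk shape

grid = chunk_grid(shape, chunks)
coords, offset = locate((2_500, 7_123), chunks)
print(grid, coords, offset)
```

With a layout like this, reading a single element or a small slice only requires the chunks that overlap the request, which is the general motivation for chunked storage.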

In modern software engineering, plugin systems are a ubiquitous way to extend and modify the behavior of applications and libraries. When software is written in a way that is plugin friendly, it encourages the use of modular organization where the contracts between the core software and the plugin have been well thought out. In this talk, we cover exactly how to define this contract and how you can start designing your software to be more plugin friendly.

Throughout the talk we will be creating our own plugin friendly application using the pluggy library to show these design principles in action. At the end of the talk, I also cover a real-life case study of how the package manager conda is currently making its 10-year-old code more plugin friendly to illustrate how to retrofit an existing project.
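The contract between core software and plugins can be sketched with the standard library alone. The example below is not pluggy's API and is not taken from the talk; `GreeterPlugin`, `PluginRegistry`, and the two sample plugins are hypothetical names chosen to show the shape of such a contract.

```python
from typing import Protocol


class GreeterPlugin(Protocol):
    """The contract: what the core application expects from every plugin."""

    name: str

    def greet(self, user: str) -> str: ...


class PluginRegistry:
    """Core-side registry that checks plugins against the contract."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        # Reject objects that do not fulfil the contract.
        if not hasattr(plugin, "name") or not callable(getattr(plugin, "greet", None)):
            raise TypeError("plugin must define `name` and a callable `greet`")
        if plugin.name in self._plugins:
            raise ValueError(f"plugin {plugin.name!r} is already registered")
        self._plugins[plugin.name] = plugin

    def call_all(self, user):
        # Call every plugin in registration order and collect the results.
        return [p.greet(user) for p in self._plugins.values()]


class Formal:
    name = "formal"

    def greet(self, user):
        return f"Good day, {user}."


class Casual:
    name = "casual"

    def greet(self, user):
        return f"Hey {user}!"


registry = PluginRegistry()
registry.register(Formal())
registry.register(Casual())
results = registry.call_all("Ada")
print(results)
```

The core never imports `Formal` or `Casual` by name; it only relies on the agreed interface, which is what makes it possible to add or swap plugins without touching core code.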

We have recently open-sourced a pure-Python implementation of Cyclic Boosting, a family of general-purpose, supervised machine learning algorithms. Its predictions are fully explainable at the level of individual samples, and yet Cyclic Boosting can deliver highly accurate and robust models. It requires little hyperparameter tuning and minimal data pre-processing (including support for missing information and categorical variables of high cardinality), making it an ideal off-the-shelf method for structured, heterogeneous data sets. Furthermore, it is computationally inexpensive and fast, allowing for rapid improvement iterations. The modeling process, especially the infamous but unavoidable feature engineering, is facilitated by automatic creation of an extensive set of visualizations for data dependencies and training results. In this presentation, we will provide an overview of the inner workings of Cyclic Boosting, along with a few sample use cases, and demonstrate the usage of the new Python library.

You can find Cyclic Boosting on GitHub: https://github.com/Blue-Yonder-OSS/cyclic-boosting

In recent years, Hyperparameter Optimization (HPO) has become a fundamental step in the training of Machine Learning (ML) models and in the creation of automatic ML pipelines. Unfortunately, while HPO improves the predictive performance of the final model, it comes with a significant cost both in terms of computational resources and waiting time. This leads many practitioners to try to lower the cost of HPO by employing unreliable heuristics.

In this talk we will provide simple and practical algorithms for users who want to train models with almost-optimal predictive performance while incurring a significantly lower cost and waiting time. The presented algorithms are agnostic to the application and the model being trained, so they can be useful in a wide range of scenarios.

We provide results from extensive experiments on public benchmarks, including comparisons with well-known techniques like Bayesian Optimization (BO), ASHA, and Successive Halving. We will describe in which scenarios the biggest gains are observed (up to 30x) and provide examples for how to use these algorithms in a real-world environment.

All the code used for this talk is available on GitHub: https://github.com/awslabs/syne-tune
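Successive Halving, one of the baselines named above, can be sketched in a few lines. This is a generic textbook-style version, not the algorithms presented in the talk and not the Syne Tune API; `successive_halving`, `toy_loss`, and the configuration space are hypothetical.

```python
import random


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the configurations each round, growing the budget."""
    survivors = list(configs)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        # Score each surviving configuration at the current budget (lower is better).
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))
        keep = max(1, len(scored) // eta)
        history.append((budget, len(scored), keep))
        survivors = scored[:keep]
        budget *= eta
    return survivors[0], history


def toy_loss(config, budget):
    """Hypothetical stand-in for training: the loss shrinks as the budget grows."""
    return (config["lr"] - 0.1) ** 2 + 1.0 / budget


rng = random.Random(0)
configs = [{"lr": rng.uniform(0.001, 1.0)} for _ in range(16)]
best, history = successive_halving(configs, toy_loss)
print(best)
print(history)  # (budget, evaluated, kept) per round
```

Most of the budget is spent on the few configurations that survive the early rounds. In this toy objective the ranking never changes with the budget, so early elimination is always safe; real training curves can cross, which is the risk that makes naive early stopping unreliable.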

Snowflake as a data platform is the core data repository of many large organizations.
With the introduction of Snowflake's Snowpark for Python, Python developers can now collaborate and build on one platform with a secure Python sandbox, providing developers with dynamic scalability & elasticity as well as security and compliance.

In this talk I'll explain the core concepts of Snowpark for Python and how they can be used for large scale feature engineering and data science.

Machine Learning for High-Risk Applications

The past decade has witnessed the broad adoption of artificial intelligence and machine learning (AI/ML) technologies. However, a lack of oversight in their widespread implementation has resulted in some incidents and harmful outcomes that could have been avoided with proper risk management. Before we can realize AI/ML's true benefit, practitioners must understand how to mitigate its risks. This book describes approaches to responsible AI, a holistic framework for improving AI/ML technology, business processes, and cultural competencies that builds on best practices in risk management, cybersecurity, data privacy, and applied social science. Authors Patrick Hall, James Curtis, and Parul Pandey created this guide for data scientists who want to improve real-world AI/ML system outcomes for organizations, consumers, and the public.

Learn technical approaches for responsible AI across explainability, model validation and debugging, bias management, data privacy, and ML security
Learn how to create a successful and impactful AI risk management practice
Get a basic guide to existing standards, laws, and assessments for adopting AI technologies, including the new NIST AI Risk Management Framework
Engage with interactive resources on GitHub and Colab

In this episode, Conor and Bryce chat with Tristan Brindle about collection oriented programming and NanoRange.

Link to Episode 125 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest

Tristan Brindle is a freelance programmer and trainer based in London, mostly focussing on C++. He is a member of the UK national body (BSI) and ISO WG21. He can occasionally be found at C++ conferences. He is also a director of C++ London Uni, a not-for-profit organisation offering free beginner programming classes in London and online. He has a few fun projects on GitHub that you can find out about here.

Show Notes

Date Recorded: 2023-04-05
Date Released: 2023-04-14

C++ On Sea Conference
ACCU Conference
cpp.chat Podcast
CppCast Podcast
Phil Nash on Twitter
CppCast Episode on C++ Uni (with Tristan Brindle)
C++ London Meetup
Flux
Flow
D Ranges
C++ Ranges
Rust Iterators
Collection Oriented Programming
Functional Programming
Smalltalk
Java 8 Streams
SETL
Thrust Library
cuCollections
Swift Sequences
Swift Collections
Ranges TS
Range-v3 Library
NanoRange
Conquering C++20 Ranges - Tristan Brindle - CppCon 2021
CppCon 2015: Eric Niebler “Ranges for the Standard Library”
What a View! Building Your Own (Lazy) Range Adaptors (part 1 of 2) - Chris Di Bella - CppCon 2019
What a View! Building Your Own (Lazy) Range Adaptors (part 2 of 2) - Chris Di Bella - CppCon 2019
cmcstl2

We talked about:

Aaisha’s background
How homeschooling affects self-study
Deciding on what to learn about
Establishing whether a resource is good
How Aaisha focuses on learning
Deciding on what kind of project to build
Finding research materials
Aaisha’s experience with the DataTalks.Club ML Zoomcamp
ML Zoomcamp projects
Aaisha’s interest in bioinformatics
Keeping motivated with deadlines
Notes and time-tracking tools
Drawbacks to self-studying
Aaisha’s interest in machine learning
Aaisha’s least favorite part of ML Zoomcamp
Helping people as a way to learn
Using ChatGPT as a “study group”
Is it possible to use self-studying to learn high-level topics?
Switching topics to avoid burnout
Aaisha’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/aaisha-muhammad/
Twitter: https://twitter.com/ZealousMushroom
GitHub: https://github.com/AaishaMuhammad
Website: http://www.aaishamuhammad.co.za/

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

In this episode, Conor and Bryce talk about vectorizing std::views::filter.

Link to Episode 124 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

Show Notes

Date Recorded: 2023-03-21
Date Released: 2023-04-07

YouTube Video of this episode
Spaces Prototype Godbolt Link
MD Iteration Comparison Godbolt Link
Ranges Vectorization Brainstorming Godbolt Link
Minimal Filter Vectorization Example #0 Godbolt Link
Minimal Filter Vectorization Example #1 Godbolt Link
C++20 std::views::filter
Auto-Vectorization in LLVM
C++20 std::ranges::replace_if
C++20 std::views::transform
Bryce’s spaces/view_optimization.hpp
P0931 Structured bindings with polymorphic lambdas
C++20 std::views::take
C++20 std::views::drop

We talked about:

Shir’s background
Debrief culture
The responsibilities of a group manager
Defining the success of a DS manager
The three pillars of data science management
Managing up
Managing down
Managing across
Managing data science teams vs business teams
Scrum teams, brainstorming, and sprints
The most important skills and strategies for DS and ML managers
Making sure proofs of concept get into production

Links:

The secret sauce of data science management: https://www.youtube.com/watch?v=tbBfVHIh-38
Lessons learned leading AI teams: https://blogs.intuit.com/2020/06/23/lessons-learned-leading-ai-teams/
How to avoid conflicts and delays in the AI development process (Part I): https://blogs.intuit.com/2020/12/08/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-i/
How to avoid conflicts and delays in the AI development process (Part II): https://blogs.intuit.com/2021/01/06/how-to-avoid-conflicts-and-delays-in-the-ai-development-process-part-ii/
Leading AI teams deck: https://drive.google.com/drive/folders/1_CnqjugtsEbkIyOUKFHe48BeRttX0uJG
Leading AI teams video: https://www.youtube.com/watch?app=desktop&v=tbBfVHIh-38


In this episode, Conor and Bryce talk about a taxonomy of algorithms, C++20 std::views::filter and more C++20/23/26 ranges.

Link to Episode 123 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

Show Notes

Date Recorded: 2023-03-21
Date Released: 2023-03-31

C++20 std::views::filter
Hoogle Translate Tweet of filter
C++98 std::find_if
C++20 std::views::take
C++20 std::views::drop
range-v3 adjacent_remove_if
range-v3 remove_if
C++20 std::views::split
C++23 std::views::chunk
C++23 std::views::chunk_by
chunk_by_key (mentioned in P2214)
Sy Brand’s “Livecoding C++ Ranges: chunk_by and chunk_by_key”
Python itertools groupby
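The show notes list both C++23 std::views::chunk_by, which splits on a binary predicate over adjacent elements, and Python's itertools groupby, which groups consecutive elements by a key. A small Python sketch can contrast the two styles; the `chunk_by` helper here is a hypothetical eager stand-in written for this example, not a library function.

```python
from itertools import groupby


def chunk_by(iterable, pred):
    """Split into runs in which pred(previous, current) holds for adjacent items."""
    chunks = []
    current = []
    for item in iterable:
        # Start a new chunk as soon as the predicate fails for a neighbouring pair.
        if current and not pred(current[-1], item):
            chunks.append(current)
            current = []
        current.append(item)
    if current:
        chunks.append(current)
    return chunks


data = [1, 2, 2, 3, 1, 1, 4]

# Binary-predicate form: break whenever the sequence stops being non-decreasing.
runs = chunk_by(data, lambda a, b: a <= b)

# Key-function form: groupby collects consecutive items that share the same key.
by_parity = [(key, list(group)) for key, group in groupby(data, key=lambda x: x % 2)]

print(runs)       # [[1, 2, 2, 3], [1, 1, 4]]
print(by_parity)  # [(1, [1]), (0, [2, 2]), (1, [3, 1, 1]), (0, [4])]
```

The predicate form can express relations between neighbours, such as sorted runs, that a per-element key cannot, while the key form also hands back the key for each group.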

We talked about:

Nadia’s background
Academic research in software engineering
Design patterns
Software engineering for ML systems
Problems that people in industry have with software engineering and ML
Communication issues and setting requirements
Artifact research in open source products
Product vs model
Nadia’s open source product dataset
Failure points in machine learning projects
Finding solutions to issues using Nadia’s dataset and experience
The problem of siloing data scientists and other structure issues
The importance of documentation and checklists
Responsible AI
How data scientists and software engineers can work in an Agile way

Links:

Model Card: https://arxiv.org/abs/1810.03993
Datasheets: https://arxiv.org/abs/1803.09010
Factsheets: https://arxiv.org/abs/1808.07261
Research Paper: https://www.cs.cmu.edu/~ckaestne/pdf/icse22_seai.pdf
Arxiv version: https://arxiv.org/pdf/2110.


In this episode, Conor and Bryce chat about ChatGPT, the NVIDIA GTC 2023 conference and intelligence augmentation.

Link to Episode 122 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

Show Notes

Date Recorded: 2023-03-21
Date Released: 2023-03-24

RUN FOR THE FUN OF IT! Running Podcast
ChatGPT
Deep Blue vs Kasparov
GTC 2023 Conference
Conor & Bryce Chat About C++ Algorithms + Combinators (YouTube Stream)
The NVIDIA AI Podcast - Glean Founders Talk AI-Powered Enterprise Search on NVIDIA Podcast - Ep. 190
ADSP Episode 97: C++ vs Carbon vs Circle vs CppFront with Sean Baxter
Clang-Tidy
GPTDuck
Oxide & Friends Podcast
NVIDIA GTC 2023 - C++ Standard Parallelism - Bryce Lelbach
NVIDIA GTC 2023 - Defining the Quantum-Accelerated Supercomputer - Timothy Costa
APL Seeds 2023 Conference
code_report GTC 2023 Trip Report

We talked about:

Aleksander’s background
The difficulty of selling data stack as a service
How Aleksander got into consulting
The Mom Test: extracting feedback from people
User interviews
Why Aleksander’s data stack as a service startup was not viable
How Aleksander decided to switch to consulting
Finding clients to consult
Figuring out how to position your services
Geographical limitations
Figuring out your target audience
The importance of networking and marketing
Pricing your services
The pitfalls of daily and hourly pricing and how to balance incentives
Is Germany a good place to found a company?
Aleksander’s book recommendations

Links:

LinkedIn: https://www.linkedin.com/in/alkrusz/
Twitter: https://twitter.com/alkrusz
Website: www.leukos.io


In this episode, Conor and Bryce get some random stories from Zach Laine and chat about other random topics.

Link to Episode 121 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest

Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Show Notes

Date Recorded: 2023-02-16
Date Released: 2023-03-17

CppCast
C++Now
ElixirConf 2015 - Keynote: Elixir Should Take Over the World by Jessica Kerr
Stop working on your slides - Andrei Alexandrescu
CppCon 2018: Louis Dionne “Compile-time programming and reflection in C++20 and beyond”
PLDI 2022 Conference
Agda
Chip War

We talked about:

Ruslan’s background
Fighting procrastination and perfectionism
What is biohacking?
The role of dopamine and other hormones in daily life
How meditation can help
The influence light has on our bodies
Behavioral biohacking
Daylight lamps and using light to wake up
Sleep cycles
How nutrition affects productivity
Measuring productivity
Examples of unsuccessful biohacking attempts
Stoicism, voluntary discomfort, and self-challenges
Biohacking risks and ways to prevent them
Coffee and tea biohacking
Using self-reflection and tracking to measure results
Mindset shifting
Stoicism book recommendation
Work/life balance
Ruslan’s biohacking resource recommendation

Links:

LinkedIn: https://www.linkedin.com/in/ruslanshchuchkin/


In this episode, Conor and Bryce talk to Zach Laine about safety in C++, tuples, variants, reductions and more.

Link to Episode 120 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Twitter: ADSP: The Podcast, Conor Hoekstra, Bryce Adelstein Lelbach

About the Guest

Zach Laine has been using C++ in industry for 15 years, focusing on data visualization, numeric computing, games, generic programming, and good library design. He finds the process of writing bio blurbs to be a little uncomfortable.

Show Notes

Date Recorded: 2023-02-16
Date Released: 2023-03-10

Oxide & Friends Podcast
Yael Grauer on Twitter
Yael Writes
Consumer Reports: Report: Future of Memory Safety
Unsafe Rust
C++11 std::unordered_map
C++98 std::vector
C++20 Concepts
C++20 Coroutines
C++20 Ranges
C++17 std::variant
P0095 C++ Language Variant Proposal
C++17 std::holds_alternative
C++ boost::hana::tuple
C++23 std::views::enumerate
Python enumerate
ADSP Episode 25: The Lost Reduction
C++23 std::views::adjacent_transform
Futhark
ArrayCast Episode 37: Troels Henriksen and Futhark
Futhark reduce
Futhark reduce_comm
NVIDIA thrust::reduce
NVIDIA associative-only reduce Proposal
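Several of the items above (Python enumerate, adjacent_transform, and the various reduce links) are easy to try out in Python. The sketch below is an illustration written for this listing, not code from the episode. It shows why a reduction that is free to regroup its operands needs an associative operation: subtraction gives different answers under different groupings, while addition does not.

```python
from functools import reduce
from operator import add, sub

values = [8, 3, 5, 1]

# enumerate pairs each element with its index.
indexed = list(enumerate(values))

# Window-of-two analogue of adjacent_transform: apply a function to each pair of neighbours.
diffs = [b - a for a, b in zip(values, values[1:])]

# A left fold fixes the evaluation order: ((8 - 3) - 5) - 1.
left_fold = reduce(sub, values)

# A tree-shaped reduction groups the same operands differently: (8 - 3) - (5 - 1).
tree = sub(sub(values[0], values[1]), sub(values[2], values[3]))

# For an associative operation such as addition, both groupings agree.
left_sum = reduce(add, values)
tree_sum = add(add(values[0], values[1]), add(values[2], values[3]))

print(indexed)          # [(0, 8), (1, 3), (2, 5), (3, 1)]
print(diffs)            # [-5, 2, -4]
print(left_fold, tree)  # -1 1
print(left_sum, tree_sum)  # 17 17
```

A parallel reduction typically evaluates something closer to the tree shape, which is why the choice of operation matters more there than in a sequential fold.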

We talked about:

Parvathy’s background
Brainstorming sessions with nonprofits to establish data maturity
Example of an Analytics for a Better World project
The overall data maturity situation of nonprofits vs private sector
Solving the skill gap
Publicly available content
The Analytics for a Better World Academy
The Academy’s target audience
How researchers can work with Analytics for a Better World
Improving data maturity in nonprofit organizations
People, processes, and technology
Typical tools that Analytics for a Better World recommends to nonprofits
Profiles in nonprofits
Does Analytics for a Better World have a need for data engineers?
The Analytics for a Better World team
Factors that help organizations become more data-driven
Parvathy’s resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/parvathykrishnank/
Twitter: https://twitter.com/ABWInstitute
GitHub: https://github.com/Analytics-for-a-Better-World
Website: https://analyticsbetterworld.org/
