GitHub

Episode 148: 🇸🇮 SRT23 - Robert Leahy on C++ in FinTech

2023-09-22 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA) , Robert Leahy

In this episode, Conor and Bryce record live from Venice while walking and interview Rob Leahy about C++ in FinTech. Links: Link to Episode 148 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. About the Guest: Robert Leahy is a graduate of the University of Victoria where he specialized in graphics, gaming, and digital geometry processing. After spending 4.5 years in full stack web development he pivoted to financial infrastructure in early 2016 and now works on next generation market data storage and retrieval mechanisms. In 2019 he became involved in the ISO C++ committee with a particular focus on library evolution.

Show Notes

Date Recorded: 2023-06-21. Date Released: 2023-09-22. Links: CityStrides.com; plrank.com; May Street; London Stock Exchange Group; Q and KDB+; ArrayCast Episode 41: John Earnest and Versions of k; ADSP Episode 96: The K Programming Language; UDP; C++ std::hive; Robert Leahy on Instagram. Intro Song Info: Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons - Attribution 3.0 Unported - CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Episode 147: 🇸🇮 SRT23 - Parallel std::unique Revisited (on a Walk in Venice)

2023-09-15 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

C++

In this episode, Conor and Bryce record live from Venice while walking and revisit the parallel std::unique implementation for a final time. Links: Link to Episode 147 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-21. Date Released: 2023-09-15. C++11 std::adjacent_difference; thrust::adjacent_difference; C++23 std::views::adjacent_transform; thrust::zip_iterator; thrust::transform_iterator; thrust::copy_if; thrust::copy_if (stencil overload); Excel SUMIF; C++11 std::unique; thrust::unique; thrust::find_if; thrust::unique_count; thrust::unique_by_key; Thrust and the C++ Standard Algorithms - Conor Hoekstra - GTC 2021; thrust::sort_by_key; thrust::unique_copy; RAPIDS.ai.

Low-Code AI

2023-09-13 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Gwendolyn Stripling (Google Cloud) , Michael Abel (Google Cloud)

AI/ML BigQuery Data Management Keras Scikit-learn ai-ml data machine-learning

Take a data-first and use-case-driven approach with Low-Code AI to understand machine learning and deep learning concepts. This hands-on guide presents three problem-focused ways to learn no-code ML using AutoML, low-code using BigQuery ML, and custom code using scikit-learn and Keras. In each case, you'll learn key ML concepts by using real-world datasets with realistic problems. Business and data analysts get a project-based introduction to ML/AI using a detailed, data-driven approach: loading and analyzing data; feeding data into an ML model; building, training, and testing; and deploying the model into production. Authors Michael Abel and Gwendolyn Stripling show you how to build machine learning models for retail, healthcare, financial services, energy, and telecommunications. You'll learn how to: distinguish between structured and unstructured data and the challenges they present; visualize and analyze data; preprocess data for input into a machine learning model; differentiate between the regression and classification supervised learning models; compare different ML model types and architectures, from no code to low code to custom training; design, implement, and tune ML models; export data to a GitHub repository for data management and governance.

Pragmatic and Standardized MLOps - Maria Vechtomova

2023-09-08 · DataTalks.Club Listen

podcast_episode

by Maria Vechtomova (Marvelous MLOps)

AI/ML Data Engineering DevOps HTML LLM MLOps

We talked about:

Maria's background; Marvelous MLOps; Maria's definition of MLOps; Alternate team setups without a central MLOps team; Pragmatic vs non-pragmatic MLOps; Must-have ML tools (categories); Maturity assessment; What to start with in MLOps; Standardized MLOps; Convincing DevOps to implement; Understanding what the tools are used for instead of knowing all the tools; Maria's next project plans; Is LLM Ops a thing?; What Ahold Delhaize does; Resource recommendations to learn more about MLOps; The importance of data engineering knowledge for ML engineers

Links:

LinkedIn: https://www.linkedin.com/company/marvelous-mlops/

Website: https://marvelousmlops.substack.com/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Episode 146: 🇸🇮 SRT23 - Algorithms, BQN's Superpowers & More!

2023-09-08 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

C++ Python

In this episode, Conor and Bryce record live from Italy while driving to Venice and chat about improvements to our parallel std::unique implementation, essential data structures, our favorite algorithms revisited and BQN’s superpowers. Links: Link to Episode 146 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-21. Date Released: 2023-09-08. C++11 std::unique; thrust::unique; thrust::inclusive_scan; C++17 std::transform_reduce; Haskell’s outerProduct; C++17 std::reduce; C++17 std::inclusive_scan; NVIDIA cucollections (cuco); HyperLogLog; C++23 std::views::chunk_by; CTCI: Cracking the coding interview by Gayle Laakmann McDowell; BigOCheatSheet.com; Python list; Python set; Python dictionary (hashmap); Python collections; Python sortedcollections; BQN ⁼ (undo); BQN / (indices); J :. (obverse); BQN ⌾ (under); CombinatoryLogic.com; Psi Combinator: BQN ○ (atop); Haskell’s on; Haskell groupBy.

Episode 145: 🇸🇮 SRT23 - Parallel std::unique

2023-09-01 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

C++

In this episode, Conor and Bryce record live from Italy while driving and chat about how to implement a parallel std::unique. Links: Link to Episode 145 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-21. Date Released: 2023-09-01. C++11 std::unique; Rust dedup; Kotlin distinct; C++11 std::copy_if; C++11 std::adjacent_difference; thrust::copy_if; thrust::adjacent_difference; thrust::detail::head_flags; thrust::detail::tail_flags; Haskell mapAdjacent; Kotlin zipWithNext; q prior; q deltas; q differ; thrust::inclusive_scan.
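The show notes above pair head flags, a stencil-style copy_if, and an inclusive scan. As a rough illustration of how those pieces can fit together, here is a minimal sequential sketch in standard C++17. It is not the implementation discussed in the episode; the function names `head_flags` and `unique_via_flags` are hypothetical, and the loops stand in for steps that a parallel library would run per element.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Step 1 (hypothetical helper): flags[i] is 1 when xs[i] starts a new run
// of equal values. Each flag depends only on xs[i] and xs[i - 1], so this
// step could be computed independently per element.
std::vector<int> head_flags(const std::vector<int>& xs) {
    std::vector<int> flags(xs.size(), 1);  // element 0 always starts a run
    for (std::size_t i = 1; i < xs.size(); ++i) {
        flags[i] = (xs[i] != xs[i - 1]) ? 1 : 0;
    }
    return flags;
}

// Steps 2 and 3 (hypothetical helper): an inclusive scan over the flags
// gives each kept element its 1-based output position; the flags then act
// as a stencil that decides which elements are written.
std::vector<int> unique_via_flags(const std::vector<int>& xs) {
    if (xs.empty()) return {};
    const std::vector<int> flags = head_flags(xs);
    std::vector<int> pos(xs.size());
    std::inclusive_scan(flags.begin(), flags.end(), pos.begin());
    std::vector<int> out(static_cast<std::size_t>(pos.back()));
    for (std::size_t i = 0; i < xs.size(); ++i) {
        if (flags[i]) out[static_cast<std::size_t>(pos[i] - 1)] = xs[i];
    }
    return out;
}
```

Under these assumptions, `unique_via_flags({1, 1, 2, 2, 2, 3, 1, 1})` yields `{1, 2, 3, 1}`, matching what `std::unique` keeps for the same input.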

Democratizing Causality - Aleksander Molak

2023-08-25 · DataTalks.Club Listen

podcast_episode

by Aleksander Molak

AI/ML HTML LLM MLOps NLP Python

We talked about:

Aleksander's background; Aleksander as a Causal Ambassador; Using causality to make decisions; Counterfactuals and Judea Pearl; Meta-learners vs classical ML models; Average treatment effect; Reducing causal bias, the super efficient estimator, and model uplifting; Metrics for evaluating a causal model vs a traditional ML model; Is the added complexity of a causal model worth implementing?; Utilizing LLMs in causal models (text as outcome); Text as treatment and style extraction; The viability of A/B tests in causal models; Graphical structures and nonparametric identification; Aleksander's resource recommendations

Links:

The Book of Why: https://amzn.to/3OZpvBk Causal Inference and Discovery in Python: https://amzn.to/46Pperr Book's GitHub repo: https://github.com/PacktPublishing/Causal-Inference-and-Discovery-in-Python The Battle of Giants: Causality vs NLP (PyData Berlin 2023): https://www.youtube.com/watch?v=Bd1XtGZhnmw New Frontiers in Causal NLP (papers repo): https://bit.ly/3N0TFTL

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Episode 144: 🇸🇮 SRT23 - Nigeria, Here We Come! (and How Bryce Almost Died)

2023-08-25 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

Java

In this episode, Conor and Bryce record live from Slovenia, Croatia and Italy while driving and chat about next year’s 2024 Nigeria Road Trip as well as Bryce’s near-death experience. This episode is very light on the technical content (so feel free to skip). Links: Link to Episode 144 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-21. Date Released: 2023-08-25. Piran; Fireship Java YouTube Video (Java is mounting a huge comeback); Run for the Fun of It Podcast; Haskell Programming Language; Clojure Programming Language.

M-statistics

2023-08-22 · O'Reilly Data Science Books O'Reilly Amazon

book

by Eugene Demidenko

Data Science data data-science data-science-tasks statistics

M-STATISTICS A comprehensive resource providing new statistical methodologies and demonstrating how new approaches work for applications M-statistics introduces a new approach to statistical inference, redesigning the fundamentals of statistics, and improving on the classical methods we already use. This book targets exact optimal statistical inference for a small sample under one methodological umbrella. Two competing approaches are offered: maximum concentration (MC) and mode (MO) statistics combined under one methodological umbrella, which is why the symbolic equation M=MC+MO. M-statistics defines an estimator as the limit point of the MC or MO exact optimal confidence interval when the confidence level approaches zero, the MC and MO estimator, respectively. Neither mean nor variance plays a role in M-statistics theory. Novel statistical methodologies in the form of double-sided unbiased and short confidence intervals and tests apply to major statistical parameters: Exact statistical inference for small sample sizes is illustrated with effect size and coefficient of variation, the rate parameter of the Pareto distribution, two-sample statistical inference for normal variance, and the rate of exponential distributions. M-statistics is illustrated with discrete, binomial, and Poisson distributions. Novel estimators eliminate paradoxes with the classic unbiased estimators when the outcome is zero. Exact optimal statistical inference applies to correlation analysis including Pearson correlation, squared correlation coefficient, and coefficient of determination. New MC and MO estimators along with optimal statistical tests, accompanied by respective power functions, are developed. M-statistics is extended to the multidimensional parameter and illustrated with the simultaneous statistical inference for the mean and standard deviation, shape parameters of the beta distribution, the two-sample binomial distribution, and finally, nonlinear regression. 
Our new developments are accompanied by respective algorithms and R codes, available at GitHub, and as such readily available for applications. M-statistics is suitable for professionals and students alike. It is highly useful for theoretical statisticians and teachers, researchers, and data science analysts as an alternative to classical and approximate statistical inference.

Mastering Data Engineering as a Remote Worker - José María Sánchez Salas

2023-08-18 · DataTalks.Club Listen

podcast_episode

by José María Sánchez Salas

Data Engineering HTML MLOps

We talked about:

José's background; How José relocated to Norway and his schedule; Tech companies in Norway and José's role; Challenges of working as a remote data engineer; José's newsletter on how to make use of data; The process of making data useful; Where José gets inspiration for his newsletter; Dealing with burnout; When in Norway, do as the Norwegians do; The legalities of working remotely in Norway; The benefits of working remotely

Links:

LinkedIn: https://www.linkedin.com/in/jmssalas Github: https://github.com/jmssalas Website & Newsletter: https://jmssalas.com

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Episode 143: 🇸🇮 SRT23 - Hiša Franko, Postojna, Podcasts and R

2023-08-18 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA)

In this episode, Conor and Bryce record live from Slovenia while driving and recap Hiša Franko, the Idrija Mercury Mines and the Postojna Caves as well as chat about some podcasts and the R programming language. Links: Link to Episode 143 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-20. Date Released: 2023-08-18. ADSP Episode 57: Holiday Special #2 - Ljubljana, Here We Come!; Hiša Franko; Predjama Castle; Postojna Caves; Moses Schönfinkel - On the building blocks of mathematical logic; Idrija Mercury Mines; Two’s Complement Podcast; Two’s Complement Yak Shaving, Live!; Two’s Complement Yak Shaving, Part 2, Also Live!; Compiler Explorer; Software Unscripted; Richard Feldman; Why Isn’t Functional Programming the Norm? – Richard Feldman; Roc Programming Language; Elm Programming Language; CoRecursive Podcast; Functional Geekery Podcast; Software Unscripted - Comparing Haskell to R with Will Kurt; R Programming Language; R Pipeline Operator %>%; R actuar Module; R outer; R Reduce.

Episode 142: 🇸🇮 SRT23 - Lake Bled & Bled Cake + Haskell, Rust & C++

2023-08-11 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Bryce Adelstein Lelbach (NVIDIA) , Gašper Ažman

Rust

In this episode, Conor and Bryce record live from Slovenia while driving and review Lake Bled and Bled Cream Cake and solve one problem in Haskell, Rust and C++! Links: Link to Episode 142 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Bryce Adelstein Lelbach. Show Notes: Date Recorded: 2023-06-18. Date Released: 2023-08-11. Lake Bled; Bled Cream Cake; Mastermind Board Game; Gašper Ažman on Twitter; Ramanujan Numbers; “Point-Free or Die: Tacit Programming in Haskell and Beyond” by Amar Shah; LambdaDays 2023: Composition Intuition - Conor Hoekstra; C++17 std::transform_reduce; C++98 std::inner_product; C++98 std::equal_to; C++98 std::equal; C++98 std::plus; Haskell zipWith; Haskell fromEnum; Haskell sum; BQN Language; APL Language; exactMatches Tweet from Composition Intuition Talk; exact_matches Rust Tweet; Rust Iterator trait; Rust str::chars().
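The show notes above list std::transform_reduce, std::equal_to, and std::plus alongside an "exact matches" problem and the Mastermind board game. Assuming the problem is to count the positions at which a code and a guess hold the same character (an interpretation, since the listing does not state it), one possible C++17 sketch looks like this. The name `exact_matches` follows the show-note link text; the signature is an assumption.

```cpp
#include <functional>
#include <numeric>
#include <string_view>

// Counts positions where code[i] == guess[i].
// Assumption: both inputs have the same length; only code.size()
// characters of guess are read.
int exact_matches(std::string_view code, std::string_view guess) {
    return std::transform_reduce(
        code.begin(), code.end(),  // first range
        guess.begin(),             // second range, zipped with the first
        0,                         // initial count
        std::plus<>{},             // reduce: add up the per-position results
        std::equal_to<>{});        // transform: true (1) where characters agree
}
```

For example, `exact_matches("RGBY", "RGYB")` returns 2, since only the first two positions agree. The same shape can be written with std::inner_product, which takes the same pair of binary operations.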

The Good, the Bad and the Ugly of GPT - Sandra Kublik

2023-08-04 · DataTalks.Club Listen

podcast_episode

by Sandra Kublik

AI/ML HTML LLM MLOps

We talked about:

Sandra's background; Making a YouTube channel to break into the LLM space; The business cases for LLMs; LLMs as amplifiers; The benefits of keeping a human in the loop when using LLMs (AI limitations); Using LLMs as assistants; Building an app that uses an LLM; Prompt whisperers and how to improve your prompts; Sandra's 7-day LLM experiment; Sandra's LLM content recommendations; Finding Sandra online

Links:

LinkedIn: https://www.linkedin.com/in/sandrakublik/ Twitter: https://twitter.com/sandra_kublik Youtube: https://www.youtube.com/@sandra_kublik

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Episode 141: 🇨🇦 CppNorth Live 🇨🇦 Kate Gregory, Jessica Kerr & Kristen Shaker!

2023-08-04 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Jessica Kerr , Kate Gregory (Gregory Consulting) , Kristen Shaker , Ben Deane

In this episode, Conor and Ben Deane record live from CppNorth 2023 in Toronto, Canada and interview more speakers and attendees from the conference! Links: Link to Episode 141 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Ben Deane. Guests Interviewed: Kate Gregory; Kristen Shaker; Jessica Kerr. Show Notes: Date Recorded: 2023-07-19. Date Released: 2023-08-04. CppNorth; CppNorth 2023: Keynote - Optimizing for Change - Ben Deane; CppNorth 2023: Keynote - Steps to Wisdom for C++ Developers - Kate Gregory; CppNorth 2023: Iteration Revisited - Tristan Brindle; C++ On Sea 2023: Iteration Revisited - Tristan Brindle; NDC Tech Town 2023 Kongsberg Conference; NYC++ Meetup; CppNorth 2023: What’s New in Compiler Explorer? - Matt Godbolt; Lightning Talk: Using Clang Query to Isolate AST Elements - Kristen Shaker - C++ on Sea 2022; CppNorth 2023: Keynote - I can write the code. But getting something done is another matter - Jessica Kerr; HoneyComb.io; ChangeLog Jessica Kerr Episodes; Jessica Kerr Website; CppNorth 2023: Jessica Kerr Lightning Talk.

Strategies For A Successful Data Platform Migration

2023-07-31 · Data Engineering Podcast Listen

podcast_episode

by Rob Goretsky , Gleb Mezhanskiy (Datafold) , Tobias Macey

AI/ML Airflow Analytics Amazon EMR BigQuery Dagster Data Engineering Data Management Data Science Datafold dbt ELK +9 more

Summary

All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack Modern data teams are using Hex to 10x their data impact. Hex combines a notebook style UI with an interactive report builder. This allows data teams to both dive deep to find insights and then share their work in an easy-to-read format to the whole org. In Hex you can use SQL, Python, R, and no-code visualization together to explore, transform, and model data. Hex also has AI built directly into the workflow to help you generate, edit, explain and document your code. The best data teams in the world such as the ones at Notion, AngelList, and Anthropic use Hex for ad hoc investigations, creating machine learning models, and building operational dashboards for the rest of their company. Hex makes it easy for data analysts and data scientists to collaborate together and produce work that has an impact. Make your data team unstoppable with Hex. Sign up today at dataengineeringpodcast.com/hex to get a 30-day free trial for your team! Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy and Rob Goretsky about when and how to think about migrating your data stack

Interview

Introduction How did you get involved in the area of data management? A migration can be anything from a minor task to a major undertaking. Can you start by describing what constitutes a migration for the purposes of this conversation? Is it possible to completely avoid having to invest in a migration? What are the signals that point to the need for a migration?

What are some of the sources of cost that need to be accounted for when considering a migration? (both in terms of doing one, and the costs of not doing one) What are some signals that a migration is not the right solution for a perceived problem?

Once the decision has been made that a migration is necessary, what are the questions that the team should be asking to determine the technologies to move to and the sequencing of execution? What are the preceding tasks that should be completed before starting the migration to ensure there is no breakage downstream of the changing component(s)? What are some of the ways that a migration effort might fail? What are the major pitfalls that teams need to be aware of as they work through a data platform migration? What are the opportunities for automation during the migration process? What are the most interesting, innovative, or unexpected ways that you have seen teams approach a platform migration? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform migrations? What are some ways that the technologies and patterns that we use can be evolved to reduce the cost/impact/need for migrations?

Contact Info

Gleb

LinkedIn @glebmm on Twitter

Rob

LinkedIn RobGoretsky on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Datafold

Podcast Episode

Informatica Airflow Snowflake

Podcast Episode

Redshift Eventbrite Teradata BigQuery Trino EMR == Elastic Map-Reduce Shadow IT

Podcast Episode

Mode Analytics Looker Sunk Cost Fallacy data-diff

Podcast Episode

SQLGlot Dagster dbt

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Hex: Hex is a collaborative workspace for data science and analytics. A single place for teams to explore, transform, and visualize data into beautiful interactive reports. Use SQL, Python, R, no-code and AI to find and share insights across your organization. Empower everyone in an organization to make an impact with data. Sign up today at [dataengineeringpodcast.com/hex](https://www.dataengineeringpodcast.com/hex) and get 30 days free!

RudderStack: Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

LLMs for Everyone - Meryem Arik

2023-07-28 · DataTalks.Club Listen

podcast_episode

by Meryem Arik (TitanML)

API Data Quality HTML LLM MLOps Vector DB

We talked about:

Meryem's background; The constant evolution of startups; How Meryem became interested in LLMs; What is an LLM (generative vs non-generative models)?; Why LLMs are important; Open source models vs API models; What TitanML does; How fine-tuning a model helps in LLM use cases; Fine-tuning generative models; How generative models change the landscape of human work; How to adjust models over time; Vector databases and LLMs; How to choose an open source LLM or an API; Measuring input data quality; Meryem's resource recommendations

Links:

Website: https://www.titanml.co/ Beta docs: https://titanml.gitbook.io/iris-documentation/overview/guide-to-titanml... Using llama2.0 in TitanML Blog: https://medium.com/@TitanML/the-easiest-way-to-fine-tune-and-inference-llama-2-0-8d8900a57d57 Discord: https://discord.gg/83RmHTjZgf Meryem LinkedIn: https://www.linkedin.com/in/meryemarik/

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html

Episode 140: 🇨🇦 CppNorth Live 🇨🇦 Victor Ciura, Andreas Weis & More!

2023-07-28 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Victor Ciura , Tristan Brindle (C++ London Uni) , Vincent Zalzal , Bryce Adelstein Lelbach (NVIDIA) , Andreas Weis , Ben Deane

In this episode, Conor and Ben Deane record live from CppNorth 2023 in Toronto, Canada and interview speakers! Links: Link to Episode 140 on Website; Discuss this episode, leave a comment, or ask a question (on GitHub). Twitter: ADSP: The Podcast; Conor Hoekstra; Ben Deane. Guests Interviewed: Ben Deane; Vincent Zalzal; Victor Ciura; Tristan Brindle; Andreas Weis. Show Notes: Date Recorded: 2023-07-18. Date Released: 2023-07-28. CppNorth; CppNorth 2023: Calendrical C++ - Ben Deane; CppNorth 2023: Keynote - Optimizing for Change - Ben Deane; C++Now 2023: Calendrical C++ - Ben Deane; CppNorth 2023: Composition Intuition - Conor Hoekstra; CppNorth 2023: And Then() Some(T) - Victor Ciura; All of Ben Deane’s ADSP Episodes; CppNorth 2023: Writing C++ to Be Read - Vincent Zalzal; C# LINQ; C++20 std::views::iota; C++23 std::views::zip; C++23 std::views::enumerate; Python enumerate; Rust enumerate; C++20 flux Library; LambdaDays 2023: Composition Intuition - Conor Hoekstra; CppNorth 2023: Iteration Revisited - Tristan Brindle; C++ On Sea 2023: Iteration Revisited - Tristan Brindle; CppNorth 2023: Keynote - Steps to Wisdom for C++ Developers - Kate Gregory; Mind in Motion by Barbara Tversky; Combinatory Logic: Volume I by Curry & Feys; CppNorth 2023: Building Interfaces That Are Hard to Use Incorrectly - Andreas Weis; 2023 Annual C++ Developer Survey “Lite” by ISO; JetBrains C++ State of Ecosystem in 2022; BlackBerry Movie.

Colossal AI: Scaling AI Models in Big Model Era

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Yang You , James Demmel

AI/ML Databricks LLM MLOps

The proliferation of large models based on Transformer has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this growing demand, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across HPC, DL, and distributed systems. These difficulties have stimulated both AI and HPC developers to explore the key questions: How can training and inference efficiency of large models be improved to reduce costs? How can larger AI models be accommodated even with limited resources?

What can be done to enable more community members to easily access large models and large-scale applications? In this session, we investigate efforts to solve the questions mentioned above. Firstly, diverse parallelization is an important tool to improve the efficiency of large model training and inference. Heterogeneous memory management can help enhance the model accommodation capacity of processors like GPUs.

Furthermore, user-friendly DL systems for large models significantly reduce the specialized background knowledge users need, allowing more community members to get started with larger models more efficiently. We will provide participants with a system-level open-source solution, Colossal-AI. More information can be found at https://github.com/hpcaitech/ColossalAI.

Talk by: James Demmel and Yang You

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Writing Data-Sharing Apps Using Node.js and Delta Sharing

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Will Girten

Analytics Data Analytics Databricks Delta JavaScript Pandas TensorFlow

JavaScript remains the top programming language today with most code repositories written using JavaScript on GitHub. However, JavaScript is evolving beyond just a language for web application development into a language built for tomorrow. Everyday tasks like data wrangling, data analysis, and predictive analytics are possible today directly from a web browser. For example, many popular data analytics libraries, like Tensorflow.js, now support JavaScript SDKs.

Another popular library, Danfo.js, makes it possible to wrangle data using familiar pandas-like operations, shortening the learning curve and arming the typical data engineer or data scientist with another data tool in their toolbox. In this presentation, we’ll explore using the Node.js connector for Delta Sharing to build a data analytics app that summarizes a Twitter dataset.

Talk by: Will Girten

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Investing in Open-Source Data Tools - Bela Wiertz

2023-07-21 · DataTalks.Club Listen

podcast_episode

by Bela Wiertz (TKM Family Office)

HTML MLOps

We talked about:

Bela's background; Why startups even need investors; Why open source is a viable go-to-market strategy; Building a bottom-up community; The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention; Angel investors vs VC Funds vs family offices; Bela's investment criteria and GitHub stars as a metric; Inbound sourcing, outbound sourcing, and investor networking; Making a good impression on an investor; Balancing open and closed source parts of a product; The future of open source; Recent successes of open source companies; Bela's resource recommendations

Links:

Understand who is engaging with your open source project article: https://www.crowd.dev/ Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

talk-data.com
