
Topic: Python
Tags: programming_language, data_science, web_development
1446 tagged activities

Activity Trend: peak of 185 activities per quarter, 2020-Q1 through 2026-Q1

Activities (1446 · newest first)

A presentation of the beta version of the Code Interpreter and a demonstration of its capabilities, including: mathematical computation (algebra, trigonometry, statistics), data manipulation and analysis, data visualization, execution of Python scripts, training and evaluation of machine learning models, and text and natural language processing (tokenization, stemming, word frequency, etc.). Note that the tool is restricted by security rules (no internet access, no calls to external APIs, and no downloading of files from the web).
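As a flavor of the text-processing items in that list, here is a minimal word-frequency sketch of the kind such a sandbox can run, using only the Python standard library (consistent with the no-internet restriction); the sample text is invented:

```python
import re
from collections import Counter

text = "The quick brown fox jumps over the lazy dog. The dog sleeps."

# Lowercase, split on runs of non-letters, and drop empty tokens.
tokens = [t for t in re.split(r"[^a-z]+", text.lower()) if t]

# Count occurrences and print the five most common words.
for word, count in Counter(tokens).most_common(5):
    print(f"{word}: {count}")
```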

Summary

Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles: RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. Your host is Tobias Macey, and today I'm interviewing Bob Muglia about his recent book about the idea of "Datapreneurs" and the role of data in the modern economy.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what your concept of a "Datapreneur" is?

How is this distinct from the common idea of an entrepreneur?

What do you see as the key inflection points in data technologies and their impacts on business capabilities over the past ~30 years?

In your role as the CEO of Snowflake you had a front-row seat for the rise of the "modern data stack". What do you see as the main positive and negative impacts of that paradigm?

What are the key issues that are yet to be solved in that ecosystem?

For technologists who are thinking about launching new ventures, what are the key pieces of advice that you would like to share?

What do you see as the short/medium/long-term impact of AI on the technical, business, and societal arenas?

What are the most interesting, innovative, or unexpected ways that you have seen business leaders use data to drive their vision?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on the Datapreneurs book?

What are your key predictions for the future impact of data on the technical/economic/business landscapes?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Datapreneurs Book
SQL Server
Snowflake
Z80 Processor
Navigational Database
System R
Redshift
Microsoft Fabric
Databricks
Looker
Fivetran

Podcast Episode

Databricks Unity Catalog
RelationalAI
6th Normal Form
Pinecone Vector DB

Podcast Episode

Perplexity AI

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Sponsored By: RudderStack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Support Data Engineering Podcast

The first time I was asked to review a colleague's code, I was unsure: What was expected of me? What exactly was I supposed to check? And, most importantly, wouldn't I make myself unpopular by pointing out others' mistakes? In my presentation, I will describe what I have learned since then. Using real examples, I’ll point out what you should look for when reviewing code (e.g. readability, redundancy, files & data), which tools you can use (e.g. gitlab runner, black, mypy) and how to stay friends while being brutally honest with each other :-) By the way: The examples of code bugs are not only from my colleagues. After all, my own code is constantly reviewed and fixed by others. And yes, I admit, it hurts every single time…

How would you model the mental hops that lead from one word to the next? And how about when, instead of a word, the starting points are concepts grounded explicitly or implicitly in an image? These questions, and more, were the topic of my latest research project. Working to automatically generate image-term pairs for an image-grounded, collaborative Wordle game, I looked for combinations that spark the desired type of dialogue - illuminating the participants' decision-making. The project fits the broader efforts toward natural language explainability that Prof. Schlangen's research group at the University of Potsdam is undertaking. We will look at the method I developed from an engineering perspective, going over all the NLP concepts composing it, and touch upon a bit of linguistics theory too. Level: Beginner to the domain (already familiar with Python)

Summary

For business analytics the way that you model the data in your warehouse has a lasting impact on what types of questions can be answered quickly and easily. The major strategies in use today were created decades ago when the software and hardware for warehouse databases were far more constrained. In this episode Maxime Beauchemin of Airflow and Superset fame shares his vision for the entity-centric data model and how you can incorporate it into your own warehouse design.
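The episode itself contains no code, but a toy pandas sketch of one reading of the entity-centric idea (pre-aggregating each entity into a single wide row of ready-to-use metrics, rather than joining facts and dimensions at query time) may help ground the discussion; the table and metric names below are invented, not taken from the episode:

```python
import pandas as pd

# A conventional "fact" table of raw order events.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "order_ts": pd.to_datetime(["2023-01-05", "2023-02-10", "2023-01-20",
                                "2023-03-01", "2023-03-15"]),
    "amount": [20.0, 35.0, 12.5, 40.0, 7.5],
})

# The entity-centric counterpart: one wide row per customer, with the
# metrics analysts actually ask about already computed.
customer_entity = orders.groupby("customer_id").agg(
    first_order_ts=("order_ts", "min"),
    last_order_ts=("order_ts", "max"),
    order_count=("order_ts", "count"),
    lifetime_value=("amount", "sum"),
).reset_index()

print(customer_entity)
```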

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles: RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. Your host is Tobias Macey, and today I'm interviewing Max Beauchemin about the concept of entity-centric data modeling for analytical use cases.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what entity-centric modeling (ECM) is and the story behind it?

How does it compare to dimensional modeling strategies?

What are some of the other competing methods?

Comparison to activity schema

What impact does this have on ML teams? (e.g. feature engineering)

What role does the tooling of a team have in the ways that they end up thinking about modeling? (e.g. dbt vs. informatica vs. ETL scripts, etc.)

What is the impact on the underlying compute engine on the modeling strategies used?

What are some examples of data sources or problem domains for which this approach is well suited?

What are some cases where entity centric modeling techniques might be counterproductive?

What are the ways that the benefits of ECM manifest in use cases that are down-stream from the warehouse?

What are some concrete tactical steps that teams should be thinking about to implement a workable domain model using entity-centric principles?

How does this work across business domains within a given organization (especially at "enterprise" scale)?

What are the most interesting, innovative, or unexpected ways that you have seen ECM used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on ECM?

When is ECM the wrong choice?

What are your predictions for the future direction/adoption of ECM or other modeling techniques?

Contact Info

mistercrunch on GitHub
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Entity Centric Modeling Blog Post
Max's Previous Appearances:

Defining Data Engineering with Maxime Beauchemin
Self Service Data Exploration And Dashboarding With Superset
Exploring The Evolving Role Of Data Engineers
Alumni Of AirBnB's Early Years Reflect On What They Learned About Building Data Driven Organizations

Apache Airflow
Apache Superset
Preset
Ubisoft
Ralph Kimball
The Rise Of The Data Engineer
The Downfall Of The Data Engineer
The Rise Of The Data Scientist
Dimensional Data Modeling
Star Schema
Database

Learn Enough Python to Be Dangerous: Software Development, Flask Web Apps, and Beginning Data Science with Python

All You Need to Know, and Nothing You Don't, to Solve Real Problems with Python. Python is one of the most popular programming languages in the world, used for everything from shell scripts to web development to data science. As a result, Python is a great language to learn, but you don't need to learn "everything" to get started, just how to use it efficiently to solve real problems. In Learn Enough Python to Be Dangerous, renowned instructor Michael Hartl teaches the specific concepts, skills, and approaches you need to be professionally productive. Even if you've never programmed before, Hartl helps you quickly build technical sophistication and master the lore you need to succeed. Hartl introduces Python both as a general-purpose language and as a specialist tool for web development and data science, presenting focused examples and exercises that help you internalize what matters, without wasting time on details pros don't care about. Soon, it'll be like you were born knowing this stuff--and you'll be suddenly, seriously dangerous.

Learn enough about . . .

Applying core Python concepts with the interactive interpreter and command line
Writing object-oriented code with Python's native objects
Developing and publishing self-contained Python packages
Using elegant, powerful functional programming techniques, including Python comprehensions
Building new objects, and extending them via Test-Driven Development (TDD)
Leveraging Python's exceptional shell scripting capabilities
Creating and deploying a full web app, using routes, layouts, templates, and forms
Getting started with data-science tools for numerical computations, data visualization, data analysis, and machine learning
Mastering concrete and informal skills every developer needs

Michael Hartl's Learn Enough Series includes books and video courses that focus on the most important parts of each subject, so you don't have to learn everything to get started--you just have to learn enough to be dangerous and solve technical problems yourself. Like this book? Don't miss Michael Hartl's companion video tutorial, Learn Enough Python to Be Dangerous LiveLessons. Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.
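Since the blurb singles out Python comprehensions, here is a two-line taste of what it is referring to (a generic illustration, not an excerpt from the book):

```python
# Comprehensions replace an explicit loop-and-append with one expression.
squares_of_evens = [n * n for n in range(10) if n % 2 == 0]
print(squares_of_evens)  # [0, 4, 16, 36, 64]

# The same syntax works for dicts and sets.
word_lengths = {w: len(w) for w in ("learn", "enough", "python")}
```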

Dive Into Data Science

Dive into the exciting world of data science with this practical introduction. Packed with essential skills and useful examples, Dive Into Data Science will show you how to obtain, analyze, and visualize data so you can leverage its power to solve common business challenges. With only a basic understanding of Python and high school math, you'll be able to effortlessly work through the book and start implementing data science in your day-to-day work. From improving a bike sharing company to extracting data from websites and creating recommendation systems, you'll discover how to find and use data-driven solutions to make business decisions. Topics covered include conducting exploratory data analysis, running A/B tests, performing binary classification using logistic regression models, and using machine learning algorithms. You'll also learn how to:

• Forecast consumer demand
• Optimize marketing campaigns
• Reduce customer attrition
• Predict website traffic
• Build recommendation systems

With this practical guide at your fingertips, harness the power of programming, mathematical theory, and good old common sense to find data-driven solutions that make a difference. Don't wait; dive right in!
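For a flavor of the binary-classification topic the blurb mentions, here is a minimal logistic regression sketch with scikit-learn on synthetic data (nothing here is taken from the book):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a business dataset (e.g. churn labels).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)
print(f"holdout accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```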

Summary

Feature engineering is a crucial aspect of the machine learning workflow. To make that possible, there are a number of technical and procedural capabilities that must be in place first. In this episode Razi Raziuddin shares how data engineering teams can support the machine learning workflow through the development and support of systems that empower data scientists and ML engineers to build and maintain their own features.
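As a concrete (and invented, not from the episode) example of the kind of logic a feature platform maintains, consider a per-customer 7-day rolling spend computed with pandas; a production feature store layers versioning, backfills, and online serving on top of definitions like this:

```python
import pandas as pd

events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "ts": pd.to_datetime(["2023-05-01", "2023-05-03", "2023-05-09",
                          "2023-05-02", "2023-05-04"]),
    "amount": [10.0, 5.0, 8.0, 3.0, 4.0],
}).sort_values(["customer_id", "ts"])

# 7-day rolling spend per customer; the datetime index drives the window.
rolling_spend = (events.set_index("ts")
                 .groupby("customer_id")["amount"]
                 .rolling("7D")
                 .sum())
events["spend_7d"] = rolling_spend.to_numpy()  # rows align after the sort
print(events)
```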

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles: RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. Your host is Tobias Macey, and today I'm interviewing Razi Raziuddin about how data engineers can empower data scientists to develop and deploy better ML models through feature engineering.

Interview

Introduction

How did you get involved in the area of data management?

What is feature engineering, and why/to whom does it matter?

A topic that commonly comes up in relation to feature engineering is the importance of a feature store. What are the tradeoffs for that to be a separate infrastructure/architecture component?

What is the overall lifecycle of a feature, from definition to deployment and maintenance?

How is this distinct from other forms of data pipeline development and delivery?

Who are the participants in that workflow?

What are the sharp edges/roadblocks that typically manifest in that lifecycle?

What are the interfaces that are needed for data scientists/ML engineers to be able to self-serve their feature management?

What is the role of the data engineer in supporting those interfaces?

What are the communication/collaboration channels that are necessary to make the overall process a success?

From an implementation/architecture perspective, what are the patterns that you have seen teams build around for feature development/serving?

What are the most interesting, innovative, or unexpected ways that you have seen feature platforms used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on feature engineering?

What are the resources that you find most helpful in understanding and designing feature platforms?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

FeatureByte
DataRobot
Feature Store
Feast Feature Store
Feathr
Kaggle
Yann LeCun

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Sponsored By: RudderStack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Support Data Engineering Podcast

We will cover how Snap (parent company of Snapchat) has been using Airflow since 2016, and how we built a secure deployment on GCP that integrates with internal tools for workload authorization, RBAC, and more. We made permissions for DAGs easy for customers to use via k8s workload identity binding and tight UI integration. We will also cover how we are migrating 2500+ DAGs from Airflow V1 on Python 2 to V2 on Python 3 using tools and automation. Code/DAG migration requires a significant investment of time, so our team created several tools that can convert or rewrite DAGs in the new format. Finally, we will cover some other self-service tools that we built internally.
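For readers unfamiliar with what such migration tooling rewrites, one of the mechanical changes between Airflow 1.x/Python 2 and Airflow 2.x/Python 3 DAGs looks roughly like this (an illustrative sketch, not Snap's actual tooling or code):

```python
# Airflow 1.x style (what the tooling rewrites):
#   from airflow.operators.python_operator import PythonOperator
#   PythonOperator(task_id="hello", python_callable=hello,
#                  provide_context=True, dag=dag)
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # new 2.x import path

def hello(**context):
    # In Airflow 2 the task context is passed automatically;
    # provide_context=True no longer exists.
    print(context["ds"])

with DAG(dag_id="migrated_example", start_date=datetime(2023, 1, 1),
         schedule=None, catchup=False):  # "schedule" is the 2.4+ spelling
    PythonOperator(task_id="hello", python_callable=hello)
```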

We will share the case study of Airflow at StyleSeat, where within a year our data grew from 2 million data points per day to 200 million. Our original solution for orchestrating this data was not enough, so we migrated to an Airflow-based solution. Previous implementation: our tasks were orchestrated with hourly triggers on AWS CloudWatch rules in their own log groups. Each task was individually defined as a Lambda that executed Python code from a Docker image. As complexity increased, there were frequent downtimes and manual executions of failed tasks and their downstream dependencies. With every downtime, our business stakeholders lost more trust in the data, and recovery times grew longer. We needed a modern orchestration platform that would enable our team to define and instrument complex pipelines as code, provide visibility into executions, and define retry criteria on failures. Airflow was identified as a critical piece in modernizing our orchestration, which would also help us onboard dbt. We wanted a managed solution and a partner who could help guide us to a successful migration.
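The "retry criteria on failures" mentioned above become plain DAG code in Airflow; here is a minimal sketch with invented IDs and values:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 3,                         # re-run a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),  # wait between attempts
    "retry_exponential_backoff": True,    # back off 5m, 10m, 20m, ...
}

with DAG(dag_id="hourly_pipeline", start_date=datetime(2023, 1, 1),
         schedule="@hourly", catchup=False, default_args=default_args):
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load  # failed upstream tasks are retried before load runs
```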

Making a contribution to or becoming a committer on Airflow can be a daunting task, even for experienced Python developers and Airflow users. The sheer size and complexity of the code base may discourage potential contributors from taking the first steps. To help alleviate this issue, this session is designed to provide a better understanding of how Airflow works and build confidence in getting started. During the session, we will introduce the main components of Airflow, including the Web Server, Scheduler, and Workers. We will also cover key concepts such as DAGs, DAG-run objects, Tasks, and Task Instances. Additionally, we will explain how tasks communicate with each other using XComs, and discuss the frequency of DAG runs based on the schedule. To showcase changes in the state of various objects, we will dive into the code level and continuously share the state of the database at every important checkpoint.
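For orientation, a minimal TaskFlow-style DAG exercises most of the concepts the session covers: a schedule, two tasks, and an XCom handed from one task to the next (TaskFlow's return-value passing uses XComs under the hood):

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def xcom_demo():
    @task
    def produce() -> int:
        return 42  # the return value is stored as an XCom

    @task
    def consume(value: int) -> None:
        print(f"received {value} via XCom")

    consume(produce())  # also declares the task dependency

xcom_demo()
```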

ETL data pipelines are the bread and butter of data teams that must design, develop, and author DAGs to accommodate the various business requirements. dbt is becoming one of the most used tools to perform SQL transformations on the Data Warehouse, allowing teams to harness the power of queries at scale. Airflow users are constantly finding new ways to integrate dbt with the Airflow ecosystem and build a single pane of glass where Data Engineers can manage and administer their pipelines. Astronomer Cosmos, an open-source product, has been introduced to integrate Airflow with dbt Core seamlessly. Now you can easily see your dbt pipelines fully integrated with Airflow. You will learn: how to integrate dbt Core with Airflow, how to use Cosmos, and how to build data pipelines at scale.
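For contrast with what Cosmos automates, the common pre-Cosmos pattern was a single BashOperator shelling out to dbt, which hides the individual models from Airflow (Cosmos instead renders each dbt model as its own Airflow task). A sketch of that baseline, with placeholder paths:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="dbt_baseline", start_date=datetime(2023, 1, 1),
         schedule="@daily", catchup=False):
    # One opaque task for the whole dbt project: Airflow sees a single
    # success/failure, not the per-model graph that Cosmos exposes.
    BashOperator(
        task_id="dbt_run",
        bash_command=("dbt run --project-dir /opt/dbt/my_project "
                      "--profiles-dir /opt/dbt"),
    )
```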

A steady rise in users and business-critical workflows posed challenges to development and production workflows. The solution: enable multi-tenancy on our single Airflow instance. We needed to enable teams to manage their Python requirements and ensure DAGs were insulated from each other. To achieve this we divided our monolithic setup into three parts: infrastructure (with common code packaging), workspace creation, and CI/CD to manage deployments. Backstage templates enable teams to create isolated development environments that resemble our production environment, ensuring consistency. Distributing common code via a private PyPI gives teams more control over what code their DAGs run. And a PythonOperator shim in production utilizes virtualenv to run Python code with each team's defined requirements for their DAG. In doing these things we enable effective multi-tenancy and facilitate easier development and production workflows for Airflow.
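Stock Airflow ships a building block in the same spirit as the shim described above: PythonVirtualenvOperator runs a callable inside a freshly created virtualenv with per-task requirements. The internal shim surely differs in detail; this is a generic sketch:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator

def team_a_job():
    # Imports must live inside the callable: it runs in the new virtualenv.
    import pandas as pd
    print(pd.__version__)

with DAG(dag_id="multi_tenant_demo", start_date=datetime(2023, 1, 1),
         schedule=None, catchup=False):
    PythonVirtualenvOperator(
        task_id="team_a_task",
        python_callable=team_a_job,
        requirements=["pandas==2.0.3"],  # team-specific pins
        system_site_packages=False,      # isolate from the Airflow env
    )
```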

Productive cross-team collaboration between data engineers and analysts is the goal of all data teams; however, fulfilling that mission can be challenging given the diverse set of skills that each group brings. In this talk we present an example of how one team tackled this topic by creating a flexible, dynamic, and extensible framework using Airflow and cloud services that allowed engineers and analysts to jointly create data-centric micro-services to serve up projections and other robust analysis for use in the organization. The framework, which utilized dynamic DAG generation configured using yaml files, Kubernetes jobs, and dbt transformations, abstracted away many of the details associated with workflow orchestration, allowing analysts to focus on their Python or R code and data processing logic while enabling data engineers to monitor the pipelines and ensure their scalability.
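A condensed sketch of the yaml-driven dynamic DAG generation pattern the talk describes; the yaml schema here is invented for illustration, since the framework defines its own:

```python
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

CONFIG = yaml.safe_load("""
pipelines:
  - name: demand_projection
    schedule: "@daily"
    steps: [extract, transform, publish]
  - name: churn_analysis
    schedule: "@weekly"
    steps: [extract, score]
""")

for spec in CONFIG["pipelines"]:
    with DAG(dag_id=spec["name"], schedule=spec["schedule"],
             start_date=datetime(2023, 1, 1), catchup=False) as dag:
        previous = None
        for step in spec["steps"]:
            op = BashOperator(task_id=step, bash_command=f"echo {step}")
            if previous is not None:
                previous >> op  # chain the steps in yaml order
            previous = op
    # Binding each DAG to a module-level name lets the scheduler discover it.
    globals()[spec["name"]] = dag
```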

As a data engineer, I’ve used Airflow extensively over the last 5 years: across 3 jobs, several different roles; for side projects, for critical infrastructure; for manually triggered jobs, for automated workflows; for IT (Ookla/Speedtest.net), for science (Allen Institute for Cell Science), for the commons (Openverse), for liberation (Orca Collective). Authoring a DAG has changed dramatically since 2018, thanks to improvements in both Airflow and the Python language. In this session, we’ll take a trip back in time to see how DAGs looked several years ago, and what the same DAGs might look like now. We’ll appreciate the many improvements that have been made towards simplifying workflow construction. I’ll also discuss the significant advancements that have been made around deploying Airflow. Lastly, I’ll give a brief overview of different use cases and ways I’ve seen Airflow leveraged.
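To make the "then vs. now" concrete, here is a rough before/after of the same two-task DAG, with the 2018-era style in comments and today's TaskFlow API below (an illustration, not code from the talk):

```python
# Circa 2018:
#   dag = DAG("greet", schedule_interval="@daily", start_date=...)
#   t1 = PythonOperator(task_id="extract", python_callable=extract,
#                       provide_context=True, dag=dag)
#   t2 = PythonOperator(task_id="report", python_callable=report, dag=dag)
#   t1.set_downstream(t2)
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def greet():
    @task
    def extract() -> str:
        return "hello"

    @task
    def report(greeting: str) -> None:
        print(greeting)

    report(extract())  # dependencies and XCom wiring fall out of the call

greet()
```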

New users starting with Airflow frequently encounter several challenges, ranging from the complexities of containers and virtual environments to Python dependency hell. Moreover, their familiarity with tools such as Docker, docker-compose, and Helm might be limited, and those tools can even be overkill. In contrast, seasoned Airflow users encounter their own problems, encompassing configuration conflicts with ongoing Airflow projects, intricacies stemming from Docker and docker-compose configurations, and a lack of visibility into all their projects. With airflowctl, users can install and set up Airflow using a single command. Existing users can use it to manage multiple Airflow projects with different Airflow versions on the same machine. This allows creating and debugging DAGs in an IDE seamlessly. Agenda: why airflowctl?; goals; current functionality and demo; vision/roadmap.

Apache Airflow has over 650 Python dependencies. In case you did not know already, dependencies in Python are a difficult subject, and Airflow has its own custom ways of managing them. Airflow has a rather complex system to manage dependencies in its CI system, but this talk is not about that. This talk is directed at users of Airflow who want to keep their dependencies updated, describing ways they can do it. This presentation will explain how to effectively manage and handle custom dependencies in Airflow. Jarek will guide you through practical solutions and best practices to make your Airflow experience with dependencies - yes, you guessed it - a breeze.

Airflow DAGs are Python code (which can pretty much do anything you want) and Airflow has hundreds of configuration options (which can dramatically change Airflow's behavior). Those two facts contribute to endless combinations that can run the same workloads, but only a precious few are efficient. The rest will result in failed tasks and excessive compute usage, costing time and money. This talk will demonstrate how small changes can yield big dividends, and reveal some code improvements and Airflow configurations that can reduce costs and maximize performance.
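A few of the small, high-leverage settings the talk alludes to can be set directly in DAG code; the values below are illustrative, and the right ones depend on the deployment:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="tuned_example",
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",
    catchup=False,      # don't replay every missed interval on first deploy
    max_active_runs=1,  # avoid piles of concurrent runs competing for slots
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "execution_timeout": timedelta(minutes=30),  # kill runaway tasks
    },
):
    BashOperator(task_id="work", bash_command="echo work")
```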

Cloudera Data Engineering (CDE) is a serverless service for Cloudera Data Platform that allows you to submit various Spark jobs and Airflow DAGs to an auto-scaling cluster. Writing workloads as Python DAG files may be the usual way, but it is not the most convenient for some users, since it requires background in DAG syntax, the Python programming language, and the conventions of Airflow. The DAG Authoring UI is a tool built on top of Airflow APIs that provides a graphical user interface to create, manage, and destroy complex DAGs. It gives one the ability to perform tasks on Airflow without really having to know DAG structure, the Python programming language, or the internals of Airflow. CDE has identified multiple operators to perform various tasks on Airflow by carefully categorising the use cases. The operators range from BashOperator and PythonOperator to CDEJobRunOperator and CDWJobRunOperator. Most use cases can be run as combinations of the operators provided.

For the DAG owner, testing Airflow DAGs can be complicated and tedious. kubectl cp your DAG from local to pod, exec into the pod, and run a command? Install Breeze? Why pull the Airflow image and start up the webserver / scheduler / triggerer if all we want is to test the addition of a new task? It doesn't have to be this hard. At Etsy, we've simplified testing DAGs for the DAG owner with dagtest. Dagtest is a Python package that we house on our internal PyPI. It is a small client binary that makes HTTP requests to a test API. The test API is a simple Flask server that receives these requests and builds pods to run airflow dags backfill commands based on the options provided via dagtest. The simplest of these is a dry run. Typically, users run test runs where the DAG executes end-to-end for a single ds. Equally important is the environment setup: we use an ad-hoc Airflow instance in a separate GCP environment with a service account that cannot write to production buckets. This talk will discuss both.
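A deliberately simplified sketch of the test-API idea: Etsy's dagtest is internal, so the endpoint name, payload fields, and subprocess execution below are hypothetical stand-ins (the real service builds pods rather than running subprocesses):

```python
import subprocess

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/test-run")
def test_run():
    payload = request.get_json()
    dag_id, ds = payload["dag_id"], payload["ds"]
    # Backfill a single ds; --dry-run skips actual task execution.
    cmd = ["airflow", "dags", "backfill", "-s", ds, "-e", ds, dag_id]
    if payload.get("dry_run"):
        cmd.append("--dry-run")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return jsonify({"returncode": result.returncode, "stdout": result.stdout})

if __name__ == "__main__":
    app.run(port=8080)
```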