talk-data.com

Topic: Data Science
Related tags: machine_learning, statistics, analytics
1516 tagged activities

Activity Trend: 68 peak/qtr, 2020-Q1 to 2026-Q1

Activities: 1516 activities · Newest first

We talked about:

- Audience poll
- Andrey’s background
- What data science practice is
- Best DS practice in a traditional company vs IT-centric companies
- Getting started with building a data science practice (finding out who you report to)
- Who the initiative comes from
- Finding out what kind of problems you will be solving (centralized approach)
- Moving to a semi-decentralized approach
- Resources to learn about data science practice
- Pivoting from the role of a software engineer to data scientist
- The most impactful realization from data science practice
- Advice for individual growth
- Finding Andrey online

Links: 

Data Teams book: https://www.amazon.com/Data-Teams-Management-Successful-Data-Focused/dp/1484262271/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Beginning MATLAB and Simulink: From Beginner to Pro

Employ essential tools and functions of the MATLAB and Simulink packages, explained and demonstrated via interactive examples and case studies. This revised edition covers features from the latest MATLAB 2022b release, as well as other features released since the first edition was published. The book contains dozens of simulation models and solved problems via m-files/scripts and Simulink models, which will help you learn programming and modelling essentials. You’ll become efficient with many of the built-in tools and functions of MATLAB/Simulink while solving engineering and scientific computing problems. Beginning MATLAB and Simulink, Second Edition explains practical issues of programming and modelling in parallel by comparing MATLAB and Simulink. After studying and using this book, you'll be proficient at using MATLAB and Simulink and able to apply the source code and models from the book's examples as templates for your own projects in data science or engineering.

What You Will Learn
- Master the programming and modelling essentials of MATLAB and Simulink
- Carry out data visualization with MATLAB
- Build GUIs and develop apps with MATLAB
- Work with integration and numerical root-finding methods
- Apply MATLAB to differential-equation-based models and simulations
- Use MATLAB and Simulink for data science projects

Who This Book Is For
Engineers, programmers, data scientists, and students majoring in engineering and scientific computing who are new to MATLAB and Simulink.

Today I’m chatting with Eugenio Zuccarelli, Research Scientist at MIT Media Lab and Manager of Data Science at CVS. Eugenio explains how he has created multiple algorithms designed to help shape decisions made in life-or-death situations, such as pediatric cardiac surgery and the COVID-19 pandemic response. He shares the lessons he’s learned about building trust in data when the stakes are life and death. Listen and learn how culture can affect the adoption of decision support and ML tools, the impact that the delivery of information has on a user's ability to understand and use data, and why Eugenio feels that design is more important than the inner workings of ML algorithms.

Highlights / Skip to:

- Eugenio explains why he decided to work on machine learning models for cardiologists and healthcare workers involved in the COVID-19 pandemic (01:53)
- The workflow surgeons would use when incorporating the predictive algorithm and application Eugenio helped develop (04:12)
- The question Eugenio’s predictive algorithm helps surgeons answer when evaluating whether to use various pediatric cardiac surgical procedures (06:37)
- The path Eugenio took to build trust with experienced surgeons and drive product adoption, and the role of UX (09:42)
- Eugenio’s approach to identifying key problems and finding solutions using data (14:50)
- How Eugenio has tracked value delivery and adoption success for a tool that relies on more than just accurate data & predictions, but also surgical skill and patient case complexity (22:26)
- The design process Eugenio started early on to optimize user experience and adoption (28:40)
- Eugenio’s key takeaways from a different project that helped government agencies predict what resources would be needed in which areas during the COVID-19 pandemic (34:45)

Quotes from Today’s Episode

“So many people today are developing machine-learning models, but I truly find the most difficult parts to be basically everything around machine learning … culture, people, stakeholders, products, and so on.” — Eugenio Zuccarelli (01:56)

“Developing machine-learning components, clean[ing] data, developing the machine-learning pipeline, those were the easy steps. The difficult ones [were] gaining trust, as you said, developing something that was useful. And talking about trust, it’s especially tricky in the healthcare industry.” — Eugenio Zuccarelli (10:42)

“Because this tennis match, this ping-pong match between what can be done and what’s [the] problem [...] thankfully, we know, of course, it is not really the route to go. We don’t want to develop technology for the sake of it.” — Eugenio Zuccarelli (14:49)

“We put so much effort on the machine-learning side and then the user experience is so key, it’s probably even more important than the inner workings.” — Eugenio Zuccarelli (29:22)

“It was interesting to see exactly how the doctor is really focused on their job and doing it as well as they can, not really too interested in fancy [...] solutions, and so we were really able to not focus too much on appearance or fancy components, but more on usability and readability.” — Eugenio Zuccarelli (33:45)

“People’s ability to trust data, and how this varies from a lot of different entities, organizations, countries, [etc.] This really makes everything tricky. And of course, when you have a pandemic, this acts as a catalyst and enhances all of these cultural components.” — Eugenio Zuccarelli (35:59)

“I think [design success] boils down to delivery. You can package the same information in different ways [so that] it actually answers their questions in the ways that they’re familiar with.” — Eugenio Zuccarelli (37:42)

Links

LinkedIn: https://www.linkedin.com/in/jayzuccarelli
Twitter: twitter.com/jayzuccarelli
Personal website: https://eugeniozuccarelli.com
Medium: jayzuccarelli.medium.com

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Moe Kiss (Canva), Michael Helbling (Search Discovery), Jay Feng (Interview Query)

So, you finally took that recruiter's call, and then you made it through the initial phone screen. You weren't really expecting that to happen, but now you're facing an actual interview! It sounds intense and, yet, you're not sure what to expect or how to prepare for it. Flash cards with statistical concepts? A crash course in Python? LinkedIn stalking of current employees of the company? Maybe. We asked Jay Feng from Interview Query to join us to discuss strategies and tactics for data scientists and analyst interviews, and we definitely wanted to hire him by the time we were done! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Python has dominated data science programming for the last few years, but there’s another rising star programming language seeing increased adoption and popularity—Julia.

Now the fourth most popular programming language, Julia is attracting attention from data teams and practitioners who want to understand how it could benefit individual careers, improve business operations, and drive increased value across organizations.

Zacharias Voulgaris, PhD joins the show to talk about his experience with the Julia programming language and his perspective on the future of Julia’s widespread adoption. Zacharias is the author of Julia for Data Science. As a Data Science consultant and mentor with 10 years of international experience that includes the role of Chief Science Officer at three startups, Zacharias is an expert in data science, analytics, artificial intelligence, and information systems.

In this episode, we discuss the strengths of Julia, how data scientists can get started using Julia, how team members and leaders alike can transition to Julia, why companies are secretive about adopting Julia, the interoperability of Julia with Python and other popular programming languages, and much more.

Check out this month’s events: https://www.datacamp.com/data-driven-organizations-2022

Take the Introduction to Julia course for free!

https://www.datacamp.com/courses/introduction-to-julia

R 4 Data Science Quick Reference: A Pocket Guide to APIs, Libraries, and Packages

In this handy quick reference book you'll be introduced to several R data science packages, with examples of how to use each of them. All concepts are covered concisely, with many illustrative examples using the following APIs: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. With R 4 Data Science Quick Reference, you'll have the code, APIs, and insights to write data science-based applications in the R programming language, and you'll also be able to carry out data analysis. All source code used in the book is freely available on GitHub.

What You'll Learn
- Implement applicable R 4 programming language specification features
- Import data with readr
- Work with categories using forcats, time and dates with lubridate, and strings with stringr
- Format data using tidyr and then transform that data using magrittr and dplyr
- Write functions with R for data science, data mining, and analytics-based applications
- Visualize data with ggplot2 and fit data to models using modelr

Who This Book Is For
Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.

How to leverage dbt Community as the first & ONLY data hire to survive


Check the slides here: https://docs.google.com/presentation/d/1xJEyfg81azw2hVilhGZ5BptnAQo8q1L7aDLGrnSYoUM/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Data Apps in the Real World: How to Capture Value Locked in the Data Warehouse

Should you consider building a Data App?

How many times has your product team asked for data science models to be available in real time to serve feature flags and product recommendations to customers? They don’t, but they should, and with data apps the data team can make this a reality.

Join TJ Murphy of Multi Media LLC, Kevin Chao from Ramp, and Tejas Manohar from Hightouch to hear examples of data apps in the real world. Their aim is to give data practitioners a framework for when and why to use the warehouse for production applications, and why the data team is the right team for this undertaking.

TJ will walk through the data apps he built at Minted, including a user personalization service and marketing automation tools. At Minted, the data team maintained a GraphQL layer on top of the warehouse that powered both web and mobile app personalization on a per-user basis.

Kevin Chao will share how Ramp, a fintech leader valued at $8B, is using dbt and Hightouch to power compliance via Snowflake as the source of truth.

Tejas will share how Supr Daily, the Instacart of India, runs product recommendations in their mobile app and automatically sends push notifications at opportune moments to convert users at a higher rate.

Lastly, TJ will give a practical overview of architecture, and a checklist of what to think through before building a Data App.
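To make the pattern concrete, here is a minimal sketch of the kind of per-user lookup a data app's API layer might expose on top of the warehouse. It is not taken from any of the speakers' implementations; sqlite3 stands in for the warehouse here, and the table and column names are invented for illustration.

```python
import sqlite3


def get_recommendations(conn: sqlite3.Connection, user_id: str, limit: int = 5) -> list[str]:
    """Return the top precomputed product recommendations for one user."""
    rows = conn.execute(
        """
        SELECT product_id
        FROM user_recommendations  -- hypothetical table precomputed in the warehouse
        WHERE user_id = ?
        ORDER BY score DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [product_id for (product_id,) in rows]


if __name__ == "__main__":
    # In-memory stand-in for the warehouse, just to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_recommendations (user_id TEXT, product_id TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO user_recommendations VALUES (?, ?, ?)",
        [("u1", "towel-set", 0.91), ("u1", "photo-book", 0.74), ("u2", "art-print", 0.88)],
    )
    print(get_recommendations(conn, "u1"))  # ['towel-set', 'photo-book']
```

In production such a query would run against the warehouse (or a low-latency store synced from it), typically behind a GraphQL or REST layer like the one described above.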

Check the slides here: https://docs.google.com/presentation/d/1LMuuuvVy3QD2ZAltp5c1Eh5Ik4LgM0q-AMlThsZVR40/edit#slide=id.g166573b6b47_0_0

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Engineering your analytics career path

When Phoenix Jay (K Health) started her Master's in Applied Analytics, it was with the assumption that she’d quickly land a gig in data science. But by the time she graduated, the field had changed—as had every Data Scientist job posting. This shift brought uncertainty, but also opportunity. Learn how she adjusted her path to begin applying “classical” data science paradigms to analytics.

Check the slides here: https://docs.google.com/presentation/d/19DviXCiciRh3dq-5cSTXkPqA7CppUUKAsfpm6dmn0l8/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

The accidental analytics engineer

There’s a good chance you’re an analytics engineer who just sort of landed in an analytics engineering career. Or made a murky transition from data science/data engineering/software engineering to full-time analytics person. When did you realize you fell into the wild world of analytics engineering?

In this session, Michael Chow (RStudio) draws upon his experience building open source data science tools and working with the data science community to discuss the early signs of a budding analytics engineer, and the small steps these folks can take to keep the best parts of Python and R, all while moving towards engineering best practices.

Check the slides here: https://docs.google.com/presentation/d/1H2fVa-I4D8ibanlqLutIrwPOVypIlXVzEITDUNzzPpU/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

But really, what is transformation?

Many transformations are fine candidates for concretizing with dbt. But there are transformations that live in the data science world that are not well-suited for dbt—and probably for good reason. Consider the total set of all transformations, from mandatory pre-processing steps to sophisticated statistical transformations (e.g., converting data types versus computing robust measures of central tendency). The question quickly becomes: How do data teams decide which transformations to push down to dbt and which to leave up in the notebook?
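As a rough illustration of that split (my own example, not something drawn from the panel), a robust statistic such as a trimmed mean is typically easier to keep notebook-side in Python, while a plain type cast is a natural candidate to push down into a dbt model; the column name below is hypothetical.

```python
import pandas as pd
from scipy import stats


def robust_avg_order_value(df: pd.DataFrame) -> float:
    """20%-trimmed mean of order totals: a 'notebook-side' statistical transformation."""
    values = df["order_total"].dropna()
    return stats.trim_mean(values, proportiontocut=0.2)


# By contrast, a mandatory pre-processing step such as
#   CAST(order_total AS NUMERIC)
# is usually better concretized as a dbt model in the warehouse.

df = pd.DataFrame({"order_total": [10.0, 12.5, 11.0, 950.0, 9.5]})  # one extreme outlier
print(robust_avg_order_value(df))  # trims one value from each end; ~11.17 here
```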

In this panel discussion led by Allan Campopiano (Deepnote), analytics engineers, data engineers, and data scientists discuss what transformation means to them, where and when transformation happens in their stack, and how to collaborate effectively between high- and low-level forms of transformation.

Check the slides here: https://docs.google.com/presentation/d/1uqi1C2gpBnsMp-BTvjltmjlnedWKqEzwonZjrewOWNk/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

dbt Labs + Snowflake: Why SQL and Python go perfectly well together

As data science and machine learning adoption grew over the last few years, Python moved up the ranks catching up to SQL in popularity in the world of data processing. SQL and Python are both powerful on their own, but their value in modern analytics is highest when they work together. This was a key motivator for us at Snowflake to build Snowpark for Python: to help modern analytics, data engineering, and data science teams generate insights without complex infrastructure management for separate languages.

Join this session to learn more about how dbt's new support for Python-based models and Snowpark for Python can help polyglot data teams get more value from their data through secure, efficient and performant metrics stores, feature stores, or data factories in the Data Cloud.
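For a sense of what this looks like in practice, here is a minimal sketch of a dbt Python model running on Snowpark. The upstream model name, columns, and aggregation are hypothetical, so treat it as an illustration of the API shape rather than anything shown in the session.

```python
# models/customer_features.py -- a dbt Python model (hypothetical example)
import snowflake.snowpark.functions as F


def model(dbt, session):
    # On Snowflake, `session` is a Snowpark session and dbt.ref() returns a Snowpark DataFrame.
    dbt.config(materialized="table")
    orders = dbt.ref("stg_orders")  # hypothetical upstream dbt model

    # Build per-customer features entirely inside Snowflake -- no data leaves the warehouse.
    features = orders.group_by("customer_id").agg(
        F.count("order_id").alias("order_count"),
        F.avg("order_total").alias("avg_order_value"),
    )
    return features  # dbt materializes the returned DataFrame as the model's table
```

Because dbt handles materializing the returned DataFrame back into the warehouse, Python models like this can sit alongside SQL models in the same DAG.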

Check Notion document here: https://www.notion.so/6382db82046f41599e9ec39afb035bdb

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

While securing the support of senior executives is a major hurdle in implementing a data transformation program, it’s often one of the earliest and easiest hurdles to overcome compared with the rest of the program. Leading a data transformation program requires thorough planning, organization-wide collaboration, careful execution, robust testing, and so much more.

Vanessa Gonzalez is the Senior Director of Data and Analytics for ML & AI at Transamerica. An experienced senior data manager, she has a background in data transformation, leadership, and strategic direction for Data Science and Data Governance teams.

Vanessa joins the show to share how she is helping to lead Transamerica’s Data Transformation program. In this episode, we discuss the biggest challenges Transamerica has faced throughout the process, the most important factors to making any large-scale transformation successful, how to collaborate with other departments, how Vanessa structures her team, the key skills data scientists need to be successful, and much more.

Check out this month’s events: https://www.datacamp.com/data-driven-organizations-2022

Mathematical Foundations of Data Science Using R, 2nd Edition

The aim of the book is to help students become data scientists. Since this requires a series of courses over a considerable period of time, the book is designed to accompany students from their first steps to an advanced understanding of the knowledge and skills that define a modern data scientist. It presents a comprehensive overview of the mathematical foundations of data science, the R programming language, and R's applications to data science.

We talked about:

- Tomasz’s background
- What Tomasz did before DataOps (data science)
- Why Tomasz made the transition from data science to DataOps
- What is DataOps?
- How is DataOps related to infrastructure?
- How Tomasz learned the skills necessary to become a DataOps engineer
- Becoming comfortable with the terminal
- The overlap between DataOps and data engineering
- Suitable/useful skills for DataOps
- Minimal operational skills for DataOps
- Similarities between DataOps and data science managers
- Tomasz’s interesting projects
- Confidence in results and avoiding going too deep with edge cases
- Conclusion

Links:

Terminal setup video (19 minutes): https://www.youtube.com/watch?v=D2PSsnqgBiw
Command line videos (about an hour and a half to become somewhat comfortable with the terminal): https://www.youtube.com/playlist?list=PLIhvC56v63IKioClkSNDjW7iz-6TFvLwS
MIT course covering just that (command line, git, storing secrets): https://missing.csail.mit.edu/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

On this episode of Data Unchained, we bring you Steve Low, Sales Director at Titan Solutions! Molly Presley, our host, and Steve discuss the ins and outs of Titan Solutions, how they help resellers assess new technologies coming to market, how they shifted from a data storage company to a data management company, and how their new partnership with Hammerspace will better serve their customers and the company as a whole. Take a look into the age of data science on this latest episode of Data Unchained!

#Data #DataScience #DataManagement #DataStorage #Hammerspace

Music: Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic
Music promoted by https://www.free-stock-music.com
Creative Commons Attribution 3.0 Unported License: https://creativecommons.org/licenses/by/3.0/deed.en_US
Hosted on Acast. See acast.com/privacy for more information.

Today I’m chatting with Iván Herrero Bartolomé, Chief Data Officer at Grupo Intercorp. Iván describes how he was prompted to write his new article in CDO Magazine, “CDOs, Let’s Get Out of Our Comfort Zone” as he recognized the importance of driving cultural change within organizations in order to optimize the use of data. Listen in to find out how Iván is leveraging the role of the analytics translator to drive this cultural shift, as well as the challenges and benefits he sees data leaders encounter as they move from tactical to strategic objectives. Iván also reveals the number one piece of advice he’d give CDOs who are struggling with adoption. 

Highlights / Skip to:

- Iván explains what prompted him to write his new article, “CDOs, Let’s Get Out of Our Comfort Zone” (01:08)
- What Iván feels is necessary for data leaders to close the gap between data and the rest of the business, and why (03:44)
- Iván dives into who he feels really owns delivery of value when taking on new data science and analytics projects (09:50)
- How Iván’s team went from managing technical projects that often didn’t make it to production to working on strategic projects that almost always make it to production (13:06)
- The framework Iván has developed to upskill technical and business roles to be effective data / analytics translators (16:32)
- The challenge Iván sees data leaders face as they move from setting and measuring tactical goals towards strategic goals and initiatives (24:12)
- Iván explains how the C-Suite’s attitude impacts the cross-functional role of data & analytics leadership (28:55)
- The number one piece of advice Iván would give new CDOs struggling with low adoption of their data products and solutions (31:45)

Quotes from Today’s Episode

“We’re going to do all our best to ensure that [...] everything that is expected from us is done in the best possible way. But that’s not going to be enough. We need a sponsorship and we need someone accountable for the project and someone who will be pushing and enabling the use of the solution once we are gone. Because we cannot stay forever in every company.” – Iván Herrero Bartolomé (10:52)

“We are trying to upskill people from the business to become data translators, but that’s going to take time. Especially what we try to do is to take product owners and give them a high-level immersion on the state-of-the-art and the possibilities that data analytics bring to the table. But as we can’t rely on our companies having this kind of talent and these data translators, they are one of the profiles that we bring in for every project that we work on.” – Iván Herrero Bartolomé (13:51)

“There’s a lot to do, not just between data and analytics and the other areas of the company, but aligning the incentives of all the organization towards the same goals in a way that there’s no friction between the goals of the different areas, the people, [...] and the final goals of the organization.” – Iván Herrero Bartolomé (23:13)

“Deciding which goals are you going to be co-responsible for, I think that is a sophisticated process that it’s not mastered by many companies nowadays. That probably is one of the main blockers keeping data analytics areas working far from their business counterparts.” – Iván Herrero Bartolomé (26:05)

“When the C-suite looks at data and analytics, if they think these are just technical skills, then the data analytics team are just going to behave as technical people. And many, many data analytics teams are set up as part of the IT organization. So, I think it all begins somehow with how the C-suite of our companies look at us.” – Iván Herrero Bartolomé (28:55)

“For me, [digital] means much more than the technical development of solutions; it should also be part of the transformation of the company, both in how companies develop relationships with their customers, but also inside how every process in the companies becomes more nimble and can react faster to the changes in the market.” – Iván Herrero Bartolomé (30:49)

“When you feel that everyone else [is] not doing what you think they should be doing, think twice about whether it is they who are not doing what they should be doing or if it’s something that you are not doing properly.” – Iván Herrero Bartolomé (31:45)

Links

“CDOs, Let’s Get Out of Our Comfort Zone”: https://www.cdomagazine.tech/cdo_magazine/topics/opinion/cdos-lets-get-out-of-our-comfort-zone/article_dce87fce-2479-11ed-a0f4-03b95765b4dc.html
LinkedIn: https://www.linkedin.com/in/ivan-herrero-bartolome/

We talked about:

- Katie’s background
- What is a data scientist?
- What is a data science manager?
- Quality of the craft
- How data leaders promote career growth
- Supporting senior data professionals
- Choosing the IC route vs the management route
- Managing junior data professionals
- Talking to senior stakeholders and PMs as a junior
- The importance of hiring juniors
- What skills do data science managers need to get hired?
- How juniors that are just starting out can set themselves apart from the competition
- Asking senior colleagues for help and the rubber duck channel
- The challenges of the head of data
- Conclusion

Links:

Jobs at Gloss Genius: https://boards.greenhouse.io/glossgenius

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

- Alvaro’s background
- Working as a QA (Quality Assurance) engineer
- Transitioning from QA to machine learning
- Gathering knowledge about the ML field
- Searching for an ML job (improving soft skills and CV)
- Data science interview skills
- Zoomcamp projects
- Zoomcamp project deployment
- How to not undersell yourself during interviews
- Alvaro’s experience with interviews during his transition
- Alvaro’s Zoomcamp notes
- Alvaro’s coach
- The importance of mathematical knowledge to a transition into ML
- Preparing for technical interviews
- Alvaro’s typical workday
- Alvaro’s team’s tech stack
- The importance of a technical background to transitioning into ML

Links:

Alvaro's CV: https://www.dropbox.com/s/89hkt3ug0toqa2n/CV%20nou%20-%20angl%C3%A8s.pdf?dl=0
GitHub profile: https://github.com/ziritrion
LinkedIn profile: https://www.linkedin.com/in/alvaronavas/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

In this episode, Jason Foster talks to David Bader, Distinguished Professor in the Department of Data Science at the New Jersey Institute of Technology. They talk about building massive-scale analytics, how to use large amounts of data to gain insights, the complexity of data sets, and how to bridge the gap between architecture and algorithms. David also shares his notable experience, discusses the capabilities and skills data departments require to run large-scale data projects, and explores use cases across diverse industries.