talk-data.com

Topic: Data Science

Tags: machine_learning, statistics, analytics

1516 tagged

Activity Trend: 68 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1516 activities · Newest first

While leading a mature data science function is a challenge in its own right, building one from scratch at an organization can be just as, if not even more, difficult. As a data leader, you need to balance short-term goals with a long-term vision, translate technical expertise into business value, and develop strong communication skills and an internalized understanding of a business's values and goals in order to earn trust with key stakeholders and build the right team.

Elettra Damaggio is no stranger to this process. Elettra is the Director for Global Data Science at StoneX, an institutional-grade financial services network that connects clients to the global markets ecosystem. Elettra has over 10 years of experience in machine learning, AI, and various roles within digital transformation and digital business growth.

In this episode, she shares how data leaders can balance short-term wins with long-term goals, how to earn trust with stakeholders, major challenges when launching a data science function, and advice she has for new and aspiring data practitioners.

We talked about:

Lisa’s background
Centralized org vs decentralized org
Hybrid org (centralized/decentralized)
Reporting your results in a data organization
Planning in a data organization
Having all the moving parts work towards the same goals
Which approach Twitter follows (centralized vs decentralized)
Pros and cons of a decentralized approach
Pros and cons of a centralized approach
Finding a common language with all the functions of an org
Finding the right approach for companies that want to implement data science
How many data scientists does a company need?
Who do data scientists report huge findings to?
The importance of partnering closely with other functions of the org
The role of Product Managers in the org and across functions
Who does analytics at Twitter (analysts vs data scientists)
The importance of goals, objectives and key results
Conflicting objectives
The importance of research
Finding Lisa online

Links:

LinkedIn: https://www.linkedin.com/in/cohenlisa/
Twitter: https://twitter.com/lisafeig
Medium: https://medium.com/@lisa_cohen
Lisa Cohen's YouTube videos: https://www.youtube.com/playlist?list=PLRhmnnfr2bX7-GAPHzvfUeIEt2iYCbI3w

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

In pharmaceuticals, wrong decisions can not only cost a company revenue, but they can also cost people their lives. With stakes so high, it’s vital that pharmaceutical companies have robust systems and processes in place to accurately gather, analyze, and interpret data and turn it into actionable steps to solving health issues.

Suman Giri is the Global Head of Data Science of the Human Health Division at Merck, a biopharmaceutical research company that works to develop innovative health solutions for both people and animals. Suman joins the show today to share how Merck is using data to improve organizational decision-making, medical research outcomes, and how data science is transforming the pharmaceutical industry at scale. He also shares some of the biggest challenges facing the industry right now and what new trends are on the horizon.

In this talk, we explain how Apache Airflow is at the center of our Kubernetes-based Data Science Platform at PlayStation. We talk about how we built a flexible development environment for Data Scientists to interact with Apache Airflow, and explain the tools and processes we built to help Data Scientists promote their DAGs from development to production. We also discuss the impact of containerization, the use of the KubernetesPodOperator and the new SparkKubernetesOperator, and the benefits of deploying Airflow on Kubernetes with the KubernetesExecutor across multiple environments.
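As a rough sketch of the pattern the talk describes (not the actual PlayStation code: the DAG id, namespace, and image below are invented, and the exact import path and parameters vary across Airflow and provider versions), a DAG that runs each task in its own Kubernetes pod might look like this:

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

# Each task runs in its own pod, so data scientists can ship a
# container image per task instead of sharing one worker environment.
with DAG(
    dag_id="feature_pipeline",  # hypothetical DAG id
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = KubernetesPodOperator(
        task_id="extract",
        name="extract",
        namespace="data-science",  # hypothetical namespace
        image="registry.example.com/extract:latest",  # hypothetical image
        cmds=["python", "extract.py"],
    )
```

With the KubernetesExecutor, the scheduler also launches every task as its own pod, so the same DAG definition can be promoted unchanged between development and production clusters that differ only in namespace and image registry.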

Today I sit down with Vijay Yadav, head of the data science team at Merck Manufacturing Division. Vijay begins by relating his own path to adopting a data product and UX-driven approach to applied data science, and our chat quickly turns to the ever-present challenge of user adoption. Vijay discusses his process of designing data products with customers, as well as the impact that building user trust has on delivering business value. We go on to talk about what metrics can be used to quantify adoption and downstream value, and then Vijay discusses the financial impact he has seen at Merck using this user-oriented perspective. While we didn’t see eye to eye on everything, Vijay was able to show how focusing on the last-mile UX has had a multi-million dollar impact on Merck. The conversation concludes with Vijay’s words of advice for other data science directors looking to get started with a design and user-centered approach to building data products that achieve adoption and have measurable impact.

In our chat, we covered Vijay’s design process, metrics, business value, and more: 

Vijay shares how he came to approach data science with a data product management approach and how UX fits in (1:52)
We discuss overcoming the challenge of user adoption by understanding user thinking and behavior (6:00)
We talk about the potential problems and solutions when users self-diagnose their technology needs (10:23)
Vijay delves into what his process of designing with a customer looks like (17:36)
We discuss the impact “solving on the human level” has on delivering real world benefits and building user trust (21:57)
Vijay talks about measuring user adoption and quantifying downstream value, and Brian discusses his concerns about tool usage metrics as a means of doing this (25:35)
Brian and Vijay discuss the multi-million dollar financial and business impact Vijay has seen at Merck using a more UX-driven approach to data product development (31:45)
Vijay shares insight on what steps a head of data science might wish to take to get started implementing a data product and UX approach to creating ML and analytics applications that actually get used (36:46)

Quotes from Today’s Episode

“They will adopt your solution if you are giving them everything they need so they don’t have to go look for a workaround.” - Vijay (4:22)

“It’s really important that you not only capture the requirements, you capture the thinking of the user, how the user will behave if they see a certain way, how they will navigate, things of that nature.” - Vijay (7:48)

“When you’re developing a data product, you want to be making sure that you’re taking the holistic view of the problem that can be solved, and the different group of people that we need to address. And, you engage them, right?” - Vijay (8:52)

“When you’re designing in low fidelity, it allows you to design with users because you don’t spend all this time building the wrong thing upfront, at which point it’s really expensive in time and money to go and change it.” - Brian (17:11)

"People are the ones who make things happen, right? You have all the technology, everything else looks good, you have the data, but the people are the ones who are going to make things happen.” - Vijay (38:47)

“You want to make sure that you [have] a strong team and motivated team to deliver. And the human spirit is something, you cannot believe how stretchable it is. If the people are motivated, [and even if] you have less resources and less technology, they will still achieve [your goals].” - Vijay (42:41)

“You’re trying to minimize any type of imposition on [the user], and make it obvious why your data product  is better—without disruption. That’s really the key to the adoption piece: showing how it is going to be better for them in a way they can feel and perceive. Because if they don’t feel it, then it’s just another hoop to jump through, right?” - Brian (43:56)

Resources and Links:

LinkedIn: https://www.linkedin.com/in/vijyadav/

Building a data science function has become table stakes for many organizations today. However, before dedicated data science functions existed, finance acted as the insights layer in most organizations. As a result, working in finance has become an effective entry point into data science for professionals across the spectrum.

Brian Richardi is the Head of Finance Data Science and Analytics at Stryker, a medical equipment manufacturing company based in Michigan, US. Brian brings over 14 years of global experience to the table. At Stryker, Brian leads a team of data scientists that use business data and machine learning to make predictions for optimization and automation.

In this episode, Brian talks about his experience as a data science leader transitioning from finance, how he uses collaboration and effective communication to drive value, how he leads the finance data science function at Stryker, what the future of data science looks like in the finance space, and more.

Ten Things to Know About ModelOps

The past few years have seen significant developments in data science, AI, machine learning, and advanced analytics. But the wider adoption of these technologies has also brought greater cost, risk, regulation, and demands on organizational processes, tasks, and teams. This report explains how ModelOps can provide both technical and operational solutions to these problems. Thomas Hill, Mark Palmer, and Larry Derany summarize important considerations, caveats, choices, and best practices to help you be successful with operationalizing AI/ML and analytics in general. Whether your organization is already working with teams on AI and ML, or just getting started, this report presents ten important dimensions of analytic practice and ModelOps that are not widely discussed, or perhaps even known. In part, this report examines:

Why ModelOps is the enterprise "operating system" for AI/ML algorithms
How to build your organization's IP secret sauce through repeatable processing steps
How to anticipate risks rather than react to damage done
How ModelOps can help you deliver the many algorithms and model formats available
How to plan for success and monitor for value, not just accuracy
Why AI will soon be regulated and how ModelOps helps ensure compliance

We talked about:

Misra’s background
What data scientists do
Consultant data scientists vs in-house data scientists (and freelancers)
Expectations for data scientists
The importance of keeping up to date with AI developments (FOMO)
How does DALL·E 2 work and should you care?
Going to conferences to stay up to date
The most pressing issue for data scientists
Fighting FOMO and imposter syndrome
Knowing when you have enough knowledge of a framework
The “best” type of data scientist
Being a generalist vs a specialist
Advice for entry-level data scientists entering an oversaturated market
Catching the eye of big AI companies
Choosing a project for your portfolio
The importance of having a Ph.D. or Master’s degree in data science
Finding Misra online

Links:

Mısra's YouTube channel: https://www.youtube.com/channel/UCpNUYWW0kiqyh0j5Qy3aU7w
Twitter: https://twitter.com/misraturp
Hands-on Data Science: Complete Your First Portfolio Project: https://www.soyouwanttobeadatascientist.com/hods


Beginning Data Science in R 4: Data Analysis, Visualization, and Modelling for the Data Scientist

Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. Updated for the R 4.0 release, this book teaches you techniques for both data manipulation and visualization and shows you the best way for developing new software packages for R. Beginning Data Science in R 4, Second Edition details how data science is a combination of statistics, computational science, and machine learning. You’ll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this. Modern data analysis requires computational skills and usually a minimum of programming. After reading and using this book, you'll have what you need to get started with R programming with data science applications. Source code will be available to support your next projects as well. Source code is available at github.com/Apress/beg-data-science-r4.

What You Will Learn:

Perform data science and analytics using statistics and the R programming language
Visualize and explore data, including working with large data sets found in big data
Build an R package
Test and check your code
Practice version control
Profile and optimize your code

Who This Book Is For:

Those with some data science or analytics background, but not necessarily experience with the R programming language.

Democratizing data, and developing data culture in large enterprise organizations is an incredibly complex process that can seem overwhelming if you don’t know where to start. And today’s guest draws a clear path towards becoming data-driven.

Meenal Iyer, Sr. Director for Data Science and Experimentation at Tailored Brands, Inc., has over 20 years of experience as a Data and Analytics strategist. She has built several data and analytics platforms and drives the enterprises she works with to be insights-driven. Meenal has also led data teams at various retail organizations, and has a wide variety of specialties within data science, including data literacy programs, data monetization, machine learning, enterprise data governance, and more.

In this episode, Meenal shares her thorough, effective, and clear strategy for democratizing data successfully and how that helps create a successful data culture in large enterprises, and gives you the tools you need to do the same in your organization.

[Announcement] Join us for DataCamp Radar, our digital summit on June 23rd. During this summit, a variety of experts from different backgrounds will be discussing everything related to the future of careers in data. Whether you're recruiting for data roles or looking to build a career in data, there’s definitely something for you. Seats are limited, and registration is free, so secure your spot today on https://events.datacamp.com/radar/

The Pandas Workshop

The Pandas Workshop offers a detailed journey into the world of data analysis using Python and the pandas library. Throughout the book, you'll build skills in accessing, transforming, visualizing, and modeling data, all while focusing on real-world data science challenges. You will gain the knowledge and confidence needed to dissect and derive insights from complex datasets.

What this book will help me do:

Understand how to access and load data from various formats including databases and web-based sources.
Manipulate and transform data for analysis using efficient pandas techniques.
Create insightful visualizations using Matplotlib integrated with pandas for clearer data presentation.
Build predictive and descriptive data models and glean data-driven insights.
Handle and analyze time-series data to uncover trends and seasonal effects in data patterns.

Author(s): Blaine Bateman, Saikat Basak, Thomas Joseph, and William So collectively bring diverse expertise in data analysis, programming, and teaching. Their goal is to make cutting-edge data science techniques accessible through clear explanations and practical exercises, helping learners from varied backgrounds master the pandas library.

Who is it for? This book is best suited for novice to intermediate programmers and data enthusiasts who are already familiar with Python but are new to the pandas library. Ideal readers are those interested in honing their skills in data analysis and visualization, as well as leveraging data for informed decision-making. Whether you're an analyst, aspiring data scientist, or business professional seeking to strengthen your analytical toolkit, this book provides beneficial insights and techniques.

In a recent conversation with data warehousing legend Bill Inmon, I learned about a new way to structure your data warehouse and self-service BI environment called the Unified Star Schema. The Unified Star Schema is potentially a small revolution for data analysts and business users, as it allows them to easily join tables in a data warehouse or BI platform through a bridge. This gives users the ability to spend time and effort on discovering insights rather than dealing with data connectivity challenges and joining pitfalls. Behind this deceptively simple and ingenious invention is author and data modelling innovator Francesco Puppini. Francesco and Bill have co-written the book ‘The Unified Star Schema: An Agile and Resilient Approach to Data Warehouse and Analytics Design’ to allow data modellers around the world to take advantage of the Unified Star Schema and its possibilities. Listen to this episode of Leaders of Analytics, where we explore:

What the Unified Star Schema is and why we need it
How Francesco came up with the concept of the USS
Real-life examples of how to use the USS
The benefits of a USS over a traditional star schema galaxy
How Francesco sees the USS and data warehousing evolving in the next 5-10 years to keep up with new demands in data science and AI, and much more.

Connect with Francesco:
Francesco on LinkedIn: https://www.linkedin.com/in/francescopuppini/
Francesco's book on the USS: https://www.goodreads.com/author/show/20792240.Francesco_Puppini
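The bridge idea can be illustrated with a toy example. Instead of analysts joining fact and dimension tables directly, a precomputed bridge table holds one row per source row plus all of the join keys, and every query becomes a set of simple LEFT JOINs against the bridge. The miniature schema below is invented for illustration (it is not an example from Puppini and Inmon's book), and it uses SQLite purely as a stand-in for a warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A dimension and a fact table, plus a bridge that unions their keys.
cur.executescript("""
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 99.0), (11, 1, 25.0);

-- The bridge holds one row per source row, tagged with its stage,
-- and carries every key needed to reach the other tables.
CREATE TABLE bridge AS
    SELECT 'customers' AS stage, customer_id, NULL AS order_id FROM customers
    UNION ALL
    SELECT 'orders', customer_id, order_id FROM orders;
""")

# Analysts always start FROM the bridge and LEFT JOIN outwards, so the
# join logic is fixed once and rows are never lost or fanned out by accident.
rows = cur.execute("""
    SELECT c.name, o.amount
    FROM bridge b
    LEFT JOIN customers c ON c.customer_id = b.customer_id
    LEFT JOIN orders    o ON o.order_id    = b.order_id
    WHERE b.stage = 'orders'
    ORDER BY o.order_id
""").fetchall()
# rows is [('Ada', 99.0), ('Ada', 25.0)]
```

In a real Unified Star Schema deployment the bridge would be generated once by the data team, so BI users never write join conditions themselves.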

Part 1: David Collins, Chief Revenue Officer of Diwo, enters the building. We go a bit deep to learn about Decision Intelligence this week. Enjoy.

Show Notes:
03:28 Meet David Collins. Who is this dude?
07:46 Why Diwo?
09:53 Decision Intelligence is NOT Data Science
14:44 Skip the dashboard
17:23 The pursuit of data perfection?
19:24 Contextual intelligence
23:12 Where does Diwo start?
27:09 A use case example
29:05 Are historical data trends meaningless?

Find David: https://www.linkedin.com/in/dacollin/
Website: https://diwo.ai/

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques (classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis.

Familiarize yourself with Spark's programming model and ecosystem
Learn general approaches in data science
Examine complete implementations that analyze large public datasets
Discover which machine learning tools make sense for particular problems
Explore code that can be adapted to many uses

Data integration, governance, and consumption play a pivotal role in the machine learning lifecycle. New offerings from Informatica illustrate the types of tools data science teams need to handle data integration, governance, and consumption. Published at: https://www.eckerson.com/articles/integrating-governing-and-consuming-data-for-the-machine-learning-lifecycle

When many people talk about leading effective Data Science teams in large organizations, it’s easy for them to forget how much effort, intentionality, vision, and leadership are involved in the process.

Glenn Hofmann, Chief Analytics Officer at New York Life Insurance, is no stranger to that work. With over 20 years of global leadership experience in data, analytics, and AI that spans the US, Germany, and South Africa, Glenn knows firsthand what it takes to build an effective data science function within a large organization.

In this episode, we talk about how he built New York Life Insurance’s 50-person data science and AI function, how they utilize skillsets to offer different career paths for data scientists, building relationships across the organization, and so much more.


The healthcare industry presents a set of unique challenges for data science, including how to manage and work with sensitive patient information and accounting for the real-world impact of AI and machine learning on patient care and experience.

Curren Katz, Senior Director for Data Science & Project Management at Johnson & Johnson, believes that despite challenges like these, there are massive opportunities for data science and machine learning to increase care quality, drive business objectives, diagnose diseases earlier, and ultimately save countless lives around the world.

Curren has over 10 years of leadership experience across both the US and Europe and has led more than 20 successful data science product launches in the payer, provider, and pharmaceutical spaces. She also brings her background as a cognitive neuroscientist to data science, with research in neural networks, connectivity analysis, and more.


We talked about:

Daynan’s background
Astronomy vs cosmology
Applications of data science and machine learning in astronomy
Determining signal vs noise
What the data looks like in astronomy
Determining the features of an object in space
Ground truth for space objects
Why water is an important resource in the space economy
Other useful resources that can be found in asteroids
Sources of asteroids
The data team at an asteroid mining company
Open datasets for hobbyists
Mission and hardware design for asteroid mining
Partnerships and hires

Links: 

LinkedIn: https://www.linkedin.com/in/daynan/
We're looking for a Sr Data Engineer: https://boards.eu.greenhouse.io/karmanplus/jobs/4027128101?gh_jid=4027128101
Minor Planet Center: https://minorplanetcenter.net/
JPL Horizons has a nice set of APIs for accessing data related to small bodies (including asteroids): https://ssd.jpl.nasa.gov/api.html
ESA has NEODyS: https://newton.spacedys.com/neodys
IRSA catalog that contains image and catalog data related to the WISE/NEOWISE data (and other infrared platforms): https://irsa.ipac.caltech.edu/frontpage/
NASA also has an archive of data collected from their various missions, including a node related to small bodies: https://pds-smallbodies.astro.umd.edu/
Sub-node directly related to asteroids: https://sbn.psi.edu/pds/
Size, Mass, and Density of Asteroids (SiMDA) is a nice catalog of observed asteroid attributes (and an indication of how small our sample size is!): https://astro.kretlow.de/?SiMDA
The source survey data, several are useful for asteroids: Pan-STARRS (https://outerspace.stsci.edu/display/PANSTARRS)


Today marks the last episode of our four-part DataFramed Careers Series on breaking into a data career. We’ve heard from Sadie St Lawrence, Nick Singh, and Khuyen Tran on best practices to adopt to help you land a data science interview. But what about the interview itself? Today’s guest, Jay Feng, joins the show to break down all the most important things you need to know about interviewing for data science roles. Jay is the co-founder of Interview Query, which helps data scientists, machine learning engineers, and other data professionals prepare for their dream jobs.

Throughout the episode, we discuss

The anatomy of data science interviews
Biggest misconceptions and mistakes candidates make during interviews
The importance of showcasing communication ability, business acumen, and technical intuition in the interview
How to negotiate for the best salary possible


Today is the third episode of this four-part DataFramed Careers series, published every day this week, on building a career in data. We’ve heard from Nick Singh on the importance of portfolio projects, as well as the distinction between content-based and coding-based portfolio projects. When looking to get started with content-based projects, how do you move forward with getting yourself out there and sharing the work despite being a relative beginner in the field? Today’s guest tackles exactly this subject.

Khuyen Tran is a developer advocate at Prefect and a prolific data science writer. She is the author of the book “Efficient Python Tricks and Tools for Data Scientists” and has written hundreds of blog articles and tutorials on key data science topics, amassing thousands of followers across platforms. Her writing has been key to accelerating her data career opportunities. Throughout the episode, we discuss:

How content creation accelerates the careers of aspiring practitioners
The content creation process
How to combat imposter syndrome
What makes content useful
Advice and feedback for aspiring data science writers

Resources mentioned in the episode:

Analyze and Visualize URLs with Network Graph
Show Your Work by Austin Kleon
Mastery by Robert Greene
Deep Questions with Cal Newport Podcast
