talk-data.com

Topic: Data Science

Tags: machine_learning, statistics, analytics

Activity Trend: 68 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1516 activities · Newest first

Today, I chat with Manav Misra, Chief Data and Analytics Officer at Regions Bank. I begin by asking Manav what it was like to come in and implement a user-focused mentality at Regions, driven by his experience in the software industry. Manav details his approach, which included developing a new data product partner role and using effective communication to gradually gain trust and cooperation from all the players on his team. 

Manav then talks about how, over time, he solidified a formal framework for his team to be trained to use this approach and how his hiring is influenced by a product orientation. We also discuss his definition of data product at Regions, which I find to be one of the best I’ve heard to date. Today, Regions Bank’s data products are delivering tens of millions of dollars in additional revenue to the bank. Given those results, I also dig into the role of design and designers to better understand who is actually doing the designing of Regions’ data products to make them so successful. Later, I ask Manav what it’s like when designers and data professionals work on the same team and how UX and data visualization design are handled at the bank.

Towards the end, Manav shares what he has learned from his time at Regions and what he would implement in a new organization if starting over. He also expounds on the importance of empowering his team to ask customers the right questions and how a true client/stakeholder partnership has led to Manav’s most successful data products.

Highlights / Skip to:

Brief history of decision science and how it influenced the way data science and analytics work has been done (and unfortunately still is in many orgs) (1:47)
Manav’s philosophy and methods for changing the data science culture at Regions Bank to being product and user-driven (5:19)
Manav talks about the size of his team and the data product role within the team as well as what he had to do to convince leadership to buy in to the necessity of the data product partner role (10:54)
Quantifying and measuring the value of data products at Regions and some of his results (which include tens of millions of dollars in additional revenue) (13:05)
What’s a “data product” at Regions? Manav shares his definition (13:44)
Who does the designing of data products at Regions? (17:00)
The challenges and benefits of having a team comprised of both designers and data scientists (20:10)
Lessons Manav has learned from building his team and culture at Regions (23:09)
How Manav coaches his team and gives them the confidence to ask the right questions (27:17)
How true partnership has led to Manav’s most successful data products (31:46)

Quotes from Today’s Episode

Re: how traditional, non-product oriented enterprises do data work: “As younger people come out of data science programs…that [old] culture is changing. The folks coming into this world now are looking to make an impact and then they want to see what this can do in the real world.” — Manav

On the role of the Data Product Partner: “We brought in people that had both business knowledge as well as the technical knowledge, so with a combination of both they could talk to the ‘internal customers’ of our data products, but they could also talk to the data scientists and our developers and communicate in both directions in order to form that bridge between the two.” — Manav

“There are products that are delivering tens of millions of dollars in terms of additional revenue, or stopping fraud, or any of those kinds of things that the products are designed to address, they’re delivering and over-delivering on the business cases that we created.” — Manav 

“The way we define a data product is this: an end-to-end software solution to a problem that the business has. It leverages data and advanced analytics heavily in order to deliver that solution.” — Manav 

“The deployment and operationalization is simply part of the solution. They are not something that we do after; they’re something that we design in from the start of the solution.” — Brian 

“Design is a team sport. And even if you don’t have a titled designer doing the work, if someone is going to use the solution that you made, whether it’s a dashboard, or report, or an email, or notification, or an application, or whatever, there is a design, whether you put intention behind it or not.” — Brian

“As you look at interactive components in your data product, which are, you know, allowing people to ask questions and then get answers, you really have to think through what that interaction will look like, what’s the best way for them to get to the right answers and be able to use that in their decision-making.” — Manav 

“I have really instilled in my team that tools will come and go, technologies will come and go, [and so] you’ll have to have that mindset of constantly learning new things, being able to adapt and take on new ideas and incorporate them in how we do things.” — Manav

Links:
Regions Bank: https://www.regions.com/
LinkedIn: https://www.linkedin.com/in/manavmisra/

Effective Data Science Infrastructure

Simplify data science infrastructure to give data scientists an efficient path from prototype to production.

In Effective Data Science Infrastructure you will learn how to:
Design data science infrastructure that boosts productivity
Handle compute and orchestration in the cloud
Deploy machine learning to production
Monitor and manage performance and results
Combine cloud-based tools into a cohesive data science environment
Develop reproducible data science projects using Metaflow, Conda, and Docker
Architect complex applications for multiple teams and large datasets
Customize and grow data science infrastructure

Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting-edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open-source software, and idiomatic Python. The author is donating proceeds from this book to charities that support women and underrepresented groups in data science.

About the Technology
Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises.

About the Book
Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. Based on state-of-the-art tools and concepts that power the data operations of Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems.

What's Inside
Handle compute and orchestration in the cloud
Combine cloud-based tools into a cohesive data science environment
Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem
Architect complex applications that require large datasets and models, and a team of data scientists

About the Reader
For infrastructure engineers and engineering-minded data scientists who are familiar with Python.

About the Author
At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure.

Quotes
“By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive.” - From the Foreword by Travis Oliphant, Author of NumPy, Founder of Anaconda, PyData, and NumFOCUS
“Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team.” - Ninoslav Cerkez, Logit
“More data science. Less headaches.” - Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico
“Indispensable. A copy should be on every data engineer’s bookshelf.” - Matthew Copple, Grand River Analytics
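Since the book is built around Metaflow, here is a minimal sketch of what a Metaflow workflow looks like, assuming a local Metaflow installation; the flow name, steps, and toy "training" logic are invented for illustration and are not taken from the book.

```python
from metaflow import FlowSpec, step


class ToyTrainFlow(FlowSpec):
    """A toy flow: load data, compute a stand-in 'model', report a result."""

    @step
    def start(self):
        # Hypothetical in-memory data; a real flow would read from S3 or a warehouse.
        self.data = [1, 2, 3, 4, 5]
        self.next(self.train)

    @step
    def train(self):
        # Stand-in for model training; Metaflow versions this artifact per run.
        self.mean = sum(self.data) / len(self.data)
        self.next(self.end)

    @step
    def end(self):
        print(f"finished with mean = {self.mean}")


if __name__ == "__main__":
    ToyTrainFlow()
```

Saved as, say, toy_train_flow.py, it would typically be run with `python toy_train_flow.py run`.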

Many machine learning practitioners dedicate most of their attention to creating and deploying models that solve business problems. However, what happens post-deployment? And how should data teams go about monitoring models in production?

Hakim Elakhrass is the Co-Founder and CEO of NannyML, an open-source Python library that allows users to estimate post-deployment model performance, detect data drift, and link data drift alerts back to model performance changes. Originally, Hakim started a machine learning consultancy with his NannyML co-founders, and the need for monitoring quickly arose, leading to the development of NannyML.

Hakim joins the show to discuss post-deployment data science, the real-world use cases for tools like NannyML, the potentially catastrophic effects of unmonitored models in production, the most important skills for modern data scientists to cultivate, and more.
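The kind of drift detection discussed here can be illustrated with a population stability index (PSI) check; the sketch below is a generic NumPy illustration of the concept rather than NannyML's actual API, and the feature values and drift threshold are invented for the example.

```python
import numpy as np


def population_stability_index(reference, production, bins=10):
    """Rough PSI between a reference (training-time) sample and production data.

    PSI = sum((p_prod - p_ref) * ln(p_prod / p_ref)) over shared histogram bins.
    Values above roughly 0.2 are often treated as a sign of meaningful drift.
    """
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_counts, _ = np.histogram(reference, bins=edges)
    prod_counts, _ = np.histogram(production, bins=edges)

    eps = 1e-6  # avoids division by zero / log(0) for empty bins
    p_ref = ref_counts / max(ref_counts.sum(), 1) + eps
    p_prod = prod_counts / max(prod_counts.sum(), 1) + eps
    return float(np.sum((p_prod - p_ref) * np.log(p_prod / p_ref)))


# Hypothetical feature: the production distribution has shifted upward.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
production = rng.normal(0.5, 1.0, 10_000)
print(f"PSI = {population_stability_index(reference, production):.3f}")
```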

We talked about:

DataTalks.Club intro
Tereza’s background
Working as a coach
Identifying the mismatches between your needs and those of a company
How to avoid misalignments
Considering what’s mentioned in the job description, what isn’t, and why
Diversity and culture of a company
Lack of a salary in the job description
Ways of doing research about the company where you will potentially work
How to avoid a mismatch with a company other than learning from your mistakes
Before data, during data, after data (a company’s data maturity level)
The company’s tech stack
Finding Tereza online

Links: 

Decoding Data Science Job Descriptions (talk): https://www.youtube.com/watch?v=WAs9vSNTza8
Talk at ConnectForward: https://www.youtube.com/watch?v=WAs9vSNTza8
Slides: https://www.slideshare.net/terezaif/decoding-data-science-job-descriptions-250687704
Talk at DataLift: https://www.youtube.com/watch?v=pCtQ0szJiLA
Slides: https://www.slideshare.net/terezaif/lessons-learned-from-hiring-and-retaining-data-practitioners

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Professional sports have undergone a true data revolution over the last two decades. Today, all major sports teams, regardless of sports code, use analytics and data science to drive team performance, optimise game outcomes and scout young talent. Why has analytics become so popular in professional sports and how does it help drive a competitive edge? To answer these questions and many more relating to sports analytics, I recently spoke to Ari Kaplan. Ari has spent more than three decades using analytics to measure and understand human ability, scout future superstars and win professional sports titles. He is known as “The Real Moneyball Guy” because of his work in baseball and his involvement in making the Hollywood classic Moneyball. Today, Ari is Global AI Evangelist at DataRobot.

Listen to this episode of Leaders of Analytics to learn:
How Ari became “the Real Moneyball Guy”
The analytics the Chicago Cubs used to break a 108-year drought by winning the World Series in 2016
The evolution of analytics and data science in sports
What the business world can learn from sports in terms of using analytics to gain a competitive edge
Where sports analytics is going in the future, and much more.

Python for Data Science

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You’ll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support. You will discover Python’s rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.
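To give a flavor of the load/group/aggregate workflow the book walks through, here is a minimal pandas sketch; the column names and figures are invented for this example rather than taken from the book.

```python
import pandas as pd

# Hypothetical sales records; columns are made up for the example.
orders = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["tea", "coffee", "tea", "tea", "coffee"],
    "revenue": [120.00, 340.50, 95.00, 180.25, 410.00],
})

# Group and aggregate: total and average revenue per region.
summary = (
    orders.groupby("region")["revenue"]
    .agg(total="sum", average="mean")
    .reset_index()
)
print(summary)
```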

One of the biggest challenges facing the adoption of machine learning and AI in Data Science is understanding, interpreting, and explaining models and their outcomes to produce higher certainty, accountability, and fairness.

Serg Masis is a Climate & Agronomic Data Scientist at Syngenta and the author of the book, Interpretable Machine Learning with Python. For the last two decades, Serg has been at the confluence of the internet, application development, and analytics. Serg is a true polymath. Before his current role, he co-founded a search engine startup incubated by Harvard Innovation Labs, was the proud owner of a Bubble Tea shop, and more.

Throughout the episode, Serg spoke about the different challenges affecting model interpretability in machine learning, how bias can produce harmful outcomes in machine learning systems, the different types of technical and non-technical solutions to tackling bias, the future of machine learning interpretability, and much more.

We talked about:

Christine’s background
Private sector vs public sector
Public policy
The challenges of being a community organizer
How public policy relates to political science
Programs that teach data science for public policy
Data science for public policy vs regular data science
The importance of ethical data science in public policy
How data science in social impact projects differs from other projects
Other resources to learn about data science for public policy
Challenges with getting data in data science for public policy
The problems with accessing public datasets about recycling
Christine’s potential projects after her Master’s degree
Gender inequality in STEM fields
Corporate responsibility and why organizations need social impact data scientists
What you need to start making a social impact with data science
80,000 Hours
Other use cases for public policy data science
Coffee, Ethics & AI
Finding Christine online

Links:

Explore some Data Science for Social Good projects: http://www.dssgfellowship.org/projects/
Bi-weekly Ethics in AI Coffee Chat: https://www.meetup.com/coffee-ethics-ai/
Make a Social Impact with your Job: https://tinyurl.com/80khours
Course in Data Ethics: https://ethics.fast.ai/
Data Science for Social Good Berlin: https://dssg-berlin.org/
CorrelAid: https://correlaid.org/
DataKind: https://www.datakind.org/
Christine's LinkedIn: https://www.linkedin.com/in/christinecepelak/
Christine's Twitter: https://twitter.com/CLcep

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Katie was a founding member of Reddit's data science team and, currently, as Twitter's Data Science Manager, she leads the company's infrastructure data science and analytics organization. In this conversation with Tristan and Julia, Katie explores how, as a manager, to help data people (especially those new to the field!) do their best work. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com.  The Analytics Engineering Podcast is sponsored by dbt Labs.

Anjali Samani, Director of Data Science & Data Intelligence at Salesforce, joins the show to discuss what it takes to become a mature data organization and how to build an impactful, diverse data team. As a data leader with over 15 years of experience, Anjali is an expert at assessing and deriving maximum value from data and at implementing long-term and short-term strategies that directly enable positive business outcomes, and she shares how you can do the same.

You will learn the hallmarks of a mature data organization, how to measure ROI on data initiatives, how Salesforce implements its data science function, and how you can utilize strong relationships to develop trust with internal stakeholders and your data team.

We talked about:

Olga’s career journey
Hiring data scientists now vs 7 years ago
The two qualities of an excellent data scientist
What makes Alexey do this podcast
How Alexey gets the latest information on data science
How Olga checks a candidate’s technical skills
How to make an answer stand out (showing your depth of knowledge)
A strong mathematical background vs a strong engineering background
When AutoML will replace the need to have data scientists
Should data scientists transition into management? (the importance of communication in an organization)
Switching from a data analyst role to a data scientist role
Attracting female talent in data science
Changing a job description to find talent
Long gaps in the CV
The “eierlegende Wollmilchsau” (the German idiom for a do-it-all candidate)

Links:

Olga's LinkedIn: https://www.linkedin.com/in/olgaivina/
Olga's Twitter: https://twitter.com/olgaivina

MLOps Zoomcamp: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Building a Lakehouse for Data Science at DoorDash

DoorDash was using a data warehouse but found that they needed more data transparency, lower costs, and the ability to handle streaming data as well as batch data. With an engineering team rooted in big data backgrounds at Uber and LinkedIn, they moved to a Lakehouse architecture intuitively, without knowing the term. In this session, learn more about how they arrived at that architecture, the process of making the move, and the results they have seen. While the lakehouse serves both data analysts and data scientists, this session will focus on their machine learning operations and how their efficiencies are enabling them to tackle more advanced use cases such as NLP and image classification.


Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

Office of Community Care (OCC) Financial Time Series Forecast - Financial forecasting enhancements to predict claims
CFO Dashboard - Productivity and capability enhancements for financial and audit analytics
Datasets Migrated to the Cloud - Migration of on-prem datasets to the cloud for downstream analytics (includes a supply chain proof-of-concept)
Data Science Hackathon - A hackathon to predict bad claims codes and demonstrate DAS abilities to accelerate an ML use case using Databricks AutoML

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.


Migrating Complex SAS Processes to Databricks - Case Study

Many federal agencies use SAS software for critical operational data processes. While SAS has historically been a leader in analytics, it has often been used by data analysts for ETL purposes as well. However, the demands of modern data science on ever-increasing volumes and types of data require a shift to modern cloud architectures and to new data management tools and paradigms for ETL/ELT. In this presentation, we will provide a case study from the Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology (Spark SQL and Databricks) to produce results ~75% faster, without reliance on proprietary constructs of the SAS language, with more scalability, and in a manner that can more easily ingest old rules and better govern the inclusion of new rules and data definitions. Significant technical and business benefits derived from this modernization effort are described in this session.
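To make the flavor of such a migration concrete, here is a minimal, hypothetical PySpark sketch of the kind of SQL-centric ETL step that might replace a legacy SAS job; the table, columns, and paths are invented for illustration and are not taken from the CMS case study.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("claims_etl_sketch").getOrCreate()

# Hypothetical input: a CSV extract of claims; in practice this would likely be a Delta table.
claims = spark.read.csv("/data/claims.csv", header=True, inferSchema=True)
claims.createOrReplaceTempView("claims")

# The business rule expressed once in open-standard Spark SQL rather than proprietary SAS steps.
monthly_totals = spark.sql("""
    SELECT provider_id,
           date_trunc('month', service_date) AS service_month,
           SUM(paid_amount)                  AS total_paid
    FROM claims
    WHERE claim_status = 'APPROVED'
    GROUP BY provider_id, date_trunc('month', service_date)
""")

monthly_totals.write.mode("overwrite").parquet("/data/claims_monthly_totals")
```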


Announcing General Availability of Databricks Terraform Provider

We all live in the exciting times and hype of the Distributed Data Mesh (or just mess). This talk will cover a couple of architectural and organizational approaches to achieving a Distributed Data Mesh, which is essentially a combination of mindset, fully automated infrastructure, continuous integration for data pipelines, dedicated team collaborative environments, and security enforcement. As a Data Leader, you’ll learn what kinds of things you need to pay attention to when starting (or reviving) a modern Data Engineering and Data Science strategy, and how Databricks Unity Catalog may help you automate that. As a DevOps engineer, you’ll learn about the best practices and pitfalls of Continuous Deployment on Databricks with Terraform and Continuous Integration with Databricks Repos, and how you can automate Data Security with Unity Catalog and Terraform. As a Data Scientist, you’ll learn how you can get relevant infrastructure into “production” relatively faster.


Beyond Daily Batch Processing: Operational Trade-Offs of Microbatch, Incremental, and Real-Time

Are you considering converting some daily batch pipelines to a real-time system? Perhaps restating multiple days of batch data is becoming unscalable for your pipelines. Maybe a short SLA is music to your stakeholders' ears. If you're Flink-curious, or possibly just sick of pondering your late-arriving data, this discussion is for you.

On the Streaming Data Science and Engineering team at Netflix, we support business-critical daily batch, hourly batch, incremental, and real-time pipelines with a rotating on-call system. In this presentation I'll discuss the trade-offs we experience between these systems, with an emphasis on operational support when things go sideways. I'll also share some learnings about "goodness of fit" per processing type amongst various workloads, with an eye toward keeping your data timely and your colleagues sane.


Building and Scaling Machine Learning-Based Products in the World's Largest Brewery

In this session we will present how Anheuser-Busch InBev (Brazil) has been developing and growing an ML platform product to democratize and evolve AI usage across the full company. Our cutting-edge intelligence product offers a set of tools and processes to facilitate everything from exploratory data analysis to the development of state-of-the-art machine learning algorithms. We designed a simple, scalable, and performant product that covers the full data science/machine learning lifecycle, with process abstraction, a feature store, promptness to production, and pipeline orchestration. Today we sustain and continually evolve a solution that is used by cross-functional teams in several countries, helps data scientists create their solutions in a cooperative setting, and supports data engineers in monitoring the model pipelines.


Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security

BlockFi is a cryptocurrency platform that allows its clients to grow wealth through various financial products including loans, trading and interest accounts. In this presentation, we will showcase our journey adopting Databricks to build an operational nerve center for analytics across the company. We will demonstrate how to build a cross-functional organization and solve key business problems to earn executive buy-in. We will showcase two of the early successes we've had using machine learning & data science to solve key business challenges in the domains of cyber security and IT operations. In the security domain, we will showcase how we are using graph analytics to analyze millions of blockchain transactions to identify dust attacks and account takeover and to flag risky transactions. The operational IT use case will showcase how we are using SARIMAX to forecast platform usage patterns to scale our infrastructure, using hourly crypto prices and financial indicators.
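As a rough illustration of the SARIMAX-with-exogenous-regressors approach mentioned above, here is a minimal statsmodels sketch; the synthetic usage series, seasonal period, and model orders are invented for the example and are not BlockFi's actual configuration.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic hourly platform-usage series with a daily (24-hour) cycle,
# plus a hypothetical exogenous regressor standing in for an hourly crypto price.
rng = np.random.default_rng(42)
hours = pd.date_range("2022-01-01", periods=24 * 14, freq="H")
price = pd.Series(100 + np.cumsum(rng.normal(0, 1, len(hours))), index=hours)
usage = pd.Series(
    50
    + 10 * np.sin(2 * np.pi * np.arange(len(hours)) / 24)  # daily seasonality
    + 0.3 * price.values
    + rng.normal(0, 2, len(hours)),
    index=hours,
)

# Fit a small seasonal ARIMA with the price series as an exogenous input.
model = SARIMAX(usage, exog=price, order=(1, 0, 1), seasonal_order=(1, 0, 1, 24))
result = model.fit(disp=False)

# Forecast the next 24 hours, assuming the price simply holds at its last value.
future_index = pd.date_range(hours[-1], periods=25, freq="H")[1:]
future_price = pd.Series([price.iloc[-1]] * 24, index=future_index)
print(result.forecast(steps=24, exog=future_price))
```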


Practical Data Governance in a Large Scale Databricks Environment

Learn from two governance and data practitioners what it takes to do data governance at enterprise scale. This is critical, since the power of data science is the ability to tap into any type of data source and turn it into pure value. That power is often at odds with its key enablers, scale and governance, and we must constantly find new ways to bring our focus back to unlocking the insights inside the data. In this session, we will share new agile practices for rolling out governance policies that balance governance and scale. We will show how to deliver centralized, fine-grained governance for ML and data transformation workloads that actually empowers data scientists, in an enterprise Databricks environment that ensures privacy and compliance across hundreds of datasets. With automation being key to scale, we will also explore how we successfully automated security and governance.


Competitive advantage hinges on predictive insights generated from AI! Build powerful data-driven

AI is central to unlocking competitive advantage. However, data science teams often don’t have access to the consistent, high-quality data required to build AI & ML data applications.

Instead, data scientists spend 80% of their time collecting, cleaning & preparing the data for analysis rather than building AI data applications.

During this talk Snowplow introduces the concept of data creation: creating and deploying high-quality, predictive behavioral data to Databricks in real time.

Learn how being equipped with AI-ready data in Databricks allows data science teams to focus on building AI data applications rather than data wrangling, dramatically accelerating the pace of data projects, improving model performance, and managing data governance:
- How to execute more AI & data-intensive applications in production using Databricks & Snowplow
- How to execute on each AI & data-intensive application faster thanks to pre-validated & predictive data
- How data creation can solve for data governance
