talk-data.com

Topic: Dashboard

Tags: data_visualization, reporting, bi

306 tagged activities

Activity Trend

Quarterly activity counts, 2020-Q1 through 2026-Q1, peaking at 23 activities per quarter.

Activities

306 activities · Newest first

Pro DAX and Data Modeling in Power BI: Creating the Perfect Semantic Layer to Drive Your Dashboard Analytics

Develop powerful data models that bind data from disparate sources into a coherent whole. Then extend your data models using DAX, the query language that underpins Power BI, to create reusable measures that deliver finely crafted custom calculations in your dashboards. This book starts off teaching you how to define and enhance the core structures of your data model to make it a true semantic layer that transforms complex data into familiar business terms. You’ll learn how to create calculated columns to solve basic analytical challenges, then move up to mastering DAX measures to finely slice and dice your data. The book also shows how to handle temporal analysis in Power BI using a Date dimension, and how DAX Time Intelligence functions can simplify your analysis of data over time. Finally, the book shows how to extend DAX to filter and calculate datasets and to develop DAX table functions and variables that handle complex queries.

What You Will Learn

Create clear and efficient data models that support in-depth analytics
Define core attributes such as data types and standardized formatting consistently throughout a data model
Define cross-filtering settings to enhance the data model
Make use of DAX to create calculated columns and custom tables
Extend your data model with custom calculations and reusable measures using DAX
Perform time-based analysis using a Date dimension and Time Intelligence functions

Who This Book Is For

Everyone from the CEO to the business intelligence developer, and from BI and data architects and analysts to power users and IT managers, can use this book to outshine the competition and build the data framework and interactive dashboards they need in Power BI.
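The distinction the blurb draws between calculated columns (values stored row by row) and measures (aggregations evaluated in context) is easier to see in code. DAX itself isn’t reproduced in this listing, so the following is a rough pandas analogue of a Date dimension plus a year-to-date calculation; it is not taken from the book, and all table and column names are invented for the sketch.

```python
# Hypothetical pandas analogue of the book's DAX concepts; not from the book.
import pandas as pd

# A miniature fact table and a Date dimension, joined on a date key.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-15", "2024-02-10", "2024-03-05", "2025-01-20"]),
    "amount": [100.0, 250.0, 175.0, 300.0],
})
dates = pd.DataFrame({"date": pd.date_range("2024-01-01", "2025-12-31", freq="D")})
dates["year"] = dates["date"].dt.year

# "Calculated column": a value computed per row and stored on the table.
sales["amount_with_tax"] = sales["amount"] * 1.2

# "Measure": an aggregation evaluated in context; here, year-to-date sales,
# the kind of thing DAX Time Intelligence functions such as TOTALYTD handle.
fact = dates.merge(sales, on="date", how="left").fillna({"amount": 0.0})
fact["sales_ytd"] = fact.groupby("year")["amount"].cumsum()

print(fact.loc[fact["amount"] > 0, ["date", "amount", "sales_ytd"]])
```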

How to not be a terrible writer

As data people, we don't always gravitate towards expressing our voices and opinions in words (because we...typically like the numbers to tell the stories). But just as a beautiful data dashboard can communicate meaningful insight and lead to decision making, words and stories have a way to move people in a way data sometimes can't. Join Justin Gage (Amplify Partners) as he breaks down the importance of writing in a data career, and unpacks the mechanisms needed to become a stronger technical writer and storyteller.

Check the slides here: https://docs.google.com/presentation/d/1-O3Woc3TpE8fG-HYWSAUa7sHcQp1NgrWNmE3lBIn60U/edit

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Data Storytelling with Google Looker Studio

Data Storytelling with Google Looker Studio is your definitive guide to creating compelling dashboards using Looker Studio. In this book, you'll journey through the principles of effective data visualization and learn how to harness Looker Studio to convey impactful data stories. Step by step, you'll acquire the skills to design, build, and refine dashboards using real-world data.

What this book will help me do

Understand and apply data visualization principles to enhance data analysis and storytelling.
Master the features and capabilities of Google Looker Studio for dashboard building.
Learn to use a structured 3D approach - determine, design, and develop - for creating dashboards.
Explore practical examples to apply your knowledge effectively in real projects.
Gain insights into monitoring and measuring the impact of Looker Studio dashboards.

Author(s)

Sireesha Pulipati is an accomplished data analytics professional with extensive experience in business intelligence tools and data visualization. Leveraging her years of expertise, she has crafted this book to empower readers to effectively use Looker Studio. Sireesha's approachable teaching style and practical insights make complex concepts accessible to learners.

Who is it for?

This book is perfect for aspiring data analysts eager to master data visualization and dashboard design. It caters to beginners and requires no prior experience, making it a great starting point. Intermediate and seasoned professionals in analytics and business intelligence who are keen on using Looker Studio effectively will find immense value as well. If you aim to create insightful dashboards and refine your data storytelling skills, this book is for you.

Prevent dbt Changes from Breaking Looker with Spectacles

We'll explain how Spectacles makes dbt developers more confident and efficient by revealing the impact of their proposed changes on Looker—preventing dashboard outages and angry analysts.

Check the slides here: https://docs.google.com/presentation/d/1NOG61IhhN3GkkCe2osIZJ516oT4k_T7oJB3l9kklfyk/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Money, Python, and the Holy Grail: Designing Operational Data Models

Most analysts don’t become analysts to build dashboards. We don’t become analysts to do data pulls, or clean up messy data, or put together pitch decks. We become analysts to do impactful, strategic analysis. This is our calling; it’s the most valuable work that we do; and it’s why we put up with the rest of our job—for that afternoon with nothing but a big question, a clear calendar, and a trajectory-changing aha moment buried somewhere in our well-prepped datasets.

But the rapid rise of analytics engineering should make us question all of this. Is strategic analysis actually the holy grail of analytics? Is it the most valuable thing we could do? Is it even what we want to do?

In chasing this ambition, Benn Stancil (Mode) thinks we’ve lost sight of something even more important—and potentially, more interesting: Designing operational models. These frameworks, which are a natural extension of the semantic models built by analytics engineers, are often more valuable than any dashboard, any dataset, or any deep dive analysis.

In his talk, Benn will share what these models are, why they’re valuable, and why, in our eternal quest to both quantify our value and to find work we love, they could prove to be the holy grail we’ve always been looking for.

Check the slides here: https://docs.google.com/presentation/d/1lOH6Sb8DQnnlmZkYOlqqHgQeXKkUEQCm_LOxsjBRJlM/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

The Book of Dash

A swift and practical introduction to building interactive data visualization apps in Python, known as dashboards. You've seen dashboards before; think election result visualizations you can update in real time, or population maps you can filter by demographic. With the Python Dash library you'll create analytic dashboards that present data in effective, usable, elegant ways in just a few lines of code. The book is fast-paced and caters to those entirely new to dashboards. It will talk you through the necessary software, then get straight into building the dashboards themselves. You'll learn the basic format of a Dash app by building a Twitter analysis dashboard that maps the number of likes certain accounts gained over time. You'll build up skills through three more sophisticated projects. The first is a global analysis app that compares country data in three areas: the percentage of a population using the internet, the percentage of parliament seats held by women, and CO2 emissions. You'll then build an investment portfolio dashboard, and an app that allows you to visualize and explore machine learning algorithms.

In this book you will:

Create and run your first Dash apps
Use the pandas library to manipulate and analyze social media data
Use Git to download and build on existing apps written by the pros
Visualize machine learning models in your apps
Create and manipulate statistical and scientific charts and maps using Plotly

Dash combines several technologies to get you building dashboards quickly and efficiently. This book will do the same.
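None of the book’s code appears in this listing, but the shape of a basic Dash app is easy to sketch. The following is a minimal, hypothetical example in that spirit, assuming the dash, plotly, and pandas packages; the data and component names are invented here, not taken from the book.

```python
# A minimal Dash app: one dropdown filters one line chart (illustrative only).
from dash import Dash, dcc, html, Input, Output
import plotly.express as px
import pandas as pd

# Toy stand-in for the book's country-comparison data.
df = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "internet_pct": [55, 60, 70, 74],
})

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(sorted(df["country"].unique()), "A", id="country"),
    dcc.Graph(id="trend"),
])

@app.callback(Output("trend", "figure"), Input("country", "value"))
def update_trend(country):
    # Re-filter and redraw whenever the dropdown value changes.
    return px.line(df[df["country"] == country], x="year", y="internet_pct")

if __name__ == "__main__":
    app.run(debug=True)  # Dash 2.x; older releases use app.run_server
```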

Summary

Agile methodologies have been adopted by a majority of teams for building software applications. Applying those same practices to data can prove challenging due to the number of systems that need to be included to implement a complete feature. In this episode Shane Gibson shares practical advice and insights from his years of experience as a consultant and engineer working in data about how to adopt agile principles in your data work so that you can move faster and provide more value to the business, while building systems that are maintainable and adaptable.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day, especially once they realize that 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping that precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24×7 live support, makes it consistently voted by users as the leader in the data pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24×7 support.
Your host is Tobias Macey and today I’m interviewing Shane Gibson about how to bring Agile practices to your data management workflows

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what AgileData is and the story behind it?
What are the main industries and/or use cases that you are focused on supporting?
The data ecosystem has been trying on different paradigms from software development for some time now (e.g. DataOps, version control, etc.). What are the aspects of Agile that do and don’t map well to data engineering/analysis?
One of the perennial challenges of data analysis is how to approach data modeling. How do you balance the need to provide value with the long-term impacts of incomplete or underinformed modeling decisions made in haste at the beginning of a project?

How do you design in affordances for refactoring of the data models without breaking downstream assets?

Another aspect of implementing data products/platforms is how to manage permissions and governance. What are the incremental ways that those principles can be incorporated early and evolved along with the overall analytical products?
What are some of the organizational design strategies that you find most helpful when establishing or training a team who is working on data products?
In order to have a useful target to work toward it’s necessary to understand what the data consumers are hoping to achieve. What are some of the challenges of doing requirements gathering for data products? (e.g. not knowing what information is available, consumers not understanding what’s hard vs. easy, etc.)

How do you work with the "customers" to help them understand what a reasonable scope is and translate that to the actual project stages for the engineers?

What are some of the perennial questions or points of confusion that you have had to address with your clients on how to design and implement analytical assets?
What are the most interesting, innovative, or unexpected ways that you have seen agile principles used for data?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on AgileData?
When is agile the wrong choice for a data project?
What do you have planned for the future of AgileData?

Contact Info

LinkedIn
@shagility on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

AgileData OptimalBI How To Make Toast Data Mesh Information Product Canvas DataKitchen

Podcast Episode

Great Expectations

Podcast Episode

Soda Data

Podcast Episode

Google DataStore Unfix.work Activity Schema

Podcast Episode

Data Vault

Podcast Episode

Star Schema Lean Methodology Scrum Kanban

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Atlan

Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right data set? Or tried to understand what a column name means?

Our friends at Atlan started out as a data team themselves and faced all of this collaboration chaos firsthand, so they began building Atlan as an internal tool. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.

Go to dataengineeringpodcast.com/atlan and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.

Prefect

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in gluing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit…

Summary

The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and robust features. MariaDB is one of those default options that has continued to grow and innovate while offering a familiar and stable experience. In this episode field CTO Manjot Singh shares his experiences as an early user of MySQL and MariaDB and explains how the suite of products being built on top of the open source foundation addresses the growing needs for advanced storage and analytical capabilities.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. In just a few clicks, you identify the issue’s root cause, conduct an impact analysis—and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus (OH)), Eric Weber (From Data to Product), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Have you ever built a data-related "thing" — a dashboard, a data catalog, an experimentation platform, even — only to find that, rather than having the masses race to adopt it and use it on a daily basis, it gets an initial surge in usage… and then quietly dies? That's sorta' the topic of this episode. Except that's a pretty clunky and overly narrow summary. Partly, because it's a hard topic to summarize. But, data as a product and data products are the topic, and Eric Weber, the data scientist behind the From Data to Product newsletter, joined us for a discussion that we've been trying to make happen for months. It was worth the wait! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Summary The "data lakehouse" architecture balances the scalability and flexibility of data lakes with the ease of use and transaction support of data warehouses. Dremio is one of the companies leading the development of products and services that support the open lakehouse. In this episode Jason Hughes explains what it means for a lakehouse to be "open" and describes the different components that the Dremio team build and contribute to.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I’m interviewing Jason Hughes about the work that Dremio is doing to support the open lakehouse

Interview

Introduction
How did you get involved in the area of data management?
Can you d

Summary

For any business that wants to stay in operation, the most important thing they can do is understand their customers. American Express has invested substantial time and effort in their Customer 360 product to achieve that understanding. In this episode Purvi Shah, the VP of Enterprise Big Data Platforms at American Express, explains how they have invested in the cloud to power this visibility and the complex suite of integrations they have built and maintained across legacy and modern systems to make it possible.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I’m interviewing Purvi Shah about building the Customer 360 data product for American Express and mig

Summary

Data lineage is something that has grown from a convenient feature to a critical need as data systems have grown in scale, complexity, and centrality to business. Alvin is a platform that aims to provide a low-effort solution for data lineage capabilities, focused on simplifying the work of data engineers. In this episode co-founder Martin Sahlen explains the impact that easy access to lineage information can have on the work of data engineers and analysts, and how he and his team have designed their platform to offer that information to engineers and stakeholders in the places that they interact with data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I’m interviewing Martin Sahlen about his work on data lineage at Alvin and how it factors into the day-to-day work of data engineers

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Alvin is and the story behind it?
What is the core problem that you are trying to solve at Alvin?
Data lineage has quickly become an overloaded term. What are the elements of lineage that you are focused on addressing?

What are some of the other sources/pieces of information that you integrate into the lineage graph?

How does data lineage show up in the work of data engineers?

In what ways does your focus on data engineers inform the way that you model the lineage information?

As with every data asset/product, the lineage graph is only as useful as the data that it stores. What are some of the ways that you focus on establishing and ensuring a complete view of lineage?

How do you account for assets (e.g. tables, dashboards, exports, etc.) that are created outside of the "officially supported" methods? (e.g. someone manually runs a SQL create statement, etc.)

Can you describe how you have implemented the Alvin platform?

How have the design and goals shifted from when you first started exploring the problem?

What are the types of data systems/assets that you are focused on supporting? (e.g. data warehouses vs. lakes, structured vs. unstructured, which BI tools, etc.)
How does Alvin fit into the workflow of data engineers and their downstream customers/collaborators?

What are some of the design choices (both visual and functional) that you focused on to avoid friction in the data engineer’s workflow?

What are some of the open questions/areas for investigation/improvement in the space of data lineage?

What are the factors that contribute to the difficulty of a truly holistic and complete view of lineage across an organization?

What are the most interesting, innovative, or unexpected ways that you have seen Alvin used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Alvin?
When is Alvin the wrong choice?
What do you have planned for the future of Alvin?
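One question above asks how Alvin accounts for assets created outside officially supported paths, such as a manually run SQL CREATE statement. As a toy illustration of the kind of parsing such tools rely on, here is a sketch built with the sqlparse library from this episode’s links; it is not Alvin’s implementation, handles only simple top-level statements, and real column-level lineage needs a full SQL grammar (ANTLR, also linked below, is one option).

```python
# Toy table-level lineage extraction with sqlparse (illustrative only).
import sqlparse
from sqlparse.sql import Identifier, IdentifierList
from sqlparse.tokens import Keyword

SOURCE_KEYWORDS = {"FROM", "JOIN", "INNER JOIN", "LEFT JOIN", "RIGHT JOIN"}

def table_lineage(sql: str):
    """Best-effort (target, sources) for a simple CREATE TABLE ... AS SELECT."""
    statement = sqlparse.parse(sql)[0]
    target, sources, mode = None, [], None
    for token in statement.tokens:
        if token.ttype is Keyword and token.normalized in ("TABLE", "INTO"):
            mode = "target"      # the next identifier is written to
        elif token.ttype is Keyword and token.normalized in SOURCE_KEYWORDS:
            mode = "source"      # the next identifier(s) are read from
        elif token.ttype is Keyword:
            mode = None          # any other keyword closes the current clause
        elif isinstance(token, IdentifierList) and mode == "source":
            sources += [i.get_real_name() for i in token.get_identifiers()]
        elif isinstance(token, Identifier):
            if mode == "target":
                target, mode = token.get_real_name(), None
            elif mode == "source":
                sources.append(token.get_real_name())
    return target, sources

print(table_lineage(
    "CREATE TABLE daily_rev AS SELECT o.day, SUM(p.amount) AS amount "
    "FROM orders o JOIN payments p ON o.id = p.order_id GROUP BY o.day"
))  # expected: ('daily_rev', ['orders', 'payments'])
```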

Contact Info

LinkedIn
@martinsahlen on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Alvin Unacast sqlparse Python library Cython

Podcast.init Episode

Antlr Kotlin programming language PostgreSQL

Podcast Episode

OpenSearch ElasticSearch Redis Kubernetes Airflow BigQuery Spark Looker Mode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary

Data integration from source systems to their downstream destinations is the foundational step for any data product. The increasing expectation for information to be instantly accessible drives the need for reliable change data capture. The team at Fivetran has recently introduced that functionality to power real-time data products. In this episode Mark Van de Wiel explains how they integrated CDC functionality into their existing product and discusses the nuances of different approaches to change data capture from various sources.
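For contrast with the log-based approach discussed in the episode, the simplest flavor of change data capture is query-based polling against a high-water-mark cursor. The sketch below is hypothetical (not Fivetran’s implementation; table and column names are invented) and shows why that approach is attractive but incomplete: it re-queries the source and cannot observe deletes, which is what reading the database’s transaction log solves.

```python
# Hypothetical query-based CDC: poll for rows changed since the last cursor.
import sqlite3
from datetime import datetime, timezone

def pull_changes(conn: sqlite3.Connection, cursor_ts: str):
    """Fetch rows updated since the last sync; return them and the new cursor."""
    rows = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (cursor_ts,),
    ).fetchall()
    new_cursor = rows[-1][2] if rows else cursor_ts  # high-water mark
    return rows, new_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, updated_at TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (1, 9.99, ?)",
    (datetime.now(timezone.utc).isoformat(),),
)

changes, cursor_ts = pull_changes(conn, "1970-01-01T00:00:00+00:00")
print(changes, cursor_ts)  # the replicated rows and the next cursor value
```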

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Your host is Tobias Macey and today I’m interviewing Mark Van de Wiel about Fivetran’s implementation of chang

Summary

In order to improve efficiency in any business you must first know what is contributing to wasted effort or missed opportunities. When your business operates across multiple locations it becomes even more challenging and important to gain insights into how work is being done. In this episode Tommy Yionoulis shares his experiences working in the service and hospitality industries and how that led him to found OpsAnalitica, a platform for collecting and analyzing metrics on multi-location businesses and their operational practices. He discusses the challenges of making data collection purposeful and efficient without distracting employees from their primary duties, and how business owners can use the provided analytics to support their staff in their duties.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data Literacy may be an important skill for everyone to have, but the level of need is always unique to each individual. Some may need advanced technical skills in machine learning algorithms, while others may just need to be able to understand the basics. Regardless of where anyone sits on the skills spectrum, the data community can help accelerate their careers.

There’s no one who knows that better than Kate Strachnyi. Kate is the Founder and Community Manager at DATAcated, a company that is focused on bringing data professionals together and helping data companies reach their target audience through effective content strategies.

Kate has created courses on data storytelling, dashboard and visualization best practices, and she is also the author of several books on data science, including a children’s book about data literacy. Through her professional accomplishments and her content efforts online, Kate has not only built a massive online following, she has also established herself as a leader in the data space.

In this episode, we talk about best practices in data visualization, the importance of technical skills and soft skills for data professionals, how to build a personal brand and overcome Imposter Syndrome, how data literacy can make or break organizations, and much more.

This episode of DataFramed is a part of DataCamp’s Data Literacy Month, where we raise awareness for Data Literacy throughout the month of September through webinars, workshops, and resources featuring thought leaders and subject matter experts that can help you build your data literacy, as well as your organization’s. For more information, visit: https://www.datacamp.com/data-literacy-month/for-teams

Today I’m sitting down with Jon Cooke, founder and CTO of Dataception, to learn his definition of a data product and his views on generating business value with your data products. In our conversation, Jon explains his philosophy on data products and where design and UX fit in. We also review his conceptual model for data products (which he calls the data product pyramid), and discuss how, together, these concepts allow teams to ship working solutions that actually produce value, faster.

Highlights / Skip to:

Jon’s definition of a data product (1:19)
Brian explains how UX research and design planning can and should influence data architecture, so that last-mile solutions are useful and usable (9:47)
The four characteristics of a data product in Jon’s model (16:16)
The idea of products having a lifecycle with direct business/customer interaction/feedback (17:15)
Understanding Jon’s data product pyramid (19:30)
The challenges when customers/users don’t know what they want from data product teams, and who should be doing the work to surface requirements (24:44)
Mitigating risk and the importance of having management buy-in when adopting a product-driven approach (33:23)
Does the data product pyramid account for UX? (35:02)
What needs to change in an org model that produces data products that aren’t delivering good last-mile UXs (39:20)

Quotes from Today’s Episode

“A data product is something that specifically solves a business problem, a piece of analytics, data use case, a pipeline, datasets, dashboard, that type that solves a business use case, and has a customer, and has a product lifecycle to it.” - Jon (2:15)

“I’m a fan of any definition that includes some type of deployment and use by some human being. That’s the end of the cycle, because the idea of a product is a good that has been made, theoretically, for sale.” - Brian (5:50)

“We don’t build a lot of stuff around cloud anymore. We just don’t build it from scratch. It’s like, you know, we don’t generate our own electricity, we don’t mill our own flour. You know, the cloud—there’s a bunch of composable services, which I basically pull together to build my application, whatever it is. We need to apply that thinking all the way through the stack, fundamentally.” - Jon (13:06)

“It’s not a data science problem, it’s not a business problem, it’s not a technology problem, it’s not a data engineering problem, it’s an everyone problem. And I advocate small, multidisciplinary teams, which have a business value person in it, have an SME, have a data scientist, have a data architect, have a data engineer, as a small pod that goes in and answer those questions.” - Jon (26:28)

“The idea is that you’re actually building the data products, which are the back-end, but you’re actually then also doing UX alongside that, you know? You’re doing it in tandem.” - Jon (37:36)

“Feasibility is one of the legs of the stools. There has to be market need, and your market just may be the sales team, but there needs to be some promise of value there that this person is really responsible for at the end of the day, is this data product going to create value or not?” - Brian (42:35)

“The thing about data products is sometimes you don’t know how feasible it is until you actually look at the data…You’ve got to do what we call data archaeology. You got to go and find the data, you got to brush it off, and you’re looking at and go, ‘Is it complete?’” - Jon (44:02)

Links Referenced:

Dataception
Data Product Pyramid
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/jon-cooke-096bb0/

Summary

The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. In this episode Sean Knapp shares his views on what constitutes proper automation and the work that he and his team at Ascend are doing to help make it a reality.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Sean Knapp about the role of data automation in building maintainable systems

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what you mean by the term "data automation" and the assumptions that it includes?
One of the perennial challenges of automation is that there are always steps that are resistant to being performed without human involvement. What are some of the tasks that you have found to be common problems in that sense?
What are the different concerns that need to be included in a stack that supports fully automated data workflows?
There was recently an interesting article suggesting that the "left-to-right" approach to data workflows is backwards. In your experience, what would be required to allow for triggering data processes based on the needs of the data consumers? (e.g. "make sure that this BI dashboard is up to date every 6 hours")
What are the

Today I’m chatting with Emilie Schario, a Data Strategist in Residence at Amplify Partners. Emilie thinks data teams should operate like product teams. But what led her to that conclusion, and how has she put the idea into practice? Emilie answers those questions and more, delving into what kind of pushback and hiccups someone can expect when switching from being data-driven to product-driven and sharing advice for data scientists and analytics leaders.

Highlights / Skip to:

Answering the question “whose job is it” (5:18)
Understanding and solving problems instead of just building features people ask for (9:05)
Emilie explains what Amplify Partners is and talks about her work experience and how it fuels her perspectives on data teams (11:04)
Emilie and I talk about the definition of data product (13:00)
Emilie talks about her approach to building and training a data team (14:40)
We talk about UX designers and how they fit into Emilie’s data teams (18:40)
Emilie talks about the book and blog “Storytelling with Data” (21:00)
We discuss the pushback you can expect when trying to switch a team from being data-driven to being product-driven (23:18)
What hiccups can people expect when switching to a product-driven model (30:36)
Emilie’s advice for data scientists and analytics leaders (35:50)
Emilie explains what Locally Optimistic is (37:34)

Quotes from Today’s Episode

“Our thesis is…we need to understand the problems we’re solving before we start building solutions, instead of just building the things people are asking for.” — Emilie (2:23)

“I’ve seen this approach of flipping the ask on its head—understanding the problem you’re trying to solve—work and be more successful at helping drive impact instead of just letting your data team fall into this widget builder service trap.” — Emilie (4:43)

“If your answer to any problem to me is, ‘That’s not my job,’ then I don’t want you working for me because that’s not what we’re here for. Your job is whatever the problem in front of you that needs to be solved.” — Emilie (7:14)

“I don’t care if you have all of the data in the world and the most talented machine learning engineers and you’ve got the ability to do the coolest new algorithm fancy thing. If it doesn’t drive business impact, it doesn’t matter.” — Emilie (7:52)

“Data is not just a thing that anyone can do. It’s not just about throwing numbers in a spreadsheet anymore. It’s about driving business impact. But part of how we drive business impact with data is making it accessible. And accessible isn’t just giving people the numbers, it’s also communicating with it effectively, and UX is a huge piece of how we do that.” — Emilie (19:57)

“There are no null choices in design. Someone is deciding what some other human—a customer, a client, an internal stakeholder—is going to use, whether it’s a React app, or a Power BI dashboard, or a spreadsheet dump, or whatever it is, right? There will be an experience that is created, whether it is intentionally created or not.” — Brian (20:28)

“People will think design is just putting in colors that match together, like, or spinning the color wheel and seeing what lands. You know, there’s so much more to it. And it is an expertise; it is a domain that you have to develop.” — Emilie (34:58)

Links Referenced:
Blog post by Rifat Majumder
storytellingwithdata.com
Experiencing Data Episode 28 with Cole Nussbaumer Knaflic
locallyoptimistic.com
Twitter: @emilieschario

Today, I chat with Manav Misra, Chief Data and Analytics Officer at Regions Bank. I begin by asking Manav what it was like to come in and implement a user-focused mentality at Regions, driven by his experience in the software industry. Manav details his approach, which included developing a new data product partner role and using effective communication to gradually gain trust and cooperation from all the players on his team. 

Manav then talks about how, over time, he solidified a formal framework for training his team to use this approach, and how his hiring is influenced by a product orientation. We also discuss his definition of a data product at Regions, which I find to be one of the best I’ve heard to date. Today, Regions Bank’s data products are delivering tens of millions of dollars in additional revenue to the bank. Given those results, I also dig into the role of design and designers to better understand who is actually doing the designing of Regions’ data products to make them so successful. Later, I ask Manav what it’s like when designers and data professionals work on the same team and how UX and data visualization design are handled at the bank.

Towards the end, Manav shares what he has learned from his time at Regions and what he would implement in a new organization if starting over. He also expounds on the importance of empowering his team to ask customers the right questions and how a true client/stakeholder partnership has led to Manav’s most successful data products.

Highlights / Skip to:

A brief history of decision science and how it influenced the way data science and analytics work has been done (and unfortunately still is in many orgs) (1:47)
Manav’s philosophy and methods for changing the data science culture at Regions Bank to be product- and user-driven (5:19)
Manav talks about the size of his team and the data product role within it, as well as what he had to do to convince leadership to buy in to the necessity of the data product partner role (10:54)
Quantifying and measuring the value of data products at Regions, and some of his results (which include tens of millions of dollars in additional revenue) (13:05)
What’s a “data product” at Regions? Manav shares his definition (13:44)
Who does the designing of data products at Regions? (17:00)
The challenges and benefits of having a team comprised of both designers and data scientists (20:10)
Lessons Manav has learned from building his team and culture at Regions (23:09)
How Manav coaches his team and gives them the confidence to ask the right questions (27:17)
How true partnership has led to Manav’s most successful data products (31:46)

Quotes from Today’s Episode

Re: how traditional, non-product-oriented enterprises do data work: “As younger people come out of data science programs…that [old] culture is changing. The folks coming into this world now are looking to make an impact and then they want to see what this can do in the real world.” — Manav

On the role of the Data Product Partner: “We brought in people that had both business knowledge as well as the technical knowledge, so with a combination of both they could talk to the ‘internal customers’ of our data products, but they could also talk to the data scientists and our developers and communicate in both directions in order to form that bridge between the two.” — Manav

“There are products that are delivering tens of millions of dollars in terms of additional revenue, or stopping fraud, or any of those kinds of things that the products are designed to address; they’re delivering and over-delivering on the business cases that we created.” — Manav

“The way we define a data product is this: an end-to-end software solution to a problem that the business has. It leverages data and advanced analytics heavily in order to deliver that solution.” — Manav 

“The deployment and operationalization is simply part of the solution. They are not something that we do after; they’re something that we design in from the start of the solution.” — Brian 

“Design is a team sport. And even if you don’t have a titled designer doing the work, if someone is going to use the solution that you made, whether it’s a dashboard, or report, or an email, or notification, or an application, or whatever, there is a design, whether you put intention behind it or not.” — Brian

“As you look at interactive components in your data product, which are, you know, allowing people to ask questions and then get answers, you really have to think through what that interaction will look like, what’s the best way for them to get to the right answers and be able to use that in their decision-making.” — Manav 

“I have really instilled in my team that tools will come and go, technologies will come and go, [and so] you’ll have to have that mindset of constantly learning new things, being able to adapt and take on new ideas and incorporate them in how we do things.” — Manav

Links
Regions Bank: https://www.regions.com/
LinkedIn: https://www.linkedin.com/in/manavmisra/

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

Office of Community Care (OCC) Financial Time Series Forecast - Financial forecasting enhancements to predict claims
CFO Dashboard - Productivity and capability enhancements for financial and audit analytics
Datasets Migrated to the Cloud - Migration of on-prem datasets to the cloud for downstream analytics (includes a supply chain proof of concept)
Data Science Hackathon - A hackathon to predict bad claims codes and demonstrate DAS’s ability to accelerate an ML use case using Databricks AutoML (see the sketch after this list)
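For a sense of what the hackathon item might look like in practice, here is a minimal sketch using the Databricks AutoML Python API. This is illustrative only, not the VA’s actual code: it assumes a Databricks ML runtime (where the ambient `spark` session and the `databricks.automl` package are available), and the table and column names are invented for the example.

```python
from databricks import automl

# Hypothetical training table of historical claims, with a boolean label
# marking claims whose codes were later found to be bad.
claims = spark.table("fsc_das.claims_training")

# Launch an AutoML classification experiment: AutoML tries several model
# families, tunes them, and logs every trial to MLflow automatically.
summary = automl.classify(
    dataset=claims,
    target_col="is_bad_claim_code",  # invented label column
    timeout_minutes=60,              # cap the search time, hackathon-style
)

# The best trial's model can be loaded back from MLflow for scoring.
print(summary.best_trial.model_path)
```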

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.
