talk-data.com

Topic: Data Governance
Tags: data_management · compliance · data_quality
417 activities tagged

Activity Trend: peak of 90 activities per quarter, 2020-Q1 through 2026-Q1

Activities: 417 · Newest first

Summary

Working with financial data requires a high degree of rigor due to the numerous regulations involved and the risks posed by security breaches. In this episode Andrey Korchak, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Your host is Tobias Macey and today I'm interviewing Andrey Korchak about how to manage data in a fintech environment.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by summarizing the data challenges that are particular to the fintech ecosystem?

What are the primary sources and types of data that fintech organizations are working with?

What are the business-level capabilities that are dependent on this data?

How do the regulatory and business requirements influence the technology landscape in fintech organizations?

What does a typical build vs. buy decision process look like?

Fraud prediction in e.g. banks is one of the most well-established applications of machine learning in industry. What are some of the other ways that ML plays a part in fintech?

How does that influence the architectural design/capabilities for data platforms in those organizations?

Data governance is a notoriously challenging problem. What are some of the strategies that fintech companies are able to apply to this problem given their regulatory burdens?

What are the most interesting, innovative, or unexpected approaches to data management that you have seen in the fintech sector?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on data in fintech?

What do you have planned for the future of your data capabilities at Monite?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.

Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Monite · ISO 27001 · Tesseract · GitOps · SWIFT Protocol

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Starburst:

This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, Starburst runs petabyte-scale SQL analytics fast at a fraction of the cost of traditional methods, helping you meet all your data needs ranging from AI/ML workloads to data applications to complete analytics.

Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance, allowing you to discover, transform, govern, and secure all in one place. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free: dataengineeringpodcast.com/starburst

RudderStack:

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize:

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.
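
To make the "always up-to-date" model concrete, here is a minimal sketch, assuming a running Materialize instance and hypothetical connection details and table names (Materialize speaks the PostgreSQL wire protocol, so a standard Postgres driver works):

    import psycopg2

    # Hypothetical DSN; 6875 is Materialize's conventional port.
    conn = psycopg2.connect("postgresql://materialize@localhost:6875/materialize")
    conn.autocommit = True

    with conn.cursor() as cur:
        # Define the query once; the incremental engine keeps the result
        # current as new rows arrive, instead of re-running a batch job.
        cur.execute("""
            CREATE MATERIALIZED VIEW order_totals AS
            SELECT customer_id, SUM(amount) AS lifetime_value
            FROM orders
            GROUP BY customer_id
        """)

        # Reads are cheap lookups against the maintained result, not
        # recomputations over the full order history.
        cur.execute("SELECT * FROM order_totals WHERE lifetime_value > 1000")
        for customer_id, lifetime_value in cur.fetchall():
            print(customer_id, lifetime_value)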

Go to materialize.com today and get 2 weeks free!

Support Data Engineering Podcast

Architecting a Modern Data Warehouse for Large Enterprises: Build Multi-cloud Modern Distributed Data Warehouses with Azure and AWS

Design and architect new generation cloud-based data warehouses using Azure and AWS. This book provides an in-depth understanding of how to build modern cloud-native data warehouses, as well as their history and evolution. The book starts by covering foundational data warehouse concepts, and introduces modern features such as distributed processing, big data storage, data streaming, and processing data on the cloud. You will gain an understanding of the synergy, relevance, and usage of data warehousing standard practices in the modern world of distributed data processing. The authors walk you through the essential concepts of Data Mesh, Data Lake, Lakehouse, and Delta Lake. And they demonstrate the services and offerings available on Azure and AWS that deal with data orchestration, data democratization, data governance, data security, and business intelligence. After completing this book, you will be ready to design and architect enterprise-grade, cloud-based modern data warehouses using industry best practices and guidelines.

What You Will Learn

  • Understand the core concepts underlying modern data warehouses
  • Design and build cloud-native data warehouses
  • Gain a practical approach to architecting and building data warehouses on Azure and AWS
  • Implement modern data warehousing components such as Data Mesh, Data Lake, Delta Lake, and Lakehouse
  • Process data through pandas and evaluate your model’s performance using metrics such as F1-score, precision, and recall
  • Apply deep learning to supervised, semi-supervised, and unsupervised anomaly detection tasks for tabular datasets and time series applications

Who This Book Is For

Experienced developers, cloud architects, and technology enthusiasts looking to build cloud-based modern data warehouses using Azure and AWS

How are video games collecting data from players? What kind of information is useful to the #videogame #company? How is the video game #data #governed? Find out the answer to all these questions and more as Marie Leseleuc, Analytics and Data Science Director at Eidos-Montreal, joins us to talk about moving data in the video game industry on this latest #podcast #episode of Data Unchained!

#datagovernance #datascience #dataanalytics #dataengineers #international #remotework

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Data leaders must prepare their teams to deliver the timely, accurate, and trustworthy data that GenAI initiatives need to ensure they deliver results. They can do so by modernizing their environments, extending data governance programs, and fostering collaboration with data science teams. Published at: https://www.eckerson.com/articles/the-data-leader-s-guide-to-generative-ai-part-i-models-applications-and-pipelines

Karim Wadie: Don’t Worry About BigQuery PII: How Google Cloud Helped a Large Telco Automate Data Governance at Scale

Discover how Google Cloud helped a large Telco automate data governance at scale in BigQuery with Karim Wadie. 📊🔒 Learn about the technical solution, GCP concepts, and see a live demo to fast-track your cloud journey. 🌐💡 #DataGovernance #BigQuery #googlecloud

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Effective data management has become a cornerstone of success in our digital era. It involves not just collecting and storing information but also organizing, securing, and leveraging data to drive progress and innovation. Many organizations turn to tools like Snowflake for advanced data warehousing capabilities. However, while Snowflake enhances data storage and access, it's not a complete solution for all data management challenges. To address this, tools like Capital One's Slingshot can be used alongside Snowflake, helping to optimize costs and refine data management strategies.

Salim Syed is VP and Head of Engineering for Capital One's Slingshot product. He led Capital One's data warehouse migration to AWS and is a specialist in deploying Snowflake to a large enterprise. Salim's expertise lies in developing Big Data (Lake) and Data Warehouse strategy on the public cloud. He leads an organization of more than 100 data engineers, support engineers, DBAs, and full stack developers in driving enterprise data lake, data warehouse, data management, and visualization platform services. Salim has more than 25 years of experience in the data ecosystem. His career started in data engineering, where he built data pipelines and then moved into maintenance and administration of large database servers using multi-tier replication architecture in various remote locations. He then worked at CodeRye as a database architect and at 3M Health Information Systems as an enterprise data architect. Salim has been at Capital One for the past six years.

In this episode, Adel and Salim explore cloud data management and the evolution of Slingshot into a major multi-tenant SaaS platform, the shift from on-premise to cloud-based data governance, the role of centralized tooling, strategies for effective cloud data management (including data governance, cost optimization, and waste reduction), as well as insights into navigating the complexities of data infrastructure, security, and scalability in the modern digital era.

Links Mentioned in the Show: Capital One Slingshot · Snowflake · Course: Introduction to Data Warehousing · Course: Introduction to Snowflake

Our industry’s breathless hype about generative AI tends to overlook the stubborn challenge of data governance. Data catalogs address this challenge by evaluating and controlling the accuracy, explainability, privacy, IP friendliness, and fairness of GenAI inputs. Published at: https://www.eckerson.com/articles/generative-ai-needs-vigilant-data-cataloging-and-governance

Poor data engineering is like building a shaky foundation for a house—it leads to unreliable information, wasted time and money, and even legal problems, making everything less dependable and more troublesome in our digital world. In the retail industry specifically, data engineering is particularly important for managing and analyzing large volumes of sales, inventory, and customer data, enabling better demand forecasting, inventory optimization, and personalized customer experiences. It helps retailers make informed decisions, streamline operations, and remain competitive in a rapidly evolving market. Insights and frameworks learned from data engineering practices can be applied to a multitude of people and problems, and in turn, learning from someone who has been at the forefront of data engineering is invaluable.

Mohammad Sabah is SVP of Engineering and Data at Thrive Market, and was appointed to this role in 2018. He joined the company from The Honest Company, where he served as VP of Engineering & Chief Data Scientist. Sabah joined The Honest Company following its acquisition of Insnap, which he co-founded in 2015. Over the course of his career, Sabah has held various data science and engineering roles at companies including Facebook, Workday, Netflix, and Yahoo!

In the episode, Richie and Mo explore the importance of using AI to identify patterns and proactively address common errors, the use of tools like dbt and SODA for data pipeline abstraction and stakeholder involvement in data quality, data governance and data quality as foundations for strong data engineering, validation layers at each step of the data pipeline to ensure data quality, collaboration between data analysts and data engineers for holistic problem-solving and reusability of patterns, ownership mentality in data engineering, and much more.

Links from the show: PagerDuty · Domo · Opsgenie · Career Track: Data Engineer

Data Engineering with AWS - Second Edition

Learn data engineering and modern data pipeline design with AWS in this comprehensive guide! You will explore key AWS services like S3, Glue, Redshift, and QuickSight to ingest, transform, and analyze data, and you'll gain hands-on experience creating robust, scalable solutions.

What this Book will help me do

  • Understand and implement data ingestion and transformation processes using AWS tools.
  • Optimize data for analytics with advanced AWS-powered workflows.
  • Build end-to-end modern data pipelines leveraging cutting-edge AWS technologies.
  • Design data governance strategies using AWS services for security and compliance.
  • Visualize data and extract insights using Amazon QuickSight and other tools.

Author(s)

Gareth Eagar is a Senior Data Architect with over 25 years of experience in designing and implementing data solutions across various industries. He combines his deep technical expertise with a passion for teaching, aiming to make complex concepts approachable for learners at all levels.

Who is it for?

This book is intended for current or aspiring data engineers, data architects, and analysts seeking to leverage AWS for data engineering. It suits beginners with a basic understanding of data concepts who want to gain practical experience as well as intermediate professionals aiming to expand into AWS-based systems.

Mastering Tableau 2023 - Fourth Edition

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau.

What this Book will help me do

  • Master the latest Tableau 2023 features and use cases to tackle analytics challenges.
  • Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation.
  • Integrate Tableau with programming languages such as Python and R to enhance analytics.
  • Create engaging, visually impactful dashboards for effective data storytelling.
  • Understand and apply data governance to ensure data quality and compliance.

Author(s)

Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing.

Who is it for?

This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

On this latest episode of the Data Unchained podcast, Lauren Maffeo joins us to talk about her book, 'Data Governance from the Ground Up.' We also talk about controlling what's fed into AI engines and six steps for preparing for data governance.

#data #datascience #podcast #datagovernance #ai #artificialintelligence #author


Data Swamp to Data Swimming Pool: Cleaning up the Mess - Maggie Hays & John Joyce, Acryl Data

This talk was recorded at Crunch Conference 2022. Maggie and John from Acryl Data discuss going from data swamp to data swimming pool: cleaning up the mess with modern data governance.

"In this talk, we will present a practical approach to supercharging your data governance initiative by surfacing and leveraging metadata readily available within your data ecosystem."

The event was organized by Crafthub.

You can watch the rest of the conference talks on our channel.

If you are interested in more speakers, tickets and details of the conference, check out our website: https://crunchconf.com/ If you are interested in more events from our company: https://crafthub.events/

Generative AI is here to stay—even in the 8 months since the public release of ChatGPT, there is an abundance of AI tools to help make us more productive at work and ease the stress of planning and executing our daily lives, among other things. Already, many of us are wondering what is to come in the next 8 months, the next year, and the next decade of AI's evolution. In the grand scheme of things, this really is just the beginning. But what should we expect in this Cambrian explosion of technology? What are the use cases being developed behind the scenes? What do we need to be mindful of when training the next generations of AI? Can we combine multiple LLMs to get better results?

Bal Heroor is CEO and Principal at Mactores and has led over 150 business transformations driven by analytics and cutting-edge technology. His team at Mactores is researching and building AI, AR/VR, and quantum computing solutions for businesses to gain a competitive advantage. Bal is also the co-founder of Aedeon, the first hyper-scale marketplace for data analytics and AI talent.

In the episode, Richie and Bal explore common use cases for generative AI, how it's evolving to solve enterprise problems, challenges of data governance and the importance of explainable AI, and the challenges of tracking the lineage of AI and data in large organizations. Bal also touches on the shift from general-purpose generative AI models to more specialized models, fascinating use cases in the manufacturing industry, what to consider when adopting AI solutions in business, and much more.

Links mentioned in the show: Pulsar · Trifacta · AWS Clarify · [Course] Introduction to ChatGPT · [Course] Implementing AI Solutions in Business · [Course] Generative AI Concepts

Distributing Data Governance: How Unity Catalog Allows for a Collaborative Approach

As one of the world's largest providers of content delivery network (CDN) and security solutions, Akamai owns thousands of data assets of various shapes and sizes, some reaching multiple petabytes. Several departments within the company leverage Databricks for their data and AI workloads, which means we have over a hundred Databricks workspaces within a single Databricks account, where some of the assets are shared across products and some are product-specific.

In this presentation, we will describe how to use the capabilities of Unity Catalog to distribute the administration burden between departments, while still maintaining a unified governance model.

We will also share the benefits we’ve found in using Unity Catalog, beyond just access management, such as:

  • Visibility into which data assets we have in the organization
  • Ability to identify and potentially eliminate duplicate data workloads between departments
  • Removing boilerplate code for accessing external sources
  • Increasing innovation of product teams by exposing the data assets in a better, more efficient way
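
As a rough sketch of how such a per-department split can look, assuming the Databricks SQL connector and Unity Catalog (the hostname, token, catalog, and group names are hypothetical, not Akamai's actual setup):

    from databricks import sql

    with sql.connect(
        server_hostname="adb-1234567890.cloud.databricks.com",  # hypothetical
        http_path="/sql/1.0/warehouses/abc123",                 # hypothetical
        access_token="dapi-example-token",                      # hypothetical
    ) as conn, conn.cursor() as cur:
        # One catalog per department: the owning group administers its own assets...
        cur.execute("CREATE CATALOG IF NOT EXISTS cdn_analytics")
        cur.execute("ALTER CATALOG cdn_analytics OWNER TO `cdn-data-admins`")

        # ...while account-level grants preserve the unified governance model,
        # e.g. read-only visibility for a cross-department audit group.
        cur.execute("GRANT USE CATALOG ON CATALOG cdn_analytics TO `governance-auditors`")
        cur.execute("GRANT SELECT ON CATALOG cdn_analytics TO `governance-auditors`")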

Talk by: Gilad Asulin and Pulkit Chadha

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Multicloud Data Governance on the Databricks Lakehouse

Across industries, a multicloud setup has quickly become the reality for large organizations. Multi-cloud introduces new governance challenges as permissions models often do not translate from one cloud to the other and if they do, are insufficiently granular to accommodate privacy requirements and principles of least privilege. This problem can be especially acute for data and AI workloads that rely on sharing and aggregating large and diverse data sources across business unit boundaries and where governance models need to incorporate assets such as table rows/columns and ML features and models.

In this session, we will provide guidelines on how best to overcome these challenges for companies that have adopted the Databricks Lakehouse as their collaborative space for data teams across the organization, by exploiting some of the unique product features of the Databricks platform. We will focus on a common scenario: a data platform team providing data assets to two different ML teams, one using the same cloud and the other one using a different cloud.

We will explain the step-by-step setup of a unified governance model by leveraging the following components and conventions:

  • Unity Catalog for implementing fine-grained access control across all data assets: files in cloud storage, rows and columns in tables and ML features and models
  • The Databricks Terraform provider to automatically enforce guardrails and permissions across clouds
  • Account level SSO integration and identity federation to centrally administer access across workspaces
  • Delta Sharing to seamlessly propagate changes in provider data sets to consumers in near real-time (a consumer-side sketch follows this list)
  • Centralized audit logging for a unified view on what asset was accessed by whom
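
As referenced in the Delta Sharing bullet above, here is a minimal consumer-side sketch using the open-source delta-sharing Python client; the profile file and the share, schema, and table names are hypothetical:

    import delta_sharing

    # The profile file is issued by the data provider and contains the
    # sharing server endpoint plus a bearer token.
    profile = "config.share"
    table_url = profile + "#ml_share.features.customer_features"

    # Each read reflects the provider's latest published version of the
    # shared table, without copying data across clouds ahead of time.
    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())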

Talk by: Ioannis Papadopoulos and Volker Tjaden


Sponsored: Impetus | Accelerating ADP’s Business Transformation w/ a Modern Enterprise Data Platform

Learn how ADP's Enterprise Data Platform is used to drive direct monetization opportunities, differentiate its solutions, and improve operations. ADP is continuously searching for ways to increase innovation velocity and time-to-market, and to improve overall enterprise efficiency. Making data and tools available to teams across the enterprise while reducing data governance risk is the key to making progress on all fronts. Learn about ADP's enterprise data platform that created a single source of truth with centralized tools, data assets, and services. It allowed teams to innovate and gain insights by leveraging cross-enterprise data and central machine learning operations.

Explore how ADP accelerated creation of the data platform on Databricks and AWS, achieved faster business outcomes, and improved overall business operations. The session will also cover how ADP significantly reduced its data governance risk, elevated the brand by amplifying data and insights as a differentiator, increased data monetization, and leveraged data to drive human capital management differentiation.

Talk by: Chetan Kalanki and Zaf Babin

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI


Enabling Data Governance at Enterprise Scale Using Unity Catalog

Amgen has invested in building modern, cloud-native enterprise data and analytics platforms over the past few years with a focus on tech rationalization, data democratization, overall user experience, increased reusability, and cost-effectiveness. One of these platforms is our Enterprise Data Fabric, which focuses on pulling in data across functions and providing capabilities to integrate and connect the data and govern access. For a while, we have been trying to set up robust data governance capabilities which are simple, yet easy to manage, through Databricks. There were a few tools in the market that solved a few immediate needs, but none solved the problem holistically. For use cases like maintaining governance on highly restricted data domains like Finance and HR, a long-term solution native to Databricks and addressing the below limitations was deemed important:

  • The way these tools were set up allowed a few security policies to be overridden
  • Tools were not up to date with the latest DBR runtime
  • Complexity of implementing fine-grained security
  • Policy management – AWS IAM + in-tool policies

To address these challenges, and for large-scale enterprise adoption of our governance capability, we started working on UC integration with our governance processes, with the aim of realizing the following tech benefits:

  • Independent of Databricks runtime
  • Easy fine-grained access control
  • Eliminated management of IAM roles
  • Dynamic access control using UC and dynamic views

Today, using UC, we are able to implement fine-grained access control & governance for the restricted data of Amgen; a rough sketch of the dynamic-view pattern follows below. We are in the process of devising a realistic migration & change management strategy across the enterprise.
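
As a minimal sketch of that dynamic-view pattern (not Amgen's actual code; the catalog, table, and group names are hypothetical, while is_account_group_member() is the Unity Catalog predicate commonly used for this):

    # Runs inside a Databricks notebook or job, where `spark` is provided.
    spark.sql("""
        CREATE OR REPLACE VIEW hr_restricted.employees_v AS
        SELECT
            employee_id,
            department,
            -- Only members of the authorized group see the raw value.
            CASE
                WHEN is_account_group_member('hr-admins') THEN salary
                ELSE NULL
            END AS salary
        FROM hr_restricted.employees
    """)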

Talk by: Lakhan Prajapati and Jaison Dominic


Instacart on Why Engineers Shouldn't Write Data Governance Policies

Controlling permissions for accessing data assets can be messy, time-consuming, or, usually, a combination of both. The teams responsible for creating the business rules that govern who should have access to what data are usually different from the teams responsible for administering the grants to achieve that access. On the other side of the equation, the end user who needs access to a data asset may be left waiting for grants to be made as the decision is passed between teams. That is, if they even know the correct path to getting access in the first place.

Separating the concerns of managing data governance at a business level and implementing data governance at an engineering level is the best way to clarify data access permissions. In practice, this involves building systems to enable data governance enforcement based on business rules, with little to no understanding of the individual system where the data lives.

In practice, with a concrete business rule, such as “only users from the finance team should have access to critical financial data,” we want a system that deals only with those constituent concepts. For example, “the data is marked as critical financial” and “the user is a part of the finance team.” By abstracting away any source system components, such as “the tables in the finance schema” and “someone who’s a member of the finance Databricks group,” the access policies applied will then model the business rules as closely as possible.
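
A self-contained sketch of that separation, with every name hypothetical: the rule references only business concepts, and a thin enforcement layer translates them into access decisions without knowing anything about the system where the data lives.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Rule:
        classification: str  # business concept: how the data is marked
        team: str            # business concept: who may access it

    RULES = [Rule(classification="critical-financial", team="finance")]

    # Governance metadata maintained by data owners, not engineers.
    ASSET_TAGS = {"finance.revenue": "critical-financial",
                  "growth.signups": "internal"}
    TEAM_MEMBERS = {"finance": {"ava", "raj"}, "growth": {"li"}}

    def allowed(user: str, asset: str) -> bool:
        """Evaluate access purely in terms of business concepts."""
        tag = ASSET_TAGS.get(asset)
        for rule in RULES:
            if rule.classification == tag:
                return user in TEAM_MEMBERS[rule.team]
        # Assets with no restricting rule default to open (a policy choice).
        return True

    assert allowed("ava", "finance.revenue")
    assert not allowed("li", "finance.revenue")
    assert allowed("li", "growth.signups")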

This session will focus on how to establish and align the processes, policies, and stakeholders involved in making this type of system work seamlessly. Sharing the experience and learnings of our team at Instacart, we will aim to help attendees streamline and simplify their data security and access strategies.

Talk by: Kieran Taylor and Andria Fuquen


Increasing Data Trust: Enabling Data Governance on Databricks Using Unity Catalog & ML-Driven MDM

As part of Comcast Effectv's transformation into a completely digital advertising agency, it was key to develop an approach to managing and remediating data quality issues related to customer data, so that the sales organization is using reliable data to enable data-driven decision making. Like those of many organizations, Effectv's customer lifecycle processes are spread across many systems, utilizing various integrations between them. This results in key challenges like duplicate and redundant customer data that require rationalization and remediation. Data is at the core of Effectv's modernization journey, with the intended result of winning more business, accelerating order fulfillment, reducing make-goods, and identifying revenue.

In partnership with Slalom Consulting, Comcast Effectv built a traditional lakehouse on Databricks to ingest data from all of these systems but with a twist; they anchored every engineering decision in how it will enable their data governance program.

In this session, we will touch upon the data transformation journey at Effectv and dive deeper into the implementation of data governance leveraging Databricks solutions such as Delta Lake, Unity Catalog and DB SQL. Key focus areas include how we baked master data management into our pipelines by automating the matching and survivorship process, and bringing it all together for the data consumer via DBSQL to use our certified assets in bronze, silver and gold layers.

By making thoughtful decisions about structuring data in Unity Catalog and baking MDM into ETL pipelines, you can greatly increase the quality, reliability, and adoption of single-source-of-truth data so your business users can stop spending cycles on wrangling data and spend more time developing actionable insights for your business.
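
As a toy illustration of the matching-and-survivorship step (pure Python with made-up records; the real pipelines run on Delta Lake): match on a normalized key, then let a survivorship rule pick the surviving "golden record" per customer.

    from datetime import date

    records = [
        {"email": "Pat@Example.com ", "name": "Pat",     "updated": date(2023, 1, 5)},
        {"email": "pat@example.com",  "name": "Pat Lee", "updated": date(2023, 6, 2)},
        {"email": "sam@example.com",  "name": "Sam",     "updated": date(2023, 3, 9)},
    ]

    def match_key(rec):
        # Matching rule: case- and whitespace-insensitive email.
        return rec["email"].strip().lower()

    golden = {}
    for rec in records:
        key = match_key(rec)
        # Survivorship rule: the most recently updated record wins.
        if key not in golden or rec["updated"] > golden[key]["updated"]:
            golden[key] = rec

    print(list(golden.values()))  # one golden record per matched customer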

Talk by: Maggie Davis and Risha Ravindranath


Lineage System Table in Unity Catalog

Unity Catalog provides fully automated data lineage for all workloads in SQL, R, Python, and Scala, and across all asset types at Databricks. The aggregated view has been available to end users through the data explorer and API. In this session, we are excited to share that lineage is now also available via a Delta table in the UC metastore. It stores the full history of recent lineage records, is updated in near real time, and can be queried through a standard SQL interface. With that, customers can get significant operational insights about their workloads for impact analysis, troubleshooting, quality assurance, data discovery, and data governance.

Together with the system table platform effort, which provides query history, job run operational data, audit logs, and more, the lineage table will be a critical piece in linking all data assets and entity assets together, providing better lakehouse observability and unification to customers.
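
As a short sketch of querying lineage as a table (run in a Databricks notebook where spark is provided; system.access.table_lineage follows Databricks' documented system-table naming, but verify the exact name and columns in your workspace, and the example table name is hypothetical):

    # Impact analysis: who consumes main.sales.orders downstream?
    downstream = spark.sql("""
        SELECT source_table_full_name, target_table_full_name, event_time
        FROM system.access.table_lineage
        WHERE source_table_full_name = 'main.sales.orders'
        ORDER BY event_time DESC
    """)
    downstream.show()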

Talk by: Menglei Sun
