talk-data.com talk-data.com

Topic

Data Governance

data_management compliance data_quality

417

tagged

Activity Trend

90 peak/qtr
2020-Q1 2026-Q1

Activities

417 activities · Newest first

How Mars Achieved a People Analytics Transformation with a Modern Data Stack

People Analytics at Mars was formed two years ago as part of an ambitious journey to transform our HR analytics capabilities. To transform, we needed to build foundational services to provide our associates with helpful insights through fast results and resolving complex problems. Critical in that foundation are data governance and data enablement which is the responsibility of the Mars People Data Office team whose focus is to deliver high quality and reliable data that is reusable for current and future People Analytics use cases. Come learn how this team used Databricks in helping Mars achieve its People Analytics Transformation.

Talk by: Rachel Belino and Sreeharsha Alagani

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

US Army Corp of Engineers Enhanced Commerce & National Sec Through Data-Driven Geospatial Insight

The US Army Corps of Engineers (USACE) is responsible for maintaining and improving nearly 12,000 miles of shallow-draft (9'-14') inland and intracoastal waterways, 13,000 miles of deep-draft (14' and greater) coastal channels, and 400 ports, harbors, and turning basins throughout the United States. Because these components of the national waterway network are considered assets to both US commerce and national security, they must be carefully managed to keep marine traffic operating safely and efficiently.

The National DQM Program is tasked with providing USACE a nationally standardized remote monitoring and documentation system across multiple vessel types with timely data access, reporting, dredge certifications, data quality control, and data management. Government systems have often lagged commercial systems in modernization efforts, and the emergence of the cloud and Data Lakehouse Architectures have empowered USACE to successfully move into the modern data era.

This session incorporates aspects of these topics: Data Lakehouse Architecture: Delta Lake, platform security and privacy, serverless, administration, data warehouse, Data Lake, Apache Iceberg, Data Mesh GIS: H3, MOSAIC, spatial analysis data engineering: data pipelines, orchestration, CDC, medallion architecture, Databricks Workflows, data munging, ETL/ELT, lakehouses, data lakes, Parquet, Data Mesh, Apache Spark™ internals. Data Streaming: Apache Spark Structured Streaming, real-time ingestion, real-time ETL, real-time ML, real-time analytics, and real-time applications, Delta Live Tables. ML: PyTorch, TensorFlow, Keras, scikit-learn, Python and R ecosystems data governance: security, compliance, RMF, NIST data sharing: sharing and collaboration, delta sharing, data cleanliness, APIs.

Talk by: Jeff Mroz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Advanced Governance with Collibra on Databricks

A data lake is only as good as its governance. Understanding what data you have, performing classification, defining/applying security policies and auding how it's used is the data governance lifecycle. Unity Catalog with its rich ecosystem of supported tools simplifies all stages of the data governance lifecycle. Learn how metadata can be hydrated, into Collibra directly from Unity Catalog. Once the metadata is available in Collibra we will demonstrate classification, defining security policies on the data and pushing those policies into Databricks. All access and usage of data is automatically audited with real time lineage provided in the data explorer as well as system tables.

Talk by: Leon Eller and Antonio Castelo

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Post-Merger: Implementing Unity Catalog Across Multiple Accounts

Warner Media and Discovery have recently merged to form Warner Bros Discovery. Owning two Databricks accounts and wanting to maintain their separation, our data governance team has successfully implemented Unity Catalog as our data governance solution across both accounts, allowing our teams to collaboratively and securely use the data assets of two organizations collaboratively and securely.

This session is aimed at sharing that success story, including initial challenges, our approach, our architecture, the actual implementation, and user success post-implementation.

Talk by: Ramprasad Koya and Susheel Lakshmipathi

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

What’s New With Platform Security and Compliance in the Databricks Lakehouse Platform

At Databricks, we know that data is one of your most valuable assets and alwasys must be protected, that’s why security is built into every layer of the Databricks Lakehouse Platform. Databricks provides comprehensive security to protect your data and workloads, such as encryption, network controls, data governance and auditing.

In this session, you will hear from Databricks product leaders on the platform security and compliance progress made over the past year, with demos on how administrators can start protecting workloads fast. You will also learn more about the roadmap that delivers on the Databricks commitment to you as the most trusted, compliant, and secure data and AI platform with the Databricks Lakehouse.

Talk by: Samrat Ray and David Veuve

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

Future Data Access Control: Booz Allen Hamilton’s Way of Securing Databricks Lakehouse with Immuta

In this talk, I’ll review how we utilize Attribute-Based Access Control (ABAC) to enforce policy via Immuta. I’ll discuss the differences between the ABAC and legacy Role-Based Access Control (RBAC) approaches to control access and how the RBAC approach is not sufficient to keep up with today’s growing big data market. With so much data available, there also comes substantial risk. Data can contain many sensitive data elements, including PII and PHI. Industry leaders like Databricks are pushing the boundaries of data technology, which leads to constantly evolving data use cases. And that’s a good thing. However, the RBAC approach is struggling to keep up with those advancements.

So what is RBAC? It’s an approach to data access that permits system access based on the end-user’s role. For legacy systems, it’s meant as a simple but effective approach to securing data. Are you a manager? Then you’ll get access to data meant for managers. This is great for small deployments with clearly defined roles. Here at Booz Allen, we invested in Databricks because we have an environment of over 30 thousand users and billions of rows of data.

To mitigate this problem and align with our forward-thinking company standard, we introduced Immuta into our stack. Immuta uses ABAC to allow for dynamic data access control. Users are automatically assigned certain attributes, and access is based on those attributes instead of just their role. This allows for more flexibility and allows data access control to easily scale without the need to constantly map a user to their role. Using attributes, we can write policies in one place and have them applied across all our data platforms. This makes for a truly holistic data governance approach and provides immediate ROI and time savings for the company.

Talk by: Jeffrey Hess

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Lakehouse Architecture to Advance Security Analytics at the Department of State

In 2023, the Department of State surged forward on implementing a lakehouse architecture to get faster, smarter, and more effective on cybersecurity log monitoring and incident response. In addition to getting us ahead of federal mandates, this approach promises to enable advanced analytics and machine learning across our highly federated global IT environment while minimizing costs associated with data retention and aggregation.

This talk will include a high-level overview of the technical and policy challenge and a technical deeper dive on the tactical implementation choices made. We’ll share lessons learned related to governance and securing organizational support, connecting between multiple cloud environments, and standardizing data to make it useful for analytics. And finally, we’ll discuss how the lakehouse leverages Databricks in multicloud environments to promote decentralized ownership of data while enabling strong, centralized data governance practices.

Talk by: Timothy Ahrens and Edward Moe

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global

Cloud technologies have revolutionized global data access across a number of industries. However, many enterprise organizations face challenges in adopting these technologies effectively, as comprehensive cloud data governance strategies and solutions are complex and evolving – particularly in hybrid or multicloud scenarios involving multiple third parties. KPMG and S&P Global have harnessed the power of Databricks Lakehouse to create a novel approach.

By integrating Unity Catalogue, Delta Sharing, and the KPMG Modern Data Platform, S&P Global has enabled scalable, transformative cross-enterprise data sharing and governance. This demonstration highlights a collaboration between S&P Global Sustainable1 (S1) ESG program and the KPMG ESG Analytics Accelerators to enable large-scale SFDR ESG portfolio analytics. Join us to discover our solution that drives transformative change, fosters data-driven decision-making, and bolsters sustainability efforts in a wide range of industries.

Talk by: Niels Hanson,Dennis Tally

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks As Code:Effectively Automate a Secure Lakehouse Using Terraform for Resource Provisioning

At Rivian, we have automated more than 95% of our Databricks resource provisioning workflows using an in-house Terraform module, affording us a lean admin team to manage over 750 users. In this session, we will cover the following elements of our approach and how others can benefit from improved team efficiency.

  • User and service principal management
  • Our permission model on Unity Catalog for data governance
  • Workspace and secrets resource management
  • Managing internal package dependencies using init scripts
  • Facilitating dashboards, SQL queries and their associated permissions
  • Scaling source of truth Petabyte scale Delta Lake table ingestion jobs and workflows

Talk by: Jason Shiverick and Vadivel Selvaraj

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

A Technical Deep Dive into Unity Catalog's Practitioner Playbook

Get ready to take a deep dive into Unity Catalog and explore how it can simplify data, analytics and AI governance across multiple clouds. In this session, take a deep dive into Unity Catalog and the expert Databricks team will guide you through a hands-on demo, showcasing the latest features and best practices for data governance. You'll learn how to master Unity Catalog and gain a practical understanding of how it can streamline your analytics and AI initiatives. Whether you're migrating from Hive Metastore or just looking to expand your knowledge of Unity Catalog, this session is for you. Join us for a practical, hands-on deep dive into Unity Catalog and learn how to achieve seamless data governance while following best practices for data, analytics and AI governance.

Talk by: Zeashan Pappa and Ifigeneia Derekli

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksin

What’s New in Unity Catalog -- With Live Demos

Join the Unity Catalog product team and dive into the cutting-edge world of data, analytics and AI governance. With Unity Catalog’s unified governance solution for data, analytics, and AI on any cloud, you’ll discover the latest and greatest enhancements we’re shipping, including fine-grained governance with row/column filtering, new enhancements with automated data lineage and governance for ML assets.

In this demo-packed session, You’ll learn how new capabilities in Unity Catalog can further simplify your data governance and accelerated analytics and AI initiatives. Plus, get an exclusive sneak peek at our upcoming roadmap. And don’t forget, you’ll have the chance to ask the product teams themselves any burning questions you have about the best governance solution for the lakehouse. Don’t miss out on this exciting opportunity to level up your data game with Unity Catalog.

Talk by: Paul Roome

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Moving Beyond Data Integration with Data Collaboration

How can you maximize data collaboration across your organization without having to build integrations between individual applications, systems, and other data sources? Data collaboration architectures that don't depend on integrations aren't a new idea, but they've assumed greater urgency as organizations increasingly struggle to manage the ever-growing numbers of data sources that exist inside their IT estates. In this report, Cinchy cofounders Dan DeMers and Karanjot Jaswal show CIOs, CTOs, CDOs, and other IT leaders how to rethink their organization's approach to data architectures, data management, and data governance. You'll learn about different approaches to creating data platforms that liberate and autonomize data, enable agile data management, apply consistent data access controls, and maximize visibility without requiring application-specific integrations. With this report, you'll discover: Why data integration is often handled piecemeal—combining one app with another rather than integrating all apps together How data collaboration platforms enable data sharing across all apps, systems, and sources without application-specific integrations Four major platforms you can use to make data available to all applications and services: Cinchy, K2View, Microsoft Dataverse, and The Modern Data Company Principles and practices for deploying the data collaboration platform of your choice Dan DeMers is the CEO and cofounder of Cinchy. Karanjot Jaswal is cofounder and CTO of Cinchy.

We talked about:

Simon's background What MLOps is and what it isn't Skills needed to build an ML platform that serves 100s of models Ranking the importance of skills The point where you should think about building an ML platform The importance of processes in ML platforms Weighing your options with SaaS platforms The exploratory setup, experiment tracking, and model registry What comes after deployment? Stitching tools together to create an ML platform Keeping data governance in mind when building a platform What comes first – the model or the platform? Do MLOps engineers need to have deep knowledge of how models work? Is API design important for MLOps? Simon's recommendations for furthering MLOps knowledge

Links:

LinkedIn: https://www.linkedin.com/in/simonstiebellehner/ Github: https://github.com/stiebels Medium: https://medium.com/@sistel

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

LLMs are hugely popular with data engineers because they boost productivity. But companies must adapt their data governance programs to control risks related to data quality, privacy, intellectual property, fai-Datarness, and explainability. Published at: https://www.eckerson.com/articles/should-ai-bots-build-your-data-pipelines-part-ii-risks-and-governance-approaches-for-data-engineers-to-use-large-language-models

We talked about:

Bart's background What is data governance? Data dictionaries and data lineage Data access management How to learn about data governance What skills are needed to do data governance effectively When an organization needs to start thinking about data governance Good data access management processes Data masking and the importance of automating data access DPO and CISO roles How data access management works with a data mesh approach Avoiding the role explosion problem The importance of data governance integration in DataOps Terraform as a stepping stone to data governance How Raito can help an organization with data governance Open-source data governance tools

Links:

LinkedIn: https://www.linkedin.com/in/bartvandekerckhove/ Twitter: https://twitter.com/Bart_H_VDK Github: https://github.com/raito-io Website: https://www.raito.io/ Data Mesh Learning Slack: https://data-mesh-learning.slack.com/join/shared_invite/zt-1qs976pm9-ci7lU8CTmc4QD5y4uKYtAA#/shared-invite/email DataQG Website: https://dataqg.com/ DataQG Slack: https://dataqgcommunitygroup.slack.com/join/shared_invite/zt-12n0333gg-iTZAjbOBeUyAwWr8I~2qfg#/shared-invite/email DMBOK (Data Management Book of Knowledge): https://www.dama.org/cpages/body-of-knowledge DMBOK Wheel describing the data governance activities: https://www.dama.org/cpages/dmbok-2-wheel-images

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Embedded Analytics

Over the past 10 years, data analytics and data visualization have become essential components of an enterprise information strategy. And yet, the adoption of data analytics has remained remarkably static, reaching no more than 30% of potential users. This book explores the most important techniques for taking that adoption further: embedding analytics into the workflow of our everyday operations. Authors Donald Farmer and Jim Horbury show business users how to improve decision making without becoming analytics specialists. You'll explore different techniques for exchanging data, insights, and events between analytics platforms and hosting applications. You'll also examine issues including data governance and regulatory compliance and learn best practices for deploying and managing embedded analytics at scale. Learn how data analytics improves business decision making and performance Explore advantages and disadvantages of different embedded analytics platforms Develop a strategy for embedded analytics in an organization or product Define the architecture of an embedded solution Select vendors, platforms, and tools to implement your architecture Hire or train developers and architects to build the embedded solutions you need Understand how embedded analytics interacts with traditional analytics

Govern Your Data Clients the Right Way to Scale | Memphis

ABOUT THE TALK: In this talk, you will learn what are the risks of growing without data governance, how to create a supportive framework to control your different data clients using a common open-source tools, and how to construct the framework to adapt to changes and additions that scale with the business movement.

ABOUT THE SPEAKER: Yaniv Ben Hemo has been fascinated by data and its power to disrupt our world, which motivated him to deepen his journey with it, specifically, data engineering. In 2022, alongside his three best friends from college, Yaniv founded Memphis.dev, a real-time data processing platform out of their struggles as devs working with legacy queues and brokers to enable high-scale real-time data processing for developers.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Streaming Data Mesh

Data lakes and warehouses have become increasingly fragile, costly, and difficult to maintain as data gets bigger and moves faster. Data meshes can help your organization decentralize data, giving ownership back to the engineers who produced it. This book provides a concise yet comprehensive overview of data mesh patterns for streaming and real-time data services. Authors Hubert Dulay and Stephen Mooney examine the vast differences between streaming and batch data meshes. Data engineers, architects, data product owners, and those in DevOps and MLOps roles will learn steps for implementing a streaming data mesh, from defining a data domain to building a good data product. Through the course of the book, you'll create a complete self-service data platform and devise a data governance system that enables your mesh to work seamlessly. With this book, you will: Design a streaming data mesh using Kafka Learn how to identify a domain Build your first data product using self-service tools Apply data governance to the data products you create Learn the differences between synchronous and asynchronous data services Implement self-services that support decentralized data

podcast_episode
by Aynne Kokas (Miller Center, University of Virginia; Associate Professor of Media Studies, University of Virginia)

In this episode, we're thrilled to have Dr. Aynne Kokas, a C.K. Yen Professor at the Miller Center and an associate professor of media studies at the University of Virginia. Kokas’ research examines Sino-U.S. media and technology relations. Dr. Kokas is also the author of the critically acclaimed book "Trafficking Data: How China Is Winning the Battle for Digital Sovereignty," which we will be referring to frequently throughout this conversation. We will also touch on a few topics that were discussed in her recent conference at the Miller Center titled "U.S.-China Tech Competition: Has Democracy Met Its Match?"

During the event, Dr. Kokas and other experts discussed a variety of issues related to the ongoing tech competition between the US and China. For example, they explored the ways in which apps and other technology platforms may be used by the Chinese government for data-gathering purposes, and examined potential policy responses to the increasingly complex data flows between the two countries. Additionally, they discussed the long-term stability of US technology infrastructure and its implications for national security. In addition, there were panels that discussed the digital economy, climate, tech infrastructure, and political influence between China and the US.

In this episode we'll be discussing data policy for US-China technology, a topic that has become increasingly relevant in recent years as the two countries continue to compete for dominance in the tech industry. We'll delve into the differences in approach to data policy between China and the United States, the implications of these differences, and how China's digital silk road initiative is expanding its influence over the global digital economy.

We'll also discuss the challenges of balancing economic benefits against concerns about national security and human rights, and the future of the technology industry in light of these trends.

Links:

U.S.–China tech competition: Has democracy met its match?

Aynne Kokas website: https://www.aynnekokas.com/

Trafficking Data: How China Is Winning the Battle for Digital Sovereignty