talk-data.com

Topic

Data Lake

Tags: big_data · data_storage · analytics

73 tagged activities

Activity Trend

Peak: 28 activities per quarter (2020-Q1 to 2026-Q1)

Activities

73 activities · Newest first

From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform

Our data platform journey started with a classic data lake — easy to ingest, hard to evolve. As domains scaled, tight coupling across source systems, pipelines, and data products slowed everything down. In this talk, we share how we re-architected toward a domain-oriented data mesh using PySpark, Delta Lake, and DQX to achieve true decoupling. Expect practical lessons on designing independent data products, managing lineage and governance, and scaling self-service without chaos.
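
As a hedged illustration of the decoupling pattern this talk describes, the sketch below publishes one domain's curated data product as a Delta table with PySpark. The paths and table names are hypothetical, and a plain filter stands in for a DQX quality rule, whose actual API is not shown in the abstract.

```python
# Minimal sketch: a domain team publishes its own data product as a Delta table.
# Paths and table names are hypothetical; a simple filter stands in for a DQX rule.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-data-product").getOrCreate()

# Ingest raw events owned by the "orders" domain (hypothetical landing path).
raw = spark.read.json("s3://landing/orders/")

# Quality gate standing in for a DQX check: drop records with null order IDs.
valid = raw.filter(F.col("order_id").isNotNull())

# Publish the curated product as a versioned Delta table that other domains
# consume through the catalog, without coupling to this domain's pipeline.
valid.write.format("delta").mode("overwrite").saveAsTable("orders.order_events")
```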

AWS re:Invent 2025 - iTTi's Cross-Company Data Mesh Blueprint with Amazon SageMaker (ANT342)

This session shares the journey of implementing a hybrid Data Mesh architecture within a multi-company holding, balancing centralization and decentralization needs. We will cover how our hybrid approach leverages Amazon EMR on EKS for data lake ingestion and Amazon SageMaker to enable self-service data discovery, data product subscription, and consumption, allowing companies within the group to autonomously explore, access, and utilize data products while maintaining centralized governance. Through ITTI - Grupo Vázquez's real-world experience, attendees will learn how this hybrid data mesh architecture successfully addresses diverse data domains, varying governance requirements, and rapid value delivery needs.

AWS re:Invent 2025 - A practitioner’s guide to data for agentic AI (DAT315)

In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. The session focuses on data management with approaches like the Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG), and covers concepts that apply to other methods of customizing agentic AI applications. Discover best-practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with the analytics, data processing, and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Amazon Bedrock Knowledge Bases, and other features tie the solution components together.
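
For readers who want a concrete picture of the RAG flow the session covers, here is a minimal, self-contained sketch. The embedding function is a hypothetical stand-in for a model call (e.g., via Amazon Bedrock), and the in-memory store stands in for a real vector database such as Aurora with pgvector or OpenSearch Service.

```python
# Minimal RAG sketch. embed() is a hypothetical stand-in for an embedding model
# call; the list-of-pairs "store" stands in for a real vector database.
from typing import List, Tuple

Store = List[Tuple[List[float], str]]  # (embedding, document) pairs

def embed(text: str) -> List[float]:
    """Hypothetical embedding call (e.g., a Bedrock embedding model)."""
    raise NotImplementedError

def top_k(store: Store, query_vec: List[float], k: int = 3) -> List[str]:
    """Naive cosine-similarity retrieval over the in-memory store."""
    def cos(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0
    return [doc for _, doc in sorted(store, key=lambda p: -cos(p[0], query_vec))[:k]]

def build_prompt(question: str, store: Store) -> str:
    """Ground the question in retrieved context before calling the agent/LLM."""
    context = "\n".join(top_k(store, embed(question)))
    return f"Context:\n{context}\n\nQuestion: {question}"
```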

AWS re:Invent 2025 - Architecting the future: Amazon SageMaker as a data and AI platform (ANT351)

Learn how organizations are modernizing their data architectures using the next generation of Amazon SageMaker to create AI-ready data platforms. This session covers how to design and build modern data architectures that enable both innovation and control, examining real-world patterns for data lake consolidation, cross-account governance, and AI integration. This session is ideal for enterprise architects and technical leaders planning large-scale architectural transformations to support their AI/ML initiatives.

AWS re:Invent 2025 - Using graphs over your data lake to power generative AI applications (DAT447)

In this session, learn about new Amazon Neptune capabilities for high-performance graph analytics and queries over data lakes to unlock the implicit and explicit relationships in your data, driving more accurate, trustworthy generative AI responses. We'll demonstrate building knowledge graphs from structured and unstructured data, combining graph algorithms (PageRank, Louvain clustering, path optimization) with semantic search, and executing Cypher queries on Parquet and Iceberg formats in Amazon S3. Through code samples and benchmarks, learn advanced architectures to use Neptune for multi-hop reasoning, entity linking, and context enrichment at scale. This session assumes familiarity with graph concepts and data lake architectures.
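
As a rough sketch of what such a multi-hop query might look like, consider the example below. The graph identifier, node labels, and response handling are assumptions; consult the Neptune Analytics documentation for the exact client and response shape.

```python
# Hedged sketch: a multi-hop openCypher query against Neptune Analytics via the
# boto3 "neptune-graph" client. Graph ID, labels, and properties are hypothetical.
import json
import boto3

client = boto3.client("neptune-graph", region_name="us-east-1")

# Two-hop entity linking: documents that mention entities related to a target.
query = """
MATCH (d:Document)-[:MENTIONS]->(e:Entity)-[:RELATED_TO]->(t:Entity {name: $name})
RETURN d.title AS doc, e.name AS entity
LIMIT 25
"""

resp = client.execute_query(
    graphIdentifier="g-0123456789",  # hypothetical graph identifier
    queryString=query,
    language="OPEN_CYPHER",
    parameters={"name": "Acme Corp"},
)
print(json.loads(resp["payload"].read()))  # response shape may differ; see docs
```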

Partners: Grow Your Modern SecOps Practice with the Unified Platform

Step into the future of security operations! Unlock insights and actionable guidance for partners ready to amplify their security practice and grow their revenue streams through the Modern SecOps with Unified Platform solution play. Learn how to capitalize on cutting-edge tools like the new Microsoft Sentinel data lake & Microsoft Sentinel graph to transform threat detection, response, and investigation. Explore GTM resources & incentives to drive impactful customer engagements at every stage.

Below the tip of the Iceberg: How Wikimedia reduced reporting latency 10x using dbt and Iceberg

Learn how the Wikimedia Foundation implemented an on-prem, open source data lake to fund Wikipedia and the future of open knowledge. We'll discuss data architecture including challenges integrating open source tools, learnings from our implementation, how we achieved a 10x decrease in query run times, and more.

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance

Navy Federal Credit Union has 200+ enterprise data sources in its enterprise data lake. These data assets are used to train 100+ machine learning models and to hydrate a semantic layer that serves an average of 4,000 business users daily across the credit union. The only option for extracting data from the analytic semantic layer was to let consuming applications access it via an already-overloaded cloud data warehouse. Visualizing data lineage for 1,000+ data pipelines and their associated metadata was impossible, and understanding the granular cost of running data pipelines was a challenge. Implementing Unity Catalog opened an alternate path for accessing analytic semantic data from the lake. It also opened the door to removing duplicate data assets stored across multiple lakes, which will save hundreds of thousands of dollars in data engineering effort, compute, and storage costs.
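
For a flavor of what the Unity Catalog path looks like in practice, here is a small hedged sketch. Catalog, schema, and group names are hypothetical, and `spark` is the session predefined in a Databricks notebook.

```python
# Hypothetical names throughout; run from a Databricks notebook where `spark`
# is predefined. Grant a group direct, governed access to semantic-layer data
# in the lake instead of routing through the overloaded warehouse.
spark.sql("GRANT SELECT ON TABLE lake.semantic.member_metrics TO `analysts`")

# Unity Catalog system tables expose lineage, so pipeline lineage can be
# queried rather than reconstructed by hand.
spark.sql("""
    SELECT source_table_full_name, target_table_full_name
    FROM system.access.table_lineage
    LIMIT 10
""").show()
```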

Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

The DoorDash Data organization is actively adopting the lakehouse paradigm. This presentation describes a methodology for migrating classic data warehouse and data lake platforms to a unified lakehouse solution. The objectives of this effort include: elimination of excessive data movement; seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR, and Trino; query performance optimization; abstracting away the complexity of the underlying storage layers and table formats; and a strategic, justified decision on the unified metadata catalog used across the various compute platforms.
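
As one hedged example of an in-place migration step toward a shared table format, the sketch below uses Iceberg's Spark procedure. The catalog and table names are assumptions, and the cluster needs the Iceberg Spark runtime configured.

```python
# Migrate an existing Hive/Parquet table to Iceberg in place using Iceberg's
# Spark procedure, so Snowflake, Databricks, EMR, and Trino can share one copy
# through a unified metadata catalog. Names are hypothetical.
spark.sql("CALL spark_catalog.system.migrate('warehouse.orders')")

# Verify the table is now readable in the Iceberg format.
spark.sql("SELECT COUNT(*) FROM spark_catalog.warehouse.orders").show()
```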

Optimizing Analytics Infrastructure: Lessons from Migrating Snowflake to Databricks

This session explores the strategic migration from Snowflake to Databricks, focusing on the journey of transforming a data lake to leverage Databricks’ advanced capabilities. It outlines the assessment of key architectural differences, performance benchmarks, and cost implications driving the decision. Attendees will gain insights into planning and execution, including data ingestion pipelines, schema conversion and metadata migration. Challenges such as maintaining data quality, optimizing compute resources and minimizing downtime are discussed, alongside solutions implemented to ensure a seamless transition. The session highlights the benefits of unified analytics and enhanced scalability achieved through Databricks, delivering actionable takeaways for similar migrations.
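
A hedged sketch of one ingestion step in such a migration, using the Spark Snowflake connector to land a table in Delta; all connection values are placeholders.

```python
# Placeholder connection values; requires the Snowflake Spark connector on the
# cluster. Read a Snowflake table, then land it as a Delta table in Databricks.
options = {
    "sfUrl": "account.snowflakecomputing.com",
    "sfUser": "MIGRATION_USER",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MIGRATION_WH",
}

df = (spark.read.format("snowflake")
      .options(**options)
      .option("dbtable", "ORDERS")
      .load())

df.write.format("delta").mode("overwrite").saveAsTable("lake.staging.orders")
```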

Master Schema Translations in the Era of Open Data Lake

Unity Catalog puts a variety of schemas into a centralized repository; the developer community now wants more productivity and automation for schema inference, translation, evolution, and optimization, especially in ingestion and reverse-ETL scenarios with more code generation. The Coinbase Data Platform attempts to pave a path with "Schemaster," which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across popular systems such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, and Postgres. This Lightning Talk covers four areas: the complexity and caveats of schema differences among these systems; the proposed field-level metadata model and two translation patterns (point-to-point vs. hub-and-spoke); why data profiling should be augmented to enhance schema understanding and translation; and integrating it with ingestion and reverse-ETL in a Databricks-oriented ecosystem. Takeaway: standardize schema lineage and translation.
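
To make the hub-and-spoke idea concrete, here is a hypothetical sketch of a field-level metadata model; none of these names (`Field`, `to_delta_ddl`, etc.) are Coinbase's actual Schemaster API.

```python
# Hypothetical field-level metadata model illustrating the hub-and-spoke
# pattern: each engine translates to/from one neutral model, so N systems need
# N translators instead of N*(N-1) point-to-point ones.
from dataclasses import dataclass
from typing import List

@dataclass
class Field:
    name: str
    logical_type: str   # engine-neutral "hub" type, e.g. "timestamp_ms"
    nullable: bool = True

# One "spoke": mapping the neutral types to Delta/Spark SQL DDL types.
DELTA_TYPES = {"int64": "BIGINT", "string": "STRING", "timestamp_ms": "TIMESTAMP"}

def to_delta_ddl(fields: List[Field]) -> str:
    cols = ", ".join(
        f"{f.name} {DELTA_TYPES[f.logical_type]}" + ("" if f.nullable else " NOT NULL")
        for f in fields
    )
    return f"({cols})"

print(to_delta_ddl([Field("user_id", "int64", nullable=False),
                    Field("created_at", "timestamp_ms")]))
# -> (user_id BIGINT NOT NULL, created_at TIMESTAMP)
```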

Unlocking Enterprise Potential: Key Insights from P&G's Deployment of Unity Catalog at Scale

This session explores P&G's implementation of Databricks Unity Catalog (UC) to enhance data governance, reduce data redundancy, and improve the developer experience through a Lakehouse architecture. The presentation covers: the distinction between data treated as a product and standard application data, highlighting how UC's structure maximizes the value of data in P&G's data lake; real-life examples from two years of using Unity Catalog, demonstrating benefits such as improved governance, reduced waste, and enhanced data discovery; and challenges related to disaster recovery and external data access, along with our collaboration with Databricks to address them. Sharing our experience can provide valuable insights for organizations planning to adopt Unity Catalog at enterprise scale.

Story of a Unity Catalog (UC) Migration:  Using UCX at 7-Eleven to Reorient a Complex UC Migration

Unity Catalog (UC) enables governance and security for all data and AI assets within an enterprise's data lake and is necessary to unlock the full potential of Databricks as a true Data Intelligence Platform. Unfortunately, UC migrations are non-trivial, especially for enterprises that have been using Databricks for more than five years, such as 7-Eleven. System Integrators (SIs) offer accelerators, guides, and services to support UC migrations; however, cloud infrastructure changes, anti-patterns within code, and data sprawl can significantly complicate them. There is no shortcut to success when planning and executing a complex UC migration. In this session, we share how UCX by Databricks Labs, a UC migration assistant, allowed 7-Eleven to reorient their UC migration by using its assessments and workflows to assess, characterize, and ultimately plan a tenable approach.

Sponsored by: Deloitte | Advancing AI in Cybersecurity with Databricks & Deloitte: Data Management & Analytics

Deloitte is observing a growing trend among cybersecurity organizations to develop big data management and analytics solutions beyond traditional Security Information and Event Management (SIEM) systems. Leveraging Databricks to extend these SIEM capabilities, Deloitte can help clients lower the cost of cyber data management while enabling scalable, cloud-native architectures. Deloitte helps clients design and implement cybersecurity data meshes, using Databricks as a foundational data lake platform to unify and govern security data at scale. Additionally, Deloitte extends clients’ cybersecurity capabilities by integrating advanced AI and machine learning solutions on Databricks, driving more proactive and automated cybersecurity solutions. Attendees will gain insight into how Deloitte is utilizing Databricks to manage enterprise cyber risks and deliver performant and innovative analytics and AI insights that traditional security tools and data platforms aren’t able to deliver.

ViewShift: Dynamic Policy Enforcement With Spark and SQL Views

Dynamic policy enforcement is increasingly critical in today's landscape, where data compliance is a top priority for companies, individuals, and regulators alike. In this talk, Walaa explores how LinkedIn has implemented a robust dynamic policy enforcement engine, ViewShift, and integrated it within its data lake. He will demystify LinkedIn's query engine stack by demonstrating how catalogs can automatically route table resolutions to compliance-enforcing SQL views. These SQL views possess several noteworthy properties: they are auto-generated (created automatically from declarative data annotations), user-centric (honoring user-level consent and preferences), context-aware (applying different transformations tailored to specific use cases), and portable (although the SQL logic is implemented in a single dialect, it remains accessible across all engines). Join this session to learn how ViewShift helps ensure that compliance is seamlessly integrated into data processing workflows.
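
In the spirit of the views described (not LinkedIn's actual generated SQL), a hypothetical compliance-enforcing view might look like the sketch below; the `has_consent()` UDF, tables, and columns are all illustrative.

```python
# Hypothetical sketch of a ViewShift-style view: the catalog resolves reads of
# "profiles" to this view, which masks fields per user-level consent. The
# has_consent() UDF, tables, and columns are all illustrative.
spark.sql("""
    CREATE OR REPLACE VIEW compliant.profiles AS
    SELECT
      member_id,
      CASE WHEN has_consent(member_id, 'email') THEN email ELSE NULL END AS email,
      country
    FROM raw.profiles
""")
```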

Comprehensive Data Management and Governance With Azure Data Lake Storage

Given that data is the new oil, it must be treated as such. Organizations that pursue greater insight into their businesses and their customers must manage, govern, protect and observe the use of the data that drives these insights in an efficient, cost-effective, compliant and auditable manner without degrading access to that data. Azure Data Lake Storage offers many features which allow customers to apply such controls and protections to their critical data assets. Understanding how these features behave, the granularity, cost and scale implications and the degree of control or protection that they apply are essential to implement a data lake that reflects the value contained within. In this session, the various data protection, governance and management capabilities available now and upcoming in ADLS will be discussed. This will include how deep integration with Azure Databricks can provide a more comprehensive, end-to-end coverage for these concerns, yielding a highly efficient and effective data governance solution.
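
As a hedged sketch of one such control, here is a directory-scoped ACL update with the azure-storage-file-datalake SDK; the account, filesystem, directory, and group object ID are placeholders.

```python
# Placeholders throughout; requires azure-identity and
# azure-storage-file-datalake. Grants a security group read+execute on one
# directory of the lake, recursively, without broadening account-level access.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
directory = service.get_file_system_client("curated").get_directory_client("finance")

# POSIX-style ACL entry for an Entra ID group (object ID is a placeholder).
directory.update_access_control_recursive(
    acl="group:11111111-2222-3333-4444-555555555555:r-x"
)
```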

Sponsored by: Microsoft | Leverage the power of the Microsoft Ecosystem with Azure Databricks

Join us for this insightful session to learn how you can leverage the power of the Microsoft ecosystem along with Azure Databricks to take your business to the next level. Azure Databricks is a fully integrated, native, first-party solution on Microsoft Azure. Databricks and Microsoft continue to actively collaborate on product development, ensuring tight integration, optimized performance, and a streamlined support experience. Azure Databricks offers seamless integrations with Power BI, Azure OpenAI, Microsoft Purview, Azure Data Lake Storage (ADLS), and Foundry. In this session, you'll learn how deep integration between Azure Databricks and Microsoft solutions can empower your organization to do more with your data estate. You'll also get an exclusive sneak peek into the product roadmap.

Empowering Healthcare Insights: A Unified Lakehouse Approach With Databricks

NHS England is revolutionizing healthcare research by enabling secure, seamless access to de-identified patient data through the Federated Data Platform (FDP). Despite vast data resources spread across regional and national systems, analysts struggle with fragmented, inconsistent datasets. Enter Databricks: powering a unified, virtual data lake with Unity Catalog at its core — integrating diverse NHS systems while ensuring compliance and security. By bridging AWS and Azure environments with a private exchange and leveraging the Iceberg connector to interface with Palantir, analysts gain scalable, reliable and governed access to vital healthcare data. This talk explores how this innovative architecture is driving actionable insights, accelerating research and ultimately improving patient outcomes.
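
For consumers, the cross-cloud access pattern can be as simple as the hedged sketch below; the profile file and share/schema/table names are placeholders.

```python
# Placeholder profile and table coordinates; requires the delta-sharing client.
import delta_sharing

# The profile file holds the sharing server endpoint and a bearer token; the
# fragment names a table as <share>.<schema>.<table>.
table_url = "nhs_fdp.share#research.cohorts.deidentified_admissions"
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```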

Redesigning Kaizen's Cloud Data Lake for the Future

At Kaizen Gaming, data drives our decision-making, but rapid growth exposed inefficiencies in our legacy cloud setup — escalating costs, delayed insights and scalability limits. Operating in 18 countries and processing 350M daily transactions (1PB+), we found that shared quotas and limited cost transparency hindered efficiency. To address this, we redesigned our cloud architecture with Data Landing Zones, a modular framework that decouples resources, enabling independent scaling and cost accountability. Automation streamlined infrastructure, reduced overhead and enhanced FinOps visibility, while Unity Catalog ensured governance and security. Migration challenges included maintaining stability, managing costs and minimizing latency. A phased approach, Delta Sharing, and Databricks Asset Bundles simplified transitions. The result: faster insights, improved cost control and reduced onboarding time, fostering innovation and efficiency. We share our transformation, offering insights for modern cloud optimization.