talk-data.com

Topic: Data Management

Tags: data_governance, data_quality, metadata_management

32 tagged activities

Activity trend: peak of 88 activities/quarter (2020-Q1 to 2026-Q1)

Activities

Filtered by: Data + AI Summit 2025
Iceberg Geo Type: Transforming Geospatial Data Management at Scale

The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design, and its future goals. We will examine the impact on both communities: the geo type brings a standard data warehouse storage layer to the geospatial community and enables optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will take an in-depth look at its current capabilities, outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management across the industry.
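
For a flavor of what this enables, here is a minimal, hypothetical sketch of geospatial analytics over an Iceberg table with Apache Sedona on Spark; the catalog, schema, and table names are invented, and geometries are stored as WKT, the encode/decode step the native geo type is designed to eliminate.

```python
# Hypothetical sketch: spatial query over an Iceberg table via Apache Sedona.
# Assumes a SparkSession `spark` configured with the Iceberg and Sedona jars
# and an Iceberg catalog named `demo`.
from sedona.spark import SedonaContext

sedona = SedonaContext.create(spark)  # registers Sedona's ST_ functions

sedona.sql("""
    CREATE TABLE IF NOT EXISTS demo.gis.places (
        id BIGINT,
        name STRING,
        geom_wkt STRING  -- WKT today; the native geo type makes this a real geometry column
    ) USING iceberg
""")

# Spatial filter: places inside a bounding polygon.
sedona.sql("""
    SELECT name
    FROM demo.gis.places
    WHERE ST_Contains(
        ST_GeomFromWKT('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))'),
        ST_GeomFromWKT(geom_wkt))
""").show()
```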

Get the Most Out of Your Delta Lake

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.
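
As a taste of the syntax behind these features, here is a hedged sketch using Spark SQL on a hypothetical `events` table (exact syntax per the Delta Lake and Databricks docs; versions and names are illustrative):

```python
# Liquid Clustering: cluster by a column instead of static partitioning.
spark.sql("CREATE TABLE events (id BIGINT, ts TIMESTAMP, payload STRING) CLUSTER BY (ts)")

# Deletion Vectors: mark rows deleted without rewriting whole data files.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'true')")

# Change Data Feed: expose row-level changes to downstream consumers.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableChangeDataFeed' = 'true')")
changes = (spark.read.format("delta")
           .option("readChangeFeed", "true")
           .option("startingVersion", 1)
           .table("events"))

# Time Travel: query the table as of an earlier version.
snapshot = spark.sql("SELECT * FROM events VERSION AS OF 1")
```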

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

An intelligent, action-driven approach to bridging Data Engineering and AI/ML workflows, delivering continuous data trust through comprehensive monitoring, validation, and remediation across the entire Databricks data lifecycle. Learn how Acceldata’s Agentic Data Management (ADM) platform:
· Ensures end-to-end data reliability across Databricks, from ingestion and transformation through feature engineering and model deployment.
· Bridges data engineering and AI teams by providing unified insights across Databricks jobs, notebooks and pipelines, with proactive data insights and actions.
· Accelerates the delivery of trustworthy enterprise AI outcomes by detecting multivariate anomalies, monitoring feature drift, and maintaining lineage within Databricks-native environments.

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

Explore the seismic shift Large Language Models are unleashing on data engineering, challenging traditional workflows. LLMs obliterate inefficiencies and redefine productivity, automating complex tasks like documentation, code translation, and data model development with unprecedented speed and precision. Integrating LLMs into tooling promises to reduce offshore dependency, fostering agile onshore innovation. Harnessing LLMs' full potential involves challenges, requiring deep dives into domain-specific data and strategic business alignment. This session addresses deploying LLMs effectively, overcoming data management hurdles, and fostering collaboration between engineers and stakeholders. Join us to explore a future where LLMs redefine possibilities, and position your organization as a leader in AI-driven data engineering.

Retail data is expanding at an unprecedented rate, demanding a scalable, cost-efficient, and near real-time architecture. At Unilever, we transformed our data management approach by leveraging Databricks Lakeflow Declarative Pipelines, achieving approximately $500K in cost savings while accelerating computation speeds by 200–500%. By adopting a streaming-driven architecture, we built a system where data flows continuously across processing layers, enabling real-time updates with minimal latency. Lakeflow Declarative Pipelines' serverless simplicity replaced complex dependency management, reducing maintenance overhead and improving pipeline reliability. Lakeflow Declarative Pipelines Direct Publishing further enhanced data segmentation, concurrency, and governance, ensuring efficient and scalable data operations while simplifying workflows. This transformation empowers Unilever to manage data with greater efficiency, scalability, and reduced costs, creating a future-ready infrastructure that evolves with the needs of our retail partners and customers.
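
For readers new to the declarative style, a minimal sketch of a streaming medallion flow in Lakeflow Declarative Pipelines might look like the following; the source path, table names, and filter are invented for illustration and are not Unilever's actual pipeline.

```python
# Hypothetical streaming flow: data moves continuously from a landing zone
# through bronze and silver tables, with dependencies inferred declaratively.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw retail events, ingested continuously via Auto Loader")
def bronze_events():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/retail/landing/events"))  # placeholder path

@dlt.table(comment="Cleaned events, updated with minimal latency")
def silver_events():
    return dlt.read_stream("bronze_events").where(col("store_id").isNotNull())
```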

Sponsored by: KPMG | Enhancing Regulatory Compliance through Data Quality and Traceability

In highly regulated industries like financial services, maintaining data quality is an ongoing challenge. Reactive measures often fail to prevent regulatory penalties, inaccuracies in reporting, and inefficiencies caused by poor data visibility. Regulators closely examine the origins and accuracy of reporting calculations to ensure compliance, so a robust system for data quality and lineage is crucial. Organizations are utilizing Databricks to proactively improve data quality through rules-based and AI/ML-driven methods. This fosters complete visibility across IT, data management, and business operations, facilitating rapid issue resolution and continuous data quality enhancement. The outcome is quicker, more accurate, and more transparent financial reporting. We will detail a framework for data observability and offer practical examples of implementing quality checks throughout the data lifecycle, focusing specifically on building data pipelines for regulatory reporting.
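
As one concrete way to express such rules-based checks on Databricks, Lakeflow Declarative Pipelines expectations attach quality rules directly to pipeline tables; the table and rule names below are hypothetical, not KPMG's framework.

```python
# Hypothetical quality gates for a regulatory-reporting pipeline: drop bad
# rows, fail fast on critical violations, and track (but allow) soft issues.
import dlt

@dlt.table(comment="Trades validated for regulatory reporting")
@dlt.expect_or_drop("valid_trade_id", "trade_id IS NOT NULL")
@dlt.expect_or_fail("non_negative_notional", "notional >= 0")
@dlt.expect("known_counterparty", "counterparty_id IS NOT NULL")  # monitored only
def validated_trades():
    return spark.readStream.table("raw.trades")  # placeholder source
```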

Empowering the Warfighter With AI

The new Budget Execution Validation process has transformed how the Navy reviews unspent funds. Powered by Databricks Workflows, MLflow, Delta Lake and Apache Spark™, this data-driven model predicts which financial transactions are most likely to have errors, streamlining reviews and increasing accuracy. In FY24, it helped review $40 billion, freeing $1.1 billion for other priorities, including $260 million from active projects. By reducing reviews by 80%, cutting job runtime by over 50% and lowering costs by 60%, it saved 218,000 work hours and $6.7 million in labor costs. With automated workflows and robust data management, this system exemplifies how advanced tools can improve financial decision-making, save resources and ensure efficient use of taxpayer dollars.

Sponsored by: Prophecy | Ready for GenAI? Survey Says Governed Self-Service Is the New Playbook for Data Teams

Are data teams ready for AI? Prophecy’s exclusive survey, “The Impact of GenAI on Data Teams”, gives the clearest picture yet of GenAI’s potential in data management, and what’s standing in the way. The top two obstacles? Poor governance and slow access to high-quality data. The message is clear: Modernizing your data platform with Databricks is essential. But it’s only the beginning. To unlock the power of AI and analytics, organizations must deliver governed, self-service access to clean, trusted data. Traditional data prep tools introduce risks around security, quality, and cost. It’s no wonder data leaders cited data transformation as the area where GenAI will make the biggest impact. Delivering on that promise requires a shift to governed self-service, where data analysts and scientists move fast while staying within IT’s guardrails. Join us to learn more details from the survey and how leading organizations are staying ahead of the curve, using GenAI to reshape how data gets done.

Sponsored by: Boomi, LP | From Pipelines to Agents: Manage Data and AI on One Platform for Maximum ROI

In the age of agentic AI, competitive advantage lies not only in AI models, but in the quality of the data agents reason on and the agility of the tools that feed them. To fully realize the ROI of agentic AI, organizations need a platform that enables high-quality data pipelines and provides scalable, enterprise-grade tools. In this session, discover how a unified platform for integration, data management, MCP server management, API management, and agent orchestration can help you bring cohesion and control to how data and agents are used across your organization.

Unleashing Data Governance at iFood: Harnessing System Tables and Lineage for Dynamic Tag Propagation

With regulations like LGPD (Brazil's General Data Protection Law) and GDPR, managing sensitive data access is critical. This session demonstrates how to leverage Databricks Unity Catalog system tables and data lineage to dynamically propagate classification tags, empowering organizations to monitor governance and ensure compliance. The presentation covers practical steps, including system table usage, data normalization, ingestion with Lakeflow Declarative Pipelines and classification tag propagation to downstream tables. It also explores permission monitoring with alerts to proactively address governance risks. Designed for advanced audiences, this session offers actionable strategies to strengthen data governance, prevent breaches and avoid regulatory fines while building scalable frameworks for sensitive data management.
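
A simplified sketch of the core idea, assuming the Databricks system.access.table_lineage system table and SQL SET TAGS support (the tag key, value, and table names are hypothetical):

```python
# Hypothetical tag propagation: look up downstream tables of a classified
# source in the lineage system table, then apply the same tag to each.
source = "prod.crm.customers"

downstream = spark.sql(f"""
    SELECT DISTINCT target_table_full_name
    FROM system.access.table_lineage
    WHERE source_table_full_name = '{source}'
      AND target_table_full_name IS NOT NULL
""").collect()

for row in downstream:
    spark.sql(
        f"ALTER TABLE {row.target_table_full_name} SET TAGS ('classification' = 'pii')"
    )
```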

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

As enterprises continue their journey to the cloud, data warehouse and data management modernization is essential to optimize analytics and drive business outcomes. Minimizing modernization timelines is important for reducing risk and shortening time to value – and ensuring enterprise data is clean, curated and governed is imperative to enable analytics and AI initiatives. In this session, learn how Informatica's Intelligent Data Management Cloud (IDMC) empowers analytics and AI on Databricks by helping data teams:
· Develop no-code/low-code data pipelines that ingest, transform and clean data at enterprise scale
· Improve data quality and extend enterprise governance with Informatica Cloud Data Governance and Catalog (CDGC) and Unity Catalog
· Accelerate pilot-to-production with Mosaic AI

Best Practices for Moving to Unity Catalog Managed Tables

Are you ready to unlock the full power of Unity Catalog managed tables? This session delivers actionable insights for transitioning to UC managed tables. Learn why managed tables are the default for performance and ease of use, and how automatic feature upgrades future-proof your architecture. Whether you manage thousands of tables or want to streamline operations, you’ll gain the tools and strategies to thrive in the era of intelligent data management. Join us and discover how easy it is to move to UC managed tables!
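
One widely used migration pattern, sketched here under assumed names and not necessarily the session's recommended path, is to copy an existing table into a UC managed table with DEEP CLONE and validate before cutover:

```python
# Hypothetical migration: clone a Hive metastore table into a Unity Catalog
# managed table, then sanity-check row counts before repointing consumers.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders
    DEEP CLONE hive_metastore.sales.orders
""")

src = spark.table("hive_metastore.sales.orders").count()
dst = spark.table("main.sales.orders").count()
assert src == dst, f"row count mismatch: {src} vs {dst}"
```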

Sponsored by: Informatica | Power Analytics and AI on Databricks With Master (Golden) Record Data

Supercharge advanced analytics and AI insights on Databricks with accurate and consistent master data. This session explores how Informatica’s Master Data Management (MDM) integrates with Databricks to provide high-quality, integrated golden record data like customer, supplier, product 360 or reference data to support downstream analytics, Generative AI and Agentic AI. Enterprises can accelerate and de-risk the process of creating a golden record via a no-code/low-code interface, allowing data teams to quickly integrate siloed data and create a complete and consistent record that improves decision-making speed and accuracy.

No Time for the Dad Bod: Automating Life with AI and Databricks

Life as a father, tech leader, and fitness enthusiast demands efficiency. To reclaim my time, I’ve built AI-driven solutions that automate everyday tasks—from research agents that prep for podcasts to multi-agent systems that plan meals—all powered by real-time data and automation. This session dives into the technical foundations of these solutions, focusing on event-driven agent design and scalable patterns for robust AI systems. You’ll discover how Databricks technologies like Delta Lake, for reliable and scalable data management, and DSPy, for streamlining the development of generative AI workflows, empower seamless decision-making and deliver actionable insights. Through detailed architecture diagrams and a live demo, I’ll showcase how to design systems that process data in motion to tackle complex, real-world problems. Whether you’re an engineer, architect, or data scientist, you’ll leave with practical strategies to integrate AI-driven automation into your workflows.
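
As a taste of the DSPy side, here is a minimal, hypothetical sketch (not the speaker's code); the model endpoint name is a placeholder:

```python
# Hypothetical DSPy program: declare a typed signature and let DSPy handle
# the prompting against a configured LLM endpoint.
import dspy

lm = dspy.LM("databricks/databricks-meta-llama-3-3-70b-instruct")  # placeholder endpoint
dspy.configure(lm=lm)

plan_meal = dspy.Predict("pantry_items, dietary_goals -> meal_plan")
result = plan_meal(pantry_items="chicken, rice, broccoli", dietary_goals="high protein")
print(result.meal_plan)
```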

Scaling Modern MDM With Databricks, Delta Sharing and Dun & Bradstreet

Master Data Management (MDM) is the foundation of a successful enterprise data strategy — delivering consistency, accuracy and trust across all systems that depend on reliable data. But how can organizations integrate trusted third-party data to enhance their MDM frameworks? How can they ensure that this master data is securely and efficiently shared across internal platforms and external ecosystems? This session explores how Dun & Bradstreet’s pre-mastered data serves as a single source of truth for customers, suppliers and vendors — reducing duplication and driving alignment across enterprise systems. With Delta Sharing, organizations can natively ingest Dun & Bradstreet data into their Databricks environment and establish a scalable, interoperable MDM framework. Delta Sharing also enables secure, real-time distribution of master data across the enterprise, ensuring that every system operates from a consistent and trusted foundation.
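
For the open-client route, a minimal sketch using the delta-sharing Python client; the profile file and share/schema/table names are invented stand-ins, not Dun & Bradstreet's actual share:

```python
# Hypothetical read of a shared table: the provider issues a credentials
# profile, and the recipient addresses tables as profile#share.schema.table.
import delta_sharing

profile = "/path/to/config.share"  # credentials file from the data provider
table_url = profile + "#dnb_share.reference.company_master"

df = delta_sharing.load_as_pandas(table_url)  # or load_as_spark on a cluster
print(df.head())
```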

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

Join us to hear about how Epsilon Data Management migrated Epsilon’s unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and enable new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints powered massive data volume, reduced data duplication, improved lineage visibility, accelerated Data Science and AI, and enabled new data to be immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and Data Science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.

Sponsored by: Deloitte | Advancing AI in Cybersecurity with Databricks & Deloitte: Data Management & Analytics

Deloitte is observing a growing trend among cybersecurity organizations to develop big data management and analytics solutions beyond traditional Security Information and Event Management (SIEM) systems. Leveraging Databricks to extend these SIEM capabilities, Deloitte can help clients lower the cost of cyber data management while enabling scalable, cloud-native architectures. Deloitte helps clients design and implement cybersecurity data meshes, using Databricks as a foundational data lake platform to unify and govern security data at scale. Additionally, Deloitte extends clients’ cybersecurity capabilities by integrating advanced AI and machine learning solutions on Databricks, driving more proactive and automated cybersecurity solutions. Attendees will gain insight into how Deloitte is utilizing Databricks to manage enterprise cyber risks and deliver performant and innovative analytics and AI insights that traditional security tools and data platforms aren’t able to deliver.

Using Catalogs for a Well-Governed and Efficient Data Ecosystem

The ability to enforce data management controls at scale and reduce the effort required to manage data pipelines is critical to operating efficiently. Capital One has scaled its data management capabilities and invested in platforms to help address this need. In the past couple of years, the role of “the catalog” in a data platform architecture has transitioned from just providing SQL to providing a full suite of capabilities that can help solve this problem at scale. This talk will give insight into how Capital One is thinking about leveraging Databricks Unity Catalog to help tackle these challenges.

Patients Are Waiting...Accelerating Healthcare Innovation with Data, AI and Agents

This session is repeated. In an era of exponential data growth, organizations across industries face common challenges in transforming raw data into actionable insights. This presentation showcases how Novo Nordisk is pioneering new approaches to insights generation in clinical data management and AI. Using our clinical trials platform FounData, built on Databricks, we demonstrate how proper data architecture enables advanced AI applications. We'll introduce a multi-agent AI framework that revolutionizes data interaction, combining specialized AI agents to guide users through complex datasets. While our focus is on clinical data, these principles apply across sectors – from manufacturing to financial services. Learn how democratizing access to data and AI capabilities can transform organizational efficiency while maintaining governance. Through this real-world implementation, participants will gain insights on building scalable data architectures and leveraging multi-agent AI frameworks for responsible innovation.

Accelerating Model Development and Fine-Tuning on Databricks with TwelveLabs

Scaling large language models (LLMs) and multimodal architectures requires efficient data management and computational power. The NVIDIA NeMo Framework with Megatron-LM on Databricks is an open-source solution that integrates GPU acceleration and advanced parallelism with the Databricks Delta Lakehouse, streamlining workflows for pre-training and fine-tuning models at scale. This session highlights context parallelism, a unique NeMo capability for parallelizing over sequence lengths, making it ideal for video datasets with large embeddings. Through the case study of TwelveLabs’ Pegasus-1 model, learn how NeMo empowers scalable multimodal AI development, from text to video processing, setting a new standard for LLM workflows.
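
To illustrate where context parallelism sits in a NeMo-style training configuration, here is a hedged sketch; the key names follow NeMo's Megatron GPT config conventions (which vary by version), and the sizes are arbitrary:

```python
# Hypothetical parallelism layout: tensor, pipeline, and context parallelism
# combine so long multimodal sequences fit in GPU memory.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "trainer": {"devices": 8, "num_nodes": 4, "precision": "bf16"},
    "model": {
        "tensor_model_parallel_size": 4,
        "pipeline_model_parallel_size": 2,
        "context_parallel_size": 2,   # shard activations along the sequence axis
        "encoder_seq_length": 32768,  # long sequences from video embeddings
    },
})
print(OmegaConf.to_yaml(cfg))
```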