talk-data.com

Topic: Data Management

Tags: data_governance, data_quality, metadata_management

Activity Trend (2020-Q1 to 2026-Q1): 88 peak/qtr

84 tagged activities · Newest first

Keynote by Lisa Amini - What’s Next in AI for Data and Data Management?

Advances in large language models (LLMs) have propelled a recent flurry of AI tools for data management and operations. For example, AI-powered code assistants leverage LLMs to generate code for dataflow pipelines. RAG pipelines enable LLMs to ground responses with relevant information from external data sources. Data agents leverage LLMs to turn natural language questions into data-driven answers and actions. While challenges remain, these advances are opening exciting new opportunities for data scientists and engineers. In this talk, we will examine recent advances, along with some still incubating in research labs, with the goal of understanding where this is all heading, and present our perspective on what’s next for AI in data management and data operations.

AWS re:Invent 2025 - Data Processing architectures for building AI solutions (ANT328)

Prepare to revolutionize your data infrastructure for the AI era with Amazon EMR, AWS Glue, and Amazon Athena. This session will guide you through leveraging these powerful AWS services to construct robust, scalable data architectures that empower AI solutions at scale. Gain insights into effective architectural strategies for data processing to build AI applications, optimizing for cost-efficiency and security. Explore architectural frameworks that underpin successful AI-driven data initiatives, and learn from field lessons on how to navigate modernization projects. Whether you’re starting your modernization journey or refining current setups, this session offers practical strategies to fast-track your organization towards achieving excellence in AI-powered data management.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - How Samsung optimized 1.2 PB on Amazon DynamoDB with zero downtime (DAT323)

Join Samsung Cloud to learn how they successfully implemented a massive-scale table optimization, handling 1.2 PB on Amazon DynamoDB with zero downtime. Follow the innovative journey behind Samsung Internet's tab sync feature, showcasing how the team developed an in-house solution to handle complex data transfer, achieving substantial cost savings without third-party dependencies. Learn how this implementation resulted in the creation of a new standard framework that now serves as Samsung's blueprint for future data management initiatives. Gain practical insights into building scalable solutions through deep service domain understanding and discover actionable strategies to tackle similar challenges in your organization.

AWS re:Invent 2025 - A practitioner’s guide to data for agentic AI (DAT315)

In this session, gain the skills needed to deploy end-to-end agentic AI applications using your most valuable data. This session focuses on data management using processes like Model Context Protocol (MCP) and Retrieval Augmented Generation (RAG), and provides concepts that apply to other methods of customizing agentic AI applications. Discover best practice architectures using AWS database services like Amazon Aurora and OpenSearch Service, along with analytical, data processing and streaming experiences found in SageMaker Unified Studio. Learn data lake, governance, and data quality concepts, and how Amazon Bedrock AgentCore, Bedrock Knowledge Bases, and other features tie solution components together.

AWS re:Invent 2025 - What’s new in search, observability, and vector databases w/ OpenSearch (ANT201)

Discover the latest Amazon OpenSearch Service launches and capabilities that enable you to quickly deploy agentic AI applications and vector search operations. Learn how new integrations with Amazon Q enable intelligent data discovery and automated insights, while enhanced Amazon S3 connectivity streamlines data management. This session showcases how our latest vector database optimizations accelerate AI/ML workloads for efficient development of agentic AI, semantic search, and recommendation systems. We'll demonstrate new cost optimization features and performance enhancements across all OpenSearch use cases, including significant updates to Observability. Whether you're building next-generation AI applications or scaling your existing search infrastructure, join us for a comprehensive update on new launches and releases that can transform your search and analytics capabilities.

Iceberg Geo Type: Transforming Geospatial Data Management at Scale

The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design and future goals. We will examine its impact on both the geospatial and Iceberg communities: introducing a standard data warehouse storage layer to the geospatial community and enabling optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will provide an in-depth look at its current capabilities, outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management in the industry.

Get the Most of Your Delta Lake

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

An intelligent, action-driven approach to bridge Data Engineering and AI/ML workflows, delivering continuous data trust through comprehensive monitoring, validation, and remediation across the entire Databricks data lifecycle. Learn how Acceldata’s Agentic Data Management (ADM) platform: Ensures end-to-end data reliability across Databricks, from ingestion and transformation to feature engineering and model deployment. Bridges data engineering and AI teams by providing unified insights across Databricks jobs, notebooks and pipelines with proactive data insights and actions. Accelerates the delivery of trustworthy enterprise AI outcomes by detecting multi-variate anomalies, monitoring feature drift, and maintaining lineage within Databricks-native environments.

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

Large Language Models are unleashing a seismic shift on data engineering, challenging traditional workflows. LLMs obliterate inefficiencies and redefine productivity. AI powerhouses automate complex tasks like documentation, code translation, and data model development with unprecedented speed and precision. Integrating LLMs into tools promises to reduce offshore dependency, fostering agile onshore innovation. Harnessing LLMs' full potential involves challenges, requiring deep dives into domain-specific data and strategic business alignment. This session addresses deploying LLMs effectively, overcoming data management hurdles, and fostering collaboration between engineers and stakeholders. Join us to explore a future where LLMs redefine possibilities, inviting you to embrace AI-driven innovation and position your organization as a leader in data engineering.

Sponsored by: KPMG | Enhancing Regulatory Compliance through Data Quality and Traceability

In highly regulated industries like financial services, maintaining data quality is an ongoing challenge. Reactive measures often fail to prevent regulatory penalties, causing inaccuracies in reporting and inefficiencies due to poor data visibility. Regulators closely examine the origins and accuracy of reporting calculations to ensure compliance. A robust system for data quality and lineage is crucial. Organizations are utilizing Databricks to proactively improve data quality through rules-based and AI/ML-driven methods. This fosters complete visibility across IT, data management, and business operations, facilitating rapid issue resolution and continuous data quality enhancement. The outcome is quicker, more accurate, transparent financial reporting. We will detail a framework for data observability and offer practical examples of implementing quality checks throughout the data lifecycle, specifically focusing on creating data pipelines for regulatory reporting.
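As a rough illustration of the rules-based quality checks mentioned above, here is a minimal plain-Python sketch. It is not the framework presented in the session and uses no Databricks API; the function `run_quality_checks`, the rule names, and the sample records are all hypothetical. Each failure is recorded with its row index and rule name so that an issue can be traced back to the offending record.

```python
def run_quality_checks(rows, rules):
    """Apply each named rule to each row and collect the failures.

    rows: list of dicts, one per record.
    rules: dict mapping a rule name to a function that returns True when a row passes.
    Returns a list of (row_index, rule_name) pairs, one per failed check.
    """
    failures = []
    for index, row in enumerate(rows):
        for name, passes in rules.items():
            if not passes(row):
                failures.append((index, name))
    return failures


# Hypothetical rules and records, for illustration only.
rules = {
    "amount_non_negative": lambda r: r["amount"] is not None and r["amount"] >= 0,
    "currency_has_three_letters": lambda r: isinstance(r["currency"], str)
    and len(r["currency"]) == 3,
}
rows = [
    {"amount": 120.0, "currency": "USD"},
    {"amount": -5.0, "currency": "USD"},
    {"amount": None, "currency": "dollars"},
]
failures = run_quality_checks(rows, rules)
for index, name in failures:
    print(f"row {index} failed {name}")
```

Keeping rules as named functions means each failure carries the name of the rule that raised it, which is the kind of traceability a reporting pipeline would need.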

Empowering the Warfighter With AI

The new Budget Execution Validation process has transformed how the Navy reviews unspent funds. Powered by Databricks Workflows, MLflow, Delta Lake and Apache Spark™, this data-driven model predicts which financial transactions are most likely to have errors, streamlining reviews and increasing accuracy. In FY24, it helped review $40 billion, freeing $1.1 billion for other priorities, including $260 million from active projects. By reducing reviews by 80%, cutting job runtime by over 50% and lowering costs by 60%, it saved 218,000 work hours and $6.7 million in labor costs. With automated workflows and robust data management, this system exemplifies how advanced tools can improve financial decision-making, save resources and ensure efficient use of taxpayer dollars.

Sponsored by: Prophecy | Ready for GenAI? Survey Says Governed Self-Service Is the New Playbook for Data Teams

Are data teams ready for AI? Prophecy’s exclusive survey, “The Impact of GenAI on Data Teams”, gives the clearest picture yet of GenAI’s potential in data management, and what’s standing in the way. The top two obstacles? Poor governance and slow access to high-quality data. The message is clear: Modernizing your data platform with Databricks is essential. But it’s only the beginning. To unlock the power of AI and analytics, organizations must deliver governed, self-service access to clean, trusted data. Traditional data prep tools introduce risks around security, quality, and cost. It’s no wonder data leaders cited data transformation as the area where GenAI will make the biggest impact. To deliver what’s needed, teams need to shift to governed self-service, where data analysts and scientists move fast while staying within IT’s guardrails. Join us to learn more details from the survey and how leading organizations are ahead of the curve, using GenAI to reshape how data gets done.

Sponsored by: Boomi, LP | From Pipelines to Agents: Manage Data and AI on One Platform for Maximum ROI

In the age of agentic AI, competitive advantage lies not only in AI models, but in the quality of the data agents reason on and the agility of the tools that feed them. To fully realize the ROI of agentic AI, organizations need a platform that enables high-quality data pipelines and provides scalable, enterprise-grade tools. In this session, discover how a unified platform for integration, data management, MCP server management, API management, and agent orchestration can help you to bring cohesion and control to how data and agents are used across your organization.

Unleashing Data Governance at iFood: Harnessing System Tables and Lineage for Dynamic Tag Propagation

With regulations like LGPD (Brazil's General Data Protection Law) and GDPR, managing sensitive data access is critical. This session demonstrates how to leverage Databricks Unity Catalog system tables and data lineage to dynamically propagate classification tags, empowering organizations to monitor governance and ensure compliance. The presentation covers practical steps, including system table usage, data normalization, ingestion with Lakeflow Declarative Pipelines and classification tag propagation to downstream tables. It also explores permission monitoring with alerts to proactively address governance risks. Designed for advanced audiences, this session offers actionable strategies to strengthen data governance, prevent breaches and avoid regulatory fines while building scalable frameworks for sensitive data management.
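The idea of propagating classification tags to downstream tables through lineage can be sketched as a graph traversal. The example below is plain Python and an assumption-laden illustration only: it does not query Unity Catalog system tables, and the function `propagate_tags`, the table names, and the tags are hypothetical. In practice the lineage edges would come from the platform's lineage data.

```python
from collections import deque


def propagate_tags(lineage, source_tags):
    """Push classification tags from tagged tables to every downstream table.

    lineage: dict mapping a table name to the list of tables that read from it.
    source_tags: dict mapping a table name to the set of tags assigned to it.
    Returns a new dict of table name -> set of tags (direct plus inherited).
    """
    tags = {table: set(t) for table, t in source_tags.items()}
    queue = deque(source_tags)
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, []):
            before = len(tags.setdefault(child, set()))
            tags[child] |= tags[table]
            # Re-visit the child only when it gained a tag, so cycles terminate.
            if len(tags[child]) > before:
                queue.append(child)
    return tags


# Hypothetical lineage edges and tags, for illustration only.
lineage = {
    "raw.customers": ["silver.customers"],
    "silver.customers": ["gold.orders_by_customer"],
    "raw.orders": ["gold.orders_by_customer"],
}
source_tags = {"raw.customers": {"pii"}, "raw.orders": {"financial"}}
result = propagate_tags(lineage, source_tags)
for table in sorted(result):
    print(table, sorted(result[table]))
```

A table that reads from several upstream tables ends up with the union of their tags, which is the conservative choice for sensitive-data classification.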

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

As enterprises continue their journey to the cloud, data warehouse and data management modernization is essential to optimize analytics and drive business outcomes. Minimizing modernization timelines is important for reducing risk and shortening time to value – and ensuring enterprise data is clean, curated and governed is imperative to enable analytics and AI initiatives. In this session, learn how Informatica's Intelligent Data Management Cloud (IDMC) empowers analytics and AI on Databricks by helping data teams:
· Develop no-code/low-code data pipelines that ingest, transform and clean data at enterprise scale
· Improve data quality and extend enterprise governance with Informatica Cloud Data Governance and Catalog (CDGC) and Unity Catalog
· Accelerate pilot-to-production with Mosaic AI

Best Practices for Moving to Unity Catalog Managed Tables

Are you ready to unlock the full power of Unity Catalog managed tables? This session delivers actionable insights for transitioning to UC managed tables. Learn why managed tables are the default for performance and ease of use, and how automatic feature upgrades future-proof your architecture. Whether you manage thousands of tables or want to streamline operations, you’ll gain the tools and strategies to thrive in the era of intelligent data management. Join us and discover how easy it is to move to UC managed tables!

Sponsored by: Informatica | Power Analytics and AI on Databricks With Master (Golden) Record Data

Supercharge advanced analytics and AI insights on Databricks with accurate and consistent master data. This session explores how Informatica’s Master Data Management (MDM) integrates with Databricks to provide high-quality, integrated golden record data like customer, supplier, product 360 or reference data to support downstream analytics, Generative AI and Agentic AI. Enterprises can accelerate and de-risk the process of creating a golden record via a no-code/low-code interface, allowing data teams to quickly integrate siloed data and create a complete and consistent record that improves decision-making speed and accuracy.

No Time for the Dad Bod: Automating Life with AI and Databricks

Life as a father, tech leader, and fitness enthusiast demands efficiency. To reclaim my time, I’ve built AI-driven solutions that automate everyday tasks—from research agents that prep for podcasts to multi-agent systems that plan meals—all powered by real-time data and automation. This session dives into the technical foundations of these solutions, focusing on event-driven agent design and scalable patterns for robust AI systems. You’ll discover how Databricks technologies like Delta Lake, for reliable and scalable data management, and DSPy, for streamlining the development of generative AI workflows, empower seamless decision-making and deliver actionable insights. Through detailed architecture diagrams and a live demo, I’ll showcase how to design systems that process data in motion to tackle complex, real-world problems. Whether you’re an engineer, architect, or data scientist, you’ll leave with practical strategies to integrate AI-driven automation into your workflows.

Scaling Modern MDM With Databricks, Delta Sharing and Dun & Bradstreet

Master Data Management (MDM) is the foundation of a successful enterprise data strategy — delivering consistency, accuracy and trust across all systems that depend on reliable data. But how can organizations integrate trusted third-party data to enhance their MDM frameworks? How can they ensure that this master data is securely and efficiently shared across internal platforms and external ecosystems? This session explores how Dun & Bradstreet’s pre-mastered data serves as a single source of truth for customers, suppliers and vendors — reducing duplication and driving alignment across enterprise systems. With Delta Sharing, organizations can natively ingest Dun & Bradstreet data into their Databricks environment and establish a scalable, interoperable MDM framework. Delta Sharing also enables secure, real-time distribution of master data across the enterprise ensuring that every system operates from a consistent and trusted foundation.

AI Powering Epsilon's Identity Strategy: Unified Marketing Platform on Databricks

Join us to hear about how Epsilon Data Management migrated Epsilon’s unique, AI-powered marketing identity solution from multi-petabyte on-prem Hadoop and data warehouse systems to a unified Databricks Lakehouse platform. This transition enabled Epsilon to further scale its Decision Sciences solution and enable new cloud-based AI research capabilities on time and within budget, without being bottlenecked by the resource constraints of on-prem systems. Learn how Delta Lake, Unity Catalog, MLflow and LLM endpoints powered massive data volume, reduced data duplication, improved lineage visibility, accelerated Data Science and AI, and enabled new data to be immediately available for consumption by the entire Epsilon platform in a privacy-safe way. Using the Databricks platform as the base for AI and Data Science at global internet scale, Epsilon deploys marketing solutions across multiple cloud providers and multiple regions for many customers.