talk-data.com

Topic: Amazon S3 (S3)

Tags: object_storage, cloud_storage, aws

104 activities tagged

Activity Trend: peak of 11 activities per quarter, 2020-Q1 to 2026-Q1.

Activities

104 activities · Newest first

AWS re:Invent 2025 - Best practices for building Apache Iceberg based lakehouse architectures on AWS

Discover advanced strategies for implementing Apache Iceberg on AWS, focusing on Amazon S3 Tables and integration of the Iceberg REST Catalog with the lakehouse in Amazon SageMaker. We'll cover performance optimization techniques for Amazon Athena and Amazon Redshift queries, real-time processing using Apache Spark, and integration with Amazon EMR, AWS Glue, and Trino. Explore practical implementations of zero-ETL, change data capture (CDC) patterns, and the medallion architecture. Gain hands-on expertise in implementing enterprise-grade lakehouse solutions with Iceberg on AWS.
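For a concrete flavor of the pattern, here is a minimal PySpark sketch of working with an Iceberg table through an Iceberg REST catalog, assuming the Iceberg Spark runtime jar is on the classpath; the catalog URI, warehouse, and table names are illustrative placeholders, not details from the session.

```python
# A minimal sketch, assuming an Iceberg REST catalog endpoint and the
# iceberg-spark-runtime jar on the Spark classpath. All names/URIs are
# placeholders, not values from the talk.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-lakehouse-sketch")
    # Register an Iceberg catalog backed by a REST catalog endpoint.
    .config("spark.sql.catalog.lakehouse", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lakehouse.type", "rest")
    .config("spark.sql.catalog.lakehouse.uri", "https://rest-catalog.example.com/iceberg")  # placeholder
    .config("spark.sql.catalog.lakehouse.warehouse", "my_warehouse")  # placeholder
    .getOrCreate()
)

# Medallion-style promotion: read a bronze table, clean it, write to silver.
bronze = spark.table("lakehouse.bronze.orders")
silver = bronze.dropDuplicates(["order_id"]).filter("order_status IS NOT NULL")
silver.writeTo("lakehouse.silver.orders").using("iceberg").createOrReplace()
```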


AWS re:Invent 2025 - What's new in search, observability, and vector databases w/ OpenSearch (ANT201)

Discover the latest Amazon OpenSearch Service launches and capabilities that enable you to quickly deploy agentic AI applications and vector search operations. Learn how new integrations with Amazon Q enable intelligent data discovery and automated insights, while enhanced Amazon S3 connectivity streamlines data management. This session showcases how our latest vector database optimizations accelerate AI/ML workloads for efficient development of agentic AI, semantic search, and recommendation systems. We'll demonstrate new cost optimization features and performance enhancements across all OpenSearch use cases, including significant updates to observability. Whether you're building next-generation AI applications or scaling your existing search infrastructure, join us for a comprehensive update on new launches and releases that can transform your search and analytics capabilities.
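As a rough illustration of the vector-search side, here is a hedged opensearch-py sketch of a basic k-NN query; the host, index, field name, and embedding values are placeholder assumptions, authentication is omitted, and none of the session's new launches are modeled.

```python
# A minimal sketch, assuming an index with a knn_vector field named
# "embedding". Host, index, and vector values are placeholders; auth omitted.
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "search-example.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder
    use_ssl=True,
)

query = {
    "size": 5,
    "query": {
        "knn": {
            "embedding": {              # placeholder knn_vector field name
                "vector": [0.1] * 768,  # placeholder query embedding
                "k": 5,
            }
        }
    },
}

response = client.search(index="documents", body=query)  # placeholder index
for hit in response["hits"]["hits"]:
    print(hit["_id"], hit["_score"])
```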


AWS re:Invent 2025 - Keynote Customer - TwelveLabs

Co-founder and CEO Jae Lee shares how TwelveLabs built video-intelligence AI foundation models on AWS to process millions of hours of video at petabyte scale, using Amazon S3 and S3 Vectors to store billions of embeddings that provide actionable insights.


AWS re:Invent 2025 - Intelligent Observability & Modernization w/ Amazon OpenSearch Service (ANT315)

Discover how Amazon OpenSearch Service is evolving beyond traditional search and analytics to power next-generation observability. We'll showcase how organizations can reduce operational costs by modernizing their observability stack using OpenTelemetry, OpenSearch, S3, and CloudWatch. We'll demonstrate building sophisticated observability solutions that combine OpenSearch's real-time analytics with AI-powered insights using Amazon Q.


AWS re:Invent 2025 - What's new in Amazon Redshift and Amazon Athena (ANT206)

Learn how AWS is enhancing its SQL analytics offerings with new capabilities in Amazon Redshift and Amazon Athena. Discover how Redshift's AI-powered data warehousing capabilities are enabling customers to modernize their analytics workloads with enhanced performance and cost optimization. Explore Athena's latest features for interactively querying data directly in Amazon S3 data lakes. This session showcases new features and real-world examples of how organizations are using these services to accelerate business insights while optimizing costs.
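To make the Athena side concrete, here is a minimal boto3 sketch of running an interactive query over data in S3; the database, query, and results bucket are placeholder assumptions.

```python
# A minimal sketch using the standard boto3 Athena API. Database, table,
# and output bucket are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT order_date, SUM(amount) AS total FROM orders GROUP BY order_date",
    QueryExecutionContext={"Database": "sales_lake"},                          # placeholder
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},    # placeholder
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then fetch the first page of results.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:  # first row is the header row
        print([col.get("VarCharValue") for col in row["Data"]])
```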


AWS re:Invent 2025 - Turn unstructured data in Amazon S3 into AI-ready assets with SageMaker Catalog

Unstructured data often holds untapped value, and Amazon SageMaker makes it possible to turn that data into insights and AI-ready assets. In this session, you'll learn how to bring unstructured data from Amazon S3 into SageMaker, create searchable assets, and build knowledge bases for Amazon Bedrock to improve retrieval-augmented generation (RAG) accuracy. Discover how teams can collaborate across roles, data users can self-serve to find and understand the right data, and governance ensures that the right people get the right access. Bayer will share how they use these capabilities to unlock unstructured data and accelerate research and innovation.


AWS re:Invent 2025 - Using graphs over your data lake to power generative AI applications (DAT447)

In this session, learn about new Amazon Neptune capabilities for high-performance graph analytics and queries over data lakes to unlock the implicit and explicit relationships in your data, driving more accurate, trustworthy generative AI responses. We'll demonstrate building knowledge graphs from structured and unstructured data, combining graph algorithms (PageRank, Louvain clustering, path optimization) with semantic search, and executing Cypher queries on Parquet and Iceberg formats in Amazon S3. Through code samples and benchmarks, learn advanced architectures to use Neptune for multi-hop reasoning, entity linking, and context enrichment at scale. This session assumes familiarity with graph concepts and data lake architectures.
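As a loose illustration only, here is a hedged boto3 sketch of a multi-hop openCypher query against Neptune Analytics; the "neptune-graph" client call reflects the existing service API as I understand it, the graph ID and schema are invented placeholders, and the session's new data-lake query capabilities are not modeled here.

```python
# A hedged sketch: a plain openCypher call via the boto3 "neptune-graph"
# (Neptune Analytics) client. Graph ID and node/edge labels are placeholders;
# verify the client and response shape against current AWS documentation.
import json

import boto3

neptune = boto3.client("neptune-graph", region_name="us-east-1")

result = neptune.execute_query(
    graphIdentifier="g-abcdefghij",  # placeholder graph identifier
    language="OPEN_CYPHER",
    queryString=(
        # Two-hop traversal: documents related to doc-123 through a shared topic.
        "MATCH (d:Document {id: 'doc-123'})-[:MENTIONS]->(t:Topic)"
        "<-[:MENTIONS]-(other:Document) RETURN other.id LIMIT 10"
    ),
)

# The response payload is a streaming body containing JSON results.
print(json.loads(result["payload"].read()))
```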


AWS re:Invent 2025 - Enterprise-scale ETL optimization for Apache Spark (ANT336)

Recent enhancements to Apache Spark on AWS Glue, Amazon EMR, and Amazon SageMaker optimize large-scale data processing workloads. These include faster read and write throughput, accelerated processing of common file formats, and expanded Amazon S3 support through the S3A protocol for greater flexibility in write operations. In this session, we'll explore recent enhancements in Spark for distributed computation and in-memory storage that enable efficient data aggregation and job optimization. We'll also demonstrate how these innovations, combined with Spark's native capabilities, strengthen governance and encryption to help you optimize performance while maintaining control and compliance. Join us to learn how to build unified, secure, and high-performance ETL pipelines on AWS using Spark.
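For context, here is a small PySpark sketch of the kind of S3A tuning the abstract alludes to, using standard Hadoop S3A settings rather than the specific AWS enhancements being announced; the bucket and paths are placeholders.

```python
# A minimal sketch of standard Hadoop S3A tuning knobs, not the AWS-specific
# enhancements from the session. Bucket and paths are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("etl-s3a-sketch")
    # Route S3 I/O through the S3A filesystem connector.
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    # Larger multipart size and more upload threads for write throughput.
    .config("spark.hadoop.fs.s3a.multipart.size", "104857600")
    .config("spark.hadoop.fs.s3a.threads.max", "64")
    # The "magic" committer avoids slow, non-atomic rename-based commits on S3
    # (full committer wiring omitted in this sketch).
    .config("spark.hadoop.fs.s3a.committer.name", "magic")
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/raw/events/")  # placeholder path
df.groupBy("event_type").count().write.mode("overwrite").parquet(
    "s3a://example-bucket/agg/events/"  # placeholder path
)
```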


Best practices for leveraging Amazon Analytics Services + dbt

As organizations increasingly adopt modern data stacks, the combination of dbt and AWS Analytics services has emerged as a powerful pairing for analytics engineering at scale. This session will explore proven strategies and hard-learned lessons for optimizing this technology stack, using dbt-athena, dbt-redshift, and dbt-glue to deliver reliable, performant data transformations. We will also cover case studies, best practices, and modern lakehouse scenarios with Apache Iceberg and Amazon S3 Tables.
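As a small illustration, here is a hedged sketch of invoking dbt programmatically via its documented dbtRunner entry point against an Athena target; the selector and target names are placeholder assumptions, and the dbt-athena adapter configuration is presumed to live in profiles.yml.

```python
# A minimal sketch using dbt's programmatic invocation API (dbt-core >= 1.5).
# Selector and target names are placeholders; the Athena connection details
# are assumed to be configured in profiles.yml via the dbt-athena adapter.
from dbt.cli.main import dbtRunner

dbt = dbtRunner()

# Equivalent to `dbt run --select staging+ --target athena` on the CLI.
result = dbt.invoke(["run", "--select", "staging+", "--target", "athena"])
print("success:", result.success)
```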

Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment.
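To make the read path concrete, here is a conceptual Python sketch (not RisingWave code) of the three-tier lookup the abstract describes: memory first, then a disk cache standing in for Foyer, then S3 as the cold tier.

```python
# A conceptual sketch of a three-tier state read path: memory -> disk cache
# -> object storage. The point is that a get() is served from a cache
# whenever possible, so queries avoid S3's variable latency.
class TieredStateStore:
    def __init__(self, mem_cache, disk_cache, s3_fetch):
        self.mem = mem_cache      # dict-like: hottest keys (tier 1)
        self.disk = disk_cache    # dict-like: warm data, Foyer's role (tier 2)
        self.s3_fetch = s3_fetch  # callable: cold read from object storage (tier 3)

    def get(self, key):
        if key in self.mem:           # tier 1: memory hit
            return self.mem[key]
        if key in self.disk:          # tier 2: disk-cache hit
            value = self.disk[key]
            self.mem[key] = value     # promote to memory
            return value
        value = self.s3_fetch(key)    # tier 3: cold read from S3
        self.disk[key] = value        # backfill caches on the way up
        self.mem[key] = value
        return value

# Example wiring with plain dicts and a stub S3 reader.
store = TieredStateStore({}, {}, s3_fetch=lambda k: f"value-for-{k}")
print(store.get("user:42"))  # cold read, then cached
print(store.get("user:42"))  # served from memory
```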




Discover the fundamentals of delivering API management as a platform service with Kong Konnect. In this introductory workshop, we’ll explore essential concepts and methodologies that Platform Providers use to provide scalable, self-service API management capabilities to Platform Consumers. Who should attend? Platform Engineers, Platform Owners, SREs, and anyone building an internal API platform.

Design for Federation:
- Map the Provider-Consumer contract so platform teams and product teams can work autonomously.
- Navigate Kong Konnect’s multi-tenant architecture to isolate teams while sharing global policies.
- Model RBAC and system accounts for fully programmatic workflows, with no ticket queues.

Operationalize at Scale:
- Onboard new teams in minutes using templated provisioning and GitOps pipelines.
- Give teams infrastructure autonomy with governed self-provisioning of platform resources.
- Integrate external systems (e.g., S3, Vault) for secrets and artifacts.

Throughout live demos and in-depth walkthroughs, you’ll build a reference blueprint you can take back to your org. Ready to go deeper? This session sets you up for our 201 Workshop and Automations Developer Day.

Summary: In this episode of the Data Engineering Podcast, Andy Warfield talks about the innovative functionality of S3 Tables and Vectors and their integration into modern data stacks. Andy shares his journey through the tech industry and his role at Amazon, where he collaborates to enhance storage capabilities, discussing the evolution of S3 from a simple storage solution to a sophisticated system supporting advanced data types like tables and vectors that are crucial for analytics and AI-driven applications. He explains the motivations behind introducing S3 Tables and Vectors, highlighting their role in simplifying data management and enhancing performance for complex workloads, and shares insights into the technical challenges and design considerations involved in developing these features. The conversation explores potential applications of S3 Tables and Vectors in fields like AI, genomics, and media, and discusses future directions for S3's development to further support data-driven innovation.

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories. Your host is Tobias Macey and today I'm interviewing Andy Warfield about S3 Tables and Vectors.

Interview:
- Introduction
- How did you get involved in the area of data management?
- Can you describe what your goals are with the Tables and Vector features of S3?
- How did the experience of building S3 Tables inform your work on S3 Vectors?
- There are numerous implementations of vector storage and search. How do you view the role of S3 in the context of that ecosystem?
- The most directly analogous implementation that I'm aware of is the Lance table format. How would you compare the implementation and capabilities of Lance with what you are building with S3 Vectors?
- What opportunity do you see for being able to offer a protocol-compatible implementation similar to the Iceberg compatibility that you provide with S3 Tables?
- Can you describe the technical implementation of the Vectors functionality in S3?
- What are the sources of inspiration that you looked to in designing the service?
- Can you describe some of the ways that S3 Vectors might be integrated into a typical AI application?
- What are the most interesting, innovative, or unexpected ways that you have seen S3 Tables/Vectors used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on S3 Tables/Vectors?
- When is S3 the wrong choice for Iceberg or Vector implementations?
- What do you have planned for the future of S3 Tables and Vectors?

Contact Info: LinkedIn

Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements: Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: S3 Tables, S3 Vectors, S3 Express, Parquet, Iceberg, Vector Index, Vector Database, pgvector, Embedding Model, Retrieval Augmented Generation, TwelveLabs, Amazon Bedrock, Iceberg REST Catalog, Log-Structured Merge Tree, S3 Metadata, Sentence Transformer, Spark, Trino, Daft

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.
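For readers who want a concrete feel for the API discussed in this episode, the following is a hedged boto3 sketch of writing and querying embeddings with S3 Vectors; the bucket, index, dimensions, and metadata are placeholders, and the operation names should be verified against current AWS documentation.

```python
# A hedged sketch of the S3 Vectors API via the boto3 "s3vectors" client.
# Bucket/index names, dimensions, and metadata are placeholders; verify
# operation names and response shapes against current AWS documentation.
import boto3

s3v = boto3.client("s3vectors", region_name="us-east-1")

# Store a few embeddings with metadata in a vector index.
s3v.put_vectors(
    vectorBucketName="example-vector-bucket",  # placeholder
    indexName="docs-index",                    # placeholder
    vectors=[
        {"key": "doc-1", "data": {"float32": [0.1, 0.2, 0.3]}, "metadata": {"title": "intro"}},
        {"key": "doc-2", "data": {"float32": [0.9, 0.1, 0.4]}, "metadata": {"title": "deep dive"}},
    ],
)

# Nearest-neighbor query with a query embedding of the same dimension.
response = s3v.query_vectors(
    vectorBucketName="example-vector-bucket",
    indexName="docs-index",
    queryVector={"float32": [0.1, 0.2, 0.25]},
    topK=1,
    returnMetadata=True,
)
for match in response["vectors"]:
    print(match["key"], match.get("metadata"))
```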

In this season of the Analytics Engineering Podcast, Tristan is deep into the world of developer tools and databases. If you're following us here, you've almost definitely used Amazon S3 or one of its blob storage siblings; they form the foundation for nearly all data work in the cloud. In many ways, it was the innovations that happened inside of S3 that have unlocked all of the progress in cloud data over the last decade. In this episode, Tristan talks with Andy Warfield, VP and senior principal engineer at AWS, where he focuses primarily on storage. They go deep on S3, how it works, and what it unlocks. They close out talking about Iceberg, S3 table buckets, and what this all suggests about the outlines of the S3 product roadmap moving forward. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Historically, Airflow was only capable of time-based scheduling, where a DAG would run at certain times. For data that arrives at varying times, such as an external party delivering data to an S3 bucket, that meant having to run a DAG and continuously poll for updates. Airflow 3 introduces event-driven scheduling that enables you to trigger DAGs based on such updates. In this talk I'll demonstrate how this changes your DAG code and how it works internally in Airflow. Lastly, I'll demonstrate a practical use case that leverages Airflow 3's event-driven scheduling.
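As a sketch of what this looks like in DAG code, the following assumes Airflow 3 with the common messaging provider installed; the queue URL, asset name, and task body are placeholders, and the exact trigger class depends on your installed providers.

```python
# A hedged sketch of Airflow 3 event-driven scheduling for the S3-delivery
# use case: the DAG is scheduled on an Asset whose watcher fires when a
# message (e.g., an S3 event notification) lands on an SQS queue.
# Queue URL and names are placeholders.
from airflow.providers.common.messaging.triggers.msg_queue import MessageQueueTrigger
from airflow.sdk import DAG, Asset, AssetWatcher, task

# Fires whenever a message arrives on the queue receiving S3 event notifications.
trigger = MessageQueueTrigger(
    queue="https://sqs.us-east-1.amazonaws.com/111122223333/s3-deliveries"  # placeholder
)
deliveries = Asset(
    "s3_incoming_deliveries",
    watchers=[AssetWatcher(name="s3_watcher", trigger=trigger)],
)

with DAG(dag_id="process_s3_delivery", schedule=[deliveries]) as dag:

    @task
    def process():
        # Downstream processing of the newly delivered objects goes here.
        print("new data delivered")

    process()
```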

As your organization scales to 20+ data science teams and 300+ DS/ML/DE engineers, you face a critical challenge: how to build a secure, reliable, and scalable orchestration layer that supports both fast experimentation and stable production workflows. We chose Airflow, and we didn't regret it! But to make it truly work at our scale, we had to rethink its architecture from the ground up. In this talk, we'll share how we turned Airflow into a powerful MLOps platform through its core capability: running pipelines across multiple K8s GPU clusters from a single UI (!) using per-cluster worker pools. To support ease of use, we developed MLTool, our own library for fast and standardized DAG development; integrated Vault for secure secret management across teams; enabled real-time logging with S3 persistence; and built a custom SparkSubmitOperator for Kerberos-authenticated Spark/Hadoop jobs in Kubernetes. We also streamlined the developer experience: users can generate a GitLab repo and deploy a versioned pipeline to production in under 10 minutes! We're proud of what we've built, and our users are too. Now we want to share it with the world!