talk-data.com

Topic

Cloud Storage

object_storage file_storage cloud

67 tagged activities

Activity Trend

Peak of 5 activities per quarter, 2020-Q1 through 2026-Q1

Activities

67 activities · Newest first

(A) Deploy AVS and manage the VMware vSphere environment as an Azure resource (B) Connect the AVS private cloud securely to the Internet (C) Migrate VMs from on-premises to AVS using VMware HCX technology (D) Expand AVS private cloud storage with Azure NetApp Files or Azure Elastic SAN, scaling storage independently from compute (E) Manage AVS workload VMs through Azure interfaces by Arc-enabling the AVS private cloud and its VMs (F) Modernize workloads with Azure services such as Azure SQL Managed Instance and AI capabilities.

Note: this session will use demo environments instead of live environments due to complexity and time.

Please RSVP and arrive at least 5 minutes before the start time, at which point remaining spaces are open to standby attendees.

EQT, a global investment organization specializing in private capital, infrastructure, and real assets, has transformed its data operations by fully adopting the modern data stack. As a cloud-native company with hundreds of internal and external data sources — from YouTube to Google Cloud Storage — EQT needed a scalable, centralized solution to ingest and transform data for complex financial use cases. Their journey took them from fragmented, Excel-based workflows to a robust, integrated data pipeline powered by Fivetran.

In this session, you’ll learn how:

• EQT streamlined external data ingestion and broke down data silos
• A unified data pipeline supports scalable financial analytics and decision-making
• Fivetran’s ease of use, connector maintenance, and cost-effectiveness made it the clear choice

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.
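To make the orchestration pattern described above concrete, here is a minimal sketch of a Prefect flow that fans per-scene preprocessing out to a Ray cluster. This is an illustration only: the bucket name, scene URIs, and the `preprocess_scene` and `list_new_scenes` functions are hypothetical placeholders, not the team's actual pipeline code.

```python
# Hypothetical sketch: Prefect orchestrates the pipeline, Ray parallelizes per-scene work.
import ray
from prefect import flow, task


@ray.remote
def preprocess_scene(scene_uri: str) -> str:
    # Placeholder: reproject, mask clouds, and resample so outputs share a
    # consistent spatial/spectral/temporal grid.
    return scene_uri.replace("raw/", "analysis-ready/")


@task
def list_new_scenes(bucket: str) -> list:
    # Placeholder: in practice this would query a STAC catalog or bucket listing.
    return [f"gs://{bucket}/raw/scene_{i}.tif" for i in range(4)]


@flow
def satellite_pipeline(bucket: str = "example-imagery-bucket"):
    scenes = list_new_scenes(bucket)
    ray.init(ignore_reinit_error=True)  # or ray.init(address="auto") on a cluster
    futures = [preprocess_scene.remote(s) for s in scenes]
    ready = ray.get(futures)            # block until every scene is processed
    print(f"Produced {len(ready)} analysis-ready assets")


if __name__ == "__main__":
    satellite_pipeline()
```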

Cloud-optimized (CO) data formats are designed to efficiently store and access data directly from cloud storage without needing to download the entire dataset. These formats enable faster data retrieval, scalability, and cost-effectiveness by allowing users to fetch only the necessary subsets of data. They also allow for efficient parallel data processing using on-the-fly partitioning, which can considerably accelerate data management operations. This makes cloud-optimized data a natural fit for data-parallel jobs on serverless platforms: Function-as-a-Service (FaaS) provides a data-driven, scalable, and cost-efficient experience with practically no management burden, and each serverless function reads and processes a small portion of the cloud-optimized dataset in parallel directly from object storage, yielding significant speedups.

In this talk, you will learn how to process cloud-optimized data formats in Python using the Lithops toolkit. Lithops is a serverless data processing toolkit specifically designed to process data from cloud object storage using serverless functions. We will also demonstrate the Dataplug library, which enables cloud-optimized data management in scientific settings such as genomics, metabolomics, and geospatial data. We will show different data processing pipelines in the cloud that demonstrate the benefits of cloud-optimized data management.
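As a rough illustration of the data-parallel pattern described above (not code from the talk), the sketch below uses Lithops' FunctionExecutor to have several serverless functions each read one byte range of an object in parallel. The bucket, object key, and fixed-size chunking are hypothetical assumptions.

```python
# Minimal sketch: each serverless worker fetches only its byte range of the object.
import lithops

BUCKET = "example-bucket"         # hypothetical bucket
KEY = "dataset/points.csv"        # hypothetical object in cloud storage
CHUNK = 1_000_000                 # 1 MB per worker, for illustration


def process_chunk(chunk_id):
    # Read only this worker's slice directly from object storage; the full
    # dataset is never downloaded by any single function.
    storage = lithops.Storage()
    start = chunk_id * CHUNK
    end = start + CHUNK - 1
    data = storage.get_object(
        BUCKET, KEY, extra_get_args={"Range": f"bytes={start}-{end}"}
    )
    return len(data)              # placeholder "work": count the bytes in the chunk


fexec = lithops.FunctionExecutor()      # uses the backend set in your Lithops config
fexec.map(process_chunk, range(8))      # 8 functions read 8 ranges in parallel
print(sum(fexec.get_result()))
```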

Sponsored by: Google Cloud | Powering AI & Analytics: Innovations in Google Cloud Storage for Data Lakes

Enterprise customers need a powerful and adaptable data foundation to navigate demands of AI and multi-cloud environments. This session dives into how Google Cloud Storage serves as a unified platform for modern analytics data lakes, together with Databricks. Discover how Google Cloud Storage provides key innovations like performance optimizations for Apache Iceberg, Anywhere Cache as the easiest way to colocate storage and compute, Rapid Storage for ultra low latency object reads and appends, and Storage Intelligence for vital data insights and recommendations. Learn how you can optimize your infrastructure to unlock the full value of your data for AI-driven success.

Mastering Change Data Capture With Lakeflow Declarative Pipelines

Transactional systems are a common source of data for analytics, and Change Data Capture (CDC) offers an efficient way to extract only what’s changed. However, ingesting CDC data into an analytics system comes with challenges, such as handling out-of-order events or maintaining global order across multiple streams. These issues often require complex, stateful stream processing logic. This session will explore how Lakeflow Declarative Pipelines simplifies CDC ingestion using the Apply Changes function. With Apply Changes, global ordering across multiple change feeds is handled automatically — there is no need to manually manage state or understand advanced streaming concepts like watermarks. It supports both snapshot-based inputs from cloud storage and continuous change feeds from systems like message buses, reducing complexity for common streaming use cases.
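For orientation, here is a minimal Apply Changes sketch assuming a CDC feed has landed as JSON files in cloud storage. The path, table names, and columns (customer_id, event_ts, operation) are hypothetical, and the code would run inside a pipeline where `spark` is provided.

```python
# Illustrative Apply Changes flow; names and paths are placeholders.
import dlt
from pyspark.sql.functions import col, expr


@dlt.view
def customers_cdc():
    # Incrementally pick up raw CDC files from cloud storage
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("s3://example-bucket/cdc/customers/"))


dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_cdc",
    keys=["customer_id"],                         # primary key of the source table
    sequence_by=col("event_ts"),                  # ordering column; resolves out-of-order events
    apply_as_deletes=expr("operation = 'DELETE'"),
    stored_as_scd_type=1,                         # keep only the latest row per key
)
```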

Real-Time Analytics Pipeline for IoT Device Monitoring and Reporting

This session will show how we implemented a solution to support high-frequency data ingestion from smart meters. We implemented a robust API endpoint that interfaces directly with IoT devices. This API processes messages in real time from millions of distributed IoT devices and meters across the network. The architecture leverages cloud storage as a landing zone for the raw data, followed by a streaming pipeline built on Lakeflow Declarative Pipelines. This pipeline implements a multi-layer medallion architecture to progressively clean, transform and enrich the data. The pipeline operates continuously to maintain near real-time data freshness in our gold layer tables. These datasets connect directly to Databricks Dashboards, providing stakeholders with immediate insights into their operational metrics. This solution demonstrates how modern data architecture can handle high-volume IoT data streams while maintaining data quality and providing accessible real-time analytics for business users.
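A rough sketch of the bronze/silver layering described above is shown below. It is not the presenters' code: the landing-zone path, column names, and quality rule are assumptions made for illustration, and `spark` is assumed to be provided by the pipeline runtime.

```python
# Hypothetical medallion layering for smart-meter readings.
import dlt
from pyspark.sql.functions import col, to_timestamp

LANDING = "gs://example-meter-landing/readings/"   # hypothetical landing zone


@dlt.table(comment="Raw smart-meter readings as received in the landing zone")
def bronze_readings():
    return (spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load(LANDING))


@dlt.table(comment="Cleaned readings with typed columns and basic quality checks")
@dlt.expect_or_drop("valid_reading", "kwh >= 0")
def silver_readings():
    return (dlt.read_stream("bronze_readings")
            .withColumn("reading_ts", to_timestamp(col("reading_ts")))
            .select("meter_id", "reading_ts", "kwh"))
```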

Sponsored by: Fivetran | Raw Data to Real-Time Insights: How Dropbox Revolutionized Data Ingestion

Dropbox, a leading cloud storage platform, is on a mission to accelerate data insights to better understand customers’ needs and elevate the overall customer experience. By leveraging Fivetran’s data movement platform, Dropbox gained real-time visibility into customer sentiment, marketing ROI, and ad performance, empowering teams to optimize spend, improve operational efficiency, and deliver greater business outcomes.

Join this session to learn how Dropbox:
- Cut data pipeline time from 8 weeks to 30 minutes by automating ingestion and streamlining reporting workflows.
- Enabled real-time, reliable data movement across tools like Zendesk Chat, Google Ads, MySQL, and more — at global operations scale.
- Unified fragmented data sources into the Databricks Data Intelligence Platform to reduce redundancy, improve accessibility, and support scalable analytics.

Lakeflow Connect: Smarter, Simpler File Ingestion With the Next Generation of Auto Loader

Auto Loader is the definitive tool for ingesting data from cloud storage into your lakehouse. In this session, we’ll unveil new features and best practices that simplify every aspect of cloud storage ingestion. We’ll demo out-of-the-box observability for pipeline health and data quality, walk through improvements for schema management, introduce a series of new data formats and unveil recent strides in Auto Loader performance. Along the way, we’ll provide examples and best practices for optimizing cost and performance. Finally, we’ll introduce a preview of what’s coming next — including a REST API for pushing files directly to Delta, a UI for creating cloud storage pipelines and more. Join us to help shape the future of file ingestion on Databricks.
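The core Auto Loader pattern the session builds on looks roughly like the sketch below. Paths, the catalog/schema, and the chosen options are illustrative assumptions, not the new features previewed in the talk.

```python
# Minimal Auto Loader ingestion from cloud storage into a bronze table.
checkpoint = "s3://example-bucket/_checkpoints/events"   # hypothetical checkpoint path

df = (spark.readStream
      .format("cloudFiles")                              # Auto Loader source
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", checkpoint)   # tracks inferred schema and evolution
      .load("s3://example-bucket/raw/events/"))

(df.writeStream
   .option("checkpointLocation", checkpoint)
   .trigger(availableNow=True)                           # drain the current backlog, then stop
   .toTable("main.analytics.events_bronze"))
```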

This session explores the evolution of data management on Kubernetes for AI and machine learning (ML) workloads and modern databases, including Google’s leadership in this space. We’ll discuss key challenges and solutions, including persistent storage with solutions like checkpointing and Cloud Storage FUSE, and accelerating data access with caching. Customers Qdrant and Codeway will share how they’ve successfully leveraged these technologies to improve their AI, ML, and database performance on Google Kubernetes Engine (GKE).

Discover how to transition from legacy, siloed systems to a unified, scalable, and insights-driven data platform on GCP. This session will cover best practices for data migration, overcoming common challenges, and integrating SaaS and third-party solutions using key Google Cloud services like BigQuery, Data Fusion, Cloud Storage, Application Integration, Cloud Run, Cloud Build, and Artifact Registry.


Introducing how easy it can be to build a fully functional Flutter app with a Firebase backend: fast and simple, with a front-end UI, backend tasks (using Cloud Functions), a database (Firestore), storage (Cloud Storage), and more. Additional tips on Vertex AI and Gemini in both Flutter and Cloud Functions will be included if time allows.

Modern analytics and AI workloads demand a unified storage layer for structured and unstructured data. Learn how Cloud Storage simplifies building data lakes based on Apache Iceberg. We’ll discuss storage best practices and new capabilities that enable high performance and cost efficiency. We’ll also guide you through real-world examples, including Iceberg data lakes with BigQuery or third-party solutions, data preparation for AI pipelines with Dataproc and Apache Spark, and how customers have built unified analytics and AI solutions on Cloud Storage.
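As a hedged sketch of the Iceberg-on-Cloud-Storage setup referenced above: the snippet below points a Spark Iceberg catalog at a GCS warehouse path. The catalog name, bucket, and table are hypothetical, a simple Hadoop-style catalog is assumed rather than any specific metastore, and the Iceberg Spark runtime and GCS connector are assumed to be on the classpath.

```python
# Illustrative Iceberg table on Cloud Storage via Spark (names/paths are placeholders).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("iceberg-on-gcs")
         .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
         .config("spark.sql.catalog.lake.type", "hadoop")
         .config("spark.sql.catalog.lake.warehouse", "gs://example-lake-bucket/warehouse")
         .getOrCreate())

spark.sql("""
  CREATE TABLE IF NOT EXISTS lake.analytics.trips (
    trip_id STRING, pickup_ts TIMESTAMP, distance_km DOUBLE
  ) USING iceberg
""")
spark.sql("INSERT INTO lake.analytics.trips VALUES ('t1', current_timestamp(), 3.2)")
spark.table("lake.analytics.trips").show()
```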

Discover the latest breakthroughs in Cloud Storage. This executive session provides a high-level overview of the latest object, block, file storage, and backup and recovery solutions. Gain insights into our cutting-edge storage technologies and learn how they can optimize your infrastructure, reduce costs, and enhance your data management strategy. Don’t miss this opportunity to learn directly from Google executives about the future of storage. This session is a must for IT decision-makers seeking a competitive edge.

This talk offers demonstrations and live discussions on how to rapidly deploy production-ready GKE or Slurm clusters using Cluster Toolkit and Terraform. Leverage the latest GPUs to accelerate machine learning workloads and optimize resource utilization with GKE's Kueue, autoscaling Slurm, and Dynamic Workload Scheduler (DWS). Explore storage solutions like Google Cloud Storage (GCS), GCSFuse, Filestore Zonal, and Parallelstore. Leave this session with the tools and knowledge you need to deploy a high-performance ML cluster in minutes.

Properly architecting your storage infrastructure for AI is critical for success. Snap will share some of their best practices, implementation tips, and success stories for AI workloads. This session dives deep into training, checkpointing, and serving recommendations, covering Cloud Storage FUSE, Anywhere Cache, and parallel file systems. Gain insights to optimize your AI infrastructure and unlock its full potential. Don’t miss this opportunity to learn from real-world examples and expert advice.

Managing petabytes of Google Cloud storage objects? Attend this session to learn how the new Storage Intelligence product simplifies managing billions of objects across thousands of buckets. Leverage AI-driven insights to analyze cost, performance, security, and compliance – all through intuitive natural language queries. Quickly act on insights with bucket relocation and batch operations. Join us to uncover practical tools and exciting new features that can transform you into a Cloud Storage superhero.