talk-data.com

Topic

BigQuery

Google BigQuery

data_warehouse analytics google_cloud olap

315 tagged

Activity Trend: peak 17 activities/quarter, 2020-Q1 to 2026-Q1

Activities

315 activities · Newest first

The JupyterLab Extension Ecosystem: Trends & Signals from PyPI and GitHub

What does the JupyterLab extension ecosystem actually look like in 2025? While extensions drive much of JupyterLab's practical value, their overall landscape remains largely unexplored. This talk analyzes public PyPI (via BigQuery) and GitHub data to quantify growth, momentum, and health: monthly downloads by category, release recency, star-download relationships, and the rise of AI-focused extensions. I will present my approach for building this analysis pipeline and offer lessons learned. Finally, I will demonstrate an open, read-only web catalog built on this dataset.
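An analysis like the one described above can start from BigQuery's public PyPI downloads table. A minimal sketch of assembling such a query (the package names are placeholders, and actually running the SQL requires a GCP project and the google-cloud-bigquery client):

```python
def monthly_downloads_query(projects, start_date, end_date):
    """Assemble a standard-SQL query against the public PyPI downloads table."""
    quoted = ", ".join(f"'{p}'" for p in projects)
    return f"""
SELECT
  file.project AS project,
  DATE_TRUNC(DATE(timestamp), MONTH) AS month,
  COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project IN ({quoted})
  AND DATE(timestamp) BETWEEN '{start_date}' AND '{end_date}'
GROUP BY project, month
ORDER BY month, project
"""

# Placeholder extension names for illustration:
sql = monthly_downloads_query(["jupyterlab-git", "jupyterlab-lsp"],
                              "2024-01-01", "2024-12-31")
# Executing requires credentials, e.g.:
# from google.cloud import bigquery
# rows = bigquery.Client().query(sql).result()
```

Filtering on `DATE(timestamp)` keeps the scan restricted to the date range, which matters on a table of this size where queries are billed by bytes scanned.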

Pro Oracle GoldenGate 23ai for the DBA: Powering the Foundation of Data Integration and AI

Transform your data replication strategy into a competitive advantage with Oracle GoldenGate 23ai. This comprehensive guide delivers the practical knowledge DBAs and architects need to implement, optimize, and scale Oracle GoldenGate 23ai in production environments. Written by Oracle ACE Director Bobby Curtis, it blends deep technical expertise with real-world business insights from hundreds of implementations across manufacturing, financial services, and technology sectors. Beyond traditional replication, this book explores the groundbreaking capabilities that make GoldenGate 23ai essential for modern AI initiatives. Learn how to implement real-time vector replication for RAG systems, integrate with cloud platforms like GCP and Snowflake, and automate deployments using REST APIs and Python. Each chapter offers proven strategies to deliver measurable ROI while reducing operational risk. Whether you're upgrading from Classic GoldenGate, deploying your first cloud data pipeline, or building AI-ready data architectures, this book provides the strategic guidance and technical depth to succeed. With Bobby's signature direct approach, you'll avoid common pitfalls and implement best practices that scale with your business.
What You Will Learn

• Master the microservices architecture and new capabilities of Oracle GoldenGate 23ai

• Implement secure, high-performance data replication across Oracle, PostgreSQL, and cloud databases

• Configure vector replication for AI and machine learning workloads, including RAG systems

• Design and build multi-master replication models with automatic conflict resolution

• Automate deployments and management using RESTful APIs and Python

• Optimize performance for sub-second replication lag in production environments

• Secure your replication environment with enterprise-grade features and compliance

• Upgrade from Classic to Microservices architecture with zero downtime

• Integrate with cloud platforms including OCI, GCP, AWS, and Azure

• Implement real-time data pipelines to BigQuery, Snowflake, and other cloud targets

• Navigate Oracle licensing models and optimize costs

Who This Book Is For

Database administrators, architects, and IT leaders working with Oracle GoldenGate—whether deploying for the first time, migrating from Classic architecture, or enabling AI-driven replication—will find actionable guidance on implementation, performance tuning, automation, and cloud integration. Covers unidirectional and multi-master replication and is packed with real-world use cases.
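The REST-driven automation mentioned above can be scripted from plain Python. A hedged sketch that only assembles the request for listing deployments (host, port, and credentials are placeholders; the `/services/v2` root follows the GoldenGate Microservices REST API, and no request is actually sent here):

```python
import base64

API_ROOT = "/services/v2"  # root path of the GoldenGate Microservices REST API

def deployments_request(host, port, user, password):
    """Return (url, headers) for listing deployments on the Service Manager.

    All connection details are illustrative placeholders; this helper only
    builds the request so it can be inspected without a live deployment.
    """
    url = f"https://{host}:{port}{API_ROOT}/deployments"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Accept": "application/json"}
    return url, headers

url, headers = deployments_request("ogg-host.example.com", 9011, "oggadmin", "s3cret")
# Sending it requires a live Service Manager, e.g.:
# import urllib.request
# req = urllib.request.Request(url, headers=headers)
# print(urllib.request.urlopen(req).read())
```

The same pattern (build URL, authenticate, GET/POST JSON) extends to the other endpoints the book covers, which is what makes GoldenGate deployments scriptable from CI pipelines.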

Unleash the power of dbt on Google Cloud: BigQuery, Iceberg, DataFrames and beyond

The data world has long been divided, with data engineers and data scientists working in silos. This fragmentation creates a long, difficult journey from raw data to machine learning models. We've unified these worlds through the Google Cloud and dbt partnership. In this session, we'll show you an end-to-end workflow that simplifies the data-to-AI journey. The availability of dbt Cloud on Google Cloud Marketplace streamlines getting started, and its integration with BigQuery's new Apache Iceberg tables creates an open foundation. We'll also highlight how BigQuery DataFrames' integration with dbt Python models lets you perform complex data science at scale, all within a single, streamlined process. Join us to learn how to build a unified data and AI platform with dbt on Google Cloud.
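The dbt Python model integration described above can be sketched in a few lines. The model and ref names below are hypothetical, and `submission_method="bigframes"` is the configuration that routes execution through BigQuery DataFrames:

```python
# models/clean_events.py -- a dbt Python model sketch. "stg_events" is a
# hypothetical upstream model; the returned object is materialized as a
# BigQuery table by dbt.
def model(dbt, session):
    dbt.config(materialized="table", submission_method="bigframes")
    df = dbt.ref("stg_events")   # comes back as a bigframes DataFrame
    return df.dropna()           # pandas-style call, executed inside BigQuery
```

The appeal is that the pandas-style API runs server-side at BigQuery scale, so the same model file can carry data-science logic without pulling data out of the warehouse.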

An essential share of strategic data resides in critical production systems (such as IBM i, Oracle, SAP, SQL Server...). Extracting it without disrupting production is one of the major obstacles to modernization initiatives.

This demonstration will show how data replication makes it possible to:

• Stream data in real time without impacting operations,

• Consolidate data into Snowflake, BigQuery, or data lakes for analytics and AI,

• Reduce integration costs and limit project risks.

A 30-minute session with a demonstration and Q&A time.

In a dynamic world where your data needs are evolving as quickly as the data warehousing solutions on the market, flexibility is key to unlocking the full potential of your data at the best possible cost-performance ratio. Gameloft, a leading mobile and console game developer that operates a petabyte-scale modern data architecture, has completely migrated its data warehouse from Snowflake to Google BigQuery and got leaner, faster, and more flexible along the way.

An inspiring testimonial for companies looking to modernize their ETL stack and take full advantage of the cloud.

MACIF shares its experience of migrating more than 400 Informatica workflows to BigQuery and the dbt platform, using accelerators developed by Infinite Lambda, as part of its "Move to Cloud" data modernization project.

Alongside Laurent, discover how MACIF leveraged the cloud to accelerate the delivery of its data products, reduce technical risk, and improve data governance.

Discover how Adeo radically transformed the tracking of its commercial performance by rolling it out to all its stores with the power of BigQuery. Designed for intensive use, the solution handles hundreds of concurrent calls simultaneously, allowing every team to get answers to its queries with remarkable speed, often in under a second. This ability to deliver powerful on-demand analysis, while keeping costs rigorously under control, enables large-scale performance management that is both more effective and more economical for the entire network.

AG2R invites you to discover its data transformation. By positioning BigQuery as the keystone of its ecosystem, the company has made data accessible and actionable at scale. Beyond the technical challenge, met in under a year, this session will explore how the project became the lever for a new operating model, fostering team autonomy through self-service and the distribution of high-value "data products".

When Virgin Media and O2 merged, they faced the challenge of unifying thousands of pipelines and platforms while keeping 25 million customers connected. Victor Rivero, Head of Data Governance & Quality, shares how his team is transforming the company's data estate into a trusted source of truth by embedding Monte Carlo's Data + AI Observability across BigQuery, Atlan, dbt, and Tableau. Learn how they've begun their journey to cut data downtime, enforce reliability dimensions, and measure success while creating a scalable blueprint for enterprise observability.

The world has never been more connected. Today, customers demand near-perfect uptime, responsive networks, and personalized digital experiences from their telecommunications providers. 

The industry has reached an inflection point. Legacy architectures, fragmented customer data, and batch-based analytics are no longer sufficient. Now is the time for telcos to embrace real time, where the speed of insights and the ability to remain agile determine competitive advantage.

In this session, leaders from Orange Belgium, Google Cloud, and Striim explore how telcos can rethink their data foundations to become real-time, intelligence-driven enterprises. From centralizing data in BigQuery and Spanner to enabling dynamic customer engagement and scalable operations, Orange Belgium shares how its cloud-first strategy is enabling agility, trust, and innovation.

This isn’t just a story of technology migration—it’s about building a data culture that prioritizes immediacy, empathy, and evolution. Join us for a forward-looking conversation on how telcos can align infrastructure, intelligence, and customer intent.

Discover how to build a powerful AI Lakehouse and unified data fabric natively on Google Cloud. Leverage BigQuery's serverless scale and robust analytics capabilities as the core, seamlessly integrating open data formats with Apache Iceberg and efficient processing using managed Spark environments like Dataproc. Explore the essential components of this modern data environment, including data architecture best practices, robust integration strategies, high data quality assurance, and efficient metadata management with Google Cloud Data Catalog. Learn how Google Cloud's comprehensive ecosystem accelerates advanced analytics, preparing your data for sophisticated machine learning initiatives and enabling direct connection to services like Vertex AI. 

AI agents need seamless access to enterprise data to deliver real value. DataHub's new MCP server creates the universal bridge that connects any AI agent to your entire data infrastructure through a single interface.

This session demonstrates how organizations are breaking down data silos by enabling AI agents to intelligently discover and interact with data across Snowflake, Databricks, BigQuery, and other platforms. See live examples of AI-powered data discovery, real-time incident response, and automated impact analysis.

Learn how forward-thinking data leaders are positioning their organizations at the center of the AI revolution by implementing universal data access strategies that scale across their entire ecosystem.

The growth of connected data has made graph databases essential, yet organisations often face a dilemma: choosing between an operational graph for real-time queries or an analytical engine for large-scale processing. This division leads to data silos and complex ETL pipelines, hindering the seamless integration of real-time insights with deep analytics and the ability to ground AI models in factual, enterprise-specific knowledge. Google Cloud aims to solve this with a unified "Graph Fabric," introducing Spanner Graph, which extends Spanner with native support for the ISO standard Graph Query Language (GQL). This session will cover how Google Cloud has developed a Unified Graph Solution with BigQuery and Spanner graphs to serve a full spectrum of graph needs from operational to analytical.
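To make the GQL part of the session above concrete, here is a small query of the kind Spanner Graph accepts, held in a Python string. The graph and schema (a FinGraph-style property graph with Person and Account nodes linked by Owns edges) are illustrative assumptions, not part of the talk:

```python
# A minimal ISO-GQL query against a hypothetical property graph.
GQL_QUERY = """
GRAPH FinGraph
MATCH (p:Person)-[:Owns]->(a:Account)
WHERE a.is_blocked
RETURN p.name AS owner, a.id AS account_id
"""
# Executing it requires a Spanner database with a graph defined,
# submitted like ordinary SQL through the google-cloud-spanner client.
```

The pattern-matching `MATCH` clause is what distinguishes GQL from joins in SQL: the path shape is declared once, and the engine resolves the traversal.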

Presented by: Rami Rekik, Staff SRE @Algolia. Discover Algolia's strategy for cutting its GCP costs by $150,000 per month ($1.8 million per year!) in just a few days. Rami shares concrete optimizations: well-chosen commitments (CUDs, BigQuery slots), removal of unused resources (logs, materialized views), and storage-format optimization. Learn the exact actions taken, their measured impact, and how to implement them in your own environment.

There was a post on the data engineering subreddit recently discussing how difficult it is to keep up with the data engineering world. Did you learn Hadoop? Great, we're on Snowflake, BigQuery, and Databricks now. Just learned Airflow? Well, now we have Airflow 3.0. And the list goes on. But what doesn't change, and what have the lessons been over the past decade? That's what I'll be covering in this talk: real lessons and realities that come up time and time again, whether you're working for a start-up or a large enterprise.

Discover how Apache Airflow powers scalable ELT pipelines, enabling seamless data ingestion, transformation, and machine learning-driven insights. This session will walk through:

• Automating Data Ingestion: Using Airflow to orchestrate raw data ingestion from third-party sources into your data lake (S3, GCP), ensuring a steady pipeline of high-quality training and prediction data.

• Optimizing Transformations with Serverless Computing: Offloading intensive transformations to serverless functions (GCP Cloud Run, AWS Lambda) and machine learning models (BigQuery ML, SageMaker), integrating their outputs seamlessly into Airflow workflows.

• Real-World Impact: A case study on how INTRVL leveraged Airflow, BigQuery ML, and Cloud Run to analyze early voting data in near real-time, generating actionable insights on voter behavior across swing states.

This talk not only provides a deep dive into the Political Tech space but also serves as a reference architecture for building robust, repeatable ELT pipelines. Attendees will gain insights into modern serverless technologies from AWS and GCP that enhance Airflow's capabilities, helping data engineers design scalable, cloud-agnostic workflows.
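The BigQuery ML step in a pipeline like this ultimately reduces to a SQL statement that an Airflow task can submit as an ordinary BigQuery job. A minimal sketch (all table, model, and column names are hypothetical):

```python
def create_model_sql(model_id, source_table, label_col):
    """Assemble a BigQuery ML CREATE MODEL statement (logistic regression).

    The identifiers passed in are placeholders; an Airflow task (for
    example a BigQuery job operator) can submit the returned SQL.
    """
    return f"""
CREATE OR REPLACE MODEL `{model_id}`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label_col}'])
AS
SELECT * FROM `{source_table}`
"""

# Hypothetical identifiers in the spirit of the case study above:
sql = create_model_sql("proj.votes.turnout_model",
                       "proj.votes.early_votes_features",
                       "did_vote")
```

Because training is just SQL, retraining on fresh data is simply re-running the task on the DAG's schedule, with no separate ML infrastructure to orchestrate.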

Before Airflow, our BigQuery pipelines at Create Music Group operated like musicians without a conductor—each playing on its own schedule, regardless of whether upstream data was ready. As our data platform grew, this chaos led to spiralling costs and performance bottlenecks and became utterly unsustainable. This talk tells the story of how Create Music Group brought harmony to its data workflows by adopting Apache Airflow and the Medallion architecture, ultimately slashing our data processing costs by 50%. We'll show how moving to event-driven scheduling with datasets helped eliminate stale-data issues, dramatically improved performance, and unlocked faster iteration across teams. Discover how we replaced repetitive SQL with standardized dimension/fact tables, empowering analysts in a safer sandbox.
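The dataset-driven scheduling idea above can be illustrated with a toy pure-Python model: a downstream DAG fires only once every dataset it consumes has been refreshed, which is what eliminates runs on stale inputs. (In real Airflow this is `DAG(schedule=[Dataset(...)])` with producer tasks declaring `outlets`; the names below are illustrative.)

```python
class MiniScheduler:
    """Toy sketch of data-aware scheduling, not Airflow itself."""

    def __init__(self):
        self.consumers = {}  # dag_id -> set of dataset URIs it waits on
        self.fresh = {}      # dag_id -> URIs updated since its last run
        self.runs = []       # order in which DAGs were triggered

    def register(self, dag_id, datasets):
        self.consumers[dag_id] = set(datasets)
        self.fresh[dag_id] = set()

    def publish(self, uri):
        # A producer task finished and updated `uri` (its "outlet").
        for dag_id, deps in self.consumers.items():
            if uri in deps:
                self.fresh[dag_id].add(uri)
                if self.fresh[dag_id] == deps:  # every input is fresh
                    self.runs.append(dag_id)
                    self.fresh[dag_id] = set()

s = MiniScheduler()
s.register("silver_tracks", ["bq://bronze.plays", "bq://bronze.tracks"])
s.publish("bq://bronze.plays")   # silver layer not triggered yet
s.publish("bq://bronze.tracks")  # both bronze inputs fresh -> run fires
```

Compared with overlapping cron schedules, the downstream run can never start before its bronze inputs exist, which is the medallion guarantee the talk describes.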