talk-data.com

Topic: Google Cloud Platform (GCP)

Tags: cloud, cloud_provider, infrastructure, services

1670 activities tagged

Activity Trend: peak of 31 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1670 activities · Newest first

Coalesce 2024: Customer health with dbt Cloud: A LiveRamp data journey

We aim to illustrate the transition from an antiquated methodology for generating final tables and views in Google Cloud Platform (GCP) to a structured process built on dbt.

This transition involves defining how we develop source, staging, intermediate, and final models within dbt, enabling better change management and error detection. We will discuss how far we have come and our plan to maintain this workstream.

Speaker: Kyle Salomon, Business Analytics Manager, LiveRamp

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale: https://www.getdbt.com/blog/coalesce-2024-product-announcements

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code. The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques.

The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book also covers API design and development, with a specific focus on leveraging the power of FastAPI, including authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs.

Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration; workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows.

What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. This book is not just an educational tool: it is a career catalyst and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world.
What You Will Learn
- Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds
- Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines, and master the art of workflow orchestration to streamline your engineering projects
- Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure

Who This Book Is For
Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists

Scaling machine learning at large organizations like Renault Group presents unique challenges in terms of scale, legal requirements, and diversity of use cases. Data scientists require streamlined workflows and automated processes to efficiently deploy models into production. We present an MLOps pipeline based on Python, Kubeflow, and the GCP Vertex AI API designed specifically for this purpose. It enables data scientists to focus on code development for pre-processing, training, evaluation, and prediction. This MLOps pipeline is a cornerstone of the AI@Scale program, which aims to roll out AI across the Group.

We chose a Python-first approach, allowing data scientists to focus purely on writing preprocessing or ML-oriented Python code, while also supporting data retrieval through SQL queries. The pipeline addresses key questions such as prediction type (batch or API), model versioning, resource allocation, drift monitoring, and alert generation. It favors faster time to market with automated deployment and infrastructure management. Although we encountered pitfalls and design difficulties, which we will discuss during the presentation, the pipeline integrates with a CI/CD process, ensuring efficient and automated model deployment and serving.

Finally, this MLOps solution empowers Renault data scientists to seamlessly translate innovative models into production and streamlines the development of scalable, impactful AI-driven solutions.
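The internals of Renault's pipeline are not public; as a rough, hypothetical sketch of the Python-first pattern the abstract describes (component-based Kubeflow pipelines submitted to Vertex AI Pipelines), a definition might look like the following. All component names, images, and table names are invented for illustration.

```python
# Hypothetical KFP v2 pipeline sketch: data scientists write plain Python
# inside components; the platform handles packaging and execution.
from kfp import dsl


@dsl.component(base_image="python:3.11")
def preprocess(raw_table: str) -> str:
    # Pre-processing code (or a SQL query against the warehouse) goes here.
    return f"{raw_table}_clean"


@dsl.component(base_image="python:3.11")
def train(clean_table: str) -> str:
    # Training code goes here; returns a reference to the trained model.
    return f"model_from_{clean_table}"


@dsl.pipeline(name="mlops-demo-pipeline")
def pipeline(raw_table: str = "sales.raw_events"):
    prep = preprocess(raw_table=raw_table)
    train(clean_table=prep.output)


# Compilation and submission to Vertex AI (requires GCP credentials):
# from kfp import compiler
# from google.cloud import aiplatform
# compiler.Compiler().compile(pipeline, "pipeline.json")
# aiplatform.PipelineJob(display_name="mlops-demo",
#                        template_path="pipeline.json").run()
```

The appeal of this shape is that the component bodies are ordinary Python functions, so versioning, resource allocation, and deployment concerns can be layered on by the surrounding platform rather than by each data scientist.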

This session explores Gemini's capabilities, architecture, and performance benchmarks. We'll delve into the significance of its extensive context window and address the critical aspects of safety, security, and responsible AI use. Hallucination, a common concern in LLM applications, remains a focal point of ongoing development. This talk will highlight recent advancements aimed at mitigating the risk of hallucination to enhance the utility of LLMs across various applications.

Morrisons is driving business transformation with data, in part through near real-time ingestion of disparate datasets, which centralises critical, actionable data within Google Cloud, and operationally by focussing on outcome-driven data teams. Learn how being data-driven is challenging, how data volume can be problematic, and how the benefits of available live data enable success and aid future business growth.

In today's data-driven landscape, the ability to efficiently harness the power of AI is crucial for businesses seeking to unlock valuable insights and drive innovation. This session will explore how BigQuery, Google Cloud's leading data warehouse solution, can accelerate your AI initiatives. Discover how BigQuery's serverless architecture, built-in machine learning capabilities, and seamless integration with Google Cloud's AI ecosystem empower you to build, train, and deploy ML models at scale. Whether you're a data scientist, engineer, or business leader, this session will provide you with actionable insights and strategies to supercharge your AI efforts with BigQuery.

Sayle Matthews leads the North American GCP Data Practice at DoiT International. Over the past year and a half, he has focused almost exclusively on BigQuery, helping hundreds of GCP customers optimize their usage and solve some of their biggest 'Big Data' challenges. Drawing on his extensive experience with Google BigQuery billing, we sat down with Sayle to discuss the changes and, most importantly, the impact these changes have had on the market, as he has observed while working with hundreds of clients of various sizes at DoiT. Sayle's LinkedIn page - https://www.linkedin.com/in/sayle-matthews-522a795/

OpenLineage is an open standard for lineage data collection, integrated into the Airflow codebase and facilitating lineage collection across providers like Google, Amazon, and more. Atlan Data Catalog is a third-generation active metadata platform: a single source of trust unifying cataloging, data discovery, lineage, and governance. We will demonstrate what OpenLineage is and how, with minimal and intuitive setup across Airflow and Atlan, it presents a unified workflow view and efficient cross-platform lineage collection, including column-level lineage, across technologies (Python, Spark, dbt, SQL, etc.) and clouds (AWS, Azure, GCP, etc.), all orchestrated by Airflow. This integration unlocks further use cases in automated metadata management by making operational pipelines dataset-aware for self-service exploration. We will also demonstrate real-world challenges and resolutions for lineage consumers in improving audit and compliance accuracy through column-level lineage traceability across the data estate. The talk will also briefly cover the most recent OpenLineage developments and planned future enhancements.

Ford Motor Company operates extensively across various nations. The Data Operations (DataOps) team for Advanced Driver Assistance Systems (ADAS) at Ford is tasked with processing terabyte-scale daily data from lidar, radar, and video. To manage this, the DataOps team must orchestrate diverse, compute-intensive pipelines across both on-premises infrastructure and GCP while handling sensitive customer data in both environments. The team is also responsible for facilitating the execution of on-demand, compute-intensive algorithms at scale. To achieve these objectives, the team employs Astronomer/Airflow at the core of its strategic approach. This involves various deployments of Astronomer/Airflow that integrate seamlessly and securely (via Apigee) to initiate batch data processing and ML jobs on the cloud, as well as compute-intensive computer vision tasks on-premises, with essential alerting provided through the ELK stack. This presentation will delve into the architecture and strategic planning surrounding the hybrid batch router, highlighting its pivotal role in promoting rapid innovation and scalability in the development of ADAS features.

Looking for a way to streamline your data workflows and master the art of orchestration? As we navigate the complexities of modern data engineering, dynamic workflows and complex data pipeline dependencies in Airflow are increasingly common. To empower data engineers to use Airflow as their main orchestrator, Airflow Datasets can be easily integrated into your data journey. This session will showcase dynamic workflow orchestration in Airflow and how to manage multi-DAG dependencies with multi-Dataset listening. We'll take you through a real-time data pipeline with Pub/Sub messaging integration and dbt in a Google Cloud environment, ensuring data transformations are triggered only upon new data ingestion, moving away from rigid time-based scheduling or the use of sensors and other legacy ways to trigger a DAG.
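The session's actual pipeline is not shown here; as a minimal sketch of the Dataset-driven pattern it describes (available in Airflow 2.4+), a producer DAG can declare a Dataset as an outlet and a consumer DAG can be scheduled on that Dataset instead of a cron expression. The bucket URI and task bodies below are hypothetical placeholders.

```python
# Hypothetical sketch: Dataset-aware scheduling in Airflow 2.4+.
# The consumer DAG runs only when the producer updates the Dataset --
# no cron schedule, no sensors.
import pendulum
from airflow import Dataset
from airflow.decorators import dag, task

# A Dataset is identified by a URI (illustrative name here).
raw_events = Dataset("gs://demo-bucket/raw_events")


@dag(start_date=pendulum.datetime(2024, 1, 1),
     schedule="@hourly", catchup=False)
def ingest():
    @task(outlets=[raw_events])  # marks this task as updating the Dataset
    def pull_from_pubsub():
        ...  # e.g. consume Pub/Sub messages and land them in GCS

    pull_from_pubsub()


@dag(start_date=pendulum.datetime(2024, 1, 1),
     schedule=[raw_events],  # triggered by Dataset updates, not by time
     catchup=False)
def transform():
    @task
    def run_dbt():
        ...  # e.g. trigger a dbt build against BigQuery

    run_dbt()


ingest()
transform()
```

For multi-Dataset listening as mentioned in the abstract, `schedule` accepts a list of several Datasets; the consumer DAG then runs once all of them have been updated since its last run.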

Google Machine Learning and Generative AI for Solutions Architects

This book teaches solutions architects how to effectively design and implement AI/ML solutions utilizing Google Cloud services. Through detailed explanations, examples, and hands-on exercises, you will understand essential AI/ML concepts, tools, and best practices while building advanced applications.

What this Book will help me do
Build robust AI/ML solutions using Google Cloud tools such as TensorFlow, BigQuery, and Vertex AI. Prepare and process data efficiently for machine learning workloads. Establish and apply an MLOps framework for automating ML model lifecycle management. Implement cutting-edge generative AI solutions using best practices. Address common challenges in AI/ML projects with insights from expert solutions.

Author(s)
Kieran Kavanagh is a seasoned principal architect with nearly twenty years of experience in the tech industry. He has successfully led teams in designing, planning, and governing enterprise cloud strategies, and his wealth of experience is distilled into the practical approaches and insights in this book.

Who is it for?
This book is ideal for IT professionals aspiring to design AI/ML solutions, particularly in the role of solutions architects. It assumes a basic knowledge of Python and foundational AI/ML concepts but is suitable for both beginners and seasoned practitioners. If you're looking to deepen your understanding of state-of-the-art AI/ML applications on Google Cloud, this resource will guide you.

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that flow like your morning coffee, where industry insights meet laid-back banter. Whether you're a data aficionado or just curious about the digital age, pull up a chair and let's explore the heart of data, unplugged style!

- Stack Overflow and OpenAI Deal Controversy: Discussing the partnership controversy, with users protesting the lack of an opt-out option and how this could reshape the platform. Look into Phind here.
- Apple and OpenAI Rumors: Could ChatGPT be the new Siri? Examining rumors of ChatGPT potentially replacing Siri, and Apple's AI strategy compared to Microsoft's MAI-1. Check out more community opinions here.
- Hello GPT-4o: Exploring the new era with OpenAI's GPT-4o that blends video, text, and audio for more dynamic human-AI interactions. Discussing AI's challenges under the European AI Act and ChatGPT's use in daily life and dating apps like Bumble.
- Claude Takes Europe: Claude 3 now available in the EU. How does it compare to ChatGPT in coding and conversation?
- ElevenLabs' Music Generation AI: A look at ElevenLabs' AI for generating music and the broader AI music landscape. How are these algorithms transforming music creation? Check out the AI Song Contest here.
- Google Cloud's Big Oops with UniSuper: Unpack the shocking story of how Google Cloud accidentally wiped out UniSuper's account. What does this mean for data security and redundancy strategies?
- The Great CLI Debate: Is Python really the right choice for CLI tools? We spark the debate over Python vs. Go and Rust in building efficient CLI tools.

Data Engineering with Google Cloud Platform - Second Edition

Data Engineering with Google Cloud Platform is your ultimate guide to building scalable data platforms using Google Cloud technologies. In this book, you will learn how to leverage products such as BigQuery, Cloud Composer, and Dataplex for efficient data engineering. Expand your expertise and gain practical knowledge to excel in managing data pipelines within the Google Cloud ecosystem.

What this Book will help me do
Understand foundational data engineering concepts using Google Cloud Platform. Learn to build and manage scalable data pipelines with tools such as Dataform and Dataflow. Explore advanced topics like data governance and secure data handling in Google Cloud. Boost readiness for Google Cloud data engineering certification with real-world exam guidance. Master cost-effective strategies and CI/CD practices for data engineering on Google Cloud.

Author(s)
Adi Wijaya, the author of this book, is a Data Strategic Cloud Engineer at Google with extensive experience in data engineering and the Google Cloud ecosystem. With his hands-on expertise, he emphasizes practical solutions and in-depth knowledge sharing, guiding readers through the intricacies of Google Cloud for data engineering success.

Who is it for?
This book is ideal for data analysts, IT practitioners, software engineers, and data enthusiasts aiming to excel in data engineering. Whether you're a beginner tackling fundamental concepts or an experienced professional exploring Google Cloud's advanced capabilities, this book is designed for you. It bridges your current skills with modern data engineering practices on Google Cloud, making it a valuable resource at any stage of your career.

Kartik Derasari is a technical consultant with a passion for technology and innovation. As a 6X Google Cloud Certified Professional, he has extensive experience in application development and analytics projects as a full-stack engineer. In addition to his professional work, Kartik is an advocate for the use of technology to drive business growth and innovation. He is the leader of the Go…

Empower your organization to achieve greater efficiency and solve critical business challenges with Google AppSheet's innovative no-code platform. This exclusive panel features industry leaders who achieved remarkable results by using AppSheet to streamline workflows and empower their teams. Learn first-hand their inspiring journeys, gain practical tips, and unlock the full potential of AppSheet to drive growth and innovation within your own organization.

Click the blue “Learn more” button above to tap into special offers designed to help you implement what you are learning at Google Cloud Next 25.

Join the exclusive Korean session in Las Vegas for three days (April 9-11) packed with essential insights on technical trends, business use cases, and exciting showcases. Google Cloud Korea will provide a summary of key highlights, tailored specifically for Korean guests. We'll cover three captivating topics, followed by an exclusive networking dinner and Q&A session.


Retrieval Augmented Generation (RAG) is a powerful technique for providing real-time, domain-specific context to an LLM to improve the accuracy of its responses. RAG doesn't require adding sensitive data to the model, but it still requires application developers to address the security and privacy of user and company data. In this session, you will learn about the security implications of RAG workloads and how to architect your applications to handle user identity and control data access.


Discover the transformative synergy between SAP Datasphere and Google BigQuery, driving data insights. We'll explore Datasphere's transformation, integration, and data governance capabilities alongside BigQuery's scalability and real-time analytics. Also learn how SAP GenAI Hub and Google Cloud accelerate AI initiatives and innovation. You will also hear real-world success stories on how businesses leverage this integration for tangible outcomes.

By attending this session, your contact information may be shared with the sponsor for relevant follow-up for this event only. Please note: seating is limited and on a first-come, first-served basis; standing areas are available.


Prompt management is like the invisible conductor of your artificial intelligence (AI) orchestra. It's the science of crafting, organizing, and optimizing the prompts that guide your AI models to perform their best. The session will cover the end-to-end prompt lifecycle in Vertex AI Studio to help you reach production sooner. This includes new rapid evaluation within Vertex and the ability to "critique" model response for automatic prompt improvements.


In this session, we'll dive into deploying Java apps using Google Cloud's serverless platform. Designed for Java developers, it offers practical insights into the considerations, challenges, and tips and tricks for deploying JVM applications on serverless platforms. We'll also cover best practices across different parts of the application lifecycle, such as CI/CD pipelines, security, and observability. Through interactive demos, learn to build, secure, and monitor Java applications efficiently.
