talk-data.com talk-data.com

Topic

GCP

Google Cloud Platform (GCP)

cloud cloud_provider infrastructure services

1670

tagged

Activity Trend

31 peak/qtr
2020-Q1 2026-Q1

Activities

1670 activities · Newest first

Damian Filonowicz: Lessons Learned from the GCP Data Migration

Join Damian Filonowicz as he shares 'Lessons Learned from the GCP Data Migration.' 🌐 Discover how PAYBACK tackled challenges in shifting data to the cloud, navigated privacy regulations, and uncovered insights about Google Cloud services like Cloud Dataflow, Cloud DLP, BigQuery, and more. Gain valuable suggestions for future endeavors in this enlightening presentation! 🚀🔍 #DataMigration #GCP #lessonslearned

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Karim Wadie: Don’t Worry About BigQuery PII

Karim Wadie: Don’t Worry About BigQuery PII: How Google Cloud Helped a Large Telco Automate Data Governance at Scale

Discover how Google Cloud helped a large Telco automate data governance at scale in BigQuery with Karim Wadie. 📊🔒 Learn about the technical solution, GCP concepts, and see a live demo to fast-track your cloud journey. 🌐💡 #DataGovernance #BigQuery #googlecloud

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Data Exploration and Preparation with BigQuery

In "Data Exploration and Preparation with BigQuery," Michael Kahn provides a hands-on guide to understanding and utilizing Google's powerful data warehouse solution, BigQuery. This comprehensive book equips you with the skills needed to clean, transform, and analyze large datasets for actionable business insights. What this Book will help me do Master the process of exploring and assessing the quality of datasets. Learn SQL for performing efficient and advanced data transformations in BigQuery. Optimize the performance of BigQuery queries for speed and cost-effectiveness. Discover best practices for setting up and managing BigQuery resources. Apply real-world case studies to analyze data and derive meaningful insights. Author(s) Michael Kahn is an experienced data engineer and author specializing in big data solutions and technologies. With years of hands-on experience working with Google Cloud Platform and BigQuery, he has assisted organizations in optimizing their data pipelines for effective decision-making. His accessible writing style ensures complex topics become approachable, enabling readers of various skill levels to succeed. Who is it for? This book is tailored for data analysts, data engineers, and data scientists who want to learn how to effectively use BigQuery for data exploration and preparation. Whether you're new to BigQuery or looking to deepen your expertise in working with large datasets, this book provides clear guidance and practical examples to achieve your goals.

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyternotebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL. What You Will Learn Set up a GCP account and project Explore BigQuery and its use cases, including machine learning Understand Google Cloud AI Platform and its capabilities Use Vertex AI for training and deploying machine learning models Explore Google Cloud Dataproc and its use cases for big data processing Create and share data visualizations and reports with Looker Data Studio Explore Google Cloud Dataflow and its use cases for batch and stream data processing Run data processing pipelines on Cloud Dataflow Explore Google Cloud Storageand its use cases for data storage Get an introduction to Google Cloud SQL and its use cases for relational databases Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming Who This Book Is For Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects

Architecting Data and Machine Learning Platforms

All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks. Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get accurate and quicker insights to drive competitive advantage. You'll learn how to: Design a modern and secure cloud native or hybrid data analytics and machine learning platform Accelerate data-led innovation by consolidating enterprise data in a governed, scalable, and resilient data platform Democratize access to enterprise data and govern how business teams extract insights and build AI/ML capabilities Enable your business to make decisions in real time using streaming pipelines Build an MLOps platform to move to a predictive and prescriptive analytics approach

FinOps Examples using Google Cloud BigQuery by Aliz.ai - Zoltán Guth & Gergely Schmidt, Aliz.ai

This talk was recorded at Crunch Conference 2022. Zoltán and Gergely from Aliz.ai company spoke about FinOps examples using Google Cloud BigQuery.

"In this talk we will talk about the basics of FinOps concept and going to make an introduction to it through real life examples using BigQuery."

The event was organized by Crafthub.

You can watch the rest of the conference talks on our channel.

If you are interested in more speakers, tickets and details of the conference, check out our website: https://crunchconf.com/ If you are interested in more events from our company: https://crafthub.events/

Five Things You Didn't Know You Could Do with Databricks Workflows

Databricks workflows has come a long way since the initial days of orchestrating simple notebooks and jar/wheel files. Now we can orchestrate multi-task jobs and create a chain of tasks with lineage and DAG with either fan-in or fan-out among multiple other patterns or even run another Databricks job directly inside another job.

Databricks workflows takes its tag: “orchestrate anything anywhere” pretty seriously and is a truly fully-managed, cloud-native orchestrator to orchestrate diverse workloads like Delta Live Tables, SQL, Notebooks, Jars, Python Wheels, dbt, SQL, Apache Spark™, ML pipelines with excellent monitoring, alerting and observability capabilities as well. Basically, it is a one-stop product for all orchestration needs for an efficient lakehouse. And what is even better is, it gives full flexibility of running your jobs in a cloud-agnostic and cloud-independent way and is available across AWS, Azure and GCP.

In this session, we will discuss and deep dive on some of the very interesting features and will showcase end-to-end demos of the features which will allow you to take full advantage of Databricks workflows for orchestrating the lakehouse.

Talk by: Prashanth Babu

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Anna Nerezova is a Digital Marketing and Cloud Transformation Consultant with 15 years of experience in data, analytics and optimization. She has built innovative solutions using Google Cloud to solve problems in the media and entertainment, tech, health and non-profit industries Anna is a Google Cloud Engineer Scholar, and is on the 100 Women in Analytics list by Google Analytics. She is also a Google Developer Group NYC lead and a mentor for the Northeast, and a proud Women Techmakers Ambassador.

Anna is passionate about using the latest AI/ML technologies to provide solutions and bring growth and better experience for all users. She is a strong advocate for diversity and inclusion, and is committed to helping underrepresented groups become entrepreneurs and financially independent.

We will cover how Snap (parent company of Snapchat) has been using Airflow since 2016. How we built a secure deployment on GCP that integrates with internal tools for workload authorization, RBAC and more. We made permissions for DAGs easy to use for customers using k8s workload identity binding and tight UI integration. How are we migrating 2500+ DAGs from Airflow V1, Python 2 to V2 Python 3 using tools + automations. Making code/DAG migration requires significant amount of time investment. Our team created several tools that can convert or re-write DAGs in the new format. Some other self-serving tools that we built internally.

In 2022, cloud data centres accounted for up to 3.7% of global greenhouse gas emissions, exceeding those of aviation and shipping. Yet in the same year, Britain wasted 4 Terawatt hours of renewable energy because it couldn’t be transported from where it was generated to where it was needed. So why not move the cloud to the clean energy? VertFlow is an Airflow operator that deploys workloads to the greenest Google Cloud data centre, based on the realtime carbon intensity of electricity grids worldwide. At Ovo Energy, many of our batch workloads, like generation forecasts, don’t have latency or data residency requirements, so they can run anywhere. We use VertFlow to let them chase the sun to wherever energy is greenest, helping us save carbon on our mission to save carbon. VertFlow is available on PyPI: https://pypi.org/project/VertFlow/ Find out more at https://cloud.google.com/blog/topics/sustainability/ovo-energy-builds-greener-software-with-google-cloud

DAG Authoring - learn how to go beyond the basics and best practices when implementing Airflow DAGs. It will be a survival guide for Airflow DAG developers who need to cope with hundreds of Airflow operators. This session will go beyond 101 or “for dummies” session and will be of interest to both those who are just starting to develop Airflow DAGs and Airflow experts, as it will help them improve their productivity.

Kiwi.com started using Airflow in June 2016 as an orchestrator for several people in the company. The need for the tool grew and the monolithic instance was used by 30+ teams having 500+ DAGs active resulting in 3.5 million tasks/month successfully finished. At first, we moved to using a monolithic Airflow environment, but our needs quickly changed as we wanted to support a data mesh architecture within kiwi.com. By leveraging Astronomer on GCP, we were able to move from a monolithic Airflow environment to many smaller instances of Airflow. This talk will go into how to handle things like DAG dependencies, observability, and stakeholder management. Furthermore, we’ll talk about security, particularly how GCP’s workload identity helped us achieve a passwordless Airflow experience.

For the dag owner, testing Airflow DAGs can be complicated and tedious. Kubectl cp your dag from local to pod, exec into the pod, and run a command? Install breeze? Why pull the Airflow image and start up the webserver / scheduler / triggerer if all we want is to test the addition of a new task? It doesn’t have to be this hard. At Etsy, we’ve simplified testing dags for the dag owner with dagtest. Dagtest is a Python package that we house on our internal PyPi. It is a small client binary that makes HTTP requests to a test API. The test API is a simple Flask server that receives these requests and builds pods to run airflow dags backfill commands based on the options provided via dagtest. The simplest of these is a dry-run. Typically, users run test runs where the dag executes E2E for a single ds. Equally important is the environment setup. We use an adhoc Airflow instance in a separate GCP environment with an SA that cannot write to Production buckets. This talk will discuss both.

Today I’m chatting with Bruno Aziza, Head of Data & Analytics at Google Cloud. Bruno leads a team of outbound product managers in charge of BigQuery, Dataproc, Dataflow and Looker and we dive deep on what Bruno looks for in terms of skills for these leaders. Bruno describes the three patterns of operational alignment he’s observed in data product management, as well as why he feels ownership and customer obsession are two of the most important qualities a good product manager can have. Bruno and I also dive into how to effectively abstract the core problem you’re solving, as well as how to determine whether a problem might be solved in a better way. 

Highlights / Skip to:

Bruno introduces himself and explains how he created his “CarCast” podcast (00:45) Bruno describes his role at Google, the product managers he leads, and the specific Google Cloud products in his portfolio (02:36) What Bruno feels are the most important attributes to look for in a good data product manager (03:59) Bruno details how a good product manager focuses on not only the core problem, but how the problem is currently solved and whether or not that’s acceptable (07:20) What effective abstracting the problem looks like in Bruno’s view and why he positions product management as a way to help users move forward in their career (12:38) Why Bruno sees extracting value from data as the number one pain point for data teams and their respective companies (17:55) Bruno gives his definition of a data product (21:42) The three patterns Bruno has observed of operational alignment when it comes to data product management (27:57) Bruno explains the best practices he’s seen for cross-team goal setting and problem-framing (35:30)

Quotes from Today’s Episode  

“What’s happening in the industry is really interesting. For people that are running data teams today and listening to us, the makeup of their teams is starting to look more like what we do [in] product management.” — Bruno Aziza (04:29)

“The problem is the problem, so focus on the problem, decompose the problem, look at the frictions that are acceptable, look at the frictions that are not acceptable, and look at how by assembling a solution, you can make it most seamless for the individual to go out and get the job done.” – Bruno Aziza (11:28)

“As a product manager, yes, we’re in the business of software, but in fact, I think you’re in the career management business. Your job is to make sure that whatever your customer’s job is that you’re making it so much easier that they, in fact, get so much more done, and by doing so they will get promoted, get the next job.” – Bruno Aziza (15:41)

“I think that is the task of any technology company, of any product manager that’s helping these technology companies: don’t be building a product that’s looking for a problem. Just start with the problem back and solution from that. Just make sure you understand the problem very well.” (19:52)

“If you’re a data product manager today, you look at your data estate and you ask yourself, ‘What am I building to save money? When am I building to make money?’ If you can do both, that’s absolutely awesome. And so, the data product is an asset that has been built repeatedly by a team and generates value out of data.” – Bruno Aziza (23:12)

“[Machine learning is] hard because multiple teams have to work together, right? You got your business analyst over here, you’ve got your data scientists over there, they’re not even the same team. And so, sometimes you’re struggling with just the human aspect of it.” (30:30)

“As a data leader, an IT leader, you got to think about those soft ways to accomplish the stuff that’s binary, that’s the hard [stuff], right? I always joke, the hard stuff is the soft stuff for people like us because we think about data, we think about logic, we think, ‘Okay if it makes sense, it will be implemented.’ For most of us, getting stuff done is through people. And people are emotional, how can you express the feeling of achieving that goal in emotional value?” – Bruno Aziza (37:36)

Links As referenced by Bruno, “Good Product Manager/Bad Product Manager”: https://a16z.com/2012/06/15/good-product-managerbad-product-manager/ LinkedIn: https://www.linkedin.com/in/brunoaziza/ Bruno’s Medium Article on Competing Against Luck by Clayton M. Christensen: https://brunoaziza.medium.com/competing-against-luck-3daeee1c45d4 The Data CarCast on YouTube:  https://www.youtube.com/playlist?list=PLRXGFo1urN648lrm8NOKXfrCHzvIHeYyw

Summary The problems that are easiest to fix are the ones that you prevent from happening in the first place. Sifflet is a platform that brings your entire data stack into focus to improve the reliability of your data assets and empower collaboration across your teams. In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Salma Bakouk about achieving data reliability and reducing entropy within your data stack with sifflet

Interview

Introduction How did you get involved in the area of data management? Can you describe what Sifflet is and the st

Summary CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they need a reliable data platform that empowers all of the organization’s stakeholders. In this episode Vishnu Venkataraman shares the journey that he and his team have taken to build and evolve their systems and improve the product offerings that they are able to support.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Vishnu Venkataraman about building the data platform at CreditKarma and the forces that shaped the design

Interview

Introduction How did you get involved in the area of data management? Can you describe what CreditKarma is and the role

Learning Google Analytics

Why is Google Analytics 4 the most modern data model available for digital marketing analytics? Rather than simply reporting what has happened, GA4's new cloud integrations enable more data activation, linking online and offline data across all your streams to provide end-to-end marketing data. This practical book prepares you for the future of digital marketing by demonstrating how GA4 supports these additional cloud integrations. Author Mark Edmondson, Google developer expert for Google Analytics and Google Cloud, provides a concise yet comprehensive overview of GA4 and its cloud integrations. Data, business, and marketing analysts will learn major facets of GA4's powerful new analytics model, with topics including data architecture and strategy, and data ingestion, storage, and modeling. You'll explore common data activation use cases and get the guidance you need to implement them. You'll learn: How Google Cloud integrates with GA4 The potential use cases that GA4 integrations can enable Skills and resources needed to create GA4 integrations How much GA4 data capture is necessary to enable use cases The process of designing dataflows from strategy through data storage, modeling, and activation How to adapt the use cases to fit your business needs

Summary Despite the best efforts of data engineers, data is as messy as the real world. Entity resolution and fuzzy matching are powerful utilities for cleaning up data from disconnected sources, but it has typically required custom development and training machine learning models. Sonal Goyal created and open-sourced Zingg as a generalized tool for data mastering and entity resolution to reduce the effort involved in adopting those practices. In this episode she shares the story behind the project, the details of how it is implemented, and how you can use it for your own data projects.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Sonal Goyal about Zingg, an open source entity resolution frame