talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

4055 tagged

Activity Trend: 471 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4055 activities · Newest first

Cracking the Data Engineering Interview

"Cracking the Data Engineering Interview" is your essential guide to mastering the data engineering interview process. This book offers practical insights and techniques to build your resume, refine your skills in Python, SQL, data modeling, and ETL, and confidently tackle over 100 mock interview questions. Gain the knowledge and confidence to land your dream role in data engineering.

What this book will help me do: craft a compelling data engineering portfolio to stand out to employers; refresh and deepen your understanding of essential topics like Python, SQL, and ETL; master over 100 interview questions that cover both technical and behavioral aspects; understand data engineering concepts such as data modeling, security, and CI/CD; and develop the negotiation, networking, and personal branding skills crucial for job applications.

Author(s): Bryan and Ransome are seasoned authors with a wealth of experience in data engineering and professional development. Drawing from their extensive industry backgrounds, they provide actionable strategies for aspiring data engineers. Their approachable writing style and real-world insights make complex topics accessible to readers.

Who is it for? This book is ideal for aspiring data engineers looking to navigate the job application process effectively. Readers should be familiar with data engineering fundamentals, including Python, SQL, cloud data platforms, and ETL processes. It's tailored for professionals aiming to enhance their portfolios, tackle challenging interviews, and boost their chances of landing a data engineering role.
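To give a flavor of the kind of exercise such interview prep involves, here is a hypothetical warm-up (not taken from the book): a minimal extract-transform-load pass written in plain Python, with cleanup and deduplication of raw records.

```python
# Hypothetical interview-style exercise (not from the book): a minimal
# extract-transform-load pass over CSV text using only the stdlib.
import csv
import io

RAW = """id,name,signup_date
1, Ada ,2023-01-05
2,Grace,2023-02-17
1, Ada ,2023-01-05
3,,2023-03-02
"""

def extract(text):
    """Parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Trim whitespace, drop rows missing a name, deduplicate by id."""
    seen, out = set(), []
    for row in rows:
        name = row["name"].strip()
        if not name or row["id"] in seen:
            continue
        seen.add(row["id"])
        out.append({"id": int(row["id"]), "name": name,
                    "signup_date": row["signup_date"]})
    return out

def load(rows):
    """'Load' by keying records on id, standing in for a database write."""
    return {r["id"]: r for r in rows}

warehouse = load(transform(extract(RAW)))
```

Interviewers typically probe the edge cases here: what happens with duplicate keys, missing fields, and type conversions.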

IBM TS7700 R5 DS8000 Object Store User's Guide

The IBM® TS7700 features a functional enhancement that allows for the TS7700 to act as an object store for transparent cloud tiering with IBM DS8000®, DFSMShsm (HSM), and native DFSMSdss (DSS). This function can be used to move data sets directly from DS8000 to TS7700. This IBM Redpaper publication provides a functional overview of the features, provides client value information, and walks through DFSMS, DS8000, and TS7700 set up steps.

Python for Data Science For Dummies, 3rd Edition

Let Python do the heavy lifting for you as you analyze large datasets. Python for Data Science For Dummies lets you get your hands dirty with data using one of the top programming languages. This beginner’s guide takes you step by step through getting started, performing data analysis, understanding datasets and example code, working with Google Colab, sampling data, and beyond. Coding your data analysis tasks will make your life easier, make you more in-demand as an employee, and open the door to valuable knowledge and insights. This new edition is updated for the latest version of Python and includes current, relevant data examples.

Get a firm background in the basics of Python coding for data analysis
Learn about data science careers you can pursue with Python coding skills
Integrate data analysis with multimedia and graphics
Manage and organize data with cloud-based relational databases

Python careers are on the rise. Grab this user-friendly Dummies guide and gain the programming skills you need to become a data pro.
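As an illustration of the data-sampling tasks the blurb mentions (a minimal sketch using only the standard library, not an excerpt from the book), one can draw a random sample from a dataset and check that its mean tracks the population mean:

```python
# Illustrative sketch: sample a dataset and compare the sample mean with
# the population mean, using only the Python standard library.
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# A synthetic "large dataset": 10,000 normally distributed measurements.
population = [random.gauss(mu=50, sigma=10) for _ in range(10_000)]

# Draw a 500-row sample without replacement.
sample = random.sample(population, k=500)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
# With 500 draws, the sample mean should land close to the population mean.
```

The same pattern scales to real workflows: sample first, validate that summary statistics hold, then run the expensive analysis on the full data.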

Summary

Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention. In this episode Lukas Fittl shares some hard-won wisdom about the causes and solution of many performance bottlenecks and the work that he is doing to shine some light on PostgreSQL to make it easier to understand how to keep it running smoothly.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey, and today I'm interviewing Lukas Fittl about optimizing your database performance and tips for tuning Postgres.

Interview

Introduction
How did you get involved in the area of data management?
What are the different ways that database performance problems impact the business?
What are the most common contributors to performance issues?
What are the useful signals that indicate performance challenges in the database?

For a given symptom, what are the steps that you recommend for determining the proximate cause?

What are the potential negative impacts to be aware of when tuning…
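One concrete way to surface the "useful signals" discussed above is PostgreSQL's pg_stat_statements view. The sketch below uses hypothetical rows of the kind that view returns (query text, call count, total execution time in milliseconds) to rank the worst offenders; in a real setup the rows would come from running `SELECT query, calls, total_exec_time FROM pg_stat_statements` against the database.

```python
# Sketch of a common diagnostic step: rank queries by total execution
# time, as reported by PostgreSQL's pg_stat_statements extension.
# These rows are hypothetical stand-ins for real query stats.
SAMPLE_STATS = [
    {"query": "SELECT * FROM orders WHERE user_id = $1",
     "calls": 120_000, "total_exec_time": 94_000.0},
    {"query": "UPDATE carts SET updated_at = now() WHERE id = $1",
     "calls": 80_000, "total_exec_time": 12_500.0},
    {"query": "SELECT count(*) FROM events",
     "calls": 40, "total_exec_time": 61_000.0},
]

def worst_offenders(rows, top_n=2):
    """Sort by cumulative time and attach mean time per call (ms)."""
    ranked = sorted(rows, key=lambda r: r["total_exec_time"], reverse=True)
    for r in ranked:
        r["mean_ms"] = r["total_exec_time"] / r["calls"]
    return ranked[:top_n]

top = worst_offenders(SAMPLE_STATS)
```

Note how the two signals differ: the orders query dominates by volume (cheap per call, called often), while the events count is expensive per call, and each symptom calls for a different fix.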

On today’s episode, we’re joined by Lee Blakemore, Chief Executive Officer of Introhive, the leading Client Intelligence Platform. We talk about:

Building a culture that’s data-oriented and encouraging autonomy
Resisting unproductive fads, such as pushing for metrics that don't correlate to profit and revenue
Being pragmatic – without killing opportunities
Amusing anecdotes about the early cloud days
Whether startups should invest more in marketing or sales

Summary

Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro. That’s three free boards at dataengineeringpodcast.com/miro.

Your host is Tobias Macey, and today I'm interviewing Tanya Bragin about her views on the database products market.

Interview

Introduction
How did you get involved in the area of data management?
What are the aspects of the database market that keep you interested as a VP of product?

How have your experiences at Elastic informed your current work at Clickhouse?

What are the main product categories for databases today?

What are the industry trends that have the most impact on the development and growth of different product categories?
Which categories do you see growing the fastest?

When a team is selecting a database technology for a given task, what are the types of questions that they should be asking?
Transactional engines like Postgres, SQL Server, Oracle, etc. were long used

Central application for all your dbt packages - Coalesce 2023

dbt packages are libraries for dbt. Packages can produce information about best practices for your dbt project (e.g., dbt project evaluator) and cloud warehouse cost overviews. Unfortunately, all these KPIs are stored in your data warehouse, and it can be painful and expensive to create data visualization dashboards from them. This application automatically builds dashboards from the dbt packages that you are using. You just need to configure your dbt Cloud API key - that's it! In this session, you'll learn how.
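The dashboards described here are ultimately fed by the dbt Cloud Administrative API. As a minimal sketch (the account id and API key below are placeholders, and the endpoint shape follows dbt Cloud's documented v2 API), building an authenticated request for recent job runs might look like:

```python
# Sketch of listing recent job runs via the dbt Cloud Administrative API
# (v2). Account id and API key are placeholder values.
import json
import urllib.request

BASE = "https://cloud.getdbt.com/api/v2"

def runs_request(account_id, api_key, limit=10):
    """Build an authenticated GET request for the account's job runs."""
    url = f"{BASE}/accounts/{account_id}/runs/?limit={limit}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {api_key}"})

def fetch_runs(req):
    """Execute the request and return the decoded run records.

    Requires network access and a valid API key.
    """
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]

req = runs_request(account_id=12345, api_key="YOUR_API_KEY")
```

Run metadata like this (statuses, durations, timestamps) is exactly the raw material a package-metrics dashboard aggregates.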

Speaker: Adrien Boutreau, Head of Analytics Engineers, Infinite Lambda

Register for Coalesce at https://coalesce.getdbt.com

Hands-on tips to get started with CI in dbt Cloud - Coalesce 2023

Learn best practices for improving your data workflows at scale. In this session, the dbt Labs team shares tactical ideas for setting up CI for the first time and shipping with confidence, as well as tips to take your implementation to the next level.
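The core mechanic behind dbt's CI is state comparison. As a hedged outline (the artifacts path is a placeholder, and this shows the dbt Core equivalent of what dbt Cloud manages for you):

```shell
# --state points at a directory holding manifest.json from the last
# production run; dbt diffs the PR's project against it.
# state:modified+ selects only the changed models plus everything
# downstream of them; --defer resolves unchanged upstream refs to the
# existing production relations instead of rebuilding them.
dbt build --select state:modified+ --defer --state ./prod-artifacts
```

This is what keeps CI runs fast and cheap: only the code you touched, and its dependents, get built and tested.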

Speaker: Joel Labes, Senior Developer Experience Advocate, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Siemens' data evolution: dbt Cloud and the data mesh - Coalesce 2023

Siemens has been revamping how it approaches data, looking to democratize data access to unlock faster innovation. It recently rolled out Siemens Data Cloud — a data mesh with Snowflake and dbt Cloud at its heart. The goal: ensure the people closest to the business problems were empowered to self-serve responsibly, without compromising on governance or creating silos.

This is the story of how Siemens has achieved success with dbt Cloud and a data mesh — and what the future holds in store.

Register for Coalesce at https://coalesce.getdbt.com

How TOCA Football keeps their eye on the ball with dbt and data observability - Coalesce 2023

TOCA Football, the largest operator of indoor soccer centers in North America, leverages accurate data to power analytics for over 30 training centers, providing everything from operational insights for executives to ball-by-ball analysis.

In 2020, the team adopted a cloud-native data stack with dbt to scale analytics enablement for the go-to-market org, including the company’s finance, strategy, operations, and marketing teams. By 2022, their lean team of four was struggling to gain visibility into the health and performance of their dbt models. So, what was the TOCA team to do? Two words: data observability.

In this talk, Sam Cvetkovski, Director of Data & Analytics, discusses how TOCA built their larger data observability strategy to reduce model bloat, increase data accuracy, and boost stakeholder satisfaction with their team’s data products. She shares her biggest “aha!” moments, key challenges, and best practices for teams getting started on their dbt reliability journeys.

Speakers: Sam Cvetkovski, Director, Data & Analytics, TOCA Football; Barr Moses, Co-Founder & CEO, Monte Carlo

Register for Coalesce at https://coalesce.getdbt.com

Automating accounting's end-of-month close process (and passing a SOX audit) - Coalesce 2023

In this session, the team at Rocket Money explains how they built a Quote-to-Cash (Q2C) system in dbt Cloud that the accounting team uses for end-of-month close. This was also used to pass Rocket Money's first SOX audit with zero deficiencies. The presentation will review how the team gathered requirements (for different products, payment providers, etc.) and built and tested all of their models.

Speaker: Amber Oar, Staff Analytics Engineer, Rocket Money

Register for Coalesce at https://coalesce.getdbt.com

Transforming healthcare by putting data in the driver’s seat at Vida Health - Coalesce 2023

In this session, Vida Health’s senior director of data, mobile, and web engineering shares a story that can help other data and business leaders capitalize on the opportunities being created by current technology innovations, market realities, and real-world problems. This includes a playbook on how Vida Health uses modern data technologies like dbt Cloud, Fivetran, Looker, BigQuery, BigQueryML/dbtML, Vertex AI, LLMs, and more to put data in the driver’s seat to solve meaningful problems in complex industries like healthcare.

Speaker: Trenton Huey, Senior Director, Data and Frontend Engineering, Vida Health

Register for Coalesce at https://coalesce.getdbt.com

Driving toward data mesh at Rivian - Coalesce 2023

This session shares Rivian’s journey building an analytics ecosystem from scratch over the last two years, centered around dbt and dbt Cloud. Through this work, Rivian is driving towards a healthy data mesh that links data and developers across many domains at the company, to enable rapid growth and value as they ramp to produce their new fleet of EVs and keep the world adventurous.

Speaker: Will Bishop, Manager, Data Science Analytics, Rivian

Register for Coalesce at https://coalesce.getdbt.com

How FanDuel migrated a mature data organization - Coalesce 2023

FanDuel, America's leading sports betting company, migrated all of its data pipelines to integrate with dbt Cloud in order to make its data transformation processes more reliable, scalable, and maintainable.

In this presentation, the Fanduel team shares the challenges faced during the migration, including overcoming the inertia of established data infrastructure, integrating with legacy systems, and ensuring a smooth transition without disrupting ongoing data operations. The migration process fostered newfound collaboration among data analysts, product owners, and engineers, with the added benefit of accommodating future business needs.

Speakers: Phillip Tan, Technical Program Manager, FanDuel; Michael Lee, Senior Data Engineer, FanDuel; Harry Williams, Senior Data Engineer, FanDuel

Register for Coalesce at https://coalesce.getdbt.com

Warehouse-first data strategy at ClickUp - Coalesce 2023

During the data team's short tenure (2.5 years) at ClickUp, they have built and scaled a fully modern data stack and implemented a warehouse-first data strategy. ClickUp's data comprises thousands of dbt models and upstream/downstream integrations with nearly every software system at ClickUp. ClickUp uses dbt Cloud and Snowflake to power dozens of downstream systems with audience creation, marketing optimization, predictive customer lifecycle ML, a PLG/PLS motion, and much more. This session covers the foundational principles ClickUp follows and how warehouse-first thinking has unlocked tremendous value for ClickUp.

Speaker: Marc Stone, Head of Data, ClickUp

Register for Coalesce at https://coalesce.getdbt.com

Better CI for better data quality - Coalesce 2023

Continuous Integration (CI) in dbt Cloud makes it easy to test every change you make prior to deploying. It’s a hallmark of mature analytics workflows. We’ve made some major improvements to dbt Cloud CI, so it’s easier than ever to prevent breaking changes, save on costs, and keep those pesky stakeholders happy.

Join the dbt Labs product team on this magical journey to a world of better data quality, and see for yourself what CI can do for you.

Speaker: Grace Goheen, Product Manager, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Lazy devs unite! Building a data ecosystem that spoils data engineers - Coalesce 2023

Join Ryan Dolley and Jan Soubusta for a journey into the world of end-to-end analytics pipelines and how they can be a data engineer's best friend.

Learn how to automate boring tasks and create a safe haven for data engineers using the dynamic duo: dbt for transformative magic and GoodData for analytics awesomeness.

Combined with data extraction and orchestration tools, you form the Voltron of easy-to-automate end-to-end analytics flows, bringing data from source systems all the way through BI to your end users.

And don't miss the grand finale where they reveal an alternative deployment on dbt Cloud that's so easy to orchestrate your coffee mug could do it. Prepare to laugh, learn, and level up your data game!

Speakers: Ryan Dolley, VP of Product Strategy, GoodData; Jan Soubusta, Distinguished Software Engineer, GoodData

Register for Coalesce at https://coalesce.getdbt.com

Scaling collaboration with dbt Cloud - Coalesce 2023

dbt has been adopted as an industry standard, now used by more than 25,000 organizations. For larger companies with more complex deployments, however, it can still be a challenge to sustain efficient collaboration around data development without sacrificing governance. Here is how dbt Cloud is helping to overcome that challenge.

Speakers: Jeremy Cohen, Product Manager, dbt Labs; Cameron Afzal, Product Manager, dbt Labs