talk-data.com

Topic: Business Intelligence (BI)

Tags: data_visualization, reporting, analytics

1211 activities tagged

Activity Trend: peak of 111 activities per quarter, 2020-Q1 to 2026-Q1

Activities

1211 activities · Newest first

In today’s episode, we are joined by Boris Berenberg. Boris is VP of Product at Modus Create, a digital transformation consulting firm aimed at helping clients build competitive advantage through digital innovation.

We talk about:

How Modus works and the problems it solves.
Boris’ background and how he got into building products.
Finding the optimal sweet spot between growth and efficiency.
Redefining your target audience and customer needs.
The importance of go-to-market for products.
The various phases of thinking through a successful product.
The importance of quality content in SaaS.

This episode is brought to you by Qrvey

The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com.

Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

#saas #analytics #AWS #BI

Summary The global economy is dependent on complex and dynamic networks of supply chains powered by sophisticated logistics. This requires a significant amount of data to track shipments and the operational characteristics of materials and goods. Roambee is a platform that collects, integrates, and analyzes all of that information to provide companies with the critical insights they need to stay running, especially in a time of such constant change. In this episode, Roambee CEO Sanjay Sharma shares the types of questions that companies are asking about their logistics, the technical work they do to provide ways to answer those questions, and how they approach the challenge of data quality in its many forms.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24*7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2.

Summary Data lineage is something that has grown from a convenient feature to a critical need as data systems have grown in scale, complexity, and centrality to business. Alvin is a platform that aims to provide a low effort solution for data lineage capabilities focused on simplifying the work of data engineers. In this episode co-founder Martin Sahlen explains the impact that easy access to lineage information can have on the work of data engineers and analysts, and how he and his team have designed their platform to offer that information to engineers and stakeholders in the places that they interact with data.
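
At its core, a lineage graph is just a directed graph of data assets, and the highest-value query against it is impact analysis: what sits downstream of the thing I am about to change? As a rough illustration of that idea (the asset names are hypothetical, and this is not Alvin's implementation), a few lines of Python are enough to sketch it:

```python
from collections import defaultdict

# Hypothetical lineage edges (upstream -> downstream); the asset names are
# invented for illustration and do not come from Alvin.
EDGES = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.revenue"),
    ("mart.revenue", "dashboard.weekly_revenue"),
]

downstream = defaultdict(set)
for src, dst in EDGES:
    downstream[src].add(dst)

def impact(asset):
    """Walk everything downstream of an asset -- the question behind
    impact analysis: "what breaks if I change this?"."""
    seen, stack = set(), [asset]
    while stack:
        for nxt in downstream[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

print(impact("staging.orders"))
# {'mart.revenue', 'dashboard.weekly_revenue'}
```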

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

You wake up to a Slack message from your CEO, who’s upset because the company’s revenue dashboard is broken. You’re told to fix it before this morning’s board meeting, which is just minutes away. Enter Metaplane, the industry’s only self-serve data observability tool. In just a few clicks, you identify the issue’s root cause, conduct an impact analysis, and save the day. Data leaders at Imperfect Foods, Drift, and Vendr love Metaplane because it helps them catch, investigate, and fix data quality issues before their stakeholders ever notice they exist. Setup takes 30 minutes. You can literally get up and running with Metaplane by the end of this podcast. Sign up for a free-forever plan at dataengineeringpodcast.com/metaplane, or try out their most advanced features with a 14-day free trial. Mention the podcast to get a free "In Data We Trust World Tour" t-shirt.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.

Your host is Tobias Macey and today I’m interviewing Martin Sahlen about his work on data lineage at Alvin and how it factors into the day-to-day work of data engineers.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Alvin is and the story behind it?

What is the core problem that you are trying to solve at Alvin?

Data lineage has quickly become an overloaded term. What are the elements of lineage that you are focused on addressing?

What are some of the other sources/pieces of information that you integrate into the lineage graph?

How does data lineage show up in the work of data engineers?

In what ways does your focus on data engineers inform the way that you model the lineage information?

As with every data asset/product, the lineage graph is only as useful as the data that it stores. What are some of the ways that you focus on establishing and ensuring a complete view of lineage?

How do you account for assets (e.g. tables, dashboards, exports, etc.) that are created outside of the "officially supported" methods? (e.g. someone manually runs a SQL create statement, etc.)
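
One common way to catch such out-of-band assets is to parse the warehouse's query logs. As a rough sketch of that approach (not Alvin's actual implementation, though the episode's links do mention the sqlparse Python library), here is how table references might be pulled out of a raw SQL statement:

```python
import sqlparse
from sqlparse.tokens import Keyword, Name

def tables_touched(sql: str):
    """Collect (possibly schema-qualified) names that follow FROM / JOIN /
    TABLE keywords. A real lineage extractor must also handle CTEs,
    subqueries, aliases, and warehouse dialect quirks."""
    found = []
    for stmt in sqlparse.parse(sql):
        grab, buf = False, ""
        for tok in stmt.flatten():
            if tok.is_whitespace:
                if grab and buf:          # a full name has been assembled
                    found.append(buf)
                    grab, buf = False, ""
                continue
            if tok.ttype is Keyword and tok.value.upper() in ("FROM", "JOIN", "TABLE"):
                grab = True
            elif grab and (tok.ttype is Name or tok.value == "."):
                buf += tok.value          # build up e.g. schema.table
            elif grab and buf:
                found.append(buf)
                grab, buf = False, ""
        if buf:
            found.append(buf)
    return found

print(tables_touched(
    "CREATE TABLE reporting.daily AS "
    "SELECT * FROM analytics.orders o JOIN analytics.users u ON o.user_id = u.id"
))
# ['reporting.daily', 'analytics.orders', 'analytics.users']
```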

Can you describe how you have implemented the Alvin platform?

How have the design and goals shifted from when you first started exploring the problem?

What are the types of data systems/assets that you are focused on supporting? (e.g. data warehouses vs. lakes, structured vs. unstructured, which BI tools, etc.)

How does Alvin fit into the workflow of data engineers and their downstream customers/collaborators?

What are some of the design choices (both visual and functional) that you focused on to avoid friction in the data engineer’s workflow?

What are some of the open questions/areas for investigation/improvement in the space of data lineage?

What are the factors that contribute to the difficulty of a truly holistic and complete view of lineage across an organization?

What are the most interesting, innovative, or unexpected ways that you have seen Alvin used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Alvin?

When is Alvin the wrong choice?

What do you have planned for the future of Alvin?

Contact Info

LinkedIn
@martinsahlen on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Alvin
Unacast
sqlparse Python library
Cython

Podcast.init Episode

Antlr
Kotlin programming language
PostgreSQL

Podcast Episode

OpenSearch
ElasticSearch
Redis
Kubernetes
Airflow
BigQuery
Spark
Looker
Mode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

In today’s episode, we’re joined by Indus Khaitan. Indus is the CEO and Co-Founder of Quolum, a platform to make buying SaaS products as easy as possible.

We talk about:

Indus’ background, growing up in a mining town in India and moving to the USA to work in tech.
How Quolum got started and the problems it solves today.
Growing a business slowly and organically vs pushing to grow as fast as possible.
Indus’ advice for early-stage founders.
Is the SaaS market too heavily influenced by investors?
The danger of celebrating unicorn valuations and funding.
Some of the key events in Indus’ life that helped him in business.
Why do people choose to risk it as a founder?

This episode is brought to you by Qrvey

The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com.

Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

#saas #analytics #AWS #BI

Summary Regardless of how data is being used, it is critical that the information can be trusted. The practice of data reliability engineering has gained momentum recently to address that need. To help support the efforts of data teams, the folks at Soda Data created the Soda Checks Language and the corresponding Soda Core utility that acts on this new DSL. In this episode Tom Baeyens explains their reasons for creating a new syntax for expressing and validating checks for data assets and processes, as well as how to incorporate it into your own projects.
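
For a concrete feel for what the episode covers, here is a minimal sketch of defining SodaCL checks and running them through the soda-core Python API. The data source name, configuration file, and table/column names are placeholders, and the exact method names should be verified against the Soda documentation:

```python
# Minimal sketch, assuming the soda-core Python API; the data source,
# configuration file, table, and column names are all placeholders.
from soda.scan import Scan

checks = """
checks for orders:
  - row_count > 0
  - missing_count(customer_id) = 0
  - duplicate_count(order_id) = 0
  - freshness(created_at) < 1d
"""

scan = Scan()
scan.set_data_source_name("warehouse")                 # placeholder name
scan.add_configuration_yaml_file("configuration.yml")  # warehouse connection details
scan.add_sodacl_yaml_str(checks)
exit_code = scan.execute()  # non-zero when checks fail, handy as a CI gate
print(scan.get_logs_text())
```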

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24*7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes

Azure Data Engineering Cookbook - Second Edition

Azure Data Engineering Cookbook is your ultimate guide to mastering data engineering on Microsoft's Azure platform. Through an engaging collection of recipes, this book breaks down procedures to build sophisticated data pipelines, leveraging tools like Azure Data Factory, Data Lake, Databricks, and Synapse Analytics.

What this book will help me do:
Efficiently process large datasets using Azure Synapse Analytics and Azure Databricks pipelines.
Transform and shape data within systems by leveraging Azure Synapse data flows.
Implement and manage relational databases in Azure with performance tuning and administration.
Configure data pipeline solutions integrated with Power BI for insightful reporting.
Monitor, optimize, and ensure lineage tracking for your data systems with Purview and Log Analytics.

Author(s): Nagaraj Venkatesan is an experienced cloud architect specializing in Microsoft Azure, with years of hands-on data engineering expertise. Ahmad Osama is a seasoned data professional; the authors' shared emphasis is on practical learning and bridging concepts with actionable skills.

Who is it for? This book is essential for data engineers seeking expertise in Azure's rich engineering capabilities. It's tailored for professionals with a foundational knowledge of cloud services, looking to achieve advanced proficiency in Azure data engineering pipelines.

In today’s episode we’re talking to Dvir Shapira. Dvir is Chief Product Officer at Venn LocalZone, a company that’s creating a secure workspace for remote work.

We talk about:

…and much more.

This episode is brought to you by Qrvey

The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com.

Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

#saas #analytics #AWS #BI

Learning Microsoft Power BI

Microsoft Power BI is a data analytics and visualization tool powerful enough for the most demanding data scientists, but accessible enough for everyday use by anyone who needs to get more from data. The market has many books designed to train and equip professional data analysts to use Power BI, but few of them make this tool accessible to anyone who wants to get up to speed on their own. This streamlined intro to Power BI covers all the foundational aspects and features you need to go from "zero to hero" with data and visualizations. Whether you work with large, complex datasets or in Microsoft Excel, author Jeremey Arnold shows you how to teach yourself Power BI and use it confidently as a regular data analysis and reporting tool.

You'll learn how to:
Import, manipulate, visualize, and investigate data in Power BI
Approach solutions for both self-service and enterprise BI
Use Power BI in your organization's business intelligence strategy
Produce effective reports and dashboards
Create environments for sharing reports and managing data access with your team
Determine the right solution for using Power BI offerings based on size, security, and computational needs

This article, the third in a series, dives into the technologies that underpin modern approaches to location intelligence. It explores databases for industrial-scale geospatial applications, advanced business intelligence (BI) tools for exploratory analysis, and simple use-case specific platforms. Published at: https://www.eckerson.com/articles/location-intelligence-part-iii-enabling-technologies

Summary There is a constant tension in business data between growing silos and breaking them down. Even when a tool is designed to integrate information as a guard against data isolation, it can easily become a silo of its own, where you have to make a point of using it to seek out information. In order to help distribute critical context about data assets and their status into the locations where work is being done, Nicholas Freund co-founded Workstream. In this episode he discusses the challenge of maintaining shared visibility and understanding of data work across the various stakeholders, and his efforts to make it a seamless experience.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24*7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial.

Business Intelligence with Databricks SQL

Discover the power of business intelligence through Databricks SQL. This comprehensive guide explores the features and tools of the Databricks Lakehouse Platform, emphasizing how it leverages data lakes and warehouses for scalable analytics. You'll gain hands-on experience with Databricks SQL, enabling you to manage data efficiently and implement cutting-edge analytical solutions.

What this book will help me do:
Comprehend the core features of Databricks SQL and its role in the Lakehouse architecture.
Master the use of Databricks SQL for conducting scalable and efficient data queries.
Implement data management techniques, including security and cataloging, with Databricks.
Optimize data performance using Delta Lake and Photon technologies with Databricks SQL.
Compose advanced SQL scripts for robust data ingestion and analytics workflows.

Author(s): Vihag Gupta, acclaimed data engineer and BI expert, brings a wealth of experience in large-scale data analytics to this work. With a career deeply rooted in cutting-edge data warehousing technologies, Vihag combines expertise with an approachable teaching style. This book reflects his commitment to empowering data professionals with tools for next-gen analytics.

Who is it for? Ideal for data engineers, business intelligence analysts, and warehouse administrators aiming to enhance their practice with Databricks SQL. This book suits those with fundamental knowledge of SQL and data platforms seeking to adopt Lakehouse methodologies. Whether a novice to Databricks or looking to master advanced features, this guide will support professional growth.

Summary Data engineering systems are complex and interconnected, with myriad and often opaque chains of dependencies. As they scale, the problems of visibility and dependency management can increase at an exponential rate. In order to turn this into a tractable problem, one approach is to define and enforce contracts between producers and consumers of data. Ananth Packkildurai created Schemata as a way to make the creation of schema contracts a lightweight process, allowing the dependency chains to be constructed and evolved iteratively and integrating validation of changes into standard delivery systems. In this episode he shares the design of the project and how it fits into your development practices.
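
As a generic illustration of the contract idea (deliberately simplified, and not Schemata's actual mechanism), a delivery pipeline might diff a proposed schema against the published contract and fail the build on backward-incompatible changes:

```python
# Hypothetical producer contract and a proposed change; field names and
# types are invented for illustration.
PUBLISHED = {"order_id": "string", "amount": "double", "created_at": "timestamp"}
PROPOSED = {"order_id": "string", "amount": "double"}  # drops created_at

def breaking_changes(old: dict, new: dict) -> list[str]:
    """Flag removed fields and type changes -- the edits that break consumers."""
    problems = []
    for field, ftype in old.items():
        if field not in new:
            problems.append(f"removed field: {field}")
        elif new[field] != ftype:
            problems.append(f"type change: {field} {ftype} -> {new[field]}")
    return problems

for issue in breaking_changes(PUBLISHED, PROPOSED):
    print("contract violation:", issue)  # a CI step would fail the build here
```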

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.

Your host is Tobias Macey and today I’m interviewing Ananth Packkildurai about Schemata, a modelling framework for decentralised domain-driven ownership of data.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Schemata is and the story behind it?

How does the garbage in/garbage out problem manifest in data warehouse/data lake environments?

What are the different places in a data system that schema definitions need to be established?

What are the different ways that schema management gets complicated across those various points of interaction?

Can you walk me through the end-to-end flow of how Schemata integrates with engineering practices across an organization’s data lifecycle?

How does the use of Schemata help with capturing and propagating context that would otherwise be lost or siloed?

How is the Schemata utility implemented?

What are some of the design and scope questions that you had to work through while developing Schemata?

What is the broad vision that you have for Schemata and its impact on data practices? How

Summary Data observability is a product category that has seen massive growth and adoption in recent years. Monte Carlo is in the vanguard of companies who have been enabling data teams to observe and understand their complex data systems. In this episode founders Barr Moses and Lior Gavish rejoin the show to reflect on the evolution and adoption of data observability technologies and the capabilities that are being introduced as the broader ecosystem adopts the practices.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Barr Moses and Lior Gavish about the state of the market for data observability and their own work at Monte Carlo.

Interview

Introduction

How did you get involved in the area of data management?

Can you give the elevator pitch for Monte Carlo?

What are the notable changes in the Monte Carlo product and business since our last conversation in October 2020?

You were one of the early entrants in the market of data quality/data observability products. In your work to gain visibility and traction you invested substantially in content creation (blog posts, presentations, round table conversations, etc.). How would you summarize the focus of your initial efforts?

Why do you think data observability has really taken off? A few years ago, the category barely existed – what’s changed?

There’s a larger debate within

Summary The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. In this episode Sean Knapp shares his views on what constitutes proper automation and the work that he and his team at Ascend are doing to help make it a reality.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Sean Knapp about the role of data automation in building maintainable systems.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what you mean by the term "data automation" and the assumptions that it includes?

One of the perennial challenges of automation is that there are always steps that are resistant to being performed without human involvement. What are some of the tasks that you have found to be common problems in that sense?

What are the different concerns that need to be included in a stack that supports fully automated data workflows?

There was recently an interesting article suggesting that the "left-to-right" approach to data workflows is backwards. In your experience, what would be required to allow for triggering data processes based on the needs of the data consumers? (e.g. "make sure that this BI dashboard is up to date every 6 hours")

What are the
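
To make the consumer-driven triggering in that last complete question concrete, here is a small hypothetical sketch (asset names, SLAs, and the dependency graph are invented, and this is not Ascend's implementation): start from the freshness requirement on the consumer-facing asset and walk upstream to decide what has to run:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness SLAs for consumer-facing assets ("right-to-left":
# start from what the consumer needs, e.g. a dashboard fresh every 6 hours).
SLAS = {"revenue_dashboard": timedelta(hours=6)}

# Hypothetical dependency graph: asset -> direct upstreams.
DEPS = {
    "revenue_dashboard": ["orders_mart"],
    "orders_mart": ["raw_orders"],
    "raw_orders": [],
}

def stale_assets(last_refreshed):
    """Return the consumer assets whose freshness SLA is currently violated."""
    now = datetime.now(timezone.utc)
    return [a for a, sla in SLAS.items() if now - last_refreshed[a] > sla]

def refresh_plan(asset):
    """Order upstream dependencies so producers run before consumers."""
    plan = []
    def visit(a):
        for upstream in DEPS.get(a, []):
            visit(upstream)
        if a not in plan:
            plan.append(a)
    visit(asset)
    return plan

last = {"revenue_dashboard": datetime.now(timezone.utc) - timedelta(hours=7)}
for asset in stale_assets(last):
    print(refresh_plan(asset))  # ['raw_orders', 'orders_mart', 'revenue_dashboard']
```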

Learning Tableau 2022 - Fifth Edition

Learning Tableau 2022 is your comprehensive guide to mastering Tableau, one of the most popular tools for data visualization and analysis. Through this book, you will understand how to build impactful visualizations, create interactive dashboards, and tell compelling stories with data. With updated coverage of Tableau 2022's latest features, this book will take your data storytelling skills to the next level.

What this book will help me do:
Develop effective visualizations and dashboards to present complex data intuitively.
Enhance data analysis with Tableau's advanced features like clustering, AI extensions, and Explain Data.
Utilize calculations and parameters for tailoring and enriching analytics.
Optimize workflows for data cleaning and preparation using Tableau Prep Builder.
Confidently leverage Tableau for interlinking datasets and performing geospatial analysis.

Author(s): Joshua N. Milligan, the author of Learning Tableau 2022, is a seasoned Tableau Zen Master. He has years of experience helping individuals and businesses transform their data into actionable insights through visualization and analysis. With a focus on clarity and practical applications, Joshua explains complex concepts in an approachable manner and equips readers with the skills to bring their ideas to life in Tableau.

Who is it for? This book is ideal for business intelligence developers, data analysts, or any professional eager to improve their data visualization skills. Both beginners looking to understand Tableau from the ground up and intermediate users aiming to explore advanced Tableau techniques will find it valuable. A Tableau license and a thirst for learning are all you'll need to embark on this data visualization journey.

Pro Data Mashup for Power BI: Powering Up with Power Query and the M Language to Find, Load, and Transform Data

This book provides all you need to find data from external sources and load and transform that data into Power BI, where you can mine it for business insights and a competitive edge. This ranges from connecting to corporate databases such as Azure SQL and SQL Server to file-based data sources, and cloud- and web-based data sources. The book also explains the use of Direct Query and Live Connect to establish instant connections to databases and data warehouses and avoid loading data. The book provides detailed guidance on techniques for transforming inbound data into normalized data sets that are easy to query and analyze. This covers data cleansing, data modification, and standardization, as well as merging source data into robust data structures that can feed into your data model. You will learn how to pivot and transpose data and extrapolate missing values, as well as harness external programs such as R and Python in a Power Query data flow. You also will see how to handle errors in source data and extend basic data ingestion to create robust and parameterized data load and transformation processes. Everything in this book is aimed at helping you deliver compelling and interactive insight with remarkable ease using Power BI's built-in data load and transformation tools.

What You Will Learn:
Connect Power BI to a range of external data sources
Prepare data from external sources for easy analysis in Power BI
Cleanse data from duplicates, outliers, and other bad values
Make live connections from which to refresh data quickly and easily
Apply advanced techniques to interpolate missing data

Who This Book Is For: All Power BI users from beginners to super users. Any user of the world's leading dashboarding tool can leverage the techniques explained in this book to turbo-charge their data preparation skills and learn how a wide range of external data sources can be harnessed and loaded into Power BI to drive their analytics. No previous knowledge of working with data, databases, or external data sources is required, merely the need to find, transform, and load data into Power BI.

In this article, Lawson Abinati lays out core principles for market positioning that apply across all industries, as well as to Business Intelligence (BI) professionals. Although many business intelligence (BI) managers see themselves as technologists first, unless they understand the soft skills of sales, marketing, and communication, they won't succeed professionally or make good on their organization's investments in BI. Published at: https://www.eckerson.com/articles/what-is-positioning-and-why-is-it-important

Today I’m chatting with Emilie Schario, a Data Strategist in Residence at Amplify Partners. Emilie thinks data teams should operate like product teams. But what led her to that conclusion, and how has she put the idea into practice? Emilie answers those questions and more, delving into what kind of pushback and hiccups someone can expect when switching from being data-driven to product-driven and sharing advice for data scientists and analytics leaders.

Highlights / Skip to:

Answering the question “whose job is it” (5:18)
Understanding and solving problems instead of just building features people ask for (9:05)
Emilie explains what Amplify Partners is and talks about her work experience and how it fuels her perspectives on data teams (11:04)
Emilie and I talk about the definition of data product (13:00)
Emilie talks about her approach to building and training a data team (14:40)
We talk about UX designers and how they fit into Emilie’s data teams (18:40)
Emilie talks about the book and blog “Storytelling with Data” (21:00)
We discuss the push back you can expect when trying to switch a team from being data driven to being product driven (23:18)
What hiccups can people expect when switching to a product driven model (30:36)
Emilie’s advice for data scientists and analyst leaders (35:50)
Emilie explains what Locally Optimistic is (37:34)

Quotes from Today’s Episode “Our thesis is…we need to understand the problems we’re solving before we start building solutions, instead of just building the things people are asking for.” — Emilie (2:23)

“I’ve seen this approach of flipping the ask on its head—understanding the problem you’re trying to solve—work and be more successful at helping drive impact instead of just letting your data team fall into this widget builder service trap.” — Emilie (4:43)

“If your answer to any problem to me is, ‘That’s not my job,’ then I don’t want you working for me because that’s not what we’re here for. Your job is whatever the problem in front of you that needs to be solved.” — Emilie (7:14)

“I don’t care if you have all of the data in the world and the most talented machine learning engineers and you’ve got the ability to do the coolest new algorithm fancy thing. If it doesn’t drive business impact, it doesn’t matter.” — Emilie (7:52)

“Data is not just a thing that anyone can do. It’s not just about throwing numbers in a spreadsheet anymore. It’s about driving business impact. But part of how we drive business impact with data is making it accessible. And accessible isn’t just giving people the numbers, it’s also communicating with it effectively, and UX is a huge piece of how we do that.” — Emilie (19:57)

“There are no null choices in design. Someone is deciding what some other human—a customer, a client, an internal stakeholder—is going to use, whether it’s a React app, or a Power BI dashboard, or a spreadsheet dump, or whatever it is, right? There will be an experience that is created, whether it is intentionally created or not.” — Brian (20:28)

“People will think design is just putting in colors that match together, like, or spinning the color wheel and seeing what lands. You know, there’s so much more to it. And it is an expertise; it is a domain that you have to develop.” — Emilie (34:58)

Links Referenced:

Blog post by Rifat Majumder
storytellingwithdata.com
Experiencing Data Episode 28 with Cole Nussbaumer Knaflic
locallyoptimistic.com
Twitter: @emilieschario

Summary The position of Chief Data Officer (CDO) is relatively new in the business world and has not been universally adopted. As a result, not everyone understands what the responsibilities of the role are, when you need one, and how to hire for it. In this episode Tracy Daniels, CDO of Truist, shares her journey into the position, her responsibilities, and her relationship to the data professionals in her organization.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Tracy Daniels about the role and responsibilities of the Chief Data Officer and how it is evolving along with the ecosystem.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what your path to CDO of Truist has been?

As a CDO, what are your responsibilities and scope of influence?

Not every organization has an explicit position for the CDO. What are the factors that determine when that should be a distinct role?

What is the relationship and potential overlap with a CTO?

As the CDO of Truist, what are some of the projects/activities that are vying for your time and attention?

Can you share the composition of your teams and how you think about organizational structure and integration for data professionals in your company?

What are the industry and business trends that are having the greatest impact on your work as a

Exam Ref PL-300 Microsoft Power BI Data Analyst

Prepare for Microsoft Exam PL-300 and help demonstrate your real-world ability to deliver actionable insights with Power BI by leveraging available data and domain expertise; to provide meaningful business value through clear data visualizations; to enable others to perform self-service analytics; and to deploy and configure solutions for consumption. Designed for data analysts, business users, and other professionals, this Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified: Power BI Data Analyst Associate level.

Focus on the expertise measured by these objectives:
Prepare the data
Model the data
Visualize and analyze the data
Deploy and maintain assets

This Microsoft Exam Ref:
Organizes its coverage by exam objectives
Features strategic, what-if scenarios to challenge you
Assumes you are a data analyst, business intelligence professional, report creator, or other professional seeking to validate your skills and knowledge in analyzing data with Power BI

About the Exam: Exam PL-300 focuses on knowledge needed to get data from different data sources; clean, transform, and load data; design and develop data models; create model calculations with DAX; optimize model performance; create reports and dashboards; enhance reports for usability and storytelling; identify patterns and trends; and manage files, datasets, and workspaces.

About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Power BI Data Analyst Associate certification, demonstrating your understanding of data repositories and data processes, and your skills in designing and building scalable data models, cleaning and transforming data, enabling advanced analytic capabilities to provide meaningful business value, and collaborating with key stakeholders to deliver relevant insights based on identified business requirements.

See full details at: microsoft.com/learn