talk-data.com

Topic: Business Intelligence (BI)

Tags: data_visualization, reporting, analytics

1211 tagged activities

Activity Trend: 111 peak/qtr, 2020-Q1 to 2026-Q1

Activities

1211 activities · Newest first

Summary As more organizations gain experience with data management and incorporate analytics into their decision making, their next move is to adopt machine learning. To make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner. As a result, the feature store is becoming a required piece of the data platform. To fill that need, Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took, compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show. You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess their impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat! Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what you are building at Tecton and your motivation for starting the business? For anyone who isn’t familiar with the concept, what is an example of a feature? How do you define what a feature store is? What role does a feature store play in the overall lifecycle of a machine learning pipeline?
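For concreteness, here is a hypothetical sketch of the two ideas the interview circles around (illustrative only, not Tecton’s API; all names are invented): a feature is a computed attribute of an entity, such as a user’s order count over the trailing seven days, and a feature store keeps those computed values keyed by entity so they can be reused consistently for training and serving.

    from collections import defaultdict
    from datetime import datetime, timedelta

    # Hypothetical feature: a user's order count over the trailing 7 days.
    def orders_last_7d(orders, user_id, now):
        cutoff = now - timedelta(days=7)
        return sum(1 for o in orders if o["user_id"] == user_id and o["ts"] >= cutoff)

    # Toy in-memory feature store: values keyed by (feature name, entity id).
    class InMemoryFeatureStore:
        def __init__(self):
            self._values = defaultdict(dict)

        def put(self, feature_name, entity_id, value):
            self._values[feature_name][entity_id] = value

        def get(self, feature_name, entity_id, default=None):
            return self._values[feature_name].get(entity_id, default)

    orders = [
        {"user_id": "u1", "ts": datetime(2020, 12, 1)},
        {"user_id": "u1", "ts": datetime(2020, 12, 3)},
    ]
    store = InMemoryFeatureStore()
    now = datetime(2020, 12, 4)
    store.put("orders_last_7d", "u1", orders_last_7d(orders, "u1", now))
    print(store.get("orders_last_7d", "u1"))  # 2

A production feature store adds the hard parts discussed in the episode: backfills, point-in-time correctness, and a low-latency serving path.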

Beginning T-SQL: A Step-by-Step Approach

Get a performance-oriented introduction to the T-SQL language underlying the Microsoft SQL Server and Azure SQL database engines. This fourth edition is updated to include SQL Notebooks as well as up-to-date syntax and features for T-SQL on-premises and in the Azure cloud. Exercises and examples now include the WideWorldImporters database, the newest sample database from Microsoft for SQL Server. Also new in this edition is coverage of JSON from T-SQL, news about performance enhancements called Intelligent Query Processing, and an appendix on running SQL Server in a container on macOS or Linux. Beginning T-SQL starts you on the path to mastering T-SQL with an emphasis on best practices. Using the sound coding techniques taught in this book will lead to excellent performance in the queries that you write in your daily work. Important techniques such as windowing functions are covered to help you write fast-executing queries that solve real business problems. The book begins with an introduction to databases, normalization, and setting up your learning environment. You will learn about the tools you need to use, such as SQL Server Management Studio, Azure Data Studio, and SQL Notebooks. Each subsequent chapter teaches an aspect of T-SQL, building on the skills learned in previous chapters. Exercises in most chapters provide an opportunity for the hands-on practice that leads to true learning and distinguishes the competent professional. A stand-out feature in this book is that most chapters end with a Thinking About Performance section. These sections cover aspects of query performance relative to the content just presented, including the new Intelligent Query Processing features that make queries faster without changing code. They will help you avoid beginner mistakes by knowing about and thinking about performance from day 1.

What You Will Learn
- Install a sandboxed SQL Server instance for learning
- Understand how relational databases are designed
- Create objects such as tables and stored procedures
- Query a SQL Server table
- Filter and order the results of a query
- Query and work with specialized data types such as XML and JSON
- Apply modern features such as window functions
- Choose correct techniques so that your queries perform well

Who This Book Is For
Anyone who wants to learn T-SQL from the beginning or improve their T-SQL skills; those who need T-SQL as an additional skill; and those who write queries such as application developers, database administrators, business intelligence developers, and data scientists. The book is also helpful for anyone who must retrieve data from a SQL Server database.
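To give a flavor of the windowing functions the book emphasizes, here is a standard SQL window query. The sketch runs through Python’s built-in sqlite3 module so it is self-contained (it assumes the bundled SQLite is 3.25 or newer); the OVER clause syntax is the same in T-SQL, and the sales table is invented for illustration.

    import sqlite3

    # The OVER clause below is standard SQL and works the same way in T-SQL.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES
            ('East', 100), ('East', 250), ('West', 300), ('West', 50);
    """)
    rows = conn.execute("""
        SELECT region,
               amount,
               SUM(amount)  OVER (PARTITION BY region) AS region_total,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
        FROM sales
    """).fetchall()
    for row in rows:
        print(row)  # e.g. ('East', 250, 350, 1)

Because the window is evaluated per row, the query returns each sale alongside its region total without collapsing rows the way GROUP BY would.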

Summary Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure. In this episode Steve Touw and Stephen Bailey share what they have built at Immuta, how it is implemented, and how it streamlines the workflow for everyone involved in working with sensitive data. If you are starting down the path of implementing a data governance strategy then this episode will provide a great overview of what is involved.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Feature flagging is a simple concept that enables you to ship faster, test in production, and do easy rollbacks without redeploying code. Teams using feature flags release new software with less risk, and release more often. ConfigCat is a feature flag service that lets you easily add flags to your Python code, as well as 9 other platforms. By adopting ConfigCat you and your manager can track and toggle your feature flags from their visual dashboard, including granular targeting rules, without redeploying any code or configuration. You can roll out new features to a subset of your users for beta testing or canary deployments. With their simple API, clear documentation, and pricing that is independent of your team size, you can get your first feature flags added in minutes without breaking the bank. Go to dataengineeringpodcast.com/configcat today to get 35% off any paid plan with code DATAENGINEERING or try out their free forever plan. You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess their impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure.
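As an aside on the mechanics behind rollouts like those described above, here is a generic feature flag sketch (not the ConfigCat SDK; the flag name and helper are invented). Hashing each user into a stable bucket makes a percentage rollout deterministic, so the same user always sees the same variant:

    import hashlib

    # Generic percentage rollout: hash (flag, user) into a stable bucket 0-99.
    def flag_enabled(flag_name, user_id, rollout_percent):
        digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < rollout_percent

    # A sticky 50% rollout: membership never changes between requests.
    enabled = [u for u in ("alice", "bob", "carol", "dave")
               if flag_enabled("new-report", u, 50)]
    print(enabled)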

Implementing dbt at large enterprises
video
by Ryan Goltz (Chesapeake Energy), Ben Singleton (JetBlue), Amy Chen (Fishtown Analytics)

What does it look like to implement dbt at an organization where the number of employees is in the thousands? In this video we'll learn from the people who have answered exactly this question at organizations like JetBlue and Chesapeake Energy.

Speakers: Chris Holliday (Moderator), Senior VP, Client Management with Visual BI; Amy Chen, Solutions Architect with Fishtown Analytics; Ryan Goltz, Lead Data Strategist with Chesapeake Energy; Ben Singleton, Director of Data Science & Analytics with JetBlue

Data dream teams: Netlify

Join us for a fireside chat with members of the Netlify data team to get an inside look at how their team gets work done. We'll learn how their data team is structured, some projects they've recently worked on, and what's coming up for the team!

Featured speakers: Emilie Schario, Senior Engineering Manager, Data and Business Intelligence with Netlify; Laurie Voss, Senior Data Analyst with Netlify; Francisco Lozano, Senior Analytics Engineer with Netlify; Brian de la Motte, Senior Data Engineer with Netlify

Exam Ref PL-900 Microsoft Power Platform Fundamentals

Prepare for Microsoft Exam PL-900: Demonstrate your real-world knowledge of the fundamentals of Microsoft Power Platform, including its business value, core components, and the capabilities and advantages of Power BI, Power Apps, Power Automate, and Power Virtual Agents. Designed for business users, functional consultants, and other professionals, this Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified: Power Platform Fundamentals level.

Focus on the expertise measured by these objectives:
- Describe the business value of Power Platform
- Identify the core components of Power Platform
- Demonstrate the capabilities of Power BI
- Demonstrate the capabilities of Power Apps
- Demonstrate the capabilities of Power Automate
- Demonstrate the capabilities of Power Virtual Agents

This Microsoft Exam Ref organizes its coverage by exam objectives; features strategic, what-if scenarios to challenge you; and assumes you are a business user, functional consultant, or other professional who wants to improve productivity by automating business processes, analyzing data, creating simple app experiences, or developing business enhancements to Microsoft cloud solutions.

About the Exam: Exam PL-900 focuses on knowledge needed to describe the value of Power Platform services and of extending solutions; describe Power Platform administration and security; describe Common Data Service, Connectors, and AI Builder; identify common Power BI components; connect to and consume data; build basic dashboards with Power BI; identify common Power Apps components; build basic canvas and model-driven apps; describe Power Apps portals; identify common Power Automate components; build basic flows; describe Power Virtual Agents capabilities; and build and publish basic chatbots.

About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Power Platform Fundamentals certification, demonstrating your understanding of Power Platform's core capabilities, from business value and core product capabilities to building simple apps, connecting data sources, automating basic business processes, creating dashboards, and creating chatbots. With this certification, you can move on to earn specialist certifications covering more advanced aspects of Power Apps and Power BI, including Microsoft Certified: Power Platform App Maker Associate and Power Platform Data Analyst Associate. See full details at: microsoft.com/learn

The Future of the Data Warehouse

Almost all of us are using our data warehouse to power our business intelligence. What if we could use data warehouses to do even more?

What if we could use data warehouses to power internal tooling, machine learning, behavioral analytics, or even customer-facing products?

Is this a future we're heading for, and if so, how do we get there?

In this video, you'll join a discussion with speakers:
- Boris Jabes, CEO of Census
- Jeremy Levy, CEO of Indicative
- Arjun Narayan, CEO of Materialize
- Jennifer Li, Partner at a16z, as moderator

Learn more about the speakers and their companies at: https://www.getcensus.com/ https://www.indicative.com/ https://materialize.com/ https://a16z.com/

Learn more about dbt at: https://getdbt.com https://twitter.com/getdbt

Learn more about Fishtown Analytics at: https://fishtownanalytics.com https://twitter.com/fishtowndata https://www.linkedin.com/company/fishtown-analytics/

How to Scale Data Teams with Data Clinics and Balance Short-Term and Long-Term Projects

You’re in a state of flow, building out dbt models and then you get the dreaded message — "Quick question about this data..."

As a data team, how do you balance the roadmap work against those "quick" questions?

How do you prioritize all the work you need to do in the short-term (backlog items) while also working on your long-term projects (roadmap items)?

There are advantages to both backlog and roadmap items. How can data teams get the advantages of both?

In this video, Jacob Frackson will show how Data Clinics, dedicated time set aside to work on these requests, can help your data team achieve this balance and empower self-serve along the way.

Data clinics have helped an organization:
- Deliver 80% of Sprint Points
- Answer up to 8 data questions per day
- 10x weekly self-serve users on BI tools

Learn more about dbt at: https://getdbt.com https://twitter.com/getdbt

Learn more about Fishtown Analytics at: https://fishtownanalytics.com https://twitter.com/fishtowndata https://www.linkedin.com/company/fishtown-analytics/

Excited to share the final part of my three-episode series on BI data storytelling accelerator lessons learned. We'll be digging into the last but most exciting step in our BI Data Storytelling Mastery Framework, 'What You Draw'. Useless visualizations are everywhere! This episode will give you seven things to avoid, and ways to fix them, to ensure you bring your A-game when it comes to visualizing your storyboard. Tune in for knowledge bombs galore!

[05:53] Never skip the mock-up stage: 90% of us make the mistake of going directly into drawing without a mock-up. Always make sure a client signs off before you start on the mock-up and that you have some sample data.
[09:34] Doing the mock-up before getting sign-off on the storyboard and the analytics data dictionary: without a sign-off, you should never move on to drawing.
[11:44] Starting from scratch all the time: Nothing wastes more time than having to start from scratch. To save time, always treat every project you build like an asset; every template you build as a team should be repurposed as a template for the users.
For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/72. Enjoyed the show? Please leave us a review on iTunes.

podcast_episode
by Mico Yuk (Data Storytelling Academy)

Tune in as I recap lessons learned from our latest accelerator class. This is part two in a series of three podcasts that deep-dives into our BI data storytelling framework, where we tackle the three parts of storytelling: 1) What You Say, 2) What You Write, and 3) What You Draw. I dive into some of the biggest mistakes people make when working with users and writing down their requirements, including why so many get confused by the different story parts and how to fix it. Tune in to this amazing BI masterclass!

[04:36] How to become a better master storyteller.
[13:20] Changing your questions and process to create an effective data story.
[14:25] Biggest mistakes people make when working with users to gather requirements and create an amazing data story.
For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/71. Enjoyed the show? Please leave us a review on iTunes.

Summary The first stage of every good pipeline is to perform data integration. With the increasing pace of change and the need for up-to-date analytics, the need to integrate that data in near real time is growing. With the improvements and increased variety of options for streaming data engines, and improved tools for change data capture, it is possible for data teams to make that goal a reality. However, despite all of the tools and managed distributions of those streaming engines, it is still a challenge to build a robust and reliable pipeline for streaming data integration, especially if you need to expose those capabilities to non-engineers. In this episode Ido Friedman, CTO of Equalum, explains how they have built a no-code platform to make integration of streaming data and change data capture feeds easier to manage. He discusses the challenges that are inherent in the current state of CDC technologies, how they have architected their system to integrate well with existing data platforms, and how to build an appropriate level of abstraction for such a complex problem domain. If you are struggling with streaming data integration and change data capture then this interview is definitely worth a listen.
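To make the change data capture idea concrete, here is a minimal generic sketch (not Equalum's implementation) of the consuming side: a CDC feed is a stream of row-level change events, and a consumer applies inserts, updates, and deletes in order to keep a target copy in sync. The op/key/row event shape is an assumption for illustration.

    # Apply one CDC event to a target table modeled as a dict keyed by primary key.
    def apply_cdc_event(target, event):
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            target[key] = event["row"]   # upsert the new row image
        elif op == "delete":
            target.pop(key, None)        # drop the row if present
        else:
            raise ValueError(f"unknown op: {op}")

    target = {}
    events = [
        {"op": "insert", "key": 1, "row": {"id": 1, "status": "new"}},
        {"op": "update", "key": 1, "row": {"id": 1, "status": "paid"}},
        {"op": "insert", "key": 2, "row": {"id": 2, "status": "new"}},
        {"op": "delete", "key": 2, "row": None},
    ]
    for e in events:
        apply_cdc_event(target, e)
    print(target)  # {1: {'id': 1, 'status': 'paid'}}

Ordering is what makes this hard in practice: events for the same key must be applied in commit order, which is why robust CDC pipelines lean on ordered logs rather than ad hoc polling.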

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Welcome to another episode of the Data Hackers podcast! This time we talk about the past, present, and future of BI, and how XP Inc. has been using this technology in its day-to-day decision making. You will learn how they have structured their teams to answer business questions faster and more efficiently.

What Is a Data Lake?

A revolution is occurring in data management regarding how data is collected, stored, processed, governed, managed, and provided to decision makers. The data lake is a popular approach that harnesses the power of big data and marries it with the agility of self-service. With this report, IT executives and data architects will focus on the technical aspects of building a data lake for your organization. Alex Gorelik from Facebook explains the requirements for building a successful data lake that business users can easily access whenever they have a need. You'll learn the phases of data lake maturity, common mistakes that lead to data swamps, and the importance of aligning data with your company's business strategy and gaining executive sponsorship.

You'll explore:
- The ingredients of modern data lakes, such as the use of different ingestion methods for different data formats, and the importance of the three Vs: volume, variety, and velocity
- Building blocks of successful data lakes, including data ingestion, integration, persistence, data governance, and business intelligence and self-service analytics
- State-of-the-art data lake architectures offered by Amazon Web Services, Microsoft Azure, and Google Cloud

Pro Microsoft Power BI Administration: Creating a Consistent, Compliant, and Secure Corporate Platform for Business Intelligence

Manage Power BI within organizations. This book helps you systematize administration as Microsoft shifts Power BI from a self-service tool to an enterprise tool. You will learn best practices for many Power BI administrator tasks. And you will know how to manage artifacts such as reports, users, workspaces, apps, and gateways. The book also provides experience-based guidance on governance, licensing, and managing capacities. Good management includes policies and procedures that can be applied consistently and even automatically across a broad user base. This book provides a strategic road map for the creation and implementation of policies and procedures that support Power BI best practices in enterprises. Effective governance depends not only on good policies, but also on the active and timely monitoring of adherence to those policies. This book helps you evaluate the tools to automate and simplify the most common administrative and monitoring tasks, freeing up administrators to provide greater value to the organization through better user training and awareness initiatives.

What You Will Learn
- Recognize the roles and responsibilities of the Power BI administrator
- Manage users and their workspaces
- Know when to consider using Power BI Premium
- Govern your Power BI implementation and manage Power BI tenants
- Create an effective security strategy for Power BI in the enterprise
- Collaborate and share consistent views of the data across all users
- Follow a life cycle management strategy for rollout of dashboards and reports
- Create internal training resources backed up by accurate documentation
- Monitor Power BI to better understand risks and compliance, manage costs, and track implementation

Who This Book Is For
IT professionals tasked with maintaining their corporate Power BI environments, Power BI administrators and power users interested in rolling out Power BI more widely in their organizations, and IT governance professionals tasked with ensuring adherence to policies and regulations

Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of your data pipelines, it can be overwhelming to stay up to date with how they all compare. In this episode Egor Gryaznov, CTO of Bigeye, joins the show to explore the landscape of data quality companies, the general strategies that they are using, and what problems they solve. He also shares how his own product is designed and the challenges that are involved in building a system to help data engineers manage the complexity of a data platform. If you are wondering how to get better control of your own pipelines and the traps to avoid then this episode is definitely worth a listen.
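To illustrate the general shape of such monitoring (a toy sketch, not how Bigeye works), a common pattern is to track simple health metrics, such as daily row counts or a column's null rate, and alert when a fresh value falls far outside the recent history:

    import statistics

    # Alert when a metric deviates from its history by more than `threshold` stdevs.
    def is_anomalous(history, value, threshold=3.0):
        if len(history) < 2:
            return False  # not enough history to judge
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            return value != mean
        return abs(value - mean) / stdev > threshold

    # Example metric: fraction of rows where a column is missing.
    def null_rate(rows, column):
        return sum(1 for r in rows if r.get(column) is None) / len(rows) if rows else 0.0

    daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_900]
    print(is_anomalous(daily_row_counts, 10_100))  # False: within the normal range
    print(is_anomalous(daily_row_counts, 2_300))   # True: likely a broken pipeline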

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta. Your host is Tobias Macey and today I’m interviewing Egor Gryaznov about the state of the industry for data quality management and what he is building at Bigeye.

Summary The core mission of data engineers is to provide the business with a way to ask and answer questions of their data. This often takes the form of business intelligence dashboards, machine learning models, or APIs on top of a cleaned and curated data set. Despite the rapid progression of impressive tools and products built to fulfill this mission, it is still an uphill battle to tie everything together into a cohesive and reliable platform. At Isima they decided to reimagine the entire ecosystem from the ground up and built a single unified platform to allow end-to-end self-service workflows from data ingestion through to analysis. In this episode Darshan Rawal, CEO and co-founder of Isima, explains how the biOS platform is architected to enable ease of use, the challenges that were involved in building an entirely new system from scratch, and how it can integrate with the rest of your data platform to allow for incremental adoption. This was an interesting and contrarian take on the current state of the data management industry and is worth a listen to gain some additional perspective.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Follow go.datafold.com/dataengineeringpodcast to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Summary A data catalog is a critical piece of infrastructure for any organization that wants to build analytics products, whether internal or external. While there are a number of platforms available for building that catalog, many of them are either difficult to deploy and integrate, or expensive to use at scale. In this episode Grant Seward explains how he built Tree Schema to be an easy-to-use and cost-effective option for organizations to build their data catalogs. He also shares the internal architecture, how he approached the design to make it accessible and easy to use, and how it autodiscovers the schemas and metadata for your source systems.
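As a generic illustration of schema autodiscovery (not Tree Schema's actual mechanism), relational sources publish their own metadata, so a catalog can harvest table and column definitions by querying it. Here SQLite's built-in catalog stands in for a real source system; most engines expose the same information through information_schema.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """)

    # Walk the source's own metadata to build a catalog of tables and columns.
    def discover_schema(conn):
        catalog = {}
        tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        for (table,) in tables.fetchall():
            cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
            # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
            catalog[table] = [
                {"name": c[1], "type": c[2], "nullable": not c[3]} for c in cols
            ]
        return catalog

    for table, columns in discover_schema(conn).items():
        print(table, columns)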

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform, it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Follow go.datafold.com/dataengineeringpodcast to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta. Your host is Tobias Macey and today I’m interviewing Grant Seward about Tree Schema, a human-friendly data catalog.

Interview

Introduction How did you get involved in the area of data management? Can you start by giving an overview of what you have built at Tree Schema?

What was your motivation for creating it?

At what stage of maturity should a team or organization

Microsoft Power BI Quick Start Guide - Second Edition

"Microsoft Power BI Quick Start Guide" is your essential companion to mastering data visualization and analysis using Microsoft Power BI. This book offers step-by-step guidance on exploring data sources, creating effective dashboards, and leveraging advanced features like dataflows and AI insights to derive actionable intelligence quickly and effectively. What this Book will help me do Connect and import data from various sources using Power BI tools. Transform and cleanse data using the Power BI Query Editor and other techniques. Design optimized data models with relationships and DAX calculations. Create dynamic and visually compelling reports and dashboards. Implement row-level security and manage Power BI deployments within an organization. Author(s) Devin Knight, Erin Ostrowsky, and Mitchell Pearson are seasoned Power BI experts with extensive experience in business intelligence and data analytics. They bring a hands-on approach to teaching, focusing on practical skills and real-world applications. Their joint experience ensures a thorough and clear learning experience. Who is it for? This book is tailored for aspiring business intelligence professionals who wish to harness the power of Microsoft Power BI. If you have foundational knowledge of business intelligence concepts and are eager to apply them practically, this guide is for you. It's also ideal for individuals looking to upgrade their BI skill set and adopt modern data analysis tools. Whether a beginner or looking to enhance your current skills, you'll find tremendous value here.

Hands-On SQL Server 2019 Analysis Services

"Hands-On SQL Server 2019 Analysis Services" is a comprehensive guide to mastering data analysis using SQL Server Analysis Services (SSAS). This book provides you with step-by-step directions on creating and deploying tabular and multi-dimensional models, as well as using tools like MDX and DAX to query and analyze data. By the end, you'll be confident in designing effective data models for business analytics. What this Book will help me do Understand how to create and optimize both tabular and multi-dimensional models with SQL Server Analysis Services. Learn to use MDX and DAX to query and manipulate your data for enhanced insights. Integrate SSAS models with visualization tools like Excel and Power BI for effective decision-making. Implement robust security measures to safeguard data within your SSAS deployments. Master scaling and optimizing best practices to ensure high-performance analytical models. Author(s) Steven Hughes is a data analytics expert with extensive experience in business intelligence and SQL Server technologies. With years of practical experience in using SSAS and teaching data professionals, Steven has a knack for breaking down complex concepts into actionable knowledge. His approach to writing involves combining clear explanations with real-world examples. Who is it for? This book is intended for BI professionals, data analysts, and database developers who want to gain hands-on expertise with SQL Server 2019 Analysis Services. Ideal readers should have familiarity with database querying and a basic understanding of business intelligence tools like Power BI and Excel. It's perfect for those aiming to refine their skills in modeling and deploying robust analytics solutions.