talk-data.com

Topic: Analytics

Tags: data_analysis, insights, metrics

4552 tagged activities

Activity Trend

Peak: 398 activities per quarter (2020-Q1 to 2026-Q1)

Activities

4552 activities · Newest first

podcast_episode
by Mico Yuk (Data Storytelling Academy), Trevor Tapscott (Wells Fargo)

Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in the game for 20 years and counting. We connected in 2008, when he was seeking help with a visualization tool called Xcelsius. In this knowledge bomb-filled episode, Trevor shares his thoughts on what sets great data analysts apart, his framework 'VTAC' for impactful data analysis, and how to approach making an impression when applying for positions in data! Trevor makes a number of really important points around understanding the business of data, what the role of an analyst truly is, and what goes into the mindset of a rock star in the space! We unpack indispensable parts of the relationships that make up the work of a data consultant, and our guest talks about getting to grips with a user's vision and their 'why'. Trevor also talks about the recent BIDS Accelerator class that he attended and the lessons he has already started implementing from that! Make sure to stay tuned to the end to hear about the amazing raffle that Trevor is hosting for listeners and how you can enter to win! In this episode, you'll learn: [0:12:00] Trevor's basic definition of the role of a data analyst and what it means to be a rockstar! [0:15:02] The importance of understanding the business of data and the value of business insights. [0:18:20] Unpacking Trevor's VTAC process: vision, translation, action, change, and how to use it for delivery mode. For full show notes, and the links mentioned, visit: https://bibrainz.com/podcast/78 Enjoyed the Show? Please leave us a review on iTunes.

Discussing Data, Innovation, and Creativity with Josh Linkner, who talks about using small creativity spurts for disruption. He sheds light on how organizations can embrace creativity and use small creative innovations to spur big breakthroughs, and he shared lots of examples of big little breakthroughs.

Bio: He has been the founder and CEO of five tech companies, which sold for a combined value of over $200 million. He’s the author of four books including the New York Times Bestsellers, Disciplined Dreaming, and The Road to Reinvention. This guy just loves starting and building companies. He’s the founding partner of Detroit Venture Partners and has been involved in the launch of over 100 startups. Today, Josh serves as Chairman and co-founder of Platypus Labs, an innovation research, training, and consulting firm. He has twice been named the Ernst & Young Entrepreneur of the Year and is a recipient of the United States Presidential Champion of Change Award. Josh is also a passionate Detroiter, the father of four, a professional-level jazz guitarist, and has a slightly odd obsession with greasy pizza.

Josh's Book: Big Little Breakthroughs https://amzn.to/3usFCLm

Josh's Recommendations: Think Like a Monk: Train Your Mind for Peace and Purpose Every Day https://amzn.to/3bzvyYh Range: Why Generalists Triumph in a Specialized World https://amzn.to/37K4PqW Think Again: The Power of Knowing What You Don't Know https://amzn.to/37MepcR

Discussion Timeline: TIMELINE

Some questions we covered:
1. Starter: give your starter pitch and one point that Big Little Breakthroughs makes.
2. Vishal briefly introduces Josh.
3. What role are you seeing innovation play in the middle of a firefight [the pandemic]?
4. What is the state of enterprise investments to promote innovation?
5. What are some easy-to-fix bottlenecks to get enterprises to keep on innovating?
6. What are some misconceptions about innovation and its adoption?
7. Explain your journey to your current role.
8. Could you share something about your current role?
9. What does your company do?
10. Explain your journey to this book.
11. Why write this book?
12. Why are you so passionate about helping everyday people become everyday innovators?
13. What's the most misunderstood thing about human creativity?
14. What's your favorite brainstorming technique?
15. From doing the research for your new book, Big Little Breakthroughs, what surprised you the most?
16. What are 1-3 best practices that you think are the key to success in your journey?
17. Do you have any favorite reads?
18. As a closing remark, what would you like to tell our audience?

About TAO.ai [Sponsor]: TAO is building the world's largest AI-powered skills universe and community, powering a career development platform that empowers some of the world's largest communities and organizations. Learn more at https://TAO.ai

About FutureOfData: FutureOfData takes you on the journey with leaders, experts, academics, authors, and change-makers designing the future of data, analytics, and insights.

About AnalyticsWeek.com: FutureOfData is managed by AnalyticsWeek.com, a #FutureOfData leadership community of organization architects and leaders.

Sponsorship / Guest Request should be directed to [email protected]

Keywords:

#FutureofData #Work2.0 #Work2dot0 #Leadership #Growth #Org2dot0 #Work2 #Org2

podcast_episode
by Mico Yuk (Data Storytelling Academy), Kimberly Herrington (Buffalo-based healthcare company; Buffalo Business Intelligence)
BI

Finding your dream job in the world of data and analytics might not be as hard as you think! Our guest today, Kimberly Herrington, stands as a testament to this idea and she joins us on AOF to talk about how you can go about identifying and capturing your ideal position in this fascinating space. Kimberly is a data journalist at a large healthcare company in Buffalo, New York, and is also the founder of Buffalo Business Intelligence. She is also a graduate of our BI Data Storytelling Accelerator Workshop and in 2020, she was voted Data Literacy Advocate of the Year. Tune into this episode and you will see why! I am so excited to share this great chat with you, in which we hear about Kimberly's personal method for landing that dream job that you might have thought was only a wish. In our conversation, Kimberly does an amazing job of unpacking some parts of the work she does and the difference between some of the job titles in the space. We also talk about the great learnings she garnered from our workshop, and why you should sign up immediately! I learned so much from today's guest, and you will too, as we run through common mistakes that are made in the pursuit of a great job, how to leverage working at home to your advantage, and the vital importance of community! Kimberly is like a fairy of goodness, and believe me AOF, you will see exactly why I say this! In this episode, you'll learn: [0:04:13] Kimberly's past work as a professional improv comedian! [0:09:12] Mapping the trajectory of an ideal career in data storytelling and how to start this journey. [0:18:19] Differentiating between a data journalist, data storyteller, and analytics translator. [0:24:43] Investing the time and money into valuable education and qualifications. For full show notes, and the links mentioned, visit: https://bibrainz.com/podcast/77 Enjoyed the Show? Please leave us a review on iTunes.

Stakeholders often miss key insights that can be provided by data to drive action forward, due to the way the data is presented and communicated to them. My guest today believes that data storytelling is key to resolving this common pain point. Kam Lee is a BI Data Storytelling Mastery alumnus and graduate who has used our framework to surface over $100M for the fintech company he works with! Kam is the Chief Data Scientist at his company Finetooth Analytics (specializing in marketing analytics), working with top marketers like Russell Brunson from Clickfunnels! Our data masterclass with Kam today delves deep into how he used our BI Data Storytelling Methodology and framework to straddle data engineering, data science, and storytelling. Kam shares game-changing concepts from the course and how he has used them to connect to stakeholders, influence their actions, and overcome what he calls 'emotional responses' to data. Tune in to this knowledge bomb-filled episode! In this episode, you'll learn: [0:12:20] Three buckets Kam uses to organize the data storytelling process. [0:14:56] The challenge of dealing with stakeholders who respond emotionally to data. [0:26:48] Whether to start with the storyboarding or the analytics data dictionary first. [0:28:19] The difference between KPIs, trends, and actions. For full show notes, and the links mentioned, visit: https://bibrainz.com/podcast/76 Enjoyed the Show? Please leave us a review on iTunes.

Summary The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize the technologies necessary to make machine learning operations maintainable.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more.
Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today. Your host is Tobias Macey and today I’m interviewing Yaron Haviv about Iguazio, a platform for end-to-end automation of machine learning applications using MLOps principles.

Interview

Introduction
How did you get involved in the area of data science & analytics?
Can you start by giving an overview of what Iguazio is and the story of how it got started?
How would you characterize your target or typical customer?
What are the biggest challenges that you see around building production grade workflows for machine learning?

How does Iguazio help to address those complexities?

For customers who have already invested in the technical and organizational capacity for data science and data engineering, how does Iguazio integrate with their environments?
What are the responsibilities of a data engineer throughout the different stages of the lifecycle for a machine learning application?
Can you describe how the Iguazio platform is architected?

How has the design of the platform evolved since you first began working on it?
How have the industry best practices around bringing machine learning to production changed?

How do you approach testing/validation of machine learning applications and releasing them to production environments? (e.g. CI/CD)
Once a model is in

Did you know that there are 3 different types of data scientists? A for analyst, B for builder, and C for consultant. We discuss the key differences between each one and some learning strategies you can use to become A, B, or C.

We talked about:

Inspirations for memes
Danny's background and career journey
The ABCs of data science - the story behind the idea
Data scientist type A - Analyst
Skills, responsibilities, and background for type A
Transitioning from data analytics to type A data scientist (that's the path Danny took)
How can we become more curious?
Data scientist type B - Builder
Responsibilities and background for type B
Transitioning from type A to type B
Most important skills for type B
Why you have to learn more about cloud
Data scientist type C - Consultant
Skills, responsibilities, and background for type C
Growing into the C type
Ideal data science team
Important business metrics
Getting a job - easier as type A or type B?
Looking for a job without experience
Two approaches for job search: "apply everywhere" and "apply nowhere"
Are bootcamps useful?
Learning path to becoming a data scientist
Danny's data apprenticeship program and "Serious SQL" course
Why SQL is the most important skill
R vs Python
Importance of Masters and PhD

Links:

Danny's profile on LinkedIn: https://linkedin.com/in/datawithdanny Danny's course: https://datawithdanny.com/ Trailer: https://www.linkedin.com/posts/datawithdanny_datascientist-data-activity-6767988552811847680-GzUK/ Technical debt paper: https://proceedings.neurips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html

Join DataTalks.Club: https://datatalks.club/slack.html

Snowflake Cookbook

The "Snowflake Cookbook" is your guide to mastering Snowflake's unique cloud-centric architecture. This book provides detailed recipes for building modern data pipelines, configuring efficient virtual warehouses, ensuring robust data protection, and optimizing cost-performance, all while leveraging Snowflake's distinctive features such as data sharing and time travel.

What this book will help me do:
Set up and configure Snowflake's architecture for optimized performance and cost efficiency.
Design and implement robust data pipelines using SQL and Snowflake's specialized features.
Secure, manage, and share data efficiently with built-in Snowflake capabilities.
Apply performance tuning techniques to enhance your Snowflake implementations.
Extend Snowflake's functionality with tools like the Spark Connector for advanced workflows.

Author(s): Hamid Mahmood Qureshi and Hammad Sharif are both seasoned experts in data warehousing and cloud computing technologies. With extensive experience implementing analytics solutions, they bring a hands-on approach to teaching Snowflake. They are ardent proponents of empowering readers to create effective and scalable data solutions.

Who is it for? This book is perfect for data warehouse developers, data analysts, cloud architects, and anyone managing cloud data solutions. If you're familiar with basic database concepts or just stepping into Snowflake, you'll find practical guidance here to deepen your understanding and functional expertise in cloud data warehousing.

Send us a text Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.

Abstract Hosted by Al Martin, VP, IBM Expert Services Delivery, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Jeff Richardson. Jeff has a background in databases, data, and information management, and he is now the Chief Information Officer at Accelerated Enrollment Solutions. Jeff was also at Bentley Systems for 17½ years as Chief Data Officer.

Show Notes
5:41 – What does it mean to be a technology nerd?
6:53 – What technologies as a CDO or CIO are you addressing on a regular basis?
13:04 – How are you going to tackle the culture and the politics?
17:25 – Is it Cloud or Hybrid to drive the new data lake?
24:03 – What is your plan to get to the desired state?
27:04 – Does AI have a role in your new position?
31:44 – Fighting the Infodemic: what made you write this article?
Fighting the Infodemic
Jeff’s podcast list: Analytics on Fire, Dissecting popular IT Nerds, The Data Chief, Data Crunch

Connect with the Team:
Producer Kate Brown - LinkedIn.
Producer Steve Templeton - LinkedIn.
Host Al Martin - LinkedIn and Twitter.
The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

podcast_episode
by Kevin Greene (Logi Analytics), Mico Yuk (Data Storytelling Academy)

Here on the AOF podcast, I have often spoken about the issue of user adoption and the slow progress we have all been experiencing for a long time. Our guest today has so much to offer in relation to our approach to this challenge! I speak to Kevin Greene, the CEO of Logi Analytics, and we delve deep into why the mentality of product leaders is so valuable right now!  Kevin has a brilliant mind for the field of analytics, sharing how and where to get started with these parts of your leadership and partnerships. Kevin has been in the analytics game for over 20 years and was actually an investor at Logi before he took the reins! This episode is jam-packed with actionable knowledge bombs that you are definitely not going to want to miss, so let's get to it AOF!   In this episode, you'll learn: [15:41] Why Kevin believes enterprises have to start thinking like product leaders.  [22:47] Approaches to building strategic applications with a smaller team. [25:11] Finding the right service partners and the critical nature of owning the application. [28:59] Ensuring success by scheduling time with business leaders, prioritizing, and using collected data. [44:04] Operationalizing your application and bringing it to your end-users. For full show notes, and the links mentioned visit: https://bibrainz.com/podcast/75    Enjoyed the Show?  Please leave us a review on iTunes.

podcast_episode
by Val Kroll, Julie Hoyer, Tim Wilson (Analytics Power Hour - Columbus, OH), Moe Kiss (Canva), Michael Helbling (Search Discovery)

As analysts, we often have unique knowledge of the data, specialized responsibilities for data-related deliverables, and an expectation that we'll be at the ready to dive into high priority requests. What happens, then, when we're out of the office, be that for a planned vacation, for an unexpected illness, or for bringing a new human being into the world? And, what happens if it's that last one and you're also the most beloved co-host of the top-rated explicit analytics podcast? Tune in to this episode to find out, as we used Moe in a dual role of being both a co-host and a guest (again!) to explore the challenges (and opportunities!) of being out of the office. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

podcast_episode
by Mico Yuk (Data Storytelling Academy)
BI

Welcome back AOF! I'm so glad to be back after taking some time off over the holidays and having an amazing social media detox! I'm excited to kick off Season 6. Today's show is going to give you some super important updates and changes coming to the podcast, our academy platforms and groups, and everything else that matters to you. I will be riding solo on the podcast again this year, but you can still expect to hear more amazing guests who will deliver game-changing business intelligence, data, and analytics master classes. Also, tune in to be first to hear about our big name change and the new dates for our Accelerator live workshop (yes, finally)! Enjoy and let's kick off this season in high gear! In this episode, you'll learn: [01:01] - A fresh start and some exciting changes for the AOF podcast! A look at the reasons that I took a break from social media at the end of 2020, and four updates to the podcast: guests, giveaways, new sponsors, and our exciting #1 ranking for the 3rd year in a row! [08:12] - Why we are removing the 'BI' from our name and changing it all to Data Storytelling Method! New and exciting ways to engage with the updated content and platform. [14:06] - The changes to the dates for the Accelerator; it is now May 18-20, 2021 instead of in February, and we expect it to sell out again. [16:48] - My surprise appearance on CNN in Dec! For full show notes, and the links mentioned, visit: https://bibrainz.com/podcast/73 Enjoyed the Show? Please leave us a review on iTunes.

Data Pipelines Pocket Reference

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions. You'll learn: What a data pipeline is and how it works How data is moved and processed on modern data infrastructure, including cloud platforms Common tools and products used by data engineers to build pipelines How pipelines support analytics and reporting needs Considerations for pipeline maintenance, testing, and alerting
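The batch-versus-streaming distinction the book highlights can be sketched in a few lines of Python. This is not code from the book; it is a minimal illustration, with hypothetical `Record`, `batch_ingest`, and `stream_ingest` names, of how the two ingestion modes differ in when data is handed off downstream:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

@dataclass
class Record:
    key: str
    value: float

def batch_ingest(source: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Accumulate records and emit them in fixed-size batches."""
    batch: List[Record] = []
    for rec in source:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch

def stream_ingest(source: Iterable[Record], sink: Callable[[Record], None]) -> None:
    """Hand each record to the sink as soon as it arrives."""
    for rec in source:
        sink(rec)

records = [Record(f"r{i}", float(i)) for i in range(5)]

batches = list(batch_ingest(records, batch_size=2))
print([len(b) for b in batches])  # batch mode: [2, 2, 1]

seen: List[Record] = []
stream_ingest(records, seen.append)
print(len(seen))  # streaming mode: 5, delivered one at a time
```

The trade-off the book walks through, latency versus throughput and operational simplicity, falls directly out of this difference in hand-off granularity.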

podcast_episode
by Sean Hewitt (Eckerson Group), Joe Hilleary (Eckerson Group), Dave Wells (Eckerson Group), Kevin Petrie (Eckerson Group), Andrew Sohn (Crawford & Company)

Every December, Eckerson Group fulfills its industry obligation to summon its collective knowledge and insights about data and analytics and speculate about what might happen in the coming year. The diversity of predictions from our research analysts and consultants exemplifies the breadth of their research and consulting experiences and the depth of their thinking. Predictions from Kevin Petrie, Joe Hilleary, Dave Wells, Andrew Sohn, and Sean Hewitt range from data and privacy governance to artificial intelligence with stops along the way for DataOps, data observability, data ethics, cloud platforms, and intelligent robotic automation.

Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify. They explain how they structured the project to allow for multiple teams to collaborate in a scalable manner, the additional tooling that they added to address the edge cases that they have run into, and the optimizations that they baked into their continuous integration process to provide fast feedback and reduce costs. This is a great conversation about the lessons learned from real world use of a specific technology and how well it lives up to its promises.
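To make the CI optimization idea concrete: a common way to keep warehouse CI fast and cheap is to rebuild only the models whose files changed, plus everything downstream of them. The episode does not publish Shopify's actual implementation; the sketch below is a hypothetical illustration of that selection step, with an invented `models_to_rebuild` helper and a toy dependency graph:

```python
from collections import deque
from typing import Dict, List, Set

def models_to_rebuild(deps: Dict[str, List[str]], changed: Set[str]) -> Set[str]:
    """Given a model -> downstream-dependents graph, return the changed
    models plus everything downstream of them (a BFS over the graph)."""
    selected = set(changed)
    queue = deque(changed)
    while queue:
        model = queue.popleft()
        for child in deps.get(model, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected

# Hypothetical graph: each key lists the models that read from it.
deps = {
    "orders_raw": ["orders_clean"],
    "orders_clean": ["orders_daily", "customer_ltv"],
    "customers_raw": ["customer_ltv"],
}

print(sorted(models_to_rebuild(deps, {"orders_raw"})))
# ['customer_ltv', 'orders_clean', 'orders_daily', 'orders_raw']
```

dbt itself exposes a similar idea through state-based node selection; the point of the sketch is only that CI cost scales with the affected subgraph rather than the whole project.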

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Today’s episode of Data Engineering Podcast is sponsored by Datadog, the monitoring and analytics platform for cloud-scale infrastructure and applications. Datadog’s machine-learning based alerts, customizable dashboards, and 400+ vendor-backed integrations make it easy to unify disparate data sources and pivot between correlated metrics and events for faster troubleshooting. By combining metrics, traces, and logs in one place, you can easily improve your application performance. Try Datadog free by starting your 14-day trial and receive a free t-shirt once you install the agent. Go to dataengineeringpodcast.com/datadog today to see how you can unify your monitoring. Your host is Tobias Macey and today I’m interviewing Zeeshan Qureshi and Michelle Ark about how Shopify is building their production data warehouse platform with DBT

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what the Shopify platform is?
What kinds of data sources are you working with?

Can you share some examples of the types of analysis, decisions, and products that you are building with the data that you manage?
How have you structured your data teams to be able to deliver those projects?

What are the systems that you have in place, technological or otherwise, to allow you to support the needs of

Intelligent Data Analytics for Terror Threat Prediction

Intelligent data analytics for terror threat prediction is an emerging field of research at the intersection of information science and computer science, bringing with it a new era of tremendous opportunities and challenges due to the abundance of easily available criminal data for further analysis. This book provides innovative insights that will help design interventions for emerging, dynamic scenarios of criminal activity. Furthermore, it presents emerging issues, challenges and management strategies in public safety and crime control development across various domains. The book will play a vital role in improving human life to a great extent. Researchers and practitioners working in the fields of data mining, machine learning and artificial intelligence will greatly benefit from this book, which will be a good addition to the state-of-the-art approaches collected for intelligent data analytics. It will also be very beneficial for those who are new to the field and need to quickly become acquainted with the best performing methods. With this book they will be able to compare different approaches and carry forward their research in the most important areas of this field, which has a direct impact on the betterment of human life by maintaining the security of our society. No other book is currently on the market which provides such a good collection of state-of-the-art methods for intelligent data analytics-based models for terror threat prediction, as intelligent data analytics is a newly emerging field and research in data mining and machine learning is still in the early stage of development.

Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second, the information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of milliseconds or single-digit seconds, and the consumers of the data need to be able to query the information quickly and flexibly. As the systems that we build continue to grow in scale and complexity the need for reliable and manageable monitoring platforms increases proportionately. In this episode Rob Skillington, CTO of Chronosphere, shares his experiences building metrics systems that provide observability to companies that are operating at extreme scale. He describes how the M3DB storage engine is designed to manage the pressures of a critical system component, the inherent complexities of working with telemetry data, and the motivating factors that are contributing to the growing need for flexibility in querying the collected metrics. This is a fascinating conversation about an area of data management that is often taken for granted.
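The downsampling that metrics stores like M3DB perform can be illustrated with a toy aggregator. This is not M3DB's actual code, just a hypothetical sketch of bucketing raw (timestamp, value) points into fixed-width time windows and keeping the per-bucket mean, which is what makes billions of raw points queryable in milliseconds:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (unix timestamp in seconds, metric value)

def downsample(points: List[Point], bucket_seconds: int) -> Dict[int, float]:
    """Aggregate raw points into fixed-width time buckets, keeping the mean."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for ts, value in points:
        # Align each point to the start of its bucket.
        bucket = int(ts // bucket_seconds) * bucket_seconds
        sums[bucket] += value
        counts[bucket] += 1
    return {b: sums[b] / counts[b] for b in sums}

points = [(0.0, 1.0), (5.0, 3.0), (10.0, 10.0), (14.0, 20.0)]
print(downsample(points, bucket_seconds=10))
# {0: 2.0, 10: 15.0}
```

Real systems keep several aggregates per bucket (min, max, count, sum) and multiple resolutions at once, but the storage saving comes from exactly this collapse of many raw points into one summary per window.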

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. Today’s episode of Data Engineering Podcast is sponsored by Datadog, the monitoring and analytics platform for cloud-scale infrastructure and applications. 
Datadog’s machine-learning-based alerts, customizable dashboards, and 400+ vendor-backed integrations make it easy to unify disparate data sources and pivot between correlated metrics and events for faster troubleshooting. By combining metrics, traces, and logs in one place, you can easily improve your application performance. Try Datadog free by starting your 14-day trial and receive a free t-shirt once you install the agent. Go to dataengineeringpodcast.com/datadog to see how you can unify your monitoring today. Your host is Tobias Macey and today I’m interviewing Rob Skillington about Chronosphere, a scalable, reliable, and customizable monitoring-as-a-service platform purpose-built for cloud-native applications.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what you are building at Chronosphere and your motivation for turning it into a business? What are the

Cullah is a Milwaukee, Wisconsin, based independent musician who has released an album every year on his birthday — April 27 — for the past 14 years. For his 30th birthday in 2021, he’ll be releasing his 15th album, ½, as a testament to the fact that he’s released an album every year for half of his life. The son of a classically trained jazz musician and a farm-raised mathematician and computer scientist, “Cullah was brought up with the awareness of the balance between the creative and logical aspects of natural law.” As such, he studied Computer Engineering at Marquette University and Music and Media Technologies at Trinity College Dublin, working briefly as a web developer at a web design firm before turning to music full time. He’s been described as, “One part Jack White, one part Dan Auerbach, and one part Jeff Buckley,” and he was kind enough to carve out some time to discuss music data and music analytics from the artist perspective. Connect With Cullah Here If you want more free insights, follow our podcast, our blog, and our socials. If you're an artist with a free Chartmetric account, sign up for the artist plan, made exclusively for you, here. If you're new to Chartmetric, follow the URL above after creating a free account here.

IBM Integrated Synchronization: Incremental Updates Unleashed

The IBM® Db2® Analytics Accelerator (Accelerator) is a logical extension of Db2 for IBM z/OS® that provides a high-speed query engine that efficiently and cost-effectively runs analytics workloads. The Accelerator is an integrated back-end component of Db2 for z/OS. Together, they provide a hybrid workload-optimized database management system that seamlessly routes queries found in transactional workloads to Db2 for z/OS and queries found in analytics applications to the Accelerator. Each query runs in its optimal environment for maximum speed and cost efficiency. The incremental update function of Db2 Analytics Accelerator for z/OS updates Accelerator-shadow tables continually. Changes to the data in the original Db2 for z/OS tables are propagated to the corresponding target tables frequently and with only a brief delay, so query results from the Accelerator are always drawn from recent, close-to-real-time data. Up to Db2 Analytics Accelerator V7.5, incremental updates were provided by a capability called IBM InfoSphere® Change Data Capture (InfoSphere CDC), part of IBM InfoSphere Data Replication for z/OS. Since then, a new replication protocol between Db2 for z/OS and the Accelerator, called IBM Integrated Synchronization, was introduced; with Db2 Analytics Accelerator V7.5, customers can choose which one to use. IBM Integrated Synchronization is a built-in product feature for setting up incremental updates, and it does not require InfoSphere CDC, which is bundled with IBM Db2 Analytics Accelerator. IBM Integrated Synchronization also offers further advantages: simplified administration, packaging, upgrades, and support, which are managed as part of the Db2 for z/OS maintenance stream; fast processing of updates; and reduced CPU consumption on the mainframe thanks to a streamlined, optimized design in which most of the processing is done on the Accelerator, which also reduces latency. 
It uses the IBM Z® Integrated Information Processor (zIIP) on Db2 for z/OS, which leads to reduced CPU costs on IBM Z and better overall performance figures, such as throughput and synchronized rows per second. On z/OS, the workload to capture the table changes was reduced, and the remainder can be handled by zIIPs. With the introduction of an enterprise-grade Hybrid Transactional Analytics Processing (HTAP) enabler, also known as the Wait for Data protocol, the integrated low-latency protocol now supports more analytical queries running against the latest committed data. IBM Db2 for z/OS Data Gate simplifies delivering data from IBM Db2 for z/OS to IBM Cloud® Pak® for Data for direct access by new applications. It uses the special-purpose integrated synchronization protocol to maintain data currency with low latency between Db2 for z/OS and dedicated target databases on IBM Cloud Pak for Data.
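The general idea behind incremental updates, independent of IBM's actual wire protocol, can be sketched as follows: an ordered stream of committed changes read from the source table's log is applied to a shadow table keyed by primary key, so the replica continually converges on the source without full reloads. The function name and change-record format below are hypothetical, for illustration only.

```python
def apply_changes(shadow, changes):
    """Apply a stream of committed changes to an in-memory shadow table.

    `shadow` maps primary key -> row dict; `changes` is an ordered list of
    (operation, key, row) tuples, as a hypothetical log reader might emit.
    """
    for op, key, row in changes:
        if op in ("INSERT", "UPDATE"):
            shadow[key] = row          # treat both as an upsert
        elif op == "DELETE":
            shadow.pop(key, None)      # tolerate deletes of unseen keys
        else:
            raise ValueError(f"unknown operation: {op}")
    return shadow

# A shadow table converging on the source after three log entries.
table = {1: {"name": "alpha"}}
log = [
    ("UPDATE", 1, {"name": "alpha-v2"}),
    ("INSERT", 2, {"name": "beta"}),
    ("DELETE", 1, None),
]
apply_changes(table, log)
# table is now {2: {"name": "beta"}}
```

The key property, which the real protocol provides with much more machinery (commit ordering, restart points, parallel apply), is that only the changed rows travel to the target, rather than the whole table.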

Summary Businesses often need to ingest data from their customers in order to power the services that they provide, and each new source they integrate with means another custom set of ETL tasks to maintain. To reduce the friction involved in supporting new data transformations, David Molot and Hassan Syyid built the Hotglue platform. In this episode they describe the data integration challenges facing many B2B companies, how the Hotglue platform simplifies their efforts, and how they have designed the platform to make these ETL workloads embeddable and self-service for end users.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage, and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature, which instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. This episode of Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. 
Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully-integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog. Your host is Tobias Macey and today I’m interviewing David Molot and Hassan Syyid about Hotglue, an embeddable data integration tool for B2B developers built on the Python ecosystem.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what you are building at Hotglue?

What was your motivation for starting a business to address this particular problem?

Who is the target user of Hotglue and what are their biggest data problems?

What are the types and sources of data that they are likely to be working with? How are they currently handling solutions for those problems? How does the introduction of Hotglue simplify or improve their work?

What is involved in getting Hotglue integrated into a given customer’s environment? How is Hotglue itself implemented?

How has the design or goals of the platform evolved since you first began building it? What were some of the initial assumptions that you had at the outset and how well have they held up as you progressed?

Once a customer has set up Hotglue what is their workflow for building and executing an ETL workflow?

What are their options for working with sources that aren’t supported out of the box?

What are the biggest design and implementation challenges that you are facing given the need for your product to be embedded in customer platforms and exposed to their end users? What are some of the most interesting, innovative, or unexpected ways that you have seen Hotglue used? What are the most interesting, unexpected, or challenging lessons that you have learned while building Hotglue? When is Hotglue the wrong choice? What do you have planned for the future of the product?

Contact Info

David

@davidmolot on Twitter LinkedIn

Hassan

hsyyid on GitHub LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Hotglue Python

The Python Podcast.init

B2B == Business to Business Meltano

Podcast Episode

Airbyte Singer

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

The Data Mirage

The Data Mirage: Why Companies Fail to Actually Use Their Data is a business book for executives and leaders who want to unlock more insights from their data and make better decisions. The importance of data doesn’t need an introduction or a fancy pitch deck. Data plays a critical role in helping companies better understand their users, beat out their competitors, and break through their growth targets. However, despite significant investments in their data, most organizations struggle to get much value from it. According to Forrester, only 38% of senior executives and decision-makers “have a high level of confidence in their customer insights and only 33% trust the analytics they generate from their business operations.” This reflects the real world that I have experienced. In this book, I will help readers formulate an analytics strategy that works in the real world, show them how to think about KPIs, and help them tackle the problems they are bound to come across as they try to use data to make better decisions.