talk-data.com

Topic

Analytics

data_analysis insights metrics

4552 tagged

Activity Trend: 398 peak/qtr, 2020-Q1 to 2026-Q1

Activities

4552 activities · Newest first

podcast_episode
by Mico Yuk (Data Storytelling Academy), Lewis Temares (University of Miami)
BI

Lewis Temares is a former Dean at the University of Miami, and though he's retired now, he still has a lot to say and a lot of knowledge to contribute. As one of the most beloved leaders I've ever met, he has managed large and very successful teams in IT. Not only was he my first mentor in the technical industry, but his influence and continual guidance have been instrumental in my career path. Given recent world events, I brought him on to discuss how techies can stand out at work during unusual circumstances like a pandemic, what advice he'd give to tech leaders, workers, and students, and how to deal with the obstacles that accompany being laid off or furloughed. Tune in to gain some real-world wisdom and practical steps you can take to ensure you are standing out even while working from home!

[13:17] - Key Quote: "I mean, there's nothing comparable to this, but yes, we've seen other occurrences, we've seen economic depressions, we've seen a lot of things that have been negative and have negative effects on organizations, but nothing to this extent." –Lewis Temares
[13:53] - Lewis's perspective on where listeners should be focused right now
[24:01] - Advice Lewis would give to IT students

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/61

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training: register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

Summary Finding connections between data and the entities that they represent is a complex problem. Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. In this episode Denise Gosnell and Matthias Broecheler discuss their recent book, The Practitioner's Guide to Graph Data, including the fundamental principles that you need to know about graph structures, the current state of graph support in database engines, tooling, and query languages, as well as useful tips on potential pitfalls when putting them into production. This was an informative and enlightening conversation with two experts on graph data applications that will help you start on the right track in your own projects.
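As a rough illustration of the kind of relationship modeling the book covers, here is a minimal sketch using the networkx Python library (not a tool discussed in the episode); the people, accounts, and relation labels are invented for the example.

```python
# A minimal sketch of "graph thinking" with networkx; entities and
# relationships here are hypothetical, purely for illustration.
import networkx as nx

# Model customers and the accounts they share -- a classic graph use case.
g = nx.Graph()
g.add_edge("alice", "account_1", relation="owns")
g.add_edge("bob", "account_1", relation="owns")
g.add_edge("bob", "account_2", relation="owns")

# Traversal surfaces emergent structure: alice and bob are connected
# through a shared account, which a relational join would have to
# reconstruct explicitly.
path = nx.shortest_path(g, "alice", "account_2")
print(path)  # ['alice', 'account_1', 'bob', 'account_2']
```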

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your data engineering career? If you could hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!

Today's episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog's agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey, and today I'm interviewing Denise Gosnell and Matthias Broecheler about the recently published Practitioner's Guide to Graph Data.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what your goals are for the Practitioner's Guide to Graph Data?

What was your motivation for writing a book to address this topic?

What do you see as the driving force behind the growing popularity of graph technologies in recent years?

What are some of the common use cases/applications of graph data and graph traversal algorithms?

What are the core elements of graph thinking that data teams need to be aware of to be effective in identifying those cases in their existing systems?

What are the fundamental principles of graph technologies that data engineers should be familiar with?

Wha

Send us a text Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.

Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts. This week on Making Data Simple, we have Ayal Steinberg, Global Sales VP, Data and AI at IBM. Ayal Steinberg is the Vice President of Global Sales for IBM's Data and AI business unit. In this capacity, Ayal oversees one of IBM's largest and most strategic business units, with over 1,500 people and several billion dollars of annual revenue. Ayal has proven success in managing complex and global sales organizations. Throughout his career, Ayal has created and led high-performing sales teams focused on selling complex software solutions to some of the world's most well-known brands in more than 50 countries. Prior to IBM, Ayal successfully led sales teams through transformation and hypergrowth at IBM Netezza, Oracle, DataStax (the company behind Apache Cassandra), and other enterprise software companies. Earlier in his career, Ayal was a pioneer in selling software for several start-ups in price optimization and advanced analytics. Ayal majored in Economics at Binghamton University, State University of New York.

Show Notes
4:00 – Ayal's background
15:33 – IBM strategy
18:45 – Moving to cloud
21:23 – Why IBM
23:24 – Value selling
27:58 – Value vs. price
29:57 – Skill set
31:20 – How do you bring someone back around

Mentioned: Solution Selling, The Challenger Sale, StrengthsFinder 2.0

Connect with the Team
Producer Kate Brown – LinkedIn
Producer Steve Templeton – LinkedIn
Host Al Martin – LinkedIn and Twitter

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

podcast_episode
by Mico Yuk (Data Storytelling Academy), Cliff Alper (Analysis Factory)
BI

Why is the business missing in BI? And how can you get the business back into BI? That's what today's guest is here to talk about. Cliff Alper is Director of Product Development at Analysis Factory, and in today's episode, you'll hear what he has to say about how you can tell if you're someone who's missing out on the business part of BI, why it's so important to ask a lot of questions, and how to talk to end-users.

[18:40] - What Cliff means when he says that the business is missing from BI
[30:25] - Advice for BI analytics leaders who struggle with confidence and knowledge
[40:35] - How to teach BI folks how to notice the unsaid things that happen in meetings

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/60

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training: register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

Data Management at Scale

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you'll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

Examine data management trends, including technological developments, regulatory requirements, and privacy concerns
Go deep into the Scaled Architecture and learn how the pieces fit together
Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata

The Data Analysis Workshop

The Data Analysis Workshop teaches you how to analyze and interpret data to solve real-world business problems effectively. By working through practical examples and datasets, you'll gain actionable insights into modern analytic techniques and build your confidence as a data analyst.

What this Book will help me do
Understand and apply fundamental data analysis concepts and techniques to tackle diverse datasets.
Perform rigorous hypothesis testing and analyze group differences within datasets.
Create informative data visualizations using Python libraries like Matplotlib and Seaborn.
Understand and use correlation metrics to identify relationships between variables.
Leverage advanced data manipulation techniques to uncover hidden patterns in complex datasets.

Author(s)
The authors, Gururajan Govindan, Shubhangi Hora, and Konstantin Palagachev, are experts in data science and analytics with years of experience in industry and academia. Their background includes performing business-critical analysis for companies and teaching students how to approach data-driven decision-making. They bring their depth of knowledge and engaging teaching styles together in this approachable guide.

Who is it for?
This book is intended for programmers with proficiency in Python who want to apply their skills to the field of data analysis. Readers who have a foundational understanding of coding and are eager to implement hands-on data science techniques will gain the most value. The content is also suitable for anyone pursuing a data-driven problem-solving mindset. This is an excellent resource to help transition from basic coding proficiency to applying Python in real-world data science.
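To give a taste of the techniques the book teaches, here is a small, self-contained sketch of a two-sample hypothesis test and a correlation check using pandas and SciPy; the data and column names are synthetic and purely illustrative.

```python
# A small sketch of the analysis workflow described above: a two-sample
# t-test on group differences, then a correlation check. Synthetic data.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "group": ["A"] * 100 + ["B"] * 100,
    "revenue": np.concatenate([rng.normal(100, 15, 100),
                               rng.normal(108, 15, 100)]),
})

# Test whether the two groups differ in mean revenue.
a = df.loc[df["group"] == "A", "revenue"]
b = df.loc[df["group"] == "B", "revenue"]
t_stat, p_value = stats.ttest_ind(a, b)
print(f"t={t_stat:.2f}, p={p_value:.4f}")

# Correlation between two numeric columns.
df["marketing_spend"] = df["revenue"] * 0.3 + rng.normal(0, 5, 200)
print(df[["revenue", "marketing_spend"]].corr())
```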

The Data Wrangling Workshop - Second Edition

The Data Wrangling Workshop is your beginner's guide to the essential techniques and practices of data manipulation using Python. Throughout the book, you will progressively build your skills, learning key concepts such as extracting, cleaning, and transforming data into actionable insights. By the end, you'll be confident in handling various data wrangling tasks efficiently.

What this Book will help me do
Understand and apply the fundamentals of data wrangling using Python.
Combine and aggregate data from diverse sources like web data, SQL databases, and spreadsheets.
Use descriptive statistics and plotting to examine dataset properties.
Handle missing or incorrect data effectively to maintain data quality.
Gain hands-on experience with Python's powerful data science libraries like Pandas, NumPy, and Matplotlib.

Author(s)
Brian Lipp, Shubhadeep Roychowdhury, and Dr. Tirthajyoti Sarkar are experienced educators and professionals in the fields of data science and engineering. Their collective expertise spans years of teaching and working with data technologies. They aim to make data wrangling accessible and comprehensible, focusing on practical examples to equip learners with real-world skills.

Who is it for?
The Data Wrangling Workshop is ideal for developers, data analysts, and business analysts aiming to become data scientists or analytics experts. If you're just getting started with Python, you will find this book guiding you step-by-step. A basic understanding of Python programming, as well as relational databases and SQL, is recommended for smooth learning.
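To give a flavor of the wrangling tasks described above, here is a brief pandas sketch of merging two sources and repairing missing values; the tables and column names are invented for illustration.

```python
# A brief sketch of two wrangling tasks the book covers: combining data
# from separate sources and handling missing values. Synthetic tables.
import numpy as np
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 2, 3],
                       "amount": [250.0, np.nan, 80.0]})
customers = pd.DataFrame({"customer_id": [1, 2, 3],
                          "region": ["EU", "US", None]})

# Combine the two sources on their shared key.
merged = orders.merge(customers, on="customer_id", how="left")

# Handle missing or incorrect data: impute the numeric gap with the
# median and flag the unknown region explicitly.
merged["amount"] = merged["amount"].fillna(merged["amount"].median())
merged["region"] = merged["region"].fillna("unknown")
print(merged)
```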

podcast_episode
by Tim Wilson (Analytics Power Hour - Columbus (OH)), Moe Kiss (Canva), Michael Helbling (Search Discovery)

Analytics is hard (so they say... but we're not going to open THAT can of worms). Do you know what's harder? Managing analysts! I mean, they're always asking, "Why?" Sometimes, they even ask it five times! They can wind up, you know, analyzing whatever you're asking them to do! On this episode, special guest Moe Kiss (you may know her as a co-host of this podcast) joined Michael and Tim to dig into the ins and outs of the analyst/manager relationship. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Are your subconscious beliefs causing difficulties in your life? Are they holding you back from overcoming challenges that you're facing? Today's guest can help you understand how your subconscious beliefs may be holding you back and what you can do about it. Julie Cairns is the author of the book The Abundance Code. In today's episode, you'll hear her talk about dealing with uncertainty, how to shift to an abundance mindset, and the three steps that can help you shift your subconscious beliefs.

[08:35] - User Expectations: How to shift to an abundance mindset
[10:59] - Coping with the current level of uncertainty
[30:15] - Three things that you need to do to change subconscious beliefs

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/59

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training: register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

On this episode, we chat with Emily White, a music industry veteran who started her career as a world-class tour manager before retiring at 23 to pursue artist management, entrepreneurial endeavors, and academia. Emily has worked with everyone from Dinosaur Jr. to Zac Brown Band; she's founded and run multiple entertainment companies, released a number of books, and now, when she's not teaching at NYU's Tisch School of the Arts, she is using music data analytics to help activate voters for the upcoming presidential election in November. The #iVoted initiative, which Emily founded with Madison House co-founder Mike Luba and Wilco's Pat Sansone, is gearing up to be one of the biggest digital music festivals ever, with dozens of artists performing via webcast nationwide. The cost of admission for fans? A selfie from home with their mail-in ballot or a photo from outside their polling place, though we strongly encourage the former. For a full list of artists performing on Nov. 3, check out iVotedConcerts.com, and full disclaimer: Chartmetric is a proud data partner of the #iVoted initiative.

Connect With Emily
https://twitter.com/emwizzle
https://twitter.com/iVotedConcerts
https://twitter.com/collectiveent_
https://www.instagram.com/collectiveentinc/

Connect With Us
http://podcast.chartmetric.com/
http://chartmetric.com/
https://blog.chartmetric.com
https://smarturl.it/chartmetric_social

Summary Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is using machine learning to provide valuable insights to produce higher energy outputs. This was a great conversation about using data to improve the way the world works.
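The episode stays at the level of the story, but a heavily simplified, hypothetical sketch of the general technique (fitting an expected power curve and flagging readings that fall well below it — not Turbit's actual method) might look like this:

```python
# A deliberately simplified, hypothetical sketch: fit an expected power
# curve from wind speed, then flag readings far below it. All data and
# thresholds here are synthetic assumptions, not Turbit's approach.
import numpy as np

rng = np.random.default_rng(0)
wind_speed = rng.uniform(3, 15, 500)                  # m/s, synthetic
power = 0.5 * wind_speed**3 + rng.normal(0, 50, 500)  # kW, synthetic

# Fit a cubic power curve as the baseline expectation.
coeffs = np.polyfit(wind_speed, power, deg=3)
expected = np.polyval(coeffs, wind_speed)

# Residuals far below the curve suggest underperformance.
residual = power - expected
threshold = residual.mean() - 2 * residual.std()
print(f"{(residual < threshold).sum()} suspicious readings flagged")
```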

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your data engineering career? If you could hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!

Today's episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog's agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey, and today I'm interviewing Michael Tegtmeier about Turbit, a machine-learning-powered platform for performance monitoring of wind farms.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what you are building at Turbit and your motivation for creating the business?

What are the most problematic factors that contribute to low performance in power generation with wind turbines?

What is the current state of the art for accessing and analyzing data for wind farms?

What information are you able to gather from the SCADA systems in the turbine?

How uniform is the availability and formatting of data from different manufacturers?

How are you handling data collection for the individual turbines?

How much information are you processing at the point of collection vs. sending to a centralized data store?

Can you describe the system architecture of Turbit and the lifecycle of turbine data as it propag

What do you know about emotional intelligence? Today's guest is going to answer a lot of questions about emotional intelligence, EQ, when you should have your EQ assessed, and what you should do with that information.

[01:13] - What to expect in Season Five of AoF
[08:55] - User Expectations: What emotional intelligence is and when you should take an emotional intelligence test

For full show notes and the links mentioned, visit: https://bibrainz.com/podcast/58

Enjoyed the Show? Please leave us a review on iTunes. Free Data Storytelling Training: register before it sells out again! New dates for our BI Data Storytelling Mastery Accelerator 3-Day Live Workshop are finally available. Many BI teams are still struggling to deliver the consistent, highly engaging analytics their users love. At the end of the workshop, you'll leave with a clear BI delivery action plan. Register today!

Learning Spark, 2nd Edition

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark. Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you'll be able to:

Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and the SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and the Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow
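For a sense of the Structured APIs the book covers, here is a minimal PySpark sketch that expresses the same aggregation through both the DataFrame API and SQL; the file name and columns (sales.csv, region, amount) are hypothetical, and a local Spark 3.x install is assumed.

```python
# A minimal sketch of Spark's Structured APIs: the DataFrame API and
# SQL compile to the same optimized plan. File and columns are made up.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("learning-spark-demo").getOrCreate()

# Read a CSV with a header row, letting Spark infer column types.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Aggregate with the DataFrame API...
df.groupBy("region").agg(F.sum("amount").alias("total")).show()

# ...and express the identical query in SQL against a temp view.
df.createOrReplaceTempView("sales")
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()
```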

Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that situation. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented. The Singer ecosystem has laid the groundwork for a great option to empower teams of all sizes to unlock the value of their data, and Meltano is building the remaining structure to make it a fully featured contender for proprietary systems.
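Meltano builds on the Singer specification, in which taps and targets exchange newline-delimited JSON messages over stdout. A toy sketch of a tap emitting the three core message types might look like this; the stream and field names are invented.

```python
# The Singer spec underlying Meltano is just newline-delimited JSON on
# stdout. A toy tap emitting the three core message types (stream and
# field names are hypothetical).
import json
import sys

def emit(message: dict) -> None:
    # Each Singer message is a single JSON object on its own line.
    sys.stdout.write(json.dumps(message) + "\n")

# Declare the stream's schema, send a record, then checkpoint state so
# the next run can resume incrementally.
emit({"type": "SCHEMA", "stream": "users",
      "schema": {"properties": {"id": {"type": "integer"},
                                "name": {"type": "string"}}},
      "key_properties": ["id"]})
emit({"type": "RECORD", "stream": "users",
      "record": {"id": 1, "name": "Ada"}})
emit({"type": "STATE", "value": {"users": {"last_id": 1}}})
```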

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your data engineering career? If you could hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!

Today's episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog's agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey, and today I'm interviewing Douwe Maan about Meltano, an open source platform for building, running, and orchestrating ELT pipelines.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what Meltano is and the story behind it?

Who is the target audience?

How does the focus on small or early stage organizations constrain the architectural decisions that go into Meltano?

What have you found to be the complexities in trying to encapsulate the entirety of the data lifecycle in a single tool or platform?

What are the most painful transitions in that lifecycle and how does that pain manifest?

How and why has the focus of the project shifted from its original vision?

With your current focus on the data integration/data transfer stage of the lifecycle, what are you seeing as the biggest barriers to entry with the current ecosystem?

What are the main elements of

Summary There are an increasing number of use cases for real time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface for your data, and the metrics that need to be tracked to keep the overall system healthy. Observability and governance of streaming data requires a different approach than batch oriented workflows, and this episode does an excellent job of outlining the complexities involved and how to address them.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your data engineering career? If you could hand a book to a new data engineer, what wisdom would you add to it? I'm working with O'Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it's now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show!

Today's episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog's agent, Datadog will send you a free T-shirt.

You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. For more opportunities to stay up to date, gain new skills, and learn from your peers, there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today!

Your host is Tobias Macey, and today I'm interviewing Andrew Stevenson about Lenses.io, a platform that provides real-time data operations for engineers.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what Lenses is and the story behind it?

What is your working definition for what constitutes DataOps?

How does the Lenses platform support the cross-cutting concerns that arise when trying to bridge the different roles in an organization to deliver value with data?

What are the typical barriers to collaboration, and how does Lenses help with that?

Many different systems provide a SQL interface to streaming data on various substrates. What was your reason for building your own SQL engine and what is unique about it? What are the main challenges that you see engineers facing when working with s

Send us a text Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next.

Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts. This week on Making Data Simple, we have Jim Ruston, Managing Director of Armeta Analytics. Al and Jim discuss monetizing data, prescriptive and descriptive approaches, and sealing the deal.

Show Notes
4:45 – How do you monetize data
6:20 – Common actions
8:50 – Prescriptive approach
11:15 – Gap in data warehouse
13:05 – Cleanup
17:40 – Overhead costs
19:22 – Prescriptive and descriptive approach
20:56 – Preferred technology
23:07 – Who are the decision makers
24:56 – Sealing the deal
27:10 – Why do I need Armeta

Mentioned: Armeta, Armeta LinkedIn, Guaranteed Analytics, The Challenger Sale

Connect with the Team
Producer Kate Brown – LinkedIn
Producer Meighann Helene – LinkedIn
Producer Michael Sestak – LinkedIn
Producer Steve Templeton – LinkedIn
Host Al Martin – LinkedIn and Twitter

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Having been a pioneer for the past 25 years, SONY PlayStation has played a vital role in the interactive gaming industry. With over 100 million monthly active users, more than 100 million PS4 console sales, and thousands of game development partners across the globe, a big-data problem is inevitable. This presentation talks about how we scaled Airflow horizontally, which has helped us build a stable, scalable, and optimal data processing infrastructure powered by Apache Spark, AWS ECS, EC2, and Docker. Driven by the demand for processing large volumes of data and by the organization's growing data analytics and usage needs, the data team at PlayStation took the initiative to build an open source big data processing infrastructure with Apache Spark in Python as the core ETL engine. Apache Airflow is the core workflow management tool for the entire ecosystem. We started with an Airflow application running on a single AWS EC2 instance to support parallelism of 16 with 1 scheduler and 1 worker, and eventually scaled it to a bigger scheduler along with 4 workers to support a parallelism of 96, a DAG concurrency of 96, and a worker task concurrency of 24. Containerizing all the services on AWS ECS gave us the ability to scale Airflow horizontally.
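As a rough sketch of where those concurrency knobs live, the per-DAG side of the tuning described above can be expressed in Airflow 1.10-era Python like this; the DAG id and command are hypothetical, and the cluster-wide parallelism and worker concurrency settings live in airflow.cfg rather than in code.

```python
# Sketch of per-DAG concurrency tuning in Airflow 1.10-era code.
# Cluster-wide limits (parallelism, dag_concurrency, worker_concurrency)
# are set in airflow.cfg; the names below mirror the talk's numbers.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="spark_etl",              # hypothetical pipeline name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    concurrency=96,       # tasks this DAG may run at once (dag_concurrency)
    max_active_runs=1,    # avoid overlapping runs of the same DAG
)

run_etl = BashOperator(
    task_id="run_spark_job",
    # Hypothetical: submit the PySpark ETL to the containerized cluster.
    bash_command="spark-submit etl_job.py",
    dag=dag,
)
```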

For three years, we at LOVOO, a market-leading dating app, have been using the Google Cloud managed version of Airflow, a product we've been familiar with since its alpha release. We took a calculated risk and integrated the alpha into our product, and, luckily, it was a match. Since then, we have been leveraging this software to build out not only our data pipeline but also the way we do analytics and BI. The speaker will present an overview of the software's usability for pipeline error alerting through BashOperators that communicate with Slack, and will touch upon how LOVOO built its analytics pipeline (deployment and growth) and currently batches large amounts of data from different sources effectively using Airflow. We will also showcase our PythonOperators-driven Redshift-to-BigQuery data migration process, as well as offer a guide for creating fully dynamic tasks inside a DAG.
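A minimal sketch of the BashOperator-to-Slack alerting pattern mentioned above might look like the following; the DAG, task commands, and webhook URL are placeholders, not LOVOO's actual pipeline.

```python
# Sketch of error alerting via a BashOperator that posts to Slack.
# All names and the webhook URL are placeholders for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="analytics_pipeline",           # hypothetical
    start_date=datetime(2020, 1, 1),
    schedule_interval="@hourly",
)

load = BashOperator(
    task_id="load_batch",
    bash_command="python load_batch.py",   # hypothetical batch step
    dag=dag,
)

alert = BashOperator(
    task_id="notify_slack",
    # curl a Slack incoming webhook with a plain-text failure message.
    bash_command=(
        "curl -X POST -H 'Content-type: application/json' "
        "--data '{\"text\": \"load_batch failed\"}' "
        "https://hooks.slack.com/services/PLACEHOLDER"
    ),
    trigger_rule="one_failed",  # fire only when an upstream task fails
    dag=dag,
)

load >> alert
```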

How do you create fast and painless delivery of new DAGs into production? When running Airflow at scale, it becomes a big challenge to manage the full lifecycle around your pipelines: making sure that DAGs are easy to develop, test, and ship into prod. In this talk, we will cover our suggested approach to building a proper CI/CD cycle that ensures the quality and fast delivery of production pipelines. CI/CD is the practice of delivering software from dev to prod, optimized for fast iteration and quality control. In the data engineering context, DAGs are just another piece of software that requires some form of lifecycle management. Traditionally, DAGs have been thought of as relatively static, but the new wave of analytics and machine learning efforts requires more agile DAG development, in line with how agile software engineering teams build and ship code. In this session, we will dive into the challenges of building CI/CD cycles for Airflow DAGs. We will focus on a pipeline that involves Apache Spark as an extra dimension of real-world complexity, walking through a typical flow of DAG authoring, debugging, and testing, from local to staging to prod environments. We will offer best practices and discuss open-source tools you can use to easily build your own smooth cycle for Airflow CI/CD.
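One widely used CI check for this kind of cycle (a common community pattern, not necessarily the speakers' exact setup) is a DagBag import test that fails the build when any DAG in the repo is broken:

```python
# A common CI gate for Airflow DAGs: fail the build if any DAG file in
# the repo has an import error or cannot be parsed. The dags/ folder
# path is an assumption about the repo layout.
from airflow.models import DagBag

def test_dags_load_cleanly():
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    # import_errors maps file path -> traceback for every broken DAG.
    assert not dag_bag.import_errors, dag_bag.import_errors
    # Guard against an empty folder silently passing the check.
    assert len(dag_bag.dags) > 0
```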

Building Analytics Teams

In "Building Analytics Teams," author John K. Thompson draws from over three decades of experience in analytics and management to guide you through creating an impactful analytics team. The book emphasizes key strategies for hiring, managing, and leading analytics experts to drive business improvements and achieve organizational success. What this Book will help me do Develop the skills to build and lead high-performing analytics and AI teams. Gain insights into selecting impactful projects that drive measurable business outcomes. Understand how to cultivate successful collaborations with cross-functional business teams. Learn techniques to effectively communicate analytics-driven strategies to executives. Master strategies to navigate organizational and technological challenges in data initiatives. Author(s) John K. Thompson is a seasoned analytics and AI practitioner with over 30 years of experience leading data-driven transformations for dynamic organizations. Renowned for his strategic and pragmatic approach, John crafts hands-on methodologies to unlock the potential of analytics teams. His passion for mentoring fuels his engaging and insightful writing style. Who is it for? This book is ideal for senior executives and managers aiming to harness analytics and AI to transform their organizations. It's also tailored for analytics professionals who want to elevate their team's operational success. No matter your current experience, you'll find strategies to optimize your analytics initiatives and deliver impactful results.