talk-data.com

Topic

Analytics

data_analysis insights metrics

4552 tagged

Activity Trend

Peak of 398 activities per quarter, 2020-Q1 to 2026-Q1

Activities

4552 activities · Newest first

Send us a text. Let's get more technical on the data center revolution. The CPU cannot keep up with data growth! APUs and dedicated processors for analytics workloads. Jonathan Friedmann, CEO & Co-Founder of Speedata, joins the podcast.

01:55 Introducing Jonathan Friedmann
10:20 What makes Speedata 20x faster?
13:29 HW acceleration is better
16:21 Target client profile
17:57 How to accelerate via HW
22:02 APUs are the answer
23:42 The evil CPU
25:57 Supply chain, no problem
27:33 Chips and Science Act
28:50 Predicting the Future
32:48 For fun?

LinkedIn: linkedin.com/in/jonathan-friedmann-8b5b99/ https://www.linkedin.com/company/speedataio/ Website: https://www.speedata.io/ Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

How does a traditional bricks-and-mortar retailer transform itself into an omni-channel business with strong digital and data science capabilities? In this episode of Leaders of Analytics we learn from Bunnings General Manager, Data and Analytics, Genevieve Elliott, how the company is transforming its operations using data and analytics. As Australia and New Zealand’s largest retailer of home improvement products, Bunnings is a highly complex organisation with a large physical footprint, a wide product range and an elaborate supply chain. Bunnings is almost 130 years old and has undergone tremendous growth over the last three decades. The company’s well-known strategy of “lowest price, widest range and best customer experience” is increasingly being driven by the company’s growing data and analytics capability. In this episode we discuss:

Genevieve’s career journey and how she ended up in data and analytics
How Bunnings uses data to create operational efficiencies, improve customer experience and optimise pricing
How the team prioritises projects and engages with the organisation
How the Data & Analytics team is driving a data-driven culture through the company
Genevieve’s advice to other analytics leaders wanting to drive strategically important results for their organisation, and much more.

Genevieve Elliott on LinkedIn: https://www.linkedin.com/in/genevieve-elliott/

With the increasing rate at which new data tools and platforms are being created, the modern data stack risks becoming just another buzzword data leaders use when talking about how they solve problems.

Alongside the arrival of new data tools comes the need for leaders to see beyond the modern data stack and think deeply about how their data work aligns with business outcomes; otherwise, they risk falling behind while trying to create value from innovative but irrelevant technology.

In this episode, Yali Sassoon joins the show to explore what the modern data stack really means, how to rethink the modern data stack in terms of value creation, data collection versus data creation, and the right way businesses should approach data ingestion, and much more.

Yali is the Co-Founder and Chief Strategy Officer at Snowplow Analytics, a behavioral data platform that empowers data teams to solve complex data challenges. Yali is an expert in data with a background in both strategy and operations consulting, teaching companies how to use data properly to evolve their operations and improve their results.

Summary Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence. This opens the door for busy employees to access and analyze their company information away from their desk, but it has the more powerful effect of bringing first-class support to companies operating in mobile-first economies. In this episode Sabin Thomas shares his experiences building the platform and the interesting ways that it is being used.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24x7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24x7 support.

Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.

Summary The term "real-time data" brings with it a combination of excitement, uncertainty, and skepticism. The promise of insights that are always accurate and up to date is appealing to organizations, but the technical realities to make it possible have been complex and expensive. In this episode Arjun Narayan explains how the technical barriers to adopting real-time data in your analytics and applications have become surmountable by organizations of all sizes.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Build Data Pipelines. Not DAGs. That’s the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up to the minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino or Spark SQL, a data warehouse like Snowflake or Redshift, or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart’s content without worrying about their cloud bill. For data engineering podcast listeners, we’re offering a 30 day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today and see for yourself how to avoid DAG hell.

Your host is Tobias Macey and today I’m interviewing Arjun Narayan about the benefits of real-time data for teams of all sizes.

Interview

Introduction
How did you get involved in the area of data management?

podcast_episode
by Philipp Nguyen (Grupo Boticário), Andrea Suman (Hospital Israelita Albert Einstein), Lucas Brossi

We invited some of Brazil's leading experts in data team management to share their strategies for building and leading data teams that are ready to face today's challenges.

Meet our participants:

Philipp Nguyen - Director of Data at Grupo Boticário
Lucas Brossi - Head of Advanced Analytics for South America at Bain & Company
Andrea Suman - Chief Data Officer (CDO) at Hospital Israelita Albert Einstein

If you are a data leader, are thinking about becoming one, or simply want to learn more about the topic, you will really enjoy this episode. Today we invited some of Brazil's leading experts in data team management for an exclusive conversation about leading data teams through the challenges companies face today. In the podcast we talk about decentralizing data teams, the growing pains of teams and companies, tips for anyone who wants to become a manager, and much more.

It's well worth a listen; the conversation is sensational!

Visit our Medium page for all the references and links from the episode: https://medium.com/data-hackers

podcast_episode
by Dante DeAntonio (Moody's Analytics), Cris deRitis, Mark Zandi (Moody's Analytics)

Mark and Cris welcome back colleague Dante DeAntonio of Moody's Analytics to analyze the November U.S. Employment Report. Is the jobs market moderating? It depends on who you ask. Full episode transcript. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

We talked about:

Irina’s background
Irina as a mentor
Designing curriculum and program management at AI Guild
Other things Irina taught at AI Guild
Why Irina likes teaching
Students’ reluctance to learn cloud
Irina as a manager
Cohort analysis in a nutshell
How Irina started teaching formally
Irina’s diversity project in the works
How DataTalks.Club can attract more female students to the Zoomcamps
How to get technical feedback at work
Antipatterns and overrated/overhyped topics in data analytics
Advice for young women who want to get into data science/engineering
Finding Irina online
Fundamentals for data analysts
Suggestions for DataTalks.Club collaborations
Conclusions

Links:

LinkedIn Account: https://www.linkedin.com/in/irinabrudaru/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Send us a text. Datatopics is a podcast presented by Kevin Missoorten to talk about the fuzzy and misunderstood concepts in the world of data, analytics, and AI and get to the bottom of things. In this third and final episode of a mini-series on AI Ethics, we focus on Explainable AI (X-AI) as a means of understanding what the model considers. The conversation covers topics such as:

What is explainable AI, how does it work and how mature is it?
Does explainability come at a cost, in terms of performance or otherwise?
The benefits of understanding AI models, and the realization they are not limited to ethics and compliance!

In this episode, Tim & I exchange ideas with the esteemed José Antonio Oramas Mogrovejo and Ali Anwar from UAntwerp. Datatopics is brought to you by Dataroots. Music: The Gentlemen - DivKid. The thumbnail is generated by Midjourney.
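Explainable AI covers many methods; as one minimal, hedged illustration of the general idea (and not a technique attributed to the guests), permutation importance asks how much a model's accuracy degrades when each input feature is shuffled. The dataset and model below are invented for the sketch.

```python
# Hedged illustration of one common explainability technique (permutation
# importance), not anything specific to the episode. The toy dataset and
# model choice are invented for the sketch.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=5, n_informative=2,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure how much accuracy drops:
# a large drop means the model genuinely relies on that feature.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {score:.3f}")
```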

The first LIVE IRL episode! Stephen Bailey, data engineer at Whatnot and writer of an incredibly entertaining data Substack, joins Tristan for a follow-up conversation to Stephen's Coalesce talk, "Excel at nothing: how to be an effective generalist." You can read Stephen's writing at https://stkbailey.substack.com/. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

The Art of Data-Driven Business

Learn how to integrate data-driven methodologies and machine learning into your business decision-making processes with 'The Art of Data-Driven Business.' This comprehensive guide shows you how to apply Python-based machine learning techniques to real-world challenges, transforming your organization into an innovative and well-informed enterprise.

What this Book will help me do:

Create professional-quality data visualizations using Python's seaborn library to derive business insights.
Analyze customer behavior, including predicting churn, with machine learning techniques.
Apply clustering algorithms to segment customers for targeted marketing campaigns.
Utilize pandas effectively for pricing and sales analytics to optimize your pricing strategies.
Forecast outcomes of promotional strategies to determine costs and benefits and maximize performance.

Author(s): Palacio is an experienced data scientist and educator who specializes in the application of machine learning to solve business problems. With extensive real-world industry experience, Palacio brings practical insights and methodologies to learners. Their teaching connects technical knowledge to actionable business strategies.

Who is it for? This book is ideal for business professionals aiming to incorporate data science into their strategies and technical experts seeking to leverage machine learning for business scenarios. Beginners to Python can find foundational help, while data scientists will appreciate the focused practical applications. It's perfect for individuals seeking a strong data-driven perspective in marketing, sales, and customer management.
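The blurb above mentions clustering algorithms for customer segmentation. As a rough, hedged illustration of that general idea only (not code from the book), a scikit-learn sketch might look like the following; the column names, sample values, and cluster count are all invented for the example.

```python
# Hypothetical example: segmenting customers by spend and purchase frequency.
# Column names, values, and the cluster count are illustrative only.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = pd.DataFrame({
    "annual_spend":    [120, 950, 180, 4300, 3900, 250, 4100, 900],
    "orders_per_year": [2, 14, 3, 40, 35, 4, 38, 12],
})

# Scale features so annual spend does not dominate the distance metric.
X = StandardScaler().fit_transform(customers)

# Three segments is an arbitrary choice for the sketch; in practice you would
# inspect inertia or silhouette scores to pick k.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(X)
print(customers.groupby("segment").mean())
```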

Fuzzy Computing in Data Science

This book comprehensively explains how to use various fuzzy-based models to solve real-time industrial challenges. The book provides information about fundamental aspects of the field and explores the myriad applications of fuzzy logic techniques and methods. It presents basic conceptual considerations and case studies of applications of fuzzy computation. It covers the fundamental concepts and techniques for system modeling, information processing, intelligent system design, decision analysis, statistical analysis, pattern recognition, automated learning, system control, and identification. The book also discusses the combination of fuzzy computation techniques with other computational intelligence approaches such as neural and evolutionary computation.

Audience: Researchers and students in computer science, artificial intelligence, machine learning, big data analytics, and information and communication technology.
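Since the blurb describes fuzzy logic techniques only in the abstract, here is a minimal, hedged sketch of one basic building block, a triangular membership function; the temperature example is invented and is not drawn from the book.

```python
# Minimal sketch of a fuzzy-logic building block: a triangular membership
# function mapping a crisp value to a degree of membership in [0, 1].
# The "warm" temperature range below is an invented example, not from the book.
def triangular(x, a, b, c):
    """Membership rises linearly from a to peak at b, then falls to zero at c."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Degree to which 24 degrees C counts as "warm" on a 15-25-35 triangle.
print(triangular(24, 15, 25, 35))  # 0.9
```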

In today’s episode, we’re joined by Jon Darbyshire, Co-Founder and CEO at SmartSuite, a collaborative Work Management platform that enables teams to plan, track and manage workflows.

We talk about:

  • Jon’s background and how SmartSuite works.
  • No-code vs low-code.
  • What drove the popularity of no-code?
  • The value of being able to hire people from all around the world.
  • The similar driving factors behind remote work and no-code.
  • How the interaction between product management and engineering and QA has changed over time.
  • How will the no-code space evolve over the next 10 years?
  • The impact of AI and smarter algorithms in the no-code space.

Jon Darbyshire - https://www.linkedin.com/in/jondarbyshire/ SmartSuite - https://www.linkedin.com/company/hellosmartsuite/

This episode is brought to you by Qrvey

The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com.

Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

#saas #analytics #AWS #BI

Data Literacy in Practice

"Data Literacy in Practice" teaches readers to unlock the power of data for making smarter decisions. You'll learn how to understand and work with data, gain the ability to derive actionable insights, and develop the skills required for data-informed decision-making. What this Book will help me do Understand the basics of data literacy and the importance of data in decision-making. Learn to visualize data effectively using charts and graphs tailored to your audience. Master the application of the four-pillar model for organizational data literacy advancement. Develop proficiency in managing data environments and assessing data quality. Become competent in deriving actionable insights and critical questioning for better analysis. Author(s) Angelika Klidas and Kevin Hanegan are pioneers in the field of data literacy with extensive experience in data analytics. Both are seasoned educators at top universities and bring their expertise to this book to help readers understand and leverage the power of data. Who is it for? "Data Literacy in Practice" is ideal for data analysts, professionals, and teams looking to enhance their data literacy skills. Readers should have a desire to utilize data effectively in their roles, regardless of prior experience. The book is designed to guide both beginners starting out and those who aim to deepen their knowledge.

Today I’m discussing something we’ve been talking about a lot on the podcast recently - the definition of a “data product.” While my definition is still a work in progress, I think it’s worth putting out into the world at this point to get more feedback. In addition to sharing my definition of data products (as defined the “producty” way), on today’s episode I also discuss some of the non-technical skills that data product managers (DPMs) in the ML and AI space need if they want to achieve good user adoption of their solutions. I’ll also share my thoughts on whether data scientists can make good data product managers, what a DPM can do to better understand your users and stakeholders, and how product and UX design factors into this role.

Highlights/ Skip to:

I introduce my reasons for sharing my definition of a data product (0:46)
My definition of data product (7:26)
Thinking the “producty” way (8:14)
My thoughts on necessary skills for data PMs (in particular, AI & machine learning product management) (12:21)
How data scientists can become good data product managers (DPMs) by taking off the data science hat (13:42)
Understanding the role of UX design within the context of DPM (16:37)
Crafting your sales and marketing strategies to emphasize the value of your product to the people who can use or purchase it (23:07)
How to build a team that will help you increase adoption of your data product (30:01)
How to build relationships with stakeholders/customers that allow you to find the right solutions for them (33:47)
Letting go of a technical identity to develop a new identity as a DPM who can lead a team to build a product that actually gets used (36:32)

Quotes from Today’s Episode

“This is what’s missing in some of the other definitions that I see around data products [...] they’re not talking about it from the customer of the data product lens. And that orientation sums up all of the work that I’m doing and trying to get you to do as well, which is to put the people at the center of the work that you’re doing and not the data science, engineering, tech, or design. I want you to put the people at the center.” (6:12)

“A data product is a data-driven, end-to-end, human-in-the-loop decision support solution that’s so valuable, users would potentially pay to use it.” (7:26)

“I want to plunge all the way in and say, ‘if you want to do this kind of work, then you need to be thinking the product-y way.’ And this means inherently letting go of some of the data science-y way of thinking and the data-first kinds of ways of thinking.” (11:46)

“I’ve read in a few places that data scientists don’t make for good data product managers. [While it may be true that they’re more introverted,] I don’t think that necessarily means that there’s an inherent problem with data scientists becoming good data product managers. I think the main challenge will be—and this is the same thing for almost any career transitioning into product management—is knowing when to let go of your former identity and wear the right hat at the right time.” (14:24)

“Make better things for people that will improve their life and their outcomes and the business value will follow if you’ve properly aligned those two things together.” (17:21)

“The big message here is this: there is always a design and experience, whether it is an API, or a platform, a dashboard, a full application, etc. Since there are no null design choices, how much are you going to intentionally shape that UX, or just pray that it comes out good on the other end? Prayer is not really a reliable strategy. If you want to routinely do this work right, you need to put intention behind it.” (22:33)

“Relationship building is a must, and this is where applying user experience research can be very useful—not just for users, but also with stakeholders. It’s learning how to ask really good questions and learning the feelings, emotions, and reasons why people ask your team to build the thing that they’ve asked for. Learning how to dig into that is really important.” (26:26)

Links Designing for Analytics Community Work With Me Email Record a question

Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, and Voltron Data was created to help grow and support its technology and community. In this episode Wes McKinney shares the ways that Arrow and its related projects are improving the efficiency of data systems and driving their next stage of evolution.
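As a small, hedged illustration of the interchange idea described in the summary (using only the public pyarrow API, with invented column names), data written once in the Arrow IPC format can be read back by any Arrow-aware runtime without per-language translation:

```python
# A minimal sketch of the idea discussed above: Arrow keeps data in a shared
# columnar format so it can move between tools without per-language conversion.
# This only uses the public pyarrow API; the column names are invented.
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.table({
    "user_id": [1, 2, 3],
    "spend": [19.99, 5.00, 42.50],
})

# Serialize to the Arrow IPC format in memory...
sink = pa.BufferOutputStream()
with ipc.new_file(sink, table.schema) as writer:
    writer.write_table(table)

# ...and read it back; any Arrow-aware runtime (Python, R, Rust, Java, ...)
# could consume the same bytes without reparsing the column data.
roundtrip = ipc.open_file(sink.getvalue()).read_all()
print(roundtrip.column("spend").to_pylist())
```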

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.

Struggling with broken pipelines? Stale dashboards? Missing data? If this resonates with you, you’re not alone. Data engineers struggling with unreliable data need look no further than Monte Carlo, the leading end-to-end Data Observability Platform! Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo monitors and alerts for data issues across your data warehouses, data lakes, dbt models, Airflow jobs, and business intelligence tools, reducing time to detection and resolution from weeks to just minutes. Monte Carlo also gives you a holistic picture of data health with automatic, end-to-end lineage from ingestion to the BI layer directly out of the box. Start trusting your data with Monte Carlo today! Visit dataengineeringpodcast.com/montecarlo to learn more.

Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day. Especially once they realize 90% of all major data sources like Google Analytics, Salesforce, Adwords, Facebook, Spreadsheets, etc., are already available as plug-and-play connectors with reliable, intuitive SaaS solutions. Hevo Data is a highly reliable and intuitive data pipeline platform used by data engineers from 40+ countries to set up and run low-latency ELT pipelines with zero maintenance. Boasting more than 150 out-of-the-box connectors that can be set up in minutes, Hevo also allows you to monitor and control your pipelines. You get: real-time data flow visibility, fail-safe mechanisms, and alerts if anything breaks; preload transformations and auto-schema mapping to precisely control how data lands in your destination; models and workflows to transform data for analytics; and reverse-ETL capability to move the transformed data back to your business software to inspire timely action. All of this, plus its transparent pricing and 24x7 live support, makes it consistently voted by users as the Leader in the Data Pipeline category on review platforms like G2. Go to dataengineeringpodcast.com/hevodata and sign up for a free 14-day trial that also comes with 24x7 support.
Your host is Tobias Macey and today I’m interviewing Wes McKinney about his work at Voltron Data and on the Arrow ecosystem

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what you are building at Voltron Data and the story behind it?
What is the vision for the broader data ecosystem that you are trying to realize through your investment in Arrow and related projects?

How does your work at Voltron Data contribute to the realization of that vision?

What is the impact on engineer productivity and compute efficiency that gets introduced by the impedance mismatches between language and framework representations of data?
The scope and capabilities of the Arrow project have grown substantially since it was first introduced. Can you give an overview of the current features and extensions to the project?
What are some of the ways that Arrow and its related projects can be integrated with or replace the different elements of a data platform?
Can you describe how Arrow is implemented?

What are the most complex/challenging aspects of the engineering needed to support interoperable data interchange between language runtimes?

How are you balancing the desire to move quickly and improve the Arrow protocol and implementations, with the need to wait for other players in the ecosystem (e.g. database engines, compute frameworks, etc.) to add support?
With the growing application of data formats such as graphs and vectors, what do you see as the role of Arrow and its ideas in those use cases?
For workflows that rely on integrating structured and unstructured data, what are the options for interaction with non-tabular data? (e.g. images, documents, etc.)
With your support-focused business model, how are you approaching marketing and customer education to make it viable and scalable?
What are the most interesting, innovative, or unexpected ways that you have seen Arrow used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Arrow and its ecosystem?
When is Arrow the wrong choice?
What do you have planned for the future of Arrow?

Contact Info

Website
wesm on GitHub
@wesmckinn on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Voltron Data
Pandas

Podcast Episode

Apache Arrow
Partial Differential Equation
FPGA == Field-Programmable Gate Array
GPU == Graphics Processing Unit
Ursa Labs
Voltron (cartoon)
Feature Engineering
PySpark
Substrait
Arrow Flight
Acero
Arrow Datafusion
Velox
Ibis
SIMD == Single Instruction, Multiple Data
Lance
DuckDB

Podcast Episode

Data Threads Conference
Nano-Arrow
Arrow ADBC Protocol
Apache Iceberg

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Atlan

Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating slack messages trying to find the right data set? Or tried to understand what a column name means?

Our friends at Atlan started out as a data team themselves, faced all of this collaboration chaos, and began building Atlan as an internal tool. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more.

Go to dataengineeringpodcast.com/atlan and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.

Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data as bitmaps and the benefits that this representation provides for fast aggregate computation. He also discusses the improvements that have been incorporated into FeatureBase to simplify integration with the rest of your data stack, and the SQL interface that was added to make working with the product easier.
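To make the bitmap idea in the summary concrete, here is a toy sketch (not FeatureBase code) in which each attribute value maps to a bitmap of matching rows, so an aggregate question reduces to bitwise operations and a population count; the example data is invented.

```python
# A toy illustration of the bitmap idea described above (not FeatureBase code):
# each attribute value gets a bitmap of the rows that have it, and aggregate
# questions become cheap bitwise operations over those bitmaps.
rows = [
    {"country": "US", "plan": "pro"},
    {"country": "DE", "plan": "free"},
    {"country": "US", "plan": "free"},
    {"country": "US", "plan": "pro"},
]

bitmaps = {}  # (field, value) -> Python int used as a bitset over row ids
for row_id, row in enumerate(rows):
    for field, value in row.items():
        bitmaps[(field, value)] = bitmaps.get((field, value), 0) | (1 << row_id)

# "How many US customers are on the pro plan?" is an AND plus a popcount.
matches = bitmaps[("country", "US")] & bitmaps[("plan", "pro")]
print(bin(matches), bin(matches).count("1"))  # rows 0 and 3 -> count 2
```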

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Build Data Pipelines. Not DAGs. That’s the spirit behind Upsolver SQLake, a new self-service data pipeline platform that lets you build batch and streaming pipelines without falling into the black hole of DAG-based orchestration. All you do is write a query in SQL to declare your transformation, and SQLake will turn it into a continuous pipeline that scales to petabytes and delivers up to the minute fresh data. SQLake supports a broad set of transformations, including high-cardinality joins, aggregations, upserts and window operations. Output data can be streamed into a data lake for query engines like Presto, Trino or Spark SQL, a data warehouse like Snowflake or Redshift, or any other destination you choose. Pricing for SQLake is simple. You pay $99 per terabyte ingested into your data lake using SQLake, and run unlimited transformation pipelines for free. That way data engineers and data users can process to their heart’s content without worrying about their cloud bill. For data engineering podcast listeners, we’re offering a 30 day trial with unlimited data, so go to dataengineeringpodcast.com/upsolver today and see for yourself how to avoid DAG hell.

Your host is Tobias Macey and today I’m interviewing Matt Jaffee about FeatureBase.

podcast_episode
by Cris deRitis, Nick Bunker (Indeed), Mark Zandi (Moody's Analytics), Marisa DiNatale (Moody's Analytics), Adam Ozimek (Upwork)

Nick Bunker, Economic Research Director for North America at the Indeed Hiring Lab, and Adam Ozimek, Chief Economist at EIG, discuss the state of remote work and the economic implications. Full episode transcript. Follow Mark Zandi @MarkZandi, Cris deRitis @MiddleWayEcon, and Marisa DiNatale on LinkedIn for additional insight.

Questions or comments? Please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.

Pro DAX and Data Modeling in Power BI: Creating the Perfect Semantic Layer to Drive Your Dashboard Analytics

Develop powerful data models that bind data from disparate sources into a coherent whole. Then extend your data models using DAX, the query language that underpins Power BI, to create reusable measures that deliver finely-crafted custom calculations in your dashboards. This book starts off teaching you how to define and enhance the core structures of your data model to make it a true semantic layer that transforms complex data into familiar business terms. You’ll learn how to create calculated columns to solve basic analytical challenges. Then you’ll move up to mastering DAX measures to finely slice and dice your data. The book also shows how to handle temporal analysis in Power BI using a Date dimension. You will see how DAX Time Intelligence functions can simplify your analysis of data over time. Finally, the book shows how to extend DAX to filter and calculate datasets and develop DAX table functions and variables to handle complex queries.

What You Will Learn

Create clear and efficient data models that support in-depth analytics
Define core attributes such as data types and standardized formatting consistently throughout a data model
Define cross-filtering settings to enhance the data model
Make use of DAX to create calculated columns and custom tables
Extend your data model with custom calculations and reusable measures using DAX
Perform time-based analysis using a Date dimension and Time Intelligence functions

Who This Book Is For

Everyone from the CEO to the Business Intelligence developer, and from BI and data architects and analysts to power users and IT managers, can use this book to outshine the competition and create the data framework they need, along with interactive dashboards, using Power BI.
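The book's Time Intelligence material is specific to DAX; purely as an analogue for readers who think in Python, here is a small, hedged pandas sketch of the same kind of year-to-date calculation, with an invented sales table.

```python
# Rough analogue of a DAX year-to-date measure, expressed in pandas.
# Table and column names are invented for the sketch.
import pandas as pd

sales = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-02-10", "2023-03-05",
                            "2024-01-20", "2024-02-12"]),
    "amount": [100, 250, 175, 300, 125],
})

sales = sales.sort_values("date")
# Group by calendar year and accumulate within each year, mimicking the
# effect of a DAX YTD measure over a Date dimension.
sales["ytd_amount"] = sales.groupby(sales["date"].dt.year)["amount"].cumsum()
print(sales)
```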

Unlocking the Value of Real-Time Analytics

Storing data and making it accessible for real-time analysis is a huge challenge for organizations today. In 2020 alone, 64.2 billion GB of data was created or replicated, and it continues to grow. With this report, data engineers, architects, and software engineers will learn how to do deep analysis and automate business decisions while keeping your analytical capabilities timely. Author Christopher Gardner takes you through current practices for extracting data for analysis and uncovers the opportunities and benefits of making that data extraction and analysis continuous. By the end of this report, you’ll know how to use new and innovative tools against your data to make real-time decisions. And you’ll understand how to examine the impact of real-time analytics on your business.

Learn the four requirements of real-time analytics: latency, freshness, throughput, and concurrency
Determine where delays between data collection and actionable analytics occur
Understand the reasons for real-time analytics and identify the tools you need to reach a faster, more dynamic level
Examine changes in data storage and software while learning methodologies for overcoming delays in existing database architecture
Explore case studies that show how companies use columnar data, sharding, and bitmap indexing to store and analyze data

Fast and fresh data can make the difference between a successful transaction and a missed opportunity. The report shows you how.
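As a hedged aside on the columnar storage case studies mentioned above, the following toy sketch (with invented data) shows why a column-oriented layout makes aggregate queries cheaper: summing one field touches a single contiguous array instead of every row record.

```python
# A small sketch of why columnar layouts help analytical aggregation:
# a sum over one field reads a single column instead of whole row objects.
# Data and field names are invented for the example.
row_store = [
    {"region": "EU", "revenue": 120.0, "units": 3},
    {"region": "US", "revenue": 340.0, "units": 7},
    {"region": "EU", "revenue": 80.0,  "units": 2},
]

# Row-oriented: scan whole records even though only one field is needed.
total_row = sum(r["revenue"] for r in row_store)

# Column-oriented: the same data pivoted into one array per field.
column_store = {
    "region":  ["EU", "US", "EU"],
    "revenue": [120.0, 340.0, 80.0],
    "units":   [3, 7, 2],
}
total_col = sum(column_store["revenue"])  # reads only the revenue column

assert total_row == total_col == 540.0
```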