The practice of data science in genomics and computational biology is fraught with friction. This is largely due to a tight coupling of bioinformatic tools to file input/output. While omic data is specialized and the storage formats for high-throughput sequencing and related data are often standardized, the adoption of emerging open standards not tied to bioinformatics can help better integrate bioinformatic workflows into the wider data science, visualization, and AI/ML ecosystems. Here, we present two bridge libraries as short vignettes for composable bioinformatics. First, we present Anywidget, an architecture and toolkit based on modern web standards for sharing interactive widgets across all Jupyter-compatible runtimes, including JupyterLab, Google Colab, VSCode, and more. Second, we present Oxbow, a Rust and Python-based adapter library that unifies access to common genomic data formats by efficiently transforming queries into Apache Arrow, a standard in-memory columnar representation for tabular data analytics. Together, we demonstrate the composition of these libraries to build custom connected genomic analysis and visualization environments. We propose that components such as these, which leverage scientific domain-agnostic standards to unbundle specialized file manipulation, analytics, and web interactivity, can serve as reusable building blocks for composing flexible genomic data analysis and machine learning workflows as well as systems for exploratory data analysis and visualization.
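The Arrow-centric design described above can be pictured with a small plain-Python sketch. This is not Oxbow's actual API; it only illustrates the row-to-columnar transposition that Apache Arrow standardizes, using made-up interval records:

```python
# Conceptual sketch (not Oxbow's real API): converting row-oriented
# genomic interval records into a columnar layout, the in-memory shape
# that Apache Arrow standardizes for tabular analytics engines.

def to_columnar(records):
    """Transpose a list of (chrom, start, end) rows into named columns."""
    columns = {"chrom": [], "start": [], "end": []}
    for chrom, start, end in records:
        columns["chrom"].append(chrom)
        columns["start"].append(start)
        columns["end"].append(end)
    return columns

rows = [("chr1", 100, 250), ("chr1", 300, 420), ("chr2", 50, 75)]
table = to_columnar(rows)

# Columnar layout makes whole-field scans cheap, e.g. total covered bases:
total = sum(e - s for s, e in zip(table["start"], table["end"]))
```

In Arrow proper, each column would be a typed, contiguous buffer that downstream tools (dataframes, query engines, ML pipelines) can consume without copying, which is the interoperability the abstract is pointing at.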
talk-data.com
Topic: Analytics
Todd Olson joins me to talk about making analytics worth paying for and relevant in the age of AI. The CEO of Pendo, an analytics SaaS company, Todd shares how the company evolved to support a wider audience by simplifying dashboards, removing user roadblocks, and leveraging AI to both generate and explain insights. We also talk about product management roles at Pendo. Todd views AI product management as a natural evolution for adaptable teams and explains how he thinks about hiring product roles in 2025. Todd also shares how he thinks about successful user adoption of his product around “time to value” and “stickiness” over vanity metrics like time spent.
Highlights / Skip to:
How Todd has addressed analytics apathy over the past decade at Pendo (1:17) Getting back to basics and not barraging people with more data and power (4:02) Pendo’s strategy for keeping the product experience simple without abandoning power users (6:44) Whether Todd is considering using an LLM (prompt-based) answer-driven experience with Pendo's UI (8:51) What Pendo looks for when hiring product managers right now, and why (14:58) How Pendo evaluates AI product managers, specifically (19:14) How Todd Olson views AI product management compared to traditional software product management (21:56) Todd’s concerns about the probabilistic nature of AI-generated answers in the product UX (27:51) What KPIs Todd uses to know whether Pendo is doing enough to reach its goals (32:49) Why being able to tell what answers are best will become more important as choice increases (40:05)
Quotes from Today’s Episode
“Let’s go back to classic Geoffrey Moore Crossing the Chasm, you’re selling to early adopters. And what you’re doing is you’re relying on the early adopters’ skill set and figuring out how to take this data and connect it to business problems. So, in the early days, we didn’t do anything because the market we were selling to was very, very savvy; they’re hungry people, they just like new things. They’re getting data, they’re feeling really, really smart, everything’s working great. As you get bigger and bigger and bigger, you start to try to sell to a bigger TAM, a bigger audience, you start trying to talk to these early majorities, which are, they’re not early adopters, they’re more technology laggards in some degree, and they don’t understand how to use data to inform their job. They’ve never used data to inform their job. There, we’ve had to do a lot more work.” Todd (2:04 - 2:58) “I think AI is amazing, and I don’t want to say AI is overhyped because AI in general is—yeah, it’s the revolution that we all have to pay attention to. Do I think that the skills necessary to be an AI product manager are so distinct that you need to hire differently? No, I don’t. That’s not what I’m seeing. If you have a really curious product manager who’s going all in, I think you’re going to be okay. Some of the most AI-forward work happening at Pendo is not just product management. Our design team is going crazy. And I think one of the things that we’re seeing is a blend between design and product, that they’re always adjacent and connected; there’s more sort of overlappiness now.” Todd (22:41 - 23:28) “I think about things like stickiness, which may not be an aggregate time, but how often are people coming back and checking in?
And if you had this companion or this agent that you just could not live without, and it caused you to come into the product almost every day just to check in, but it’s a fast check-in, like, a five-minute check-in, a ten-minute check-in, that’s pretty darn sticky. That’s a good metric. So, I like stickiness as a metric because it’s measuring [things like], “Are you thinking about this product a lot?” And if you’re thinking about it a lot, and like, you can’t kind of live without it, you’re going to go to it a lot, even if it’s only a few minutes a day. Social media is like that. Thankfully I’m not addicted to TikTok or Instagram or anything like that, but I probably check it nearly every day. That’s a pretty good metric. Any product that gets to be part of your process, that you’re checking every day, is pretty darn good. So yeah, I think we need to reframe the conversation beyond just total time: how are we measuring outcomes and value? I think that’s what’s ultimately going to win here.” Todd (39:57)
Links
LinkedIn: https://www.linkedin.com/in/toddaolson/ X: https://x.com/tolson [email protected]
Tired of spending money on data courses you never finish? Here are 7 essential books that will actually boost your analytical skills, with no subscription required! Plus, make sure to tune in till the end as one lucky listener will get a free book from this list! Get the books here! DISCLAIMER: Some of the links in this video are affiliate links, meaning if you click through and make a purchase, I may earn a commission at no extra cost to you. Storytelling with Data by Cole Nussbaumer Knaflic 👉 https://amzn.to/3ZYHhsG Ace the Data Science Interview by Nick Singh and Kevin Huo 👉 https://amzn.to/3XZ9IaB Moneyball by Michael Lewis 👉 https://amzn.to/44fy4OD The StatQuest Illustrated Guide To Machine Learning by Josh Starmer 👉 https://amzn.to/40hRgu2 Fundamentals of Data Engineering by Joe Reis and Matt Housley 👉 https://amzn.to/3W84K8K Data Science for Business by Foster Provost and Tom Fawcett 👉 https://amzn.to/4k7jkaD The Big Book of Dashboards by Steve Wexler, Jeffrey Shaffer, and Andy Cotgreave 👉 https://amzn.to/462GJVj 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator ⌚ TIMESTAMPS 00:16 Book 1: The Big Book of Dashboards 02:52 Book 2: Data Science for Business 04:38 Book 3: Fundamentals of Data Engineering 06:05 Book 4: The StatQuest Illustrated Guide To Machine Learning 07:52 Book 5: Moneyball 10:09 Book 6: Ace the Data Science Interview 11:24 Book 7: Storytelling With Data I've interviewed some of these awesome data authors! Check out these episodes! 
Stats You Need to Know as a Data Analyst (w/ StatQuest) 👉 https://datacareerpodcast.com/episode/105-do-you-have-to-be-good-at-statistics-to-be-a-data-analyst-w-statquest-josh-starmer-phd How to Ace The Data Science & Analytics Interview w/ Nick Singh 👉 https://datacareerpodcast.com/episode/74-how-to-ace-the-data-science-analytics-interview-w-nick-singh Meet The Woman Who Changed Data Storytelling Forever (Cole Knaflic) 👉 https://datacareerpodcast.com/episode/142-meet-the-woman-who-changed-data-storytelling-forever-cole-knafflic
🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/ Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!
To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more
If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.
👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa
Pandas and scikit-learn have become staples in the machine learning toolkit for processing and modeling tabular data in Python. However, when data size scales up, these tools become slow or run out of memory. Ibis provides a unified, Pythonic, dataframe-like interface to 20+ execution backends, including dataframe libraries, databases, and analytics engines. Ibis enables users to leverage these powerful tools without rewriting their data engineering code (or learning SQL). IbisML extends the benefits of using Ibis to the ML workflow by letting users preprocess their data at scale on any Ibis-supported backend.
In this tutorial, you'll build an end-to-end machine learning project to predict the live win probability after each move during chess games.
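The deferred, backend-agnostic style described above can be sketched in a few lines of plain Python. This is not Ibis's real API, only a toy illustration of the core idea: operations build an expression lazily, and nothing is compiled to a backend's dialect (here, a SQL string) until it is needed:

```python
# Toy sketch of the deferred-expression idea behind Ibis (not its real
# API): method calls record intent; SQL is generated only on demand.

class Table:
    def __init__(self, name):
        self.name = name
        self._filters = []
        self._selected = ["*"]

    def filter(self, condition):
        self._filters.append(condition)
        return self  # chainable; nothing executes yet

    def select(self, *cols):
        self._selected = list(cols)
        return self

    def to_sql(self):
        """Compile the recorded expression to a (dialect-specific) query."""
        sql = f"SELECT {', '.join(self._selected)} FROM {self.name}"
        if self._filters:
            sql += " WHERE " + " AND ".join(self._filters)
        return sql

# Hypothetical table/column names, loosely themed on the chess tutorial:
games = Table("games").filter("elo > 2000").select("move", "win_prob")
```

In real Ibis the same user code can target DuckDB, BigQuery, Spark, and other backends, because compilation to each engine's dialect is deferred exactly like this.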
If you are interested in a career in Data Science, this one is for you! In this episode with Kimberly Fessel (Dr. Kim Data) & Maven's own Chris Bruehl, you'll learn about the most important skills Data Scientists need, and where you should be focusing your energy. You'll walk away with a solid understanding of the Data Scientist role, core responsibilities, tools of the trade, and a concrete roadmap you can follow to start building skills immediately. What You'll Learn: The technical skills you need for a Data Science career Complementary soft skills that make a difference How to prioritize your learning to make the most of your effort This session was part of our OPEN CAMPUS week in October, which included 6 days of live expert sessions. Register for free to be part of the next live session: https://bit.ly/3XB3A8b
Follow us on Socials: LinkedIn YouTube Instagram (Mavens of Data) Instagram (Maven Analytics) TikTok Facebook Medium X/Twitter
Dante joins the Inside Economics crew for an unusual jobs Thursday podcast. The team discusses the disconnect between the positive headlines and market reaction to the June employment report and the weakening undercurrent in the labor market. They also debate whether higher inflation is still looming despite not showing up in the data yet. Marisa steals the show in the stats game with three figures that stump Mark, Cris, and Dante. Guest: Dante DeAntonio, Senior Director of Economic Research, Moody's Analytics Hosts: Mark Zandi – Chief Economist, Moody’s Analytics, Cris deRitis – Deputy Chief Economist, Moody’s Analytics, and Marisa DiNatale – Senior Director - Head of Global Forecasting, Moody’s Analytics Follow Mark Zandi on 'X' and BlueSky @MarkZandi, Cris deRitis on LinkedIn, and Marisa DiNatale on LinkedIn
Questions or Comments, please email us at [email protected]. We would love to hear from you. To stay informed and follow the insights of Moody's Analytics economists, visit Economic View.
Supported by Our Partners • WorkOS — The modern identity platform for B2B SaaS. • Statsig — The unified platform for flags, analytics, experiments, and more. • Sonar — Code quality and code security for ALL code. — What happens when a company goes all in on AI? At Shopify, engineers are expected to utilize AI tools, and they’ve been doing so for longer than most. Thanks to early access to models from GitHub Copilot, OpenAI, and Anthropic, the company has had a head start in figuring out what works. In this live episode from LDX3 in London, I spoke with Farhan Thawar, VP of Engineering, about how Shopify is building with AI across the entire stack. We cover the company’s internal LLM proxy, its policy of unlimited token usage, and how interns help push the boundaries of what’s possible. In this episode, we cover: • How Shopify works closely with AI labs • The story behind Shopify’s recent Code Red • How non-engineering teams are using Cursor for vibecoding • Tobi Lütke’s viral memo and Shopify’s expectations around AI • A look inside Shopify’s LLM proxy—used for privacy, token tracking, and more • Why Shopify places no limit on AI token spending • Why AI-first isn’t about reducing headcount—and why Shopify is hiring 1,000 interns • How Shopify’s engineering department operates and what’s changed since adopting AI tooling • Farhan’s advice for integrating AI into your workflow • And much more! 
— Timestamps (00:00) Intro (02:07) Shopify’s philosophy: “hire smart people and pair with them on problems” (06:22) How Shopify works with top AI labs (08:50) The recent Code Red at Shopify (10:47) How Shopify became early users of GitHub Copilot and their pivot to trying multiple tools (12:49) The surprising ways non-engineering teams at Shopify are using Cursor (14:53) Why you have to understand code to submit a PR at Shopify (16:42) AI tools' impact on SaaS (19:50) Tobi Lütke’s AI memo (21:46) Shopify’s LLM proxy and how they protect their privacy (23:00) How Shopify utilizes MCPs (26:59) Why AI tools aren’t the place to pinch pennies (30:02) Farhan’s projects and favorite AI tools (32:50) Why AI-first isn’t about freezing headcount and the value of hiring interns (36:20) How Shopify’s engineering department operates, including internal tools (40:31) Why Shopify added coding interviews for director-level and above hires (43:40) What has changed since Shopify added AI tooling (44:40) Farhan’s advice for implementing AI tools — The Pragmatic Engineer deepdives relevant for this episode: • How Shopify built its Live Globe for Black Friday • Inside Shopify's leveling split • Real-world engineering challenges: building Cursor • How Anthropic built Artifacts — See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].
Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe
A personal exploration of building an AI-powered analytics role.
Approaches to maintaining data quality amidst AI-driven analytics.
Aaron Neiderhiser (CEO of Tuva Health) has an audacious goal of improving US healthcare analytics. It all starts with the data. Aaron discusses his journey and obsession with making sense of the gigantic mess of US healthcare data.
--- According to the U.S. Environmental Protection Agency, transportation accounts for 28% of U.S. greenhouse gas emissions. For short trips, flying is much more carbon-intensive than rail or bus travel. At Johns Hopkins, faculty members travel the most of all affiliate types, producing more than double the emissions of administrative employees and staff.
--- The Johns Hopkins University Office of Climate and Sustainability, through its Campus as a Living Lab initiative - a program that supports sustainability innovation - partnered with GovEx to build a tool to help address this problem. Using interactive visualizations with comparable statistics across all Johns Hopkins divisions, users can compare the emissions data of different methods of transportation, enabling them to make more environmentally friendly choices as they conduct their business.
--- We sit down with four contributors to the project to discuss how the tool was built and how cities can use it as a model to support their own climate change initiatives: Sara Betran de Lis, Director of Research and Analytics at GovEx; Heather Bree, Data Visualization and D3 Developer at GovEx; Debi Denney, Assistant Director of Johns Hopkins Office of Climate & Sustainability; and Rose Weeks, Senior Research Associate at Johns Hopkins Bloomberg School of Public Health, working with the Campus as a Living Lab Program at the Office of Climate & Sustainability.
--- Learn more about GovEx --- Fill out our listener survey!
Jason Bryll is a healthcare analytics expert and hiring manager with nearly two decades of experience. In this episode, Jason explains what healthcare analytics entails, why it's essential, and the role of AI in the field. More importantly, you'll learn how to stand out to hiring managers, even in today's market! Wanna dive further into healthcare analytics? Here's your next podcast: 👉 https://datacareerpodcast.com/episode/160-she-became-a-data-analyst-after-a-20-year-career-in-physical-therapy-melody-santos 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator ⌚ TIMESTAMPS 00:00 Introduction to Healthcare Analytics 00:27 Jason's Career Journey 02:23 What Is Healthcare Analytics?
06:37 Parable Associates 10:56 Understanding Revenue Cycle and Accounts Receivable 15:00 Complexities in Healthcare Data Management 19:47 The Importance of Domain Knowledge 27:12 The Importance of Building a Strong Portfolio 31:43 Recommended Data Tools and Platforms 34:10 Advice To Become A Healthcare Analyst 🔗 CONNECT WITH JASON BRYLL 🎥 YouTube Channel: https://www.youtube.com/@UCGh1LOrX0mWuoWZk5J10zkw 🤝 LinkedIn: https://www.linkedin.com/in/jason-bryll/ 📸 Instagram: https://www.instagram.com/parable_associates/ 💻 Website: https://parableassociates.com/ Check out Jason's Healthcare Analyst courses here: 👉 https://www.parableacademy.com/link/d7GlNy?url=https%3A%2F%2Fwww.parableacademy.com%2Fcourse%3Fcourseid%3Drcm-analyst
As Apache Airflow adoption accelerates for data pipeline orchestration, integrating it effectively into your enterprise’s Automation Center of Excellence (CoE) is crucial for maximizing ROI, ensuring governance, and standardizing best practices. This session explores common challenges faced when bringing specialized tools like Airflow into a broader CoE framework. We’ll demonstrate how leveraging enterprise automation platforms like Automic Automation can simplify this integration by providing centralized orchestration, standardized lifecycle management, and unified auditing for Airflow DAGs alongside other enterprise workloads. Furthermore, discover how Automation Analytics & Intelligence (AAI) can offer the CoE a single pane of glass for monitoring performance, tracking SLAs, and proving the business value of Airflow initiatives within the complete automation landscape. Learn practical strategies to ensure Airflow becomes a well-governed, high-performing component of your overall automation strategy.
Red Hat’s unified data and AI platform relies on Apache Airflow for orchestration, alongside Snowflake, Fivetran, and Atlan. The platform prioritizes building a dependable data foundation, recognizing that effective AI depends on quality data. Airflow was selected for its predictability, extensive connectivity, reliability, and scalability. The platform now supports business analytics, transitioning from ETL to ELT processes. This has resulted in a remarkable improvement in how we make data available for business decisions. The platform’s capabilities are being extended to power Digital Workers (AI agents) using large language models, encompassing model training, fine-tuning, and inference. Two Digital Workers are currently deployed, with more in development. This presentation will detail the rationale and background of this evolution, followed by an explanation of the architectural decisions made and the challenges encountered and resolved throughout the process of transforming into an AI-enabled data platform to power Red Hat’s business.
In this talk, I’ll walk through how we built an end-to-end analytics pipeline using open-source tools (Airbyte, dbt, Airflow, and Metabase). At WirePick, we extract data from multiple sources using Airbyte OSS into PostgreSQL, transform it into business-specific data marts with dbt, and automate the entire workflow using Airflow. Our Metabase dashboards provide real-time insights, and we integrate Slack notifications to alert stakeholders when key business metrics change. This session will cover: • Data extraction: using Airbyte OSS to pull data from multiple sources • Transformation & modeling: how dbt helps create reusable data marts • Automation & orchestration: managing the workflow with Airflow • Data-driven decision-making: delivering insights through Metabase & Slack alerts
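The shape of such a pipeline can be sketched with plain-Python functions standing in for the real tools; the sample order data and the 5% alert threshold below are hypothetical:

```python
# Plain-Python sketch of the pipeline's shape (no Airbyte/dbt/Airflow
# here); the order records and the 5% threshold are made-up examples.

def extract():
    """Airbyte's role: land raw data from a source into the warehouse."""
    return [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.0}]

def transform(raw):
    """dbt's role: shape raw rows into a business-specific data mart."""
    return {"revenue": sum(r["amount"] for r in raw), "orders": len(raw)}

def alert_if_changed(mart, previous_revenue, threshold=0.05):
    """The Metabase/Slack role: notify when a key metric moves."""
    change = abs(mart["revenue"] - previous_revenue) / previous_revenue
    return f"Revenue moved {change:.0%}" if change > threshold else None

# Airflow's role: run the steps in dependency order on a schedule.
mart = transform(extract())
message = alert_if_changed(mart, previous_revenue=180.0)
```

In the real stack each function becomes an Airflow task (or dbt model), and the dependency order shown by the final two lines is what the DAG encodes.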
At TrueCar, migrating hundreds of legacy workflows from in-house orchestration tools to Apache Airflow required key technical decisions that transformed our data platform architecture and organizational capabilities. We consolidated individual chained tasks into optimized DAGs leveraging native Airflow functionality to trigger compute across cloud environments. A crucial breakthrough was developing DAG generators to scale migration—essential for efficiently migrating hundreds of workflows while maintaining consistency. By decoupling orchestration from compute, we gained flexibility to select optimal tools for specific outcomes—programmatic processing, analytics, batch jobs, or AI/ML pipelines. This resulted in cost reductions, performance improvements, and team agility. We also gained unprecedented visibility into DAG performance and dependency patterns previously invisible across fragmented systems. Attendees will learn how we redesigned complex workflows into efficient DAGs using dynamic task generation, the architectural decisions that enabled platform innovation, and the decision framework that made our migration transformational.
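The DAG-generator pattern mentioned above can be sketched without Airflow imports; the job names and schedules here are hypothetical, and the workflow definitions are plain dicts standing in for DAG objects:

```python
# Sketch of the DAG-generator pattern: one factory stamps out a
# workflow definition per legacy-job config, so hundreds of migrated
# workflows stay consistent. Job names and schedules are hypothetical.

def make_dag(config):
    """Return a workflow definition (a dict here) for one legacy job."""
    return {
        "dag_id": f"etl_{config['name']}",
        "schedule": config.get("schedule", "@daily"),
        "tasks": ["extract", "transform", "load"],
    }

legacy_jobs = [{"name": "sales"}, {"name": "inventory", "schedule": "@hourly"}]
dags = {d["dag_id"]: d for d in (make_dag(c) for c in legacy_jobs)}
# In a real Airflow repo, each generated DAG object would be bound to a
# module-level name so the scheduler's file parser discovers it.
```

The payoff is that adding the Nth migrated workflow is a one-line config change rather than a hand-written DAG file.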
Operating within the stringent regulatory landscape of Corporate Banking, Deutsche Bank relies heavily on robust data orchestration. This session explores how Deutsche Bank’s Corporate Bank leverages Apache Airflow across diverse environments, including both on-premises infrastructure and cloud platforms. Discover their approach to managing critical data & analytics workflows, encompassing areas like regulatory reporting, data integration and complex data processing pipelines. Gain insights into the architectural patterns and operational best practices employed to ensure compliance, security, and scalability when running Airflow at scale in a highly regulated, hybrid setting.
Before Airflow, our BigQuery pipelines at Create Music Group operated like musicians without a conductor—each playing on its own schedule, regardless of whether upstream data was ready. As our data platform grew, this chaos led to spiralling costs, performance bottlenecks, and became utterly unsustainable. This talk tells the story of how Create Music Group brought harmony to its data workflows by adopting Apache Airflow and the Medallion architecture, ultimately slashing our data processing costs by 50%. We’ll show how moving to event-driven scheduling with datasets helped eliminate stale data issues, dramatically improved performance, and unlocked faster iteration across teams. Discover how we replaced repetitive SQL with standardized dimension/fact tables, empowering analysts in a safer sandbox.
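The event-driven, dataset-based scheduling described above can be sketched in plain Python. Airflow expresses this with data-aware scheduling (a DAG scheduled on datasets rather than on a clock); the stdlib version below only captures the triggering rule, and the dataset names are hypothetical:

```python
# Stdlib sketch of dataset-aware (event-driven) scheduling: a
# downstream job runs only once every upstream dataset it depends on
# has been refreshed, instead of firing on a fixed clock schedule.

updated = set()

def publish(dataset):
    """A producer job finishing marks its output dataset as fresh."""
    updated.add(dataset)

def should_run(dependencies):
    """A consumer job is triggered only when all its inputs are fresh."""
    return dependencies <= updated

# Hypothetical medallion-style dependencies for a "gold" table:
gold_deps = {"silver.streams", "silver.payouts"}

publish("silver.streams")
ran_early = should_run(gold_deps)   # upstream payouts not refreshed yet
publish("silver.payouts")
ran_after = should_run(gold_deps)   # all upstream data is now ready
```

This is the rule that eliminates stale reads: the gold layer can never run against a silver table that has not been rebuilt this cycle.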
As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment. In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance—all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines. Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.