talk-data.com

Topic: Data Lakehouse

Tags: data_architecture · data_warehouse · data_lake

489 tagged activities

Activity Trend: peak of 118 activities per quarter (2020-Q1 through 2026-Q1)

Activities

489 activities · Newest first

Future-proof your data architecture: Learn how DoorDash built a data lakehouse powered by Starburst to achieve 20-30% faster time to insights. Akshat Nair shares lessons learned about what drove DoorDash to move beyond Snowflake and embrace the lakehouse, his rationale for selecting Trino as their lakehouse query engine, and why his team chose Starburst over the open-source distribution. Discover how DoorDash seamlessly queries diverse sources, including Snowflake, Postgres, and data lake table formats, achieving faster data-driven decision-making at scale with cost benefits.
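The federation pattern described here is easy to sketch with the Trino Python client. A minimal sketch follows, assuming a reachable Trino coordinator; the host, catalog, and table names are hypothetical placeholders, not DoorDash's actual schema.

    # A minimal federation sketch using the Trino Python client.
    # Host, catalog, and table names are hypothetical.
    from trino.dbapi import connect

    conn = connect(host="trino.example.com", port=443, user="analyst",
                   http_scheme="https")
    cur = conn.cursor()

    # One SQL statement joins a Snowflake table, a Postgres table, and an
    # Iceberg table on the data lake; Trino delegates work to each connector.
    cur.execute("""
        SELECT o.order_id, o.total, d.region, e.event_ts
        FROM snowflake.sales.orders AS o
        JOIN postgres.public.dashers AS d ON o.dasher_id = d.id
        JOIN lake.events.deliveries AS e ON o.order_id = e.order_id
        WHERE e.event_ts > TIMESTAMP '2024-01-01 00:00:00'
    """)
    for row in cur.fetchmany(10):
        print(row)

The appeal of the pattern is that consumers write one query against one endpoint while the sources stay where they are.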

Summary: In this episode of the Data Engineering Podcast, Sida Shen, product manager at CelerData, talks about StarRocks, a high-performance analytical database. Sida discusses the inception of StarRocks, which was forked from Apache Doris in 2020 and evolved into a high-performance lakehouse query engine. He explains the architectural design of StarRocks, highlighting its capabilities in handling high-concurrency, low-latency queries, and its integration with open table formats like Apache Iceberg, Delta Lake, and Apache Hudi. Sida also discusses how StarRocks differentiates itself from other query engines by supporting on-the-fly joins and eliminating the need for denormalization pipelines, and shares insights into its use cases, such as customer-facing analytics and real-time data processing, as well as future directions for the platform.
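To make the on-the-fly-join point concrete: StarRocks speaks the MySQL wire protocol, so a stock MySQL client can register an Iceberg external catalog and join lake tables at query time. The sketch below is a hedged illustration, not anything from the episode; the host, catalog URI, and table names are hypothetical.

    # Hedged sketch: StarRocks over the MySQL protocol, querying Iceberg
    # tables registered via an external catalog. All names are hypothetical.
    import pymysql

    conn = pymysql.connect(host="starrocks.example.com", port=9030, user="root")
    with conn.cursor() as cur:
        # Register an Iceberg REST catalog once; StarRocks then reads the
        # open table format directly, with no denormalization pipeline.
        cur.execute("""
            CREATE EXTERNAL CATALOG iceberg_lake
            PROPERTIES (
                "type" = "iceberg",
                "iceberg.catalog.type" = "rest",
                "iceberg.catalog.uri" = "https://catalog.example.com/api"
            )
        """)
        # Join two Iceberg tables at query time instead of pre-joining them.
        cur.execute("""
            SELECT c.segment, SUM(o.amount)
            FROM iceberg_lake.sales.orders AS o
            JOIN iceberg_lake.crm.customers AS c ON o.customer_id = c.id
            GROUP BY c.segment
        """)
        print(cur.fetchall())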

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Sida Shen about StarRocks, a high-performance analytical database supporting shared-nothing and shared-data patterns.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what StarRocks is and the story behind it?
There are numerous analytical databases on the market. What are the attributes of StarRocks that differentiate it from other options?
Can you describe the architecture of StarRocks?
What are the "-ilities" that are foundational to the design of the system?
How have the design and focus of the project evolved since it was first created?
What are the tradeoffs involved in separating the communication layer from the data layers?
The tiered architecture enables the shared-nothing and shared-data behaviors, which allows for the implementation of lakehouse patterns. What are some of the patterns that are possible due to the single interface/dual pattern nature of StarRocks?
The shared-data implementation has caching built in to accelerate interaction with datasets. What are some of the limitations/edge cases that operators and consumers should be aware of?
StarRocks supports management of lakehouse tables (Iceberg, Delta, Hudi, etc.), which overlaps with use cases for Trino/Presto/Dremio/etc. What are the cases where StarRocks acts as a replacement for those systems vs. a supplement to them?
The other major category of engines that StarRocks overlaps with is OLAP databases (e.g. ClickHouse, Firebolt, etc.). Why might someone use StarRocks in addition to or in place of those technologies?
We would be remiss if we ignored the dominating trend of AI and the systems that support it. What is the role of StarRocks in the context of an AI application?
What are the most interesting, innovative, or unexpected ways that you have seen StarRocks used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on StarRocks?
When is StarRocks the wrong choice?
What do you have planned for the future of StarRocks?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it!
Email [email protected] with your story.

Links

StarRocks
CelerData
Apache Doris
SIMD == Single Instruction Multiple Data
Apache Iceberg
ClickHouse (Podcast Episode)
Druid
Firebolt (Podcast Episode)
Snowflake
BigQuery
Trino
Databricks
Dremio
Data Lakehouse
Delta Lake
Apache Hive
C++
Cost-Based Optimizer
Iceberg Summit Tencent Games Presentation
Apache Paimon
Lance (Podcast Episode)
Delta Uniform
Apache Arrow
StarRocks Python UDF
Debezium (Podcast Episode)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Summary: In this episode of the Data Engineering Podcast, Viktor Kessler, co-founder of Vakamo, talks about the architectural patterns in the lakehouse enabled by a fast and feature-rich Iceberg catalog. Viktor shares his journey from data warehouses to developing the open-source project Lakekeeper, an Apache Iceberg REST catalog written in Rust that facilitates building lakehouses with essential components like storage, compute, and catalog management. He discusses the importance of metadata in making data actionable, the evolution of data catalogs, and the challenges and innovations in the space, including integration with OpenFGA for fine-grained access control and managing data across formats and compute engines.
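Because Lakekeeper implements the standard Iceberg REST catalog API, any REST-capable client can talk to it. Here is a minimal PyIceberg sketch of that idea; the endpoint, warehouse name, token, and table identifier are hypothetical placeholders.

    # A minimal sketch, assuming a running Iceberg REST catalog (such as a
    # Lakekeeper instance). URI, warehouse, token, and table are hypothetical.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(
        "lakekeeper",
        **{
            "type": "rest",
            "uri": "https://lakekeeper.example.com/catalog",
            "warehouse": "analytics",
            "token": "REDACTED",
        },
    )

    # Every REST-capable engine (Spark, Trino, PyIceberg, ...) sees the same
    # namespaces and tables, which is what makes the catalog the integration point.
    print(catalog.list_namespaces())
    table = catalog.load_table("marketing.events")
    print(table.schema())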

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Your host is Tobias Macey and today I'm interviewing Viktor Kessler about architectural patterns in the lakehouse that are unlocked by a fast and feature-rich Iceberg catalog.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what LakeKeeper is and the story behind it? What is the core of the problem that you are addressing?
There has been a lot of activity in the catalog space recently. What are the driving forces that have highlighted the need for a better metadata catalog in the data lake/distributed data ecosystem?
How would you characterize the feature sets/problem spaces that different entrants are focused on addressing?
Iceberg as a table format has gained a lot of attention and adoption across the data ecosystem. The REST catalog format has opened the door for numerous implementations. What are the opportunities for innovation and improving user experience in that space?
What is the role of the catalog in managing security and governance? (AuthZ, auditing, etc.)
What are the channels for propagating identity and permissions to compute engines? (how do you avoid head-scratching about permission denied situations)
Can you describe how LakeKeeper is implemented?
How have the design and goals of the project changed since you first started working on it?
For someone who has an existing set of Iceberg tables and catalog, what does the migration process look like?
What new workflows or capabilities does LakeKeeper enable for data teams using Iceberg tables across one or more compute frameworks?
What are the most interesting, innovative, or unexpected ways that you have seen LakeKeeper used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on LakeKeeper?
When is LakeKeeper the wrong choice?
What do you have planned for the future of LakeKeeper?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it!
Email [email protected] with your story.

Links

LakeKeeper
SAP
Microsoft Access
Microsoft Excel
Apache Iceberg (Podcast Episode)
Iceberg REST Catalog
PyIceberg
Spark
Trino
Dremio
Hive Metastore
Hadoop
NATS
Polars
DuckDB (Podcast Episode)
DataFusion
Atlan (Podcast Episode)
Open Metadata (Podcast Episode)
Apache Atlas
OpenFGA
Hudi (Podcast Episode)
Delta Lake (Podcast Episode)
Lance Table Format (Podcast Episode)
Unity Catalog
Polaris Catalog
Apache Gravitino (Podcast Episode)
Keycloak
Open Policy Agent (OPA)
Apache Ranger
Apache NiFi

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

There are a lot of amazing AI features being announced at Google Cloud Next. To take full advantage of them, you need to make sure your data is managed in a secure, centralized way. In this talk, you'll learn how to set up your lakehouse to get your data ready for downstream workloads. You'll see a demo of a Google Cloud architecture that covers managing permissions on your data, configuring metadata management, and performing transformations using open source frameworks.

This session provides a comprehensive guide to building a secure and unified AI lakehouse on BigQuery with the power of open source software (OSS). We’ll explore essential components, including data ingestion, storage, and management; AI and machine learning workflows; pipeline orchestration; data governance; and operational efficiency. Learn about the newest features that support both Apache Spark and Apache Iceberg.

Join Google Cloud's Yasmeen Ahmad, Deutsche Telekom's VP of Data & Architecture Ashutosh Mishra and Snap's Senior Engineering Leader Bo Chen for a fireside chat exploring the future of the data lakehouse. They'll discuss how evolving architectures can empower organizations to handle explosive data growth and leverage the full potential of AI. Gain valuable insights into building a future-proof data foundation that fuels innovation in 2025 and beyond.

Unlock the potential of AI with high-performance, scalable lakehouses using BigQuery and Apache Iceberg. This session details how BigQuery leverages Google's infrastructure to supercharge Iceberg, delivering peak performance and resilience. Discover BigQuery's unified read/write path for rapid queries, superior storage management beyond simple compaction, and robust, high-throughput streaming pipelines. Learn how Spotify utilizes BigQuery's lakehouse architecture for a unified data source, driving analytics and AI innovation.

Redpanda, a leading Kafka API-compatible streaming platform, now supports storing topics in Apache Iceberg, seamlessly fusing low-latency streaming with data lakehouses using BigQuery and BigLake on GCP. Iceberg Topics eliminate complex and inefficient ETL between streams and tables, making real-time data instantly accessible for analysis in BigQuery. This push-button integration removes the need for costly connectors or custom pipelines, enabling both simple and sophisticated SQL queries across streams and other datasets. By combining Redpanda and Iceberg, GCP customers gain a secure, scalable, and cost-effective solution that improves their agility while reducing infrastructure and human capital costs.
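To illustrate the end state of this pattern (not Redpanda's documented setup): once a topic's data lands as an Iceberg table registered with BigLake, it is queryable like any other BigQuery table. The project, dataset, and table names below are hypothetical.

    # Hedged sketch: querying an Iceberg/BigLake table (e.g. one materialized
    # from a streaming topic) through BigQuery. Names are hypothetical, and
    # registering the table with BigLake is assumed to have happened already.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")

    # Join fresh stream-derived rows with a warehouse dimension table.
    sql = """
        SELECT d.region, COUNT(*) AS clicks
        FROM `my-gcp-project.lake.click_events` AS c
        JOIN `my-gcp-project.warehouse.devices` AS d
          ON c.device_id = d.device_id
        WHERE c.event_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
        GROUP BY d.region
    """
    for row in client.query(sql).result():
        print(row.region, row.clicks)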


Databricks Certified Data Engineer Associate Study Guide

Data engineers proficient in Databricks are currently in high demand. As organizations gather more data than ever before, skilled data engineers on platforms like Databricks become critical to business success. The Databricks Data Engineer Associate certification is proof that you have a complete understanding of the Databricks platform and its capabilities, as well as the essential skills to effectively execute various data engineering tasks on the platform.

In this comprehensive study guide, you will build a strong foundation in all topics covered on the certification exam, including the Databricks Lakehouse and its tools and benefits. You'll also learn to develop ETL pipelines in both batch and streaming modes. Moreover, you'll discover how to orchestrate data workflows and design dashboards while maintaining data governance. Finally, you'll dive into the finer points of exactly what's on the exam and learn to prepare for it with mock tests.

Author Derar Alhussein teaches you not only the fundamental concepts but also provides hands-on exercises to reinforce your understanding. From setting up your Databricks workspace to deploying production pipelines, each chapter is carefully crafted to equip you with the skills needed to master the Databricks platform. By the end of this book, you'll know everything you need to ace the Databricks Data Engineer Associate certification exam with flying colors and start your career as a Databricks-certified data engineer.

You'll learn how to:

Use the Databricks Platform and Delta Lake effectively
Perform advanced ETL tasks using Apache Spark SQL
Design multi-hop architecture to process data incrementally
Build production pipelines using Delta Live Tables and Databricks Jobs
Implement data governance using Databricks SQL and Unity Catalog

Derar Alhussein is a senior data engineer with a master's degree in data mining. He has over a decade of hands-on experience in software and data projects, including large-scale projects on Databricks. He currently holds eight certifications from Databricks, showcasing his proficiency in the field. Derar is also an experienced instructor with a proven track record of success in training thousands of data engineers, helping them to develop their skills and obtain professional certifications.
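To give a flavor of the multi-hop (medallion) pattern the guide covers, here is a minimal PySpark sketch of a bronze-to-silver hop with Delta Lake. It is illustrative only, not taken from the book; the paths, schema, and table names are assumptions.

    # Minimal sketch of an incremental bronze -> silver hop with Delta Lake.
    # Paths and names are illustrative; requires the delta-spark package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder.appName("multi-hop-sketch")
        .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Bronze: ingest raw JSON as a stream, landing it unmodified in Delta.
    bronze = (
        spark.readStream.format("json")
        .schema("order_id STRING, amount DOUBLE, ts TIMESTAMP")
        .load("/data/raw/orders")
    )
    bronze.writeStream.format("delta") \
        .option("checkpointLocation", "/data/_chk/bronze_orders") \
        .start("/data/bronze/orders")

    # Silver: clean and deduplicate the bronze table into a curated layer.
    silver = (
        spark.readStream.format("delta").load("/data/bronze/orders")
        .where(F.col("amount") > 0)
        .dropDuplicates(["order_id"])
    )
    silver.writeStream.format("delta") \
        .option("checkpointLocation", "/data/_chk/silver_orders") \
        .start("/data/silver/orders")

Each hop reads the previous layer incrementally, so reprocessing is confined to new data rather than full reloads.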

This session explores the rise of lakehouse architecture and its industry-wide adoption, highlighting its ability to simplify data management. We'll also examine how large language models (LLMs) are transforming data engineering, enabling analysts to solve complex problems that once required advanced technical skills.

AWS re:Invent 2024 - Deep dive into Amazon DynamoDB zero-ETL integrations (DAT348)

Amazon DynamoDB is a serverless, NoSQL, fully managed database with single-digit millisecond performance at any scale. DynamoDB lends itself to easy integration with several other AWS services. In this session, dive deep into zero-ETL integrations between Amazon DynamoDB and Amazon SageMaker Lakehouse, Amazon OpenSearch Service, and Amazon Redshift. Learn from AWS experts about how these integrations can reduce operational burden and cost, allowing you to focus on creating value from data instead of preparing data for analysis.


AWS re:Invent 2024-Zero-ETL replication to Amazon SageMaker Lakehouse & Amazon Redshift (ANT353-NEW)

In today’s data-driven landscape, organizations rely on enterprise applications to manage critical business processes. However, extracting and integrating this data into data warehouses and data lakes can be complex. This session explores a new zero-ETL capability that simplifies ingesting data to Amazon SageMaker Lakehouse and Amazon Redshift via AWS Glue from enterprise applications such as Salesforce, ServiceNow, and Zendesk. See how zero-ETL automates the extract and load process, expanding your analytics and machine learning solutions with valuable SaaS data.


Alexey Novakov: Streamhouse Architecture with Flink and Paimon

🌟 Session Overview 🌟

Session Name: Streamhouse Architecture with Flink and Paimon
Speaker: Alexey Novakov
Session Description: Today, many data teams choose lakehouse architecture for their data platforms. But what if they process all data in streaming mode? Then they end up building a streaming lakehouse, or "streamhouse" for short! This means they use stream processing engines to ingest, transform, and analyze business data in near real-time. However, they still want to use inexpensive storage infrastructure. How can they achieve that?

This talk introduces data teams to tools like Apache Paimon in combination with Flink. Paimon has been built with a strong focus on streaming workflows, serving as a table format in a lakehouse. It takes the stream processing approach in lakehouse architecture to the next level compared to other table formats that are more oriented towards batch data. After this talk, data teams will know how to use Paimon and Flink to build a cost-efficient and fast data layer for different data processing scenarios.
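As a rough sketch of the Paimon-on-Flink pattern described above (not the speaker's own code), the following PyFlink snippet registers a Paimon catalog on cheap object storage and creates a table that accepts streaming upserts. It assumes the Paimon Flink connector jar is on the classpath; the warehouse path and names are illustrative.

    # Minimal sketch of Paimon as the table format under Flink, assuming the
    # Paimon Flink connector is available. Paths and names are illustrative.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

    # Register a Paimon catalog backed by inexpensive object storage.
    t_env.execute_sql("""
        CREATE CATALOG paimon WITH (
            'type' = 'paimon',
            'warehouse' = 's3://my-bucket/paimon-warehouse'
        )
    """)
    t_env.execute_sql("USE CATALOG paimon")

    # A Paimon table that accepts streaming upserts via its primary key.
    t_env.execute_sql("""
        CREATE TABLE IF NOT EXISTS orders (
            order_id STRING,
            amount   DOUBLE,
            ts       TIMESTAMP(3),
            PRIMARY KEY (order_id) NOT ENFORCED
        )
    """)

    # Downstream jobs can read the same table as a changelog stream or as a
    # batch snapshot, which is the core of the streamhouse idea.
    t_env.execute_sql(
        "INSERT INTO orders VALUES ('o-1', 9.99, CURRENT_TIMESTAMP)"
    ).wait()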

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about how the conference has evolved? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀
🔗 Find Other Years' Videos:
2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT

Martin Zuern, Markus Zachai: How Decentralized Data Products are Changing GEMA from Inside Out

🌟 Session Overview 🌟

Session Name: From Zero to Hero: How Decentralized Data Products are Changing GEMA from Inside Out
Speakers: Martin Zuern, Markus Zachai
Session Description: In a data-driven era, organizations are challenged to efficiently refine this valuable resource. At GEMA, using self-service data platforms, lakehouse architecture, and data mesh principles, we embarked on a transformational journey. Our approach has transformed the organization from the inside out, rooted in lean governance and decentralized ownership. In the first ten months, more than 100 data products have been created, with over 40% of the workforce actively using the platform.

Join us as we explore the challenges and solutions encountered while implementing data mesh and governance. We'll delve into the intricacies of our data journey, from technical hurdles to organizational mindset shifts. We'll also look at growth hacking strategies and the critical role of the data governance manager, a position that is often misunderstood in the governance setup.

Discover how GEMA's data journey has led to exciting use cases and valuable insights, and what you can take away for your own organization.


Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. This week, Yannick joins the conversation for a lively year-end retrospective on the state of AI, data, and technology in 2024. Whether you're knee-deep in neural networks or just data-curious, this episode offers plenty to ponder. Grab your coffee, sit back, and explore:

AI's meteoric rise in 2024: How GenAI went from hype to tangible business tools and what's ahead for 2025.
Strategic AI adoption: Challenges and best practices for embedding AI into workflows and decision-making processes.
Real-time data: From dynamic pricing to e-commerce triggers, we explore gaps and future trends in event-driven infrastructure.
The ethics and compliance puzzle: A dive into the EU AI Act, data privacy, and the evolving landscape of ethical AI usage.
Developer tools and trends: Productivity boosters like Copilot and the rise of tools like PDM and Ubi in the Python ecosystem.

With reflections on everything from lakehouse data platforms to open-source debates, this episode is the perfect blend of geeky insights and forward-looking predictions. Pull up a chair, relax, and let's dive into the world of data, unplugged style!

AWS re:Invent 2024 - [NEW LAUNCH] Amazon SageMaker Lakehouse: Accelerate analytics & AI (ANT354-NEW)

Data warehouses, data lakes, or both? Explore how Amazon SageMaker Lakehouse, a unified, open, and secure lakehouse, simplifies analytics and AI. This session unveils how SageMaker Lakehouse provides unified access to data across Amazon S3 data lakes, Amazon Redshift data warehouses, and third-party sources without altering your existing architecture. Learn how it breaks down data silos and opens your data estate with Apache Iceberg compatibility, offering flexibility to use preferred query engines and tools that accelerate your time to insights. Discover robust security features, including consistent fine-grained access controls, that help democratize data without compromises.
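The Apache Iceberg compatibility mentioned here means standard Iceberg tooling can read lakehouse tables. As one hypothetical illustration (not AWS-documented setup), PyIceberg can scan a table through an AWS Glue-backed catalog; the region, namespace, and table names are assumptions.

    # Hedged sketch of the Iceberg-compatibility point: standard Iceberg
    # tooling reading a table via a Glue-backed catalog. The region, names,
    # and the exact SageMaker Lakehouse wiring are assumptions.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("lakehouse", **{"type": "glue",
                                           "glue.region": "us-east-1"})

    table = catalog.load_table("sales.orders")   # hypothetical namespace.table
    scan = table.scan(row_filter="amount > 100",
                      selected_fields=("order_id", "amount"))
    print(scan.to_arrow().num_rows)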


Supercharge your lakehouse with Azure Databricks and Microsoft Fabric | BRK203

Azure Databricks enhances the lakehouse experience on Azure by seamlessly integrating data and AI solutions for faster time to value. Data, schemas, and tables cataloged in Unity Catalog are readily available, supporting data engineering, data science, real-time intelligence, and optimized performance, and delivering blazing-fast insights with Power BI.

Speakers: Lindsey Allen, Robert Saxby

Session Information: This is one of many sessions from the Microsoft Ignite 2024 event. View even more sessions on-demand and learn about Microsoft Ignite at https://ignite.microsoft.com

BRK203 | English (US) | Data


In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads the data platform team, focusing on finding travel deals for consumers, a task that involves handling massive volumes of flight data and event stream information.

Ken explained the dual approach of passive and active search strategies used by Going to manage the vast data landscape. Passive search involves aggregating data from global distribution systems, while active search is more transactional, querying specific flight prices. This approach helps Going sift through approximately 50 petabytes of data annually to identify the best travel deals.

We delved into the technical architecture supporting these operations, including the use of Confluent for data streaming, Starburst Galaxy for transformation, and Databricks for modeling. Ken emphasized the importance of an open lakehouse architecture, which allows for flexibility and scalability as the business grows.

Ken also discussed the composition of Going's engineering and data teams, highlighting the collaborative nature of their work and the reliance on vendor tooling to streamline operations. He shared insights into the challenges and strategies of managing data life cycles, ensuring data quality, and maintaining uptime for consumer-facing applications.

Throughout our conversation, Ken provided a glimpse into the future of Going's data architecture, including potential expansions into other travel modes and the integration of large language models for enhanced customer interaction. This episode offers a comprehensive look at the complexities and innovations in building a data-driven travel advisory service.