It's Friday! Matt Housley and I catch up to discuss the aftermath of AWS re:Invent and why the industry's obsession with AI Agents might be premature. We also dive deep into the hardware wars between Google and NVIDIA, the "brain-damaged" nature of current LLMs, and the growing "enshittification" of the internet and platforms like LinkedIn. Plus, I reveal some details about my upcoming "Mixed Model Arts" project.
Have you ever thought about how generative AI is transforming the way large companies build digital products? In this episode, we talk with the Grupo Boticário team to understand how the company is combining technology and innovation to shape the future of beauty. We explore how GenAI is driving the development of digital products and amplifying the work of analysts, product teams, and engineering through tooling. We go behind the scenes of GB's AI Week (Semana de IA GB), the lessons it brought to the business, and how GenAI is helping teams gain efficiency and depth in their analyses. If you want to understand how one of the country's largest beauty companies is shaping its product and engineering culture for the future, this episode is for you! Remember that you can find all the Data Hackers community podcasts on Spotify, iTunes, Google Podcast, Castbox, and many other platforms. Guests: Bruno Fuzetti Penso - Senior Platform Manager; Thayana Borba - Senior Digital Products Manager; João Alves De Oliveira Neto - Senior Data Products Manager. Our Data Hackers panel: Paulo Vasconcellos — Co-founder of Data Hackers and Principal Data Scientist at Hotmart. Monique Femme — Head of Community Management at Data Hackers. Grupo Boticário channels: GB LinkedIn | GB careers page | GB Instagram. References: Development Platform (Alquimia) https://github.com/customer-stories/grupoboticario https://medium.com/gbtech/plataforma-do-desenvolvimento-grupo-botic%C3%A1rio-61b1aaddbc9b https://medium.com/gbtech/opentelemetry-na-nova-plataforma-de-integra%C3%A7%C3%A3o-350e744b6a5f https://aws.amazon.com/pt/solutions/case-studies/grupo-boticario-summit/
The relationship between AI assistants and data professionals is evolving rapidly, creating both opportunities and challenges. These tools can supercharge workflows by generating SQL, assisting with exploratory analysis, and connecting directly to databases—but they're far from perfect. How do you maintain the right balance between leveraging AI capabilities and preserving your fundamental skills? As data teams face mounting pressure to deliver AI-ready data and demonstrate business value, what strategies can ensure your work remains trustworthy? With issues ranging from biased algorithms to poor data quality potentially leading to serious risks, how can organizations implement responsible AI practices while still capitalizing on the positive applications of this technology? Christina Stathopoulos is an international data specialist who regularly serves as an executive advisor, consultant, educator, and public speaker. With expertise in analytics, data strategy, and data visualization, she has built a distinguished career in technology, including roles at Fortune 500 companies. Most recently, she spent over five years at Google and Waze, leading data strategy and driving cross-team projects. Her professional journey has spanned both the United States and Spain, where she has combined her passion for data, technology, and education to make data more accessible and impactful for all. Christina also plays a unique role as a “data translator,” helping to bridge the gap between business and technical teams to unlock the full value of data assets. She is the founder of Dare to Data, a consultancy created to formalize and structure her work with some of the world’s leading companies, supporting and empowering them in their data and AI journeys. Current and past clients include IBM, PepsiCo, PUMA, Shell, Whirlpool, Nitto, and Amazon Web Services.
In the episode, Richie and Christina explore the role of AI agents in data analysis, the evolving workflow with AI assistance, the importance of maintaining foundational skills, the integration of AI in data strategy, the significance of trustworthy AI, and much more.
Links Mentioned in the Show: Dare to Data | Julius AI | Connect with Christina | Course - Introduction to SQL with AI | Related Episode: The Data to AI Journey with Gerrit Kazmaier, VP & GM of Data Analytics at Google Cloud | Rewatch RADAR AI
New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.
Summary
In this episode of the Data Engineering Podcast, Marc Brooker, VP and Distinguished Engineer at AWS, talks about how agentic workflows are transforming database usage and infrastructure design. He discusses the evolving role of data in AI systems, from traditional models to more modern approaches like vectors, RAG, and relational databases. Marc explains why agents require serverless, elastic, and operationally simple databases, and how AWS solutions like Aurora and DSQL address these needs with features such as rapid provisioning, automated patching, and geodistribution, along with support for spiky usage. The conversation covers topics including tool calling, improved model capabilities, state in agents versus stateless LLM calls, and the role of Lambda and AgentCore for long-running, session-isolated agents. Marc also touches on the shift from local MCP tools to secure, remote endpoints, the rise of object storage as a durable backplane, and the need for better identity and authorization models. The episode highlights real-world patterns like agent-driven SQL fuzzing and plan analysis, while identifying gaps in simplifying data access, hardening ops for autonomous systems, and evolving serverless database ergonomics to keep pace with agentic development.
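To make the "state in agents versus stateless LLM calls" point concrete, here is a minimal sketch (my illustration, not code from the episode) of an agent loop: the model call itself keeps no memory, so the agent must re-send the accumulated conversation and tool results on every turn. The `call_llm` and `run_tool` callables are hypothetical stand-ins, not a specific AWS or vendor API.

```python
def run_agent(task: str, call_llm, run_tool, max_turns: int = 5):
    # The agent owns the state; each LLM call is a pure function of it.
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = call_llm(history)          # stateless: full context re-sent each turn
        history.append(reply)
        if reply.get("tool") is None:      # no tool requested -> final answer
            return reply["content"]
        result = run_tool(reply["tool"], reply.get("args", {}))
        history.append({"role": "tool", "content": result})
    return "gave up after max_turns"
```

This is also why serverless re-hydration matters: if the loop runs in an ephemeral function, `history` has to live in a database or object store between invocations rather than in process memory.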
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Marc Brooker about the impact of agentic workflows on database usage patterns and how they change the architectural requirements for databases.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what the role of the database is in agentic workflows?
There are numerous types of databases, with relational being the most prevalent. How does the type and purpose of an agent inform the type of database that should be used?
Anecdotally I have heard about how agentic workloads have become the predominant "customers" of services like Neon and Fly.io. How would you characterize the different patterns of scale for agentic AI applications? (e.g. proliferation of agents, monolithic agents, multi-agent, etc.)
What are some of the most significant impacts on workload and access patterns for data storage and retrieval that agents introduce?
What are the categorical differences in that behavior as compared to programmatic/automated systems?
You have spent a substantial amount of time on Lambda at AWS. Given that LLMs are effectively stateless, how does the added ephemerality of serverless functions impact design and performance considerations around having to "re-hydrate" context when interacting with agents?
What are the most interesting, innovative, or unexpected ways that you have seen serverless and database systems used for agentic workloads?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on technologies that are supporting agentic applications?
Contact Info
Blog
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used.
The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
AWS Aurora DSQL
AWS Lambda
Three Tier Architecture
Vector Database
Graph Database
Relational Database
Vector Embedding
RAG == Retrieval Augmented Generation
AI Engineering Podcast Episode
GraphRAG
AI Engineering Podcast Episode
LLM Tool Calling
MCP == Model Context Protocol
A2A == Agent 2 Agent Protocol
AWS Bedrock AgentCore
Strands
LangChain
Kiro
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Elliot Foreman and Andrew DeLave from ProsperOps joined Yuliia and Dumky to discuss automated cloud cost optimization through commitment management. As the company's Google go-to-market director and senior FinOps specialist, respectively, they explain how their platform manages over $4 billion in cloud spend by automating reserved instances, committed use discounts, and savings plans across AWS, Azure, and Google Cloud. The conversation covers the psychology behind commitment hesitation, break-even point mathematics for cloud discounts, workload volatility optimization, and why they avoid AI in favor of deterministic algorithms for financial decisions. They share insights on managing complex multi-cloud environments, the human vs automation debate in FinOps, and practical strategies for reducing cloud costs while mitigating commitment risks.
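As an illustration of the break-even math mentioned above (my simplification, not ProsperOps' algorithm): a commitment at discount d beats on-demand pricing only when the resource actually runs for more than a 1 - d fraction of the term. This ignores upfront-payment options and instance-size flexibility, but captures the core trade-off.

```python
def breakeven_utilization(discount_rate: float) -> float:
    """Fraction of the commitment term a resource must run for the
    committed price to beat pure on-demand.

    Committed cost over the term: rate * term * (1 - discount_rate).
    On-demand cost at utilization u: rate * term * u.
    The commitment wins when u > 1 - discount_rate.
    """
    return 1.0 - discount_rate

# A 1-year commitment at a 35% discount only saves money if the
# workload runs more than 65% of the year at the committed size.
print(breakeven_utilization(0.35))  # 0.65
```

This is also why workload volatility matters: the spikier the usage, the harder it is to stay above that utilization threshold for the whole term.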
The Data Hackers News is on the air!! The hottest topics of the week, with the top news from the world of Data, AI, and Technology, which you can also find in our weekly newsletter, now on the Data Hackers podcast!! Press play and listen to this week's Data Hackers News now! To stay on top of everything happening in the data world, subscribe to the weekly newsletter: https://www.datahackers.news/ Meet our Data Hackers News commentators: Monique Femme. Stories/topics discussed: Itaú Meetup event; Netflix story; AWS CEO story. Other Data Hackers channels: Site, LinkedIn, Instagram, TikTok, YouTube
Combining LLMs with enterprise knowledge bases is creating powerful new agents that can transform business operations. These systems are dramatically improving on traditional chatbots by understanding context, following conversations naturally, and accessing up-to-date information. But how do you effectively manage the knowledge that powers these agents? What governance structures need to be in place before deployment? And as we look toward a future with physical AI and robotics, what fundamental computing challenges must we solve to ensure these technologies enhance rather than complicate our lives? Jun Qian is an accomplished technology leader with extensive experience in artificial intelligence and machine learning. Currently serving as Vice President of Generative AI Services at Oracle since May 2020, Jun founded and leads the Engineering and Science group, focusing on the creation and enhancement of Generative AI services and AI Agents. Previously held roles include Vice President of AI Science and Development at Oracle, Head of AI and Machine Learning at Sift, and Principal Group Engineering Manager at Microsoft, where Jun co-founded Microsoft Power Virtual Agents. Jun's career also includes significant contributions as the Founding Manager of Amazon Machine Learning at AWS and as a Principal Investigator at Verizon. In the episode, Richie and Jun explore the evolution of AI agents, the unique features of ChatGPT, the challenges and advancements in chatbot technology, the importance of data management and security in AI, the future of AI in computing and robotics, and much more. Links Mentioned in the Show: Oracle | Connect with Jun | Course: Introduction to AI Agents | Jun at DataCamp RADAR | Related Episode: A Framework for GenAI App and Agent Development with Jerry Liu, CEO at LlamaIndex | Rewatch RADAR AI. New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.
In this season of the Analytics Engineering podcast, Tristan is deep into the world of developer tools and databases. If you're following us here, you've almost definitely used Amazon S3 or one of its blob storage siblings. They form the foundation for nearly all data work in the cloud. In many ways, it was the innovations that happened inside of S3 that unlocked all of the progress in cloud data over the last decade. In this episode, Tristan talks with Andy Warfield, VP and senior principal engineer at AWS, where he focuses primarily on storage. They go deep on S3, how it works, and what it unlocks. They close out talking about Iceberg, S3 table buckets, and what this all suggests about the outlines of the S3 product roadmap moving forward. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
Summary
In this episode of the Data Engineering Podcast, Mai-Lan Tomsen Bukovec, Vice President of Technology at AWS, talks about the evolution of Amazon S3 and its profound impact on data architecture. From her work on compute systems to leading the development and operations of S3, Mai-Lan shares insights on how S3 has become a foundational element in modern data systems, enabling scalable and cost-effective data lakes since its launch alongside Hadoop in 2006. She discusses the architectural patterns enabled by S3, the importance of metadata in data management, and how S3's evolution has been driven by customer needs, leading to innovations like strong consistency and S3 tables.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.
Your host is Tobias Macey and today I'm interviewing Mai-Lan Tomsen Bukovec about the evolutions of S3 and how it has transformed data architecture.
Interview
Introduction
How did you get involved in the area of data management?
Most everyone listening knows what S3 is, but can you start by giving a quick summary of what roles it plays in the data ecosystem?
What are the major generational epochs in S3, with a particular focus on analytical/ML data systems?
The first major driver of analytical usage for S3 was the Hadoop ecosystem. What are the other elements of the data ecosystem that helped shape the product direction of S3?
Data storage and retrieval have been core primitives in computing since its inception. What are the characteristics of S3 and all of its copycats that led to such a difference in architectural patterns vs. other shared data technologies? (e.g. NFS, Gluster, Ceph, Samba, etc.)
How does the unified pool of storage that is exemplified by S3 help to blur the boundaries between application data, analytical data, and ML/AI data?
What are some of the default patterns for storage and retrieval across those three buckets that can lead to anti-patterns which add friction when trying to unify those use cases?
The age of AI is leading to a massive potential for unlocking unstructured data, for which S3 has been a massive dumping ground over the years. How is that changing the ways that your customers think about the value of the assets that they have been hoarding for so long?
What new architectural patterns is that generating?
What are the most interesting, innovative, or unexpected ways that you have seen S3 used for analytical/ML/AI applications?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on S3?
When is S3 the wrong choice?
What do you have planned for the future of S3?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
AWS S3
Kinesis
Kafka
SQS
EMR
Drupal
Wordpress
Netflix Blog on S3 as a Source of Truth
Hadoop
MapReduce
Nasa JPL
FINRA == Financial Industry Regulatory Authority
S3 Object Versioning
S3 Cross Region
S3 Tables
Iceberg
Parquet
AWS KMS
Iceberg REST
DuckDB
NFS == Network File System
Samba
GlusterFS
Ceph
MinIO
S3 Metadata
Photoshop Generative Fill
Adobe Firefly
Turbotax AI Assistant
AWS Access Analyzer
Data Products
S3 Access Point
AWS Nova Models
LexisNexis Protege
S3 Intelligent Tiering
S3 Principal Engineering Tenets
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Summary
In this episode of the Data Engineering Podcast, Chakravarthy Kotaru talks about scaling data operations through standardized platform offerings. From his roots as an Oracle developer to leading the data platform at a major online travel company, Chakravarthy shares insights on managing diverse database technologies and providing databases as a service to streamline operations. He explains how his team has transitioned from DevOps to a platform engineering approach, centralizing expertise and automating repetitive tasks with AWS Service Catalog. Join them as they discuss the challenges of migrating legacy systems, integrating AI and ML for automation, and the importance of organizational buy-in in driving data platform success.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again. Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.
Your host is Tobias Macey and today I'm interviewing Chakri Kotaru about scaling successful data operations through standardized platform offerings.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by outlining the different ways that you have seen teams you work with fail due to lack of structure and opinionated design?
Why NoSQL?
Pairing different styles of NoSQL for different problems
Useful patterns for each NoSQL style (document, column family, graph, etc.)
Challenges in platform automation and scaling edge cases
What challenges do you anticipate as a result of the new pressures from AI applications?
What are the most interesting, innovative, or unexpected ways that you have seen platform engineering practices applied to data systems?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on data platform engineering?
When is NoSQL the wrong choice?
What do you have planned for the future of platform principles for enabling data teams/data applications?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used.
The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
Riak
DynamoDB
SQL Server
Cassandra
ScyllaDB
CAP Theorem
Terraform
AWS Service Catalog
Blog Post
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
In this podcast episode, we talked with Eddy Zulkifly about "From Supply Chain Management to Digital Warehousing and FinOps."
About the Speaker: Eddy Zulkifly is a Staff Data Engineer at Kinaxis, building robust data platforms across Google Cloud, Azure, and AWS. With a decade of experience in data, he actively shares his expertise as a Mentor on ADPList and Teaching Assistant at Uplimit. Previously, he was a Senior Data Engineer at Home Depot, specializing in e-commerce and supply chain analytics. Currently pursuing a Master’s in Analytics at the Georgia Institute of Technology, Eddy is also passionate about open-source data projects and enjoys watching/exploring the analytics behind the Fantasy Premier League.
In this episode, we dive into the world of data engineering and FinOps with Eddy Zulkifly, Staff Data Engineer at Kinaxis. Eddy shares his unconventional career journey—from optimizing physical warehouses with Excel to building digital data platforms in the cloud.
🕒 TIMECODES 0:00 Eddy’s career journey: From supply chain to data engineering 8:18 Tools & learning: Excel, Docker, and transitioning to data engineering 21:57 Physical vs. digital warehousing: Analogies and key differences 31:40 Introduction to FinOps: Cloud cost optimization and vendor negotiations 40:18 Resources for FinOps: Certifications and the FinOps Foundation 45:12 Standardizing cloud cost reporting across AWS/GCP/Azure 50:04 Eddy’s master’s degree and closing thoughts
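On the standardization point in the timecodes above, here is a minimal sketch (my illustration, not something from the episode) of normalizing per-provider billing exports into one shared schema, in the spirit of the FinOps Foundation's FOCUS specification. The source column names are assumptions; each provider's real export format differs.

```python
import pandas as pd

# Assumed source column names per provider; real billing exports differ.
COLUMN_MAPS = {
    "aws":   {"line_item_unblended_cost": "billed_cost",
              "line_item_product_code": "service"},
    "gcp":   {"cost": "billed_cost", "service_description": "service"},
    "azure": {"cost_in_billing_currency": "billed_cost",
              "meter_category": "service"},
}

def normalize(df: pd.DataFrame, provider: str) -> pd.DataFrame:
    """Rename provider-specific columns to a shared schema and tag the rows."""
    out = df.rename(columns=COLUMN_MAPS[provider])[["billed_cost", "service"]].copy()
    out["provider"] = provider
    return out

# One cross-cloud report, assuming raw_exports: dict[str, pd.DataFrame]:
# combined = pd.concat(normalize(df, p) for p, df in raw_exports.items())
# print(combined.groupby(["provider", "service"])["billed_cost"].sum())
```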
🔗 CONNECT WITH EDDY Twitter: https://x.com/eddarief LinkedIn: https://www.linkedin.com/in/eddyzulkifly/ GitHub: https://github.com/eyzyly/eyzyly ADPList: https://adplist.org/mentors/eddy-zulkifly
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ
Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/
Send us a text. Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data—unplugged style! In this episode:
OpenAI asks White House for AI regulation relief: OpenAI seeks federal-level AI policy exceptions in exchange for transparency. But is this a sign they’re losing momentum?
Hot take: GPT-4.5 is a ‘nothing burger’: Is GPT-4.5 actually an upgrade, or just a well-marketed rerun?
Claude 3.7 & Blowing €100 in Two Days: One of the hosts tests Claude extensively—and racks up a pricey bill. Was it worth it?
OpenAI’s Deep Research: How does OpenAI’s new research tool compare to Perplexity?
AI cracks superbug problem in two days: AI speeds up decades of scientific research—should we be impressed or concerned?
European tech coalition demands ‘radical action’ on digital sovereignty: Big names like Airbus and Proton push for homegrown European tech.
Migrating from AWS to a European cloud: A real-world case study on cutting costs by 62%—is it worth the trade-offs?
Docs by the French government: A Notion alternative for open-source government collaboration.
Why people hate note-taking apps: A deep dive into the frustrations with Notion, Obsidian, and alternatives.
Model Context Protocol (MCP): How MCP is changing AI tool integrations—and why OpenAI isn’t on board (yet).
OpenRouter.ai: The one-stop API for switching between AI models. Does it live up to the hype?
OTDiamond.ai: A multi-LLM approach that picks the best model for your queries to balance cost and performance.
Are you polite to AI?: Study finds most people say "please" to ChatGPT—good manners or fear of the AI uprising?
AI refusing to do your work?: A hilarious case of an AI refusing to generate code because it "wants you to learn."
And finally, a big announcement—DataTopics Unplugged is evolving! Stay tuned for an updated format and a fresh take on tech discussions.
It’s time for another episode of the Data Engineering Central Podcast. In this episode, we cover …
* AWS Lambda + DuckDB and Delta Lake (Polars, Daft, etc.)
* IAC - Long Live Terraform
* Databricks Data Quality with DQX
* Unity Catalog releases for DuckDB and Polars
* Bespoke vs Managed Data Platforms
* Delta Lake vs. Iceberg and UniForm for a single table
Thanks for b…
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dataengineeringcentral.substack.com/subscribe
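For the Lambda + DuckDB + Delta Lake item in the episode list above, here is a minimal sketch of the pattern, assuming DuckDB's delta extension and a hypothetical S3 path; it illustrates the idea rather than reproducing code from the episode. Real deployments also need S3 credentials wired in (for example via the execution role and DuckDB's httpfs/secrets support).

```python
import duckdb

def handler(event, context):
    # An in-memory DuckDB connection fits Lambda's ephemeral, stateless model.
    con = duckdb.connect()
    con.execute("INSTALL delta; LOAD delta;")
    # delta_scan reads a Delta Lake table directly; the bucket path is hypothetical.
    row = con.execute(
        "SELECT count(*) FROM delta_scan('s3://example-bucket/events')"
    ).fetchone()
    return {"row_count": row[0]}
```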
In today's episode of the Data Engineering Central Podcast, we talk about a few hot topics: AWS S3 Tables, Databricks raising money, whether Data Contracts are dead, and the lakehouse storage format battle! It's a good one, buckle up!
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dataengineeringcentral.substack.com/subscribe
Generative AI and data are more interconnected than ever. If you want quality in your AI product, you need to be connected to a database with high quality data. But with so many database options and new AI tools emerging, how do you ensure you’re making the right choices for your organization? Whether it’s enhancing customer experiences or improving operational efficiency, understanding the role of your databases in powering AI is crucial. Andi Gutmans is the General Manager and Vice President for Databases at Google. Andi’s focus is on building, managing, and scaling the most innovative database services to deliver the industry’s leading data platform for businesses. Prior to joining Google, Andi was VP Analytics at AWS running services such as Amazon Redshift. Prior to his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP. Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation’s board of directors. He holds a bachelor’s degree in computer science from the Technion, Israel Institute of Technology. In the episode, Richie and Andi explore databases and their relationship with AI and GenAI, key features needed in databases for AI, GCP database services, AlloyDB, federated queries in Google Cloud, vector databases, graph databases, practical use cases of AI in databases, and much more. Links Mentioned in the Show: GCP | Connect with Andi | AlloyDB for PostgreSQL | Course: Responsible AI Data Management | Related Episode: The Power of Vector Databases and Semantic Search with Elan Dekel, VP of Product at Pinecone | Sign up to RADAR: Forward Edition. New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.
Navnit Shukla is a solutions architect with AWS. He joins me to chat about data wrangling and architecting solutions on AWS, writing books, and much more.
Navnit is also in the Coursera Data Engineering Specialization, dropping knowledge on data engineering on AWS. Check it out!
Data Wrangling on AWS: https://www.amazon.com/Data-Wrangling-AWS-organize-analysis/dp/1801810907
LinkedIn: https://www.linkedin.com/in/navnitshukla/
The Data Hackers News is on the air!! The hottest topics of the week, with the top news from the world of Data, AI, and Technology, which you can also find in our weekly newsletter, now on the Data Hackers podcast!!
Press play and listen to this week's Data Hackers News now!
To stay on top of everything happening in the data world, subscribe to the weekly newsletter:
https://www.datahackers.news/
Meet our Data Hackers News commentators:
Monique Femme
Stories/topics discussed:
AWS CEO says developers will no longer be writing code within 2 years;
Apple launches the iPhone 16 with generative AI;
Canva says its AI features are worth the 300 percent price increase.
Download the full State of Data Brazil report and the survey highlights:
Other Data Hackers channels:
Site
LinkedIn
Instagram
TikTok
YouTube
Join Kirk and Joe as they detail Joe's journey from a farm in Michigan to leading a data center company in Dallas, transitioning from tech roles at Disney and Amazon to spearheading AWS's expansion into Asia, and how that changed his decision-making approach. They also speak on optimizing data center locations for connectivity and efficiency and explore the data center industry's evolution. They emphasize the fusion of cloud computing and AI and the growing demand for skilled professionals at companies like DigitalBridge and DataBank. They also discuss the societal impact of technology in education and healthcare, envisioning AI's role in revolutionizing service delivery through energy solutions like hydrogen and geothermal sources. Wrapping up, they discuss AI's impact on industries like weather prediction and drug development, emphasizing the need for employee investment and growth-centric cultures.
Thank you to our title sponsor, MCIS. Mission Critical Interior Solutions, Inc. provides interior architectural solutions for data, cloud, and mission critical centers across North America. We offer everything from polished, epoxy, and sealed concrete to raised flooring, hot/cold aisle containment, and high-density ceiling grid. We also install raised access flooring in office spaces and provide underfloor air distribution testing to commission spaces - https://mcis-inc.com
For more about us: https://linktr.ee/overwatchmissioncritical
Summary
Stripe is a company that relies on data to power their products and business. To support that functionality they have invested in Trino and Iceberg for their analytical workloads. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey and today I'm interviewing Kevin Liu about his use of Trino and Iceberg for Stripe's data lakehouse
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what role Trino and Iceberg play in Stripe's data architecture?
What are the ways in which your job responsibilities intersect with Stripe's lakehouse infrastructure?
What were the requirements and selection criteria that led to the selection of that combination of technologies?
What are the other systems that feed into and rely on the Trino/Iceberg service?
What kinds of questions are you answering with table metadata? (See the example query after this list.)
What use case/team does that support?
What is the comparative utility of the Iceberg REST catalog?
What are the shortcomings of Trino and Iceberg?
What are the most interesting, innovative, or unexpected ways that you have seen Iceberg/Trino used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Stripe's data infrastructure?
When is a lakehouse on Trino/Iceberg the wrong choice?
What do you have planned for the future of Trino and Iceberg at Stripe?
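As context for the table-metadata question above, here is a minimal sketch of the kind of query this enables: Trino's Iceberg connector exposes metadata tables such as "$snapshots" alongside each data table. The host, catalog, schema, and table names below are hypothetical, not Stripe's.

```python
from trino.dbapi import connect  # pip install trino

# Hypothetical coordinator and names; a real deployment will differ.
conn = connect(host="trino.example.com", port=443, user="analyst",
               catalog="iceberg", schema="analytics", http_scheme="https")
cur = conn.cursor()
# Each Iceberg table carries metadata tables; "$snapshots" answers
# "what changed in this table, and when?"
cur.execute(
    'SELECT snapshot_id, committed_at, operation '
    'FROM "events$snapshots" ORDER BY committed_at DESC'
)
for snapshot_id, committed_at, operation in cur.fetchall():
    print(snapshot_id, committed_at, operation)
```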
Contact Info
Substack LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
Trino
Iceberg
Stripe
Spark
Redshift
Hive Metastore
Python Iceberg
Python Iceberg REST Catalog
Trino Metadata Table
Flink
Podcast Episode
Tabular
Podcast Episode
Delta Table
Podcast Episode
Databricks Unity Catalog
Starburst
AWS Athena
Kevin Trinofest Presentation
Alluxio
Podcast Episode
Parquet
Hudi
Trino Project Tardigrade
Trino On Ice
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
Starburst: 
This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake.
Trusted by the teams at Comcast and Doordash, Starburst del
Spatial computing is revolutionizing the way we interact with digital and physical worlds, but its adoption comes with questions about practicality and return on investment. As businesses explore this cutting-edge technology, they must consider how it can enhance productivity and streamline operations. What are the best strategies to integrate spatial computing into your current systems? How can you ensure that it not only boosts efficiency but also delivers measurable benefits to your bottom line? Cathy Hackl is a web3 and metaverse strategist, tech futurist, speaker and author. She's worked with metaverse-related companies such as HTC VIVE, Magic Leap, and AWS, and currently consults with some of the world's leading brands, including P&G, Clinique, Ralph Lauren, Orlando Economic Partnership and more. Hackl is one of the world's first Chief Metaverse Officers and the co-founder of Journey, where she works with luxury, fashion, and beauty brands to create successful metaverse and web3 strategies and helps them build worlds in platforms like Roblox, Fortnite, Decentraland, The Sandbox, and beyond. She is widely regarded as one of the leading thinkers on the Metaverse. Irena Cronin is SVP of Product for DADOS Technology, which is making an Apple Vision Pro data analytics and visualization app. She is also the CEO of Infinite Retina, which helps companies develop and implement AI, AR, and other new technologies for their businesses. Before this, she worked as an equity research analyst and gained extensive experience in evaluating both public and private companies. In the episode, Richie, Cathy, and Irena explore spatial computing, the current viability of spatial computing and its prominence alongside the release of Apple's Vision Pro, the expected effects of spatial computing on gaming and entertainment, industrial applications as well as data visualization and AI integration opportunities of spatial computing, how businesses can leverage spatial computing, future developments in the space, and much more. Links Mentioned in the Show: Cathy’s Book | Irena’s Books | Apple Vision Pro | Marvel Studios and ILM Immersive Announce 'What If...? - An Immersive Story' | [Course] Artificial Intelligence (AI) Strategy | Related Episode: Why the Future of AI in Data will be Weird with Benn Stancil, CTO at Mode & Field CTO at ThoughtSpot | Sign up to RADAR: AI Edition. New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.