Activities & events
#283: Good Things (Can) Come in Small Datasets with Joe Domaleski
2025-10-28 · 04:30
Joe Domaleski – guest @ Country Fried Creative
Does size matter? When it comes to datasets, the conventional wisdom seems to be a resounding, "Yes!" But what about small datasets? Small- and mid-sized businesses and nonprofits, especially, often have limited web traffic, small email lists, CRM systems that can comfortably operate under the free tier, and lead and order counts that don't lend themselves to "big data" descriptors. Even large enterprises have scenarios where some datasets easily fit into Google Sheets with limited scrolling required. Should this data be dismissed out of hand, or should it be treated as what it is: potentially useful? Joe Domaleski from Country Fried Creative works with a lot of businesses operating in the small-data world, and he was so intrigued by the potential of putting data to use on behalf of his clients that he's midway through a Master's degree in Analytics from Georgia Tech! He wrote a really useful article about the ins and outs of small data, so we brought him on for a discussion of the topic. This episode's Measurement Bite from show sponsor Recast is an explanation from Michael Kaminsky of synthetic controls and how they can be used as counterfactuals. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.
Event: The Analytics Power Hour
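The Measurement Bite above covers synthetic controls as counterfactuals. Purely as a hedged illustration of that general idea (not Recast's implementation), the sketch below uses made-up data: non-negative weights over untreated "donor" series are fit on the pre-intervention period, and the weighted blend serves as the counterfactual for the post-period.

```python
# Minimal sketch of the synthetic-control idea: build a counterfactual for a
# treated series as a weighted blend of untreated "donor" series, with weights
# chosen to match the treated series before the intervention.
# Illustrative only; all data and numbers here are invented.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

periods, n_donors, t0 = 30, 5, 20          # t0 = first post-intervention period
donors = rng.normal(100, 5, (periods, n_donors)).cumsum(axis=0) / 10
treated = donors @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + rng.normal(0, 0.5, periods)
treated[t0:] += 4.0                         # a lift after the intervention

# Fit non-negative weights on the pre-period only, then normalise them to sum
# to 1 (a simplification of the usual constrained optimisation).
w, _ = nnls(donors[:t0], treated[:t0])
w = w / w.sum()

synthetic = donors @ w                      # counterfactual: "what if no intervention?"
effect = treated[t0:] - synthetic[t0:]      # estimated lift in the post-period

print("donor weights:", np.round(w, 2))
print("avg estimated effect:", round(effect.mean(), 2))
```

The key property is that the weights are chosen using only pre-intervention data, so the post-period gap between the observed and synthetic series can be read as the estimated effect.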
Practical introduction to building remote MCP servers
2025-07-30 · 10:30
How do you let any MCP-aware agent tap into your platform? Build a remote server! In 30 minutes, Toby P. provides first-hand insights from building GitHub's remote MCP server. Then Joe Z. will follow up by building a fresh remote server live, using the open-source MCP SDK. Walk away with a clear blueprint to launch your own remote MCP server before the day is done.
Event: MCP Dev Days: Day 2 - Builders
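The session above is about standing up a remote MCP server with the open-source SDK. As a hedged sketch only, assuming the Python MCP SDK's FastMCP helper and its HTTP transport (the server name and echo tool below are placeholders, not the GitHub server discussed in the talk):

```python
# Minimal remote MCP server sketch, assuming the open-source Python MCP SDK
# (the "mcp" package and its FastMCP helper). Server name and tool are made up.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-remote-server")

@mcp.tool()
def echo(text: str) -> str:
    """Return the input text unchanged (placeholder tool)."""
    return text

if __name__ == "__main__":
    # Serve over HTTP so remote MCP-aware agents can connect, instead of the
    # stdio transport used by local desktop clients. The transport name is
    # assumed from recent versions of the SDK.
    mcp.run(transport="streamable-http")
```

An MCP-aware client would then be pointed at the server's HTTP endpoint; stdio remains the simpler choice for purely local use.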
Joe Reis Flips The Script And Interviews Tobias Macey About The Data Engineering Podcast
2022-07-17 · 23:00
Joe Reis – founder @ Ternary Data, Tobias Macey – host
Summary: Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accelerating pace. This podcast does its best to explore this fractal ecosystem, and has been at it for the past 5+ years. In this episode Joe Reis, founder of Ternary Data and co-author of "Fundamentals of Data Engineering", turns the tables and interviews the host, Tobias Macey, about his journey into podcasting, how he runs the show behind the scenes, and the other things that occupy his time.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show! RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today we're flipping the script: Joe Reis of Ternary Data will be interviewing me about my time as the host of this show and my perspectives on the data ecosystem.
Interview: Introduction. How did you get involved in the area of data management? Now I'll hand it off to Joe…
Joe's Notes: You do a lot of podcasts. Why? Podcast.init started in 2015, and your first episode of Data Engineering was published January 14, 2017. Walk us through the start of these podcasts. Why not a data science podcast? Why DE? You've published 306 shows of the Data Engineering Podcast, plus 370 for the init podcast, and now you've got a new ML podcast. How have you kept the motivation over the years? What's the process for the show (finding guests, topics, etc. … recording, publishing)? It's a lot of work. Walk us through this process. You've done a ton of shows and have a lot of context on what's going on in the field of both data engineering and Python. What have been some of the…
Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16
2018-01-29 · 03:00
Summary: Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. In order to provide a simpler way to distribute and version data sets among collaborators, the Dat Project was created. In this episode Danielle Robinson and Joe Hand explain how the project got started, how it functions, and some of the many ways that it can be used. They also explain the plans that the team has for upcoming features and uses that you can watch out for in future releases.
Preamble: Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. When you're ready to launch your next project you'll need somewhere to deploy it. Check out Linode at dataengineeringpodcast.com/linode and get a $20 credit to try out their fast and reliable Linux virtual servers for running your data pipelines or trying out the tools you hear about on the show. Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production, and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today. Enterprise add-ons and professional support are available for added peace of mind. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page which is linked from the site. To help other people find the show you can leave a review on iTunes, or Google Play Music, and tell your friends and co-workers. A few announcements: There is still time to register for the O'Reilly Strata Conference in San Jose, CA, March 5th-8th. Use the link dataengineeringpodcast.com/strata-san-jose to register and save 20%. The O'Reilly AI Conference is also coming up. Happening April 29th to the 30th in New York, it will give you a solid understanding of the latest breakthroughs and best practices in AI for business. Go to dataengineeringpodcast.com/aicon-new-york to register and save 20%. If you work with data or want to learn more about how the projects you have heard about on the show get used in the real world, then join me at the Open Data Science Conference in Boston from May 1st through the 4th. It has become one of the largest events for data scientists, data engineers, and data-driven businesses to get together and learn how to be more effective. To save 60% off your tickets, go to dataengineeringpodcast.com/odsc-east-2018 and register. Your host is Tobias Macey and today I'm interviewing Danielle Robinson and Joe Hand about the Dat Project, a distributed data sharing protocol for building applications of the future.
Interview: Introduction. How did you get involved in the area of data management? What is the Dat project and how did it get started? How have the grants to the Dat project influenced the focus and pace of development that was possible? Now that you have established a non-profit organization around Dat, what are your plans to support future sustainability and growth of the project? Can you explain how the Dat protocol is designed and how it has evolved since it was first started? How does Dat manage conflict resolution and data versioning when replicating between multiple machines? One of the primary use cases that is mentioned in the documentation and website for Dat is that of hosting and distributing open data sets, with a focus on researchers. How does Dat help with that effort and what improvements does it offer over other existing solutions? One of the difficult aspects of building a peer-to-peer protocol is that of establishing a critical mass of users to add value to the network. How have you approached that effort and how much progress do you feel that you have made? How does the peer-to-peer nature of the platform affect the architectural patterns for people wanting to build applications that are delivered via Dat, vs. the common three-tier architecture oriented around persistent databases? What mechanisms are available for content discovery, given the fact that Dat URLs are private and unguessable by default? For someone who wants to start using Dat today, what is involved in creating and/or consuming content that is available on the network? What have been the most challenging aspects of building and promoting Dat? What are some of the most interesting or inspiring uses of the Dat protocol that you are aware of?
Contact Info: Dat: datproject.org, Email, @dat_project on Twitter, Dat Chat. Danielle: Email, @daniellecrobins. Joe: Email, @joeahand on Twitter.
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links: Dat Project Code For Science and Society Neuroscience Cell Biology OpenCon Mozilla Science Open Education Open Access Open Data Fortune 500 Data Warehouse Knight Foundation Alfred P. Sloan Foundation Gordon and Betty Moore Foundation Dat In The Lab Dat in the Lab blog posts California Digital Library IPFS Dat on Open Collective – COMING SOON! ScienceFair Stencila eLIFE Git BitTorrent Dat Whitepaper Merkle Tree Certificate Transparency Dat Protocol Working Group Dat Multiwriter Development – Hyperdb Beaker Browser WebRTC IndexedDB Rust C Keybase PGP Wire Zenodo Dryad Data Sharing Dataverse RSync FTP Globus Fritter Fritter Demo Rotonde how to Joe's website on Dat Dat Tutorial Data Rescue – NYTimes Coverage Data.gov Libraries+ Network UC Conservation Genomics Consortium Fair Data principles hypervision hypervision in browser
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.
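The links above include Merkle Tree, the structure that makes versioned, peer-shared data verifiable. The sketch below is a generic Merkle-root construction over data chunks, purely to illustrate that idea; it is not Dat's actual hashing scheme or wire format.

```python
# Generic Merkle tree over data chunks: hash each chunk, then pairwise-combine
# hashes up to a single root. Illustrative only; not Dat's hypercore format.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(chunks: list[bytes]) -> bytes:
    """Hash each chunk, then combine hashes level by level into one root."""
    level = [h(c) for c in chunks]
    while len(level) > 1:
        if len(level) % 2:               # duplicate the last hash on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

v1 = [b"chunk-0", b"chunk-1", b"chunk-2"]
v2 = [b"chunk-0", b"chunk-1 (edited)", b"chunk-2"]

# Two versions of a dataset yield different roots: readers can verify chunks
# against a trusted root, and peers can tell which chunks changed.
print(merkle_root(v1).hex()[:16], merkle_root(v2).hex()[:16])
```

Because the root commits to every chunk hash, a downloader can check integrity chunk by chunk and fetch only the pieces that differ between versions, which is the general property protocols like Dat build on.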
Joe Celko – author
Joe Celko's SQL Puzzles and Answers, Second Edition, challenges you with his trickiest puzzles and then helps solve them with a variety of solutions and explanations. Author Joe Celko demonstrates the thought processes involved in attacking a problem from an SQL perspective, helping advanced database programmers solve the puzzles they frequently face. These techniques not only help with the puzzle at hand, but also help develop the mindset needed to solve the many difficult SQL puzzles you face every day. This updated edition features many new puzzles, dozens of new solutions to puzzles, and new chapters on temporal query puzzles and common misconceptions about SQL and RDBMS that lead to problems. This book is recommended for database programmers with a good knowledge of SQL. A great collection of tricky SQL puzzles with a variety of solutions and explanations. Uses the proven format of puzzles and solutions to provide a user-friendly, practical look into SQL programming problems, many of which will help users solve their own problems. New edition features: many new puzzles; dozens of new solutions to puzzles, using features in SQL-99; code edited to conform to SQL style rules; a new chapter on temporal query puzzles; and a new chapter on common misconceptions about SQL and RDBMS that lead to problems.
Event: O'Reilly Data Engineering Books
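The blurb mentions temporal query puzzles. Purely as a flavor-of-the-genre illustration (not a puzzle from the book), here is a tiny overlapping-date-ranges query run through Python's built-in sqlite3; the bookings table and its data are invented.

```python
# A small puzzle in the spirit of temporal SQL puzzles (illustrative, not from
# the book): given room bookings, find pairs of bookings for the same room
# whose date ranges overlap.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (id INTEGER PRIMARY KEY, room TEXT, start_date TEXT, end_date TEXT);
INSERT INTO bookings VALUES
  (1, 'A', '2024-01-01', '2024-01-05'),
  (2, 'A', '2024-01-04', '2024-01-08'),
  (3, 'A', '2024-01-09', '2024-01-10'),
  (4, 'B', '2024-01-01', '2024-01-03');
""")

# Treating ranges as half-open [start, end): two ranges overlap exactly when
# each one starts before the other ends.
overlaps = conn.execute("""
SELECT b1.id, b2.id
FROM bookings AS b1
JOIN bookings AS b2
  ON b1.room = b2.room
 AND b1.id < b2.id                  -- report each pair once
 AND b1.start_date < b2.end_date
 AND b2.start_date < b1.end_date
""").fetchall()

print(overlaps)   # [(1, 2)]
```

The self-join with a symmetric "each starts before the other ends" condition is the standard interval-overlap test; whether end dates count as inclusive is the kind of detail these puzzles turn on.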