talk-data.com

People (137 results)

Showing 5 results

Activities & events

Title & Speakers Event
Dan Sotolongo – guest @ Snowflake , Tobias Macey – host

Summary In this episode of the Data Engineering Podcast Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He explains how delayed view semantics can address these challenges by maintaining up-to-date results with minimal work, leveraging Snowflake's dynamic tables feature. The conversation also explores the broader landscape of data processing, comparing batch and streaming systems, and highlights the trade-offs between them. Dan emphasizes the need for a unified theoretical framework to discuss semantic guarantees in data pipelines and introduces the concept of delayed view semantics, touching on the limitations of current systems and the potential of dynamic tables to simplify complex data workflows.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Dan Sotolongo about the challenges of incremental data processing in warehouse environments and how delayed view semantics help to address the problem.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by defining the scope of the term "incremental data processing"?
  • What are some of the common solutions that data engineers build when creating workflows to implement that pattern?
  • What are some common difficulties that they encounter in the pursuit of incremental data?
  • Can you describe what delayed view semantics are and the story behind it?
  • What are the problems that DVS explicitly doesn't address?
  • How does the approach that you have taken in Delayed View Semantics compare to systems like Materialize, Feldera, etc.?
  • Can you describe the technical architecture of the implementation of Dynamic Tables?
  • What are the elements of the problem that are as-yet unsolved?
  • How has the implementation changed/evolved as you learned more about the solution space?
  • What would be involved in implementing the delayed view semantics pattern in other DBMS engines?
  • For someone who wants to use DVS/Dynamic Tables for managing their incremental data loads, what does the workflow look like?
  • What are the options for being able to apply tests/validation logic to a dynamic table while it is operating?
  • What are the most interesting, innovative, or unexpected ways that you have seen Dynamic Tables used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Dynamic Tables/Delayed View Semantics?
  • When are Dynamic Tables/DVS the wrong choice?
  • What do you have planned for the future of Dynamic Tables?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Delayed View Semantics: Presentation Slides, Snowflake, NumPy, IPython, Jupyter, Flink, Spark Streaming, Kafka, Snowflake Dynamic Tables, Airflow, Dagster, Streaming Watermarks, Materialize, Feldera, ACID, CAP Theorem, Linearizability, Serializable Consistency, SIGMOD, Materialized Views, dbt, Data Vault, Apache Iceberg, Databricks Delta, Hudi, Dead Letter Queue, pg_ivm, Property Based Testing, Iceberg V3 Row Lineage, Prometheus

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Data Engineering Data Management Datafold Python Snowflake Data Streaming
Dan Bruckner – co-founder and CTO @ Tamr , Tobias Macey – host

Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, as in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.
  • Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business.

Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
  • How does that reconciliation relate to the practice of "master data management"?
  • What are the scaling challenges with the current set of practices for reconciling data?
  • ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?
  • What (if any) transformative capabilities do LLMs introduce?
  • What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
  • What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
  • What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
  • What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
  • When is ML/AI the wrong choice for data cleaning/reconciliation?
  • What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?

Contact Info
  • LinkedIn

Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
Tamr, Master Data Management, CERN, LHC, Michael Stonebraker, Conway's Law, Expert Systems, Information Retrieval, Active Learning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Collibra Data Engineering Data Management Datafold LLM Master Data Management Python
Dan Weeks – Co-creator of Apache Iceberg, Co-founder & CTO, Tabular

Analytic databases are quietly going through an unprecedented transformation. Open table formats, led by Apache Iceberg, enable multiple query engines to share one central copy of a table. This will fundamentally change the data industry by freeing data that’s being held hostage by siloed data vendors. In this session, hear from Dan Weeks, co-creator of Apache Iceberg and co-founder & CTO of Tabular, as he covers the origins and basics of open table formats and shows how these new capabilities are shaping the future of both open-source compute projects and commercial data warehouses. You will learn key advice for building a data architecture that makes data more accessible to all data practitioners while avoiding vendor lock-in.

Iceberg
Data Universe 2024
nPlan's ML Paper Club 2024-03-07 · 12:30

This week Vahan will be talking us through World Model on Million-Length Video And Language With RingAttention by Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky and Douwe Kiela.

We look forward to seeing you there, and please keep your eyes peeled for details on next week's in-person Paper Club!

Want to learn more about Paper Club?

  • We discuss a different research paper every week. We post each week's paper in our GitHub repo - please read it before the meetup.
  • All events will be hosted on a Google Meet video call. Once a month we also host an in-person event in our London office - watch this space for updates.
  • All recorded presentations can be found on our YouTube channel (don't forget to subscribe!).
nPlan's ML Paper Club
Jason Joven – host @ Chartmetric

Highlights
NSYNC performs at Coachella w/ Ariana Grande and Michael Jackson’s legacy deals with Leaving Neverland...but does this affect their music data?

Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date
This is your Data Dump for Wednesday April 17th 2019.

Legacy acts in the spotlight
Most of the time, music data is all about the frontline releases, the next emerging artists and global superstars...but what about legacy acts?

Loosely defined, legacy acts are any artists that have had a successful career and have since left their glory days, yet still hold sway over the general public.

In this sense, late 90s/early 2000s American boy band NSYNC and the late Michael Jackson fit this definition.

But sometimes, the work of such acts bubbles up again for one reason or another, and sometimes the results are good, and sometimes not so much.

Exhibit 1: Just this past Sunday, reigning American pop queen Ariana Grande invited NSYNC on stage (minus Justin Timberlake) to perform a few of their hits as part of her headlining set. The various teasers leading up to the event have given way to performance reviews on all the music outlets, and while the effect is diluted on Ms. Grande’s red-hot career, how does this affect the former group that hasn’t released original material since 2001?

Legacy acts on streaming services are an odd juxtaposition of the old and the new, but NSYNC are enjoying streaming metrics that would otherwise be great for an up-and-coming act.

With 6.1M Spotify monthly listeners and 914K followers, they have a listener-to-follower ratio of 6.7, putting them ahead of Charli XCX and even Billie Eilish. This actually makes a lot of sense for the group, because a high ratio is usually the result of a highly loyal but small following with little to no marketing reach…and a now-defunct yet hugely famous 2000s boy band pretty much fits that bill to a T.

In terms of immediate effects observed, they’re pretty much nil: no major editorial playlists on Spotify, Apple, Amazon or Deezer added NSYNC records, and while their Spotify daily follower count jumped roughly 50%, it was only an additional 600 or so followers from their norm.

If anything, their Twitter daily followers jumped 10x after Sunday and their Instagram daily followers popped 15x their norm, which makes sense given the very Instagrammable nature of Coachella, but already there seem to be no long-term effects.

Now while there was a fun, no-strings-attached nature to the one-time Coachella performance, Michael Jackson’s legacy has recently taken a turn for the not-so-flattering.

At the beginning of March, HBO released a documentary called Leaving Neverland directed by British filmmaker Dan Reed, which focuses on the testimonies of two now-grown men who were allegedly sexually abused as children by the former King of Pop.

Neither traditional nor social media were quiet about the exposé, but nevertheless, Michael Jackson’s music data profile doesn’t seem to have experienced much of any difference: his Spotify daily follower patterns show no real changes since March and his monthly listener count slipped slightly from 22.3M at the beginning of the month to 21.5M currently. This metric is largely buoyed by Drake’s sampling of Jackson in the track “Don’t Matter to Me” on Drake’s juggernaut album Scorpion.

After Leaving Neverland’s release, Jackson’s YouTube daily channel subscribers only briefly rose to twice his average, then dropped to half his average, before returning to normal, and his Wikipedia page views peaked at 6x his daily norm before returning to his average of about 30K views a few weeks later.

What may be most interesting is how radio airplay has reacted: 300 of the most influential US radio stations collectively went from spinning Jackson’s music roughly 100-150 times a day during the holiday months of Nov/Dec last year to just 10 spins a day as of early April.

Because stations have limited airtime and a more localized connection to their listeners, this might create more accountability and a need to insulate themselves from angry listeners revolted by the documentary.

All in all, some say that in show business, “any publicity is good publicity”, but from a music data perspective, at least for these artists, maybe it should be “any publicity doesn’t affect our legacy much.”

Outro
That’s it for your Daily Data Dump for Wednesday April 17th 2019. This is Jason from Chartmetric.

Free accounts are at chartmetric.io/signup

And article links and show notes are at a new website: podcast.chartmetric.com.

Happy Wednesday, see you tomorrow!
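
The ratios and percentages quoted in this episode are simple arithmetic, and a short Python sketch makes them easy to re-check. The function names below are hypothetical helpers introduced only for illustration (they are not part of any Chartmetric API), and the inputs are the rounded figures quoted in the episode, so the results are approximate.

```python
def listener_to_follower_ratio(monthly_listeners: float, followers: float) -> float:
    """Spotify monthly listeners divided by followers (hypothetical helper)."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return monthly_listeners / followers


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (hypothetical helper)."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100


def implied_baseline(extra: float, jump_percent: float) -> float:
    """Baseline implied by an absolute increase and its percentage jump."""
    if jump_percent == 0:
        raise ValueError("jump_percent must be non-zero")
    return extra / (jump_percent / 100)


if __name__ == "__main__":
    # NSYNC: 6.1M monthly listeners and 914K followers, as quoted.
    print(round(listener_to_follower_ratio(6_100_000, 914_000), 1))  # 6.7

    # Michael Jackson: monthly listeners from 22.3M to 21.5M, as quoted.
    print(round(percent_change(22_300_000, 21_500_000), 1))  # -3.6

    # NSYNC: about 600 extra daily followers described as a roughly 50% jump.
    print(implied_baseline(600, 50))  # 1200.0
```

Read this way, the NSYNC follower bump corresponds to a norm of roughly 1,200 new followers a day, and Jackson's monthly listener count moved by under 4%, which is consistent with the episode's point that neither event shifted the streaming numbers much.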

Marketing Data Streaming
How Music Charts