Topic: Data Streaming

Tags: realtime, event_processing, data_flow

Activity trend: peak of 70 per quarter, 2020-Q1 to 2026-Q1

739 activities · Newest first
Summary

Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms that incorporates the lessons learned at Databricks from countless customer use cases. In this episode Michael Armbrust, the lead architect of Delta Lake, explains how the project is designed, how you can use it for building a maintainable data lake, and some useful patterns for progressively refining the data in your lake. This conversation was useful for getting a better idea of the challenges that exist in large scale data analytics, and the current state of the tradeoffs between data lakes and data warehouses in the cloud.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

And to keep track of how your team is progressing on building new pipelines and tuning their workflows, you need a project management system designed by engineers, for engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Data Engineering Podcast listeners get 2 months free on any plan by going to dataengineeringpodcast.com/clubhouse today and signing up for a free trial. Support the show and get your data projects in order!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Michael Armbrust about Delta Lake, an open source storage layer that brings ACID transactions to Apache Spark and big data workloads.

Interview

Introduction How did you get involved in the area of data m

Highlights

It’s time to hit the road again, so we’re heading down south to trigger city São Paulo, Brazil. What makes it such an important global music marketplace?

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Thursday, June 13th, 2019.

Excursion Thursday: Trigger City São Paulo, Brazil

We’re hitting the road again, heading down south to trigger city São Paulo, Brazil, to see what makes it such an important global music marketplace. First, it’s important to note that São Paulo is also a state in Brazil: naturally, the state in which São Paulo, the city, is located. Obviously, this presents some major metadata problems, which are compounded by the fact that “São Paulo” (with a tilde) and “Sao Paulo” (without a tilde) are reported as different cities. Adjusting for metadata errors, the city, which is Brazil’s wealthiest and most populous, is ranked third in the world for non-unique monthly Spotify listeners, based on our calculations from a week in May. For that same week, São Paulo came in ninth for global YouTube views. They’re really living up to their city motto, “I am not led; I lead.”

It’s not just local artists and the longstanding sertanejo style, updated for younger people, skyrocketing São Paulo with regional streams. Scanning our top artists charts, the city comes up on three of the Top 10 artists (namely J Balvin, Justin Bieber, and Shawn Mendes) as somewhere people listen most. Of the Top 100 artists globally according to our Cross-Platform Performance metric, São Paulo is in the Top 5 listener cities for 26, or just a bit more than a quarter, of them.

Zooming in a bit and looking at Top Artists by Spotify Monthly Listeners on São Paulo’s city page, Brazilian artists do tend to dominate, with the 10 most listened-to artists, except for Lady Gaga, calling Brazil home. On Top Artists by YouTube Views, the Top 10 are all Brazilian as well, but when it comes to Top Artists by Shazam Chart Occurrences, only two Brazilians make the Top 10, suggesting São Paulo locals are loyal to their countrymen and countrywomen on major streaming platforms, but Shazam is where they learn what’s happening in the Anglo music world. And they certainly have an ear for British and American hits like “Giant” by Calvin Harris and Rag ‘n’ Bone Man or “Happier” by Marshmello and Bastille.

With a population comparable to New York City and Los Angeles combined, São Paulo tops each of those cities on the global stage, thanks to a musical ecosystem (not to mention tradition) as robust as the Amazon rainforest and an appetite for pop hits from their neighbors on the northern side of the Tropic of Cancer.

Outro

That’s it for your Daily Data Dump for Thursday, June 13th, 2019. This is Rutger from Chartmetric. If you’re interested in learning more about trigger cities, check out Jason’s in-depth analysis on our blog at blog.chartmetric.io. Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Thursday, and see you tomorrow!
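The metadata problem described in this episode, where “São Paulo” and “Sao Paulo” are reported as different cities, can be illustrated with a small sketch. This is not Chartmetric’s method, which the episode does not describe; it is one common approach, assuming accent-insensitive matching is acceptable. The function names and listener counts below are hypothetical. Note that it does nothing for the separate city-versus-state ambiguity, which needs extra context such as a region code.

```python
import unicodedata
from collections import defaultdict


def normalize_city(name: str) -> str:
    """Build a comparison key: accents stripped, case folded, whitespace collapsed."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_marks.casefold().split())


def merge_city_counts(rows):
    """Sum listener counts for rows whose city names share a normalized key."""
    totals = defaultdict(int)
    for city, listeners in rows:
        totals[normalize_city(city)] += listeners
    return dict(totals)


# Hypothetical counts, for illustration only (not Chartmetric data).
rows = [("São Paulo", 100), ("Sao Paulo", 40), ("Rio de Janeiro", 70)]
merged = merge_city_counts(rows)
print(merged)
```

Stripping accents is a blunt tool: it would also merge two genuinely different places whose names differ only by a diacritic, so a production pipeline would more likely key on a stable city identifier.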

Highlights

Special interview episode today: Does data science scare you? Does it keep you up at night when you hear or read about it at a panel or on some podcast, and you think to yourself, “I have no idea what they are talking about”? Rest easy and let Chartmetric’s Resident Data Scientist assuage your fears. How do you measure artist success across multiple streaming, social and other Internets platforms? We might have something for you.

Mission

Good morning, it’s Jason and Josh here at Chartmetric, usually with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Wednesday, June 12th, 2019.

Interview Outline

What is Cross-Platform Performance scoring and ranking on Chartmetric?
Josh’s blog article / CPP explanation
CPP measurements
Stage: This is the amount of “reach” or “exposure” that an artist has over audiences. The bigger the stage, the more people actively listening, watching, or consuming what the artist is creating.
Followers: This is the size of an artist’s “fanbase” or an artist’s “stickiness” with audiences. Followers have opted into tracking an artist and therefore are more likely to re-engage with the artist’s products in the future. Followers are not actively engaging with an artist all the time, but artists have an easier job of connecting with followers than non-followers.
Cool CPP video to visualize the data science (made by Graphic & Motion Design Artist Anastasiya Bulavkina)
Philosophical debate: what is “best” nowadays?
Is there a way for people to reach out to you on the Interwebs, Josh?
Josh’s LinkedIn profile
hi (at) chartmetric (dot) com

Outro

That’s it for your Daily Data Dump for Wednesday, June 12th, 2019. This is Josh and Jason from Chartmetric. Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com. Happy Wednesday, see you tomorrow!

2019-06-11 // Technique Tuesday: DJ Khaled vs. Tyler, the Creator

Highlights

DJ Khaled is taking on Billboard’s charting calculations and Tyler, the Creator is caught in the crossfire. So, how do the two artists stack up in the streaming world?

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Tuesday, June 11th, 2019.

Technique Tuesday: DJ Khaled vs. Tyler, the Creator

Yesterday, Music Business Worldwide and Pitchfork reported that DJ Khaled, who just released his new album “Father of Asahd,” is taking on Billboard’s charting methods following the album’s No. 2 placement on the Billboard 200 Albums chart behind Tyler, the Creator’s “Igor.” At the heart of the issue is a discrepancy in physical album sales due to the practice of bundling, or wrapping up the sale of an album with the sale of merchandise.

Here are the numbers: Billboard credited Tyler, the Creator with 165,000 total album sales for the week, and DJ Khaled with 137,000. For “Igor,” that’s 74,000 physical albums sold, 90,000 Streaming Equivalent Albums (SEA) and 1,000 Track, or download, Equivalent Albums (TEA). For “Father of Asahd,” the same breakdown came out to 35,000 physical, 95,000 SEA, and 7,000 TEA.

So far, the ostensibly arbitrary SEA measurement isn’t DJ Khaled’s issue here, but if he really is pursuing a lawsuit, then Billboard’s charting methods for streams could come under scrutiny as well. Here’s how they’re calculating it: According to the New York Times’ Ben Sisario, four years ago, 1,500 streams equalled one physical album sale, but Billboard’s new method comes out to 1,250 for paid streams and 3,750 for free streams.

We can’t measure differentiated streams for DJ Khaled and Tyler, the Creator according to Billboard’s new method, but we can use our Analyze function to visually compare the changes in their monthly Spotify listeners on a custom chart. While Tyler starts off at an estimated 6.5 million monthly listeners, DJ Khaled is at an estimated 18 million around the release of their albums on May 17. By the end of the week, Tyler has crossed the 10 mil threshold and DJ Khaled has racked up an estimated 20.7 mil. This means Tyler experienced a more than 50% growth rate in monthly listeners and DJ Khaled only around 15% for their album debut week ending on May 23.

However, DJ Khaled still ends up with around twice as many monthly Spotify listeners for the week. Does this translate to what Billboard calculated as each artist’s SEA? That’s difficult to say, because each unique monthly listener only gets counted once for every 28-day period, no matter how many times they play a track.

While DJ Khaled is more exposed on the playlist front, Tyler saw a bigger gain in monthly listeners during their album release week. Tyler also overtook Khaled’s Spotify Popularity Index score with a 92, vs. Khaled’s 88, out of 100. Based on this data, Tyler’s “Igor” is complementing his catalogue and driving more of his streaming collectively, while DJ Khaled’s success depends on a handful of mega hits.

It’s a cult hip-hop icon vs. a Top 40 superstar, but DJ Khaled, with some 2.8 billion YouTube video views for the week in question, compared to just 442 million for Tyler, the Creator, shouldn’t have too much to complain about.

Outro

That’s it for your Daily Data Dump for Tuesday, June 11th, 2019. This is Rutger from Chartmetric. Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com. Happy Tuesday, see you tomorrow!
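The arithmetic in this episode can be checked with a short sketch. It assumes, as the episode’s breakdown implies, that a chart total is simply physical sales plus SEA plus TEA, and that SEA is streams divided by the quoted conversion rates (1,250 paid or 3,750 free streams per album). The function names and the example stream counts are hypothetical; the sales and listener figures are the ones quoted above.

```python
def album_units(physical: int, sea: int, tea: int) -> int:
    """Total album-equivalent units: physical sales plus SEA plus TEA."""
    return physical + sea + tea


def streaming_equivalent_albums(paid_streams: int, free_streams: int,
                                paid_per_album: int = 1250,
                                free_per_album: int = 3750) -> float:
    """Convert stream counts to album equivalents using the quoted rates."""
    return paid_streams / paid_per_album + free_streams / free_per_album


def growth_rate(start: float, end: float) -> float:
    """Relative change from start to end, as a fraction of start."""
    return (end - start) / start


igor_units = album_units(74_000, 90_000, 1_000)
khaled_units = album_units(35_000, 95_000, 7_000)

# Monthly Spotify listeners in millions, start and end of release week.
tyler_growth = growth_rate(6.5, 10.0)
khaled_growth = growth_rate(18.0, 20.7)

# Hypothetical stream counts, only to show how the conversion works.
example_sea = streaming_equivalent_albums(paid_streams=2_500_000,
                                          free_streams=3_750_000)

print(igor_units, khaled_units)
print(f"{tyler_growth:.1%} {khaled_growth:.1%}")
print(example_sea)
```

The two totals reproduce the 165,000 and 137,000 figures, and the growth rates agree with the “more than 50%” and “around 15%” quoted in the episode.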

Highlights

Shazam isn’t just in the music fingerprinting and identification game; it’s also playlisting on Apple Music with Shazam Recommends: Best New Music.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Monday, June 10, 2019.

New Music Friday Monday: Shazam’s Best New Music Recommendations … on Apple Music?

Did you know that Shazam moonlights as a playlisting curator on Apple Music? It’s currently managing 11 official playlists, including a mix of prediction-oriented genre-based ones like “Shazam Risers: Latin” or “Shazam Risers: K-Pop”, and exclusive celebrity playlists from the likes of David Guetta or BLACKPINK. Interestingly, Shazam also runs the “Shazam Recommends: The Best New Music” playlist, which is refreshed primarily on Fridays and Saturdays.

Apple acquired Shazam in September of 2018, though we have Shazam playlists dating back to early 2017. So now that Shazam is officially an Apple asset, it’s likely Apple Music is incorporating Shazam’s unique dataset as a way to predict future hits... but does it actually work?

If we compare last week’s Shazam “Best New Music” playlist with the Apple Music Top 100 charts today, we can see whether, at least within the Apple Music platform, those predictions actually come true. After some quick spot checks, the Shazam “Best New Music” playlist is actually global: it’s the same tracks and ordering no matter which country storefront you’re listening from. So the best comparison would naturally be the Apple Music Top 100 Global chart.

The last Shazam “Best New Music” playlist was updated on June 1st, and comparing it to today’s Apple Top 100 global chart, there are actually four tracks in common:

“The London” by Young Thug at #2 on the Top 100
“Cross Me” by Ed Sheeran at #28
“Don’t Call Me Up” by Mabel at #59
“Easier” by 5 Seconds of Summer at #66 of the Top 100

So out of last week’s “Best New Music” playlist, 4 out of the 24 total tracks ended up charting one week later, about 17%. Pretty cool.

Now, cross-checking last week’s “Best New Music” playlist against Shazam’s own Top 200 chart, which is independent of the Apple platform, the same, and only the same, four tracks pop up: the ones from Young Thug, Ed Sheeran, Mabel and 5 Seconds of Summer.

That’s interesting because tracks that appear only on Apple’s Top charts are subject to Apple algorithms and other playlists, while Shazam’s Top charts depend on tracks being played in public spaces and on people having the app and Shazaming them. But finding the same track on both charts must really mean that the track is achieving a kind of success both in user curiosity and in actual streaming activity on one of the biggest platforms in the world.

Now, what feeds Shazam’s “Best New Music” playlist in the first place, given that these are all new releases and so Shazam doesn’t really have any data on them? Well, we don’t know either. Maybe they are doing granular music analysis on the song waveforms or maybe it’s just a result of traditional playlist pitching, but what we can do with the data is see which ones stick. Just give it a week!

Outro

That’s it for your Daily Data Dump for Monday, June 10, 2019. This is Jason from Chartmetric. Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com. Happy Monday, see you tomorrow!
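The playlist-versus-chart comparison in this episode amounts to a set overlap plus a hit rate. The sketch below assumes that framing; it is not Chartmetric’s tooling. The four track titles and chart positions are the ones quoted in the episode, while the other 20 playlist entries are placeholders because the episode does not name them, and the chart dictionary holds only the known positions rather than the full Top 100.

```python
def charting_tracks(playlist, chart_positions):
    """Return (track, position) pairs for playlist tracks on the chart, best position first."""
    hits = [(track, chart_positions[track]) for track in playlist if track in chart_positions]
    return sorted(hits, key=lambda pair: pair[1])


def hit_rate(hit_count: int, playlist_size: int) -> float:
    """Share of playlist tracks that ended up on the chart."""
    return hit_count / playlist_size


# Named tracks come from the episode; the rest are placeholders.
playlist = ["The London", "Cross Me", "Don't Call Me Up", "Easier"] + [
    f"unnamed track {i}" for i in range(1, 21)
]

# Only the positions quoted in the episode; a real chart has 100 entries.
chart_positions = {"The London": 2, "Cross Me": 28, "Don't Call Me Up": 59, "Easier": 66}

hits = charting_tracks(playlist, chart_positions)
rate = hit_rate(len(hits), len(playlist))
print(hits)
print(f"{rate:.1%}")
```

With 4 hits out of 24 tracks the rate works out to roughly 16.7%.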

Summary

Building a machine learning model can be difficult, but that is only half of the battle. Having a perfect model is only useful if you are able to get it into production. In this episode Stepan Pushkarev, founder of Hydrosphere, explains why deploying and maintaining machine learning projects in production is different from regular software projects and the challenges that they bring. He also describes the Hydrosphere platform, and how the different components work together to manage the full machine learning lifecycle of model deployment and retraining. This was a useful conversation to get a better understanding of the unique difficulties that exist for machine learning projects.

Announcements

Your host is Tobias Macey and today I’m interviewing Stepan Pushkarev about Hydrosphere, the first open source platform for Data Science and Machine Learning Management automation.

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Hydrosphere is and share its origin story? In your experience, what are the most challenging or complicated aspects of managing machine learning models in a production context?

How does it differ from deployment and maintenance

Highlights

It’s Found on Friday, and we’re using Spotify playlist adds and reach to introduce you to a tropical DJ from Spain, an American lo-fi beats producer and an Irish singer-songwriter with literary flair.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Friday, June 7th, 2019.

Found on Friday: Playlist Reach Uncovers a Galician DJ, an American lo-fi beats producer and an Irish Literary Songwriter

It’s Found on Friday, which means we are digitally crate-digging for new artists in the proverbial streaming record shops of the Internets, and this time through the lens of “reach”.

In the world of social media, reach is the unique number of people who see a particular piece of content, and we can contrast that with “impressions”, which are the total number of times they see that content, and “engagement”, which is the number of interactions those audience members actively take upon that content.

In Spotify’s streaming world, reach in one sense is obviously playlisting, and we can aggregate how many followers a particular playlist has, and at the artist level, aggregate how many total playlist followers that artist has at any given point. These of course are non-unique follower counts, as we all are probably following dozens if not hundreds of playlists from each of our single profiles. Nevertheless, it’s still a measure of reach, and that can be an important metric for determining which artists are in a great position to break.

Ranked by number of new popular playlist adds in the past 30 days, Spanish DJ Zeper occupies the #1 spot today. From Pontevedra, Galicia, the young producer has a very accessible tropical dance sound with Majestic Casual vibes that would easily fit in any college student’s chillout or study playlist. Currently on 50 playlists with 10K or more followers, Zeper’s total playlist reach is over 2.8M followers, growing over 45K total followers since last week. His latest release was “Stop” on May 31st, a collaboration with another emerging artist, KRIMETZ.

Now added on an additional 39 playlists with over 10K followers each is American artist Hurley Mower. With his polished take on the lo-fi beats genre, Mower gained nearly another 30K aggregated playlist followers in the past week, bringing him over the 2M mark. With 207K monthly listeners and only 5.3K followers on his own Spotify profile, he’s got a listener to follower ratio of 38, which definitely puts him well into the promising artist category for that metric.

Last but not least is Jealous of the Birds. Such an interesting name. On 5 playlists with more than 10K followers, the Irish singer-songwriter has over 767K total playlist followers, including Spotify’s Evening Acoustic playlist in the 84/100 spot and the Sad Indie playlist in the 60/80 position. She’s no stranger to attention, however; her previous tracks have been featured on NPR’s All Songs Considered and BBC Radio 1’s Tune of the Week.

No matter what your vibe, there are some new artists hanging out on your smartphone. Check them out this weekend!

Outro

That’s it for your Daily Data Dump for Friday, June 7th, 2019. This is Jason from Chartmetric.

Do you like this podcast? Does it help your day? If so, this is the part where we grovel at your feet for an iTunes rating or review... we are a business to business podcast, so it’s not like we’re trying to blow up, but if we can grow our audience some more to maybe start a music data interest community, we think that could be a really cool thing. So if you like what we do, please give us a shout-out on iTunes. If you’re on an iPhone, just scroll all the way down on the Daily Data Dump page in your Apple Podcasts app or in the Ratings and Review tab in your iTunes app on your laptop, and show some love. Rutger and I will do a silent happy dance for every star that we get.

Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com. Happy Friday, have a great weekend, and see you on Monday!
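The metrics defined in this episode (reach as unique viewers, impressions as total views, and the listener-to-follower ratio) can be sketched in a few lines. The function names and the sample view log are hypothetical; the listener and follower counts are the rounded figures quoted for Hurley Mower.

```python
def reach_and_impressions(view_events):
    """view_events holds one user id per view; returns (unique viewers, total views)."""
    events = list(view_events)
    return len(set(events)), len(events)


def listener_to_follower_ratio(monthly_listeners: int, followers: int) -> float:
    """Monthly listeners per profile follower."""
    return monthly_listeners / followers


# Hypothetical view log: user "a" saw the content three times.
reach, impressions = reach_and_impressions(["a", "b", "a", "c", "a"])

# Rounded figures quoted in the episode: 207K listeners, 5.3K followers.
ratio = listener_to_follower_ratio(207_000, 5_300)

print(reach, impressions)
print(round(ratio, 1))
```

With the rounded inputs the ratio comes out near 39 rather than the 38 quoted; the difference is presumably down to the exact, unrounded counts used in the episode.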

Stream Processing with Apache Spark

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs in almost the same way you write batch jobs. Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. This comprehensive guide features two sections that compare and contrast the streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API.

Learn fundamental stream processing concepts and examine different streaming architectures
Explore Structured Streaming through practical examples; learn different aspects of stream processing in detail
Create and operate streaming jobs and applications with Spark Streaming; integrate Spark Streaming with other Spark APIs
Learn advanced Spark Streaming techniques, including approximation algorithms and machine learning algorithms
Compare Apache Spark to other stream processing projects, including Apache Storm, Apache Flink, and Apache Kafka Streams

Summary

Building an ETL pipeline can be a significant undertaking, and sometimes it needs to be rebuilt when a better option becomes available. In this episode Aaron Gibralter, director of engineering at Greenhouse, joins Raghu Murthy, founder and CEO of DataCoral, to discuss the journey that he and his team took from an in-house ETL pipeline built out of open source components onto a paid service. He explains how their original implementation was built, why they decided to migrate to a paid service, and how they made that transition. He also discusses how the abstractions provided by DataCoral allow his data scientists to remain productive without requiring dedicated data engineers. If you are either considering how to build a data pipeline or debating whether to migrate your existing ETL to a service, this is definitely worth listening to for some perspective.


podcast_episode
by Rutger (Chartmetric)

2019-06-04 // Charting the End of iTunes

Highlights

In the wake of Apple’s announcement that it will end the iTunes digital download as we know it, we’re scanning the iTunes Charts to see what, if anything, will be lost.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Tuesday, June 4th, 2019.

Charting the End of iTunes

Today, we’re looking at the U.S. iTunes Charts following Apple’s Worldwide Developer Conference (WWDC) announcement that it will be ending the iTunes digital download as we know it and spinning out the iTunes app into three separate apps for Apple Music, podcasts, and television.

What will that mean for the music you already purchased and downloaded? Rest assured, Apple is making provisions for the digital downloads you already own. The company wouldn’t be ending the iTunes digital download era without good cause: namely, most consumers stream; they don’t mp3 anymore.

That said, what will be lost? We’re gonna walk you through how to figure that out using the iTunes Top 100 Tracks and iTunes Albums charts for U.S. storefronts.

Looking solely at chart position, there’s a lot of correlation between high performing pop downloads and high performing pop streams on Apple’s iTunes and Music apps, respectively. Lil Nas X and Billy Ray Cyrus’ “Old Town Road,” Katy Perry’s “Never Really Over,” Ed Sheeran and Justin Bieber’s “I Don’t Care,” and Billie Eilish’s “Bad Guy” are prime examples.

Differences emerge with different genres, however. At No. 2 on the U.S. iTunes chart for June 3rd is John Rich’s “Shut Up About Politics,” which is nowhere on the Apple Music Daily Tracks chart. Blake Shelton’s “God’s Country,” which is at No. 6 on the U.S. iTunes chart for June 3rd, ranks just 89th on the U.S. Apple Music Daily Tracks. It’s a similar story for Morgan Wallen’s “Whiskey Glasses” at No. 7 on iTunes but No. 71 on Apple Music, and for Luke Combs’ “Beer Never Broke My Heart” at No. 12 on iTunes but No. 64 on Apple Music.

What will this mean for country fans who tend to prefer digital downloads? In 2017, Pandora's chief executive, Tim Westergren, saw promise in converting country listeners into paying subscribers considering how active country fans and artists are on the platform. We’ll see if the end of iTunes chases country fans from Apple to Pandora, but that would still require an adjustment from a download-oriented consumer base to a streaming-oriented consumer base.

iTunes has also been huge for another important segment of the music industry: movie soundtracks. Looking at chart summaries by artist, Elton John and Will Smith have nine and four tracks on the iTunes Top 100, respectively, and it’s all thanks to the recent Elton John biopic, Rocketman, and Guy Ritchie’s live-action Aladdin movie, starring Will Smith as the genie. Jumping over to the iTunes Albums in All Genres chart for June 3rd, the Aladdin soundtrack is at No. 3 and various Elton John albums and/or compilations are scattered across the top 10. Amazingly, the soundtrack for The Greatest Showman, a movie released two years ago, is at No. 9.

While the end of iTunes probably won’t affect income streams for most artists, as the majority of music consumers have largely forgotten about mp3s anyway, for country music stars and artists on movie soundtracks, the end of this era just might sting a little.

Outro

That’s it for your Daily Data Dump for Tuesday, June 4th, 2019. This is Rutger from Chartmetric. Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com. Happy Tuesday, see you tomorrow!
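The download-versus-streaming comparison in this episode can be expressed as a rank gap per track. The sketch below assumes a simple difference between the two chart positions, with a missing streaming position treated as “not charting”; the function name and data layout are hypothetical, while the positions are those quoted for June 3rd.

```python
from typing import Optional


def rank_gap(itunes_rank: int, apple_music_rank: Optional[int]) -> Optional[int]:
    """Positions lost going from the download chart to the streaming chart.

    Returns None when the track is absent from the streaming chart.
    """
    if apple_music_rank is None:
        return None
    return apple_music_rank - itunes_rank


# (iTunes position, Apple Music position) as quoted in the episode.
tracks = {
    "Shut Up About Politics": (2, None),
    "God's Country": (6, 89),
    "Whiskey Glasses": (7, 71),
    "Beer Never Broke My Heart": (12, 64),
}

gaps = {title: rank_gap(*ranks) for title, ranks in tracks.items()}
for title, gap in gaps.items():
    print(title, "not on streaming chart" if gap is None else f"+{gap} positions")
```

A large positive gap marks a track that sells far better as a download than it streams, which is the pattern the episode points out for country titles.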

podcast_episode
by Ichiro Asatsuma (Fujipacific Music) , Jason Joven (Chartmetric)

Highlights

It’s Excursion Thursday, we’re teleporting to Tokyo, Japan, where local music matters for Spotify and Instagram, but not for Shazam. What does that say about public and private listening habits in Tokyo?

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Thursday, May 30th, 2019.

Excursion Thursday

As Japan’s capital and the world’s largest city with a population of around 38 million, Tokyo is the heart of the No. 2 music market in the world.

Despite streaming’s rescue of the global music industry from a $14.6B decline in global revenue since the 2000s, a lot of Japanese listeners simply don’t care, as 71% of their local recorded music revenue in 2018 came from physical sales.

Along with their love of physical music goods, Japan’s consumer base also remains faithful to its local artists. According to Ichiro Asatsuma, Chairman of Fujipacific Music, the breakdown of the country’s physical sales is typically 85-90% Japanese repertoire and 10-15% international.

Now how does this percentage distribution hold up in Tokyo’s digital market? Looking at Top Artists by Spotify Monthly Listeners in the past month, 18 of the top 25 are Japanese, and by recent Instagram Followers, 15 of the Top 25 Artists are also local. But Spotify and Instagram are generally more private platforms when it comes to use, at least in comparison to an audio fingerprinting app like Shazam, which is utilized in a public space like a bar or a club.

So, what’s the Shazam spread look like? Of the 25 Top Artists by Shazam Chart Occurrences in the past month, only three are Japanese.

So recently, locals tend to prefer Japanese artists on Spotify and Instagram, at 72 and 60 percent respectively, but not at quite the same 85-90 percent distribution that Asatsuma suggests for physical.

On Shazam, the preference for Japanese artists bottoms out at only 12% domestic.

This suggests that Tokyo locals are more likely to listen to their fellow countrymen and women when they’re in a personal streaming mode, and that they’re simply more curious about foreign music when they’re in a public environment.

But YouTube, arguably the most “global” platform of this bunch and the 2nd most visited website in the world, seems to have more of a globalizing effect on Tokyo’s use of it. Looking at Top Artists by local YouTube Video Views, only eight of the top 25 are Japanese. Same story when it comes to Top Tracks by local YouTube Views, with just three of the top 10 originating in Japan. That’s a 32 and a 30 percent distribution, respectively, indicating international preference just might increase the more global the streaming platform gets.

Granted, these streaming stats are from the last 28 days, so they’re more current, and also susceptible to fluctuation and recent releases, so if a few Japanese bangers make some great YouTube videos next month, then the numbers might be telling a different story.

Outro

That’s it for your Daily Data Dump for Thursday, May 30th, 2019. This is Jason from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Happy Thursday, see you tomorrow!
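The domestic-share percentages quoted in this episode follow directly from the artist and track counts. A minimal sketch of that arithmetic, with platform labels and the helper name `domestic_share` chosen purely for illustration:

```python
# (Japanese entries, chart size) as quoted in the episode for Tokyo.
counts = {
    "Spotify monthly listeners (artists)": (18, 25),
    "Instagram followers (artists)": (15, 25),
    "Shazam chart occurrences (artists)": (3, 25),
    "YouTube video views (artists)": (8, 25),
    "YouTube views (tracks)": (3, 10),
}


def domestic_share(local_entries: int, chart_size: int) -> float:
    """Percentage of chart entries that are domestic (here, Japanese)."""
    return 100 * local_entries / chart_size


# Map each platform label to its domestic share in percent.
shares = {name: domestic_share(*pair) for name, pair in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct:.0f}% domestic")
```

For comparison, the physical-sales split attributed to Asatsuma is 85-90% domestic, which sits well above every digital figure computed here.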

Highlights

It’s Winner Wednesday, and we’re scanning the top of the SoundCloud and QQ Music charts to see what moods are winning out on two very different streaming platforms.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Wednesday, May 29th, 2019.

Winner Wednesday

Welcome back to this week’s Winner Wednesday, where we’re scanning the SoundCloud and QQ Music charts to see what song valences are winning out on those two very different streaming platforms.

First, what the heck is valence? Think of it as the mood or emotional quality of a track. With high valence songs, there’s going to be more positive or cheerful energy, and low valence songs are going to sound a bit more negative, sad, or angry. In other words, 100 percent valence suggests a song might be the happiest you’ve ever heard. 0 percent valence suggests it’s going to be downright depressing.

Note that we measure valence irrespective of lyrical content, so there’s plenty of potential for a low valence song to have uplifting lyrics, but that’s not typically the case.

Looking at the top of the SoundCloud charts for May 18-24, there’s a clear and unsurprising frontrunner when it comes to genre: hip-hop. In fact, the genre overwhelms the Top 100 consistently, making the Swedish-founded streaming service almost exclusively important to the rap scene. Why does this matter for valence? SoundCloud was crucial for helping niche sub-genres like emo rap and trap, both of which tend to be characterized by melancholy, go mainstream. So much so, in fact, that dark and gritty “SoundCloud rap” has become a genre altogether.

So, is it borne out in the data? For the most part, yes. At No. 1, “Shotta Flow” by NLE Choppa has a 45 percent valence measurement; at No. 3, “Old Town Road” by Lil Nas X is at 47 percent; and if we dip down to No. 4 and No. 5, “Pop Out” by Polo G featuring Lil Tjay is only at 25 percent and “Earfquake” by Tyler, the Creator is only at 41 percent. The outlier here is “Suge” by DaBaby, which is at No. 2 with 85 percent valence.

And that brings us to Chinese streaming service and Tencent subsidiary, QQ Music. Looking at the platform’s Western Music Chart behavior during a similar timeframe, pop and dance are the genre frontrunners, with 50 of 96 songs tagged with those genre identifiers. Here, hip-hop only accounts for eight. With pop and dance frontloading QQ Music’s Western Music Chart, you’d probably expect high valence songs at the top. Would you be right?

“Me!” by Taylor Swift featuring Panic! At the Disco’s Brendon Urie, “Rescue Me” by OneRepublic, and “If I Can’t Have You” by Shawn Mendes hit the high notes here with 66, 64, and 82 percent valence measurements, respectively. But Carly Rae Jepsen and Lana Del Rey, at No. 4 and No. 5, bring out our sensitive side with 37 and 45 percent.

Taking the average valence of the top five on each of these charts gives us an average score of 48.6 percent valence for SoundCloud. QQ Music, meanwhile, is a bit less moody at 58.8 percent valence. So, does SoundCloud have more edge? We can’t say that definitively across the board, but we can say that the top of the SoundCloud Chart is less positively valenced than the top of QQ’s Western Music Chart when it comes to mood, and it’s all in the genres each streaming service caters to, which might suggest something about audience geography. Does China have a bigger appetite for happy pop than Westerners, whose palates are more open to edgy rap?

Outro

That’s it for your Daily Data Dump for Wednesday, May 29th, 2019. This is Rutger from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Have a winning Wednesday, see you tomorrow!
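The two top-five averages quoted in this episode can be reproduced from the per-track valence figures. This is just a sketch of that averaging; the variable names are illustrative, and the values are the percentages read out in the episode, listed in chart order from No. 1 to No. 5.

```python
from statistics import mean

# Valence (percent) of the top five tracks on each chart, in chart order.
soundcloud_top5 = [45, 85, 47, 25, 41]  # Shotta Flow, Suge, Old Town Road, Pop Out, Earfquake
qq_western_top5 = [66, 64, 82, 37, 45]  # Me!, Rescue Me, If I Can't Have You, then No. 4 and No. 5

# Average valence per chart, rounded to one decimal place.
soundcloud_avg = round(mean(soundcloud_top5), 1)
qq_avg = round(mean(qq_western_top5), 1)

# Difference in average valence, in percentage points.
gap_points = round(qq_avg - soundcloud_avg, 1)

print(f"SoundCloud top five: {soundcloud_avg}% average valence")
print(f"QQ Music Western top five: {qq_avg}% average valence")
print(f"Gap: {gap_points} percentage points")
```

With only five tracks per chart, a single outlier such as the 85 percent track at No. 2 on SoundCloud moves the average noticeably, which is one reason to read the comparison as suggestive rather than conclusive.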

Summary Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there’s Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data provenance easier to track, how the platform is architected and used today, and examples of how the underlying principles manifest in the workflows of data engineers and data scientists as they collaborate on data projects. In addition to all of that he also shares his thoughts on their recent round of fund-raising and where the future will take them. If you are looking for a set of tools for building your data science workflows then Pachyderm is a solid choice, featuring data versioning, first class tracking of data lineage, and language agnostic data pipelines.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support. Understanding how your customers are using your product is critical for businesses of any size. To make it easier for startups to focus on delivering useful features Segment offers a flexible and reliable data infrastructure for your customer analytics and custom events. You only need to maintain one integration to instrument your code and get a future-proof way to send data to over 250 services with the flip of a switch. Not only does it free up your engineers’ time, it lets your business users decide what data they want where. 
Go to dataengineeringpodcast.com/segmentio today to sign up for their startup plan and get $25,000 in Segment credits and $1 million in free software from marketing and analytics companies like AWS, Google, and Intercom. On top of that you’ll get access to Analytics Academy for the educational resources you need to become an expert in data analytics for measuring product-market fit. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave

Highlights

It’s Found on Friday, and we’re digging in with our A&R tool to find breaking artists based on YouTube Channel Views, and that’s important, because YouTube is technically the most popular streaming platform in the world.

Mission

Good morning, it’s Rutger again at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Friday, May 24th, 2019.

Found on Friday: Momoiro Clover Z, Yella Beezy & Junip

Welcome back to Found on Friday. We’re digging in with our A&R tool to find artists breaking through the surface in a global way. If we search according to highest growth percentage in YouTube Channel Views, we land on three artists with notable momentum on YouTube right now: Momoiro Clover Z, Yella Beezy, and Junip.

First up is the J-pop group Momoiro Clover Z (let's call them MCZ for short), the first to make theme music for Sailor Moon, Pokémon, AND Dragonball Z, and the fourth highest grossing artist in Japan in 2013 based on CD, DVD, and Blu-Ray sales (yes, physical is still VERY important in the Japanese music market). They've experienced a 14 percent growth in their YouTube channel views over the last 28-day period.

Their Spotify monthly listeners have spiked almost 29 percent over the last 30 days as well. But the group is not new, so what shot them to the top of the breaking list? Well ... they did just come out with a new album on May 17th.

Just about tied with MCZ is Texas rapper Yella Beezy, whose growth percentage is up to 14.4 percent this period from the previous 30 days’ 9.5 percent. Yella Beezy, whose latest track features Gucci Mane and Quavo from Migos, also soared 16 spots from No. 50 to No. 34 on Billboard’s Emerging Artists chart.

Switching gears altogether now to Sweden’s folk rock duo Junip, composed of soft-spoken singer-songwriter Jose Gonzalez and Tobias Winterkorn, who experienced a 13 percent jump in the last 28-day period. This correlates with a 10.2 percent increase in their YouTube channel subscribers over the last 30 days, which is surprising, as it doesn’t look like they’ve released anything recently. Maybe fans of Rogue Wave and Ben Howard got turned on to them? We don’t know. What we do know is Junip’s monthly Spotify listeners dropped an estimated 0.2 percent in the last 30 days, but their Spotify followers increased 0.4 percent in the same period. So, no, not all streaming services are created equal.

Check out these stats: YouTube is technically the biggest music streaming source in the world, with close to a billion users consuming music via user-uploaded video streaming. Compare that with just over 200 million users consuming music via “traditional” streaming services like Spotify and Apple Music, and the importance of YouTube stats as some indication of an artist’s digital presence worldwide becomes clear.

So there you have it: a Japanese idol group, a Texas rapper, and a Swedish folk duo make up an eclectic trio of international artists on a YouTube hot streak right now.

Outro

That’s it for your Daily Data Dump for Friday, May 24th, 2019. This is Rutger from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Happy Friday, see you tomorrow!

Highlights

Grab your passports, it’s Excursion Thursday, and we’re headed to Mumbai, India’s largest city and Spotify’s largest potential market.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Thursday, May 23rd, 2019.

Excursion Thursday: Mumbai

On today’s Excursion Thursday, we’re taking off to India’s most populated city, Mumbai, which has quickly become a testing ground for Spotify’s global expansion strategy. Until 1995, the “Hollywood of India” was called Bombay, a name many in India saw as a vestige of British colonialism, hence the change. The city’s booming movie industry lends the city its other famous moniker, “Bollywood”.

Mumbai is not only the wealthiest city in India, but it’s also arguably the financial, arts, and entertainment capital of the entire country, with an estimated 22.5 million Mumbaikars, more than double the population of New York City!

It’s clear why Spotify’s weathering its recent challenges in-country, as India’s population is currently at 1.4 billion and climbing. That’s almost 20 percent of everybody on earth, while North America comprises around 5 percent. So, if Spotify’s been able to acquire an estimated 50M monthly active users out of North America’s 366M people and an estimated 60M monthly active users out of Europe’s 743M people, that gives them a market penetration rate lying somewhere between roughly 8 and 14 percent. Apply that to a population of 1.4B, and SPOT’s stock price will rise, for sure.

So, based on the city’s listening profile, how’s it going? Unfortunately, it’s too early to tap into Spotify’s local monthly listeners, but we can at least look at other Western platforms that are operating there.

Mumbai’s Shazam and YouTube charts definitely reflect the battle between domestic and foreign repertoire preferences.

According to the Top 90 tracks by Shazam Chart Occurrences in the past month, a total of 22 bear Indian ISRC codes. That’s around 25% of total Shazam’d tracks we captured, while there are 38 US-based ISRCs present, about 40%.

Moving to Shazam’s most charted artists in Mumbai over the last 30 days, American rappers Swae Lee and Lil Nas X come in 1st and 3rd with 52 and 47 chart appearances, respectively, and Puerto Rican singer Farruko in 2nd with 50. Fourth and 5th place go to film music composers Vishal-Shekhar and star singer Arijit Singh with 42 and 41 chart appearances, respectively.

Using Top Tracks by YouTube Views, we see a mixed bag at the top, with T. Swift and Brendon Urie’s “Me!” at 235K average daily views and Katy Perry and Migos’ “Bon Appétit” at 77K daily views in 1st and 3rd place respectively. Second place goes to “Aankh Mare” from Bollywood movie Simmba sitting pretty at 188K views.

Genre-wise on the Shazam charts in the past month, it’s still a battle between local and foreign fare: with Hip-Hop at 11 genre tags from mostly American artists, Dance at 15 genre tags from an international artist roster, and Pop at 22 genre tags from both Western and Indian artists. Twelve of the Pop genre tags are from domestic artists, suggesting there’s a slight skew in the past month toward the local when it comes to the genre.

While Spotify competes with the entrenched Indian streaming service JioSaavn, partly headquartered in Mumbai and specializing in Bollywood music, Mumbai’s demand for both Indian and Western music will prove to either be Spotify’s ace in the hole or rock in its shoe.

Outro

That’s a wrap for your Daily Data Dump for Thursday, May 23rd, 2019. This is Jason from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Hope you’re not too jet-lagged from today’s Excursion Thursday, and we’ll see you back here tomorrow!
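The market-penetration reasoning in this episode is a back-of-the-envelope calculation, and it can be sketched as below. The user and population figures are the estimates quoted in the episode; the function name `penetration_rate` and the idea of applying both regional rates to India's population are illustrative assumptions, not a forecast from Chartmetric or Spotify.

```python
def penetration_rate(monthly_active_users_m: float, population_m: float) -> float:
    """Share of a region's population that are monthly active users (0 to 1)."""
    return monthly_active_users_m / population_m


# Estimates quoted in the episode, in millions.
north_america_rate = penetration_rate(50, 366)
europe_rate = penetration_rate(60, 743)

# India's population as quoted in the episode, in millions.
india_population_m = 1_400

# Hypothetical range: what those two rates would imply if applied to India as-is.
implied_low_m = india_population_m * min(north_america_rate, europe_rate)
implied_high_m = india_population_m * max(north_america_rate, europe_rate)

print(f"North America penetration: {north_america_rate:.1%}")
print(f"Europe penetration: {europe_rate:.1%}")
print(f"Implied users in India: {implied_low_m:.0f}M to {implied_high_m:.0f}M")
```

By this arithmetic the two regional rates come out at roughly 8 and 14 percent. Carrying them over to India assumes comparable pricing, smartphone access, and competition, none of which the episode claims, so the implied range is best read as an upper-bound thought experiment.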

Highlights

It’s Winner Wednesday again, and we’re looking at who’s hot on the Spotify and Deezer charts to examine just how global Europe’s biggest streaming services are.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Wednesday, May 22nd, 2019.

Winner Wednesday: Deezer & Spotify... who's more global?

On today’s Winner Wednesday, we’re looking at who’s hot on Europe’s biggest streaming services, Spotify and Deezer, on their Top 200 Spotify and Top 100 Deezer track charts for May 20th.

The #1 and #2 tracks are the same across both platforms, with “emerging artists” Ed Sheeran and Justin Bieber taking the lead spot with “I Don’t Care”, tallying 58.4M streams on Spotify this week and having a 10/10 popularity score on Deezer. Holding strong for almost two months now, Billie Eilish’s “Bad Guy” occupies the #2 position on both apps, with 41.4M streams on Spotify this week and a 9.95/10 popularity score on Deezer currently.

But starting from #3 down, the differences between Sweden’s Spotify and France’s Deezer are as wide as the North Sea in between them.

For example, Lil Nas X and Billy Ray Cyrus’ “Old Town Road (Remix)” was 3rd on Spotify’s chart but only 9th on Deezer, while Daddy Yankee’s “Con Calma” took 3rd on Deezer but only 14th on Spotify.

Shawn Mendes and the late Avicii both appear in each platform’s Top 10 in different places, but otherwise the tracks are completely different.

Let’s look at the daily chart summaries: Billie Eilish has 13 songs on Spotify’s Top 200 chart, followed by Tyler, the Creator with 11, Post Malone with 8, and Cardi B and Khalid at 6 tracks each.

On Deezer, a blast from the past: Neue Deutsche Härte (or German industrial metal) group Rammstein hold the top spot with 10 tracks in the Deezer Top 100 since their May 17th self-titled album release. For those who were musically aware in 1998, the German rockers managed to peak on Billboard’s Mainstream Rock chart at #20 and even appear on MTV’s Total Request Live, which was then the epicenter of US pop culture.

Puerto Rico’s Ozuna followed Rammstein with 8 tracks in the Deezer Top 100, and fellow reggaeton kings Daddy Yankee, J Balvin and Anuel AA took the 3rd, 4th and 5th spot with 6 tracks each that day.

Note that Spotify’s most placed artists this week are decidedly American, while Deezer’s winners are German, Colombian, and Puerto Rican. So, is Deezer the more global streaming service between the two?

Well technically, yes: Deezer is operating in 187 countries compared to Spotify’s 79, though stateside, the now publicly-traded Spotify takes up most of our headlines.

But remember: Deezer only started expanding into the U.S. in 2016, and is privately owned by American conglomerate Access Industries, which also happens to own all of Warner Music Group.

So keep your eyes peeled for different charts and each platform’s preferences, as it always helps to remember that no matter where your fans come from, Spotify, Deezer, YouTube, Apple Music, and Amazon listeners all buy the same concert ticket!

Outro

That’s a wrap for your Daily Data Dump for Wednesday, May 22nd, 2019. This is Jason from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Have a winning Wednesday, and we’ll see you back here tomorrow!

IoT has created a tidal wave of data that data-savvy organizations can turn into profitable business solutions. Most IoT data comes from sensors, which are now attached to almost every device imaginable, from factory floor machines and agricultural fields to your cell phone and toothbrush. But IoT is forcing companies to rethink their data architectures to ingest, process, and analyze streaming data in real time.

To help us understand the impact of IoT on data architectures, we invited Dan Graham to our show for a second time. Dan is a former product marketing manager at both IBM and Teradata, renowned for combining deep technical knowledge with industry marketing savvy. During his tenure at those companies, he was responsible for MPP data management systems, data warehouses, and data lakes, and most recently, the Internet of Things.

Highlights

This Technique Tuesday brings you a fresh way to take a bite out of your data by learning how to curate the curators, the streaming world’s sometimes mysterious movers and doers.

Mission

No, this isn’t Jason with a cold; it’s Chartmetric’s newest voice, Rutger Rosenborg, and I’m happy to be here uploading charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Tuesday, May 21st, 2019.

Technique Tuesday: Curating the Curators

On today’s Technique Tuesday, we’re bringing you a fresh way to take a bite out of your data with a spoonful of meta-curation. Curators are the sometimes mysterious movers and doers of the streaming world determining what’s hot, what’s not, and what might have a shot, and all with a playlist.

It might not be a surprise to anyone that Apple and Spotify are themselves the biggest curators in the streaming world; after all, they control their own DSPs. Let’s look at the green giant, Spotify, which has a whopping 7,000-plus self-curated playlists to its name, with a staggering 1.1 billion followers. How does that work, if its total user count is something like 200 million, counting both premium and ad-based users? Well, users must like Spotify playlists enough to subscribe to tons of them.

On the other side of the ring, despite its lower worldwide subscriber count, as a curator, Apple boasts more than twice as many playlists as Spotify, at around 17,500 all told.

C’est tout? Non, less is more for French streaming service Deezer, which interestingly features official curators composed of a combination of geographic or genre-based anonymous “editors” and face-forward “editors” like Fabio from Brazil, Emilia from Romania, and Stanislav from Russia. While Deezer’s playlist count is low, on the order of 1,500 or so dispersed amongst some 40-odd official Deezer editors, each editor has anywhere from thousands to multiple millions of followers.

There’s also Amazon’s mysterious Music Experts, who dictate all 2,800 playlists in their ecosystem, from “All Hits” to “Country Heat,” and “Pop Culture” to “I Miss the ‘90s.” “Cleaning the House” is a good one too, by the way.

But we’re talking macro level here. Let’s get into the weeds. As a curator, Spotify is clearly geared toward frontline pop hits, with its “Today’s Top Hits” playlist absolutely dominating the platform in terms of both listenership, at an estimated 5.7 million a month, and also follower count, at 23.2 million. Apple Music, on the other hand, is a bit more evenly dispersed, with its Hip-Hop, Alternative, and Pop sub-curators sitting at around 1,300 playlists each. Jazz, Rock, Indie, and Country hover between 800 and 1,000. Deezer is a bit more difficult to parse, numbers-wise, because its curation focus is more geographically based. Suffice it to say, you’re probably not going to want to hit up Fabio for a Country Western pitch anytime soon.

Still too macro? Then it’s microscope time. What about those other curators, you know, the ones who aren’t necessarily funded by billion-dollar corporations?

On second thought … Fltr, Digster, and Topsify are three of the biggest third-party playlist curators, and they’re owned by Sony, Universal, and Warner, respectively. While it’s no secret where their curation interests lie, there are still the classic DJ tastemakers like Dimitri Vegas & Like Mike, who boast close to 2 million EDM-focused followers, or market-specific influencers like Hugo Gloss with 1.4 million Brazil-focused followers.

What’s clear here is that Spotify and Deezer are somewhat more democratic and accessible platforms for individual tastemakers with some skin in the game. Aside from prominent artists, individual users have managed to rack up hundreds of thousands of followers and exert influence on the playlist game. Apple Music and Amazon Music, on the other hand, have a tighter grip on the curation wheel, making their platforms more difficult to penetrate for third-party tastemakers.

Outro

That’s a wrap for your Daily Data Dump for Tuesday, May 21st, 2019. This is Rutger from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Have a good rest of your Tuesday, and long live King Bran the Broken!
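The curator figures quoted in this episode lend themselves to a few quick ratios. The sketch below uses only the approximate numbers given above ("7,000-plus", "something like 200 million", "1,500 or so", "40-odd"), so every result is a rough estimate; the variable names are illustrative.

```python
# Approximate figures quoted in the episode.
spotify_playlist_followers = 1_100_000_000  # total followers across Spotify-curated playlists
spotify_total_users = 200_000_000           # premium plus ad-based users
spotify_playlists = 7_000                   # "7,000-plus", so treat as a lower bound
apple_playlists = 17_500
deezer_playlists = 1_500
deezer_editors = 40                         # "40-odd" official editors

# Average number of Spotify-curated playlists followed per user.
follows_per_user = spotify_playlist_followers / spotify_total_users

# How many Apple-curated playlists exist for each Spotify-curated one (upper bound).
apple_to_spotify_ratio = apple_playlists / spotify_playlists

# Average number of playlists maintained by each Deezer editor.
playlists_per_deezer_editor = deezer_playlists / deezer_editors

print(f"Spotify playlist follows per user: about {follows_per_user:.1f}")
print(f"Apple playlists per Spotify playlist: up to about {apple_to_spotify_ratio:.1f}")
print(f"Playlists per Deezer editor: about {playlists_per_deezer_editor:.1f}")
```

The first ratio is what resolves the apparent paradox in the episode: 1.1 billion followers across roughly 200 million users works out to each user following several Spotify-curated playlists on average.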

Summary In recent years the traditional approach to building data warehouses has shifted from transforming records before loading, to transforming them afterwards. As a result, the tooling for those transformations needs to be reimagined. The data build tool (dbt) is designed to bring battle tested engineering practices to your analytics pipelines. By providing an opinionated set of best practices it simplifies collaboration and boosts confidence in your data teams. In this episode Drew Banin, creator of dbt, explains how it got started, how it is designed, and how you can start using it today to create reliable and well-tested reports in your favorite data warehouse.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey and today I’m interviewing Drew Banin about DBT, the Data Build Tool, a toolkit for building analytics the way that developers build applications.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what DBT is and your motivation for creating it?
Where does it fit in the overall landscape of data tools and the lifecycle of data in an analytics pipeline?
Can you talk through the workflow for someone using DBT?
One of the useful features of DBT for stability of analytics is the ability to write and execute tests. Can you explain how those are implemented?
The packaging capabilities are beneficial for enabling collaboration. Can you talk through how the packaging system is implemented?

Are these packages driven by Fishtown Analytics or the dbt community?

What are the limitations of modeling everything as a SELECT statement?
Making SQL code reusable is notoriously difficult. How does the Jinja templating of DBT address this issue and what are the shortcomings?

What are your thoughts on higher level approaches to SQL that compile down to the specific statements?

Can you explain how DBT is implemented and how the design has evolved since you first began working on it?
What are some of the features of DBT that are often overlooked which you find particularly useful?
What are some of the most interesting/unexpected/innovative ways that you have seen DBT used?
What are the additional features that the commercial version of DBT provides?
What are some of the most useful or challenging lessons that you have learned in the process of building and maintaining DBT?
When is it the wrong choice?
What do you have planned for the future of DBT?

Contact Info

Email
@drebanin on Twitter
drebanin on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

DBT
Fishtown Analytics
8Tracks Internet Radio
Redshift
Magento
Stitch Data
Fivetran
Airflow
Business Intelligence
Jinja template language
BigQuery
Snowflake
Version Control
Git
Continuous Integration
Test Driven Development
Snowplow Analytics

Podcast Episode

dbt-utils
We Can Do Better Than SQL blog post from EdgeDB
EdgeDB
Looker
LookML

Podcast Interview

Presto DB

Podcast Interview

Spark SQL
Hive
Azure SQL Data Warehouse
Data Warehouse
Data Lake
Data Council Conference
Slowly Changing Dimensions
dbt Archival
Mode Analytics
Periscope BI
dbt docs
dbt repository

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Highlights

It’s Found on Friday: we dig up an American rapper, a Dutch DJ and an Albanian pop star spiking in their Spotify Popularity Index.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Friday May 17th 2019.

Found on Friday: MAJ, Adam Brown & Xhensila

It’s Found on Friday where we dip into our A&R tool to find emerging artists making their way into the public eye.

If we search by the biggest change in Spotify Popularity Index (or SPI) in the past 28 days, we uncover three artists with very different backgrounds: American rapper MAJ, Dutch DJ/producer Adam Brown and Albanian pop star Xhensila.

I hope I’m saying these artists’ names correctly, here we go.

Going from 1 to 39 SPI in the past month is MAJ, currently based out of Dallas, Texas, featuring “grunge-inflected production, soulful delivery, and nocturnal hip-hop with stark vulnerability and confessional storytelling.”

With 155K Spotify monthly listeners and only 3K followers, this gives him a listeners to followers ratio of 51, which is a strong signal for him.

From April 26th to May 3rd, he enjoyed a #47 slot on the 100-track New Music Friday playlist, which has 3.2M followers currently.

MAJ is still enjoying a Spotify editorial playlist placement on the Shisha Lounge playlist at 375K followers, but more interestingly, he’s on 27 playlists with more than 10K followers that seem to be focused on sub-culture categories such as “sad” or “emo rap” or gaming culture playlists like EA Sports’ NHL franchise.
It wouldn’t be a stretch to say that these lower-tier playlists are likely playing a big part in MAJ’s strong rise on the platform.

Adam Brown in the Netherlands currently has 11 dance music tracks on Spotify, with his latest track “Your Body” being what seems to be driving his SPI rise in the past month from 1 to 31.

This increase isn’t from Spotify playlisting, as he’s on no editorial playlists, and his biggest one is currently “Dance Hits” by curator globalmusicx with only 6.5K followers.

The reason for his jump in SPI in late April is not clear, but by checking his Twitter, it may be from a more organic off-platform source via his own hosted local dance radio show or possibly from club play, given the very electronic music-oriented region and that his #2 and #3 top Spotify monthly listener cities are very locally Dutch: Ermelo and Harderwijk. Definitely butchered those names.

Last but not least is pop star Xhensila from Albania, who represents the kind of “emerging artist” that is only emerging to the Spotify market, as Xhensila is already a big deal in her part of the world.

In the past month, she jumped from an SPI of 2 to 56 despite having only 100 monthly listeners with 449 followers for a ratio under 1.

Her most followed playlist, “Albanian Hits 2019”, has 20K followers, but her six total tracks don’t seem to be generating that much attention playlisting-wise.

More than likely, Xhensila’s Spotify popularity is being generated by her 1.3M followers on Instagram, where the streaming link in her IG bio leads to Spotify.
One of the lessons that can be gleaned here is that Spotify statistics are just Spotify statistics... Xhensila obviously is quite the star in Albania, further proven by her 154K YouTube followers and her nine very popular music videos there, the biggest one hitting 39M views to date.

So we’ll leave you for the weekend with an American rapper, Dutch DJ and Albanian pop queen to explore... three different paths, three different vibes.

Outro

That’s it for your Daily Data Dump for Friday May 17th 2019. This is Jason from Chartmetric.

Free accounts are at app.chartmetric.com/signup

And article links and show notes are at: podcast.chartmetric.com.

Happy Friday, see you tomorrow!
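
The listeners-to-followers ratios quoted in this episode are a simple division of Spotify monthly listeners by Spotify followers. Here is a minimal sketch in Python that uses the rounded figures mentioned above (155K listeners and 3K followers for MAJ, 100 listeners and 449 followers for Xhensila). The function name is hypothetical and is not part of any Chartmetric tool or API. Because the inputs are rounded, MAJ's result will not match the quoted 51 exactly; the episode's figure was presumably computed from exact counts.

```python
def listeners_to_followers(monthly_listeners: int, followers: int) -> float:
    """Return monthly listeners divided by followers (hypothetical helper)."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return monthly_listeners / followers


# Rounded figures as quoted in the episode, not exact platform counts.
artists = {
    "MAJ": (155_000, 3_000),
    "Xhensila": (100, 449),
}

# Compute one ratio per artist.
ratios = {
    name: listeners_to_followers(listeners, followers)
    for name, (listeners, followers) in artists.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

A ratio well above 1 means far more people are listening than following, as with MAJ, while a ratio under 1 means followers outnumber monthly listeners, as with Xhensila.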