talk-data.com

Topic

Data Streaming

realtime event_processing data_flow

739

tagged

Activity Trend (2020-Q1 to 2026-Q1): peak of 70 per quarter

Activities

739 activities · Newest first

Summary Data is only valuable if you use it for something, and the first step is knowing that it is available. As organizations grow and data sources proliferate, it becomes difficult to keep track of everything, particularly for analysts and data scientists who are not involved with the collection and management of that information. Lyft has built the Amundsen platform to address the problem of data discovery, and in this episode Tao Feng and Mark Grover explain how it works, why they built it, and how it has impacted the workflow of data professionals in their organization. If you are struggling to realize the value of your information because you don’t know what you have or where it is, then give this a listen and try out Amundsen for yourself.

Announcements

Welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Finding the data that you need is tricky, and Amundsen will help you solve that problem. And as your data grows in volume and complexity, there are foundational principles that you can follow to keep data workflows streamlined. Mode, the advanced analytics platform that Lyft trusts, has compiled 3 reasons to rethink data discovery. Read them at dataengineeringpodcast.com/mode-lyft.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, the Open Data Science Conference, and Corinium Intelligence. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Mark Grover and Tao Feng about Amundsen, the data discovery platform and metadata engine that powers self service data access at Lyft.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Amundsen is and the problems that it was designed to address?

What was lacking in the existing projects at the time that led you to building a new platform from the ground up?

How does Amundsen fit in the larger ecosystem of data tools?

How does it compare to what WeWork is building with Marquez?

Can you describe the overall architecture of Amundsen and how it has evolved since you began working on it?

What were the main assumptions that you had going into this project and how have they been challenged or updated in the process of building and using it?

What has been the impact of Amundsen on the workflows

Highlights

Spotify, Apple Music, and Deezer’s biggest playlists are growing, both in follower count and in track count, but what does that mean for artists looking to land a big add?

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric”, that’s Chartmetric, no “S”. Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Feature: Labels Page

Hey Rutger, it’s Jason. Sorry to interrupt, but can I just do a quick product update?

Of course, what’s up?

Thanks, man. Hi Chartmetric fans, you may or may not have gotten a chance to check out the new Labels Page feature that we discussed in the last podcast episode this week.

We’ve temporarily pulled the feature back from its soft release because we just don’t think it’s up to the music analytics standard we strive for.

If you’ve been with us for some time, you’ve seen how dedicated we are to innovating and, as we say in the tech world, sometimes “breaking things”.

Well, we’ve gotten a lot of your feedback and realize that we jumped the gun a bit, and we need to better clean, organize and visualize the label metadata that we have, which is what we do best.

So we recognize the issue, and we are working swiftly to bring the Labels Page back with verve and, more importantly, accuracy!

Back to our regularly scheduled program, take it away, Rutger!

Thanks, Jason!

Date

This is your Data Dump for Friday, August 2nd, 2019.

How 2019’s Playlist Growth Might Affect Emerging Artists

These days, getting onto streaming’s top playlists is sort of the name of the game. It really determines the visibility of emerging artists and cements the longevity of established ones.

So, it got us wondering…. What’s been going on with the top playlists in 2019?

Hitting the Playlists tab on the Chartmetric homepage brings up tons of playlist information for Spotify, Apple Music, Deezer, and Amazon. From there, we can compare the playlists claiming the top spots across a number of different measurements.

On Spotify, Today’s Top Hits maintains the highest follower count, starting the year off with 22.3M and hitting 23.6M by the end of June. That’s a 5.8 percent increase for that six-month period.

On Deezer, Les Titres Du Moment claims the top follower spot, and over the same period, experienced only about 1 percent growth, from 9.8M followers to 9.9M followers.

Digging in a bit deeper, we can also compare playlist length, aka number of tracks. For that six-month period, for example, Spotify’s Hot Country playlist grew 31.4 percent in length, while Apple Music’s The A-List: Pop playlist grew the same amount.

But those aren’t the highest numbers. Spotify’s EDM-focused Mint playlist grew 35.8 percent, and Apple’s Hip-Hop-oriented Gymflow playlist grew 66.7 percent.

Overall, Apple added more tracks to its top playlists than Spotify did: about 23 percent vs. 11 percent, to be exact.

The growth of these playlists, both in follower count and in track count, means a higher chance of an emerging artist landing on one of them and a significant increase in visibility if they do. However, it also makes it more likely that they get lost in the noise, making it hard to capitalize on an otherwise super exciting add.

Knowing the genre breakdown of tracks and also the country distribution of artists can help, but we’ll have to save that for another episode. You can also tell us what you find by doing your own digging at chartmetric.com!

Outro

That’s it for your Daily Data Dump for Friday, August 2nd, 2019. This is Rutger from Chartmetric.

Article links and show notes are at: podcast.chartmetric.com

And if you like what we’re doing, don’t forget to leave us a rating or review!

Happy Friday, have a great weekend, and we’ll see you next week!
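
The follower growth percentages quoted in this episode follow from simple percent-change arithmetic. A minimal sketch in Python, assuming the follower counts are the rounded figures given in the show notes; the function name `percent_growth` is a hypothetical helper and not part of any Chartmetric tool:

```python
def percent_growth(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Follower counts in millions, as quoted in the show notes (January to end of June 2019).
todays_top_hits = percent_growth(22.3, 23.6)
les_titres_du_moment = percent_growth(9.8, 9.9)

print(f"Today's Top Hits: {todays_top_hits:.1f}% growth")
print(f"Les Titres Du Moment: {les_titres_du_moment:.1f}% growth")
```

With the rounded inputs above, the results round to 5.8 percent and 1.0 percent, in line with the figures in the episode.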

Summary The ETL pattern that has become commonplace for integrating data from multiple sources has proven useful, but complex to maintain. For a small number of sources it is a tractable problem, but as the overall complexity of the data ecosystem continues to expand it may be time to identify new ways to tame the deluge of information. In this episode Tim Ward, CEO of CluedIn, explains the idea of eventual connectivity as a new paradigm for data integration. Rather than manually defining all of the mappings ahead of time, we can rely on the power of graph databases and some strategic metadata to allow connections to occur as the data becomes available. If you are struggling to maintain a tangle of data pipelines then you might find some new ideas for reducing your workload.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

To connect with the startups that are shaping the future and take advantage of the opportunities that they provide, check out Angel List where you can invest in innovative businesses, find a job, or post a position of your own. Sign up today at dataengineeringpodcast.com/angel and help support this show.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Tim Ward about his thoughts on eventual connectivity as a new pattern to replace traditional ETL.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by discussing the challenges and shortcomings that you perceive in the existing practices of ETL?

What is eventual connectivity and how does it address the problems with ETL in the current data landscape?

In your white paper you mention the benefits of graph technology and how it solves the problem of data integration. Can you talk through an example use case?

How do different implementations of graph databases impact their viability for this use case?

Can you talk through the overall system architecture and data flow for an example implementation of eventual connectivity?

How much up-front modeling is necessary to make this a viable approach to data integration?

How do the volume and format of the source data impact the technology and archit

Highlights

What’s in a name? For the subgenre game on Spotify, Apple, Deezer, and Amazon’s top playlists … a lot.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric”, that’s Chartmetric, no “S”. Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, July 26th, 2019.

Odd Subgenres Gaining Ground on Streaming’s Top Playlists

Ever heard of Brostep, Lilith, Etherpop, Crunk, Redneck, Metropopolis, or Pagode?

No? Well, chances are, you’ve listened to examples of some of these subgenres, because they now comprise a small but significant portion of the top playlists on Spotify, Apple, Deezer, and Amazon.

Thanks to the new subgenre feature on our Artists tab, where we’ve categorized the more than 7,000 Spotify genre tags into a handful of parent genres, finding artists that fit these niche descriptors is easy. Just select the main genre and then scroll through the subgenres to filter to your heart’s content.

If we apply this taxonomy to Spotify’s Hot Country playlist, which boasts around 5.5M followers, Redneck and Brostep account for 5.7 and 2.9 percent, respectively, of that playlist’s tracks by genre share.

What exactly do those subgenres sound like? Well, let’s just say that Redneck is REALLY country and Brostep is in-your-face Dubstep.

On Get Turnt, which also has around 5.5M followers, Crunk has 1.6 percent of the playlist’s genre makeup. The subgenre is also featured on Apple Music’s Rap Life, #OnRepeat, It’s Lit!!!, and Gymflow playlists, and on Amazon’s Rap Rotation playlist, finding its low at 0.8 percent and its high at 2.9 percent.

Crunk actually emerged as a subgenre of Hip-Hop during the ‘90s in the American South. Crank it up.

Deezer’s Brand New UK and Neue Hits playlists, meanwhile, feature something called Metropopolis, which is apparently a neologism of Spotify’s and includes artists like Charli XCX, Bleachers, and St. Vincent. That portmanteau (think urban pop) sits at 5.6 percent on Brand New UK (tied with House) and at 5.1 percent on Neue Hits (tied with Rap).

Apple’s The A-List: Pop playlist and Amazon’s Pop Hits playlist share a subgenre called Etherpop, which sounds pretty self-explanatory as a combination of ethereal and pop. It wins out on Amazon, where it has a 2.9 percent share of the playlist (tied with Emo and R&B) vs. 1.2 percent on Apple, where it’s tied with Rap, House, K-Pop, and yup, Metropopolis.

And that brings us to two subgenres exclusive to Deezer and Amazon’s top playlists, respectively.

On Deezer, the Pagode subgenre is seeing success on its Top Brazil and Explosão Brasil playlists, which makes total sense, considering it’s a form of Samba. Brazilians love it, too, as it dominates Explosão Brasil with a 20.8 percent share and comes in third on Top Brazil with 13.3 percent.

Lilith, which claims 2.1 percent on Amazon’s Fresh Country and 5.8 percent on Amazon’s I Miss the ‘90s, seems to also be a geo-specific subgenre, with its roots in the Canadian-American traveling music festival, Lilith Fair, which featured an inspiring number of female-fronted folk and rock acts in the ‘90s.

If you’re curious about what artists might fall into each of these subgenres (Brostep to Lilith, Etherpop to Crunk, Redneck to Metropopolis, or Pagode to … C86?), you’re in luck with our subgenre filter. Nerd out for free with a Chartmetric account!

Outro

That’s it for your Daily Data Dump for Friday, July 26th, 2019. This is Rutger from Chartmetric.

Free accounts are at chartmetric.com

And article links and show notes are at: podcast.chartmetric.com

Happy Friday, have a great weekend, and we’ll see you next week!

Summary The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access. In this episode Zhamak Dehghani shares an alternative approach in the form of a data mesh. Rather than connecting all of your data flows to one destination, empower your individual business units to create data products that can be consumed by other teams. This was an interesting exploration of a different way to think about the relationship between how your data is produced, how it is used, and how to build a technical platform that supports the organizational needs of your business.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

And to grow your professional network and find opportunities with the startups that are changing the world, Angel List is the place to go. Go to dataengineeringpodcast.com/angel to sign up today.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Zhamak Dehghani about building a distributed data mesh for a domain oriented approach to data management.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by providing your definition of a "data lake" and discussing some of the problems and challenges that they pose?

What are some of the organizational and industry trends that tend to lead to this solution?

You have written a detailed post outlining the concept of a "data mesh" as an alternative to data lakes. Can you give a summary of what you mean by that phrase?

In a domain oriented data model, what are some useful methods for determining appropriate boundaries for the various data products?

What are some of the challenges that arise in this data mesh approach and how do they compare to those of a data lake?

One of the primary complications of any data platform, whether distributed or monolithic, is that of discoverability. How do you approach that in a data mesh scenario?

A corollary to the issue of discovery is that of access

Highlights

Is there any relation between follower counts on streaming services and follower counts on social media? Here’s a sneak peek at the trends we’re tracking for some of the biggest artists in the world.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Don’t forget to reach out to us on Instagram, Twitter, Facebook, or LinkedIn! We’d love to hear from you.

Date

This is your Data Dump for Friday, July 19th, 2019.

What Follower Counts Say About Social and Streaming Trends

In the old days, customers became fans if they bought not only CDs but also concert tickets, consistently and repeatedly.

In the digital era, the live space is still important, but streaming platform followers and social media followers are the new metrics for measuring fandom. But does streaming popularity correlate with social media popularity? Is it consistent across the board, or does each streaming platform relate differently to each social platform?

To test these questions, we pulled follower data for artists topping the charts from January through June, and then determined the correlation coefficients for Spotify and Instagram, Spotify and Twitter, and Spotify and Facebook, repeating this process for YouTube, SoundCloud, and Deezer.

What panned out from all of our calculations and pretty charts? For the whole story, you’ll have to stay tuned for something special we have in the works for the near future.

In the meantime, here’s a teaser: One thing that pops out immediately is how poorly Facebook is correlated with streaming services across the board.

If we take the average correlation across eight of the top artists for the past six months, Facebook turns up negligible negative correlation coefficients for Spotify and YouTube and negligible positive correlation coefficients for SoundCloud and Deezer.

Instagram, on the other hand, turns up near one-to-one correlations with Spotify and YouTube.

Twitter correlates pretty well with each streaming service, but nowhere near the Instagram correlation.

In the words of our resident Data Scientist Josh Hayes, “Seems like many of the platforms are moving in similar directions together … except for Facebook.”

While Facebook owns both platforms (Facebook and Instagram), it’s apparent, at least for the top performing artists, that Facebook Fan growth has either stopped, declined, or failed to keep pace with follower growth on four streaming services.

Naturally, the story gets a bit more complicated as we begin to look at trends for particular artists, genres, and more, but hang in there: the full story is coming soon!

Outro

That’s it for your Daily Data Dump for Friday, July 19th, 2019. This is Rutger from Chartmetric.

Free accounts are at chartmetric.com

And article links and show notes are at: podcast.chartmetric.com

Happy Friday, have a great weekend, and we’ll see you next week!
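
The episode describes computing correlation coefficients between streaming and social follower counts but does not say which coefficient was used. A minimal sketch, assuming the Pearson correlation coefficient and using made-up monthly follower series; the numbers and variable names below are purely illustrative and are not Chartmetric data:

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical follower counts (millions) for one artist, January through June.
spotify = [10.0, 10.4, 10.9, 11.5, 12.0, 12.6]
instagram = [20.0, 20.9, 21.7, 23.1, 24.0, 25.3]
facebook = [30.0, 30.0, 29.9, 29.9, 29.8, 29.8]

print(f"Spotify vs Instagram: {pearson(spotify, instagram):+.2f}")
print(f"Spotify vs Facebook:  {pearson(spotify, facebook):+.2f}")
```

A value near +1 means the two follower counts rise together, a value near 0 means little linear relationship, and a negative value means one tends to fall as the other rises. Averaging such coefficients across several artists, as the episode describes, would be one more step on top of this.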

podcast_episode
by Jason Joven (Chartmetric), Chaz Jenkins (Chartmetric)

Highlights

In Part 3 of the music "trigger cities" mini-series, we explore the music tastes of Mexico City, São Paulo, Buenos Aires, Rio de Janeiro, Bogotá, Lima and Santiago.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric”, that’s Chartmetric, no “S”. Follow us on Instagram, Twitter, Facebook or LinkedIn, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Wednesday, July 17th, 2019.

Latin America "Trigger" Cities

In case you missed them, we have been working on a written mini-series called “trigger cities”, a concept that Chartmetric’s Partner and Advisor, Chaz Jenkins, an international marketing guru, coined many years ago.

It’s the idea that in the streaming environment, our algorithms on YouTube, Spotify and all platforms are connected with the tastes of huge cities around the world who also love the same apps.

Lauv, the uber-successful independent artist, first saw playlist success with his 2017 hit “I Like Me Better” in Southeast Asia! Lauv...is not Asian, but SE Asians adore great pop love songs.

Reggaeton from huge superstars like Colombia’s J Balvin and Puerto Rico’s Bad Bunny is now on top playlists like Spotify’s Today’s Top Hits, a primarily English-language playlist...but their come-up was based on Latin American listeners supporting them more than any other region.

So in the interest of knowing what the local markets are like, we wrote about seven different metropolitan areas in Latin America: Mexico City, São Paulo, Buenos Aires, Rio de Janeiro, Bogotá, Lima and Santiago.

Five speak Spanish, two speak Brazilian Portuguese, and all love the YouTube.

It’s a known fact that Latin America turns to the Google platform more than anything else to listen to music, and the numbers are quite impressive: Bogotá, despite having less than half (10.7M) of Mexico City’s population, took the #1 spot in YouTube views in one week last month with 26.5M views across 1.6M+ artists. The Mexican capital, however, was not far behind with 24.8M, and the two cities seem to be leading YouTube’s consumption in the region, with Lima a distant #3 with 17.1M views.

On Spotify, Mexico City, as Spotify’s proclaimed “World’s Music-Streaming Mecca”, took the top spot in the same week with 2.3B non-unique monthly listeners (this is an admittedly odd metric; check the show notes for a link to the explanation), far outstripping Santiago in the #2 spot with 1.5B non-unique monthly listeners (MLs).

When it comes to genres, we compiled genre tags on Shazam chart occurrences in these seven cities and found what sounds each city was most curious about when they flipped out their phones.

“Urbano latino”, which is primarily reggaeton and Latin trap and the most popular in Santiago, Lima and Bogotá, didn’t show up at all in Brazil, with Brazilian-native genres such as “Sertanejo” (Brazilian country music) asserting their unique identity in the region, and with Pop/Rock/Dance all showing strongly in the past month for both cities.

This is contrary to the idea that all of Latin America loves reggaeton...just not true.

On Instagram, who do you think are the ten most followed artists in the region?

Well there’s Selena Gomez, Justin Bieber, Ariana Grande and Beyoncé…...there’s also Maluma and Daddy Yankee...

But do you know pop queen Anitta, local icon Ivete Sangalo, comedian-entertainer Whindersson Nunes or the Beyoncé-inspired Ludmilla? They’re all Brazilian, showing how much Brazilians love IG, and also how much they love their own country’s artists.

So there’s a taste of Part 3 of our trigger cities mini-series, please do check it out on Medium or LinkedIn and let us know what you think! If you’re into Southeast Asia, we wrote about that too (Medium or LinkedIn). We hope they’re useful insights as you target social media campaigns, forge international collaborations or plan out a tour!

Outro

That’s it for your Daily Data Dump for Wednesday, July 17th, 2019. This is Jason from Chartmetric.

Free accounts are at chartmetric.com

And article links and show notes are at: podcast.chartmetric.com

Happy Wednesday, and we’ll see you Friday!

Summary Successful machine learning and artificial intelligence projects require large volumes of data that is properly labelled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box. In this episode Mark Sears, CEO of CloudFactory, explains how he and his team built a platform that provides valuable service to businesses and meaningful work to developing nations. He shares the lessons learned in the early years of growing the business, the strategies that have allowed them to scale and train their workforce, and the benefits of working within their customer’s existing platforms. He also shares some valuable insights into the current state of the art for machine learning in the real world.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Integrating data across the enterprise has been around for decades, and so have the techniques to do it. But a new way of integrating data and improving streams has evolved. By integrating each silo independently, data is able to integrate without any direct relation. At CluedIn they call it “eventual connectivity”. If you want to learn more on how to deliver fast access to your data across the enterprise leveraging this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin. And don’t forget to thank them for supporting the show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Mark Sears about Cloud Factory, masters of the art and science of labeling data for Machine Learning and more.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what CloudFactory is and the story behind it?

What are some of the common requirements

podcast_episode
by Vance Joy, Mark Mulligan (Midia Research), Jason Joven (Chartmetric), AC/DC (AC/DC), Steve Boom (Amazon Music)

Highlights  Who says music is all about young people and streaming? Amazon Music and American radio would beg to differ, and we’ll check out a couple of Australian artists who are doing well on them.Mission   Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.We’re on the socials at “chartmetric”, that’s Chartmetric, no “S ”- follow us on Instagram, Twitter, Facebook or LinkedIn, and talk to us! We’d love to hear from you.FYI, we’re scaling back to 2 episodes per week, why? Because we’re working on some special projects that we will certainly tell you about over the next few months, but we need to make the time to do them! So don’t worry, your phone isn’t playing games with your heart….it’s just us and the Backstreet Boys.Having said all that….DateThis is your Data Dump for Friday, July 12th, 2019.Vance Joy and AC/DC on Amazon Music and US RadioThe Financial Times reported yesterday on the rise of Amazon Music, and how it has experienced a 70 percent growth in subscribers in the past year.The head of Amazon Music- Steve Boom (that’s a great name for a music guy)-  noted that all the other platforms were playing for the younger crowds, but not older consumers. Apparently 14 percent of subscribers to Amazon Music are aged 55 or older, compared with just 5 percent of Spotify’s customers, according to Midia Research’s Mark Mulligan.Now on the radio side of things, Music Business Worldwide reported that AM/FM US radio consumption is growing! Take that, streaming.Radio reached more folks than any other entertainment platform in 2019, according to Nielsen’s Audio Today 2019 report.272M Americans fire up their radios each week, that is 7M more listeners than 2016...and why? 
Because Americans love their cars, and radios are just there. Now to help illustrate that with actual artists, we’ll turn to two of Australia’s biggest ones, relative newcomer Vance Joy and classic rock gods AC/DC. Vance Joy, the pop/folk singer-songwriter from Melbourne, is currently on 19 Amazon editorial playlists, including the contextual playlists Rise and Shine and Road Trip: Folk, and a chart-like playlist, Best Folk Songs of 2017. His massive hit “Riptide” is NOT the most playlisted on the platform; that is another one of his records, “Lay It On Me”, placing in 9 of those 19 Amazon Music playlists. On the 300 influential American radio stations we cover, Joy had as many as 506 spins in the week of Sept 24th 2018, and the week of July 1st, it was down to 91. But it’s all good because the state of Wisconsin LOVES Vance Joy, as his songs have been 1% of all the tracks that state’s radio stations have played since September. Pretty impressive. Now for all-time rock greats AC/DC, straight out of Sydney: They are on 14 Amazon editorial playlists, including the #2 slot on Classic Rock for Lifting, the #5 spot for Pre-Game Grilling, and the #1 spot for 80s Hard Rock Workout...who’s feeling some testosterone? AC/DC hits like “You Shook Me All Night Long” and “Back in Black” seem to resonate most in Boston, Massachusetts and Gainesville, Florida...but what’s really good to remember is that in case your phone runs out of battery, you can find either of these artists or others by flicking on the old car radio, or simply asking Alexa to do it for you. Outro That’s it for your Daily Data Dump for Friday, July 12th, 2019. This is Jason from Chartmetric. Free accounts are at chartmetric.com. Article links and show notes are at podcast.chartmetric.com. Happy Friday, and we’ll see you next week!

Summary The market for data warehouse platforms is large and varied, with options for every use case. ClickHouse is an open source, column-oriented database engine built for interactive analytics with linear scalability. In this episode Robert Hodges and Alexander Zaitsev explain how it is architected to provide these features, the various unique capabilities that it provides, and how to run it in production. It was interesting to learn about some of the custom data types and performance optimizations that are included.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! Integrating data across the enterprise has been around for decades – so have the techniques to do it. But, a new way of integrating data and improving streams has evolved. By integrating each silo independently – data is able to integrate without any direct relation. At CluedIn they call it “eventual connectivity”. If you want to learn more on how to deliver fast access to your data across the enterprise leveraging this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin. And don’t forget to thank them for supporting the show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management.For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. 
Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Your host is Tobias Macey and today I’m interviewing Robert Hodges and Alexander Zaitsev about Clickhouse, an open source, column-oriented database for fast and scalable OLAP queries

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Clickhouse is and how you each got involved with it?

What are the primary use cases that Clickhouse is targeting? Where does it fit in the database market and how does it compare to other column stores, both open source and commercial?

Can you describe how Clickhouse is architected? Can you talk through the lifecycle of a given record or set of records from when they first get inserted into Clickhouse, through the engine an

Summary Anomaly detection is a capability that is useful in a variety of problem domains, including finance, internet of things, and systems monitoring. Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. In this interview he explains the system design that he tested, his findings for how these tools were able to work together, and how they behaved at different orders of scale. It was an interesting conversation about how he stress tested the Instaclustr managed service for benchmarking an application that has real-world utility.

Announcements

Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Your host is Tobias Macey and today I’m interviewing Paul Brebner about his experience designing and building a scalable, real-time anomaly detection system using Kafka and Cassandra

Interview

Introduction How did you get involved in the area of data management? Can you start by describing the problem that you were trying to solve and the requirements that you were aiming for?

What are some example cases where anomaly detection is useful or necessary?

Once you had established the requirements in terms of functionality and data volume, what was your approach for dete

Highlights  On Part 1 of our streaming manipulation series, we took you on a wild ride into the depths of playlist fixing. Today, on Part 2, we’re zeroing in on fake artists. Mission   Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world. Date This is your Data Dump for Monday, July 1st, 2019. Enter the World of Streaming Manipulation, Part 2 - Fake Artists For Part 1 of our streaming manipulation series, we covered some funny business in the playlisting world. On Part 2, we’re scratching a different part of streaming’s underbelly: fake artist accounts. Last November, Pop Buzz and others covered a mysterious account uploading ostensibly unreleased Ariana Grande tracks under the name Zandhr. As it turned out, the tracks had been available online for some time, but that didn’t change the fact that a streaming account reportedly not linked to Ariana Grande, according to the BBC, was uploading her intellectual property to potentially profit from it. While the Zandhr account has since been taken down, our data suggests the fake artist accrued 9.5K Spotify followers and almost 30K monthly Spotify listeners, in addition to landing an “Ariana Grande - Every Song” playlist with some 20K+ followers of its own. Playboi Carti found himself in a similar predicament when three different fake accounts (Lil Kambo, Unocarti, and Unocompac) started uploading his tracks, with some pitch-shifting his songs in an attempt to disguise the illegitimate uploads. While both Lil Kambo and Unocarti’s profiles appear to have been taken down, the former amassed a 50K+ playlist reach from 37 playlists and the latter almost a 20K playlist reach from 19 playlists. Unocompac, meanwhile, appears to still have at least one Playboi Carti song up, enjoying 14K Spotify followers and a 30K playlist reach from 54 playlists. The best (or worst) part is that Unocompac’s artist gallery on Spotify includes three out-of-focus nighttime shots of a white suburban teenager posing and throwing up fake gang signs. Shaking my damn head. While this all might seem rather innocuous, as most of these accounts never amass more than a couple of thousand followers, it’s important to remember two things. One, fake artist accounts effectively steal intellectual property and income from the legitimate artists they’re “impersonating.” And two, fake artist accounts devalue the work of all legitimate artists who have put their blood, sweat, and tears into making and marketing their art. While this phenomenon probably isn’t something to worry about in the short term, how it’s handled now will determine how big of a problem it becomes in the long term. With so many metadata errors, artist-song mismatches, and unclaimed blackbox royalties as a result, the last thing artists need is an army of mysterious impersonators gaming the system. Outro That’s it for your Daily Data Dump for Monday, July 1st, 2019. This is Rutger from Chartmetric. Free accounts are at chartmetric.com. Article links and show notes are at podcast.chartmetric.com. Happy Monday, and we’ll see you tomorrow!

Highlights  Fake streams! Playlist manipulation! Fake artists! There’s a lot of buzz about it, but what does this look like in the data?Mission   Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.DateThis is your Data Dump for Friday, June 28th, 2019.Enter the World of Streaming ManipulationLast week’s streaming code of conduct was signed by more than 20 major companies across the industry to combat streaming fraud, which is good for artist compensation and more forthcoming to the fans.How can we think about this prickly topic from a music data perspective? And when we say “this”, it’s not just fake streams. It’s also playlist manipulation and fake artist accounts.For sure, we are in very murky waters, and there is little actual data on the phenomenon.Recently American indie label Hopeless Records estimated 3-4 percent of global streams could be fraudulent.But a 2015 MBW article mentions how 60% or more Twitter followers on top artist accounts could also be fake.Granted, these are different types of fraudulent behavior, but it’s also a huge delta to try to account for.What we can do though is search for red flags in the music data available to us.For example: if we look at playlist manipulation, here’s one way to look at the data to try to identify potentially iffy behavior:We scanned the playlist charts looking for abnormally high 28-day follower increases, and found a non-editorial hip-hop genre playlist with a 262% increase in followers in the past month.While that could just be great marketing, currently having 110K followers-an impressive number-its max artist monthly listeners, however, is only ~470, which doesn’t seem to match up.This means that the only artist on the playlist that gets a lot of its unique listeners from here is getting less than 1% of its supposed followers actually listening to them.Again, possible, 
especially since the playlist has about 100 current tracks on it, but it’s ranked in the first third of the playlist, so it’s not likely.That artist, which only has a little over 200 followers, is playlisted among high-profile artists like Eminem, Kanye West and Cardi B, presumably to draw traffic, which would be smart marketing if done legitimately, but if so many followers are not streaming the actual tracks...it smells a little fishy.If that weren’t enough, there’s a three-piece pop band with only 16 followers, and two other rap artists who have 4 and 17 Spotify followers, respectively.All three have their listed label as a series of numbers, then “Records DK” or “DK2”, which is a default label for the distributor DistroKid, if left untouched.DistroKid is one of the most popular digital distributors available to independent artists and an official partner distributor with Spotify.If that still isn’t enough, all the playlist album artwork looks like carbon copies of official Spotify playlist album art. Again, good marketing tactic...or borderline deception?So while it’s admittedly an analytical leap, it is very possible that a playlist curator is buying illegitimate playlist followers to make themselves look good, they dupe unknowing artists into thinking they are getting amazing exposure, and the curator gets paid accordingly and in our opinion, unfairly.We could be completely 100% wrong on this, but the point is, there are certain ways you can look at the music data to try to suss out what’s likely real, and what at least should raise some red flags.We’ll try to unpack some other types of illegitimate activity from a data perspective next week.Outro That’s it for your Daily Data Dump for Friday, June 28th, 2019. This is Jason from Chartmetric.Do you know how NPR does their ask for donations every so often? That’s what we’re about to do now! 
But we’re just asking for an Apple Podcasts rating. Rutger and I put at least a few hours a day into each episode, researching, writing, editing, recording, editing again, publishing to multiple platforms, checking analytics...and it’d be really cool for us to get some feedback on how we’re doing: the good/bad/ugly. It’d only take a few thumb swipes out of your day, and you’d be sending us so much joy: we’d appreciate it. As always, free accounts are at chartmetric.com. Article links and show notes are at podcast.chartmetric.com. Happy Friday, have a great weekend, and see you on Monday!
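The red-flag check described in the June 28 episode above (a large 28-day follower increase paired with a very low ratio of artist monthly listeners to playlist followers) can be sketched in a few lines of Python. This is only an illustration, not Chartmetric's actual method: the `PlaylistStats` fields, the function names, and both thresholds are assumptions, and the follower count from 28 days earlier is back-calculated from the 262% increase and 110K followers quoted in the episode.

```python
from dataclasses import dataclass


@dataclass
class PlaylistStats:
    # Hypothetical record; field names are illustrative, not a real API schema.
    name: str
    followers: int
    followers_28d_ago: int
    max_artist_monthly_listeners: int


def follower_growth_pct(p: PlaylistStats) -> float:
    """Percentage change in followers over the last 28 days."""
    if p.followers_28d_ago <= 0:
        raise ValueError("need a positive follower count from 28 days ago")
    return (p.followers - p.followers_28d_ago) / p.followers_28d_ago * 100


def listener_ratio(p: PlaylistStats) -> float:
    """Best-performing artist's monthly listeners as a share of followers."""
    if p.followers <= 0:
        raise ValueError("playlist has no followers")
    return p.max_artist_monthly_listeners / p.followers


def red_flags(p: PlaylistStats,
              growth_threshold_pct: float = 100.0,  # assumed cutoff
              ratio_threshold: float = 0.01) -> list[str]:  # assumed cutoff
    """Return the names of the heuristics this playlist trips."""
    flags = []
    if follower_growth_pct(p) > growth_threshold_pct:
        flags.append("rapid_follower_growth")
    if listener_ratio(p) < ratio_threshold:
        flags.append("low_listener_ratio")
    return flags


# Figures from the episode: ~110K followers, +262% in 28 days, ~470 listeners.
suspect = PlaylistStats(
    name="hip-hop genre playlist (unnamed in the episode)",
    followers=110_000,
    followers_28d_ago=30_387,  # back-calculated: 110,000 / 3.62
    max_artist_monthly_listeners=470,
)

print(red_flags(suspect))
```

Tripping both heuristics does not prove manipulation, as the episode itself stresses; it only marks a playlist as worth a closer manual look.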

HighlightsFollow us down to the trigger cities of Southeast Asia where their Shazam, Spotify, and YouTube charts have some big implications for tour strategy and catalog exploitation.Mission   Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.DateThis is your Data Dump for Thursday, June 27th, 2019.Trigger Cities in Southeast Asia On our blog this week, Jason did an epic analysis of Southeast Asia’s trigger cities, revealing what implications their Shazam, Spotify, and YouTube charts have for tour strategy and catalog exploitation.We’re just scratching the surface of it here.First, Shazam. From Singapore’s 41 pop genre tags to Jakarta’s 40 to Kuala Lumpur’s 37 down to Bangkok’s 30, an overwhelming Southeast Asian love of pop music in the past month would be an understatement.However, the region doesn’t appear to care much about querying hip-hop or rap, as the genre only makes a 10th place appearance in Jakarta.On Spotify, K-pop group BLACKPINK is currently the hottest act throughout the region, having 2.11M monthly listeners in the past month.Our good friend Lauv (remember him from our June 3 episode?) slides into #2 with 2.10M monthly listeners.With the exception of BLACKPINK, all other artists have US or UK origins.Given Spotify’s northern European origins and that its most popular artists are also of Western origin, this makes sense.Ho Chi Minh City, Vietnam, however, seems to exist in its own silo. More commonly known as Saigon, locals prefer Korean acts, sharing a love of K-pop boy band SEVENTEEN with Bangkok.But the city’s #1 most listened-to artist on Spotify is their “queen of V-pop,” Mỹ Tâm. 
An outlier here, however, is Ho Chi Minh City’s third most listened to artist on Spotify: Nashville’s Landon Austin.Austin’s covers are apparently catnip for Southeast Asia’s love of non-controversial pop, because his top five cities by Spotify monthly listeners are all in Southeast Asia.Should Austin be touring the region like a madman, then?Based on the available data, it sure looks like it, but we can’t rule out the possibility of bots and bought streams — for which a lot more research still has to be done.On YouTube, BLACKPINK and BTS, two of Korea’s biggest international acts, consistently appear in the top 10 artists by YouTube daily video views.Aggregating the top 10 artists of each of the six Southeast Asian cities for YouTube daily views, the #6 most viewed artist is Brad Kane. If you missed our May 16 podcast episode on Quezon City, Kane was the titular character’s original singing voice for the 1992 Disney animated film Aladdin, which has just been re-released as a live action film starring Will Smith.The fact that the New York City actor, singer, and producer’s rendition of “A Whole New World” has stirred up so much engagement 27 years later in Southeast Asia says something about how locals consume music … not necessarily to support the artist, but for their own karaoke endeavors!So, if you’re looking to exploit catalog records, this might be the perfect spot.But don’t count out domestic artists.Three Southeast Asian artists make the region’s top 10 most viewed: Bangkok trap rapper YOUNGOHM (at #4 with 1.1M daily views), Indonesian singer Nella Kharisma (at #7 with 637K daily views), and Bangkok punk rock band Labanoon (at #9 with 589K daily views).One distinct takeaway with these domestic artists is that their YouTube support comes exclusively from their home countries. 
Since all three are proudly delivering content in their mother tongues, they are likely limiting their global market appeal, but it’s also why they resonate so well with their fellow country people.As Jason puts it, looking at a certain market’s music data raises our awareness about who the fans are, what their specific cultural histories have been, and how they are now living as a reflection of it.  Well said, but something to consider beyond the computer screen is the fact that digital behavior doesn’t always correspond directly to behavior in the real world.Which is why, before you completely tailor your tour or marketing strategy to your streaming data, make sure you’ve considered all avenues of information.Spotify numbers don’t always translate to ticket sales.OutroThat’s it for your Daily Data Dump for Thursday, June 27th, 2019. This is Rutger from Chartmetric.If you want to read Jason’s piece in full and look at some pretty charts, it’s up on our blog at blog.chartmetric.io.Free accounts are at chartmetric.comAnd article links and show notes are at: podcast.chartmetric.com.Happy Thursday, and see you tomorrow!

Summary Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of both roles. Data engineering platforms have a strong focus on stateful execution and tasks that are strictly ordered based on dependency graphs. Data science platforms provide an environment that is conducive to rapid experimentation and iteration, with data flowing directly between stages. Jeremiah Lowin has gained experience in both styles of working, leading him to be frustrated with all of the available tools. In this episode he explains his motivation for creating a new workflow engine that marries the needs of data engineers and data scientists, how it helps to smooth the handoffs between teams working on data projects, and how the design lets you focus on what you care about while it handles the failure cases for you. It is exciting to see a new generation of workflow engine that is learning from the benefits and failures of previous tools for processing your data pipelines.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management.For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall is the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. 
To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Your host is Tobias Macey and today I’m interviewing Jeremiah Lowin about Prefect, a workflow platform for data engineering

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Prefect is and your motivation for creating it? What are the axes along which a workflow engine can differentiate itself, and which of those have you focused on for Prefect? In some of your blog posts and your PyData presentation you discuss the concept of negative vs. positive engineering. Can you briefly outline what you mean by that and the ways that Prefect handles the negative cases for you? How is Prefect itself implemented and what tools or systems have you relied on most heavily for inspiration? How do you manage passing data between stages in a pipeline when they are running across distributed nodes? What was your decision making process when deciding to use Dask as your supported execution engine?

For tasks that require specific resources or dependencies how do you approach the idea of task affinity?

Does Prefect support managing tasks that bridge network boundaries? What are some of the features or capabilities of Prefect that are misunderstood or overlooked by users which you think should be exercised more often? What are the limitations of the open source core as compared to the cloud offering that you are building? What were your assumptions going into this project and how have they been challenged or updated as you dug deeper into the problem domain and received feedback from users? What are some of the most interesting/innovative/unexpected ways that you have seen Prefect used? When is Prefect the wrong choice? In your experience working on Airflow and Prefect, what are some of the common challenges and anti-patterns that arise in data engineering projects?

What are some best practices and industry trends that you are most excited by?

What do you have planned for the future of the Prefect project and company?

Contact Info

LinkedIn @jlowin on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prefect Airflow Dask

Podcast Episode

Prefect Blog PyData Presentation Tensorflow Workflow Engine

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Streaming Data

Managers and staff responsible for planning, hiring, and allocating resources need to understand how streaming data can fundamentally change their organizations. Companies everywhere are disrupting business, government, and society by using data and analytics to shape their business. Even if you don’t have deep knowledge of programming or digital technology, this high-level introduction brings data streaming into focus. You won’t find math or programming details here, or recommendations for particular tools in this rapidly evolving space. But you will explore the decision-making technologies and practices that organizations need to process streaming data and respond to fast-changing events. By describing the principles and activities behind this new phenomenon, author Andy Oram shows you how streaming data provides hidden gems of information that can transform the way your business works. Learn where streaming data comes from and how companies put it to work Follow a simple data processing project from ingesting and analyzing data to presenting results Explore how (and why) big data processing tools have evolved from MapReduce to Kubernetes Understand why streaming data is particularly useful for machine learning projects Learn how containers, microservices, and cloud computing led to continuous integration and DevOps

podcast_episode
by Martin Mills (Beggars Group) , Jason Joven (Chartmetric) , Portia Sabin (Kill Rock Stars)

Highlights Following a panel including Beggars Group’s Martin Mills and Kill Rock Stars’ Portia Sabin, we’re looking at artists on their rosters and asking, “What makes them two of indie music’s longest lasting labels?”  Mission    Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. Date This is your Data Dump for Friday, June 21, 2019. A2IM Indie Week, Day 4 Several indie icons closed out A2IM’s Indie Week in New York City yesterday, two of them being the legendary Martin Mills and Dr. Portia Sabin, who shared what has helped them make Beggars Group and Kill Rock Stars, respectively, some of indie music’s longest lasting labels. Beggars Group is the parent company of 4AD, Rough Trade Records, Matador, XL Recordings and Young Turks. Mills started it in London in 1977, and his many labels have gone on to sign everyone from Adele to Radiohead. While Adele hasn’t released anything for some time, her album 25, which was released physically in November 2015 and digitally in June 2016 via a joint deal between XL Recordings and Sony’s Columbia, “single-handedly revived global album sales”, according to the Guardian. The album’s streaming success is no joke either, as it’s maintained a 70-80 Spotify Popularity Index score over the last three years, and has been included on upwards of 12.5K Spotify playlists. That kind of success under XL’s guidance gave Adele the leverage to be able to sign an enormous and unprecedented £90 million deal with Sony in May 2016. No doubt the industry will be keen to check her next album from one of the industry’s biggest major labels. Now entering the underground: since 2006, Sabin has run Pacific Northwest-based indie label Kill Rock Stars, which has been a home to riot grrrl legends Bikini Kill and Sleater-Kinney, the late singer-songwriter Elliott Smith, and folk rockers the Decemberists. Sabin’s roster is more niche 
than Mills’, but Kill Rock Stars’ ability to navigate catalog digitization and promotion has allowed their artists to prosper.Smith, for instance, maintains some 1.4M monthly listeners on Spotify, despite the fact that he passed away tragically in 2003. In March 2017, Kill Rock Stars released an expanded edition of his 1997 album Either/Or, which helped increase Smith’s Spotify followers by around 70 percent to 430K and spiked his monthly listenership by an estimated 250K. Whether by keen artist development or catalog revitalization, Beggars Group and Kill Rock Stars have each found a way to not only survive longer than most indie labels, but to also thrive while doing so.OutroThat’s it for Indie Week and your Daily Data Dump for Friday, June 21, 2019. This is Jason from Chartmetric.Free accounts are at chartmetric.comAnd article links and show notes are at: podcast.chartmetric.comHappy Friday, and have a great weekend!

2019-06-19 // A2IM Indie Week, Day 2: Spotify’s Indie Curators
Highlights
Spotify and major label curators always move the needle, but with Day 2 of A2IM’s Indie Week in the bag, we’re looking at the important indies of the bunch.
Mission
Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Wednesday, June 19, 2019.
A2IM Indie Week, Day 2
With Day 2 of A2IM’s Indie Week in the bag, we’re looking at important indie curators moving the needle on Spotify.
Way up at the top is PopFiltr, with nearly 5 million playlist followers across 13 playlists. Boasting a 13 percent follower growth rate over the last 28-day period, PopFiltr has plenty to brag about, and artists or labels can submit their songs for consideration at popfiltr.com.
Indiemono is another hidden gem, with 2.2 million playlist followers across a jaw-dropping 252 playlists, with a little something for everyone. In the last 28 days, Indiemono experienced a 3 percent follower growth rate, and they also offer an easy song submission process at indiemono.com.
There are also the indie indies, or the individual curators who are so good at what they do, they continue to kill it flying solo. Take Ignatious Pop, for example, whose 451 playlists have just over 2 million followers and a 4 percent growth rate in the last 28 days. Or Jesuss Vargas Gonzalez, whose 93 playlists have 1.5 million followers and an 11 percent growth rate in the last 28 days. Landing their playlists is probably going to be a bit harder, as they’re less about submissions and more about discoveries.
Also keep an eye on up-and-comers Playlist Pop, with a 71 percent growth rate; Independent Hits, with a 539 percent growth rate, meaning they’re probably new and growing really fast; and ambitious LA-based indie label and playlist network Plvylists (who’ve swapped out the “A” for a “V”), with a 125 percent growth rate.
The more that major streaming platforms corral the radio market, the more important curators will become as promoters of what’s hot, what’s new, and what’s never been heard.
Outro
That’s it for your Daily Data Dump for Wednesday, June 19, 2019. This is Rutger from Chartmetric.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com
Happy Wednesday, and we’ll see you tomorrow from Indie Week!

Highlights
We’re on the road! We’re at A2IM’s Indie Week in New York City, Day 1 is over and my feet hurt.
Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Tuesday, June 18, 2019.
A2IM Indie Week, Day 1
Hi all, Jason reporting from New York City and this will admittedly be a quick one. Day 1 of Indie Week is over, and I wanted to first share some thoughts before we call it a night.
In talks today with various labels, distributors, agencies and so on involved with different sectors of the music business, three takeaways were as follows:
People might not always need super-charged, crazy data ninja magic insights...they simply want to know that they got on a playlist.
Sometimes there’s so much going on with multiple artists on a label roster, or they have 30 Spotify or Apple for Artists tabs open, all with multiple tracks on playlists in different territories...and you just want to know with a simple notification that a certain track made a playlist. We hear you, and simple can also be best.
Stream count does not always equal revenue in other categories, like merchandise or branding opportunities or ticket sales.
Depending on genre or the way an artist engages with their fans, they may not be creating crazy streaming numbers on the typical music platforms, but they’ll still be selling out multiple shows or merch items. Maybe they resonate more on physical, or YouTube or terrestrial radio or TikTok, but the streaming playlist world isn’t the end all, be all. By the same token, just because an artist is highly touted with ba-jillion streams doesn’t necessarily mean they do as well in other revenue categories. So make sure you’re taking all types of data into account, not just spins...and maybe what you really need to be tracking has yet to find a quality, scalable data solution.
Sharing data insights with your artists can help encourage desired behavior.
Maybe your artist doesn’t like social media. Maybe they don’t want to tour in a particular part of town. Maybe they don’t want to work on a collaboration with another artist who could widen your fan base...these are all understandable things that, from an artist’s perspective, might not be very obvious moves and might feel too “businessey” for them to buy into as a creative being. But most artists today, I’d argue, are quite data-savvy, and if you shared a certain chart of how that one Tweet you did get them to do helped get them 10 or 100 more followers to connect with down the road, all the better. Or even though they just want to tour stateside...what if they saw their last EP over-indexed by 35% in monthly listeners in Jakarta, Indonesia in the past month...maybe it’s time to renew that passport?
All this to say: of course you’re sharing your coolest data insights with your marketing team or promotion person or what have you...but consider being more proactive with sharing them with your artist, because they might just appreciate it!
Outro
That’s it for your Daily Data Dump for Tuesday, June 18, 2019. This is Jason from Chartmetric.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com
Happy Tuesday, we’ll see you tomorrow from Indie Week! Peace.

podcast_episode
by Vagabon (Nonesuch Records), Jason Joven (Chartmetric), Khruangbin (Dead Oceans), Julien Baker (Matador Records)

Highlights
We’re on the road! We’re at A2IM’s Indie Week in New York City and so we’ll publish our music data-related thoughts and experiences for you starting in tomorrow’s episode in case you can’t make it. But for today, we’ll celebrate the indie community on Amazon Music with an indie-focused New Music Friday Monday!
Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Monday, June 17, 2019.
New Music Friday Monday: Fresh Indie on Amazon
Hopping over to the “Fresh Indie” playlist on Amazon Music, we’ve got no fewer than 60 tracks of the most brand spanking new independent music in the streaming world. The tracks all come from over 35 different indie labels, including 4AD, ATO Records and XL Recordings.
Over 64% of the artists featured are from the US and 16% from the UK, with Canada, Norway, Australia and New Zealand making up the rest of the Anglo-focused playlist. Just under half of the list has either the indiepop, folk-pop or indietronica genre tag attached to it, with 15+ other genre tags thrown in to make for a diverse-sounding set.
In the #4 position is the funk-addled “Mary Always” instrumental track by Houston-born band Khruangbin, mixing soul, dub, psychedelia, and Thai funk. The track is currently on nine Spotify editorial playlists, including All New Indie with 958K followers, and 2 Apple editorial playlists, including Today’s Indie Rock. The great playlist promotion is coming out of Bloomington, Indiana, where the track’s Dead Oceans label is housed with the Secretly Group, an umbrella of indie labels putting out rock music of different flavors.
In the #9 spot is the spacious, introspective track “Conversation Piece” by Memphis, Tennessee’s Julien Baker. Currently on no Spotify editorial playlists and 1 Apple editorial playlist, the Late Night Menu, the Matador Records release is the latest from the singer-songwriter known for heart-wrenching lyricism and melody. What’s uber cool about Baker is that she is also part of supergroup boygenius, also under Matador, with Phoebe Bridgers and Lucy Dacus, kind of following the K-pop model of a supergroup splitting off into solo careers, just in reverse: boygenius formed in 2018, while each member had a solo career as early as 2014.
Last but not least is “Flood Hands” by Vagabon, coming from Nonesuch Records. Vagabon is in the #12 slot on the Amazon playlist and is currently on 3 Spotify editorial playlists, including All New Indie alongside Khruangbin, and 2 Apple editorial playlists, including Today’s Indie Rock. Released on June 13, it’s the latest from the Cameroon-born multi-instrumentalist now based in NYC...where we are this week for A2IM’s Indie Week!
Outro
That’s it for your Daily Data Dump for Monday, June 17, 2019. This is Jason from Chartmetric.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com
Happy Monday, we’ll see you tomorrow from NYC’s Indie Week floor! Bye.