talk-data.com

Topic

Data Streaming

realtime event_processing data_flow

739 tagged

Activity Trend: peak of 70 activities per quarter, 2020-Q1 to 2026-Q1

Activities


Highlights

It’s official: we’ve launched 6MO, our first-ever Global Music Industry Data Report! We’re thrilled to present our comprehensive view, from a music data perspective, of the first six months of 2019. Dig in to Part 1 with us here.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Wednesday, Oct. 2, 2019.

6MO Global Music Industry Data Report, Part 1: Semi-Annual Awards

If you haven’t heard yet, we officially released our first-ever Global Music Industry Data Report on Tuesday, and the response has us very excited to dive into it with you here. Last week, we explained the 30-page structure: Semi-Annual Awards, Platform-Playlist Analysis, and Strategic Business Insights. Today, we’re tackling Part 1, our Chartmetric Semi-Annual Awards, which rank the top-performing artists by absolute and percentage-based growth across multiple metrics as of June 30, 2019, the last day of the six-month period we tracked. By the way, if you’ve got the report in hand, feel free to scroll or flip along with us.

First off, our Cross-Platform Performance Award, as you might imagine, revealed some familiar names in the Top 10 for overall streaming and social popularity, from T. Swift to Shawn Mendes and Rihanna to Justin Bieber and Ariana Grande. However, the interesting stories were J Balvin at No. 2 and Daddy Yankee at No. 7, reflecting Latin music’s growth outside of Latin America itself, and the late Avicii at No. 10, likely due to his strong catalog consistently driving 3M+ YouTube views daily, his April release of “SOS” with Aloe Blacc, and the full posthumous album release of Tim on June 6.

When it came to YouTube Channel Views gain as of June 25, 2019, six of the Top 10 artists with the highest gains were primarily Spanish-speaking, showcasing both the strength of Latin content and the popularity of the YouTube platform with Latin audiences. Keep in mind, however, that India-specific music charts didn’t launch until two weeks ago, so that data could very well change the distribution in a big way. Stay tuned for our July to December report to see if the next six months prove that to be the case!

For Spotify Monthly Listener Gain as of June 30, 2019, collaborations were crucial to Lunay’s 557 percent and Jhay Cortez’s 521 percent lifts, not to mention Billy Ray Cyrus’ 3,032 percent increase as a result of his “Old Town Road” collab with Lil Nas X. On Twitter, Follower Gain was all about diversity, with three Korean groups, three Americans, two Brazilians, one Nigerian, and one Turkish rocker making up the Top 10 percentage gains. And on our own platform, BTS won out on the Artist Follower front, and Spotify curators dominated in terms of Playlist Followers. This is truly just the tip of the iceberg for Part 1, so please keep digging into it, and let us know what else you find!

Next up, we’re taking on Part 2, our Platform-Playlist Analysis, where we break down artist country market share and artist genre market share on Amazon, Apple, Deezer, and Spotify’s top 30 playlists. So, stay tuned for that!

Outro

That’s it for your Daily Data Dump for Wednesday, Oct. 2, 2019.
This is Rutger from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. By the way, if you haven’t downloaded our report yet, you can find it across our socials and in our show notes! Happy Wednesday, and we’ll see you on Friday for Part 2!
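The Semi-Annual Awards rank artists by both absolute and percentage-based growth, and the two can order the same artists very differently. Here is a minimal Python sketch that makes the difference concrete. The artist names and listener counts are made up (not Chartmetric data), and `MetricChange` and `rank` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class MetricChange:
    """One artist's metric (e.g. monthly listeners) at the start and end of a period."""
    artist: str
    start: float
    end: float

    @property
    def absolute_gain(self) -> float:
        return self.end - self.start

    @property
    def percent_gain(self) -> float:
        # Growth relative to the starting value; undefined when the baseline is zero.
        if self.start == 0:
            raise ValueError(f"{self.artist}: percentage gain is undefined for a zero baseline")
        return (self.end - self.start) / self.start * 100


def rank(changes, key):
    """Artist names ordered from the largest to the smallest gain under `key`."""
    return [c.artist for c in sorted(changes, key=key, reverse=True)]


# Hypothetical numbers for illustration only; these are not Chartmetric figures.
changes = [
    MetricChange("Established Star", start=50_000_000, end=55_000_000),
    MetricChange("Breakout Collaborator", start=1_000_000, end=5_500_000),
    MetricChange("Rising Act", start=200_000, end=1_300_000),
]

print(rank(changes, key=lambda c: c.absolute_gain))
# ['Established Star', 'Breakout Collaborator', 'Rising Act']
print(rank(changes, key=lambda c: c.percent_gain))
# ['Rising Act', 'Breakout Collaborator', 'Established Star']
```

The same formula explains the size of the headline figures. A 3,032 percent increase means the ending value is 1 + 3032/100 = 31.32 times the starting value. Percentage rankings favor small baselines, which is why they are most informative when read alongside absolute gains.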

Summary

Building an end-to-end data pipeline for your machine learning projects is a complex task, made more difficult by the variety of ways you can structure it. Kedro is a framework that provides an opinionated workflow that lets you focus on the parts that matter, so you don’t waste time gluing the steps together. In this episode Tom Goldenberg explains how it works, how it is being used at QuantumBlack for customer projects, and how it can help you structure your own. Definitely worth a listen to better understand the benefits that a standardized process can provide.
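Kedro’s core idea, as described here, is that each step is a plain function whose inputs and outputs are named datasets, so the framework (not the author) works out how the steps connect. The sketch below illustrates that idea in plain Python. It does not use Kedro’s actual API, and `Node`, `run_sequential`, and `run_threaded` are hypothetical names. It also illustrates the “pluggable execution” theme from the interview: the same pipeline definition runs unchanged under two different runners.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Node:
    """A pure function plus the names of the datasets it reads and writes (hypothetical, not Kedro's class)."""
    func: Callable[..., Any]
    inputs: List[str]
    output: str
    name: str


def dependency_graph(nodes: List[Node]) -> Dict[str, set]:
    """Map each node name to the names of the nodes that produce its inputs."""
    producers = {n.output: n.name for n in nodes}
    # Inputs with no producer are raw datasets expected to be in the catalog already.
    return {n.name: {producers[i] for i in n.inputs if i in producers} for n in nodes}


def execution_order(nodes: List[Node]) -> List[str]:
    return list(TopologicalSorter(dependency_graph(nodes)).static_order())


def run_sequential(nodes: List[Node], catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Run one node at a time in dependency order, storing each output in the catalog."""
    by_name = {n.name: n for n in nodes}
    for name in execution_order(nodes):
        node = by_name[name]
        catalog[node.output] = node.func(*(catalog[i] for i in node.inputs))
    return catalog


def run_threaded(nodes: List[Node], catalog: Dict[str, Any], max_workers: int = 4) -> Dict[str, Any]:
    """Run all nodes whose inputs are ready concurrently, one dependency level at a time."""
    by_name = {n.name: n for n in nodes}
    sorter = TopologicalSorter(dependency_graph(nodes))
    sorter.prepare()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = [by_name[name] for name in sorter.get_ready()]
            # Nodes in the same batch never depend on each other, so they can run in parallel.
            results = list(pool.map(lambda n: n.func(*(catalog[i] for i in n.inputs)), ready))
            for node, value in zip(ready, results):
                catalog[node.output] = value
                sorter.done(node.name)
    return catalog


def drop_missing(raw):
    return [x for x in raw if x is not None]


def total(values):
    return sum(values)


def count(values):
    return len(values)


def mean(s, n):
    return s / n


# Declared out of order on purpose: the runners derive the order from dataset names.
pipeline = [
    Node(mean, ["score_total", "score_count"], "score_mean", "mean"),
    Node(total, ["clean_scores"], "score_total", "total"),
    Node(count, ["clean_scores"], "score_count", "count"),
    Node(drop_missing, ["raw_scores"], "clean_scores", "drop_missing"),
]

print(run_sequential(pipeline, {"raw_scores": [3, None, 5, 7]})["score_mean"])  # 5.0
print(run_threaded(pipeline, {"raw_scores": [3, None, 5, 7]})["score_mean"])    # 5.0
```

Because dependencies are inferred from dataset names, nodes can be declared in any order. Swapping the runner changes how the work is scheduled without touching the pipeline definition itself.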

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with worldwide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, Data Council in Barcelona, and the Data Orchestration Summit. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Tom Goldenberg about Kedro, an open source development workflow tool that helps structure reproducible, scalable, deployable, robust, and versioned data pipelines.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Kedro is and its origin story?

Who are the primary users of Kedro, and how does it fit into and impact the workflow of data engineers and data scientists?

Can you talk through a typical lifecycle for a project that is built using Kedro?

What are the overall features of Kedro, and how do they compound to encourage best practices for data projects?

How do the culture and background of QuantumBlack influence the design and capabilities of Kedro?

What was the motivation for releasing it publicly as an open source framework?

What are some examples of ways that Kedro is being used within QuantumBlack, and how has that experience informed the design and direction of the project?

Can you describe how Kedro itself is implemented and how it has evolved since you first started working on it?

There has been a recent trend away from end-to-end ETL frameworks and toward a decoupled model that focuses on a programming target with pluggable execution. What are the industry pressures driving that shift, and what are your thoughts on how it will manifest in the long term?

How do the capabilities and focus of Kedro compare to similar projects such as Prefect and Dagster?

Kedro has not yet reached a stable release. What are the aspects of Kedro that are still in flux, and where are the changes most concentrated?

What is still missing for a stable 1.x release?

What are some of the most interesti

Highlights

Obsessed with streaming? Rightfully so, but after almost a year of coverage on Chartmetric, let’s go over some useful US radio facts that may help your artist’s overall distribution strategy.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. Chartmetric’s social media handle is Chartmetric, no “S.” Follow us on Twitter, LinkedIn, Instagram, and Facebook; we’re always posting useful music tidbits, and we’d love to hear from you!

Date

This is your Data Dump for Wednesday, Sept. 25, 2019.

Radio in the Streaming Era: US Radio Facts for Streaming Experts

Almost one year ago, Chartmetric added 300 US radio stations to our 20+ sources of music data. Why? Well, radio is still considered one of the major ways to break an artist into the mainstream here in the States, and to many, it remains a strong advantage of the major labels, who are well-networked in the radio community. You can check out our blog article about it in the show notes, but for those who never got a chance to learn about the world before streaming, we thought we’d take the time to review some basic radio facts to help you put it all in context.

First, a radio spin does NOT equal a streaming play! Nowadays, we’re so used to looking at total streams on whatever platform, how many plays are coming from which playlist, or how many plays came from a user’s library... but nonetheless, each stream is just a one-to-one relationship with a listener. With radio, one spin can mean thousands of listeners, at the same time, and usually in the same geographic area!
A one-to-many relationship is how terrestrial radio differentiates itself from streaming, and it takes a certain appreciation to realize that even though radio spin counts are much smaller than stream counts in a given time period, spins are far more geographically attributable, they’re time-stamped, and each one plays to many more people.

On the “many more people” part, one term to be aware of is “AQH,” which stands for average quarter-hour persons: the number of unique listeners in a 15-minute period who listen for at least 5 minutes.

Have you ever been stuck in highway traffic and flipped through radio stations, only to hear commercials? Well, I bet it was around a quarter hour before or after the hour when that happened. Why? It comes down to the way Nielsen Audio records AQH: by playing commercials around the :15 and :45 marks, stations maximize the time they spend playing music (and thus get the highest AQH possible). That raises their profile for advertisers wanting to buy time and more exposure on their station.

The AM Drive during morning rush hour is primo ad time, so while 5-10AM is highly lucrative for radio stations, it’s probably not when your new song is going to get played.
You probably have a better chance in the PMD (guess what that is), Evening, or Overnight dayparts.

Location-wise, New York City, Los Angeles, and Chicago stations tend to have the highest AQH ratings, which makes sense given they’re the three most populous cities in the country.

Another term you may have heard is “Radio Format,” which loosely refers to the genres a station plays; it’s really more of a way for advertisers to recognize a station’s listener demographics. Surely you’ve heard of “Top 40,” which also goes by Contemporary Hit Radio, or “CHR,” and it’s what you’d expect: the latest and greatest, mostly from major labels. Country is another format, and of course it plays best in the South, but it also has a sizeable presence in other places like Chicago. Urban plays better in the Midwest, South, and East and is made up of hip-hop and rap, while Rhythmic plays bigger in the West; it’s a mix of Top 40 and Urban where R&B, dance, hip-hop, and pop all intermingle. Other formats like “Triple A,” Alternative, and Hot Adult Contemporary exist as well; we invite you to check out the blog article to learn more. Many thanks to our data supplier RadioWave and Seth Keller for their expertise.

Outro

That’s it for your Daily Data Dump for Wednesday, Sept. 25, 2019. This is Jason from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Wednesday, and we’ll see you on Friday!
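This episode defines AQH as the number of unique listeners in a 15-minute period who listen for at least 5 minutes, and that definition translates into a small calculation. This is a simplified sketch under stated assumptions: listening sessions are given as minute offsets within one daypart, and the survey sampling and weighting Nielsen Audio applies in practice are ignored. The function name `aqh_persons` is hypothetical.

```python
from collections import defaultdict

QUARTER_HOUR = 15   # minutes per rating period
MIN_MINUTES = 5     # minutes a listener needs within a quarter hour to be counted


def aqh_persons(sessions, daypart_minutes):
    """Simplified average quarter-hour persons for one station over one daypart.

    sessions: iterable of (listener_id, start_minute, end_minute), with minutes
    measured from the start of the daypart and end_minute exclusive.
    """
    n_quarters = daypart_minutes // QUARTER_HOUR
    if n_quarters == 0:
        raise ValueError("daypart must contain at least one full quarter hour")
    # listened[q][listener] = minutes that listener was tuned in during quarter hour q
    listened = [defaultdict(int) for _ in range(n_quarters)]
    for listener, start, end in sessions:
        for q in range(n_quarters):
            q_start, q_end = q * QUARTER_HOUR, (q + 1) * QUARTER_HOUR
            overlap = min(end, q_end) - max(start, q_start)
            if overlap > 0:
                listened[q][listener] += overlap
    # Persons credited in each quarter hour, then averaged across the daypart.
    credited = [sum(1 for m in per_q.values() if m >= MIN_MINUTES) for per_q in listened]
    return sum(credited) / n_quarters


# Hypothetical one-hour daypart with three listeners.
sessions = [
    ("a", 0, 60),   # listens the whole hour: credited in all four quarter hours
    ("b", 10, 20),  # 5 minutes on each side of :15, so credited in two quarter hours
    ("c", 30, 34),  # only 4 minutes, so never credited
]
print(aqh_persons(sessions, daypart_minutes=60))  # 1.5
```

Listener “b” shows why quarter-hour boundaries matter: ten minutes of listening that straddle the :15 mark earn credit in two quarter hours. Listener “c” contributes nothing despite tuning in.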

Summary

Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. Amazon’s S3 has quickly become the de facto API for interacting with this kind of service, so the team at MinIO has built a production-grade, easy-to-manage storage engine that replicates that interface. In this episode Anand Babu Periasamy shares the origin story of the MinIO platform, the myriad use cases that it supports, and the challenges they have faced in replicating the functionality of S3. He also explains the technical implementation, innovative design, and broad vision for the project.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with worldwide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Anand Babu Periasamy about MinIO, the neutral, open source, enterprise-grade object storage system.

Interview

Introduction

How did you get involved in the area of data management?

Can you explain what MinIO is and its origin story?

What are some of the main use cases that MinIO enables?

How does MinIO compare to other object storage options, and what benefits does it provide over other open source platforms?

Your marketing focuses on the utility of MinIO for ML and AI workloads. What benefits does object storage provide as compared to distributed file systems (e.g. HDFS, GlusterFS, Ceph)?

What are some of the challenges that you face in terms of maintaining compatibility with the S3 interface?

What are the constraints and opportunities that are provided by adhering to that API?

Can you describe how MinIO is implemented and the overall system design?

How has that design evolved since you first began working on it?

What assumptions did you have at the outset and how have they been challenged or updated?

What are the axes for scaling that MinIO provides and how does it handle clustering?

Where does it fall on the axes of availability and consistency in the CAP theorem?

One of the useful features that you provide is efficient erasure coding, as well as protection against data corruption. How much overhead do those capabilities incur, in terms of computational efficiency and, in a clustered scenario, storage volume?

For someone who is interested in running MinIO, what is involved in deploying and maintain
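The storage side of the erasure-coding question reduces to simple arithmetic. Assume each object is split into k data shards plus m parity shards using a maximum distance separable code such as Reed-Solomon, so that any k shards can rebuild the object. Under that assumption, the raw-storage multiplier is (k + m) / k. The 12 + 4 layout below is a hypothetical example, not a MinIO default, and the helper names are illustrative. Computational cost is not modeled.

```python
from fractions import Fraction


def erasure_overhead(data_shards: int, parity_shards: int) -> Fraction:
    """Raw bytes written per byte of user data in a k + m erasure-coded layout."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need at least one data shard and a non-negative parity count")
    return Fraction(data_shards + parity_shards, data_shards)


def usable_capacity(raw_capacity: float, data_shards: int, parity_shards: int) -> Fraction:
    """Capacity left for user data once parity shards are accounted for."""
    return Fraction(raw_capacity) / erasure_overhead(data_shards, parity_shards)


# Hypothetical layout: each object is striped across 16 drives as 12 data + 4 parity shards.
# Any 12 of the 16 shards can rebuild the object, so up to 4 drives can be lost.
# For comparison, 3-way replication costs 3x raw storage and tolerates losing 2 copies.
k, m = 12, 4
print(f"erasure coding: {float(erasure_overhead(k, m)):.2f}x raw storage")  # 1.33x
print(f"usable from 64 TB raw: {float(usable_capacity(64, k, m)):.0f} TB")  # 48 TB
```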

podcast_episode
by Alan Walker (Liquid State ambassador), Taylor Swift (Republic Records), Jason Joven (Chartmetric), R3HAB (Liquid State)

2019-09-20 // Taylor Swift enjoys Chinese success on QQ Music, with R3HAB set to follow in the near future with Tencent

Highlights

If the 2000s belonged to 50 Cent, the future belongs to Tencent. We’ll check out a few Western artists who are active in the Chinese market, and how the tech conglomerate may matter to them in the near future.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. Chartmetric’s social media handle is Chartmetric, no “S.” Follow us on LinkedIn, Instagram, Twitter, and Facebook; we’re always posting fun music facts, and we’d love to hear from you!

Date

This is your Data Dump for Friday, Sept. 20, 2019.

Taylor Swift enjoys Chinese success on QQ Music, with R3HAB set to follow in the near future

Music Business Worldwide yesterday reported on Tencent, the giant Chinese tech company that runs the massively popular WeChat messaging platform, with over 1B users, and the related music streaming app QQ Music, with over 650M monthly active users. One piece highlighted how Tencent is reportedly in talks to buy 10 to 20% of Universal Music Group, a move that would surely be a boon for all artists working with the major label. Some artists already there don’t need it!
Looking at the QQ Music Western chart for this week, one of UMG’s artists, under the Republic Records imprint, is already enjoying access to Chinese music fans: an artist by the name of Taylor Swift. While Tay Tay isn’t in the Top 20 this week, she does have by far the most tracks on the 100-track chart, placing 17 tracks from her recent album Lover on the list. This suggests that her entire album is getting quite a bit of attention on the platform, rather than just a few hits, unlike Camila Cabello with 3 tracks or Ed Sheeran with 2.

She’s not the only artist with a new album doing well there, however: Post Malone placed seven tracks from his 17-track album Hollywood’s Bleeding in the QQ Western Top 100, showing that Chinese fans are into trap just as much as pop.

Someone who doesn’t show up on the QQ chart this week but may be doing so very soon is Dutch-Moroccan DJ/producer R3HAB, who just signed to Liquid State, Tencent’s joint venture label with Sony. The Hong Kong-based, electronic-focused label must be excited to bring the international artist’s content to the Chinese market, as he’s played at least five live shows on the mainland this year, the last three in Shanghai, Harbin, and Chengdu, according to Songkick data.

R3HAB’s exposure on Spotify and YouTube has been mostly European, with most of his streams coming from cities like Amsterdam, Oslo, Warsaw, and Paris, but the electronic sound does lend itself to a global audience, as Liquid State “ambassador” Alan Walker can attest. Almost 35% of the British-Norwegian DJ’s Instagram followers are in Asia, with over 30% of them hailing from Indonesia and India alone, accounting for over 2M followers in those markets.

So with Liquid State and Tencent now in his corner, it looks like R3HAB could very well start seeing Taylor Swift-like success there, because with over 83% of the Chinese music market controlled by Tencent, the promotional advantages will be plenty.

Outro
That’s it for your Daily Data Dump for Friday, Sept. 20, 2019. This is Jason from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Friday, have a great weekend, and we’ll see you next week!

Summary

The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysis and interpretation. Unfortunately, this strategy is not viable for real-time, real-world use cases such as traffic management or supply chain logistics. In this episode Simon Crosby, CTO of Swim Inc., explains how the SwimOS kernel and the enterprise data fabric built on top of it enable brand-new use cases for instant insights. This was an eye-opening conversation about how stateful computation of data streams from edge devices can reduce cost and complexity compared to batch-oriented workflows.
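The contrast between stateful stream processing and batch recomputation can be sketched in a few lines. This is a generic Python illustration, not the SwimOS API. Each sensor keeps a small piece of in-memory state (a rolling mean over its last few readings) that is updated as each event arrives, while the batch-style function rescans the stored history to get the same answer. The class names, sensor IDs, and readings are hypothetical.

```python
from collections import deque


class RollingMean:
    """Per-entity state that is updated in place as each reading arrives."""

    def __init__(self, window: int):
        self.window = window
        self.readings = deque()
        self.total = 0.0

    def update(self, value: float) -> float:
        self.readings.append(value)
        self.total += value
        if len(self.readings) > self.window:
            self.total -= self.readings.popleft()  # evict the oldest reading
        return self.total / len(self.readings)


class StreamProcessor:
    """Keeps one RollingMean per sensor; every event yields an up-to-date answer immediately."""

    def __init__(self, window: int = 3):
        self.window = window
        self.states = {}

    def on_event(self, sensor_id: str, value: float) -> float:
        if sensor_id not in self.states:
            self.states[sensor_id] = RollingMean(self.window)
        return self.states[sensor_id].update(value)


def batch_rolling_mean(history, sensor_id, window=3):
    """Batch-style equivalent: rescan the stored history each time the answer is needed."""
    values = [v for s, v in history if s == sensor_id][-window:]
    return sum(values) / len(values)


# Hypothetical vehicle counts from two traffic sensors.
events = [("intersection-a", 12), ("intersection-b", 40), ("intersection-a", 18),
          ("intersection-a", 30), ("intersection-b", 35), ("intersection-a", 24)]

processor, history = StreamProcessor(window=3), []
for sensor_id, value in events:
    history.append((sensor_id, value))
    live = processor.on_event(sensor_id, value)
    assert live == batch_rolling_mean(history, sensor_id, window=3)
    print(sensor_id, live)
```

The streaming version does a constant amount of work per event and never needs the full history, which echoes the cost argument in the summary. With integer readings, as here, the two approaches agree exactly. With arbitrary floating-point readings, the running total can drift by tiny rounding errors.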

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with worldwide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Listen, I’m sure you work for a ‘data-driven’ company; who doesn’t these days? Does your company use Amazon Redshift? Have you ever groaned over slow queries, or are you just afraid that Amazon Redshift is gonna fall over at some point? Well, you’ve got to talk to the folks over at intermix.io. They have built the “missing” Amazon Redshift console: an amazing analytics product that helps data engineers find and rewrite slow queries and gives actionable recommendations to optimize data pipelines. WeWork, Postmates, and Medium are just a few of their customers. Go to dataengineeringpodcast.com/intermix today and use promo code DEP at sign-up to get a $50 discount!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Simon Crosby about Swim.ai, a data fabric for the distributed enterprise.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Swim.ai is and how the project and business got started?

Can you explain the differentiating factors between the SwimOS and Data Fabric platforms that you offer?

What are some of the use cases that are enabled by the Swim platform that would otherwise be impractical or intractable?

How does Swim help alleviate the challenges of working with sensor-oriented applications or edge computing platforms?

Can you describe a typical design for an application or system being built on top of the Swim platform?

What does the developer workflow look like?

What kind of tooling do you have for diagnosing and debugging errors in an application built on top of Swim?

Can you describe the internal design for the SwimOS and ho

Highlights

Streaming might favor frontline singles, but some tracks buck the trend. Looking at Spotify, Apple, Amazon, and Deezer’s Top 100 charts, we examine which tracks and artists are able to ride the wave of longevity.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Wednesday, Sept. 18th, 2019.

Post Malone Leads Track Longevity on Streaming Charts

When it comes to streaming, we’re trained to think immediacy and expendability, because, let’s face it, those are the qualities that characterize today’s digital, singles-driven industry. On the streaming charts, however, things aren’t that simple, and some tracks can hold a Top 100 position for more than a year.

Pulling up Spotify’s Daily Global Chart on our charts tab, for example, we can scroll down a little to see chart summaries by many different variables, including “By Time on Chart.” Within Spotify’s Top 100, Post Malone’s “Rockstar” might only be sporting the No. 81 spot, but it’s been on the chart for 508 days; that’s almost a year and a half. If we extend the Daily Global Chart to include the next 100 tracks, “Closer,” by The Chainsmokers and Halsey, might be in a precarious position at No. 199, but the track has enjoyed some 1,103 days on Spotify’s Top 200. To be clear, that’s three years.

Toggling over to Apple’s Top 100, Travis Scott’s “Sicko Mode,” at No. 58, claims the top spot in terms of time on chart, with 361 days, or just short of a year.

Meanwhile, Amazon’s Top 100 features a four-way tie at 210 days. At No. 20, it’s “High Hopes” by Panic! At The Disco. No. 41 is Bebe Rexha’s “Meant to Be (featuring Florida Georgia Line).” No. 56 is “Youngblood” by 5 Seconds of Summer. And No. 60 is “Better Now” by, guess who? Post Malone.

Interestingly, Deezer’s Top 100 has a six-way tie at 195 days. At No. 10, it’s “Con Calma” by Daddy Yankee and Snow, while No. 19 is “Calma” by Pedro Capó and Farruko; three of those four artists are Puerto Rican, and all of them like to keep it cool. No. 27 is once again Post Malone, but this time with “Sunflower,” from the Spider-Man: Into the Spider-Verse soundtrack. No. 66 is “Te Vi” by Piso 21 and Micro Tdh, No. 68 is “Adan Y Eva” by Paulo Londra, and No. 70 is “Giant” by Calvin Harris and Rag'n'Bone Man.

So, while Amazon’s and Deezer’s track longevities might be a bit more evenly spread, they’re also significantly lower than the longest-lasting tracks on Apple’s and Spotify’s charts. Another takeaway is that Posty has managed to keep tracks from two separate releases, Beerbongs & Bentleys and the Spider-Man soundtrack, relevant, and that’s irrespective of his new album, Hollywood’s Bleeding, dominating the top of those same charts.

Outro

That’s it for your Daily Data Dump for Wednesday, Sept. 18th, 2019. This is Rutger from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Wednesday, and we’ll see you on Friday!

Summary

The first stage in every data project is collecting information and routing it to a storage system for later analysis. For operational data, this typically means collecting log messages and system metrics. Often a different tool is used for each class of data, increasing the overall complexity and number of moving parts. The engineers at Timber.io decided to build a new tool, Vector, that can process both of these data types in a single, reliable, and performant framework. In this episode Ben Johnson and Luke Steensen explain how the project got started, how it compares to other tools in this space, and how you can get involved in making it even better.
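The core idea of handling logs and metrics in one pipeline, with malformed input set aside rather than dropped, can be sketched as a tiny in-memory router. This is a hypothetical Python illustration. It does not reflect Vector’s configuration format or internal data model (Vector itself is written in Rust). The event shapes are also assumptions: a `message` field marks a log, and a `name` plus a numeric `value` marks a metric.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Router:
    """Routes raw JSON lines to a log sink, a metric sink, or a dead-letter list (all in memory)."""
    logs: List[Dict[str, Any]] = field(default_factory=list)
    metrics: List[Dict[str, Any]] = field(default_factory=list)
    dead_letters: List[str] = field(default_factory=list)

    def ingest(self, raw: str) -> str:
        """Route one line and return the name of the destination it was sent to."""
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            event = None
        if not isinstance(event, dict):
            self.dead_letters.append(raw)  # unparseable, or valid JSON that is not an object
            return "dead_letters"
        if "message" in event:
            self.logs.append(event)
            return "logs"
        value = event.get("value")
        # A metric needs a name and a numeric value (bool is excluded even though it subclasses int).
        if "name" in event and isinstance(value, (int, float)) and not isinstance(value, bool):
            self.metrics.append(event)
            return "metrics"
        self.dead_letters.append(raw)  # parsed, but matches neither expected shape
        return "dead_letters"


router = Router()
lines = [
    '{"message": "GET /health 200", "level": "info"}',
    '{"name": "http_requests_total", "value": 42}',
    'not json at all',
    '{"name": "cpu_load"}',
]
for line in lines:
    print(router.ingest(line))
# prints logs, metrics, dead_letters, dead_letters (one per line)
```

Keeping rejected lines in a dead-letter list, instead of discarding them, lets an operator inspect and replay them later.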

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with worldwide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey, and today I’m interviewing Ben Johnson and Luke Steensen about Vector, a high-performance, open source observability data router.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what the Vector project is and your reason for creating it?

What are some of the comparable tools that are available and what were they lacking that prompted you to start a new project?

What strategy are you using for project governance and sustainability?

What are the main use cases that Vector enables?

Can you explain how Vector is implemented and how the system design has evolved since you began working on it?

How did your experience building the business and products for Timber influence and inform your work on Vector?

When you were planning the implementation, what were your criteria for the runtime, and why did you decide to use Rust?

What led you to choose Lua as the embedded scripting environment?

What data format does Vector use internally?

Is there any support for defining and enforcing schemas?

In the event of a malformed message, is there any capacity for a dead letter queue?

What are some strategies for formatting source data to improve the effectiveness of the information that is gathered and the ability of Vector to parse it into useful data?

When designing an event flow in Vector, what are the available mechanisms for testing the overall delivery and any transformations?

What options are available to operators to support visibility into the running system?

In terms of deployment topologies, what ca

Highlights

Howdy! Today, we’re going Country, looking at the CMAs’ full list of nominations and making some educated guesses about who might win based on streaming and social data.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, Sept. 6th, 2019.

Data Predictions for the 2019 Country Music Association Awards

Saddle up, because we’re heading to Nashville, figuratively speaking, to check out what’s going on with this year’s CMAs. Based on streaming and social data, can we make some educated guesses about who might win at this year’s ceremony? Twelve categories make up the full range of awards, including Album of the Year, Single of the Year, Musical Event of the Year, Music Video of the Year, and New Artist of the Year, among others.

Maren Morris leads the pack with a total of six nominations, and if we filter our “Artists” tab for the Country genre, Morris comes up at No. 3 in terms of Chartmetric rank, so she’s gotta win at least one, right? Her track “Girl,” which is nominated for Single of the Year alongside Blake Shelton’s “God’s Country,” has a good shot. While neither is currently charting on Apple Music or Spotify, if we filter for genre on Amazon’s Track Charts, Shelton has the edge over Morris with a No. 5 rank compared to No. 12 for Morris. But Shelton’s Chartmetric rank is No. 7 compared to Morris’ No. 3, so this one’s going to be close, a real tossup, if the data has anything to say about the matter.

Overall, Morris also has some competition from Carrie Underwood, who is up for an impressive three awards. When it comes to Album of the Year, though, Morris leads with a Spotify popularity score of 81 for her album “Girl,” compared to 65 for Underwood’s “Cry Pretty.” Morris’ closest competition in this category, if we’re going strictly by streaming performance? Thomas Rhett’s “Center Point Road,” at 79. Looking at the Nashville “Cities” page, Rhett also has the seventh-highest Spotify Monthly Listener count for the Tennessee capital, where the CMAs will be held.

And then there’s the category everyone’s wondering about: Musical Event of the Year. The category is interestingly described on the CMA website as “a collaboration of two or more people either or all of whom are known primarily as a Country artist.” After being quietly stripped of his brief Country label by Billboard, Lil Nas X can at least take solace in the fact that his “Old Town Road” collab with Billy Ray Cyrus landed a Musical Event of the Year nomination at this year’s CMAs. We probably don’t need to go back over how big a viral sensation this surprise crossover hit was, so we’ll just say that, as far as Chartmetric rank goes, Lil Nas X is sitting at No. 32 out of all 1.7M+ artists that we track, and Billy Ray Cyrus is at No. 1 if we filter our “Artists” tab for the Country genre. So, even if “Old Town Road” doesn’t win the CMA award, I think we can all agree it really was the musical event of the year... so far.

Outro

That’s it for your Daily Data Dump for Friday, Sept. 6th, 2019. This is Rutger from Chartmetric. Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com. Happy Friday, have a great weekend, and we’ll see you next week!

Learn PySpark: Build Python-based Machine Learning and Deep Learning Models

Leverage machine and deep learning models to build applications on real-time data using PySpark. This book is perfect for those who want to learn to use PySpark to perform exploratory data analysis and solve an array of business challenges. You'll start by reviewing PySpark fundamentals, such as Spark’s core architecture, and see how to use PySpark for big data processing tasks like data ingestion, cleaning, and transformation. This is followed by building workflows for analyzing streaming data using PySpark and a comparison of various streaming platforms. You'll then see how to schedule Spark jobs using Airflow with PySpark, and the book examines tuning machine and deep learning models for real-time predictions. The book concludes with a discussion of graph frames and performing network analysis using graph algorithms in PySpark. All the code presented in the book will be available as Python scripts on GitHub.

What You'll Learn

Develop pipelines for streaming data processing using PySpark

Build machine learning and deep learning models using PySpark's latest offerings

Perform graph analytics using PySpark

Create sequence embeddings from text data

Who This Book Is For

Data scientists and machine learning and deep learning engineers who want to learn and use PySpark for real-time analysis of streaming data.

Summary

Data professionals are working in a domain that is rapidly evolving. In order to stay current, we need access to deeply technical presentations that aren’t burdened by extraneous marketing. To fulfill that need, Pete Soderling and his team have been running the Data Council series of conferences and meetups around the world. In this episode Pete discusses his motivation for starting these events, how they serve to bring the data community together, and the observations he has made about the direction in which we are moving. He also shares his experiences as an investor in developer-oriented startups and his views on the importance of empowering engineers to launch their own companies.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

Listen, I’m sure you work for a ‘data-driven’ company (who doesn’t these days?). Does your company use Amazon Redshift? Have you ever groaned over slow queries, or are you just afraid that Amazon Redshift is gonna fall over at some point? Well, you’ve got to talk to the folks over at intermix.io. They have built the “missing” Amazon Redshift console: an amazing analytics product that helps data engineers find and rewrite slow queries and gives actionable recommendations to optimize data pipelines. WeWork, Postmates, and Medium are just a few of their customers. Go to dataengineeringpodcast.com/intermix today and use promo code DEP at sign up to get a $50 discount!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Pete Soderling about his work to build and grow a community for data professionals with the Data Council conferences and meetups, as well as his experiences as an investor in data-oriented companies.

Interview

Introduction

How did you get involved in the area of data management?

What was your original reason for focusing your efforts on fostering a community of data engineers?

What was the state of recognition in the industry for that role at the time that you began your efforts?

The current manifestation of your community efforts is in the form of the Data Council conferences and meetups. Previously they were known as Data Eng Conf, and before that as Hakka Labs. Can you discuss the evolution of your efforts to grow this community?

How has the community itself changed and grown over the past few years?

Communities form around a huge variety of focal points. What are some of the complexities or challenges in building one based on something as nebulous as data? Where do you draw inspiration and direction for how to manage such a large and distributed community?

What are some of the most interesting/challenging/unexpected aspects of community management that you have encountered?

What are some ways that you have been surprised or delighted in your interactions with the data community?

How do you approach sustainability of the Data Council community and the organization itself?

The tagline that you have focused on for Data Council events is that they are “no fluff,” juxtaposing them against larger business-oriented events. What are your guidelines for fulfilling that promise, and why do you think that is an important distinction?

In addition to your community building you are also an investor. How did you get involved in that side of your business, and how does it fit into your overall mission?

You also have a stated mission to help engineers build their own companies. In your opinion, how does an engineer-led business differ from one that may be founded or run by a business-oriented individual, and why do you think that we need more of them?

What are the ways that you typically work to empower engineering founders or encourage them to create their own businesses?

What are some of the challenges that engineering founders face and what are some common difficulties or misunderstandings related to business?

What are your opinions on venture-backed vs. "lifestyle" or bootstrapped businesses?

What are the characteristics of a data business that you look at when evaluating a potential investment?

What are some of the current industry trends that you are most excited by?

What are some that you find concerning?

What are your goals and plans for the future of Data Council?

Contact Info

@petesoder on Twitter

LinkedIn

@petesoder on Medium

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.

Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.

If you’ve learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Links

Data Council

Database Design For Mere Mortals

Bloomberg

Garmin

500 Startups

Geeks On A Plane

Data Council NYC 2019 Track Summary

Pete’s Angel List Syndicate

DataOps

Data Kitchen Episode

DataOps Vs DevOps Episode

Great Expectations

Podcast.__init__ Interview

Elementl Dagster

Data Council Presentation

Data Council Call For Proposals

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Real-Time Data Analytics for Large Scale Sensor Data

Real-Time Data Analytics for Large-Scale Sensor Data covers the theory and applications of hardware platforms and architectures; the development of software methods, techniques, and tools; applications; and governance and adoption strategies for the use of massive sensor data in real-time data analytics. It presents leading-edge research in the field and identifies future challenges in this fledgling research area. The book captures the essence of real-time IoT-based solutions that require a multidisciplinary approach to on-the-fly processing, including methods for high-performance stream processing, adaptive streaming adjustment, uncertainty handling, latency handling, and more.

Examines IoT applications, the design of real-time intelligent systems, and how to manage the rapid growth of the large volume of sensor data

Discusses intelligent management systems for applications such as healthcare, robotics, and environment modeling

Provides a focused approach towards the design and implementation of real-time intelligent systems for the management of sensor data in large-scale environments

podcast_episode
by Rutger (Chartmetric), Missy Elliott (independent/artist)

Highlights

Don’t call it a comeback: This week, the VMAs wrapped and the CMAs dropped their noms, but do the ceremonies even matter to an artist’s bottom line? Let’s dive into Missy Elliott’s data and see.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric”: that’s Chartmetric, no “S.” Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, August 30th, 2019.

The Music Awards Effect: Bump or Bust?

With the VMAs wrapping and the CMAs dropping their noms this week, the ceremonies grabbed a lot of headlines, but what exactly does that mean for artist success? Missy Elliott’s highly anticipated return and long-overdue recognition at Monday night’s VMAs, with both an electrifying performance and a gracious acceptance of MTV’s Michael Jackson Video Vanguard Award, should offer the perfect case study as we look forward to the 53rd Annual Country Music Association Awards in November.

While she’s been workin’ it since the late ’90s, and even the late ’80s, Elliott’s Super Bowl halftime performance in 2015 had some youngsters thinking she was a new artist on the verge of blowing up. Those in the know had to flip it and reverse it on the young’uns to let ’em know about her four wins and 22 nominations at the Grammy Awards, her 30 million records sold in the U.S. and status as the best-selling female rapper in Nielsen Music history (as of 2017), and her history-making induction into the Songwriters Hall of Fame this year: the first female rapper ever!

With her surprise Iconology album release last week and this week’s VMA-driven visibility, do the data say comeback or consistency? It really depends on which data sources we’re talking about.

For an artist like Elliott, whose streaming numbers have been consistently on the incline since the beginning of the year, this awards ceremony doesn’t appear to have made much of a splash. Elliott’s daily Spotify follower change has generally hovered at around 700, bringing her from 922K on Jan. 1, 2019, to 1.1M on Aug. 29, 2019. Save for a minor dip in March, her Spotify popularity has generally been on the upswing, from 76 on Jan. 1, 2019, to 79 on Aug. 29, 2019. The lowest it got was 73, so not much variation at all.

But the behavior of her social media growth tells another story. On Instagram, Twitter, and Wikipedia, we see a dramatic spike in followers and views, most significantly around her VMAs appearance.

On Insta, from Thursday, Aug. 22, the date of her album release, to Monday, Aug. 26, her daily change in followers jumped from 2K to 15K. Following her VMAs appearance, however, that daily change spiked to almost 42K.

There’s a similar pattern on Twitter, where Friday and Saturday gave her a 3K daily follower increase, up from the low-to-mid hundreds, and following her VMAs appearance, she shot up to around 6K.

The music awards effect is perhaps most pronounced in Elliott’s daily Wikipedia views, which have hovered between 2K and 5K since the start of the year. Last Thursday, on the day of her album drop, however, that number almost reached 33K. On Monday, the night of the VMAs? Almost 150K.

So, at least with an artist as influential as Missy Elliott, a big music awards moment could lead to a big bump in social relevancy, even if that same artist might not see quite as much volatility in their streaming data. As we edge nearer to the CMAs, where Maren Morris and Lil Nas X are the big standouts this year, let’s see if the same trend holds.

Outro

That’s it for your Daily Data Dump for Friday, August 30th, 2019. This is Rutger from Chartmetric.

Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Friday, have a great weekend, and we’ll see you next week!

Summary

Data engineers are responsible for building tools and platforms to power the workflows of other members of the business. Each group of users has its own set of requirements for the way that they access and interact with those platforms, depending on the insights they are trying to gather. Benn Stancil is the chief analyst at Mode Analytics, and in this episode he explains the considerations and requirements that data analysts have for their tools. He also explains useful patterns for collaboration between data engineers and data analysts, and what they can learn from each other.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Your host is Tobias Macey and today I’m interviewing Benn Stancil, chief analyst at Mode Analytics, about what data engineers need to know when building tools for analysts.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing some of the main features that you are looking for in the tools that you use?

What are some of the common shortcomings that you have found in out-of-the-box tools that organizations use to build their data stack?

What should data engineers be considering as they design and implement the foundational data platforms that higher-order systems are built on, which are ultimately used by analysts and data scientists?

In terms of mindset, what are the ways that data engineers and analysts can align and where are the points of conflict?

In terms of team and organizational structure, what have you found to be useful patterns for reducing friction in the product lifecycle for data tools (internal or external)?

What are some anti-patterns that data engineers can guard against as they are designing their pipelines?

In your experience as an analyst, what have been the characteristics of the most seamless projects that you have been involved with?

How much understanding of analytics is necessary for data engineers to be successful in their projects and careers?

Conversely, how much understanding of data management should analysts have?

What are the industry trends that you are most excited by as an analyst?

Contact Info

LinkedIn

@bennstancil on Twitter

Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for

Summary

Managing big data projects at scale is a perennial problem, with a wide variety of solutions that have evolved over the past 20 years. One of the early entrants, which predates Hadoop and has since been open sourced, is the HPCC (High Performance Computing Cluster) system. Designed as a fully integrated platform to meet the needs of enterprise-grade analytics, it provides a solution for the full lifecycle of data at massive scale. In this episode Flavio Villanustre, VP of infrastructure and products at HPCC Systems, shares the history of the platform, how it is architected for scale and speed, and the unique solutions that it provides for enterprise-grade data analytics. He also discusses the motivations for open sourcing the platform, the detailed workflow that it enables, and how you can try it for your own projects. This was an interesting view of how a well-engineered product can survive massive evolutionary shifts in the industry while remaining relevant and useful.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

To connect with the startups that are shaping the future and take advantage of the opportunities that they provide, check out Angel List, where you can invest in innovative businesses, find a job, or post a position of your own. Sign up today at dataengineeringpodcast.com/angel and help support this show.

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing Flavio Villanustre about the HPCC Systems project and his work at LexisNexis Risk Solutions.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what the HPCC system is and the problems that you were facing at LexisNexis Risk Solutions which led to its creation?

What was the overall state of the data landscape at the time and what was the motivation for releasing it as open source?

Can you describe the high level architecture of the HPCC Systems platform and some of the ways that the design has changed over the years that it has been maintained? Given how long the project has been in use, c

Highlights

YouTube’s global importance is no secret, but monitoring its playlist performance hasn’t always been easy. Using our new YouTube Playlists chart, you can finally track which lists are getting the most play!

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric”: that’s Chartmetric, no “S.” Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, August 16th, 2019.

The Top YouTube Playlists to Land On

If you’re tracking YouTube playlists as part of your daily routine, or if you want to start thinking about a digital strategy for the platform, then you’re in luck. Thanks to our brand-new feature, you can check the rankings of more than 500 YouTube-curated playlists, sorting by total playlist views, number of videos, 28-day add ratio, and last update date. Then, you can click through to see who's charting on the world's biggest video streaming platform.

As it stands now, YouTube’s Popular Music Videos playlist is far and away the top performer, with more than 1B total views, 200 tracks, and a 99 percent 28-day add ratio. It’s heavily weighted toward Pop, Hip-Hop, and Rap artists from the USA (think Ariana Grande, Chris Brown, and Rick Ross), but R&B, Latin, and Reggaeton aren’t too far behind.

In the No. 2 spot is the 72-track Pop Hotlist playlist, which has about a third of the views and half the 28-day add ratio of the Popular Music Videos playlist. Katy Perry, OneRepublic, and Puerto Rican rapper Anuel AA rule this one, as does the USA once again. YouTube actually has a number of genre-specific Hotlists, including the Latin Hotlist at No. 4, the Hip-Hop and R&B Hotlist at No. 5, the Country Hotlist at No. 6, and the Regional Mexican Hotlist at No. 7.

The No. 3 spot, however, goes to New Music This Week, which has a bit more than 300M views and a 100 percent 28-day add ratio. Pop holds about a third of the market share on it, with R&B and Rap tied for second at a bit less than a tenth each, while Country rounds out third, tying with Hip-Hop and Metropopolis. The USA dominates the playlist again, holding almost ¾ of the artist country market share.

With such a global platform, you might expect to see less American representation and more international representation, but keep in mind that these market shares describe artist-track distribution on the playlists, not listener geography. So, Americans might be pumping out the majority of the content that’s landing on these playlists, but the global community is likely pushing it to the top.

Outro

That’s it for your Daily Data Dump for Friday, August 16th, 2019. This is Rutger from Chartmetric.

Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Friday, have a great weekend, and we’ll see you next week!

Summary

The extract-and-load pattern of data replication is the most commonly needed process in data engineering workflows. Because of the myriad sources and destinations that are available, it is also among the most difficult tasks that we encounter. Fivetran is a platform that does the hard work for you and replicates information from your source systems into whichever data warehouse you use. In this episode CEO and co-founder George Fraser explains how it is built, how it got started, and the challenges that creep in at the edges when dealing with so many disparate systems that need to be made to work together. This is a great conversation to listen to for a better understanding of the challenges inherent in synchronizing your data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!

You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and Corinium Global Intelligence. Upcoming events include the O’Reilly AI Conference, the Strata Data Conference, and the combined events of the Data Architecture Summit and Graphorum. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.

Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.

To help other people find the show, please leave a review on iTunes and tell your friends and co-workers.

Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.

Your host is Tobias Macey and today I’m interviewing George Fraser about Fivetran, a hosted platform for replicating your data from source to destination.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing the problem that Fivetran solves and the story of how it got started?

Integration of multiple data sources (e.g. entity resolution)

How is Fivetran architected and how has the overall system design changed since you first began working on it?

Monitoring and alerting

Automated schema normalization. How does it work for customized data sources?

Managing schema drift while avoiding data loss

Change data capture

What have you found to be the most complex or challenging data sources to work with reliably?

Workflow for users getting started with Fivetran

When is Fivetran the wrong choice for collecting and analyzing your data?

What have you found to be the most challenging aspects of working in the space of data integrations?

What have been the most interesting/unexpected/useful lessons that you have learned while building and growing Fivetran?

What do you have planned for the future of Fivetran?

Contact Info

LinkedIn

@frasergeorgew on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

podcast_episode
by Andrew Yang (Democratic candidate), Rutger (Chartmetric), Cory Booker (Democratic candidate), Marianne Williamson (Democratic candidate), Beto O'Rourke (Democratic candidate), Joe Biden (Democratic candidate), Kamala Harris (Democratic candidate)

Highlights  We do our best to stay out of the political fray, but with the Democratic debates wrapping last week, we look at the candidates’ streaming profiles to get a sense of not only who they’re listening to — but who is listening to them. Mission   Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.We’re on the socials at “chartmetric”, that’s Chartmetric, no “S ”- follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.DateThis is your Data Dump for Friday, August 9th, 2019.When Streaming Gets Political, Who Gets the Vote?Think profiles and playlists are only for artists? Think again.We do our best to stay out of politics, but with the debates wrapping last week, why not look at the candidates’ streaming profiles to get a sense of not only who they’re listening to — but who is listening to them ... and how?Since no one is vying for the Republican nomination, with Trump seeking a second term, we’ll look at six of the Democratic candidates: Cory Booker, Kamala Harris, Andrew Yang, Beto O’Rourke, Joe Biden, and Marianne Williamson.Booker’s Cory Booker’s Music playlist might only have around 150 followers, but what it lacks in popularity it makes up for in track popularity, with almost half of its 101 tracks at a popularity score of 60 or above.At around 180 followers, Biden’s streaming profile has about the same count as Booker’s playlist — but only 22 monthly listeners. Harris’ Kamala’s Summer Playlist ups the ante with more than 4,000 followers tuning into her heavily rap, hip-hop, soul, and funk oriented 46-song listing. 
O’Rourke’s 94-song playlist, “BBQ for Beto,” has some 500 followers and ranges in genre from folk to classic rock and from country to hardcore punk. Keep in mind that O’Rourke used to play in punk bands, so he definitely knows what’s up in that regard.

As might be expected, Williamson’s streaming profile is heavily geared toward meditation, prayer, and motivational speeches, and it’s paying off. Her follower count is almost 1,800, with around 1,200 Spotify monthly listeners, putting her listeners-to-followers ratio at roughly 2 to 3. Perhaps most interesting, though hardly surprising, her daily Wikipedia views shot up from some 12,000 to some 350,000 after the debate, coinciding with a jump in her daily Instagram follower gain from around 600 to around 11,000 on Aug. 1.

Yang’s “Favorite Jams” playlist leaves the others in the dust with more than 5,000 followers, despite being made up mostly of catalog tracks. Oh, and in case you didn’t know, Yang appears to really enjoy Florence + the Machine and the Cure.

Streaming performance might not indicate political performance, but it does give us some insight into who these candidates are looking to win over, or at least how good their music taste is.

Outro
That’s it for your Daily Data Dump for Friday, August 9th, 2019. This is Rutger from Chartmetric. Free accounts are available at chartmetric.com. Article links and show notes are at podcast.chartmetric.com. Happy Friday, have a great weekend, and we’ll see you next week!

Highlights
What happens when a global artist gets sued by another artist for copyright infringement? Not much for the former, but a notable boost for the latter. We’re talking about music data, by the way, not legal damages.

Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump, where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world. We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date
This is your Data Dump for Wednesday, August 7th, 2019.

Katy Perry, Flame, and the Effects of Publicity
We’ve all seen industry gossip and trade news before, but have you ever used it to measure the impact of publicity?

For the past week, both general and industry news outlets have been reporting on American rap artist Flame’s lawsuit alleging that Katy Perry and her team infringed his copyright with her 2013 hit “Dark Horse.” Though the industry still debates whether the claim holds up under copyright law, the jury found that it was indeed infringement and ordered the pop star and her songwriting team to pay $2.78M in combined damages to the plaintiff.

From a data perspective, what’s interesting is how this kind of news affects each artist’s digital profile.

Looking at Katy Perry’s social and streaming data for the past week, since the case hit the news around July 30th, there was basically no effect. No extra playlists and no apparent change in her Instagram follower count or Spotify monthly listeners, just the usual Katy Perry-level numbers: more than 7K new Spotify followers, 13M YouTube views, and 21K new Instagram followers per day. All in a day’s work.

A lesser-known artist like Christian rapper Flame, however, did see a notable boost to his digital profile. By comparison, Flame’s recent baseline was only about 67 new Spotify followers and 184 YouTube views per day, and the news actually landed him on a few new charts.

For example, “Joyful Noise,” the 2009 Flame track the Perry team was accused of copying, charted on Spotify’s official daily Viral 50 charts for the United Kingdom, New Zealand, and Canada. It charted for one day, August 2nd, a few days after the story made the rounds in outlets like the Guardian, Rolling Stone, the BBC, and the Associated Press, sitting at No. 13, No. 15, and No. 24, respectively.

Its YouTube video went from 1,800 daily views on July 30th to more than 580 times that, peaking on July 31st at over 1M daily views. Even the track’s Genius page, which had fewer than 10 daily views the week prior, jumped to 740 the week of the proceedings.

Beyond the track itself, Flame’s artist profile gained 331 Spotify followers on July 31st, almost 5x his recent daily average, and as of Monday, August 5th, his Spotify monthly listener count had more than doubled from a week earlier to over 500K. He saw similar multiples of increase in Twitter followers, retweets, and Wikipedia views.

So what does this mean for the rest of us? Possibly, a way to measure the effects of publicity. That’s virtually impossible with a global superstar, but with an artist whose daily digital footprint is relatively small, we can see which platforms respond most to coverage from particular sources, and maybe plan publicity expectations when working with your own artists. Because as they say in show biz, “Any publicity is good publicity.”

Outro
That’s it for your Daily Data Dump for Wednesday, August 7th, 2019. This is Jason from Chartmetric. Free accounts are available at chartmetric.com. Article links and show notes are at podcast.chartmetric.com. Happy Wednesday, see you on Friday!