talk-data.com

Topic

Singer

etl data_integration open_source

35 tagged

Activity Trend: peak of 3 per quarter, 2020-Q1 to 2026-Q1

Activities

35 activities · Newest first

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack.

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Decodable is and the story behind it?

What are the notable changes to the Decodable platform since we last spoke? (October 2021)

What are the industry shifts that have influenced the product direction?

What are the problems that customers are trying to solve when they come to Decodable?

When you launched your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?

What are the developer experience challenges that are particular to working with streaming data?

How have you worked to address that in the Decodable platform and interfaces?

As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?

What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?

When is Decodable the wrong choice?

What do you have planned for the future of Decodable?

Contact Info

esammer on GitHub LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Decodable

Podcast Episode

Understanding the Apache Flink Journey Flink

Podcast Episode

Debezium

Podcast Episode

Kafka Redpanda

Podcast Episode

Kinesis PostgreSQL

Podcast Episode

Snowflake

Podcast Episode

Databricks Startree Pinot

Podcast Episode

Rockset

Podcast Episode

Druid InfluxDB Samza Storm Pulsar

Podcast Episode

ksqlDB

Podcast Episode

dbt GitHub Actions Airbyte Singer Splunk Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4J

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:

- Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
- Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g.: graph neural networks, responsible AI)
- Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

Rudderstack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

Summary

Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system. In this episode Adrian Brudaru explains how it works, the benefits that it provides over other data integration solutions, and how you can start building pipelines today.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack.

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.

Your host is Tobias Macey and today I'm interviewing Adrian Brudaru about dlt, an open source Python library for data loading.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what dlt is and the story behind it?

What is the problem you want to solve with dlt?

Who is the target audience?

The obvious comparison is with systems like Singer/Meltano/Airbyte in the open source space, or Fivetran/Matillion/etc. in the commercial space. What are the complexities or limitations of those tools that leave an opening for dlt?

Can you describe how dlt is implemented?

What are the benefits of building it in Python?

How have the design and goals of the project changed since you first started working on it?

How does that language choice influence the performance and scaling characteristics?

What problems do users solve with dlt?

What are the interfaces available for extending/customizing/integrating with dlt?

Can you talk through the process of adding a new source/destination?

What is the workflow for someone building a pipeline with dlt?

How does the experience scale when supporting multiple connections?

Given the limited scope of extract and load, and the composable design of dlt it seems like a purpose built companion to dbt (down to th

Summary

Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analysts create reports that the business uses to understand and direct its operations, but the process is very labor- and time-intensive. The team at Omni has taken a new approach by automatically building models based on the queries that are executed. In this episode Chris Merrick shares how they manage integration and automation around the modeling layer and how it improves the organizational experience of business intelligence.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Truly leveraging and benefiting from streaming data is hard - the data stack is costly, difficult to use and still has limitations. Materialize breaks down those barriers with a true cloud-native streaming database - not simply a database that connects to streaming systems. With a PostgreSQL-compatible interface, you can now work with real-time data using ANSI SQL including the ability to perform multi-way complex joins, which support stream-to-stream, stream-to-table, table-to-table, and more, all in standard SQL. Go to dataengineeringpodcast.com/materialize today and sign up for early access to get started. If you like what you see and want to help make it better, they're hiring across all functions!

Your host is Tobias Macey and today I'm interviewing Chris Merrick about the Omni Analytics platform and how they are adding automatic data modeling to your business intelligence.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Omni Analytics is and the story behind it?

What are the core goals that you are trying to achieve with building Omni?

Business intelligence has gone through many evolutions. What are the unique capabilities that Omni Analytics offers over other players in the market?

What are the technical and organizational anti-patterns that typically grow up around BI systems?

What are the elements that contribute to BI being such a difficult product to use effectively in an organization?

Can you describe how you have implemented the Omni platform?

How have the design/scope/goals of the product changed since you first started working on it?

What does the workflow for a team using Omni look like?

What are some of the developments in the broader ecosystem that have made your work possible?

What are some of the positive and negative inspirations that you have drawn from the experience that you and your team-mates have gained in previous businesses?

What are the most interesting, innovative, or unexpected ways that you have seen Omni used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on Omni?

When is Omni the wrong choice?

What do you have planned for the future of Omni?

Contact Info

LinkedIn @cmerrick on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Omni Analytics Stitch RJ Metrics Looker

Podcast Episode

Singer dbt

Podcast Episode

Teradata Fivetran Apache Arrow

Podcast Episode

DuckDB

Podcast Episode

BigQuery Snowflake

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Materialize

Looking for the simplest way to get the freshest data possible to your teams? Because let's face it: if real-time were easy, everyone would be using it. Look no further than Materialize, the streaming database you already know how to use.

Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support. Delivered as a single platform with the separation of storage and compute, strict-serializability, active replication, horizontal scalability and workload isolation — Materialize is now the fastest way to build products with streaming data, drastically reducing the time, expertise, cost and maintenance traditionally associated with implementation of real-time features.

Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.

Go to materialize.com

Support Data Engineering Podcast

Summary

Collecting, integrating, and activating data are all challenging activities. When that data pertains to your customers it can become even more complex. To simplify the work of managing the full flow of your customer data and keep you in full control, the team at Rudderstack created their eponymous open source platform that allows you to work with first and third party data, as well as build and manage reverse ETL workflows. In this episode CEO and founder Soumyadeb Mitra explains how Rudderstack compares to the various other tools and platforms that share some overlap, how to set it up for your own data needs, and how it is architected to scale to meet demand.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Today’s episode is sponsored by Prophecy.io – the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design & deploy data pipelines on Apache Spark & Apache Airflow. Now all the data users can use software engineering best practices – git, tests and continuous deployment with a simple to use visual designer. How does it work? You visually design the pipelines, and Prophecy generates clean Spark code with tests on git; then you visually schedule these pipelines on Airflow. You can observe your pipelines with built-in metadata search and column-level lineage. Finally, if you have existing workflows in AbInitio, Informatica or other ETL formats that you want to move to the cloud, you can import them automatically into Prophecy making them run productively on Spark. Create your free account today at dataengineeringpodcast.com/prophecy.

The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of your data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.

Your host is Tobias Macey and today I’m interviewing Soumyadeb Mitra about his experience as the founder of Rudderstack and its role in your data platform.

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what Rudderstack is and the story behind it?

What are the main use cases that Rudderstack is designed to support?

Who are the target users of Rudderstack?

How does the availability of the managed cloud service change the user profiles that you can target?

How do these user profiles influence your focus and prioritization of features and user experience?

How would you characterize the position of Rudderstack in the current data ecosystem?

What other tools/systems might you replace with Rudderstack?

How do you think about the application of Rudderstack compared to tools for data integration (e.g. Singer, Stitch, Fivetran) and reverse ETL (e.g. Grouparoo, Hightouch, Census)?

Can you describe how the Rudderstack platform is desig

Summary

Data integration in the form of extract and load is the critical first step of every data project. There are a large number of commercial and open source projects that offer that capability but it is still far from being a solved problem. One of the most promising community efforts is that of the Singer ecosystem, but it has been plagued by inconsistent quality and design of plugins. In this episode the members of the Meltano project share the work they are doing to improve the discovery, quality, and capabilities of Singer taps and targets. They explain their work on the Meltano Hub and the Singer SDK and their long term goals for the Singer community.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.

Your host is Tobias Macey and today I’m interviewing Douwe Maan, Taylor Murphy, and AJ Steers about their work to level up the Singer ecosystem through projects like Meltano Hub and the Singer SDK.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by describing what the Singer ecosystem is?

What are the current weak points/challenges in the ecosystem?

What is the current role of the Meltano project/community within the ecosystem?

What are the projects and activities related to Singer that you are focused on?

What are the main goals of the Meltano Hub?

What criteria are you using to determine which projects to include in the hub?

Why is the number of targets so small?

What additional functionality do you have planned for the hub?

What functionality does the SDK provide?

How does the presence of the SDK make it easier to write taps/targets?

What do you believe the long-term impacts of the SDK on the overall availability and quality of plugins will be?

Now that you have spun out your own business and raised funding, how does that influence the priorities and focus of your work?

How do you hope to productize what you have built at Meltano?

What are the most interesting, innovative, or unexpected ways that you have seen Meltano and Singer plugins used?

What are

On this episode, we talk with Latin music mogul Paris Cabezas. Born and raised in rural Cuba, the MIT Applied Mathematics grad got his start working on the first generation of Yamaha’s digital mixing consoles. This studio engineering stint helped him become the Grammy-nominated producer that he is now, and he's also been able to apply his technical acumen to the various functions of InnerCat Music Group, which Cabezas founded in 2012.

InnerCat handles artist marketing, music distribution, YouTube optimization, and neighboring rights for a range of artists, many of whom are Latin stars like Puerto Rican singer-songwriter Farruko. The music group's artists and network of owned and operated channels garner 630M+ streams per month, 330M video views per month, and 22M subscribers on all networks, and they've been able to pay out more than $7M in royalties to indie artists. InnerCat focuses on a data-driven, tech-it-yourself approach to digital assets, and the results speak for themselves. Connect with Paris on LinkedIn, Instagram, or Twitter. If you want more free insights, follow our podcast, our blog, and our socials. If you're an artist with a free Chartmetric account, sign up for the artist plan, made exclusively for you, here. If you're new to Chartmetric, follow the URL above after creating a free account here.

Summary

Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrassingly short. The team at Airbyte is adding a new entry to that list with the goal of making robust and easy to use data integration more accessible to teams who want or need to maintain full control of their data. In this episode co-founders John Lafleur and Michel Tricot share the story of how and why they created Airbyte, discuss the project’s design and architecture, and explain their vision of what an open source data integration platform should offer. If you are struggling to maintain your extract and load pipelines or spending time on integrating with a new system when you would prefer to be working on other projects then this is definitely a conversation worth listening to.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.

Your host is Tobias Macey and today I’m interviewing Michel Tricot and John Lafleur about Airbyte, an open source framework for building data integration pipelines.

Interview

Introduction

How did you get involved in the area of data management?

Can you start by explaining what Airbyte is and the story behind it?

Businesses and data engineers have a variety of options for how to manage their data integration. How would you characterize the overall landscape and how does Airbyte distinguish itself in that space?

How would you characterize your target users?

How have those personas instructed the priorities and design of Airbyte?

What do you see as the benefits and tradeoffs of a UI oriented data integration platform as compared to a code first approach?

What are the complex/challenging elements of data integration that make it such a slippery problem?

Motivation for creating open source ELT as a business

Can you describe how the Airbyte platform is implemented?

What was your motivation for choosing Java as the primary language?

Incidental complexity of forcing all connectors to be packaged as containers

Shortcomings of the Singer specification/motivation for creating a backwards incompatible interface

Perceived potential for community adoption of Airbyte specification

Tradeoffs of using JSON as interchange format vs. e.g. protobuf/gRPC/Avro/etc.

Information lost when converting records to JSON types/how to preserve that information (e.g. field constraints, valid enums, etc.)

Interfaces/extension points for integrating with other tools, e.g. Dagster

Abstraction layers for simplifying implementation of new connectors

Tradeoffs of storing all connectors in a monorepo with the Airbyte core

Impact of community adoption/contributions

What is involved in setting up an Airbyte installation?

What are the available axes for scaling an Airbyte deployment?

Challenges of setting up and maintaining CI environment for Airbyte

How are you managing governance and long term sustainability of the project?

What are some of the most interesting, unexpected, or innovative ways that you have seen Airbyte used?

What are the most interesting, unexpected, or challenging lessons that you have learned while building Airbyte?

When is Airbyte the wrong choice?

What do you have planned for the future of the project?

Contact Info

Michel

LinkedIn @MichelTricot on Twitter michel-tricot on GitHub

John

LinkedIn @JeanLafleur on Twitter johnlafleur on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Airbyte Liveramp Fivetran

Podcast Episode

Stitch Data Matillion DataCoral

Podcast Episode

Singer Meltano

Podcast Episode

Airflow

Podcast.init Episode

Kotlin Docker Monorepo Airbyte Specification Great Expectations

Podcast Episode

Dagster

Data Engineering Podcast Episode Podcast.init Episode

Prefect

Podcast Episode

DBT

Podcast Episode

Kubernetes Snowflake

Podcast Episode

Redshift Presto Spark Parquet

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary Businesses often need to be able to ingest data from their customers in order to power the services that they provide. Each new source that they need to integrate with brings another custom set of ETL tasks that they need to maintain. In order to reduce the friction involved in supporting new data transformations, David Molot and Hassan Syyid built the Hotglue platform. In this episode they describe the data integration challenges facing many B2B companies, how their work on the Hotglue platform simplifies their efforts, and how they have designed the platform to make these ETL workloads embeddable and self service for end users.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask. This episode of Data Engineering Podcast is sponsored by Datadog, a unified monitoring and analytics platform built for developers, IT operations teams, and businesses in the cloud age. 
Datadog provides customizable dashboards, log management, and machine-learning-based alerts in one fully-integrated platform so you can seamlessly navigate, pinpoint, and resolve performance issues in context. Monitor all your databases, cloud services, containers, and serverless functions in one place with Datadog’s 400+ vendor-backed integrations. If an outage occurs, Datadog provides seamless navigation between your logs, infrastructure metrics, and application traces in just a few clicks to minimize downtime. Try it yourself today by starting a free 14-day trial and receive a Datadog t-shirt after installing the agent. Go to dataengineeringpodcast.com/datadog today to see how you can enhance visibility into your stack with Datadog. Your host is Tobias Macey and today I’m interviewing David Molot and Hassan Syyid about Hotglue, an embeddable data integration tool for B2B developers built on the Python ecosystem.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what you are building at Hotglue?

What was your motivation for starting a business to address this particular problem?

Who is the target user of Hotglue and what are their biggest data problems?

What are the types and sources of data that they are likely to be working with? How are they currently handling solutions for those problems? How does the introduction of Hotglue simplify or improve their work?

What is involved in getting Hotglue integrated into a given customer’s environment? How is Hotglue itself implemented?

How has the design or goals of the platform evolved since you first began building it? What were some of the initial assumptions that you had at the outset and how well have they held up as you progressed?

Once a customer has set up Hotglue what is their workflow for building and executing an ETL workflow?

What are their options for working with sources that aren’t supported out of the box?

What are the biggest design and implementation challenges that you are facing given the need for your product to be embedded in customer platforms and exposed to their end users? What are some of the most interesting, innovative, or unexpected ways that you have seen Hotglue used? What are the most interesting, unexpected, or challenging lessons that you have learned while building Hotglue? When is Hotglue the wrong choice? What do you have planned for the future of the product?

Contact Info

David

@davidmolot on Twitter LinkedIn

Hassan

hsyyid on GitHub LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Hotglue Python

The Python Podcast.init

B2B == Business to Business Meltano

Podcast Episode

Airbyte Singer

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Summary The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy to use open source option. The Meltano project is aiming to provide a solution to that situation. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented. The Singer ecosystem has laid the groundwork for a great option to empower teams of all sizes to unlock the value of their data, and Meltano is building the remaining structure to make it a fully featured contender for proprietary systems.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack—which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14 day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data platforms. 
For more opportunities to stay up to date, gain new skills, and learn from your peers there are a growing number of virtual events that you can attend from the comfort and safety of your home. Go to dataengineeringpodcast.com/conferences to check out the upcoming events being offered by our partners and get registered today! Your host is Tobias Macey and today I’m interviewing Douwe Maan about Meltano, an open source platform for building, running & orchestrating ELT pipelines.

Interview

Introduction How did you get involved in the area of data management? Can you start by describing what Meltano is and the story behind it? Who is the target audience?

How does the focus on small or early stage organizations constrain the architectural decisions that go into Meltano?

What have you found to be the complexities in trying to encapsulate the entirety of the data lifecycle in a single tool or platform?

What are the most painful transitions in that lifecycle and how does that pain manifest?

How and why has the focus of the project shifted from its original vision? With your current focus on the data integration/data transfer stage of the lifecycle, what are you seeing as the biggest barriers to entry with the current ecosystem?

What are the main elements of

Highlights

Australian singer-songwriter Tones and I and North Carolina-born/Eastern Europe-raised rapper Ashnikko are some of the first notable case studies in TikTok virality. How are they capitalizing on it?

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric” (that’s Chartmetric, one word and no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, Oct. 18, 2019.

The Young Female Artists Capitalizing on Their TikTok Virality

As TikTok’s popularity intensifies, so does the scrutiny, which is why the National Music Publishers’ Association recently claimed the platform “consistently violated US copyright law and the rights of songwriters and music publishers.”

Whether or not this is true, it’s clear the platform-on-the-rise, which is making its mark as a new breeding ground for discovery, is benefiting some creatives substantially.

We first noticed 19-year-old Australian singer-songwriter Tones and I’s “Dance Monkey,” for instance, on the TikTok charts a couple of months back. At the time, the Sony artist’s Spotify footprint was promising, but not exactly huge.

When “Dance Monkey” first came out in May, she had around 3.5K Followers.

Once July hit and the TikToks started pouring in, her Spotify Follower stats rose precipitously, from 15K in July to 30K in August, 70K in September, and 150K this month.

Four of the Top 44 TikTok videos are still sporting “Dance Monkey” soundtracks, and the song is still in the Top 200 on TikTok’s track charts.

There’s a similar story happening with North Carolina-born and Eastern Europe-raised rapper Ashnikko, aka Ashton Casey.

23-year-old Ashnikko, who embraces Japanese anime and video game references, recently dropped the collaborative track “STUPID” with Yung Baby Tate, and her TikTok climb is stunning. Six of the Top 44 TikTok videos are already using her track, and she’s also in the Top 200 on TikTok’s track charts.

Right now, her stats across other platforms like Spotify are exhibiting a growth pattern similar to Tones and I’s when she first started carving out her TikTok niche, so all indications point to Ashnikko being an artist to watch going into 2020, and not just on TikTok.

With edgy, stylish teenage phenom Billie Eilish having worked wonders for Universal, are Tones and I and Ashnikko, respectively, Sony and Warner’s rebuttal?

Outro

That’s it for your Daily Data Dump for Friday, Oct. 18, 2019. This is Rutger from Chartmetric.

Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

If you haven’t downloaded 6MO, our Global Music Industry Data Report, yet, you can find it all across our socials and in our show notes!

Happy Friday, have a great weekend, and we’ll see you next week!

Highlights

UK singer-songwriter and producer prodigy Labrinth has created a hallucinatory experience with his soundtrack of HBO’s new show Euphoria, and with data as our guide, we’re going to try to navigate the psychedelic experience with you.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric” (that’s Chartmetric, one word and no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Friday, Oct. 11, 2019.

Take a Psych Trip Through Labrinth’s ‘Euphoria’

Besides collaborating with Sia, Diplo, and Beyonce in recent years, Timothy Lee McKenzie, better known by his stage name, Labrinth, just scored, literally and figuratively, his first TV series, HBO’s Euphoria.

According to Rolling Stone, “His soundtrack ... hums with soft electricity, perfectly complementing the journey of the main character, Rue, a teenager caught in limbo between the euphoria of a drug high and the harsh consequences of addiction.”

It’s rare that a TV show soundtrack generates high demand, if any at all, but according to McKenzie himself, “If I put a post up, the first message is ‘Where’s the album? Where’s the soundtrack?!’ So I’m like, ‘OK, don’t worry.’ We’re working on getting ‘em what they need.”

And he and the HBO team did just that, releasing the soundtrack last Friday.

Though his early April releases of “SIN” and LSD, his Sia/Diplo collab, accounted for his highest Spotify Follower gains this year, at 5K and 3K, respectively, Euphoria has him at a 2K increase.

That said, on Insta and Wikipedia, an early single drop from the soundtrack on Aug. 3 gave Labrinth his most significant spikes with a 5K follower increase and 3.5K views, respectively.

It’s an interesting strategy for artists, labels, and managers to think about, because not only are there upfront fiscal upsides from synchronizations, but there are also the inherent promotional upsides couched in the television and video streaming industry’s massive marketing budgets. That’s not to say that it limits a series to only one artist, of course.

Euphoria’s official Spotify playlist, which includes every track used in Season 1, ranges from Solange to Lizzo, Blood Orange to Randy Newman and much, much more.

Unfortunately, an individual curator seems to have ripped the official Euphoria playlist and pawned it off as their own, outperforming the official playlist by about 3 to 1 in terms of follower count.

Which just goes to show, albeit unscrupulously, that understanding and anticipating trends and listener behavior can go a long way toward building audiences in the streaming era.

Outro

That’s it for your Daily Data Dump for Friday, Oct. 11, 2019. This is Rutger from Chartmetric.

Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

If you haven’t downloaded 6MO, our Global Music Industry Data Report, yet, you can find it all across our socials and in our show notes!

Happy Friday, have a great weekend, and we’ll see you next week!

Highlights

Know thy neighbor, you may have been told, and to that we music data nerds would say, know thy artist neighbor. We’ll do so with rapper Pusha T through Chartmetric’s Neighboring Artists feature.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on LinkedIn, Instagram, Twitter, or Facebook, and talk to us! We’d love to hear from you.

Date

This is your Data Dump for Wednesday, Aug. 28, 2019.

New Kids on the Block: Identifying Pusha T's Artist Neighbors

Virginia Beach veteran rapper Pusha T dropped a new song, “Sociopath,” on Monday.

The Kanye West-produced track featuring Kash Doll was leaked early, but Complex Magazine’s "Best Rapper Alive" of 2018 is still keeping it moving.

As of yesterday, Pusha T’s Chartmetric rank was at 729th out of the 1.7M artists we track globally.

He has 8.2M SoundCloud followers, 140M total YouTube channel views and a Spotify Monthly Listener count at 3.8M.

Now, if we break out Pusha T’s Artist Neighbors by his Chartmetric rank alone...

To the north, we’ve got singer-songwriter god Sara Bareilles ranked 728th, and Irish indie band Two Door Cinema Club in 725th place.

To the south of Push, there’s American pop rockers Cage the Elephant ranked 731st and Australian rockers The Vines in 732nd place.

Would Push ever do a cross-genre track with these acts who are at similar popularity levels in the digital world? It wouldn’t be his first: his guest verse on Justin Timberlake’s 2002 solo album debut “Like I Love You” and his 2017 guest spot on Linkin Park’s “Good Goodbye” with grime rapper Stormzy have both accumulated tens of millions of spins on Spotify alone.

But if we filter by genre cluster (which through Chartmetric’s data science magic we find to be rap, trap music, pop, pop rap, southern hip hop), Pusha T’s Artist Neighbors now turn into:

Brooklyn’s Desiigner at 623rd place and Atlanta’s Playboi Carti at 591st place above Push.

And below are Toronto’s PARTYNEXTDOOR at 859 and Diddy himself at 889.

So if Push were looking for acts in a more similar vein to plan a tour or collab with, he could easily generate some creative ideas this way.

Our data science skills are growing strong with the Force back at Chartmetric HQ, so be sure to keep an eye out for more super-cool and hopefully super-useful features to come.

Outro

That’s it for your Daily Data Dump for Wednesday, Aug. 28, 2019. This is Jason from Chartmetric.

Free accounts are available at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Wednesday, and we’ll see you on Friday!

podcast_episode
by Vance Joy, Mark Mulligan (Midia Research), Jason Joven (Chartmetric), AC/DC (AC/DC), Steve Boom (Amazon Music)

Highlights

Who says music is all about young people and streaming? Amazon Music and American radio would beg to differ, and we’ll check out a couple of Australian artists who are doing well on them.

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

We’re on the socials at “chartmetric” (that’s Chartmetric, no “S”). Follow us on Instagram, Twitter, Facebook or LinkedIn, and talk to us! We’d love to hear from you.

FYI, we’re scaling back to 2 episodes per week. Why? Because we’re working on some special projects that we will certainly tell you about over the next few months, but we need to make the time to do them! So don’t worry, your phone isn’t playing games with your heart... it’s just us and the Backstreet Boys.

Having said all that...

Date

This is your Data Dump for Friday, July 12th, 2019.

Vance Joy and AC/DC on Amazon Music and US Radio

The Financial Times reported yesterday on the rise of Amazon Music, and how it has experienced a 70 percent growth in subscribers in the past year.

The head of Amazon Music, Steve Boom (that’s a great name for a music guy), noted that all the other platforms were playing for the younger crowds, but not older consumers. Apparently 14 percent of subscribers to Amazon Music are aged 55 or older, compared with just 5 percent of Spotify’s customers, according to Midia Research’s Mark Mulligan.

Now on the radio side of things, Music Business Worldwide reported that AM/FM US radio consumption is growing! Take that, streaming.

Radio reached more folks than any other entertainment platform in 2019, according to Nielsen’s Audio Today 2019 report.

272M Americans fire up their radios each week; that is 7M more listeners than 2016... and why? Because Americans love their cars, and radios are just there.

Now to help illustrate that with actual artists, we’ll turn to two of Australia’s biggest ones, relative newcomer Vance Joy and classic rock gods AC/DC.

Vance Joy, the pop/folk singer-songwriter from Melbourne, is currently on 19 Amazon editorial playlists, including the contextual playlists Rise and Shine, Road Trip: Folk and a chart-like playlist: Best Folk Songs of 2017.

His massive hit “Riptide” is actually NOT the most playlisted on the platform; it’s actually another one of his records, “Lay It On Me”, placing in 9 of those 19 Amazon Music playlists.

On the 300 influential American radio stations we cover, Joy had as many as 506 spins in the week of Sept 24th 2018, and the week of July 1st, it was down to 91.

But it’s all good because the state of Wisconsin LOVES Vance Joy, as his songs have been 1% of all the tracks that state’s radio stations have played since September. Pretty impressive.

Now for all-time rock greats AC/DC, straight out of Sydney:

They are on 14 Amazon editorial playlists, including the #2 slot on Classic Rock for Lifting, the #5 spot for Pre-Game Grilling, and the #1 spot for 80s Hard Rock Workout... who’s feeling some testosterone?

AC/DC hits like “You Shook Me All Night Long” and “Back in Black” seem to resonate most in Boston, Massachusetts and Gainesville, Florida...

...but what’s really good to remember is that in case your phone runs out of battery, you can find either of these artists or others by flicking on the old car radio, or simply asking Alexa to do it for you.

Outro

That’s it for your Daily Data Dump for Friday, July 12th, 2019. This is Jason from Chartmetric.

Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Friday, and we’ll see you next week!

Highlights

Follow us down to the trigger cities of Southeast Asia where their Shazam, Spotify, and YouTube charts have some big implications for tour strategy and catalog exploitation.

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Thursday, June 27th, 2019.

Trigger Cities in Southeast Asia

On our blog this week, Jason did an epic analysis of Southeast Asia’s trigger cities, revealing what implications their Shazam, Spotify, and YouTube charts have for tour strategy and catalog exploitation.

We’re just scratching the surface of it here.

First, Shazam. From Singapore’s 41 pop genre tags to Jakarta’s 40 to Kuala Lumpur’s 37 down to Bangkok’s 30, to call it an overwhelming Southeast Asian love of pop music in the past month would be an understatement.

However, the region doesn’t appear to care much about querying hip-hop or rap, as the genre only makes a 10th place appearance in Jakarta.

On Spotify, K-pop group BLACKPINK is currently the hottest act throughout the region, having 2.11M monthly listeners in the past month.

Our good friend Lauv (remember him from our June 3 episode?) slides into #2 with 2.10M monthly listeners.

With the exception of BLACKPINK, all other artists have US or UK origins.

Given Spotify’s northern European origins and that its most popular artists are also of Western origin, this makes sense.

Ho Chi Minh City, Vietnam, however, seems to exist in its own silo. In the city more commonly known as Saigon, locals prefer Korean acts, sharing a love of K-pop boy band SEVENTEEN with Bangkok.

But the city’s #1 most listened-to artist on Spotify is their “queen of V-pop,” Mỹ Tâm. An outlier here, however, is Ho Chi Minh City’s third most listened to artist on Spotify: Nashville’s Landon Austin.

Austin’s covers are apparently catnip for Southeast Asia’s love of non-controversial pop, because his top five cities by Spotify monthly listeners are all in Southeast Asia.

Should Austin be touring the region like a madman, then?

Based on the available data, it sure looks like it, but we can’t rule out the possibility of bots and bought streams, for which a lot more research still has to be done.

On YouTube, BLACKPINK and BTS, two of Korea’s biggest international acts, consistently appear in the top 10 artists by YouTube daily video views.

Aggregating the top 10 artists of each of the six Southeast Asian cities for YouTube daily views, the #6 most viewed artist is Brad Kane. If you missed our May 16 podcast episode on Quezon City, Kane was the titular character’s original singing voice for the 1992 Disney animated film Aladdin, which has just been re-released as a live action film starring Will Smith.

The fact that the New York City actor, singer, and producer’s rendition of “A Whole New World” has stirred up so much engagement 27 years later in Southeast Asia says something about how locals consume music … not necessarily to support the artist, but for their own karaoke endeavors!

So, if you’re looking to exploit catalog records, this might be the perfect spot.

But don’t count out domestic artists.

Three Southeast Asian artists make the region’s top 10 most viewed: Bangkok trap rapper YOUNGOHM (at #4 with 1.1M daily views), Indonesian singer Nella Kharisma (at #7 with 637K daily views), and Bangkok punk rock band Labanoon (at #9 with 589K daily views).

One distinct takeaway with these domestic artists is that their YouTube support comes exclusively from their home countries. Since all three are proudly delivering content in their mother tongues, they are likely limiting their global market appeal, but it’s also why they resonate so well with their fellow country people.

As Jason puts it, looking at a certain market’s music data raises our awareness about who the fans are, what their specific cultural histories have been, and how they are now living as a reflection of it.

Well said, but something to consider beyond the computer screen is the fact that digital behavior doesn’t always correspond directly to behavior in the real world.

Which is why, before you completely tailor your tour or marketing strategy to your streaming data, make sure you’ve considered all avenues of information.

Spotify numbers don’t always translate to ticket sales.

Outro

That’s it for your Daily Data Dump for Thursday, June 27th, 2019. This is Rutger from Chartmetric.

If you want to read Jason’s piece in full and look at some pretty charts, it’s up on our blog at blog.chartmetric.io.

Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Thursday, and see you tomorrow!

Highlights

In just over a year, King Princess has gone from 10 Spotify playlists to more than 1,000 and 5,000 Twitter followers to more than 100,000. Now, they’re on Mark Ronson’s new track, “Pieces of Us.”

Mission

Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Monday, June 24, 2019.

New Music Friday Monday: the Royal Rise of King Princess

In just a year, King Princess has gone from a humble 10 Spotify playlists to a star-powered 1K, and 5K Twitter followers to 100K+. The 20-year-old, New York City-born singer-songwriter/producer now finds themselves on Mark Ronson’s June 17 release, “Pieces of Us,” which currently sits at the No. 5 spot on Spotify’s New Music Friday playlist. How’d they get there so quickly?

Using the Analyze function on their artist page to compare their Spotify playlist evolution with their social follows, we can chart it.

On March 9, 2018, King Princess had just under 5K Twitter followers and was only on 10 Spotify playlists.

That’s great for a young indie artist, but clearly not on par with the metrics typical Mark Ronson collaborators (who include Bruno Mars, Lady Gaga, Adele, Miley Cyrus, and Amy Winehouse) usually boast. By June 21, 2018, King Princess got closer with almost 30K Twitter followers and 237 Spotify playlists.

Mind you, that exponential jump was in a matter of just three measly months.

Fast forward to March 9, 2019, and King Princess hit just a bit more than 92K Twitter followers, 374K Instagram followers, and was on 1,000 Spotify playlists.

By then, they had also been booked to play this year’s Coachella, Lollapalooza, and Governors Ball festivals.

Today, the alternative pop star has some 111K Twitter followers, 450K+ Instagram followers, about 25K Facebook fans, and sits on more than 1K Spotify playlists.

In less than a year and a half, King Princess’ Spotify Popularity Index went from 0 to 75 out of 100.

But it’s not all out of thin air.

In February 2017, they became the first artist to sign to Mark Ronson’s Zelig Recordings, which is a Columbia Records imprint ultimately owned by Sony.

It’s only natural for them to be featured on Mark Ronson’s “Pieces of Us,” which was released on June 17, and currently sits in the No. 5 spot on Spotify’s New Music Friday playlist.

You might say, with a little help from Ronson, King Princess’ royal rise was written in the stars all along.

Outro

That’s it for your Daily Data Dump for Monday, June 24, 2019. This is Rutger from Chartmetric.

Free accounts are at app.chartmetric.com/signup, and article links and show notes are at podcast.chartmetric.com.

Happy Monday, see you tomorrow!

Highlights

Following a panel including Beggars Group’s Martin Mills and Kill Rock Stars’ Portia Sabin, we’re looking at artists on their rosters and asking, “What makes them two of indie music’s longest lasting labels?”

Mission

Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.

Date

This is your Data Dump for Friday, June 21, 2019.

A2IM Indie Week, Day 4

Several indie icons closed out A2IM’s Indie Week in New York City yesterday, two of them being the legendary Martin Mills and Dr. Portia Sabin, sharing what’s helped them make Beggars Group and Kill Rock Stars, respectively, some of indie music’s longest lasting labels. Beggars Group is the parent company of 4AD, Rough Trade Records, Matador, XL Recordings and Young Turks.

Mills started it in London in 1977, and his many labels have gone on to sign everyone from Adele to Radiohead.

While Adele hasn’t released anything for some time, her 25 album, which released physically in November 2015 and digitally in June 2016 via a joint deal between XL Recordings and Sony’s Columbia, “single-handedly revived global album sales”, according to the Guardian.

The album’s streaming success is no joke either, as it’s maintained a 70-80 Spotify Popularity Index score over the last three years, and has been included on upwards of 12.5K Spotify playlists.

That kind of success under XL’s guidance gave Adele the leverage to be able to sign an enormous and unprecedented £90 million deal with Sony in May 2016.

No doubt the industry will be keen to check her next album from one of the industry’s biggest major labels.

Now entering the underground: since 2006, Sabin has run Pacific Northwest-based indie label Kill Rock Stars, which has been a home to riot grrrl legends Bikini Kill and Sleater-Kinney, the late singer-songwriter Elliott Smith, and folk rockers the Decemberists.

Sabin’s roster is more niche than Mills’, but Kill Rock Stars’ ability to navigate catalog digitization and promotion has allowed their artists to prosper.

Smith, for instance, maintains some 1.4M monthly listeners on Spotify, despite the fact that he passed away tragically in 2003. In March 2017, Kill Rock Stars released an expanded edition of his 1997 album Either/Or, which helped increase Smith’s Spotify followers by around 70 percent to 430K and spiked his monthly listenership by an estimated 250K.

Whether by keen artist development or catalog revitalization, Beggars Group and Kill Rock Stars have each found a way to not only survive longer than most indie labels, but to also thrive while doing so.

Outro

That’s it for Indie Week and your Daily Data Dump for Friday, June 21, 2019. This is Jason from Chartmetric.

Free accounts are at chartmetric.com, and article links and show notes are at podcast.chartmetric.com.

Happy Friday, and have a great weekend!

podcast_episode
by Gurr, Rutger (Chartmetric), Yes We Mystic, Leoniden, Renata Zeiguer, Mira Lu Kovacs, ORI, Surfbort

2019-06-20 // A2IM Indie Week, Day 3: Reeperbahn Festival
Highlights
Hamburg’s Reeperbahn Festival is presenting seven different acts on Day 3 of A2IM’s Indie Week, and we will check out how they’re doing in the data world.
Mission
Good morning, it’s Rutger here at Chartmetric with your 3-minute Data Dump where we upload charts, artists, and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Thursday, June 20, 2019.
A2IM Indie Week, Day 3
At the corner of Allen and Houston on the Lower East Side of Manhattan, NYC, last night, six bands shared their live music as an official part of A2IM’s Indie Week: the Reeperbahn Festival, New York Edition. Based around the entertainment and red-light district in Hamburg, Germany, since 2006, the Reeperbahn Festival will be putting on more than 900 events in the area this September with international artists of all genres, in addition to a conference for music and related industry professionals.
In the 7:30-8:00pm slot was Winnipeg, Canada’s Yes We Mystic, with their melodic and experimental indie rock sound. With 7K Spotify monthly listeners and their top four listener cities being Toronto, Winnipeg, Vancouver and Montreal, they have clearly created a stir in their home country. Half of their 2.2K Instagram followers are from their hometown of Winnipeg, so Reeperbahn has done a lovely job of finding a truly homegrown act and bringing them to a bigger stage.
Jerusalem-born singer-songwriter-producer ORI graced the stage in the 8:05-8:35pm slot. He builds his songs from original looping samples of his own voice and instrumentation, and has over 10K monthly listeners. His 2017 single “Black Book” is currently on The Austin 100 playlist from NPR Music and the official SXSW 2019 playlist; he played two official sets at SXSW this past March. His current Spotify Popularity Index may only be at 28 for now, but his work has also been sampled by Kendrick Lamar and Jay Rock.
Closing out the night in the 11:20-midnight slot was Surfbort, most likely because they are a hard act to follow with their hottest tracks titled “Hippoe Vomit Inhaler”, “Pretty little fucker” and “Dicks”. The Brooklyn-based punk outfit is named after a Beyoncé lyric, has just under 20K monthly listeners, and is featured on Green Day’s “Oakland Coffee” playlist as well as the official SXSW 2019 playlist, as they also played a set in Austin last spring.
In other slots were Germany’s Leoniden and Gurr, Austria’s Mira Lu Kovacs, and America’s Renata Zeiguer. Check them out: if Hamburg’s Reeperbahn NY Edition picked them, they’re sure to be worth the listen and maybe your latest favorite sound.
Outro
That’s it for your Daily Data Dump for Thursday, June 20, 2019. This is Rutger from Chartmetric.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com.
Happy Thursday, and we’ll see you tomorrow!

Highlights
We’re on the road! We’re at A2IM’s Indie Week in New York City, so we’ll publish our music data-related thoughts and experiences for you starting in tomorrow’s episode in case you can’t make it. But for today, we’ll celebrate the indie community on Amazon Music with an indie-focused New Music Friday Monday!
Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Monday, June 17, 2019.
New Music Friday Monday: Fresh Indie on Amazon
Hopping over to the “Fresh Indie” playlist on Amazon Music, we’ve got no fewer than 60 tracks of the most brand spanking new independent music in the streaming world. The tracks come from over 35 different indie labels, including 4AD, ATO Records and XL Recordings. Over 64% of the artists featured are from the US and 16% from the UK, with Canada, Norway, Australia and New Zealand making up the rest of the Anglo-focused playlist. Just under half of the list has either the indiepop, folk-pop or indietronica genre tag attached to it, with 15+ other genre tags thrown in to make for a diverse-sounding set.
In the #4 position is the funk-addled instrumental track “Mary Always” by Houston-born band Khruangbin, mixing soul, dub, psychedelia, and Thai funk. The track is currently on nine Spotify editorial playlists, including All New Indie with 958K followers, and 2 Apple editorial playlists, including Today’s Indie Rock. The great playlist promotion is coming out of Bloomington, Indiana, where the track’s Dead Oceans label is housed with the Secretly Group, an umbrella of indie labels putting out rock music of different flavors.
In the #9 spot is the spacious, introspective track “Conversation Piece” by Memphis, Tennessee’s Julien Baker. Currently on no Spotify editorial playlists and 1 Apple editorial playlist, the Late Night Menu, the Matador Records release is the latest from the singer-songwriter known for heart-wrenching lyricism and melody. What’s uber cool about Baker is that she is also part of supergroup boygenius, also under Matador, with Phoebe Bridgers and Lucy Dacus. It’s kind of the reverse of the K-pop model of a supergroup splitting off into solo careers, as boygenius formed in 2018 and each member had a solo career as early as 2014.
Last but not least is “Flood Hands” by Vagabon, coming from Nonesuch Records. The track is in the #12 slot on the Amazon playlist and is currently on 3 Spotify editorial playlists, including All New Indie alongside Khruangbin, and 2 Apple editorial playlists, also including Today’s Indie Rock. Released on June 13, it’s the latest from the Cameroon-born multi-instrumentalist now based in NYC...where we are this week for A2IM’s Indie Week!
Outro
That’s it for your Daily Data Dump for Monday, June 17, 2019. This is Jason from Chartmetric.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com
Happy Monday, we’ll see you tomorrow from NYC’s Indie Week floor! Bye.

Highlights
Do you know what a playback singer is? Or how about that Mexican Norteño music has German polka in it? I sure didn’t, but our A&R tool did!
Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Friday, June 14th, 2019.
Found on Friday: 4 Indian Playback Singers and 2 Norteño Bandas
So checking into our A&R tool, which roams the Interwebs for the biggest delta, or change, between now and 28 days ago, we focus on the singular metric of total YouTube views via an artist’s channel. Looking at the Top 20 biggest gains, what’s not surprising? Billie Eilish at #5, that’s cool, Will Smith at #7 after the release of the new Aladdin movie, that’s also awesome… But you know what’s really hot? Indian playback singers, because they occupy positions 1 through 4!
A playback singer in Bollywood masterfully records world-class vocals for songs for the on-camera actors to lip-sync to during shooting. For us Westerners who are obsessed with authenticity, let’s just imagine a publicly accepted form of lip-sync that not only helps create great Indian movies, but also celebrates the playback singers themselves.
In the #1 spot is Calcutta-born Kumar Sanu with 30% YouTube view growth to 16.5M, who also just appeared on TV show Sa Re Ga Ma Pa L'il Champs, which pits 5-15 year olds against each other in a singing competition. In the #2 position is Arijit Singh, who saw 20% YouTube view growth to 18.7M, and just released “Bekhayali” from Indian dramatic film Kabir Singh on June 3rd. Coming in at #3 on our list, but #1 in the Bollywood industry, is Lata Mangeshkar with 19% view growth to 9M, but that’s honestly a footnote for one of the most well-known and highly-respected playback singers ever. Mangeshkar has been listed in the Guinness Book of World Records as the most recorded artist, with over 30K tracks in 20 different languages. She is a recipient of the Bharat Ratna, India’s highest civilian honor (equivalent to the US Presidential Medal of Freedom), a recipient of France’s Legion of Honour, and was publicly selected as the 10th Greatest Indian of modern times. How’s that for achievement? I really don’t think she cares about her YouTube views right now, nor should she. Hats off to her.
Moving to Mexico, Norteño music is a genre of Northern Mexico that blends German polka and waltz traditions with Mexican ones. For all of us not familiar with Mexican music, the key instruments that define Norteño are the accordion (gracias a los europeos) and the bajo sexto, which translates to “sixth bass”, and looks like a 12-string guitar, but is used as a bass instrument.
In the #6 position is Los Invasores De Nuevo León, with 10% YouTube view growth to 26M. The Latin Grammy-nominated Los Invasores, or “The Invaders of Nuevo León”, formed in 1978, and are currently on tour in south Texas. In the #16 position is Los Tucanes De Tijuana, with 5% view growth to 132M. “Los Tucanes”, or “The Toucans of Tijuana”, made history this year as the first norteño act to play Coachella, also getting keys to the city. And if you want to catch up with some meme action, look up the “La Chona” challenge...their fast-paced 1994 record received a revival last year when uploaders recorded themselves dancing to “La Chona” outside their moving vehicles, a la Drake’s “In My Feelings”.
Outro
Bueno! That’s it for your Daily Data Dump for Friday, June 14th, 2019. This is Jason from Chartmetric.
Please give us a shout-out on iTunes. If you’re on an iPhone, dodge those crafty notifications and just scroll down on the Daily Data Dump page in your Apple Podcasts app or in the Ratings and Review tab in your iTunes app on your laptop, and show some love, Rutger and I appreciate it.
Free accounts are at chartmetric.com
And article links and show notes are at: podcast.chartmetric.com
Happy Friday, have a great weekend, and see you on Monday!
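For the data-curious, here is a rough sketch of the kind of ranking this episode describes: compare each artist's total YouTube views now with the total 28 days ago, then sort by the change. It is not Chartmetric's actual implementation. The `ViewSnapshot` type, the function names, the choice to rank by percentage growth rather than raw view gain, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ViewSnapshot:
    """Hypothetical record: total channel views at two points in time."""
    artist: str
    views_28_days_ago: int
    views_now: int


def growth_pct(s: ViewSnapshot) -> float:
    """Percentage change in total views over the 28-day window."""
    return 100 * (s.views_now - s.views_28_days_ago) / s.views_28_days_ago


def rank_by_growth(snapshots: list[ViewSnapshot]) -> list[ViewSnapshot]:
    """Sort artists from biggest to smallest percentage growth.

    Artists with no views 28 days ago are skipped, since the
    percentage change is undefined for a zero baseline.
    """
    usable = [s for s in snapshots if s.views_28_days_ago > 0]
    return sorted(usable, key=growth_pct, reverse=True)


# Made-up sample data, not real artist figures.
sample = [
    ViewSnapshot("Artist A", 100, 130),
    ViewSnapshot("Artist B", 1000, 1100),
    ViewSnapshot("Artist C", 200, 240),
    ViewSnapshot("Artist D", 0, 50),
]

for s in rank_by_growth(sample):
    print(f"{s.artist}: {growth_pct(s):.0f}% growth to {s.views_now} views")
```

Ranking by percentage favors smaller channels that are growing quickly, while ranking by raw view gain would favor channels that are already large, so the choice depends on what kind of artist you are trying to find.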

Highlights
It’s Found on Friday, and we’re using Spotify playlist adds and reach to introduce you to a tropical DJ from Spain, an American lo-fi beats producer and an Irish singer-songwriter with literary flair.
Mission
Good morning, it’s Jason here at Chartmetric with your 3-minute Data Dump where we upload charts, artists and playlists into your brain so you can stay up on the latest in the music data world.
Date
This is your Data Dump for Friday, June 7th, 2019.
Found on Friday: Playlist Reach Uncovers a Galician DJ, an American lo-fi beats producer and an Irish Literary Songwriter
It’s Found on Friday, which means we are digitally crate-digging for new artists in the proverbial streaming record shops of the Internets, and this time through the lens of “reach”. In the world of social media, reach is the unique number of people who see a particular piece of content, and we can contrast that with “impressions”, which are the total number of times they see that content, and “engagement”, which is the number of interactions those audience members actively take upon that content. In Spotify’s streaming world, reach in one sense is playlisting: we can count how many followers a particular playlist has, and at the artist level, add up the followers of every playlist that artist is on at any given point. These of course are non-unique follower counts, as we all are probably following dozens if not hundreds of playlists from each of our single profiles. Nevertheless, it’s still a measure of reach, and that can be an important metric for determining which artists are in a great position to break.
Ranked by number of new popular playlist adds in the past 30 days, Spanish DJ Zeper occupies the #1 spot today. From Pontevedra, Galicia, the young producer has a very accessible tropical dance sound with Majestic Casual vibes that would easily fit in any college student’s chillout or study playlist. Currently on 50 playlists with 10K or more followers, Zeper’s total playlist reach is over 2.8M followers, growing by over 45K total followers since last week. His latest release was “Stop” on May 31st, a collaboration with another emerging artist, KRIMETZ.
Added to an additional 39 playlists with over 10K followers each is American artist Hurley Mower. With his polished take on the lo-fi beats genre, Mower gained nearly another 30K aggregated playlist followers in the past week, bringing him over the 2M mark. With 207K monthly listeners and only 5.3K followers on his own Spotify profile, he’s got a listener to follower ratio of 38, which definitely puts him well into the promising artist category for that metric.
Last but not least is Jealous of the Birds. Such an interesting name. On 5 playlists with more than 10K followers, the Irish singer-songwriter has over 767K total playlist followers, including Spotify’s Evening Acoustic playlist in the 84/100 spot and the Sad Indie playlist in the 60/80 position. She’s no stranger to attention, however: her previous tracks have been featured on NPR’s All Songs Considered and as BBC Radio 1’s Tune of the Week.
No matter what your vibe, there are some new artists hanging out on your smartphone, so check them out this weekend!
Outro
That’s it for your Daily Data Dump for Friday, June 7th, 2019. This is Jason from Chartmetric.
Do you like this podcast? Does it help your day? If so, this is the part where we grovel at your feet for an iTunes rating or review...we are a business to business podcast, so it’s not like we’re trying to blow up, but if we can grow our audience some more to maybe start a music data interest community, we think that could be a really cool thing. So if you like what we do, please give us a shout-out on iTunes. If you’re on an iPhone, just scroll all the way down on the Daily Data Dump page in your Apple Podcasts app or in the Ratings and Review tab in your iTunes app on your laptop, and show some love, Rutger and I will do a silent happy dance for every star that we get.
Free accounts are at app.chartmetric.com/signup
And article links and show notes are at: podcast.chartmetric.com
Happy Friday, have a great weekend, and see you on Monday!
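The two metrics used in this episode, aggregated playlist reach and the listener to follower ratio, come down to simple arithmetic. The sketch below shows one way to compute them; it is not how Chartmetric computes them. The function names, the `min_followers` cutoff parameter, and the sample figures are hypothetical.

```python
def playlist_reach(playlist_followers: list[int], min_followers: int = 0) -> int:
    """Sum the follower counts of every playlist an artist appears on.

    This is a non-unique count: a listener who follows two of the
    playlists is counted twice. Playlists below `min_followers`
    are left out of the total.
    """
    return sum(f for f in playlist_followers if f >= min_followers)


def listener_to_follower_ratio(monthly_listeners: int, followers: int) -> float:
    """Monthly listeners divided by followers of the artist's own profile."""
    if followers <= 0:
        raise ValueError("followers must be positive")
    return monthly_listeners / followers


# Made-up follower counts for the playlists one artist is on.
followers_per_playlist = [12_000, 500, 30_000, 9_999]

print(playlist_reach(followers_per_playlist))
print(playlist_reach(followers_per_playlist, min_followers=10_000))
print(listener_to_follower_ratio(200_000, 5_000))
```

A high ratio means many people are hearing the artist, typically through playlists, without yet following the artist's own profile, which is why the episode treats it as a sign of an artist who may be about to break.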