talk-data.com

Speaker

Skyler Myers

1 talk

Head of Data Engineering, Entrada

I am currently the head of data engineering at Entrada, a boutique consulting firm offering Databricks implementation services. Prior to that, I was a Sr. Solutions Consultant at Databricks for 2 years, helping to create tailored solutions for some of their biggest clients.

Bio from: Data + AI Summit 2025

Talks & appearances

Creating a Custom PySpark Stream Reader with PySpark 4.0

PySpark supports many data sources out of the box, such as Apache Kafka, JDBC, ODBC, and Delta Lake. However, some older systems, such as those that use the JMS protocol, are not supported by default and require considerable extra work from developers to read from. One example is ActiveMQ for streaming. Traditionally, ActiveMQ users have needed a middleman to read the stream with Spark, for example Java code that writes messages to a MySQL database, which Spark then reads over JDBC. With PySpark 4.0’s custom data sources (supported in DBR 15.3+), we can cut out the middleman and consume the queues directly from PySpark, in batch or with Spark Streaming. This saves developers considerable time and complexity in getting source data into Delta Lake, where it can be governed by Unity Catalog and orchestrated with Databricks Workflows.
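
As a rough illustration of the idea, the sketch below shows what a custom stream reader might look like with the Python Data Source API in PySpark 4.0. It is not the code from the talk. The names `QueueStreamReader`, `QueueDataSource`, `FAKE_QUEUE`, the `fake_queue` format name, and the `batchSize` option are all hypothetical, and an in-memory list stands in for the ActiveMQ queue; a real reader would call a messaging client library inside `read`. The guarded import is only there so the offset logic can be tried without Spark installed.

```python
from typing import Iterator, Tuple

try:
    # Python Data Source API shipped with PySpark 4.0.
    from pyspark.sql.datasource import DataSource, SimpleDataSourceStreamReader
except ImportError:
    # Fallback so the offset bookkeeping below can run without Spark.
    DataSource = object
    SimpleDataSourceStreamReader = object

# Hypothetical stand-in for a broker queue (assumption, not a real ActiveMQ client).
FAKE_QUEUE = [f"message-{i}" for i in range(10)]


class QueueStreamReader(SimpleDataSourceStreamReader):
    """Reads the fake queue in small micro-batches, tracking an index offset."""

    def __init__(self, batch_size: int = 4):
        self.batch_size = batch_size

    def initialOffset(self) -> dict:
        # Offset the stream starts from when there is no checkpoint yet.
        return {"index": 0}

    def read(self, start: dict) -> Tuple[Iterator[tuple], dict]:
        # Return the rows available after `start` plus the new end offset.
        lo = start["index"]
        hi = min(lo + self.batch_size, len(FAKE_QUEUE))
        rows = [(i, FAKE_QUEUE[i]) for i in range(lo, hi)]
        return iter(rows), {"index": hi}

    def readBetweenOffsets(self, start: dict, end: dict) -> Iterator[tuple]:
        # Deterministic replay of an already planned batch after a restart.
        return iter(
            [(i, FAKE_QUEUE[i]) for i in range(start["index"], end["index"])]
        )

    def commit(self, end: dict) -> None:
        # A real reader might acknowledge messages up to `end` here.
        pass


class QueueDataSource(DataSource):
    """Registers the reader under a short format name."""

    @classmethod
    def name(cls) -> str:
        return "fake_queue"

    def schema(self) -> str:
        return "id INT, body STRING"

    def simpleStreamReader(self, schema):
        # `batchSize` is a hypothetical option name chosen for this sketch.
        return QueueStreamReader(int(self.options.get("batchSize", 4)))
```

With a Spark session available, such a source would typically be registered with `spark.dataSource.register(QueueDataSource)` and read with `spark.readStream.format("fake_queue").load()`. One caveat worth keeping in mind: a message queue usually cannot replay consumed messages the way an indexed list can, so a production reader would need its own strategy for `readBetweenOffsets` and acknowledgement.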