Topic

AWS Glue

etl data_catalog aws

2 activities tagged

Activity trend, 2020-Q1 to 2026-Q2: peak of 10 per quarter

Top Events

AWS re:Invent 2024: 15; O'Reilly Data Engineering Books: 6; Data + AI Summit 2025: 3; Data Engineering Podcast: 2; O'Reilly Data Science Books: 2; Databricks DATA + AI Summit 2023: 2; Leaders of Analytics: 1; Data Expo NL 2025: 1; dbt Coalesce 2025: 1; PyData Amsterdam 2025: 1; Airflow Summit 2023: 1; Experiencing Data w/ Brian T. O’Neill (AI & data product management leadership—powered by UX design): 1

Top Speakers

Noritaka Sekiyama (Amazon Web Services (AWS)): 3; Leonardo Gomez (AWS): 2; Tobias Macey: 2; Viquar Khan: 1; James T. McClave: 1; Jason Williams: 1; Trâm Ngọc Phạm: 1; Gurmeet Saran (Anthropic): 1; Aaron Wishnick: 1; Alvaro Videla: 1; Luis Campos (AWS): 1; Navneet Srivastava (Amazon Web Services): 1

Activities

Filtering by: Databricks DATA + AI Summit 2023

An API for Deep Learning Inferencing on Apache Spark™

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Lee Yang

API Big Data Databricks ETL/ELT LLM MLOps PySpark Spark

Apache Spark is a popular distributed framework for big data processing. It is commonly used for ETL (extract, transform, and load) across large datasets. Today, the transform stage often includes applying deep learning models to the data. For example, common models can be used for image classification, sentiment analysis of text, language translation, anomaly detection, and many other use cases. Applying these models within Spark can be done today with a combination of PySpark, pandas_udf, and a lot of glue code. That glue code is often difficult to get right because it requires expertise across multiple domains: deep learning frameworks, PySpark APIs, pandas_udf internal behavior, and performance optimization.

In this session, we present a new, simplified API for deep learning inference on Spark, introduced in SPARK-40264 as a collaboration between NVIDIA and Databricks. It seeks to standardize and open source this glue code to make deep learning inference integrations easier for everyone. We discuss its design and demonstrate its usage across multiple deep learning frameworks and models.
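
The glue code mentioned in this abstract usually has two parts: load the model once per worker, then feed it rows in fixed-size batches. The following is a minimal, Spark-free sketch of that pattern in plain Python. The names `make_predict_fn`, `batched`, and `run_inference`, and the toy linear "model", are hypothetical illustrations and not the API from SPARK-40264. In Spark 3.4 and later, the corresponding helper appears to be `pyspark.ml.functions.predict_batch_udf`, which takes a model-loading function together with a return type and a batch size.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def make_predict_fn() -> Callable[[List[float]], List[float]]:
    """Load the (toy) model once and return a function that scores a batch."""
    # Stand-in for an expensive model load, e.g. reading weights from disk.
    weight, bias = 2.0, 1.0

    def predict(batch: List[float]) -> List[float]:
        # A real model would run one vectorized forward pass per batch.
        return [weight * x + bias for x in batch]

    return predict


def batched(rows: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield consecutive lists of at most batch_size rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_inference(rows: Iterable[float], batch_size: int = 2) -> List[float]:
    """Score every row, calling the model once per batch."""
    predict = make_predict_fn()  # loaded once, reused for all batches
    results: List[float] = []
    for batch in batched(rows, batch_size):
        results.extend(predict(batch))
    return results


print(run_inference([0.0, 1.0, 2.0, 3.0, 4.0]))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

Separating model loading from batch scoring is the design choice that matters here: the expensive load happens once, while the cheap per-batch call is repeated for every partition of data.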

Talk by: Lee Yang

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Processing Delta Lake Tables on AWS Using AWS Glue, Amazon Athena, and Amazon Redshift

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Noritaka Sekiyama (Amazon Web Services (AWS)), Akira Ajisaka

Athena AWS Amazon EMR Amazon RDS Cloud Computing Data Lake Data Lakehouse Databricks Delta DWH DynamoDB MongoDB +3 more

Delta Lake is an open source project that helps implement modern data lake architectures, commonly built on cloud storage. With Delta Lake, you can achieve ACID transactions, time travel queries, change data capture (CDC), and other common use cases in the cloud.

Delta tables have many use cases on AWS. AWS has invested heavily in this technology, and Delta Lake is now available with multiple AWS services, such as AWS Glue Spark jobs, Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. With AWS Glue, you can ingest data from sources such as on-premises databases, Amazon RDS, DynamoDB, and MongoDB into Delta Lake on Amazon S3, even without coding expertise.

This session demonstrates how to get started with processing Delta Lake tables on Amazon S3 using AWS Glue, and how to query them from Amazon Athena and Amazon Redshift. The session also covers recent AWS service updates related to Delta Lake.
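
As a rough illustration of getting started, the sketch below builds the job arguments that, to the best of my reading of the AWS Glue documentation, enable Delta Lake in a Glue Spark job: the `--datalake-formats` parameter set to `delta`, plus Spark settings passed through a single `--conf` key. The helper name `delta_job_arguments` is hypothetical, and the exact settings should be checked against the current Glue documentation for your Glue version.

```python
from typing import Dict


def delta_job_arguments() -> Dict[str, str]:
    """Build job arguments that ask AWS Glue to load its Delta Lake libraries."""
    # Glue accepts one "--conf" key, so further Spark settings are chained
    # inside its value, each additional one preceded by " --conf ".
    spark_conf = " --conf ".join([
        "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog="
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    ])
    return {
        "--datalake-formats": "delta",  # assumed Glue job parameter value
        "--conf": spark_conf,
    }


args = delta_job_arguments()
for key, value in args.items():
    print(key, value)
```

A dictionary like this could be supplied as the `DefaultArguments` of a Glue job definition, for example when creating the job through the AWS SDK; the sketch deliberately stops short of calling any AWS API.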

Talk by: Noritaka Sekiyama and Akira Ajisaka

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs
