talk-data.com

Topic

Redshift

Amazon Redshift

data_warehouse cloud aws olap

Activity Trend: peak of 17 tagged activities per quarter, 2020-Q1 to 2026-Q2

Top Events

Data Engineering Podcast: 61
AWS re:Invent 2024: 24
O'Reilly Data Engineering Books: 10
Databricks DATA + AI Summit 2023: 3
The Analytics Power Hour: 2
Airflow Summit 2024: 2
dbt Coalesce 2022: 2
dbt Coalesce 2024: 2
Data + AI Summit 2025: 2
dbt Coalesce 2023: 1
Airflow Summit 2020: 1
DataFramed: 1

Top Speakers

Tobias Macey: 61
Neeraja Rentachintala (Amazon): 3
Maxime Beauchemin (Preset): 2
Noritaka Sekiyama (Amazon Web Services (AWS)): 2
Shruti Worlikar (AWS Analytics): 2
Julien Le Dem (Astronomer): 2
Tim Wilson (Analytics Power Hour - Columbus (OH)): 2
Anusha Challa (AWS): 2
Gareth Eagar: 2
Harshida Patel (AWS): 2
Michael Helbling (Search Discovery): 2
Gleb Mezhanskiy (Datafold): 2

Activities

Filtering by: Databricks DATA + AI Summit 2023

Processing Delta Lake Tables on AWS Using AWS Glue, Amazon Athena, and Amazon Redshift

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Noritaka Sekiyama (Amazon Web Services (AWS)) , Akira Ajisaka

Athena AWS Amazon EMR AWS Glue Amazon RDS Cloud Computing Data Lake Data Lakehouse Databricks Delta DWH DynamoDB

Delta Lake is an open source project that helps implement modern data lake architectures, which are commonly built on cloud storage. With Delta Lake, you can achieve ACID transactions, time travel queries, change data capture (CDC), and other common use cases in the cloud.

Delta tables have many use cases on AWS. AWS has invested heavily in this technology, and Delta Lake is now available with multiple AWS services, such as AWS Glue Spark jobs, Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum. AWS Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. With AWS Glue, you can ingest data from sources such as on-premises databases, Amazon RDS, DynamoDB, and MongoDB into Delta Lake on Amazon S3, even without coding expertise.

This session will demonstrate how to get started with processing Delta Lake tables on Amazon S3 using AWS Glue and querying them from Amazon Athena and Amazon Redshift. The session also covers recent AWS service updates related to Delta Lake.

Talk by: Noritaka Sekiyama and Akira Ajisaka

Running a Low Cost, Versatile Data Management Ecosystem with Apache Spark at Core

2022-07-19 · Databricks DATA + AI Summit 2023

video

AI/ML Analytics AWS Amazon EC2 Amazon EMR CI/CD Data Management Databricks ETL/ELT Spark

Data is the key component of any analytics, AI, or ML platform. Organizations are unlikely to succeed without a platform that can source, transform, quality-check, and present data in a reportable format that drives actionable insights.

This session will focus on how the Capital One HR team built a low-cost data movement ecosystem that can source data, transform it at scale, and build the data storage (Redshift) at a level that AI/ML programs can easily consume. It does so by combining AWS services with open source software (Spark) and Enterprise Edition Hydrograph (a UI-based ETL tool with Spark as its backend). The presentation mainly demonstrates the flexibility that Apache Spark provides for various types of ETL data pipelines when we code in Spark.

We have been running three types of pipelines for 6+ years, covering 400+ nightly batch jobs for $1000/mo: (1) Spark on EC2, (2) a UI-based ETL tool with a Spark backend (on the same EC2), and (3) Spark on EMR. We have a CI/CD pipeline that supports easy integration and code deployment in all non-prod and prod regions (it even supports automated unit testing). We will also demonstrate how this ecosystem can fail over to a different region in less than 15 minutes, making our application highly resilient.

ROAPI: Serve Not So Big Data Pipeline Outputs Online with Modern APIs

2022-07-19 · Databricks DATA + AI Summit 2023

video

AI/ML Analytics API AWS Amazon EC2 Amazon EMR Big Data CI/CD Databricks ETL/ELT Spark