talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked

46

Filtering by: ETL/ELT

Sessions & talks

Showing 26–46 of 46 · Newest first

Migrating Complex SAS Processes to Databricks - Case Study

2022-07-19 · Video

Many federal agencies use SAS software for critical operational data processes. While SAS has historically been a leader in analytics, data analysts have often used it for ETL purposes as well. However, the demands of modern data science on ever-increasing volumes and types of data require a shift to modern cloud architectures and to new data management tools and paradigms for ETL/ELT. In this presentation, we provide a case study from the Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology (Spark SQL and Databricks) to produce results roughly 75% faster, without reliance on proprietary constructs of the SAS language, with more scalability, and in a manner that can more easily ingest old rules and better govern the inclusion of new rules and data definitions. Significant technical and business benefits derived from this modernization effort are described in this session.
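
The session itself does not walk through code; as a rough, hypothetical illustration of the kind of rewrite involved, the sketch below expresses a typical SAS DATA-step filter-and-derive as Spark SQL on Databricks. The table and column names are invented for the example.

```python
# Hypothetical illustration only: a SAS DATA step that filters claims and
# derives a column, re-expressed as Spark SQL on Databricks. Table and
# column names are made up for the example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sas-to-sparksql-sketch").getOrCreate()

# Rough SAS equivalent:
#   data work.paid_claims;
#     set raw.claims;
#     where claim_status = 'PAID';
#     net_amount = billed_amount - adjustment_amount;
#   run;
paid_claims = spark.sql("""
    SELECT claim_id,
           claim_status,
           billed_amount - adjustment_amount AS net_amount
    FROM raw.claims
    WHERE claim_status = 'PAID'
""")

paid_claims.write.mode("overwrite").saveAsTable("work.paid_claims")
```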

Presto On Spark: A Unified SQL Experience

2022-07-19 · Video

Presto was originally designed to run interactive queries against data warehouses, but now it has evolved into a unified SQL engine on top of open data lake analytics for both interactive and batch workloads. However, Presto doesn't scale to very large and complex batch pipelines. Presto Unlimited was designed to address such scalability challenges but it didn’t fully solve fault tolerance, isolation, and resource management.

Spark is the tool of choice across the industry for running large-scale, complex batch ETL pipelines. This motivated the development of Presto on Spark, which runs Presto as a library submitted with spark-submit to a Spark cluster. It leverages Spark for scaling shuffle, worker execution, and resource management, and thereby eliminates any query conversion between interactive and batch use cases. This solution helps enable a performant and scalable platform with a seamless end-to-end experience for exploring and processing data.
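
The exact launcher invocation depends on the Presto release; the sketch below is a hedged approximation of how a Presto query file might be handed to spark-submit. The class, artifact, and flag names are taken from memory of the upstream Presto on Spark documentation and should be treated as assumptions.

```python
# A rough sketch of submitting a Presto query to a Spark cluster with
# Presto on Spark. Exact class, artifact, and flag names vary by Presto
# version; treat everything below as an assumption.
import subprocess

subprocess.run(
    [
        "spark-submit",
        "--master", "yarn",
        "--class", "com.facebook.presto.spark.launcher.PrestoSparkLauncher",
        "presto-spark-launcher.jar",                 # launcher artifact (name assumed)
        "--package", "presto-spark-package.tar.gz",  # Presto runtime package (assumed)
        "--config", "config.properties",
        "--catalogs", "catalogs",
        "--catalog", "hive",
        "--schema", "default",
        "--file", "exploration_query.sql",  # the same SQL used interactively in Presto
    ],
    check=True,
)
```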

Many analysts at Intuit use Presto to explore data in the data lake on S3 and use Spark for batch processing. These analysts previously spent several hours converting exploratory SQL written for Presto into Spark SQL in order to operationalize and schedule it as data pipelines. Analysts at Intuit now use Presto on Spark to run thousands of critical jobs. No query conversion is required, which has improved analysts' productivity and empowered them to deliver insights faster.

Benefits from this session: attendees will learn about the Presto on Spark architecture, when to use Spark's execution engine with Presto, and how Intuit runs thousands of Presto jobs daily on the Databricks platform, all of which they can apply to their own work.

Radical Speed on the Lakehouse: Photon Under the Hood

2022-07-19 · Video

Many organizations are standardizing on the lakehouse; however, this new architecture poses challenges for the underlying query execution engine, which must access both structured and unstructured data. The execution engine needs to provide the performance of a data warehouse and the scalability of data lakes. To ensure optimal performance, the Databricks Lakehouse Platform offers Photon, a next-generation vectorized query execution engine that outperforms existing data warehouses on SQL workloads and implements a more general execution framework for efficient data processing with support for the Apache Spark™ API. With Photon, analytical queries see a 3 to 5x speed increase, along with a 40% reduction in compute hours for ETL workloads. In this session, we will dive into Photon, describe its integration with the Databricks Platform and Apache Spark™ runtimes, talk through customer use cases, and show how your SQL and DataFrame workloads can benefit from the performance of Photon.
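
Photon is enabled at the cluster or SQL warehouse level rather than in code, so existing workloads run unchanged. A minimal sketch of the kind of DataFrame aggregation that benefits is shown below; the table names are hypothetical.

```python
# Minimal sketch: a DataFrame aggregation that would run unchanged on a
# Photon-enabled Databricks cluster. The table names are hypothetical; Photon
# is turned on in the cluster/warehouse configuration, not in this code.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

daily_revenue = (
    spark.table("sales.orders")                      # assumed Delta table
    .where(F.col("order_status") == "COMPLETE")
    .groupBy("order_date")
    .agg(F.sum("order_amount").alias("revenue"))
)

daily_revenue.write.mode("overwrite").saveAsTable("sales.daily_revenue")
```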

Realize the Promise of Streaming with the Databricks Lakehouse Platform

2022-07-19 · Video
Erica Lee (Upwork)

Streaming is the future of all data pipelines and applications. It enables businesses to make data-driven decisions sooner and react faster, develop data-driven applications previously considered impossible, and deliver new and differentiated experiences to customers. However, many organizations have not realized the full promise of streaming because it requires them to completely redevelop their data pipelines and applications on new, complex, proprietary, and disjointed technology stacks.

The Databricks Lakehouse Platform is a simple, unified, and open platform that supports all streaming workloads, ranging from ingestion and ETL to event processing, event-driven applications, and ML inference. In this session, we will discuss the streaming capabilities of the Lakehouse Platform and demonstrate how easy it is to build end-to-end, scalable streaming pipelines and applications that fulfill the promise of streaming for your business. You will also hear Erica Lee, VP of ML at Upwork, the world's largest work marketplace, share how the Upwork team uses Databricks to enable real-time predictions by computing ML features in a continuous streaming manner.
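
The Upwork pipeline itself is not shown here; as a minimal sketch of computing an ML feature in a continuous streaming manner, the code below aggregates events per user over a sliding window with Structured Streaming. The broker, topic, and table names are assumptions.

```python
# Minimal sketch, not Upwork's actual pipeline: computing a simple ML feature
# (events per user over a sliding window) with Spark Structured Streaming and
# writing it to a Delta table. Source topic, broker, and paths are assumptions,
# and the Kafka source package must be available on the cluster.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
    .option("subscribe", "user_events")                 # assumed topic
    .load()
    .select(
        F.col("key").cast("string").alias("user_id"),
        F.col("timestamp"),
    )
)

features = (
    events
    .withWatermark("timestamp", "10 minutes")
    .groupBy(F.window("timestamp", "30 minutes", "5 minutes"), "user_id")
    .count()
    .withColumnRenamed("count", "events_last_30m")
)

(features.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/user_features")
    .toTable("ml.user_activity_features"))
```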

Running a Low Cost, Versatile Data Management Ecosystem with Apache Spark at Core

2022-07-19 · Video

Data is the key component of any analytics, AI, or ML platform. Organizations cannot succeed without a platform that can source, transform, quality-check, and present data in a reportable format that drives actionable insights.

This session will focus on how the Capital One HR team built a low-cost data movement ecosystem that can source data, transform it at scale, and build data storage (Redshift) that is easily consumed by AI/ML programs, using AWS services in combination with open-source software (Spark) and Enterprise Edition Hydrograph (a UI-based ETL tool with Spark as the backend). The presentation mainly demonstrates the flexibility that Apache Spark provides for various types of ETL data pipelines when we code in Spark.

We have been running three types of pipelines for over six years, with more than 400 nightly batch jobs for about $1,000/month: (1) Spark on EC2, (2) a UI-based ETL tool with a Spark backend (on the same EC2 instances), and (3) Spark on EMR. We have a CI/CD pipeline that supports easy integration and code deployment in all non-prod and prod regions (and even supports automated unit testing). We will also demonstrate how this ecosystem can fail over to a different region in less than 15 minutes, making our application highly resilient.
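
The Capital One code is not public; the sketch below only illustrates the general shape of such a nightly Spark batch job, reading from S3, transforming, and loading Redshift over JDBC. Paths, the JDBC endpoint, and credential handling are assumptions.

```python
# Illustrative only, not Capital One's pipeline: a nightly Spark batch job that
# reads raw files from S3, applies a transformation, and loads the result into
# Redshift over JDBC. Paths, table names, and the JDBC URL are assumptions, and
# the Redshift JDBC driver is assumed to be on the classpath.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("nightly-hr-etl-sketch").getOrCreate()

raw = spark.read.parquet("s3://example-bucket/hr/raw/employees/")  # assumed path

transformed = (
    raw.dropDuplicates(["employee_id"])
    .withColumn("load_date", F.current_date())
    .where(F.col("status").isNotNull())
)

(transformed.write
    .format("jdbc")
    .option("url", "jdbc:redshift://example-cluster:5439/hr")  # assumed endpoint
    .option("dbtable", "analytics.employees")
    .option("user", "etl_user")        # in practice, pull credentials from a secret store
    .option("password", "***")
    .mode("append")
    .save())
```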

Simon Whiteley + Denny Lee Live Ask Me Anything

2022-07-19 · Video
Denny Lee (Databricks), Simon Whiteley (Advancing Analytics)

Simon and Denny Build A Thing is a live webshow, where Simon Whiteley (Advancing Analytics) and Denny Lee (Databricks) are building out a TV Ratings Analytics tool, working through the various challenges of building out a Data Lakehouse using Databricks. In this session, they'll be talking through their Lakehouse Platform, revisiting various pieces of functionality, and answering your questions, Live!

This is your chance to ask questions about structuring a lake for enterprise data analytics, the various ways we can use Delta Live Tables to simplify ETL, or how to get started serving out data using Databricks SQL. We have a whole load of things to talk through, but we want to hear YOUR questions, which we can field from industry experience, community engagement, and internal Databricks direction. There's also a chance we'll get distracted and talk about The Expanse for far too long.

Take Databricks Lakehouse to the Max with Informatica​

2022-07-19 · Video

The hard part of ML and analytics is not building data models. It's getting the data right and into production. Join us to learn how Informatica's Intelligent Data Management Cloud (IDMC) helps you maximize the benefits of the Databricks Unified Analytics Platform. Learn how our cloud-native capabilities can shorten your time to results. See how to enable more data users to easily load data and develop data engineering workflows on Databricks in ELT mode at scale. Find out how Informatica delivers all the necessary governance and compliance guardrails you need to operate analytics, AI, and ML. Accelerate adoption and maximize agility while maintaining control of your data and lowering risk.

Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers

2022-07-19 · Video

Profiling small molecules in human blood across global populations gives rise to a greater understanding of the varied biological pathways and processes that contribute to human health and diseases. Herein, we describe the development of a comprehensive Human Biology Database, derived from nontargeted molecular profiling of over 300,000 human blood samples from individuals across diverse backgrounds, demographics, geographical locations, lifestyles, diseases, and medication regimens, and its applications to inform drug development.

Approximately 11,000 circulating molecules have been captured and measured per sample using Sapient’s high-throughput, high-specificity rapid liquid chromatography-mass spectrometry (rLC-MS) platform. The samples come from cohorts with adjudicated clinical outcomes from prospective studies lasting 10-25 years, as well as data on individuals’ diet, nutrition, physical exercise, and mental health. Genetic information for a subset of subjects is also included and we have added microbiome sequencing data from over 150,000 human samples in diverse diseases.

An efficient data science environment has been established to enable effective health insight mining across this vast database. Built on a customized AWS and Databricks "infrastructure-as-code" Terraform configuration, we employ streamlined data ETL and machine learning-based approaches for rapid rLC-MS data extraction. In mining the database, we have been able to identify circulating molecules potentially causal to disease; illuminate the impact of human exposures like diet and environment on disease development, aging, and mortality over decades; and support drug development efforts through identification of biomarkers of target engagement, pharmacodynamics, safety, efficacy, and more.

Dive Deeper into Data Engineering on Databricks

2022-07-19 · Video

To derive value from data, engineers need to collect, transform, and orchestrate data from various data types and source systems. However, today's data engineering solutions support only a limited number of delivery styles, involve a significant amount of hand-coding, and have become resource-intensive. Modern data engineering requires a more advanced data lifecycle for ingestion, transformation, and processing. In this session, learn how the Databricks Lakehouse Platform provides an end-to-end data engineering solution, spanning ingestion, processing, and scheduling, that automates the complexity of building and maintaining pipelines and running ETL workloads directly on a data lake, so your team can focus on quality and reliability to drive valuable insights.
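
As a small, hedged illustration of lakehouse-native ingestion of the kind the session covers, the sketch below uses Auto Loader to incrementally ingest files from cloud storage into a Delta table; the paths and table name are assumptions.

```python
# Minimal sketch of incremental ingestion on Databricks with Auto Loader.
# Source path, checkpoint path, and target table name are assumptions;
# the availableNow trigger requires a reasonably recent runtime.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw_stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/orders")  # schema inference state
    .load("s3://example-bucket/landing/orders/")
)

(raw_stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/orders_bronze")
    .trigger(availableNow=True)          # process what's available, then stop
    .toTable("bronze.orders"))
```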

You Have BI. Now What? Activate Your Data!

2022-07-19 · Video

Analytics has long been the end goal for data teams: standing up dashboards and exporting reports for business teams. But what if data teams could extend their work directly into the tools business teams use?

The next evolution for data teams is Activation. Smart organizations use reverse ETL to extend the value of Databricks by syncing data directly into business platforms, making their lakehouse a Customer Data Platform (CDP). By making Databricks the single source of truth for your data, you can create business models in your lakehouse and serve them directly to your marketing tools, ad networks, CRMs, and more. This saves time and money, unlocks new use cases for your data, and turns data team efforts into revenue-generating activities.
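
Reverse ETL products differ in their APIs; the sketch below only illustrates the pattern, reading a modeled segment from Databricks with the databricks-sql-connector and pushing rows to a hypothetical CRM endpoint.

```python
# Minimal reverse-ETL sketch: read a modeled segment from the lakehouse and
# sync it to a downstream business tool. The warehouse connection details and
# the CRM endpoint are hypothetical; real reverse-ETL products add batching,
# retries, and change tracking.
import requests
from databricks import sql  # databricks-sql-connector

with sql.connect(
    server_hostname="adb-1234567890.0.azuredatabricks.net",  # assumed workspace
    http_path="/sql/1.0/warehouses/abc123",                  # assumed SQL warehouse
    access_token="dapi-***",
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT email, lifetime_value, churn_risk FROM gold.customer_segments")
        rows = cur.fetchall()

for email, ltv, churn_risk in rows:
    # Hypothetical CRM API; in practice use the tool's bulk/upsert endpoint.
    requests.post(
        "https://crm.example.com/api/contacts/upsert",
        json={"email": email, "lifetime_value": ltv, "churn_risk": churn_risk},
        timeout=10,
    )
```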

Delta Live Tables: Modern Software Engineering and Management for ETL

2022-07-19 · Video

Data engineers have the difficult task of cleansing complex, diverse data, and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance, how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations, how built-in quality controls and monitoring ensure accurate BI, data science, and ML, and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.
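
As a minimal sketch of what a DLT pipeline definition looks like in Python, the code below declares a bronze and a silver table with expectations enforcing the quality controls described above; the source path and column names are hypothetical.

```python
# Minimal Delta Live Tables sketch (runs inside a DLT pipeline on Databricks,
# where `dlt` and `spark` are provided by the runtime). The source path and
# column names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://example-bucket/landing/orders/")   # assumed path
    )

@dlt.table(comment="Cleaned orders with basic quality checks")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "order_amount > 0")
def orders_silver():
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("order_date", F.to_date("order_ts"))
    )
```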

Measuring the Success of Your Algorithm Using a Shadow System

2022-07-19 · Video

How do you determine whether your new data product is a success if you cannot use A/B testing techniques?

At Gousto we recently implemented our newest algorithm to route orders to sites. Comparing this to the previous algorithm using classic A/B testing techniques was not possible, because the algorithm requires a full set of orders to optimise and to ensure the volume we send to sites remains stable. A routing algorithm is a high-impact product, so to ensure confidence in our algorithm before go-live, we came up with a different experimentation strategy. This included building a full-blown shadow system. To measure its performance, we built a set of data pipelines (including ETL) using Databricks.

Sometimes an A/B test cannot do the job. This talk will outline the challenges and benefits of building a shadow system, providing the audience with an A/B testing alternative and an overview of the relevant considerations when choosing and building this experiment design.
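
The Gousto pipelines are not shown here; as a minimal sketch of the shadow-system idea, the code below joins production and shadow routing decisions and computes simple agreement and per-site volume metrics. Table and column names are assumptions.

```python
# Minimal sketch of comparing a shadow routing algorithm against production.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

prod = spark.table("routing.production_decisions").select(
    "order_id", F.col("site").alias("prod_site")
)
shadow = spark.table("routing.shadow_decisions").select(
    "order_id", F.col("site").alias("shadow_site")
)

compared = prod.join(shadow, "order_id", "inner")

# Share of orders routed to the same site by both algorithms.
agreement = compared.agg(
    F.avg((F.col("prod_site") == F.col("shadow_site")).cast("double")).alias("agreement_rate")
)

# Order volume per site under each algorithm, to check that site loads stay stable.
prod_volume = compared.groupBy(F.col("prod_site").alias("site")).agg(
    F.count("*").alias("prod_orders")
)
shadow_volume = compared.groupBy(F.col("shadow_site").alias("site")).agg(
    F.count("*").alias("shadow_orders")
)
volume_shift = prod_volume.join(shadow_volume, "site", "outer")

agreement.show()
volume_shift.show()
```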

How AARP Services, Inc. automated SAS transformation to Databricks using LeapLogic

2022-07-19 · Video

While SAS has been a standard for analytics and data science use cases, it is not cloud-native and does not scale well. Join us to learn how AARP automated the conversion of hundreds of complex data processing, model scoring, and campaign workloads to Databricks using LeapLogic, an intelligent code transformation accelerator that can transform legacy ETL, analytics, data warehouse, and Hadoop workloads to modern data platforms.

In this session, experts from AARP and Impetus will discuss their collaboration with Databricks and how they were able to:
• Automate modernization of SAS marketing analytics based on coding best practices
• Establish a rich library of Spark and Python equivalent functions on Databricks with the same capabilities as SAS procedures, DATA step operations, macros, and functions
• Leverage Databricks-native services like Delta Live Tables to implement waterfall techniques for campaign execution and simplify pipeline monitoring

How socat and UNIX Pipes Can Help Data Integration

2022-07-19 · Video

Nearly every developer is familiar with creating a CLI. Containerized CLIs provide a flexible, cross-language standard with a low barrier to entry for open-source contributors. The ETL process can be reduced to two CLIs: one that reads data and one that writes data. While this interface is simple enough to implement from the contributor's side, Kubernetes' distributed nature means that orchestrating data transfer between the CLIs presents an unsolved problem.

This talk describes a novel approach to reliably orchestrate CLIs on Kubernetes for data integration. Through this lens, we go through the evaluation of strategies and describe the pros and cons of each architecture for horizontally scaling containerised data integration workflows on Kubernetes. We also cover the journey of implementing a TCP-based “process” abstraction over CLIs using socat and UNIX pipes. This same approach powers all of Airbyte’s Kubernetes deployments and helps sync TBs of data daily.
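
As a minimal local sketch of the "ETL as two CLIs" contract, the code below pipes a hypothetical source connector's stdout into a hypothetical destination connector's stdin; on Kubernetes the same contract is carried between pods over TCP (for example via socat) rather than a local pipe.

```python
# Minimal sketch of the "ETL as two CLIs" idea: a source CLI writes records to
# stdout and a destination CLI reads them from stdin, connected by a UNIX pipe.
# The CLI names and arguments are hypothetical.
import subprocess

source = subprocess.Popen(
    ["source-postgres", "read", "--config", "source.json"],
    stdout=subprocess.PIPE,
)
destination = subprocess.Popen(
    ["destination-s3", "write", "--config", "destination.json"],
    stdin=source.stdout,
)

source.stdout.close()   # let the source receive SIGPIPE if the destination exits
destination.wait()
source.wait()
```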

How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse

2022-07-19 · Video

Los Angeles County's Department of Human Resources (DHR) is responsible for attracting a diverse workforce for the 37 departments it supports. Each year, DHR processes upwards of 400,000 applications for job opportunities, making the County one of the largest employers in the nation. Managing a hiring process of this scale is complex, with many factors such as background checks and skills examinations. These processes, if not managed properly, can create bottlenecks and a poor experience for both candidates and hiring managers.

In order to identify areas for improvement, DHR set out to build detailed operational metrics across each stage of the hiring process. DHR used to conduct high-level analysis manually using Excel and other disparate tools. The data itself was limited and difficult to obtain and analyze. In addition, it took analysts weeks to manually pull data from half a dozen siloed systems into Excel for cleansing and analysis. This process was labor-intensive, inefficient, and prone to human error.

To overcome these challenges, DHR, in partnership with the Internal Services Department (ISD), adopted a modern data architecture in the cloud. Powered by the Azure Databricks Lakehouse, DHR was able to bring together its diverse volumes of data into a single platform for data analytics. Manual ETL processes that took weeks can now be automated in 10 minutes or less. With this new architecture, DHR has built Business Intelligence dashboards that unpack the hiring process to get a clear picture of where the bottlenecks are and track the speed with which candidates move through the process. The dashboards allow County departments to innovate and make changes that enhance the experience of potential job seekers and improve the timeliness of securing highly qualified and diverse County personnel at all employment levels.

In this talk, we’ll discuss DHR’s journey towards building a data-driven hiring process, the architecture decisions that enabled this transformation and the types of analytics that we’ve deployed to improve hiring efforts.

Day 1 Afternoon Keynote | Data + AI Summit 2022

2022-07-19 · Video
Eric Sun (Coinbase), Zaheera Valani (Databricks), Arsalan Tavakoli (Databricks), Zhamak Dehghani (Nextdata), Francois Ajenstat, George Fraser (Fivetran)

Day 1 Afternoon Keynote | Data + AI Summit 2022
Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun | Keynote
Partner Connect & Ecosystem Strategy | Zaheera Valani
What are ELT and CDC, and why are all the cool kids doing it? | George Fraser
Analytics without Compromise | Francois Ajenstat
Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli

Partner Connect & Ecosystem Strategy

2022-07-19 · Video
Zaheera Valani (Databricks), Francois Ajenstat, George Fraser (Fivetran)

Data + AI Summit keynotes from:
Partner Connect & Ecosystem Strategy (Zaheera Valani)
What are ELT and CDC, and why are all the cool kids doing it? (George Fraser)
Analytics without Compromise (Francois Ajenstat)

ROAPI: Serve Not So Big Data Pipeline Outputs Online with Modern APIs

2022-07-19 · Video

Scalable XGBoost on GPU Clusters

2022-07-19 · Video

XGBoost is a popular open-source implementation of gradient boosting tree algorithms. In this talk, we walk through some of the new features in XGBoost that help us train better models, and explain how to scale up the pipeline to larger datasets with GPU clusters.

It is challenging to train gradient boosting models with the growing size and complexity of data. The latest XGBoost introduces categorical data support to help data scientists work with non-numerical data without the need for encoding. The new XGBoost can train multi-output models to handle datasets with non-exclusive class labels and multi-target regression. XGBoost has also introduced a new AUC implementation that supports more model types and features a robust approximation in distributed environments.

The latest XGBoost has significantly improved its built-in GPU support for scalability and performance. Data loading and processing have been improved for greater memory efficiency, enabling users to handle larger datasets. GPU-based model training is over 2x faster than in past versions. The performance improvement has also been extended to model explanation: XGBoost added GPU-based SHAP value computation, obtaining more than 10x speedup compared to the traditional CPU-based method. On Spark GPU clusters, end-to-end pipelines can now be accelerated on GPU, from feature engineering in ETL to model training and inference in XGBoost.
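
As a minimal sketch of the features described above, the code below trains on synthetic data with GPU-based histograms, native categorical support, and GPU-computed SHAP values; parameter names follow recent XGBoost releases and may differ by version.

```python
# Minimal sketch: GPU-accelerated training with categorical support and
# GPU-computed SHAP values. Data is synthetic; parameter names may differ
# across XGBoost versions, and a GPU is required for gpu_hist.
import numpy as np
import pandas as pd
import xgboost as xgb

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "amount": rng.normal(size=10_000),
    "region": pd.Categorical(rng.choice(["NA", "EU", "APAC"], size=10_000)),
})
y = rng.integers(0, 2, size=10_000)

model = xgb.XGBClassifier(
    tree_method="gpu_hist",        # GPU training (newer releases: tree_method="hist", device="cuda")
    enable_categorical=True,       # native categorical support, no manual encoding
    n_estimators=200,
)
model.fit(X, y)

# SHAP values via the booster's prediction API (GPU-accelerated when the
# GPU predictor is in use).
booster = model.get_booster()
shap_values = booster.predict(xgb.DMatrix(X, enable_categorical=True), pred_contribs=True)
print(shap_values.shape)   # (n_samples, n_features + 1): last column is the bias term
```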

We will walk through these XGBoost improvements with the newly released XGBoost packages from DMLC. Benchmark results will be shared. Example applications and notebooks will be provided for audiences to learn these new features on the cloud.

Simplify Global DataOps and MLOps Using Okta’s FIG Automation Library

2022-07-19 · Video

Think for a moment about an ML pipeline that you have created. Was it tedious to write? Did you have to familiarize yourself with technology outside your normal domain? Did you find many bugs? Did you give up with a "good enough" solution? Even simple ML pipelines are tedious, and even teams that include data engineers and ML engineers still end up with delays and bugs on complex ones.

Okta's FIG (Feature Infrastructure Generator) simplifies this with a configuration language for data scientists that produces scalable and correct ML pipelines, even highly complex ones. FIG is "just a library" in the sense that you can pip install it. Once installed, FIG will configure your AWS account, creating ETL jobs, workflows, and ML training and scoring jobs. Data scientists then use FIG's configuration language to specify features and model integrations. With a single function call, FIG will run an ML pipeline to generate feature data, train models, and create scoring data. Feature generation is performed in a scalable, efficient, and temporally correct manner. Model training artifacts and scoring are automatically labeled and traced. This greatly simplifies the ML prototyping experience. Once it is time to productionize a model, FIG uses the same configuration to coordinate with Okta's deployment infrastructure to configure production AWS accounts, register build and model artifacts, and set up monitoring.

This talk will show a demo of using FIG in the development of Okta's next-generation security infrastructure. The demo includes a walkthrough of the configuration language and how it is translated into AWS resources during a prototyping session. The demo will also briefly cover how FIG interacts with Okta's deployment system to make productionization seamless.

Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun

2022-07-19 · Video
Eric Sun (Coinbase)

Coinbase is neither simply a finance company nor a tech company — it’s a crypto company. This distinction has big implications for how we work with the Blockchain, Product and Financial data that we need to drive our hypergrowth. We’ve recently enabled a Lakehouse architecture based upon Databricks to unify these complex and varied data sets, to deliver a high performance, continuous ingestion framework at an unprecedented scale. We can now support both ETL and ML workloads on one platform to deliver innovative batch and streaming use cases, and democratize data much faster by enabling teams to use the tools of their choice, while greatly reducing end-to-end latency and simplifying maintenance and operations. In this keynote, we will share our journey to the Lakehouse, and some of the lessons learned as we built an open data architecture at scale.
