talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked

582

Sessions & talks

Supercharge your SaaS applications with a modern, cloud-native database

2022-07-19

Today’s world demands modern applications that process data at faster speeds and deliver real-time insights. Yet the challenge for most businesses is that their data infrastructure isn't designed for data intensity: the idea that high volumes of data should be ingested and processed quickly, no matter how complex or diverse the data sets. How do you meet the demands of a data-intensive application? It starts with the right database. This session gives you a roadmap with key criteria for powering modern, data-intensive applications with a cloud-native database, and shows how three customers achieved up to 100x better performance for their applications.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Survey of Production ML Tech Stacks

2022-07-19

Production machine learning demands stitching together many tools, ranging from open source standards to cloud-specific and third-party solutions. This session surveys the current ML deployment technology landscape to contextualize which tools address which features of production ML systems, such as CI/CD, REST endpoints, and monitoring. It will help answer the questions: What tools are out there? Where do I start with the MLOps tech stack for my application? What are the pros and cons of open source versus managed solutions? This talk takes a feature-driven approach to tool selection for MLOps stacks to provide best practices in the most rapidly evolving field of data science.

Take Databricks Lakehouse to the Max with Informatica

2022-07-19

The hard part of ML and analytics is not building data models. It’s getting the data right and into production. Join us to learn how Informatica’s Intelligent Data Management Cloud (IDMC) helps you maximize the benefits of the Databricks Unified Analytics Platform. Learn how our cloud-native capabilities can shorten your time to results. See how to enable more data users to easily load data and develop data engineering workflows on Databricks in ELT mode at scale. Find out how Informatica delivers all the governance and compliance guardrails you need to operate analytics, AI, and ML. Accelerate adoption and maximize agility while maintaining control of your data and lowering risk.

The Future is Open - a Look at Google Cloud’s Open Data Ecosystem

2022-07-19
Anagha Khanolkar (Databricks), Mansi Maharana (Databricks)

Join Anagha Khanolkar and Mansi Maharana, both Cloud Customer Engineers specializing in Advanced Analytics, to learn about Open Data Analytics on Google Cloud. This session will cover Google Data Cloud's Open Data Analytics portfolio, value proposition, customer stories, trends, and more, including Databricks on GCP.

The Future of Data - What’s Next with Google Cloud

2022-07-19
Bruno Aziza (Google Cloud)

Join Bruno Aziza, Head of Data and Analytics, Google Cloud, for an in-depth look at the future of data and the emerging trends he is seeing. He will also cover Google Cloud’s data analytics practice, including insights into the Data Cloud Alliance, BigLake, and our strategic partnership with Databricks.

The Modern Metadata Platform: What, Why, and How?

2022-07-19

Recently there has been a lot of buzz in the data community on the topic of metadata management. It’s often discussed in the context of data discovery, data provenance, data governance, and data privacy. Even Gartner and Forrester have created the new Active Metadata Management and Enterprise Data Fabric categories to highlight the development in this area.

However, metadata management isn’t actually a new problem. It has just taken on a whole new dimension with the widespread adoption of the Modern Data Stack. What used to be a small, esoteric issue that only concerned the core data team has exploded into complex organizational challenges that plague companies large and small.

In this talk, we’ll explain how a Modern Metadata Platform (MMP) can help solve these new challenges and the key ingredients to building a scalable and extensible MMP.

The Semantics of Biology—Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing

2022-07-19

From the organization of the tree of life, to the tissues and structures of living organisms: trees and graphs are a recurring data structure in biology. Given the tree-like relationships between biological entities, Knowledge Graphs are emerging as the ideal way to store and retrieve biological data.

In our first Data + AI talk (https://www.youtube.com/watch?v=Kj5bZ2afWSU), we presented the Bellman open source library (https://github.com/gsk-aiops/bellman). Bellman was developed to translate SPARQL queries into Apache Spark Dataset operations so that scientists can submit graph queries in familiar environments like Jupyter and Databricks notebooks.

In this talk, we present the new logical inferencing capabilities we've built into the Bellman OSS library. We will demonstrate how connections between biological entities that are not explicitly connected in the data are deduced from ontologies. These inferred connections are returned to the scientist to aid in the discovery of new connections, with the intent of accelerating gene-to-disease research. To demonstrate these capabilities, we will take a deep dive into the "subclassOf" logical entailment to retrieve all subclasses of a biological entity. We will also compare the performance characteristics of inference algorithms such as forward and backward chaining.
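
The "subclassOf" entailment and the forward/backward chaining contrast can be illustrated with a small pure-Python sketch. It assumes the ontology is given as plain (subclass, superclass) pairs with hypothetical class names; it is not Bellman's API, which works by translating SPARQL into Spark Dataset operations. Forward chaining materializes every inferred pair up front; the query-time walk stands in for backward chaining by answering one goal on demand.

```python
from collections import defaultdict

def forward_chain_subclasses(pairs):
    """Materialize the transitive closure of subClassOf (forward chaining).

    Repeatedly applies (A sub B) and (B sub C) => (A sub C) until no new
    pairs are inferred (a fixpoint).
    """
    closure = set(pairs)
    while True:
        supers = defaultdict(set)  # index current facts by subclass for the join
        for sub, sup in closure:
            supers[sub].add(sup)
        inferred = {(a, c) for a, b in closure for c in supers.get(b, ())}
        new = inferred - closure
        if not new:
            return closure
        closure |= new

def all_subclasses(closure, cls):
    """Answer "all subclasses of cls" from the materialized closure."""
    return {sub for sub, sup in closure if sup == cls}

def backward_chain_subclasses(pairs, cls):
    """Answer the same query at query time by walking edges from the goal."""
    children = defaultdict(set)
    for sub, sup in pairs:
        children[sup].add(sub)
    found, stack = set(), [cls]
    while stack:
        for sub in children.get(stack.pop(), ()):
            if sub not in found:  # the visited set also guards against cycles
                found.add(sub)
                stack.append(sub)
    return found

# Hypothetical ontology fragment (illustrative names only).
ontology = [
    ("Hepatocyte", "EpithelialCell"),
    ("EpithelialCell", "Cell"),
    ("Neuron", "Cell"),
    ("MotorNeuron", "Neuron"),
]
closure = forward_chain_subclasses(ontology)
print(sorted(all_subclasses(closure, "Cell")))
# ['EpithelialCell', 'Hepatocyte', 'MotorNeuron', 'Neuron']
```

The trade-off mirrors the one the abstract mentions: forward chaining pays once to materialize inferences and then answers queries by lookup, while backward chaining stores nothing extra but repeats the traversal for every query.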

Tools for Assisted Apache Spark Version Migrations, From 2.1 to 3.2+

2022-07-19

This talk will look at the current state of tools that automate library and language upgrades in Python and Scala, and apply them to upgrading to new versions of Apache Spark. An admittedly informal survey suggests that many users are stuck on Spark versions that are no longer supported, so this talk will extend the first attempt at automating upgrades (2.4 to 3.0) and explore the problem all the way back to 2.1.
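
One small piece of such an upgrade tool can be sketched as a table-driven rewrite of deprecated configuration keys. The two Arrow-related renames below are ones PySpark 3.0 deprecated in favor of `pyspark`-scoped keys; a real tool would need the full rename list for every version hop back to 2.1, and the regex approach is a simplification of a proper syntax-aware rewrite. The function and table names are hypothetical.

```python
import re

# Deprecated -> replacement config keys (extend per version hop).
CONFIG_RENAMES = {
    "spark.sql.execution.arrow.enabled":
        "spark.sql.execution.arrow.pyspark.enabled",
    "spark.sql.execution.arrow.fallback.enabled":
        "spark.sql.execution.arrow.pyspark.fallback.enabled",
}

# Match a key only as a complete quoted string literal, with the same
# quote character on both sides, so longer keys are never partially hit.
_KEY_PATTERN = re.compile(
    r"""(["'])(%s)\1""" % "|".join(re.escape(k) for k in CONFIG_RENAMES)
)

def rewrite_config_keys(source: str) -> tuple[str, int]:
    """Return (rewritten_source, number_of_replacements)."""
    def _swap(match: re.Match) -> str:
        quote, key = match.group(1), match.group(2)
        return f"{quote}{CONFIG_RENAMES[key]}{quote}"
    return _KEY_PATTERN.subn(_swap, source)

old_code = 'spark.conf.set("spark.sql.execution.arrow.enabled", "true")'
new_code, count = rewrite_config_keys(old_code)
print(new_code)
# spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
```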

Towards Dynamic Microstructure: The Role of Machine Learning in the Next Generation of Exchanges

2022-07-19
Michael O’Rourke (Nasdaq), Douglas Hamilton (Nasdaq)

What role will AI and machine learning play in ensuring the efficiency and transparency of the next generation of markets?

In this session, Douglas Hamilton (AVP, Machine Intelligence Lab) and Michael O’Rourke (SVP, Engineering & AI/ML) will show attendees how Nasdaq is building dynamic microstructures that reduce the inherent frictions associated with trading, and give insights into their application across industries.

Turbocharge your AI/ML Databricks workflows with Precisely

2022-07-19

Trusted analytics and predictive data models require accurate, consistent, and contextual data. The more attributes used to fuel models, the more accurate their results. However, building comprehensive models with trusted data is not easy. Accessing data from multiple disparate sources, making spatial data consumable, and enriching models with reliable third-party data is challenging.

In response to these challenges, Precisely has developed tools to facilitate a location-enabled lakehouse on the Databricks platform, helping users get more out of their data. Come see live demos and learn how to build your own location-enabled lakehouse by:

• Organizing and managing address data and assigning a unique and persistent identifier
• Enriching addresses with standard and dynamic attributes from our curated data portfolio
• Analyzing enriched data to uncover relationships and create dashboard visualizations
• Understanding high-level solution architecture
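
The first step, assigning a unique and persistent identifier to each address, can be illustrated with a simple deterministic scheme: normalize the address text, then hash it so that the same address always maps to the same ID. This is a hypothetical sketch, not Precisely's own identifier scheme; the abbreviation table and ID prefix are assumptions, and production address normalization (unit handling, reference data, geocoding) is far more involved.

```python
import hashlib
import re

# Hypothetical abbreviation table; real normalizers use large reference datasets.
_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "suite": "ste"}

def normalize_address(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, abbreviate."""
    text = re.sub(r"[^\w\s]", " ", raw.lower())
    tokens = [_ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)

def address_id(raw: str) -> str:
    """Deterministic ID: equivalent spellings of one address share a key."""
    digest = hashlib.sha256(normalize_address(raw).encode("utf-8")).hexdigest()
    return "addr_" + digest[:16]

a = address_id("123 Main Street, Suite 4")
b = address_id("123  main st. suite 4")
print(a == b)  # True
```

Because the ID is derived from content rather than from insertion order, re-running the pipeline yields the same keys, which is what makes later enrichment joins stable.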

Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers

2022-07-19

Profiling small molecules in human blood across global populations gives rise to a greater understanding of the varied biological pathways and processes that contribute to human health and diseases. Herein, we describe the development of a comprehensive Human Biology Database, derived from nontargeted molecular profiling of over 300,000 human blood samples from individuals across diverse backgrounds, demographics, geographical locations, lifestyles, diseases, and medication regimens, and its applications to inform drug development.

Approximately 11,000 circulating molecules have been captured and measured per sample using Sapient’s high-throughput, high-specificity rapid liquid chromatography-mass spectrometry (rLC-MS) platform. The samples come from cohorts with adjudicated clinical outcomes from prospective studies lasting 10-25 years, as well as data on individuals’ diet, nutrition, physical exercise, and mental health. Genetic information for a subset of subjects is also included, and we have added microbiome sequencing data from over 150,000 human samples in diverse diseases.

An efficient data science environment is established to enable effective health insight mining across this vast database. Built on a customized AWS and Databricks “infrastructure-as-code” Terraform configuration, we employ streamlined data ETL and machine learning-based approaches for rapid rLC-MS data extraction. In mining the database, we have been able to identify circulating molecules potentially causal to disease; illuminate the impact of human exposures like diet and environment on disease development, aging, and mortality over decades of time; and support drug development efforts through identification of biomarkers of target engagement, pharmacodynamics, safety, efficacy, and more.

Unifying Data Science and Business: AI Augmentation/Integration in Production Business Applications

2022-07-19

Why is it so hard to integrate machine learning into real business applications? In 2019, Gartner predicted that AI augmentation would solve this problem and would create $2.9 trillion of business value and 6.2 billion hours of worker productivity in 2021. A new realm of business science methods, encompassing AI-powered analytics that allow people with domain expertise to make smarter decisions faster and with more confidence, has also emerged as a solution to this problem. Dr. Harvey will demystify why integration challenges still account for $30.2 billion in annual global losses, and discuss what it takes to integrate AI/ML code or algorithms into real business applications and the effort that goes into making each component (data collection, preparation, training, and serving) production-ready, so that organizations can use the results of integrated models repeatedly with minimal user intervention. Finally, Dr. Harvey will discuss AISquared’s integration with Databricks and MLflow, which accelerates the integration of AI by unifying data science with business. By adding five lines of code to their model, users can leverage AISquared’s model integration API framework, which provides a quick and easy way to integrate models directly into live business applications.

Unity Catalog: Journey to Unified Governance for Your Data and AI Assets on Lakehouse

2022-07-19

Modern data assets take many forms: not just files or tables, but dashboards, ML models, and unstructured data like video and images, none of which legacy data governance solutions can govern and manage. Join this session to learn how data teams can use Unity Catalog to centrally manage all data and AI assets with a common governance model based on familiar ANSI SQL, ensuring much better native performance and security. Built-in automated data lineage provides end-to-end visibility into how data flows from source to consumption, so that organizations can identify and diagnose the impact of data changes. Unity Catalog delivers the flexibility to leverage existing data catalogs and solutions and to establish future-proof, centralized governance without expensive migration costs. It also creates detailed audit reports for data compliance and security, while ensuring that data teams can quickly discover and reference data for BI, analytics, and ML workloads, accelerating time to value.
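
Identifying the impact of a data change from lineage comes down to a graph traversal: given edges from upstream to downstream assets, find everything reachable from the changed asset. The sketch assumes lineage has already been exported as (upstream, downstream) pairs; the asset names, written in a catalog.schema.table style, are hypothetical, and this does not use Unity Catalog's own lineage API.

```python
from collections import defaultdict, deque

def downstream_impact(edges, changed):
    """Return every asset reachable downstream from `changed` (breadth-first)."""
    consumers = defaultdict(list)
    for upstream, downstream in edges:
        consumers[upstream].append(downstream)
    impacted, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in consumers.get(node, []):
            if child not in impacted:  # skip assets already reached by another path
                impacted.add(child)
                queue.append(child)
    return impacted

# Hypothetical lineage: raw table -> cleaned table -> aggregate -> dashboard / model.
lineage = [
    ("main.raw.orders", "main.sales.orders_clean"),
    ("main.sales.orders_clean", "main.sales.daily_revenue"),
    ("main.sales.daily_revenue", "dashboard:revenue_overview"),
    ("main.sales.orders_clean", "model:churn_predictor"),
    ("main.raw.customers", "main.sales.customers_clean"),
]
print(sorted(downstream_impact(lineage, "main.sales.orders_clean")))
```

Walking the same edges in reverse gives the complementary question (where did this data come from?), which is the provenance half of lineage.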

What to Do When Your Job Goes OOM in the Night (Flowcharts!)

2022-07-19

Have you ever had a Spark job just stop working, with no idea where to start debugging? Or maybe a job that used to complete in minutes is now taking hours? Or are you just tired of answering user questions? Come join us for a fun detour into the world of out-of-memory exceptions, slow jobs, and other things that make our lives sad, and leave with techniques to make our lives happy again. This flowchart is based on Anya's initial work on a Spark tuning flowchart, updated with our collective experience fixing broken Spark jobs. The talk will wrap up with the methodology we used and how you can contribute to the flowchart (aka guilt you into writing pull requests).
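
A debugging flowchart of this kind can be encoded as plain branching logic. The sketch is hypothetical: its symptom fields, the skew threshold, and its remedies are generic Spark tuning advice, not the branches of the flowchart presented in the talk. The configuration keys it names (`spark.driver.memory`, `spark.executor.memory`, `spark.sql.shuffle.partitions`, `spark.sql.adaptive.enabled`, `spark.sql.adaptive.skewJoin.enabled`) are standard Spark settings.

```python
from dataclasses import dataclass

@dataclass
class JobSymptoms:
    """Hypothetical observations gathered from the Spark UI and driver/executor logs."""
    oom_on_driver: bool
    oom_on_executor: bool
    collects_to_driver: bool  # e.g. collect()/toPandas() on a large DataFrame
    max_task_ratio: float     # slowest task time / median task time in the worst stage
    slower_than_before: bool

SKEW_RATIO = 5.0  # arbitrary illustrative threshold, not a Spark default

def triage(s: JobSymptoms) -> list[str]:
    """Walk a tiny decision tree and return suggested next steps in order."""
    steps = []
    if s.oom_on_driver:
        if s.collects_to_driver:
            steps.append("Stop pulling large results to the driver; write them to storage instead.")
        else:
            steps.append("Increase spark.driver.memory.")
    if s.oom_on_executor or s.slower_than_before:
        if s.max_task_ratio > SKEW_RATIO:
            steps.append("Likely skew: enable spark.sql.adaptive.enabled and "
                         "spark.sql.adaptive.skewJoin.enabled, or salt hot keys.")
        elif s.oom_on_executor:
            steps.append("Raise spark.sql.shuffle.partitions so each task holds less data.")
            steps.append("If that is not enough, increase spark.executor.memory.")
        else:
            steps.append("Compare against a past good run in the Spark UI; check input growth.")
    return steps or ["No memory symptoms: start from the longest-running stage."]

print(triage(JobSymptoms(False, True, False, 12.0, False)))
```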

Why a Data Lakehouse is Critical During the Manufacturing Apocalypse

2022-07-19

COVID has changed the way we work and the way we must do business. Supply chain disruptions have impacted manufacturers’ ability to manufacture and distribute products. Logistics problems and the lack of labor have forced us to staff differently. The existential threat is real, and we must change the way we analyze data and solve problems in real time in order to stay relevant.

In this session, you’ll learn about our journey, why the Data Lake and digital tech are essential to survival in this new world, some practical examples of how machine learning and data pipelines enable faster decision making, and why businesses cannot survive without these capabilities.

Databricks SQL Under the Hood: What's New with Live Demos

2022-07-19

With serverless SQL compute and built-in governance, Databricks SQL lets every analyst and analytics engineer easily ingest, transform, and query the freshest data directly on your data lake, using their tools of choice like Fivetran, dbt, Power BI, or Tableau, and standard SQL. There is no need to move data to another system. All this takes place at virtually any scale, at a fraction of the cost of traditional cloud data warehouses. Join this session for a deep dive into how Databricks SQL works under the hood, and see a live end-to-end demo of data and analytics on Databricks, from data ingestion through transformation to consumption, using the modern data stack along with Databricks SQL.

Data Mesh in Action – Building Data Mesh Architecture Pattern with LTI Canvas Alcazar

2022-07-19

Data is no longer considered an asset to be protected within teams, but one to be democratized and made available to everyone in the organization in a secure and governed manner. The Data Mesh is an evolving data architecture pattern that helps organizations break down data silos and respond quickly to market changes through decentralized data ownership with centralized governance and security.

This talk will provide details and demonstrate how to use Databricks Delta Lake with Unity Catalog to implement and operationalize the Data Mesh architecture pattern. The demo includes the LTI Canvas Alcazar solution, which helps accelerate data mesh implementation with Databricks.

Deep Dive into the New Features of Apache Spark 3.2 and 3.3

2022-07-19

Apache Spark has become the most widely used engine for executing data engineering, data science, and machine learning on single-node machines or clusters. The number of monthly Maven downloads of Spark has rapidly increased to 20 million.

We will talk about the higher-level features and improvements in Spark 3.2 and 3.3. The talk also dives deeper into the following features:

• Introducing the pandas API on Apache Spark to unify the small data and big data APIs
• Completing the ANSI SQL compatibility mode to simplify migration of SQL workloads
• Productionizing adaptive query execution to speed up Spark SQL at runtime
• Introducing the RocksDB state store to make state processing more scalable
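
Most of these features are switched on through standard Spark SQL settings, collected here in one hypothetical helper (`spark_32_33_conf` is an illustrative name). The keys are real Spark configuration names, but whether each should be enabled, and whether it is already on by default in your Spark or Databricks Runtime version (adaptive query execution is on by default from Spark 3.2), is an assumption to check against the documentation. The pandas API on Spark needs no setting; it is imported as `pyspark.pandas`. The PySpark usage is left in comments so the sketch runs without a Spark installation.

```python
def spark_32_33_conf(ansi: bool = True, rocksdb_state: bool = True) -> dict[str, str]:
    """Build Spark SQL settings for the features listed above."""
    conf = {
        # Adaptive query execution: re-optimizes plans using runtime statistics.
        "spark.sql.adaptive.enabled": "true",
    }
    if ansi:
        # ANSI mode: e.g. invalid casts and arithmetic overflow raise errors.
        conf["spark.sql.ansi.enabled"] = "true"
    if rocksdb_state:
        # RocksDB-backed state store for Structured Streaming stateful operators.
        conf["spark.sql.streaming.stateStore.providerClass"] = (
            "org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreProvider"
        )
    return conf

# Usage (requires pyspark):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("spark-33-demo")
# for key, value in spark_32_33_conf().items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# import pyspark.pandas as ps   # pandas API on Spark (3.2+)
# psdf = ps.range(10)
print(spark_32_33_conf())
```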

Defending Against Adversarial Model Attacks

2022-07-19

The application of AI algorithms in domains such as self-driving cars, facial recognition, and hiring holds great promise. At the same time, it raises legitimate concerns about the robustness of AI algorithms against adversarial attacks. As AI algorithms are widely adopted in settings where their predictions are hidden or obscured from the trained eye of a subject-matter expert, opportunities for malicious actors to take advantage of them grow considerably, necessitating adversarial robustness training and checking. To protect against and mitigate the damage caused by these malicious actors, this talk will examine how to build a pipeline that is robust against adversarial attacks by leveraging Kubeflow Pipelines integrated with the LFAI Adversarial Robustness Toolbox (ART). Additionally, we will show how to test a machine learning model's adversarial robustness in production on Kubeflow Serving, by way of payload logging (Knative Eventing) and ART. This presentation focuses on adversarial robustness rather than fairness and bias.
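
To make "adversarial attack" concrete, the fast gradient sign method (FGSM) perturbs an input in the direction that most increases the model's loss. The sketch applies it to a hypothetical logistic-regression model in plain Python; it is not the ART API used in the talk (ART ships its own FGSM implementation among its evasion attacks), and the weights and input are made up for illustration.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """P(y = 1 | x) for a logistic-regression model with weights w and bias b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def _sign(g: float) -> int:
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method against binary cross-entropy loss.

    For logistic regression, d(loss)/dx = (p - y) * w, so each feature
    moves by eps in the sign of that gradient (an L-infinity step).
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]

# Hypothetical "trained" weights and a correctly classified positive example.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.25)
print(predict_proba(w, b, x) > 0.5)      # True: original input classified as 1
print(predict_proba(w, b, x_adv) > 0.5)  # False: a small perturbation flips it
```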

Deliver Faster Decision Intelligence From Your Lakehouse

2022-07-19

Accelerate the path from data to decisions with the Tellius AI-driven Decision Intelligence platform powered by Databricks Delta Lake. Empower business users and data teams to analyze data residing in the Delta Lake to understand what is happening in their business, uncover the reasons why metrics change, and get recommendations on how to impact outcomes. Learn how organizations derive value from Delta Lakehouse with a modern analytics experience that unifies guided insights, natural language search, and automated machine learning to speed up data-driven decision making at cloud scale.

In this session, we will showcase how customers:
- Discover changes in KPIs and investigate why metrics change with AI-powered automated analysis
- Empower business users and data analysts to iteratively explore data to identify trend drivers, uncover new customer segments, and surface hidden patterns in data
- Simplify and speed up analysis of massive datasets on Databricks Delta Lake

Delta Lake, the Foundation of Your Lakehouse

2022-07-19

Delta Lake is the open source storage layer that makes the Databricks Lakehouse Platform possible by adding reliability, performance, and scalability to your data, wherever it is located. Join this session for an inside look at what is under the hood of Databricks: see how Delta Lake, by adding ACID transactions and versioning to Parquet files, together with the Photon engine, provides customers with huge performance gains and the ability to address new challenges. This session will include a demo and overview of customer use cases unlocked by Delta Lake, and the benefits of running Delta Lake workloads on Databricks.
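
The idea behind adding ACID transactions and versioning to Parquet files is an ordered transaction log: each commit is a numbered entry listing data files added or removed, and a table version is whatever you get by replaying commits up to that number. In Delta Lake this log lives as JSON commit files in a `_delta_log` directory; the toy model below keeps it in memory, uses simplified actions and hypothetical file names, and mirrors only the concept (including optimistic concurrency via "write version N only if N does not exist yet"), not Delta Lake's actual format, conflict resolution, or API.

```python
import json

class ToyTransactionLog:
    """In-memory stand-in for a table's transaction log (not Delta's real format)."""

    def __init__(self):
        self._commits: dict[int, str] = {}  # version -> JSON-encoded actions

    def latest_version(self) -> int:
        return max(self._commits, default=-1)

    def commit(self, read_version: int, actions: list[dict]) -> int:
        """Write version read_version + 1, or fail if another writer got there first.

        Optimistic concurrency: writers do not lock; the put-if-absent
        check on the version number detects a conflicting commit.
        """
        new_version = read_version + 1
        if new_version in self._commits:
            raise RuntimeError(f"conflict: version {new_version} already committed")
        self._commits[new_version] = json.dumps(actions)
        return new_version

    def files_at(self, version: int) -> set[str]:
        """Replay commits 0..version to get the live data files ("time travel")."""
        live: set[str] = set()
        for v in range(version + 1):
            for action in json.loads(self._commits[v]):
                if "add" in action:
                    live.add(action["add"])
                elif "remove" in action:
                    live.discard(action["remove"])
        return live

log = ToyTransactionLog()
v0 = log.commit(-1, [{"add": "part-000.parquet"}, {"add": "part-001.parquet"}])
v1 = log.commit(v0, [{"remove": "part-001.parquet"}, {"add": "part-002.parquet"}])
print(sorted(log.files_at(v0)))  # ['part-000.parquet', 'part-001.parquet']
print(sorted(log.files_at(v1)))  # ['part-000.parquet', 'part-002.parquet']
```

Readers that replay the log up to a fixed version never see a half-written commit, which is the atomicity and snapshot-isolation half of the ACID story.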

Delta Sharing - A New Paradigm for Secure Data Sharing and Data Collaboration on Lakehouse

2022-07-19

Data sharing and data collaboration have become important in today's hyperconnected digital economy. But to date, the lack of a standards-based data sharing protocol has resulted in data sharing solutions tied to a single vendor or commercial product, introducing vendor lock-in risks. What the industry deserves is an open approach to data sharing. Additionally, with stringent privacy regulations, data collaboration on sensitive data has become a challenge for organizations, resulting in fragmented, siloed, and incomplete insights. Join this session to learn how the Databricks Lakehouse Platform simplifies secure data sharing and enables data collaboration across organizations in a privacy-centric way.
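
On the recipient side of the open Delta Sharing protocol, a provider hands out a small JSON profile file (for bearer-token sharing: `shareCredentialsVersion`, `endpoint`, `bearerToken`, and optionally `expirationTime`), and the open source Python connector addresses a table as `<profile-path>#<share>.<schema>.<table>`. The sketch below only validates such a profile and builds that URL with the standard library; the endpoint, share, schema, and table names are hypothetical, and the actual connector call is left as a comment.

```python
import json

REQUIRED_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}

def load_profile(text: str) -> dict:
    """Parse a bearer-token Delta Sharing profile and check its required fields."""
    profile = json.loads(text)
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"profile missing keys: {sorted(missing)}")
    return profile

def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' form the Python connector accepts."""
    return f"{profile_path}#{share}.{schema}.{table}"

# Hypothetical profile issued by a data provider (token redacted).
profile_text = json.dumps({
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",
    "bearerToken": "<redacted>",
})
profile = load_profile(profile_text)
url = table_url("config.share", "sales_share", "default", "orders")
print(url)  # config.share#sales_share.default.orders

# With the open source connector installed (pip install delta-sharing):
# import delta_sharing
# df = delta_sharing.load_as_pandas(url)
```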

Designing Better MLOps Systems

2022-07-19

Real-world data problems are becoming increasingly daunting to solve as data volumes grow and computing tools proliferate. Since 2018, Gartner has predicted that 85% of ML projects will fail, and this trend will likely continue through 2022. Nevertheless, in most cases, ML practitioners have the opportunity to keep their projects from failing in the early phases.

In this talk, the speaker will borrow from her consultancy and hands-on implementation experience with cross-functional clients to share her takeaways in designing better ML systems. The talk will walk through common pitfalls to watch out for, relevant best practices in software engineering for ML, and technical anchors that make a robust system. This talk aims to empower the audience – beginner and experienced practitioners alike – with confidence in their ML project designs and help provide the big-picture design thinking framework for successful projects.

Destination Lakehouse: All Your Data, Analytics and AI on One Platform

2022-07-19

The data lakehouse is the future for modern data teams seeking to innovate with a data architecture that simplifies data workloads, eases collaboration, and maintains the flexibility and openness to stay agile as a company scales. The Databricks Lakehouse Platform realizes this idea by unifying analytics, data engineering, machine learning, and streaming workloads across clouds on one simple, open data platform. In this session, learn how the Databricks Lakehouse Platform can meet your needs for every data and analytics workload, with examples of real customer applications, reference architectures, and demos to showcase how you can create modern data solutions of your own.

Distributed Machine Learning at Lyft

2022-07-19

Data collection, preprocessing, and feature engineering are the fundamental steps in any machine learning pipeline. After feature engineering, parallelizing training across multiple low-cost machines reduces both cost and time. Training models in a distributed manner also speeds up hyperparameter tuning. How can we unify these stages of the ML pipeline in one distributed training platform? And can we do it on Kubernetes?

Our ML platform is based entirely on Kubernetes because of its scalability and the rapid bootstrapping time of resources. In this talk, we will demonstrate how Lyft uses Spark on Kubernetes and Fugue (our homegrown unifying compute abstraction layer) to design a holistic, end-to-end ML pipeline system for distributed feature engineering, training, and prediction for our customers on our ML Platform on top of Spark on K8s. We will also do a deep dive to show how we abstract away infrastructure complexities so that our data scientists and research scientists can focus only on the business logic of their models through simple Pythonic APIs and SQL. We let users focus on “what to do” while the platform takes care of “how to do it.” We will share our challenges, learnings, and the fun we had while implementing it. Using Spark on K8s has helped us achieve large-scale data processing at 90% lower cost, at times bringing processing time down from 2 hours to less than 20 minutes.
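
The "what to do" versus "how to do it" split can be illustrated with a hyperparameter search whose training logic is plain Python and whose execution engine is any swappable `map`-like callable. This is a hypothetical sketch of the idea only: it does not use Fugue's API (Fugue provides its own entry points, such as `transform()`, for running plain functions on pandas, Spark, or Dask), the toy data and 1-D ridge model are made up, and a thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: y is roughly 3 * x.
TRAIN = [(1.0, 3.1), (2.0, 5.9), (3.0, 9.2), (4.0, 11.8)]
VALID = [(5.0, 15.1), (6.0, 17.8)]

def train_and_score(alpha: float) -> tuple[float, float]:
    """Business logic only: fit 1-D ridge regression, return (alpha, validation MSE)."""
    w = sum(x * y for x, y in TRAIN) / (sum(x * x for x, _ in TRAIN) + alpha)
    mse = sum((y - w * x) ** 2 for x, y in VALID) / len(VALID)
    return alpha, mse

def grid_search(alphas, engine_map=map):
    """Infrastructure is injected: any map-like callable can run the trials."""
    results = list(engine_map(train_and_score, alphas))
    return min(results, key=lambda r: r[1])

alphas = [0.0, 0.1, 1.0, 10.0]
local_best = grid_search(alphas)  # sequential, built-in map
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_best = grid_search(alphas, engine_map=pool.map)
print(local_best == parallel_best)  # True
```

Because `train_and_score` never touches the engine, moving from a laptop to a cluster means changing only the `engine_map` argument, which is the separation the abstract describes.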