talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube Visit website ↗

Activities tracked

287

Filtering by: AI/ML

Sessions & talks

Showing 201–225 of 287 · Newest first

Competitive advantage hinges on predictive insights generated from AI! Build powerful data-driven

2022-07-19

AI is central to unlocking competitive advantage. However, data science teams don’t have access to the consistent, high-quality data required to build AI & ML data applications.

Instead, data scientists spend 80% of their time collecting, cleaning & preparing data for analysis rather than building AI data applications.

During this talk Snowplow introduces the concept of data creation. Create & deploy high-quality & predictive behavioral data in real-time to Databricks.

Learn how being equipped with AI-ready data in Databricks allows data science teams to focus on building AI data applications rather than data wrangling, which accelerates data projects, improves model performance, and helps manage data governance.
- How to execute more AI & data intensive applications in production using Databricks & Snowplow
- How to execute on each AI & data intensive application faster thanks to pre-validated & predictive data
- How data creation can solve for data governance

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Data Boards: A Collaborative and Interactive Space for Data Science

2022-07-19

Databricks enables many organizations to harness the power of data, but while it enables collaboration across data scientists and data engineers, there is still an opportunity to democratize access for domain experts. Achieving this requires rethinking the classic analytics user interfaces in favor of interactive systems with highly collaborative visual interfaces. Current visualization and workflow tools are ill-suited to bringing the full team together. I will present Northstar, a novel system we developed for Interactive Data Exploration at MIT / Brown University, now commercialized by Einblick. I will explain why Northstar required us to completely rethink the analytics stack, from the interface to the “guts,” and highlight the techniques we developed to provide a truly novel user interface that enables code-optional analysis over Databricks, where all user personas can collaborate on very large datasets and use complex ML operations.

Data Lake for State Health Exchange Analytics using Databricks

2022-07-19

One of the largest state-based health exchanges in the country was looking to modernize its data warehouse (DWH) environment to support the vision that every decision to design, implement and evaluate its state-based health exchange portal is informed by timely and rigorous evidence about its consumers’ experiences. The scope of the project was to replace the existing Oracle-based DWH with an analytics platform that could support a much broader range of requirements and provide unified analytics capabilities, including machine learning. The modernized analytics platform comprises a cloud-native data lake and DWH solution using Databricks. The solution provides significantly higher performance and elastic scalability to better handle larger and varying data volumes, with a much lower cost of ownership than the existing solution. In this session, we will walk through the rationale behind tool selection, solution architecture, project timeline and benefits expected.

Road to a Robust Data Lake: Utilizing Delta Lake & Databricks to Map 150 Million Miles of Roads

2022-07-19

In the past, stream processing over data lakes required a lot of development effort from data engineering teams, as Itai showed in his talk at Spark+AI Summit 2019 (https://tinyurl.com/2s3az5td). Today, with Delta Lake and Databricks Auto Loader, this becomes a few minutes' work! Not only that, it unlocks a new set of ways to efficiently leverage your data.

Nexar, a leading provider of dynamic mapping solutions, utilizes Delta Lake and advanced features such as Auto Loader to map 150 million miles of roads a month and provide meaningful insights to cities, mobility companies, driving apps, and insurers. Nexar’s growing dataset contains trillions of images that are used to build and maintain a digital twin of the world. Nexar uses state-of-the-art technologies to detect road furniture (like road signs and traffic lights), surface markings, and road works.

In this talk, we will describe how you can efficiently ingest, process, and maintain a robust Data Lake, whether you’re a mapping solutions provider, a media measurement company, or a social media network. Topics include:
* Incremental & efficient streaming over cloud storage such as S3
* Storage optimizations using Delta Lake
* Supporting mutable data use-cases with Delta Lake

Streaming ML Enrichment Framework Using Advanced Delta Table Features

2022-07-19

This talk covers the challenge of building a scalable framework for data scientists and ML engineers that can accommodate hundreds of generic or customer-specific ML models, running in both streaming and batch modes, and capable of processing 100+ million records per day from social media networks.

The goal has been achieved using Spark and Delta. Our framework is built on clever usage of Delta features such as change data feed and selective merge, together with Spark Structured Streaming from and into Delta tables. The data is saved in multiple Delta tables, where the structure of each table reflects a particular step in the overall flow. This brings great efficiency, as the downstream processing does very few transformations, so even people without extensive experience writing ML pipelines and jobs can use the framework easily. At the heart of the framework is a series of Spark Structured Streaming jobs that continuously evaluate rules to determine which social media content should be processed by which model. Users can update these rules at any time, and the framework needs to adjust the processing automatically. In an environment like this, the ability to track records throughout the whole process and the atomicity of operations are of utmost importance, and Delta tables provide both out of the box.

In the talk we are going to focus on the ideas behind the framework and on efficiently combining Structured Streaming and Delta tables. Key takeaways include an exploration of some lesser-known Delta table features and real-life experiences from building an ML framework based on scalable big data technologies, showing how capable and fast such a solution can be, even with minimal hardware resources.

Survey of Production ML Tech Stacks

2022-07-19

Production machine learning demands stitching together many tools, ranging from open source standards to cloud-specific and third-party solutions. This session surveys the current ML deployment technology landscape to contextualize which tools solve for which features of production ML systems, such as CI/CD, REST endpoints, and monitoring. It'll help answer the questions: What tools are out there? Where do I start with the MLOps tech stack for my application? What are the pros and cons of open source versus managed solutions? This talk takes a feature-driven approach to tool selection for MLOps stacks to provide best practices in the most rapidly evolving field of data science.

Take Databricks Lakehouse to the Max with Informatica

2022-07-19

The hard part of ML and analytics is not building data models. It’s getting the data right and into production. Join us to learn how Informatica’s Intelligent Data Management Cloud (IDMC) helps you maximize the benefits of the Databricks Unified Analytics platform. Learn how our cloud-native capabilities can shorten your time to results. See how to enable more data users to easily load data and develop data engineering workflows on Databricks in ELT mode at scale. Find out how Informatica delivers the governance and compliance guardrails you need to operate analytics, AI and ML. Accelerate adoption and maximize agility while maintaining control of your data and lowering risk.

The Semantics of Biology—Vaccine and Drug Research with Knowledge Graphs and Logical Inferencing

2022-07-19

From the organization of the tree of life, to the tissues and structures of living organisms: trees and graphs are a recurring data structure in biology. Given the tree-like relationships between biological entities, Knowledge Graphs are emerging as the ideal way to store and retrieve biological data.

In our first Data + AI talk (https://www.youtube.com/watch?v=Kj5bZ2afWSU), we presented the Bellman open source library (https://github.com/gsk-aiops/bellman). Bellman was developed to translate SPARQL queries into Apache Spark Dataset operations so that scientists can submit graph queries in familiar environments like Jupyter and Databricks notebooks.

In this talk, we present the new logical inferencing capabilities we've built into the Bellman OSS library. We will demonstrate how connections between biological entities that are not explicitly connected in the data are deduced from ontologies. These inferred connections are returned to the scientist to aid in the discovery of new connections, with the intent of accelerating gene-to-disease research. To demonstrate these capabilities, we will take a deep dive into the "subclassOf" logical entailment to retrieve all subclasses of a biological entity. The performance characteristics of inference algorithms like forward and backward chaining will also be compared.
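
The "subclassOf" entailment can be illustrated with a small sketch. This is not Bellman's implementation or API: the triples, class names, and the `all_subclasses` helper are hypothetical, and plain Python collections stand in for Spark Datasets. The traversal resolves subclasses at query time, which is closer in spirit to backward chaining than to forward chaining (which would materialize the inferred triples ahead of time).

```python
from collections import defaultdict

# Hypothetical ontology triples (child, predicate, parent); not real data.
TRIPLES = [
    ("Coronavirus", "subClassOf", "RNAVirus"),
    ("RNAVirus", "subClassOf", "Virus"),
    ("SARSCoV2", "subClassOf", "Coronavirus"),
    ("DNAVirus", "subClassOf", "Virus"),
]


def all_subclasses(triples, entity):
    """Return every direct or inferred subclass of `entity`."""
    # Index the explicit edges: parent -> set of direct children.
    children = defaultdict(set)
    for child, predicate, parent in triples:
        if predicate == "subClassOf":
            children[parent].add(child)

    # Walk the hierarchy downwards; `found` also guards against cycles.
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for child in children[current]:
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found


print(sorted(all_subclasses(TRIPLES, "Virus")))
```

Only two of the four results are stated explicitly as subclasses of "Virus" in the triples; the other two are inferred by following the chain.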

Towards Dynamic Microstructure: The Role of Machine Learning in the Next Generation of Exchanges

2022-07-19
Michael O’Rourke (Nasdaq), Douglas Hamilton (Nasdaq)

What role will AI and machine learning play in ensuring the efficiency and transparency of the next generation of markets?

In this session, Douglas Hamilton (AVP, Machine Intelligence Lab) and Michael O’Rourke (SVP, Engineering & AI/ML) will show attendees how Nasdaq is building dynamic microstructures that reduce the inherent frictions associated with trading, and give insights into their application across industries.

Turbocharge your AI/ML Databricks workflows with Precisely

2022-07-19

Trusted analytics and predictive data models require accurate, consistent, and contextual data. The more attributes used to fuel models, the more accurate their results. However, building comprehensive models with trusted data is not easy. Accessing data from multiple disparate sources, making spatial data consumable, and enriching models with reliable third-party data is challenging.

In response to these challenges, Precisely has developed tools to facilitate a location-enabled lakehouse on the Databricks platform, helping users get more out of their data. Come see live demos and learn how to build your own location-enabled lakehouse by:

• Organizing and managing address data and assigning a unique and persistent identifier
• Enriching addresses with standard and dynamic attributes from our curated data portfolio
• Analyzing enriched data to uncover relationships and create dashboard visualizations
• Understanding high-level solution architecture

Turning Big Biology Data into Insights on Disease – The Power of Circulating Biomarkers

2022-07-19

Profiling small molecules in human blood across global populations gives rise to a greater understanding of the varied biological pathways and processes that contribute to human health and diseases. Herein, we describe the development of a comprehensive Human Biology Database, derived from nontargeted molecular profiling of over 300,000 human blood samples from individuals across diverse backgrounds, demographics, geographical locations, lifestyles, diseases, and medication regimens, and its applications to inform drug development.

Approximately 11,000 circulating molecules have been captured and measured per sample using Sapient’s high-throughput, high-specificity rapid liquid chromatography-mass spectrometry (rLC-MS) platform. The samples come from cohorts with adjudicated clinical outcomes from prospective studies lasting 10-25 years, as well as data on individuals’ diet, nutrition, physical exercise, and mental health. Genetic information for a subset of subjects is also included and we have added microbiome sequencing data from over 150,000 human samples in diverse diseases.

An efficient data science environment is established to enable effective health insight mining across this vast database. Built on a customized AWS and Databricks “infrastructure-as-code” Terraform configuration, we employ streamlined data ETL and machine learning-based approaches for rapid rLC-MS data extraction. In mining the database, we have been able to identify circulating molecules potentially causal to disease; illuminate the impact of human exposures like diet and environment on disease development, aging, and mortality over decades of time; and support drug development efforts through identification of biomarkers of target engagement, pharmacodynamics, safety, efficacy, and more.

Unifying Data Science and Business: AI Augmentation/Integration in Production Business Applications

2022-07-19

Why is it so hard to integrate machine learning into real business applications? In 2019 Gartner predicted that AI augmentation would solve this problem and would create $2.9 trillion of business value and 6.2 billion hours of worker productivity in 2021. A new realm of business science methods, encompassing AI-powered analytics that allow people with domain expertise to make smarter decisions faster and with more confidence, has also emerged as a solution to this problem. Dr. Harvey will demystify why integration challenges still account for $30.2 billion in annual global losses and discuss what it takes to integrate AI/ML code or algorithms into real business applications, including the effort that goes into making each component (data collection, preparation, training, and serving) production-ready, so that organizations can use the results of integrated models repeatedly with minimal user intervention. Finally, Dr. Harvey will discuss AISquared’s integration with Databricks and MLflow to accelerate the integration of AI by unifying data science with business. By adding five lines of code to their model, users can now leverage AISquared’s model integration API framework, which provides a quick and easy way to integrate models directly into live business applications.

Unity Catalog: Journey to Unified Governance for Your Data and AI Assets on Lakehouse

2022-07-19

Modern data assets take many forms: not just files or tables, but dashboards, ML models, and unstructured data like video and images, all of which cannot be governed and managed by legacy data governance solutions. Join this session to learn how data teams can use Unity Catalog to centrally manage all data and AI assets with a common governance model based on familiar ANSI SQL, ensuring much better native performance and security. Built-in automated data lineage provides end-to-end visibility into how data flows from source to consumption, so that organizations can identify and diagnose the impact of data changes. Unity Catalog delivers the flexibility to leverage existing data catalogs and solutions and establish a future-proof, centralized governance without expensive migration costs. It also creates detailed audit reports for data compliance and security, while ensuring data teams can quickly discover and reference data for BI, analytics, and ML workloads, accelerating time to value.

Why a Data Lakehouse is Critical During the Manufacturing Apocalypse

2022-07-19

COVID has changed the way that we work and the way that we must do business. Supply chain disruptions have impacted manufacturers’ ability to manufacture and distribute products. Logistics and the lack of labor have forced us to staff differently. The existential threat is real, and we must change the way that we analyze data and solve problems in real time in order to stay relevant.

In this session, you’ll learn about our journey, why the Data Lake and digital tech is essential to survival in this new world, some practical examples of how machine learning and data pipelines enable faster decision making, and why businesses cannot survive without these capabilities.

Deep Dive into the New Features of Apache Spark 3.2 and 3.3

2022-07-19

Apache Spark has become the most widely used engine for executing data engineering, data science and machine learning on single-node machines or clusters. The number of monthly Maven downloads of Spark has rapidly increased to 20 million.

We will talk about the higher-level features and improvements in Spark 3.2 and 3.3. The talk also dives deeper into the following features:
+ Introducing pandas API on Apache Spark to unify small data API and big data API.
+ Completing the ANSI SQL compatibility mode to simplify migration of SQL workloads.
+ Productionizing adaptive query execution to speed up Spark SQL at runtime.
+ Introducing RocksDB state store to make state processing more scalable.

Defending Against Adversarial Model Attacks

2022-07-19

The application of AI algorithms in domains such as self-driving cars, facial recognition, and hiring holds great promise. At the same time, it raises legitimate concerns about AI algorithms' robustness against adversarial attacks. With widespread adoption of AI algorithms whose predictions are hidden or obscured from the trained eye of the subject expert, opportunities for a malicious actor to take advantage of those algorithms grow considerably, necessitating the addition of adversarial robustness training and checking. To protect against and mitigate the damage caused by these malicious actors, this talk will examine how to build a pipeline that’s robust against adversarial attacks by leveraging Kubeflow Pipelines and integration with the LFAI Adversarial Robustness Toolbox (ART). Additionally, we will show how to test a machine learning model's adversarial robustness in production on Kubeflow Serving, by virtue of payload logging (KNative eventing) and ART. This presentation focuses on adversarial robustness rather than fairness and bias.
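
To make the idea of an adversarial attack concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a well-known evasion attack, against a tiny logistic-regression model. It does not use ART or Kubeflow, and nothing in it is taken from the talk: the weights, the input, and the `epsilon` value are hypothetical and chosen only so that the effect is visible.

```python
import math


def predict(weights, bias, x):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0, or 1 depending on the sign of `value`."""
    return (value > 0) - (value < 0)


def fgsm(weights, bias, x, label, epsilon):
    """Perturb `x` by `epsilon` in the direction that increases the log loss."""
    p = predict(weights, bias, x)
    # For logistic regression, d(log loss)/dx = (p - label) * weights.
    gradient = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]


# Hypothetical model and input, correctly classified as class 1.
WEIGHTS, BIAS = [2.0, -1.0], 0.0
x = [0.3, 0.2]
x_adv = fgsm(WEIGHTS, BIAS, x, label=1, epsilon=0.3)

print("clean prediction:", predict(WEIGHTS, BIAS, x))
print("adversarial prediction:", predict(WEIGHTS, BIAS, x_adv))
```

A bounded change to each feature is enough to push the prediction across the 0.5 decision threshold, which is the kind of failure that robustness training and checking aim to catch.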

Deliver Faster Decision Intelligence From Your Lakehouse

2022-07-19

Accelerate the path from data to decisions with the Tellius AI-driven Decision Intelligence platform powered by Databricks Delta Lake. Empower business users and data teams to analyze data residing in the Delta Lake to understand what is happening in their business, uncover the reasons why metrics change, and get recommendations on how to impact outcomes. Learn how organizations derive value from Delta Lakehouse with a modern analytics experience that unifies guided insights, natural language search, and automated machine learning to speed up data-driven decision making at cloud scale.

In this session, we will showcase how customers:
- Discover changes in KPIs and investigate the reasons why metrics change with AI-powered automated analysis
- Empower business users and data analysts to iteratively explore data to identify trend drivers, uncover new customer segments, and surface hidden patterns in data
- Simplify and speed up analysis of massive datasets on Databricks Delta Lake

Designing Better MLOps Systems

2022-07-19

Real-world data problems are becoming increasingly daunting to solve as data volume grows and computing tools proliferate. Since 2018, Gartner has predicted that 85% of ML projects will fail, and this trend will likely continue through 2022 as well. Nevertheless, in most cases, ML practitioners have the opportunity to keep their projects from failing in the early phases.

In this talk, the speaker will borrow from her consultancy and hands-on implementation experience with cross-functional clients to share her takeaways in designing better ML systems. The talk will walk through common pitfalls to watch out for, relevant best practices in software engineering for ML, and technical anchors that make a robust system. This talk aims to empower the audience – beginner and experienced practitioners alike – with confidence in their ML project designs and help provide the big-picture design thinking framework for successful projects.

Destination Lakehouse: All Your Data, Analytics and AI on One Platform

2022-07-19

The data lakehouse is the future for modern data teams seeking to innovate with a data architecture that simplifies data workloads, eases collaboration, and maintains the flexibility and openness to stay agile as a company scales. The Databricks Lakehouse Platform realizes this idea by unifying analytics, data engineering, machine learning, and streaming workloads across clouds on one simple, open data platform. In this session, learn how the Databricks Lakehouse Platform can meet your needs for every data and analytics workload, with examples of real-customer applications, reference architectures, and demos to showcase how you can create modern data solutions of your own.

Distributed Machine Learning at Lyft

2022-07-19

Data collection, preprocessing, and feature engineering are the fundamental steps in any machine learning pipeline. After feature engineering, being able to parallelize training on multiple low-cost machines helps reduce both cost and time, and being able to train models in a distributed manner speeds up hyperparameter tuning. How can we unify these stages of the ML pipeline in one distributed training platform? And on Kubernetes, at that?

Our ML platform is completely based on Kubernetes because of its scalability and the rapid bootstrapping time of resources. In this talk we will demonstrate how Lyft uses Spark on Kubernetes and Fugue (our home-grown unifying compute abstraction layer) to design a holistic end-to-end ML pipeline system that gives customers of our ML Platform a distributed feature engineering, training, and prediction experience on top of Spark on K8s. We will also do a deep dive to show how we abstract and hide infrastructure complexities so that our Data Scientists and Research Scientists can focus only on the business logic for their models through simple Pythonic APIs and SQL. We let the users focus on "what to do" and the platform takes care of "how to do it". We will share our challenges, our learnings, and the fun we had while implementing it. Using Spark on K8s has helped us achieve large-scale data processing at 90% less cost, at times bringing processing time down from 2 hours to less than 20 minutes.

Driving Real-Time Data Capture and Transformation in Delta Lake with Change Data Capture

2022-07-19

Change data capture (CDC) is an increasingly common technology used in real-time machine learning and AI data pipelines. When paired with Databricks Delta Lake, it provides organizations with a number of benefits, including lower data processing costs and highly responsive analytics applications. This session will provide a detailed overview of Matillion’s new CDC capabilities and how integrating them with Delta Lake on Databricks can help you manage dataset changes, making it easy to automate the capture, transformation, and enrichment of data in near real time. Attend this session to see how Matillion’s CDC capabilities simplify real-time data capture and analytics in your Delta Lake.
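
How captured changes keep a target table in sync can be sketched in a few lines. This is a generic illustration, not Matillion's or Delta Lake's API: the event shape (`op`, `key`, `row`) and the `apply_changes` function are assumptions made for the example, and a plain dict stands in for the target table.

```python
def apply_changes(table, events):
    """Apply ordered CDC events to `table`, a dict mapping key -> row dict."""
    result = dict(table)  # leave the caller's table untouched
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: merge the changed columns into the existing row, if any.
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            result.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical target table and change events.
target = {1: {"name": "Ada", "city": "London"}}
events = [
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "city": "Oslo"}},
    {"op": "delete", "key": 1},
]

print(apply_changes(target, events))
```

Event order matters here: the update to key 1 is applied first and then superseded by the later delete, so only key 2 remains in the result.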

Efficient and Multi-Tenant Scheduling of Big Data and AI Workloads

2022-07-19

Many ML and big data teams in the open source community are looking to run their workloads in the cloud, and they invariably face a common set of challenges such as multi-tenant cluster management, resource fairness and sharing, gang scheduling and cost-effective infrastructure operations. Kubernetes is the de facto standard platform for running containerized applications in the cloud. However, the default resource scheduler in Kubernetes leaves much to be desired for AI scenarios when running ML/DL training workloads or large-scale data processing jobs for feature engineering.

In this talk, we will share how the community leverages and builds upon Apache YuniKorn to address the unique resource scheduling needs of ML and big data teams.

Eliminating AI Risk—One Model Failure at a Time

2022-07-19

As organizations adopt AI, they inherit AI risk. AI risk often manifests itself in AI models that produce erroneous predictions that go undetected and result in serious consequences for the organization and the individuals affected by the decisions.

In this talk we will discuss root causes for AI models going haywire, and present a rigorous framework for eliminating risk from AI. We will show how this methodology can be used as building blocks for building an AI firewall that can prevent and model AI model failures.

Emerging Data Architectures & Approaches for Real-Time AI using Redis

2022-07-19

As more applications harness the power of real-time data, it’s important to architect and implement a data stack to meet the broad requirements of operational ML and be able to seamlessly integrate neural embeddings into applications.

Real-time ML requires more than just deploying ML models to production using MLOps tooling; it requires a fast and scalable operational database that easily integrates into the MLOps workflow. Milliseconds matter and can make the difference in delivering fast online predictions whether it’s personalized recommendations, detecting fraud, or figuring out the most optimal food delivery route.

Attend this session to explore how a modern data stack can be used for real-time operational ML and building AI-infused applications. The session will cover the following topics:

Emerging architectural components for operational ML such as the online feature store for real-time serving.

Operational excellence in managing globally distributed ML data and feature pipelines

Foundational data types of Redis including the representation of data using vector embeddings.

Using Redis as a vector database to build vector similarity search applications.
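
Vector similarity search, the last topic above, comes down to ranking stored embeddings by their closeness to a query embedding. The sketch below uses brute-force cosine similarity in plain Python. It does not use Redis or its client API, and the item names, vectors, and `top_k` helper are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the names of the `k` stored vectors most similar to `query`."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
INDEX = {
    "pizza delivery": [0.9, 0.1, 0.0],
    "fraud alert": [0.0, 0.2, 0.9],
    "pasta recipe": [0.8, 0.3, 0.1],
}

print(top_k([1.0, 0.2, 0.0], INDEX))
```

A brute-force scan like this is linear in the number of stored vectors; a vector database exists largely to avoid that scan at scale by indexing the embeddings.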

Enabling BI in a Lakehouse Environment: How Spark and Delta Can Help With Automating a DWH Develop

2022-07-19

Traditional data warehouses typically struggle when it comes to handling large volumes of data and traffic, particularly when it comes to unstructured data. In contrast, data lakes overcome such issues and have become the central hub for storing data. We outline how we can enable BI Kimball data modelling in a Lakehouse environment.

We present how we built a Spark-based framework to modernize DWH development with data lake as central storage, assuring high data quality and scalability. The framework was implemented at over 15 enterprise data warehouses across Europe.

We present how Spark and Delta Lake can be used to implement data warehouse principles such as surrogate, foreign and business keys, and SCD types 1 and 2. Additionally, we share our experiences of how such a unified data modelling framework can bridge BI with modern-day use cases, such as machine learning and real-time analytics. The session outlines the original challenges, the steps taken and the technical hurdles we faced.
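
SCD type 2 keeps history by closing the current row and appending a new version instead of overwriting it. A minimal plain-Python sketch follows. It is not the speakers' framework and makes no Spark or Delta Lake calls; the column names (`surrogate_key`, `business_key`, `valid_from`, `valid_to`) and the `scd2_upsert` function are assumptions chosen for the example.

```python
def scd2_upsert(dimension, business_key, attributes, effective_date):
    """Return a new dimension (list of row dicts) with an SCD type 2 change applied."""
    rows = [dict(row) for row in dimension]  # copy so the input is not mutated

    # The current version of a business key is the row that is still open.
    current = next(
        (r for r in rows if r["business_key"] == business_key and r["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return rows  # nothing changed, keep the current version
        current["valid_to"] = effective_date  # close the old version

    # Append the new version under a fresh surrogate key.
    next_key = max((r["surrogate_key"] for r in rows), default=0) + 1
    rows.append({
        "surrogate_key": next_key,
        "business_key": business_key,
        "attributes": attributes,
        "valid_from": effective_date,
        "valid_to": None,
    })
    return rows


# Hypothetical customer dimension: the customer moves from Berlin to Vienna.
dim = scd2_upsert([], "C-1", {"city": "Berlin"}, "2022-01-01")
dim = scd2_upsert(dim, "C-1", {"city": "Vienna"}, "2022-06-01")
for row in dim:
    print(row)
```

An SCD type 1 change would instead overwrite `attributes` on the existing row and keep no history.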
