talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 YouTube

Activities tracked

287

Filtering by: AI/ML

Sessions & talks

Showing 226–250 of 287 · Newest first

Enabling Learning on Confidential Data

2022-07-19
Rishabh Poddar (Opaque Systems)

Multiple organizations often wish to aggregate their confidential data and learn from it, but they cannot do so because they cannot share their data with each other. For example, banks wish to train models jointly over their aggregate transaction data to detect money launderers more efficiently because criminals hide their traces across different banks.

To address such problems, we developed MC^2 at UC Berkeley, an open-source framework for multi-party confidential computation, on top of Apache Spark. MC^2 enables organizations to share encrypted data and perform analytics and machine learning on the encrypted data without any organization or the cloud seeing the data. Our company Opaque brings the MC^2 technology in an easy-to-use form to organizations in the financial, medical, ad tech, and other sectors.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

X-FIPE: eXtended Feature Impact for Prediction Explanation

2022-07-19

Many enterprises have built their own machine learning platforms in the cloud using Databricks, e.g. Humana FlorenceAI. In order to effectively drive the adoption of predictive models in daily business operations, data scientists and business teams need to work closely to make sure they serve the consumer needs in compliance with regulatory rules. Model interpretability is key. In this talk, we would like to share an explainable AI algorithm developed at Humana, X-FIPE, eXtended Feature Impact for Prediction Explanation.

X-FIPE is a top-driver algorithm that calculates feature importance at a local level for any machine learning predictive model, whether built in Python or PySpark. Instead of showing feature importance at the population level, it finds the top drivers for each observation or member. These top drivers can differ widely from one member to another in the population. It not only helps explain the predictive model but also offers users actionable insights.

Compared with widely used algorithms, e.g. LIME, SHAP, and FIPE, X-FIPE improves the time complexity from linear O(n) to logarithmic O(log(n)), where n is the number of model features used. We also discovered a connection between the X-FIPE value and the Shapley value: X-FIPE is a first-order approximation of the Shapley value. Our observation shows that most of a feature's Shapley value comes from its marginal contribution when it is first added and when it is last removed from the full feature set. This is why X-FIPE retains sufficient accuracy while reducing computation.
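The link to Shapley values described above can be illustrated on a toy model. The sketch below is not the X-FIPE implementation, which the abstract does not spell out. It computes exact Shapley values by enumerating coalitions and compares them with one possible reading of the first-order idea: averaging a feature's marginal contribution when it is added first and when it is removed last. The value function `toy_value` and all helper names are hypothetical.

```python
from itertools import combinations
from math import factorial


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are 'on'.

    f = 2*x0 + 1*x1 + 3*x2 + 0.5*x0*x1, with absent features set to 0.
    """
    x = [1.0 if i in coalition else 0.0 for i in range(3)]
    return 2 * x[0] + x[1] + 3 * x[2] + 0.5 * x[0] * x[1]


def exact_shapley(value, n):
    """Exact Shapley values by enumerating every coalition (exponential in n)."""
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that feature i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def first_last_estimate(value, n):
    """Average of two marginal contributions per feature: added first, removed last."""
    full = frozenset(range(n))
    empty = frozenset()
    estimates = []
    for i in range(n):
        added_first = value(empty | {i}) - value(empty)
        removed_last = value(full) - value(full - {i})
        estimates.append(0.5 * (added_first + removed_last))
    return estimates


exact = exact_shapley(toy_value, 3)
approx = first_last_estimate(toy_value, 3)
print(exact, approx)
```

In this toy model the only interaction is pairwise, so the two-term estimate coincides with the exact values; with higher-order interactions the two would generally differ.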

We hope this talk gives you a path forward for including explainable AI in your machine learning workflows. You are encouraged to try out and contribute to our open source Python package xfipe, which is coming soon.

Your fastest path to Lakehouse and beyond

2022-07-19

Azure Databricks is an easy, open, and collaborative service for data, analytics & AI use cases, enabled by Lakehouse architecture. Join this session to discover how you can get the most out of your Azure investments by combining the best of Azure Synapse Analytics, Azure Databricks and Power BI for building a complete analytics & AI solution based on Lakehouse architecture.

Azure Databricks Excitement 2022

2022-07-19

Data + AI Summit 2022 was a great opportunity to check in on the partnership between Microsoft Azure and Databricks!

Can ML Forecast Fashion Trends? What Should We Predict?

2022-07-19

How to use existing data to feature ‘style’ in order to make relevant recommendations to customers:
1. Is fashion style predictable? What can be forecast by machines in the fashion world, and what cannot?
2. What types of data can we get for machines to learn from in the fashion world?
3. A personal perspective on three dimensions to feature style in AI

Data-Centric Principles for AI Engineering

2022-07-19

While some AI problems can be solved with end-to-end deep learning models that go from raw inputs to outputs, practitioners (including our customers!) find that such "mega models" are, on their own, not enough to build production-ready AI applications. In practice, it’s critical that AI engineers can inspect, test, and refactor the modular components of their applications, as they would with any piece of infrastructure or software.

In this talk, we’ll introduce a data-centric approach to AI engineering that highlights the advantages of modular components, fine-grained evaluation, and rapid iteration through programmatic labeling. We'll discuss the practical trade-offs of incrementally building and testing pipelines composed of models, preprocessing steps, and business logic. Along the way, we’ll share examples of these principles in practice through real-world case studies.
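As a rough illustration of programmatic labeling, the sketch below writes a few small labeling functions and combines their votes by simple majority. It is a minimal sketch under stated assumptions, not the speakers' system or any library's API: the spam/ham task, the function names, and the tie-handling rule are all hypothetical.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions for a toy spam task; each votes or abstains.
def lf_contains_link(text):
    return "spam" if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    return "spam" if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    words = text.lower().split()
    return "ham" if len(words) <= 4 and "hi" in words else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def weak_label(text):
    """Majority vote over non-abstaining functions; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence, leave unlabeled
    return ranked[0][0]


for example in ["Claim your prize at https://example.com", "hi there", "hi prize"]:
    print(example, "->", weak_label(example))
```

Because each function is a small, separately testable unit, a single rule can be inspected or replaced without retraining anything, which is the kind of modular iteration the abstract describes.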

Data Warehousing on the Lakehouse

2022-07-19

Most organizations routinely operate their business with complex cloud data architectures that silo applications, users and data. As a result, there is no single source of truth of data for analytics, and most analysis is performed with stale data. To solve these challenges, the lakehouse has emerged as the new standard for data architecture, with the promise to unify data, AI and analytic workloads in one place. In this session, we will cover why the data lakehouse is the next best data warehouse. You will hear success stories, use cases, and best practices from experts in the field, and discover how the data lakehouse ingests, stores and governs business-critical data at scale to build a curated data lake for data warehousing, SQL and BI workloads. You will also learn how Databricks SQL can help you lower costs and get started in seconds with instant, elastic SQL serverless compute, and how to empower every analytics engineer and analyst to quickly find and share new insights using their favorite BI and SQL tools, like Fivetran, dbt, Tableau or Power BI.

dbt + Machine Learning: What Makes a Great Baton Pass?

2022-07-19

dbt has done a great job of building an elegant, common interface between data engineers and data analysts: uniting on SQL. As the data industry evolves, there's plenty of pain and room to grow in building that interface between data scientists and data analysts. There isn't a good answer for when things go wrong in the machine learning arena: should the data analyst own fine-tuning the pre-processing of data (think: prepping transformed data even more so that machine learning models can work with it better)? Should we increase the SQL surface area to build ML models, or should we leave that to non-SQL interfaces (Python, Scala, etc.)? Does this have to be an either/or future? Whatever the interface evolves into, it must center people, create a low bar and high ceiling, and focus on outcomes and not the mystique of features/tools behind a learning curve.

Delta Live Tables: Modern Software Engineering and Management for ETL

2022-07-19

Data engineers have the difficult task of cleansing complex, diverse data, and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance, how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations, how built-in quality controls and monitoring ensure accurate BI, data science, and ML, and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.

Doubling the Capacity of the Data Platform Without Doubling the Cost

2022-07-19

The data and ML platform at Scribd is growing. I am responsible for understanding and managing its cost, while enabling the business to solve new and interesting problems with our data. In this talk we'll discuss each of the following concepts and how they apply at Scribd and more broadly to other Databricks customers.

Enable Production ML with Databricks Feature Store

2022-07-19

Productionalizing ML models is hard. In fact, very few ML projects make it to production, and one of the hardest problems is data! Most AI platforms are disconnected from the data platform, making it challenging to keep features constantly updated and available in real-time. Offline/online skew prevents models from being used in real-time or, worse, introduces bugs and biases in production. Building systems to enable real-time inference requires valuable production engineering resources. As a result of these challenges, most ML models do not see the light of day.

Learn how you can simplify production ML using Databricks Feature Store, the first feature store built on the data lakehouse. Data sources for features are drawn from a central data lakehouse, and the feature tables themselves are tables in the lakehouse, accessible in Spark and SQL for both machine learning and analytics use cases. Features, data pipelines, source data, and models can all be co-governed in a central platform. Feature Store is seamlessly integrated with Apache Spark™, enabling automatic lineage tracking, and with MLflow, enabling models to look up feature values at inference time automatically. See these capabilities in action and learn how you can use them in your ML projects.

Evolution of Data Architectures and How to Build a Lakehouse

2022-07-19

Data architectures are a key part of the larger picture of building robust analytical and AI applications. One must take a holistic view of the entire data analytics realm when planning data science initiatives.

In this talk, learn about the evolution of the data landscape and why lakehouses are becoming the de facto standard for organizations building scalable data architectures. A lakehouse architecture combines the data management capabilities of the data warehouse, including reliability, integrity, and quality, with the low cost and open approach of data lakes, and supports all data workloads, including BI and AI.

Data practitioners will also learn some core concepts of building an efficient lakehouse with Delta Lake.

Welcome & Destination Lakehouse Ali Ghodsi Keynote Data + AI Summit 2022

2022-07-19
Ali Ghodsi (Databricks), Reynold Xin (Databricks), Matei Zaharia (Databricks)

Join the Day 1 keynote to hear from Ali Ghodsi, Matei Zaharia, and Reynold Xin, Databricks co-founders and original creators of Apache Spark and Delta Lake, on how Databricks and the open source community are taking on the biggest challenges in data. The talks will address the latest updates on the Apache Spark and Delta Lake projects, the evolution of data lakehouse architecture, and how companies like Adobe and Amgen are using lakehouse architecture to advance their data goals.

Adversarial AI—The Nature of the Threat, Impacts, and Mitigation Strategies

2022-07-19

Adversarial AI/ML is an emerging research area focused on the vulnerabilities of Artificial Intelligence (AI)/Machine Learning (ML) models to adversarial exploitation such as data poisoning, adversarial perturbations, and inference and extraction attacks. This research area is of particular interest to domains where AI/ML models play an essential role in mission-critical decision-making processes. In this presentation, we will review the four principal categories of Adversarial AI. We will discuss each of them, supported by relevant examples, and consider the future implications. We will present in greater depth our research in Adversarial NLP, backed by specific examples of data poisoning and adversarial perturbation attacks on NLP classifiers. We will conclude the presentation by discussing current mitigation approaches and methods, and offer some general recommendations for how best to address Adversarial AI exploits.

AI-Fueled Forecasting: The Next Generation of Financial Planning

2022-07-19

In an age of data abundance and digital disruption, CFOs are adopting next-generation planning capabilities to drive strategic decision making in real time. The future of forecasting is AI driven. PrecisionView™, Deloitte’s proprietary forecasting solution, leverages data aggregation technologies with predictive analytics and machine-learning capabilities to allow businesses to achieve improved forecasting accuracy.

Attend this webinar to hear about:
• AI-powered financial planning that helps generate high-impact insights by incorporating the organization’s internal data and a myriad of external macroeconomic factors
• Examples of how companies have achieved success using scenario modelling
• Databricks’ compute capabilities that allow for parallel processing, which helps generate near-real-time forecasts at the most granular levels

Apache Spark Community Update | Reynold Xin Streaming Lakehouse | Karthik Ramasamy

2022-07-19
Karthik Ramasamy (Databricks), Reynold Xin (Databricks)

Data + AI Summit Keynote talks from Reynold Xin and Karthik Ramasamy

Best Practices of Maintaining High-Quality Data

2022-07-19

Data sits at the heart of machine learning algorithms, and your model is only as good as the data governance policies at your organization. The talk will cover multiple data governance frameworks. We will also talk in depth about one of the key areas of data governance policy: data quality. The session will cover the significance of data quality, the definition of goodness, and the key benefits and impact of maintaining high-quality data and processes. Beyond theory, the talk focuses on practical techniques and guidelines for maintaining data quality.

Building a Data Science as a Service Platform in Azure with Databricks

2022-07-19

Machine learning in the enterprise is rarely delivered by a single team. In order to enable machine learning across an organisation, you need to target a variety of different skills, processes, technologies, and maturity levels. Doing this is incredibly hard and requires a combination of different techniques to deliver a single platform which empowers all users to build and deploy machine learning models.

In this session we discuss how Databricks enabled a data science as a service platform for one of the UK's largest household insurers. We look at how this platform is empowering users of all abilities to build models, deploy models and realise a return on investment earlier.

Building Recommendation Systems Using Graph Neural Networks

2022-07-19

RECKON (RECommendation systems using KnOwledge Networks) is a machine learning project centred around improving the entities intelligence.

We represent the dataset of our site interactions as a heterogeneous graph. The nodes represent various entities in the underlying data (Users, Articles, Authors, etc.). Edges between nodes represent interactions between these entities (User u has read article v, Article u was written by author v, etc.)

RECKON uses a GNN-based encoder-decoder architecture to learn representations for important entities in our data by leveraging both their individual features and the interactions between them through repeated graph convolutions.
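To make the graph description above more concrete, here is a minimal sketch of a heterogeneous graph and a repeated graph convolution in plain Python. It is not RECKON's architecture: a real GNN encoder would use learned, often relation-specific, weights and nonlinearities, while this toy version only averages feature vectors over neighbors. All node names, feature values, and function names are hypothetical.

```python
# Toy heterogeneous graph: node name -> (node type, feature vector).
nodes = {
    "user_1": ("user", [1.0, 0.0]),
    "article_1": ("article", [0.0, 1.0]),
    "article_2": ("article", [0.0, 3.0]),
    "author_1": ("author", [2.0, 2.0]),
}

# Typed edges: (source, relation, target).
edges = [
    ("user_1", "has_read", "article_1"),
    ("user_1", "has_read", "article_2"),
    ("article_1", "written_by", "author_1"),
]


def neighbors(node):
    """Neighbors of a node, treating every edge as undirected for message passing."""
    found = []
    for src, _relation, dst in edges:
        if src == node:
            found.append(dst)
        elif dst == node:
            found.append(src)
    return found


def convolve(features):
    """One round of mean aggregation: average a node's vector with its neighbors'."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors(node)]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


features = {name: vec for name, (_node_type, vec) in nodes.items()}
after_one = convolve(features)   # each node now reflects its direct neighbors
after_two = convolve(after_one)  # a second round reaches two-hop neighbors
print(after_one["author_1"], after_two["author_1"])
```

After one round the author's vector reflects only the article it is linked to; after the second round it also carries information from the user who read that article, which is the effect of repeating the convolution. Node and relation types are stored but not used by this simplified aggregation.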

Personalized recommendations play an important role in improving our users' experience and retaining them. We would like to take this opportunity to walk through some of the techniques that we have incorporated in RECKON and the end-to-end build of this product on Databricks, along with a demo.

Databricks Meets Power BI

2022-07-19

Databricks and Spark are becoming increasingly popular and are now used as a modern data platform to analyze real-time or batch data. In addition, Databricks offers a great integration for machine learning developers.

Power BI, on the other hand, is a great platform for easy graphical analysis of data, and it's a great way to bring hundreds of different data sources together, analyze them together and make them accessible on any device.

So let's just bring both worlds together and see how well Databricks works with Power BI.

Data On-Board: The Aerospace Revolution

2022-07-19

From airplanes to satellites, through mission systems, the aerospace industry generates a huge amount of data waiting to be exploited. All this information is shaping new concepts and capabilities that will forever change the industry thanks to artificial intelligence: autonomous flight, fault prediction, automatic problem detection and energy efficiency, among many others. To achieve this, we face countless challenges, such as rigorous AI certification, trustworthiness, safety, data integrity and security, all of which will have to be addressed on this exciting Airbus journey: welcome to the aerospace revolution!

Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach

2022-07-19

As gatekeepers of the financial system, banks play a crucial role in reporting possible instances of financial crime. At the same time, criminals continuously reinvent their approaches to hide their activities among dense transaction data. In this talk, we describe the challenges of detecting money laundering and outline why employing machine learning via MLOps is critically important to identify complex and ever-changing patterns.

In anti-money-laundering, machine learning answers a dire need for vigilance and efficiency where previous-generation systems fall short. We will demonstrate how our open platform facilitates a gradual migration towards a model-driven landscape, using the example of transaction monitoring to showcase applications of supervised and unsupervised learning, human explainability, and model monitoring. This environment enables us to drive change from the ground up in how the bank understands its clients to detect financial crime.

Disrupting the Prescription Drug Market with AI and Data

2022-07-19

The US prescription drug market has known issues: overpriced, unaffordable drugs; razor-thin margins for pharmacies; widespread inefficiencies; and an endemic lack of transparency.

AI and data are key technologies to fix these issues and benefit patients, pharmacies, and providers. AI-driven price optimization brings price transparency and removes inefficiencies, making prescription drugs more affordable. Drug recommendations and personalization empower consumers and providers with better choices, knowledge, and control.

To support all these solutions, we have built our intelligent Pharma AI and Data Platform. Building on the Databricks platform, we delivered our AI and data platform into production in 2.5 months, deploying our innovative AI Optimized Pricing models, supporting tens of thousands of pharmacies, and connecting millions of consumers. Continuing our journey, we are building an AI prescription recommender, a medication adherence improver, and healthcare personalization.

Hassle-Free Data Ingestion into the Lakehouse

2022-07-19

Ingesting data from hundreds of different data sources is critical before organizations can execute advanced analytics, data science, and machine learning. Unfortunately, ingesting and unifying this data to create a reliable single source of truth is usually extremely time-consuming and costly. In this session, discover how Databricks simplifies data ingestion, at low latency, with SQL-only ingestion capabilities. We will discuss and demonstrate how you can easily and quickly ingest any data into the lakehouse. The session will also cover newly-released features and tools that make data ingestion even simpler on the Databricks Lakehouse Platform.

How EPRI Uses Computer Vision to Mitigate Wildfire Risks for Electric Utilities

2022-07-19

For this talk, Labelbox has invited the Electric Power Research Institute (EPRI) to share information about how it is using computer vision, drone technology, and Labelbox’s training data platform to reduce wildfire risks innate to electricity delivery. This talk is a great starting point for any data team tackling difficult computer vision projects. The Labelbox team will demonstrate how teams can produce their own annotated datasets like EPRI did, and import them into the Lakehouse for AI with the Labelbox Connector for Databricks.

Mechanical failures from overhead electrical infrastructure, in certain environments, are described in utility wildfire mitigation plans as potential ignition concerns. The utility industry is evaluating drones and new inspection technologies that may support more efficient and timely identification of such at-risk assets. EPRI will present several of its AI initiatives and their impact on wildfire prevention and proper maintenance of power lines.