talk-data.com

Stefania Leone

Speaker

3 talks

Director, Product Management, Databricks

Stefania is a product manager at Databricks working on Unity Catalog and the Databricks Runtime. She holds a PhD in Computer Science from ETH Zurich.

Bio from: Databricks DATA + AI Summit 2023

Talks & appearances

3 activities · Newest first

Solving Exclusive Data Access With Role-Based Access Control

Do you have users who wear multiple hats over the course of a day? Perhaps they work with data from several customers and must not inadvertently aggregate it. Or perhaps they work on sensitive datasets, such as clinical trials, that should not be combined, or on datasets that are subject to regulations. We have a solution! In this session, we will present a new capability that allows users wearing multiple hats to switch roles in the Databricks workspace to work exclusively on a dedicated project or on the data of a particular client or clinical trial. When a user switches to a particular role, the workspace adapts so that only the workspace objects and Unity Catalog (UC) data of that role are accessible. We will also showcase the administrative experience of setting up exclusive access using groups and UC permissions.
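
The difference between exclusive access and ordinary role-based access can be illustrated with a minimal sketch. This is not the Databricks or Unity Catalog API; the `User` class, the `GRANTS` table, and the role and table names are all hypothetical, chosen only to show that the access check looks at the single active role rather than the union of all roles a user holds.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class User:
    name: str
    roles: Set[str]                      # roles (groups) the user may assume
    active_role: Optional[str] = None    # at most one role is active at a time

    def switch_role(self, role: str) -> None:
        # A user can only switch to a role they are a member of.
        if role not in self.roles:
            raise PermissionError(f"{self.name} is not a member of {role!r}")
        self.active_role = role


# Hypothetical grants: object name -> roles allowed to read it.
GRANTS = {
    "trial_a.patients": {"trial_a_analyst"},
    "trial_b.patients": {"trial_b_analyst"},
}


def can_read(user: User, obj: str) -> bool:
    """Exclusive check: only the active role counts, not every role held."""
    if user.active_role is None:
        return False
    return user.active_role in GRANTS.get(obj, set())
```

In this sketch, a user who belongs to both `trial_a_analyst` and `trial_b_analyst` can read only the trial A table while the trial A role is active, so data from the two trials cannot be combined within one role.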

Use Apache Spark™ from Anywhere: Remote Connectivity with Spark Connect

Over the past decade, developers, researchers, and the community at large have successfully built tens of thousands of data applications using Apache Spark™. Since then, use cases and requirements of data applications have evolved. Today, every application, from web services that run in application servers and interactive environments such as notebooks and IDEs to phones and edge devices such as smart home devices, wants to leverage the power of data. However, Spark's driver architecture is monolithic, running client applications on top of a scheduler, optimizer and analyzer. This architecture makes it hard to address these new requirements, as there is no built-in capability to remotely connect to a Spark cluster from languages other than SQL.

Spark Connect introduces a decoupled client-server architecture for Apache Spark that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere. It can be embedded in modern data applications, IDEs, notebooks and programming languages. This session highlights how simple it is to connect to Spark using Spark Connect from any data application or IDE. We will do a deep dive into the architecture of Spark Connect and provide an outlook on how the community can participate in extending Spark Connect to new programming languages and frameworks, bringing the power of Spark everywhere.
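
The idea of using unresolved logical plans as the protocol can be sketched with a toy example. This is not Spark Connect's actual implementation, which uses gRPC with protobuf-encoded plans; here the plan is a nested dictionary serialized as JSON, and the `DataFrame`, `read_table`, `execute` and `TABLES` names are hypothetical. The point it illustrates is that the client only describes the computation, without checking whether tables or columns exist, while the server resolves names and runs the plan.

```python
import json
import operator


# --- Client side: builds an unresolved plan, no data and no name checks ---
class DataFrame:
    def __init__(self, plan):
        self.plan = plan

    def filter(self, column, cmp, value):
        return DataFrame({"op": "filter", "column": column, "cmp": cmp,
                          "value": value, "child": self.plan})

    def select(self, *columns):
        return DataFrame({"op": "project", "columns": list(columns),
                          "child": self.plan})

    def to_message(self):
        # Serialize the plan; this string is all that crosses the wire.
        return json.dumps(self.plan)


def read_table(name):
    return DataFrame({"op": "read", "table": name})


# --- Server side: resolves names against its own catalog and executes ---
TABLES = {"people": [{"name": "Ada", "age": 36}, {"name": "Tim", "age": 17}]}
COMPARATORS = {">": operator.gt, "<": operator.lt, "==": operator.eq}


def execute(message):
    return _run(json.loads(message))


def _run(plan):
    op = plan["op"]
    if op == "read":
        # Resolution happens here: an unknown table raises KeyError.
        return [dict(row) for row in TABLES[plan["table"]]]
    rows = _run(plan["child"])
    if op == "filter":
        compare = COMPARATORS[plan["cmp"]]
        return [r for r in rows if compare(r[plan["column"]], plan["value"])]
    if op == "project":
        return [{c: r[c] for c in plan["columns"]} for r in rows]
    raise ValueError(f"unknown plan node: {op}")
```

A client would call something like `execute(read_table("people").filter("age", ">", 18).select("name").to_message())`. For comparison, in PySpark 3.4 and later a real Spark Connect session is typically created with `SparkSession.builder.remote("sc://localhost:15002").getOrCreate()`, where the host and port here are placeholders for an actual Spark Connect server.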

Talk by: Martin Grund and Stefania Leone

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Databricks Connect Powered by Spark Connect: Develop and Debug Spark From Any Developer Tool

Spark developers want to develop and debug their code using their tools of choice and development best practices while ensuring high production fidelity on the target remote cluster. However, Spark's driver architecture is monolithic, with no built-in capability to directly connect to a remote Spark cluster from languages other than SQL. This makes it hard to enable such interactive developer experiences from a user’s local IDE of choice. Spark Connect’s decoupled client-server architecture introduces remote connectivity to Spark clusters and, with that, enables an interactive development experience: Spark and its open ecosystem can be leveraged from everywhere.

In this session, we show how we leverage Spark Connect to build a completely redesigned version of Databricks Connect, a first-class IDE-based developer experience that offers interactive debugging from any IDE. We show how developers can easily ensure consistency between their local and remote environments. We walk the audience through real-life examples of how to locally debug code running on Databricks. We also show how Databricks Connect integrates into the Databricks Visual Studio Code extension for an even better developer experience.

Talk by: Martin Grund and Stefania Leone