talk-data.com

Event

Databricks DATA + AI Summit 2023

2026-01-11 · YouTube

Activities tracked

205

Filtering by: Data Lakehouse

Sessions & talks

Showing 76–100 of 205 · Newest first

A Fireside Chat: Building Your Startup on Databricks

2023-07-26 · Watch video

Are you interested in learning how leading startups build applications on Databricks and leverage the power of the lakehouse? Join us for a fireside chat with cutting-edge startups as we discuss real-world insights and best practices for building on the Databricks Lakehouse, as well as successes and challenges encountered along the way. This conversation will provide an opportunity to learn from, and ask questions of, panelists spanning all sectors.

Talk by: Chris Hecht, Derek Slager, Uri May, and Edward Chiu

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Fair Data or Foul Data…Lakehouse for Public Sector as a FAIR platform

2023-07-26 · Watch video
Milos Colic (Databricks), Pritesh Patel (Databricks)

FAIR (findable, accessible, interoperable, reusable) data and data platforms are becoming increasingly important in the public sector, and the Lakehouse platform is strongly aligned with these principles. The Lakehouse provides the tools required both to adhere to FAIR and to FAIRify data that isn't FAIR compliant. In this session, we will cover the parts of the lakehouse that enable end users to FAIRify data products, how to build robust data products, and which parts of the Lakehouse align with which FAIR principles.

We'll demonstrate how DLT is crucial for data transformations on non-FAIR data, how Unity Catalog unlocks discoverability (F) and governed data access (A), and how Marketplace, Clean Rooms, and Delta Sharing unlock interoperability and data exchange (I and R). These concepts are massive enablers for highly regulated industries such as the public sector. It is undeniably important to align the Lakehouse with standards that are widely adopted by standards bodies, policy makers, and regulators; these principles transcend all industries and all use cases.
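
To make the governance points above concrete, here is a minimal sketch of registering and granting access to a data product in Unity Catalog: discoverability (F) via a table comment and governed access (A) via a grant. The catalog, schema, table, and group names are hypothetical, and on Databricks the `spark` session is already provided.

```python
from pyspark.sql import SparkSession

# On a Databricks cluster `spark` already exists; this keeps the sketch self-contained.
spark = SparkSession.builder.getOrCreate()

# Findable (F): describe the data product so it can be discovered in the catalog.
spark.sql("""
    COMMENT ON TABLE main.research.fair_products IS
    'Curated FAIR data product: quarterly public-sector indicators'
""")

# Accessible (A): governed, auditable access through a Unity Catalog grant.
spark.sql("GRANT SELECT ON TABLE main.research.fair_products TO `data_consumers`")
```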

Talk by: Milos Colic and Pritesh Patel

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Introducing Universal Format: Iceberg and Hudi Support in Delta Lake

2023-07-26 · Watch video
Ryan Johnson (Databricks), Himanshu Raja (Databricks)

In this session, we will talk about how Delta Lake plans to integrate with Iceberg and Hudi. Customers are being forced to choose storage formats based on the tools that support them rather than choosing the most performant and functional format for their lakehouse architecture. With Universal Format (“UniForm”), Delta removes the need to make this compromise and makes Delta tables compatible with Iceberg and Hudi query engines. We will do a technical deep dive of the technology, demo it, and discuss the roadmap.
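
A rough sketch of what enabling UniForm on a table could look like, using the table property announced alongside the feature; the exact property names and prerequisites may vary by release, and the catalog, schema, and table names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Create a Delta table that also exposes Iceberg metadata via UniForm, so
# Iceberg-compatible engines can query it without copying the data.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        amount   DOUBLE
    )
    USING DELTA
    TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")
```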

Talk by: Himanshu Raja and Ryan Johnson

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Journey Towards Uniting Metastores

2023-07-26 · Watch video
Ananya Ghosh (Nationwide)

This talk will provide a brief overview of Nationwide’s journey toward implementing Unity Catalog at an enterprise level. We will cover the following topics:

  • Identity management structure
  • Compute framework
  • Naming standards and usage best practices
  • And a little bit about how Delta Sharing will help us ingest third-party data

Unity Catalog has been a core feature in strengthening the Lakehouse architecture for multiple business units.

Speaker: Ananya Ghosh

Lakehouses: The Best Start to Your Graph Data and Analytics Journey

2023-07-26 · Watch video

Data architects and IT executives are continually looking for the best ways to integrate graph data and analytics into their organizations to improve business outcomes. This session outlines how the Data Lakehouse provides the perfect starting point for a successful journey. We will explore how the Data Lakehouse offers the unique combination of scalability, flexibility, and speed needed to quickly and effectively ingest, pre-process, curate, and analyze graph data to create powerful analytics. Additionally, we will discuss the benefits of using the Data Lakehouse over traditional graph databases and how it can help improve time to insight, time to production, and overall satisfaction. At the end of this presentation, attendees will:

  • Understand the benefits of using a Data Lakehouse for graph data and analytics
  • Learn how to get started with a successful Lakehouse implementation (demo)
  • Discover the advantages of using a Data Lakehouse over graph databases
  • Learn specifically where graph databases integrate and perform better together

Key takeaways:

  • Data lakehouses provide the perfect starting point for a successful graph data and analytics journey
  • Data lakehouses offer the scalability, flexibility, and speed to quickly and effectively analyze graph data
  • The data lakehouse is a cost-effective alternative to a traditional graph database, shortening your time to insight and de-risking your project
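
To make the "graph analytics directly on the lakehouse" idea concrete, here is a small sketch using the third-party GraphFrames package on Spark; the talk does not name specific libraries, and the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # third-party package, not named in the talk

spark = SparkSession.builder.getOrCreate()

# Vertices and edges curated as ordinary lakehouse tables (hypothetical names).
vertices = spark.read.table("main.graph.vertices")  # must contain an `id` column
edges = spark.read.table("main.graph.edges")        # must contain `src` and `dst` columns

g = GraphFrame(vertices, edges)

# A simple analytic over the curated graph: PageRank.
ranks = g.pageRank(resetProbability=0.15, maxIter=10)
ranks.vertices.select("id", "pagerank").show()
```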

Talk by: Douglas Moore

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Planning and Executing a Snowflake Data Warehouse Migration to Databricks

2023-07-26 · Watch video

Organizations are going through a critical phase of data infrastructure modernization, laying the foundation for the future, and adapting to support growing data and AI needs. Organizations that embraced cloud data warehouses (CDW) such as Snowflake have ended up trying to use a data warehousing tool for ETL pipelines and data science. This created unnecessary complexity and resulted in poor performance since data warehouses are optimized for SQL-based analytics only.

Realizing the limitation and pain with cloud data warehouses, organizations are turning to a lakehouse-first architecture. Though a cloud platform to cloud platform migration should be relatively easy, the breadth of the Databricks platform provides flexibility and hence requires careful planning and execution. In this session, we present the migration methodology, technical approaches, automation tools, product/feature mapping, a technical demo and best practices using real-world case studies for migrating data, ELT pipelines and warehouses from Snowflake to Databricks.

Talk by: Satish Garla and Ramachandran Venkat

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

The C-Level Guide to Data Strategy Success with the Lakehouse

2023-07-26 · Watch video
Dael Williamson (Databricks), Robin Sutara (Databricks)

Join us for a practical session on implementing a data strategy that leverages people, process, and technology to meet your business stakeholders' growing demands for faster innovation at lower cost. In this session, we will share real-world examples of best practices, and things to avoid, as you drive your strategy from the board to the business units in your organization.

Talk by: Robin Sutara and Dael Williamson

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

What’s New With Platform Security and Compliance in the Databricks Lakehouse Platform

2023-07-26 · Watch video
David Veuve (Databricks), Samrat Ray (Databricks)

At Databricks, we know that data is one of your most valuable assets and must always be protected; that’s why security is built into every layer of the Databricks Lakehouse Platform. Databricks provides comprehensive security to protect your data and workloads, including encryption, network controls, data governance, and auditing.

In this session, you will hear from Databricks product leaders on the platform security and compliance progress made over the past year, with demos on how administrators can start protecting workloads fast. You will also learn more about the roadmap that delivers on the Databricks commitment to you as the most trusted, compliant, and secure data and AI platform with the Databricks Lakehouse.

Talk by: Samrat Ray and David Veuve

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Delta Kernel: Simplifying Building Connectors for Delta

2023-07-26 · Watch video
Denny Lee (Databricks), Tathagata Das (Databricks)

Since the release of Delta 2.0, the project has been growing at breakneck speed. In this session, we will cover all the latest capabilities that make Delta Lake the best format for the lakehouse. Based on lessons learned from this past year, we will introduce Project Aqueduct and how it will simplify building Delta Lake connectors and APIs, from Rust and Go to Trino, Flink, and PySpark.
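
As one example of the kind of non-Spark connector this work aims to make easier to build, here is a minimal sketch that reads a Delta table via the open-source `deltalake` (delta-rs) Python bindings; the table location is hypothetical, and this is an existing connector rather than the kernel API itself.

```python
from deltalake import DeltaTable

# Open a Delta table directly from object storage, no Spark or JVM required.
dt = DeltaTable("s3://my-bucket/tables/events")  # hypothetical location

print(dt.version())            # current table version from the Delta log
data = dt.to_pyarrow_table()   # materialize the data as a PyArrow table
print(data.num_rows)
```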

Talk by: Tathagata Das and Denny Lee

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Lakehouse / Spark AMA

2023-07-26 · Watch video
Hyukjin Kwon (Databricks), Martin Grund (Databricks), Wenchen Fan (Databricks)

Have some great questions about Apache Spark™ and Lakehouses?  Well, come by and ask the experts your questions!

Talk by: Martin Grund, Hyukjin Kwon, and Wenchen Fan

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Simplifying Lakehouse Observability: Databricks Key Design Goals and Strategies

2023-07-26 · Watch video
Michael Milirud (Databricks)

In this session, we'll explore Databricks' vision for simplifying lakehouse observability, a critical component of any successful data, analytics, and machine learning initiative. By directly integrating observability solutions within the lakehouse, Databricks aims to provide users with the tools and insights needed to run a successful business on top of the lakehouse.

Our approach is designed to leverage existing expertise and simplify the process of monitoring and optimizing data and ML workflows, enabling teams to deliver sustainable and scalable data and AI applications. Join us to learn more about our key design goals and how Databricks is streamlining lakehouse observability to support the next generation of data-driven applications.

Talk by: Michael Milirud

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Privacera | Applying Advanced Data Security Governance with Databricks Unity Catalog

2023-07-26 · Watch video

This talk explores advanced data security and access control integrated with Databricks Unity Catalog through Privacera. Learn about the capabilities of Databricks with Unity Catalog and Privacera, see real-world use cases demonstrating data security and access control best practices, and find out how to successfully plan for and implement enterprise data security governance at scale across your entire Databricks Lakehouse.

Talk by: Don Bosco Durai

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Future Data Access Control: Booz Allen Hamilton’s Way of Securing Databricks Lakehouse with Immuta

2023-07-26 · Watch video

In this talk, I’ll review how we utilize Attribute-Based Access Control (ABAC) to enforce policy via Immuta. I’ll discuss the differences between the ABAC and legacy Role-Based Access Control (RBAC) approaches to control access and how the RBAC approach is not sufficient to keep up with today’s growing big data market. With so much data available, there also comes substantial risk. Data can contain many sensitive data elements, including PII and PHI. Industry leaders like Databricks are pushing the boundaries of data technology, which leads to constantly evolving data use cases. And that’s a good thing. However, the RBAC approach is struggling to keep up with those advancements.

So what is RBAC? It’s an approach to data access that permits system access based on the end user’s role. For legacy systems, it’s meant as a simple but effective approach to securing data. Are you a manager? Then you’ll get access to data meant for managers. This is great for small deployments with clearly defined roles, but it breaks down at scale. Here at Booz Allen, we invested in Databricks because we have an environment of over 30 thousand users and billions of rows of data, where constantly mapping users to an ever-growing set of roles is simply not sustainable.

To mitigate this problem and align with our forward-thinking company standard, we introduced Immuta into our stack. Immuta uses ABAC to allow for dynamic data access control. Users are automatically assigned certain attributes, and access is based on those attributes instead of just their role. This allows for more flexibility and allows data access control to easily scale without the need to constantly map a user to their role. Using attributes, we can write policies in one place and have them applied across all our data platforms. This makes for a truly holistic data governance approach and provides immediate ROI and time savings for the company.
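
The contrast between the two models can be sketched in a few lines of plain Python; this is purely illustrative and is not Immuta's API.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)

def rbac_allows(user: User, required_role: str) -> bool:
    # RBAC: access hinges on membership in a pre-defined role.
    return required_role in user.roles

def abac_allows(user: User, column_tags: set) -> bool:
    # ABAC: access is computed from user attributes at query time,
    # e.g. only users cleared for PII may read PII-tagged columns.
    return "PII" not in column_tags or user.attributes.get("pii_cleared", False)

analyst = User("pat", roles={"manager"}, attributes={"pii_cleared": False})
print(rbac_allows(analyst, "manager"))           # True: the role matches
print(abac_allows(analyst, {"PII", "finance"}))  # False: missing the attribute
```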

Talk by: Jeffrey Hess

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Lakehouse Architecture to Advance Security Analytics at the Department of State

2023-07-26 · Watch video

In 2023, the Department of State surged forward on implementing a lakehouse architecture to get faster, smarter, and more effective at cybersecurity log monitoring and incident response. In addition to getting us ahead of federal mandates, this approach promises to enable advanced analytics and machine learning across our highly federated global IT environment while minimizing costs associated with data retention and aggregation.

This talk will include a high-level overview of the technical and policy challenges and a deeper technical dive into the tactical implementation choices made. We’ll share lessons learned related to governance and securing organizational support, connecting between multiple cloud environments, and standardizing data to make it useful for analytics. And finally, we’ll discuss how the lakehouse leverages Databricks in multicloud environments to promote decentralized ownership of data while enabling strong, centralized data governance practices.

Talk by: Timothy Ahrens and Edward Moe

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Security Best Practices and Tools to Build a Secure Lakehouse

2023-07-26 · Watch video
Arun Pamulapati (Databricks), Anindita Mahapatra (Databricks)

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features—there may even be a security team at your company that demands it. Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While many organizations deploy security differently, we have found a common set of guidelines and features among organizations that require a high level of security.

In this session, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. This session is great for people who already know Databricks, but don’t worry—that knowledge isn’t required. You will walk away with a full handbook detailing all the concepts, configurations, checklists, security analysis tool (SAT), and security reference architecture (SRA) automation scripts from the session so that you can make immediate progress when you get back to the office. Security can be hard, but we’ve collected the hard work already done by some of the best in the industry, and built tools, to make it easier. Come learn how, and see what good looks like in a demo.

Talk by: Arun Pamulapati and Anindita Mahapatra

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: KPMG | Multicloud Enterprise Delta Sharing & Governance using Unity Catalog @ S&P Global

2023-07-26 · Watch video

Cloud technologies have revolutionized global data access across a number of industries. However, many enterprise organizations face challenges in adopting these technologies effectively, as comprehensive cloud data governance strategies and solutions are complex and evolving – particularly in hybrid or multicloud scenarios involving multiple third parties. KPMG and S&P Global have harnessed the power of Databricks Lakehouse to create a novel approach.

By integrating Unity Catalog, Delta Sharing, and the KPMG Modern Data Platform, S&P Global has enabled scalable, transformative cross-enterprise data sharing and governance. This demonstration highlights a collaboration between the S&P Global Sustainable1 (S1) ESG program and the KPMG ESG Analytics Accelerators to enable large-scale SFDR ESG portfolio analytics. Join us to discover our solution that drives transformative change, fosters data-driven decision-making, and bolsters sustainability efforts in a wide range of industries.
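
For a sense of what cross-enterprise consumption can look like, here is a minimal sketch using the open-source `delta-sharing` Python connector; the profile file and the share, schema, and table names are hypothetical.

```python
import delta_sharing

# Credentials file issued by the data provider (hypothetical file name).
profile = "config.share"

# <profile>#<share>.<schema>.<table> coordinates of the shared dataset (hypothetical names).
table_url = profile + "#esg_share.sustainable1.sfdr_metrics"

# Load the shared table into pandas on the recipient side, with no Databricks required.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```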

Talk by: Niels Hanson and Dennis Tally

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Matillion | Using Matillion to Boost Productivity w/ Lakehouse and your Full Data Stack

2023-07-26 · Watch video
Rick Wear, Sarah Pollitt (Matillion)

In this presentation, Matillion’s Sarah Pollitt, Group Product Manager for ETL, will discuss how you can use Matillion to load data from popular data sources such as Salesforce, SAP, and over a hundred out-of-the-box connectors into your data lakehouse. You can quickly transform this data using powerful tools like Matillion or dbt, or your own custom notebooks, to derive valuable insights. She will also explore how you can run streaming pipelines to ensure real-time data processing, and how you can extract and manage this data using popular governance tools such as Alation or Collibra, ensuring compliance and data quality. Finally, Sarah will showcase how you can seamlessly integrate this data into your analytics tools of choice, such as Thoughtspot, PowerBI, or any other analytics tool that fits your organization's needs.

Talk by: Rick Wear

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Vector Data Lakes

Vector databases such as ElasticSearch and Pinecone offer fast ingestion and querying on vector embeddings with approximate nearest neighbor (ANN) search. However, they typically do not decouple compute and storage, making them hard to integrate into production data stacks. Because data storage in these databases is expensive and not easily accessible, data teams typically maintain ETL pipelines to offload historical embedding data to blob stores. When that data needs to be queried, it gets loaded back into the vector database in another ETL process. This is reminiscent of loading data from an OLTP database to cloud storage, then loading said data into an OLAP warehouse for offline analytics.

Recently, “lakehouse” offerings allow direct OLAP querying on cloud storage, removing the need for the second ETL step. The same could be done for embedding data. While embedding storage in blob stores cannot satisfy the high TPS requirements in online settings, we argue it’s sufficient for offline analytics use cases like slicing and dicing data based on embedding clusters. Instead of loading the embedding data back into the vector database for offline analytics, we propose direct processing on embeddings stored in Parquet files in Delta Lake. You will see that offline embedding workloads typically touch a large portion of the stored embeddings without the need for random access.

As a result, the workload is entirely bound by network throughput instead of latency, making it quite suitable for blob storage backends. On a test dataset of one billion vectors, ETL into cloud storage takes around one hour on a dedicated GPU instance, while batched nearest neighbor search can be done in under one minute with four CPU instances. We believe future “lakehouses” will ship with native support for these embedding workloads.
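
A minimal sketch of the batched, offline nearest-neighbor workload described above: scan embeddings straight from a Parquet data file and brute-force the top-k neighbors with NumPy. The file path, column names, and dimensions are hypothetical.

```python
import numpy as np
import pyarrow.parquet as pq

# Read one Parquet data file of the embedding table (hypothetical path and schema).
table = pq.read_table("embeddings/part-000.parquet")
ids = np.array(table.column("id").to_pylist())
emb = np.array(table.column("embedding").to_pylist(), dtype=np.float32)  # shape (N, d)

# A single query vector; an offline job would typically batch many of these.
query = np.random.rand(emb.shape[1]).astype(np.float32)

# Brute-force cosine similarity: a full scan that is throughput-bound,
# with no index and no random access required.
emb_norm = emb / np.linalg.norm(emb, axis=1, keepdims=True)
scores = emb_norm @ (query / np.linalg.norm(query))
top_k = ids[np.argsort(-scores)[:10]]
print(top_k)
```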

Talk by: Tony Wang and Chang She

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Going Beyond SQL: Python UDFs in Unity Catalog for All Your Lakehouse

2023-07-26 · Watch video
Jakob Mund (Databricks)

While SQL is powerful, it does have some limits. Fear not: this lightning talk introduces user-defined functions (UDFs) written in Python, managed and governed in Databricks Unity Catalog, and usable across the Lakehouse. It covers the basics, from how to create and govern UDFs to more advanced topics including networking and observability, and provides a glimpse of how it all works under the hood. After this session, you will be equipped to take SQL and the Lakehouse to the next level using Python UDFs.
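
A minimal sketch of creating and calling a Python UDF governed by Unity Catalog; the catalog and schema names are hypothetical, and on Databricks the `spark` session is already provided.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Register a Python UDF in Unity Catalog; anyone with EXECUTE permission
# can then call it from SQL anywhere in the lakehouse.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.mask_email(email STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    AS $$
        name, _, domain = email.partition("@")
        return name[:1] + "***@" + domain
    $$
""")

spark.sql("SELECT main.default.mask_email('ada@example.com') AS masked").show()
```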

Talk by: Jakob Mund

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How Comcast Effectv Drives Data Observability with Databricks and Monte Carlo

2023-07-26 · Watch video
Scott Lerner (Comcast Effectv), Robinson Creighton (Comcast Effectv)

Comcast Effectv, the 2,000-employee advertising wing of Comcast, America’s largest telecommunications company, provides custom video ad solutions powered by aggregated viewership data. As a global technology and media company connecting millions of customers to personalized experiences and processing billions of transactions, Comcast Effectv was challenged with handling massive loads of data, monitoring hundreds of data pipelines, and managing timely coordination across data teams.

In this session, we will discuss Comcast Effectv’s journey to building a more scalable, reliable lakehouse and driving data observability at scale with Monte Carlo. This has enabled Effectv to have a single pane of glass view of their entire data environment to ensure consumer data trust across their entire AWS, Databricks, and Looker environment.

Talk by: Scott Lerner and Robinson Creighton

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How to Create and Manage a High-Performance Analytics Team

2023-07-26 · Watch video

Data science and analytics teams are unique. Large and small corporations want to build and manage analytics teams to convert their data and analytic assets into revenue and competitive advantage, but many are failing before they make their first hire. In this session, the audience will learn how to structure, hire, manage and grow an analytics team. Organizational structure, project and program portfolios, neurodiversity, developing talent, and more will be discussed.

Questions and discussion will be encouraged throughout. The audience will leave with a deeper understanding of how to succeed in turning data and analytics into tangible results.

Talk by: John Thompson

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: Accenture | Databricks Enables Employee Data Domain to Align People w/ Business Outcomes

2023-07-26 · Watch video

A global franchise retailer was struggling to understand the value of its employees and had not fostered a data-driven enterprise. During the journey to use facts as the basis for decision-making, Databricks became the facilitator of a data mesh and created the pipelines, analytics, and source engine for a three-layer — bronze, silver, gold — lakehouse that supports the HR domain and drives the integration of multiple additional domains: sales, customer satisfaction, product quality, and more. In this talk, we will walk through:

  • The business rationale and drivers
  • The core data sources
  • The data products, analytics and pipelines
  • The adoption of Unity Catalog for data privacy compliance /adherence and data management
  • Data quality metrics

Join us to see the analytic product and the design behind this innovative view of employees and their business outcomes.
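
As a small illustration of one bronze-to-silver step in the three-layer design described above; the table names and cleanup rules are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Bronze: raw employee events landed as-is (hypothetical table).
bronze = spark.read.table("hr_bronze.raw_employee_events")

# Silver: cleaned, typed, de-duplicated records ready for analytics.
silver = (
    bronze
    .filter(F.col("employee_id").isNotNull())          # basic quality gate
    .withColumn("event_date", F.to_date("event_ts"))   # analysis-ready date column
    .dropDuplicates(["employee_id", "event_ts"])        # drop replayed events
)

silver.write.mode("overwrite").saveAsTable("hr_silver.employee_events")
```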

Talk by: Rebecca Bucnis

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Unlocking the Power of Databricks SDKs: The Power to Integrate, Streamline, and Automate

2023-07-26 · Watch video
Serge Smertin (Databricks)

In today's data-driven landscape, the demands placed upon data engineers are diverse and multifaceted. Whether you are integrating Java, Python, or Go microservices, the Databricks SDKs provide a powerful bridge between these established ecosystems and Databricks. They allow data engineers to unlock new levels of integration and collaboration, as well as fold Unity Catalog into their processes to create advanced workflows straight from notebooks.

In this session, learn best practices for when and how to use the SDKs, the command-line interface, or Terraform to integrate seamlessly with the Databricks Lakehouse. The session also covers using shell scripts to automate complex tasks and streamline operations to improve scalability.
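
A minimal sketch using the Databricks SDK for Python; it assumes workspace credentials are already available in the environment (for example DATABRICKS_HOST and DATABRICKS_TOKEN).

```python
from databricks.sdk import WorkspaceClient

# Authentication is resolved automatically from the environment or a config profile.
w = WorkspaceClient()

# Enumerate clusters in the workspace: the kind of housekeeping task the SDKs
# let you fold into notebooks, microservices, or CI jobs.
for cluster in w.clusters.list():
    print(cluster.cluster_name, cluster.state)
```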

Talk by: Serge Smertin

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Deep Dive Into Grammarly's Data Platform

2023-07-25 · Watch video

Grammarly helps 30 million people and 50,000 teams to communicate more effectively. Using the Databricks Lakehouse Platform, we can rapidly ingest, transform, aggregate, and query complex data sets from an ecosystem of sources, all governed by Unity Catalog. This session will overview Grammarly’s data platform and the decisions that shaped the implementation. We will dive deep into some architectural challenges the Grammarly Data Platform team overcame as we developed a self-service framework for incremental event processing.

Our investment in the lakehouse and Unity Catalog has dramatically improved the speed of our data value chain, making 5 billion events (ingested, aggregated, de-identified, and governed) available to stakeholders (data scientists, business analysts, sales, marketing) and downstream services (feature store, reporting/dashboards, customer support, operations) within 15 minutes. As a result, we have improved our query cost performance (110% faster at 10% the cost) compared to our legacy system on AWS EMR.

I will share architecture diagrams, their implications at scale, code samples, and problems solved and to be solved in a technology-focused discussion about Grammarly’s iterative lakehouse data platform.

Talk by: Faraz Yasrobi and Christopher Locklin

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Unity Catalog, Delta Sharing and Data Mesh on Databricks Lakehouse

2023-07-25 · Watch video

In this technical deep dive, we will detail how customers implemented a data mesh on Databricks and how standardizing on the Delta format enabled Delta-to-Delta sharing to non-Databricks consumers.

  • Current state of the IT landscape
  • Data silos (problems with organizations not having connected data in the ecosystem)
  • A look back on why we moved away from data warehouses and chose the cloud in the first place
  • What caused the data chaos in the cloud (instrumentation and too much stitching together of a "periodic table" of cloud services)
  • How to strike the balance between autonomy and centralization
  • Why Databricks Unity Catalog puts you on the right path to implementing a data mesh strategy
  • The processes and features that enable an end-to-end implementation of a data strategy
  • How customers were able to successfully implement the data mesh with out-of-the-box Unity Catalog and Delta Sharing without overwhelming their IT tool stack
  • Use cases
  • Delta-to-Delta data sharing
  • Delta-to-others data sharing
  • How to navigate when data today is available across regions, across clouds, on-prem, and in external systems
  • Change data feed to share only "data that has changed" (see the sketch after this list)
  • Data stewardship
  • Why ABAC is important
  • How file-based access policies and governance play an important role
  • Future state and its pitfalls
  • Egress costs
  • Data compliance
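
As referenced in the change data feed item above, here is a minimal sketch of reading only the changed rows of a Delta table; the table name, column name, and starting version are hypothetical, and the table must have change data feed enabled.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

# Read only the rows that changed since version 42 of the table.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 42)
    .table("main.sales.orders")
)

# Each row carries _change_type, _commit_version, and _commit_timestamp metadata.
changes.select("_change_type", "_commit_version", "order_id").show()
```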

Talk by: Surya Turaga and Thomas Roach

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc