talk-data.com

Topic

Data Lakehouse

data_architecture data_warehouse data_lake

489 tagged

Activity Trend (chart): peak of 118 activities per quarter, 2020-Q1 to 2026-Q1

Activities

489 activities · Newest first

Orchestration Made Easy with Databricks Workflows

Orchestrating and managing end-to-end production pipelines has remained a bottleneck for many organizations. Data teams spend too much time stitching pipeline tasks together and manually managing and monitoring the orchestration process, with heavy reliance on external or cloud-specific orchestration solutions, all of which slows down the delivery of new data. In this session, we introduce you to Databricks Workflows: a fully managed orchestration service for all your data, analytics, and AI, built into the Databricks Lakehouse Platform. Join us as we dive deep into the new workflow capabilities and the integration with the underlying platform. You will learn how to create and run reliable production workflows, centrally manage and monitor them, and implement recovery actions such as repair and run, as well as other new features.
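To make the idea of a "repair and run" recovery action concrete, here is a minimal sketch in plain Python. It is not the Databricks Workflows API: the function `run_workflow`, the task names, and the status strings are all hypothetical, chosen only to illustrate re-running the failed and skipped tasks of a dependency graph while reusing the results of tasks that already succeeded.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, previous=None):
    """Run tasks in dependency order (hypothetical helper, not a Databricks API).

    tasks: dict mapping task name -> zero-argument callable
    deps: dict mapping task name -> set of upstream task names
    previous: statuses from an earlier run; tasks that succeeded there are not re-run
    """
    status = {}
    for name in TopologicalSorter(deps).static_order():
        if previous and previous.get(name) == "success":
            status[name] = "success"  # repair run: reuse the earlier result
            continue
        if any(status[up] != "success" for up in deps.get(name, ())):
            status[name] = "skipped"  # an upstream task did not succeed
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


# Illustrative three-step pipeline whose middle task fails on its first attempt.
calls = []
attempts = {"transform": 0}


def ingest():
    calls.append("ingest")


def transform():
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")
    calls.append("transform")


def publish():
    calls.append("publish")


tasks = {"ingest": ingest, "transform": transform, "publish": publish}
deps = {"ingest": set(), "transform": {"ingest"}, "publish": {"transform"}}

first = run_workflow(tasks, deps)                     # transform fails, publish is skipped
repaired = run_workflow(tasks, deps, previous=first)  # only transform and publish run
```

In this sketch the repair run executes `transform` and `publish` only, so `ingest` is not repeated. A managed orchestrator would track this state for you; the point here is only the control flow.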

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Auto Encoder Decoder-Based Anomaly Detection with the Lakehouse Paradigm

An Auto-Encoder-Decoder is a type of deep learning neural network architecture with an hourglass shape: high-dimensional inputs are compressed to a latent space by the encoder, and the decoder, which mirrors the encoder architecture, reconstructs the input data from the latent space. Auto-Encoder-Decoder models are commonly used for anomaly detection. Training minimizes the reconstruction error of normal data, so an anomaly can be detected when its reconstruction error exceeds the “normal threshold”. This presentation will demonstrate an Auto-Encoder-Decoder anomaly detection solution built with the Lakehouse Paradigm, from data management to after-deployment monitoring, to explain the entire model life cycle. It will also highlight the flexibility and scalability that MLflow custom models and Pandas UDFs can bring when a large number of individual models need to be trained, deployed, and monitored in parallel.
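The thresholding idea described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the presenters' solution: a linear encoder/decoder fitted with SVD (equivalent to PCA) stands in for a trained deep autoencoder, the data is synthetic, and the 99th-percentile threshold and all function names are hypothetical choices.

```python
import numpy as np


def fit_linear_autoencoder(X, latent_dim):
    """Fit a linear encoder/decoder via SVD (a PCA stand-in for a deep autoencoder)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:latent_dim]  # shape: (latent_dim, n_features)
    return mean, components


def reconstruction_error(X, mean, components):
    """Mean squared error between each row and its reconstruction."""
    latent = (X - mean) @ components.T          # encode to the latent space
    reconstructed = latent @ components + mean  # decode back to the input space
    return np.mean((X - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)

# Synthetic "normal" data: 10 features lying close to a 2-D subspace.
basis = rng.normal(size=(2, 10))
normal = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 10))

mean, components = fit_linear_autoencoder(normal, latent_dim=2)
train_error = reconstruction_error(normal, mean, components)

# Assumed "normal threshold": the 99th percentile of the training error.
threshold = np.percentile(train_error, 99)

# Synthetic anomalies: points that do not follow the 2-D structure.
anomalies = 3.0 * rng.normal(size=(5, 10))
flags = reconstruction_error(anomalies, mean, components) > threshold
```

Rows whose reconstruction error exceeds `threshold` are flagged as anomalies. In practice the threshold would be tuned on held-out data, and the linear model would be replaced by the trained network.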

Batches, Streams, and Everything in between: Unifying Batch and Stream Storage with Apache Pulsar

Delta Lake and Lakehouse architectures have been instrumental technologies in providing a better foundation for dealing with streaming and data deltas via an open industry standard. The rapid growth of the ecosystem is a testament to the success of this approach. However, challenges remain in building a data platform that allows teams to process all data via streams, regardless of the age of the data, while also being able to view all streams as tables without exporting data out of the streaming system. In this talk, we will take a hands-on look at how Apache Pulsar is building its core storage engine on the concepts of Lakehouse architectures, allowing teams to build data platforms that can manage data over its entire lifecycle and enabling data to be consumed as either a stream or a table. With these capabilities, we will show how Pulsar + Delta Lake empowers teams, regardless of toolset, to better focus on driving value from data, not just managing it.

Build an Enterprise Lakehouse for Free with Trino and Delta Lake

Delta Lake has quickly grown in usage across data lakes everywhere, driven by the growing number of use cases that require the DML capabilities Delta Lake brings. Beyond support for ACID transactions, users want the ability to interactively query the data in their data lake. This is where a query engine like Trino (formerly PrestoSQL) comes in. Starburst provides an enterprise version of the popular Trino MPP SQL query engine and has recently open sourced its Delta Lake connector.

In this talk, Tom and Claudius will talk about the connector, its features, and how their users are expanding the functionality of their data lakes with improved performance and the ability to handle colliding modifications. Get started with this feature-rich and open stack without the need for a multi-million-dollar budget.

Building a Lakehouse on AWS for Less with AWS Graviton and Photon

AWS Graviton processors are custom-designed by AWS to enable the best price performance for workloads in Amazon EC2. In this session we will review benchmarks that demonstrate how AWS Graviton-based instances run Databricks workloads at a lower price and with better performance than x86-based instances on AWS. When combined with Photon, the new Databricks engine, the price performance gains are even greater. Learn how you can optimize your Databricks workloads on AWS and save more.

Building an Analytics Lakehouse at Grab

Grab shares the story of their Lakehouse journey, from the drivers behind their shift to this new paradigm to lessons learned along the way. Starting from a siloed, data-warehouse-centric architecture that had inherent challenges with scalability, performance, and data duplication, Grab has standardized upon Databricks to serve as an open and unified Lakehouse platform to deliver insights at scale, democratizing data through the rapid deployment of AI and BI use cases across their operations.

Predicting and Preventing Machine Downtime with AI and Expert Alerts

John Deere’s Expert Alerts is a proactive monitoring system that notifies dealers of potential machine issues. This allows technicians to diagnose issues remotely and fix them before they become a problem, thereby avoiding multiple trips by a repair technician and minimizing downtime. John Deere ingests petabytes of data every year from its Connected Machines across the globe. To improve the availability, uptime, and performance of John Deere machines globally, our data scientists perform machine data analysis on our data lake in an efficient and scalable manner. The result is dramatically improved mean time to repair, decreased downtime with predictive alerts, improved cost efficiency, improved customer satisfaction, and great yields and results for John Deere’s customers.

You will learn:
• What Expert Alerts are at John Deere and what challenges they seek to solve
• How John Deere migrated from a legacy application for alerting to a flexible and scalable Lakehouse framework
• Getting stakeholder buy-in and converting business logic to AI
• Overcoming the scale problem: processing petabytes of data within SLAs
• What is next for Alert

Other Resources:
• Two Minute Overview of Expert Alerts: https://www.youtube.com/watch?v=yFnMhMhipXA
• Expert Alerts: Dealer Execution - John Deere: https://www.youtube.com/watch?v=2FGz0lx4UiM
• Ben Burgess FarmSight services - Expert Alerts from John Deere: https://www.youtube.com/watch?v=BrQhX4oCsSw
• U.S. Farm Report Driving Technology: John Deere Expert Alerts: https://www.youtube.com/watch?v=h8IGtk61EDo

Radical Speed on the Lakehouse: Photon Under the Hood

Many organizations are standardizing on the lakehouse. However, this new architecture poses challenges for the underlying query execution engine that accesses structured and unstructured data. The execution engine needs to provide the performance of a data warehouse and the scalability of data lakes. To ensure optimum performance, the Databricks Lakehouse Platform offers Photon. This next-gen vectorized query execution engine outperforms existing data warehouses in SQL workloads and implements a more general execution framework for efficient processing of data with support for the Apache Spark™ API. With Photon, analytical queries are seeing a 3 to 5x speed increase, with a 40% reduction in compute hours for ETL workloads. In this session, we will dive into Photon, describe its integration with the Databricks Platform and Apache Spark™ runtimes, talk through customer use cases, and show how your SQL and DataFrame workloads can benefit from the performance of Photon.

Realize the Promise of Streaming with the Databricks Lakehouse Platform

Streaming is the future of all data pipelines and applications. It enables businesses to make data-driven decisions sooner and react faster, develop data-driven applications considered previously impossible, and deliver new and differentiated experiences to customers. However, many organizations have not realized the promise of streaming to its full potential because it requires them to completely redevelop their data pipelines and applications on new, complex, proprietary, and disjointed technology stacks.

The Databricks Lakehouse Platform is a simple, unified, and open platform that supports all streaming workloads, from ingestion and ETL to event processing, event-driven applications, and ML inference. In this session, we will discuss the streaming capabilities of the Lakehouse Platform and demonstrate how easy it is to build end-to-end, scalable streaming pipelines and applications to fulfill the promise of streaming for your business. You will also hear Erica Lee, VP of ML at Upwork, the world's largest Work Marketplace, share how the Upwork team uses Databricks to enable real-time predictions by computing ML features in a continuous streaming manner.

Scaling Your Workloads with Databricks Serverless

Databricks SQL provides a first-class user experience for BI and SQL directly on the lakehouse platform. But you still need to administer and maintain clusters of virtual machines. What if you could focus on your Databricks SQL queries and never need to worry about the underlying compute infrastructure? Learn how Databricks Serverless, built into the Databricks Lakehouse Platform, eliminates cluster management, provides instant compute, and lowers total cost of ownership for Databricks SQL. In this session, you will see demos, hear from customers, learn how Databricks Serverless works under the hood, be equipped with everything you need to get started – and ultimately get the best out of Databricks Serverless.

Security Best Practices for Lakehouse

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features—there may even be a security team at your company that demands it! Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While many organizations deploy security differently, we have found a common set of guidelines and features among organizations who require a high level of security. In this talk, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. While this session is great for people who already know Databricks, don’t worry, that knowledge isn’t required.

You will walk away with a full handbook detailing all of the concepts, configurations, and code from the session so that you can make immediate progress when you get back to the office. Security can be hard, but we’ve collected the hard work already done by some of the best in the industry, to make it easier. Come learn how.

Serverless Kafka and Apache Spark in a Multi-Cloud Data Lakehouse Architecture

Apache Kafka in conjunction with Apache Spark became the de facto standard for processing and analyzing data. Both frameworks are open, flexible, and scalable. Unfortunately, the latter makes operations a challenge for many teams. Ideally, teams can use serverless SaaS offerings to focus on business logic. However, hybrid and multi-cloud scenarios require a cloud-native platform that provides automated and elastic tooling to reduce the operations burden.

This post explores different architectures for building serverless Kafka and Spark multi-cloud deployments across regions and continents. We start from the analytics perspective of a data lake and explore its relation to a fully integrated data streaming layer with Kafka to build a modern data lakehouse. Real-world use cases show the joint value and explore the benefit of the "delta lake" integration.

Simon Whiteley + Denny Lee Live Ask Me Anything

Simon and Denny Build A Thing is a live webshow, where Simon Whiteley (Advancing Analytics) and Denny Lee (Databricks) are building out a TV Ratings Analytics tool, working through the various challenges of building out a Data Lakehouse using Databricks. In this session, they'll be talking through their Lakehouse Platform, revisiting various pieces of functionality, and answering your questions, Live!

This is your chance to ask questions around structuring a lake for enterprise data analytics, the various ways we can use Delta Live Tables to simplify ETL or how to get started serving out data using Databricks SQL. We have a whole load of things to talk through, but we want to hear YOUR questions, which we can field from industry experience, community engagement and internal Databricks direction. There's also a chance we'll get distracted and talk about the Expanse for far too long.

Cloud Native Geospatial Analytics at JLL

Luis Sanz, CEO of CARTO and Yanqing Zeng, Lead Data Scientist at JLL, take us through how cloud native geospatial analytics can be unlocked on the Databricks Lakehouse platform with CARTO. Yanqing will showcase her work on large scale spatial analytics projects to address some of the most critical analysis use cases in Real Estate. Taking a geospatial perspective, Yanqing will share practical examples of how large-scale spatial data and analytics can be used for property portfolio mapping, AI-driven risk assessment, real estate valuation and more.

Data Lakehouse and Data Mesh—Two Sides of the Same Coin

Over the last few years, two new approaches to data management have been developed in the data community: Data Mesh and Data Lakehouse. The latter is an open architecture that pushes the technological advancements of a Data Lake by adding data management capabilities proven by a long history of Data Warehousing practices. Data Mesh, on the other hand, addresses data management challenges from an organizational angle by advocating decentralized ownership of domain data while applying product thinking and domain-driven design to analytics data. At first one might think that these two architectural approaches compete with each other. In this talk you will learn that the two are in fact orthogonal and can go very well together.

Simplifying Migrations to Lakehouse—the Databricks Way

Customers around the world are experiencing tremendous success migrating from legacy on-premises Hadoop architectures to a modern Databricks Lakehouse in the cloud. At Databricks, we have formulated a migration methodology that helps customers sail through this migration journey with ease. In this talk, we will touch upon some of the key elements that minimize risks and simplify the process of migrating to Databricks, and will walk through some of the customer journeys and use cases.

Smart Manufacturing: Real-time Process Optimization with Databricks

Learn more about how a Fortune 500 aluminium rolled stock manufacturer is leveraging Tredence-Databricks Lakehouse-based AIoT Industrial Internet of Things (IIoT) solutions to improve productivity by +20%.

Take Databricks Lakehouse to the Max with Informatica

The hard part of ML and analytics is not building data models. It’s getting the data right and into production. Join us to learn how Informatica’s Intelligent Data Management Cloud (IDMC) helps you maximize the benefits of the Databricks Unified Analytics platform. Learn how our cloud-native capabilities can shorten your time to results. See how to enable more data users to easily load data and develop data engineering workflows on Databricks in ELT mode at scale. Find out how Informatica delivers the governance and compliance guardrails you need to operate analytics, AI, and ML. Accelerate adoption and maximize agility while maintaining control of your data and lowering risk.

Turbocharge your AI/ML Databricks workflows with Precisely

Trusted analytics and predictive data models require accurate, consistent, and contextual data. The more attributes used to fuel models, the more accurate their results. However, building comprehensive models with trusted data is not easy. Accessing data from multiple disparate sources, making spatial data consumable, and enriching models with reliable third-party data is challenging.

In response to these challenges, Precisely has developed tools to facilitate a location-enabled lakehouse on the Databricks platform, helping users get more out of their data. Come see live demos and learn how to build your own location-enabled lakehouse by:

• Organizing and managing address data and assigning a unique and persistent identifier
• Enriching addresses with standard and dynamic attributes from our curated data portfolio
• Analyzing enriched data to uncover relationships and create dashboard visualizations
• Understanding high-level solution architecture

Unity Catalog: Journey to Unified Governance for Your Data and AI Assets on Lakehouse

Modern data assets take many forms: not just files or tables, but dashboards, ML models, and unstructured data like video and images, none of which can be governed and managed by legacy data governance solutions. Join this session to learn how data teams can use Unity Catalog to centrally manage all data and AI assets with a common governance model based on familiar ANSI SQL, ensuring much better native performance and security. Built-in automated data lineage provides end-to-end visibility into how data flows from source to consumption, so that organizations can identify and diagnose the impact of data changes. Unity Catalog delivers the flexibility to leverage existing data catalogs and solutions and establish future-proof, centralized governance without expensive migration costs. It also creates detailed audit reports for data compliance and security, while ensuring data teams can quickly discover and reference data for BI, analytics, and ML workloads, accelerating time to value.
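As a rough illustration of how lineage supports impact analysis, the sketch below walks a lineage graph to find every asset downstream of a changed table. It does not use Unity Catalog or any of its APIs: the `downstream_assets` helper, the dictionary-based graph, and the asset names are all hypothetical.

```python
from collections import deque


def downstream_assets(lineage, changed):
    """Return every asset reachable from `changed` (hypothetical helper).

    lineage: dict mapping an asset name -> list of assets that read from it
    """
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Illustrative lineage: a raw table feeds a cleaned table, which feeds
# a reporting table, a feature table, and eventually a dashboard.
lineage = {
    "raw.orders": ["silver.orders"],
    "silver.orders": ["gold.revenue", "ml.churn_features"],
    "gold.revenue": ["dashboard.sales"],
}

impacted = downstream_assets(lineage, "silver.orders")
```

Here `impacted` contains the reporting table, the feature table, and the dashboard, which is the set of assets to review before changing `silver.orders`.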
