talk-data.com

Topic

Analytics

data_analysis insights metrics

4552

tagged

Activities

4552 activities · Newest first

Democratize AI & ML in a Large Company: The Importance of User Enablement & Technical Training

The most critical factor for success in a cloud transformation is people. As such, having a change management process in place to manage the impact of the transformation and user enablement is foundational to any large program. In this session, we will dive into how TD Bank democratizes data, mobilizes a community of over 2,000 analytics users, and the tactics we used to successfully enable new use cases on the cloud. The session will focus on the following:

To democratize data:

  • Centralize a data platform that is accessible to all employees and allows for easy data sharing
  • Implement privacy and security controls to protect data and use data ethically
  • Apply compliance and governance so that data is used in a responsible and compliant way
  • Simplify processes and procedures to reduce redundancy and speed up adoption

To mobilize end users:

  • Increase data literacy: provide training and resources for employees to increase their abilities and skills
  • Foster a culture of collaboration and openness: cross-functional teams collaborate and share ideas
  • Encourage exploration of innovative ideas that impact the organization's values and customers

Technical enablement and adoption tactics we've used at TD Bank:

  1. Hands-on training for more than 1,300 analytics users, with an emphasis on learning by doing and relating to real-life situations
  2. Online tutorials and documentation for self-paced study
  3. Workshops and office hours on specific topics to empower business users
  4. Coaching to work with teams on a specific use case or complex issue and provide recommendations for faster, more cost-effective solutions
  5. Certification offerings and encouragement of continuous education so employees stay up to date
  6. Feedback loop: collect user feedback on training and user experience to improve future training

Talk by: Ellie Hajarian

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Improving Hospital Operations with Streaming Data and Real Time AI/ML

Over the past two years, Providence has developed a robust streaming data platform (SDP) leveraging Databricks in Azure. The SDP enables us to ingest and process real-time data reflecting clinical operations across our 52 hospitals and roughly 1,000 ambulatory clinics. The HL7 messages generated by Epic are parsed using Databricks in our secure cloud environment and used to generate an up-to-the-minute picture of exactly what is happening at the point of care.
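
The parsing step mentioned above can be illustrated with a minimal sketch. It assumes HL7 v2 pipe-delimited messages, in which segments are separated by carriage returns and fields by `|`. The sample message, the `parse_hl7` and `summarize` helpers, and the choice of fields are hypothetical and only show the general idea; they are not Providence's pipeline, which runs on Databricks.

```python
from collections import defaultdict

# Hypothetical ADT^A01 (admit) message; every value is made up for illustration.
SAMPLE_MESSAGE = "\r".join([
    "MSH|^~\\&|EPIC|HOSP_A|SDP|CLOUD|202301011230||ADT^A01|MSG0001|P|2.5",
    "PID|1||12345^^^HOSP_A^MR||DOE^JANE",
    "PV1|1|I|ICU^101^A",
])


def parse_hl7(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: [field lists]}."""
    segments = defaultdict(list)
    for line in message.split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments[fields[0]].append(fields)
    return dict(segments)


def summarize(message: str) -> dict:
    """Pull out a few fields an operations view might need."""
    seg = parse_hl7(message)
    msh, pv1 = seg["MSH"][0], seg["PV1"][0]
    # In MSH the field separator itself counts as MSH-1, so after split("|")
    # MSH-9 (message type) sits at index 8. In other segments, field n is at index n.
    event = msh[8].split("^")
    location = pv1[3].split("^")  # PV1-3: point of care ^ room ^ bed
    return {
        "message_type": event[0],
        "trigger_event": event[1],
        "patient_class": pv1[2],
        "unit": location[0],
        "room": location[1],
    }


print(summarize(SAMPLE_MESSAGE))
```

In a streaming setting, a function like `summarize` would typically be applied to each incoming message before aggregation; production parsers also need to handle escape sequences and repeated fields, which this sketch ignores.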

We are already leveraging this information to minimize hospital overcrowding and have been actively integrating AI/ML to accurately forecast future conditions (e.g., arrivals, length of stay, acuity, and discharge requirements). This allows us both to improve resource utilization (e.g., nurse staffing levels) and to optimize patient throughput. The result is both improved patient care and operational efficiency.

In this session, we will share how these outcomes are only possible with the power and elegance afforded by our investments in Azure, Databricks, and increasingly Lakehouse. We will demonstrate Providence's blueprint for enabling real-time analytics which can be generalized to other healthcare providers.

Talk by: Lindsay Mico and Deylo Woo

Jet Streaming Data & Predictive Analytics: How Collins Aerospace Helps to Keep Aircraft Flying

Most have experienced the frustration and disappointment of a flight delay or cancelation due to aircraft issues. The Collins Aerospace business unit at Raytheon Technologies is committed to redefining aerospace by using data to deliver a more reliable, sustainable, efficient, and enjoyable aviation industry.

Ascentia is one such product, focused on helping airlines make smarter and more sustainable decisions by anticipating aircraft maintenance issues in advance, leading to more reliable flight schedules and fewer delays. Over the past five years, a variety of products from the Databricks technology suite were employed to achieve this. Leveraging cloud infrastructure and harnessing the Databricks Lakehouse, Apache Spark™ development, and Databricks’ dynamic platform, Collins has been able to accelerate development and deployment of predictive health monitoring (PHM) analytics to generate Ascentia’s aircraft maintenance recommendations.

Making Travel More Accessible for Customers Bringing Mobility Devices

American Airlines takes great pride in caring for customers as they travel, and we recognize the importance of supporting the dignity and independence of everyone who travels with us. As we work to improve the customer experience, we're committed to making our airline more accessible to everyone. Our work to ensure that travel is accessible to all is well underway. We have been particularly focused on making the journey smoother for customers who rely on wheelchairs or other mobility devices. We have implemented the use of a bag tag specifically for wheelchairs and scooters that gives team members more information, like the mobility device’s weight and battery type, or whether it needs to be returned to a customer before a connecting flight.

As a data engineering and analytics team, we at American Airlines are building a passenger service request data product that will provide timely insights on expected mobility device traffic at each airport so that front-line team members can provide a seamless travel experience to passengers.

Talk by: Teja Tangeda and Madhan Venkatesan

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Managing Data Encryption in Apache Spark™

Sensitive data sets can be encrypted directly by new Apache Spark™ versions (3.2 and higher). Setting several configuration parameters and DataFrame options will trigger the Apache Parquet modular encryption mechanism that protects select columns with column-specific keys. The upcoming Spark 3.4 version will also support uniform encryption, where all DataFrame columns are encrypted with the same key.
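
As a rough illustration of the "configuration parameters and DataFrame options" mentioned above, the sketch below assembles the settings for column encryption. The property names follow the Apache Spark and Apache Parquet columnar encryption documentation as best understood here; the key IDs, key material, column names, and helper functions are placeholders invented for this sketch, and the in-memory mock KMS is meant for testing only.

```python
import base64

CRYPTO_FACTORY = "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory"
# Mock KMS shipped with Parquet's test artifacts; replace with a real KMS client class.
MOCK_KMS = "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS"


def hadoop_encryption_conf(master_keys: dict) -> dict:
    """Hadoop-level settings; master_keys maps key ID -> base64-encoded key."""
    return {
        "parquet.crypto.factory.class": CRYPTO_FACTORY,
        "parquet.encryption.kms.client.class": MOCK_KMS,
        "parquet.encryption.key.list": ", ".join(
            f"{key_id}:{key}" for key_id, key in master_keys.items()
        ),
    }


def column_encryption_options(column_keys: dict, footer_key: str) -> dict:
    """DataFrame write options; column_keys maps key ID -> list of columns."""
    spec = ";".join(f"{key_id}:{','.join(cols)}" for key_id, cols in column_keys.items())
    return {
        "parquet.encryption.column.keys": spec,
        "parquet.encryption.footer.key": footer_key,
    }


def write_encrypted(spark, df, path, master_keys, column_keys, footer_key):
    """Apply the settings to a live SparkSession (not executed in this sketch)."""
    hconf = spark.sparkContext._jsc.hadoopConfiguration()
    for name, value in hadoop_encryption_conf(master_keys).items():
        hconf.set(name, value)
    writer = df.write
    for name, value in column_encryption_options(column_keys, footer_key).items():
        writer = writer.option(name, value)
    writer.parquet(path)


# Placeholder 128-bit keys; never hard-code real key material.
demo_keys = {
    "keyA": base64.b64encode(bytes(range(16))).decode(),
    "keyB": base64.b64encode(bytes(16)).decode(),
}
conf = hadoop_encryption_conf(demo_keys)
options = column_encryption_options({"keyA": ["ssn", "salary"]}, footer_key="keyB")
print(options["parquet.encryption.column.keys"])
```

Only `write_encrypted` needs a Spark runtime; the two helper functions simply build dictionaries, which makes the settings easy to review before they are applied.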

Spark data encryption is already leveraged by a number of companies to protect personal or business confidential data in their production environments. The main integration effort is focused on key access control and on building Spark/Parquet plug-in code that can interact with a company’s key management service (KMS).

In this session, we will briefly cover the basics of Spark/Parquet encryption usage, and dive into the details of encryption key management that will help in integrating this Spark data protection mechanism in your deployment. You will learn how to run a HelloWorld encryption sample, and how to extend it into real-world production code integrated with your organization’s KMS and access control policies. We will talk about the standard envelope encryption approach to big data protection, the performance-vs-security trade-offs between single and double envelope wrapping, and internal versus external key metadata storage. We will see a demo, and discuss new features such as uniform encryption and two-tier management of encryption keys.

Talk by: Gidon Gershinsky

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored: Lightup Data | How McDonald's Leveraged Lightup Data Quality

As one of the world's largest fast-food chains, McDonald's manages massive amounts of data for customers, sales, inventory, marketing, and more. And at that scale, ensuring the accuracy, reliability, and quality of all that data comes with a new set of complex challenges. Developing manual data quality checks with legacy tools was too time consuming and resource-intensive, requiring developer support and data domain expertise. Ultimately, they struggled to scale their checks across their enterprise data pipelines.

Join our featured customer session, where you’ll hear from Matt Sandler, Senior Director of Data and Analytics at McDonald’s, about how they use the Lightup Deep Data Quality platform to deploy pushdown data quality checks in minutes, not months — without developer support. From reactive to proactive, the McDonald’s data team leverages Lightup to scale their data quality checks across petabytes of data, ensuring high-quality data and reliable analytics for their products and services. During the session, you’ll learn:

  • The key challenges of scaling Data Quality checks with legacy tools
  • Why fixing data quality (fast) was critical to launching their new loyalty program and personalized marketing initiatives
  • How quickly McDonald’s ramped up with Lightup, transforming their data quality struggles into success

After the session, you’ll understand:

  • Why McDonald’s phased out their legacy Data Quality tools
  • The benefits of using pushdown data quality checks, AI-powered anomaly detection, and incident alerts
  • Best practices for scaling data quality checks in your own organization

Talk by: Matt Sandler and Manu Bansal

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored: Sisense-Developing Data Products: Infusion & Composability Are Changing Expectations

Composable analytics is the next progression of business intelligence. We will discuss how current analytics rely on two key principles: composability and agility. Through modularizing our analytics capabilities, we can rapidly “compose” new data applications. An organization uses these building blocks to deliver customized analytics experiences at a customer level.

This session will orientate business intelligence leaders to composable data and analytics.

  • How data teams can use composable analytics to decrease application development time.
  • How an organization can leverage existing and new tools to maximize value-based, data-driven insights.
    • Requirements for effectively deploying composable analytics.
    • Utilizing no-code, low-code, and high-code analytics capabilities.
    • Extracting full value from your customer data and metadata.
    • Leveraging analytics building blocks to create new products and revenue streams.

Talk by: Scott Castle

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

The Future is Open: Data Streaming in an Omni-Cloud Reality

This session begins with data warehouse trivia and lessons learned from production implementations of multicloud data architecture. You will learn to design future-proof low latency data systems that focus on openness and interoperability. You will also gain a gentle introduction to Cloud FinOps principles that can help your organization reduce compute spend and increase efficiency. 

Most enterprises today are multicloud. While an assortment of low-code connectors boasts the ability to make data available for analytics in real time, they pose long-lasting challenges:

  • Inefficient EDW targets
  • Inability to evolve schema
  • Forbiddingly expensive data exports due to cloud and vendor lock-in

The alternative is an open data lake that unifies batch and streaming workloads. Bronze landing zones in open format eliminate the data extraction costs required by a proprietary EDW. Apache Spark™ Structured Streaming provides a unified ingestion interface. Streaming triggers allow us to switch back and forth between batch and stream with one-line code changes. Streaming aggregation enables us to compute incrementally over data that arrives close together in time.

Specific examples are given on how to use Autoloader to discover newly arrived data and ensure exactly-once, incremental processing; how DLT can be configured effectively to further simplify streaming jobs and accelerate the development cycle; and how to apply SWE best practices to Workflows and integrate with popular Git providers, using either the Databricks Project or the Databricks Terraform provider.
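
The batch/stream switch and the Autoloader (Databricks Auto Loader) ingestion described above might look roughly like the sketch below. It is not taken from the talk: the function names, the JSON source format, the paths, and the one-minute interval are assumptions. `ingest_bronze` needs a Databricks runtime and is not executed here, and the `availableNow` trigger assumes Spark 3.3 or later.

```python
def trigger_kwargs(mode: str) -> dict:
    """Return keyword arguments for DataStreamWriter.trigger()."""
    if mode == "batch":
        # Process everything currently available, then stop (batch-like run).
        return {"availableNow": True}
    if mode == "stream":
        # Keep running and start a micro-batch every minute.
        return {"processingTime": "1 minute"}
    raise ValueError(f"unknown mode: {mode!r}")


def ingest_bronze(spark, source_path, schema_path, checkpoint_path, table, mode="batch"):
    """Incrementally load newly arrived files into a bronze table."""
    stream = (
        spark.readStream.format("cloudFiles")  # Auto Loader source
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", schema_path)
        .load(source_path)
    )
    return (
        stream.writeStream
        # The checkpoint records which files were already processed.
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_kwargs(mode))  # the one-line batch/stream switch
        .toTable(table)
    )


print(trigger_kwargs("batch"), trigger_kwargs("stream"))
```

Because the source, checkpoint, and sink stay the same in both modes, changing `mode` alters only how often the query runs, not what it computes.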

Talk by: Christina Taylor

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Learnings From the Field: Migration From Oracle DW and IBM DataStage to Databricks on AWS

Legacy data warehouses are costly to maintain, unscalable and cannot deliver on data science, ML and real-time analytics use cases. Migrating from your enterprise data warehouse to Databricks lets you scale as your business needs grow and accelerate innovation by running all your data, analytics and AI workloads on a single unified data platform.

In the first part of this session, we will guide you through the process and tools that will help you from the assessment phase to the actual implementation of an EDW migration project. We will also address ways to convert proprietary PL/SQL code to open-standard Python code and take advantage of PySpark for ETL workloads and Databricks SQL for data analytics workloads.

The second part of this session will be based on an EDW migration project of SNCF (French national railways), one of the major enterprise customers of Databricks in France. Databricks partnered with SNCF to migrate its real estate entity from Oracle DW and IBM DataStage to Databricks on AWS. We will walk you through the customer context, urgency of migration, challenges, target architecture, nitty-gritty details of implementation, best practices, recommendations, and learnings in order to execute a successful migration project in a very accelerated time frame.

Talk by: Himanshu Arora and Amine Benhamza

Self-Service Geospatial Analysis Leveraging Databricks, Apache Sedona, and R

Geospatial data analysis is critical to understanding the impact of agricultural operations on environmental sustainability with respect to water quality, soil health, greenhouse gases, and more. Outside of a few specialized software products, however, support for spatial data types is often limited or missing from analytics and visualization platforms. In this session, we show how Truterra is using Databricks, Apache Sedona, and R to analyze spatial data at scale. Additionally, learn how Truterra uses spatial insights to educate and promote practices that optimize profitability, sustainability, and stewardship outcomes at the farm.

In this session, you will see how Databricks and Apache Sedona are used to process large spatial datasets including field, watershed, and hydrologic boundaries. You will see dynamic widgets, SQL and R used in tandem to generate map visuals, display them, and enable download all from a Databricks notebook.

Talk by: Nara Khou and Cort Lunke

Data Sharing and Beyond with Delta Sharing

Stepping into this brave new digital world we are certain that data will be a central product for many organizations. The way to convey their knowledge and their assets will be through data and analytics. Delta Sharing was the world's first open protocol for secure and scalable real-time data sharing. Through our customer conversations, there is a lot of anticipation of how Delta Sharing can be extended to non-tabular assets, such as machine learning experiments and models.

In this session, we will cover how we extended the Delta Sharing protocol to other sharing workflows, enabling sharing of ML models, arbitrary files and more. The development resulted in Arcuate, a Databricks Labs project with a data sharing flavor. The session will start with the high-level approach and how it can be extended to cover other similar use cases. It will then move to our implementation and how it integrates seamlessly with the Databricks-managed Delta Sharing server and notebooks. We conclude with lessons learned and our vision for a future of data sharing and beyond.

Talk by: Vuong Nguyen and Milos Colic

Enabling Data Governance at Enterprise Scale Using Unity Catalog

Amgen has invested in building modern, cloud-native enterprise data and analytics platforms over the past few years with a focus on tech rationalization, data democratization, overall user experience, increased reusability, and cost-effectiveness. One of these platforms is our Enterprise Data Fabric, which focuses on pulling in data across functions and providing capabilities to integrate and connect the data and govern access. For a while, we have been trying to set up robust data governance capabilities that are simple and easy to manage through Databricks. There were a few tools in the market that solved a few immediate needs, but none solved the problem holistically. For use cases like maintaining governance on highly restricted data domains like Finance and HR, a long-term solution native to Databricks and addressing the limitations below was deemed important:

  • The way these tools were set up allowed a few security policies to be overridden
  • Tools were not up to date with the latest Databricks Runtime (DBR)
  • Complexity of implementing fine-grained security
  • Policy management split between AWS IAM and in-tool policies

To address these challenges, and for large-scale enterprise adoption of our governance capability, we started working on UC integration with our governance processes, with the aim of realizing the following tech benefits:

  • Independent of Databricks runtime
  • Easy fine-grained access control
  • Eliminated management of IAM roles
  • Dynamic access control using UC and dynamic views
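
The last item, dynamic access control with dynamic views, can be sketched as a small helper that renders the SQL for such a view. This is an illustration rather than Amgen's implementation: the catalog, schema, table, column, and group names are hypothetical, and the helper assumes trusted identifiers. `is_account_group_member` is the Databricks SQL function commonly used for group checks in Unity Catalog dynamic views.

```python
def dynamic_view_sql(view: str, table: str, columns: list, masked: set, group: str) -> str:
    """Render a CREATE VIEW statement that hides masked columns from non-members."""
    items = []
    for col in columns:
        if col in masked:
            # Members of the group see the value; everyone else sees NULL.
            items.append(
                f"CASE WHEN is_account_group_member('{group}') "
                f"THEN {col} ELSE NULL END AS {col}"
            )
        else:
            items.append(col)
    select_list = ",\n  ".join(items)
    return f"CREATE OR REPLACE VIEW {view} AS\nSELECT\n  {select_list}\nFROM {table}"


# Hypothetical names for illustration only.
sql = dynamic_view_sql(
    view="hr.restricted.employee_v",
    table="hr.restricted.employee",
    columns=["employee_id", "department", "salary"],
    masked={"salary"},
    group="hr_restricted",
)
print(sql)
```

Users are then granted access to the view rather than the underlying table, so the group check is evaluated at query time for whoever runs the query.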

Today, using UC, we are able to implement fine-grained access control and governance for the restricted data of Amgen. We are in the process of devising a realistic migration and change management strategy across the enterprise.

Talk by: Lakhan Prajapati and Jaison Dominic

Extending Lakehouse Architecture with Collaborative Identity

Lakehouse architecture has become a valuable solution for unifying data processing for AI, but faces limitations in maximizing data’s full potential. Additional data infrastructure is helpful for strengthening data consolidation and data connectivity with third-party sources, which are necessary for building full data sets for accurate audience modeling. 

In this session, LiveRamp will demonstrate to data and analytics decision-makers how to build on the Lakehouse architecture with extensions for collaborative identity graph construction, including how to simplify and improve data enrichment, data activation, and data collaboration. LiveRamp will also introduce a complete data marketplace, which enables easy, pseudonymized data enhancements that widen the attribute set for better behavioral model construction.

With these techniques and technologies, enterprises across financial services, retail, media, travel, and more can safely unlock partner insights and ultimately produce more accurate inputs for personalization engines, and more engaging offers and recommendations for customers.

Talk by: Erin Boelkens and Shawn Gilleran

Here’s more to explore: A New Approach to Data Sharing: https://dbricks.co/44eUnT1

JetBlue’s Real-Time AI & ML Digital Twin Journey Using Databricks

JetBlue has embarked over the past year on an AI and ML transformation. Databricks has been instrumental in this transformation due to the ability to integrate streaming pipelines, ML training using MLflow, ML API serving using ML registry and more in one cohesive platform. Real-time streams of weather, aircraft sensors, FAA data feeds, JetBlue operations and more are used for the world's first AI and ML operating system orchestrating a digital twin, known as BlueSky, for efficient and safe operations. JetBlue has over 10 ML products (multiple models per product) in production across multiple verticals including dynamic pricing, customer recommendation engines, supply chain optimization, customer sentiment NLP and several more.

The core JetBlue data science and analytics team consists of Operations Data Science, Commercial Data Science, AI and ML engineering and Business Intelligence. To facilitate the rapid growth and faster go-to-market strategy, the team has built an internal Data Catalog + AutoML + AutoDeploy wrapper called BlueML using Databricks features to empower data scientists including advanced analysts with the ability to train and deploy ML models in less than five lines of code.

Talk by: Derrick Olson and Rob Bajra

Rapidly Implementing Major Retailer API at the Hershey Company

Accurate, reliable, and timely data is critical for CPG companies to stay ahead in highly competitive retailer relationships, and for a company like the Hershey Company, the commercial relationship with Walmart is one of the most important. The team at Hershey found themselves with a looming deadline for their legacy analytics services and targeted a migration to the brand new Walmart Luminate API. Working in partnership with Advancing Analytics, the Hershey Company leveraged a metadata-driven Lakehouse Architecture to rapidly onboard the new Luminate API, helping the category management teams to overhaul how they measure, predict, and plan their business operations.

In this session, we will discuss the impact Luminate has had on Hershey's business covering key areas such as sales, supply chain, and retail field execution, and the technical building blocks that can be used to rapidly provision business users with the data they need, when they need it. We will discuss how key technologies enable this rapid approach, with Databricks Autoloader ingesting and shaping our data, Delta Streaming processing the data through the lakehouse and Databricks SQL providing a responsive serving layer. The session will include commentary as well as cover the technical journey.

Talk by: Simon Whiteley and Jordan Donmoyer

Self-Service Data Analytics and Governance at Enterprise Scale with Unity Catalog

This session focuses on one of the first Unity Catalog implementations for a large-scale enterprise. In this scenario, a cloud-scale analytics platform with 7,500 active users based on the lakehouse approach is used. In addition, there is potential for 1,500 further users who are subject to special governance rules. They are consuming more than 600 TB of data stored in Delta Lake, continuously growing at more than 1 TB per day. This might grow further due to local country data. Therefore, the existing data platform must be extended to enable users to combine global and local data from their countries. A new data management approach was required, one that reflects the strict information security rules on a need-to-know basis. The core requirements are: read only from global data, write into local data, and share the results.

Due to very pronounced information security awareness and a lack of technological options, interdisciplinary analysis and exchange of data has so far been difficult or not possible at all. As a result, a lot of business potential and gains could not be identified and realized.

With the new developments in the technology used and the basis of the lakehouse approach, thanks to Unity Catalog, we were able to develop a solution that meets high requirements for security and process and enables globally secured interdisciplinary data exchange and analysis at scale. This solution enables the democratization of the data. This results not only in the ability to gain better insights for business management, but also to generate entirely new business cases or products that require a higher degree of data integration and encourage the culture to change. We highlight technical challenges and solutions, present best practices and point out benefits of implementing Unity Catalog for enterprises.

Talk by: Artem Meshcheryakov and Pascal van Bellen

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Sponsored by: Avanade | Accelerating Adoption of Modern Analytics and Governance at Scale

To unlock all the competitive advantage Databricks offers your organization, you might need to update your strategy and methodology for the platform. With more than 1,000 Databricks projects completed globally in the last 18 months, we are going to share our insights on the best building blocks to target as you search for efficiency and competitive advantage.

The building blocks that support this include enterprise metadata and data management services, a data management foundation, and data services and products that enable business units to fully use their data and analytics at scale.

In this session, Avanade data leaders will highlight how Databricks’ modern data stack fits the Azure PaaS and SaaS ecosystem (such as Microsoft Fabric), how Unity Catalog metadata supports automated data operations scenarios, and how we are helping clients measure the business impact and value of modern analytics and governance.

Talk by: Alan Grogan and Timur Mehmedbasic

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Sponsored by: ThoughtSpot | Drive Self-Service Adoption Through the Roof with Embedded Analytics

When it comes to building stickier apps and products to grow your business, there's no greater opportunity than embedded analytics. Data apps that deliver superior user engagement and business value do analytics differently. They take a user-first approach and know how to deliver real-time, AI-powered insights - not just to internal employees - but to an organization’s customers and partners, as well.

Learn how ThoughtSpot Everywhere is helping companies like Emerald natively integrate analytics with other tools in their modern data stack to deliver a blazing-fast and instantly available analytics experience across all the data their users love. Join this session to learn how you can leverage embedded analytics to:

  • Drive higher app engagement
  • Get your app to market faster
  • Create new revenue streams

Talk by: Krishti Bikal and Vika Smilansky

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Why a Major Japanese Financial Institution Chose Databricks To Accelerate its Data AI-Driven Journey

In this session, NTT DATA presents a case study involving one of the largest and most prominent financial institutions in Japan. The project involved migration from the largest data analysis platform to Databricks, a project that required careful navigation of very strict security requirements while accommodating the needs of evolving technical solutions so they could support a wide variety of company structures. This session is for those who want to accelerate their business by effectively utilizing AI as well as BI.

NTT DATA is one of the largest system integrators in Japan, providing data analytics infrastructure to leading companies to help them effectively drive the democratization of data and AI as many in the Japanese market are now adding AI into their BI offering.

Talk by: Yuki Saito

Databricks and Delta Lake: Lessons Learned from Building Akamai's Web Security Analytics Product

Akamai is a leading content delivery network (CDN) and cybersecurity company operating hundreds of thousands of servers in more than 135 countries worldwide. In this session, we will share our experiences and lessons learned from building and maintaining the Web Security Analytics (WSA) product, an interactive analytics platform powered by Databricks and Delta Lake that enables customers to efficiently analyze and take informed action on a high volume of streaming security events.

The WSA platform must be able to serve hundreds of queries per minute, scanning hundreds of terabytes of data from a six-petabyte data lake, with most queries returning results within ten seconds, for both aggregation queries and needle-in-a-haystack queries. This session will cover how to use Databricks SQL warehouses and job clusters cost-effectively, and how to improve query performance using tools and techniques such as Delta Lake, Databricks Photon, and partitioning. This talk will be valuable for anyone looking to build and operate a high-performance analytics platform.

Talk by: Tomer Patel and Itai Yaffe
