talk-data.com talk-data.com

Topic

Analytics

data_analysis insights metrics

170

tagged

Activity Trend

398 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: Databricks DATA + AI Summit 2023 ×
Partner Connect & Ecosystem Strategy

Data + AI Summit Keynotes from: Partner Connect & Ecosystem Strategy (Zaheera Valani) What are ELT and CDC, and why are all the cool kids doing it? (George Fraser) Analytics without Compromise (Francois Ajenstat)

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Presto 101: An Introduction to Open Source Presto

Presto is a widely adopted distributed SQL engine for data lake analytics. With Presto, you can perform ad hoc querying of data in place, which helps solve challenges around time to discover and the amount of time it takes to do ad hoc analysis. Additionally, new features like the disaggregated coordinator, Presto-on-Spark, scan optimizations, a reusable native engine, and a Pinot connector enable added benefits around performance, scale, and ecosystem.

In this session, Philip and Rohan will introduce the Presto technology and share why it’s becoming so popular – in fact, companies like Facebook, Uber, Twitter, Alibaba, and much more use Presto for interactive ad hoc queries, reporting & dashboarding data lake analytics, and much more. We’ll also show a quick demo on getting Presto running in AWS.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark

In recent years, latest privacy laws & regulations bring a fundamental shift in the protection of data and privacy, placing new challenges to data applications. To resolve these privacy & security challenges in big data ecosystem without impacting existing applications, several hardware TEE (Trusted Execution Environment) solutions have been proposed for Apache Spark, e.g., PySpark with Scone and Opaque etc. However, to the best of our knowledge, none of them provide full protection to data pipelines in Spark applications. An adversary may still get sensitive information from unprotected components and stages. Furthermore, some of them greatly narrowed supported applications, e.g., only support SparkSQL. In this presentation, we will present a new PPMLA (privacy preserving machine learning and analytics) solution built on top of Apache Spark, BigDL, Occlum and Intel SGX. It ensures all spark components and pipelines are fully protected by Intel SGX, and existing Spark applications written in Scala, Java or Python can be migrated into our platform without any code change. We will demonstrate how to build distributed end-to-end SparkML/SparkSQL workloads with our solution on untrusted cloud environment and share real-world use cases for PPMLA.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Revolutionizing agriculture with AI: Delivering smart industrial solutions built upon a Lakehouse

John Deere is leveraging big data and AI to deliver ‘smart’ industrial solutions that are revolutionizing agriculture and construction, driving sustainability and ultimately helping to feed the world. The John Deere Data Factory that is built upon the Databricks Lakehouse Platform is at the core of this innovation. It ingests petabytes of data and trillions of records to give data teams fast, reliable access to standardized data sets supporting 100s of ML and analytics use cases across the organization. From IoT sensor-enabled equipment driving proactive alerts that prevent failures, to precision agriculture that maximizes field output, to optimizing operations in the supply chain, finance and marketing, John Deere is providing advanced products, technology and services for customers who cultivate, harvest, transform, enrich, and build upon the land.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

ROAPI: Serve Not So Big Data Pipeline Outputs Online with Modern APIs

Data is the key component of Analytics, AI or ML platform. Organizations may not be successful without having a Platform that can Source, Transform, Quality check and present data in a reportable format that can drive actionable insights.

This session will focus on how Capital One HR Team built a Low Cost Data movement Ecosystem that can source data, transform at scale and build the data storage (Redshift) at a level that can be easily consumed by AI/ML programs - by using AWS Services with combination of Open source software(Spark) and Enterprise Edition Hydrograph (UI Based ETL tool with Spark as backend) This presentation is mainly to demonstrate the flexibility that Apache Spark provides for various types ETL Data Pipelines when we code in Spark.

We have been running 3 types of pipelines over 6+ years , over 400+ nightly batch jobs for $1000/mo. (1) Spark on EC2 (2) UI Based ETL tool with Spark backend (on the same EC2) (3) Spark on EMR. We have a CI/CD pipeline that supports easy integration and code deployment in all non-prod and prod regions ( even supports automated unit testing). We will also demonstrate how this ecosystem can failover to a different region in less than 15 minutes , making our application highly resilient.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Swedbank: Enterprise Analytics in Cloud

Swedbank is the largest bank in Sweden & third largest in Nordics. They have about 7-8M customers across retail, mortgage , and investment (pensions). One of the key drivers for the bank was to look at data across all silos and build analytics to drive their ML models - they couldn’t. That’s when Swedbank made a strategic decision to go to the cloud and make bets on Databricks, Immuta, and Azure.

-Enterprise analytics in cloud is an initiative to move Swedbanks on-premise Hadoop based data lake into the cloud to provide improved analytical capabilities at scale. The strategic goals of the “Analytics Data Lake” are: -Advanced analytics: Improve analytical capabilities in terms of functionality, reduce analytics time to market and better predictive modelling -A Catalyst for Sharing Data: Make data Visible, Accessible, Understandable, Linked, and Trusted Technical advancements: Future proof with ability to add new tools/libraries, support for 3rd party solutions for Deep Learning/AI

To achieve these goals, Swedbank had to migrate existing capabilities and application services to Azure Databricks & implement Immuta as its unified access control plane. A “data discovery” space was created for data scientists to be able to come & scan (new) data, develop, train & operationalise ML models. To meet these goals Swedbank requires dynamic and granular data access controls to both mitigate data exposure (due to compromised accounts, attackers monitoring a network, and other threats) while empowering users via self-service data discovery & analytics. Protection of sensitive data is key to enable Swedbank to support key financial services use cases.

The presentation will focus on this journey, calling out key technical challenges, learning & benefits observed.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

In this keynote, you will learn more about Amgen's data platform journey from data warehouse to data lakehouse. They’’ll discuss our decision process and the challenges they faced with legacy architectures, and how they designed and implemented a sustaining platform strategy with Databricks Lakehouse, accelerating their ability to democratize data to thousands of users.
Today, Amgen has implemented 400+ data science and analytics projects covering use cases like clinical trial optimization, supply chain management and commercial sales reporting, with more to come as they complete their digital transformation and unlock the power of data across the company.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

US Air Force: Safeguarding Personnel Data at Enterprise Scale

The US Air Force VAULT platform is a cloud-native enterprise data platform designed to provide the Department of the Air Force (DAF) with a robust, interoperable, and secure data environment. The strategic goals of VAULT include:

  • Leading Data Culture - Increase data use and literacy to improve efficiency and effectiveness of decisions, readiness, mission operations, and cybersecurity.
  • A Catalyst for Sharing Data - Make data Visible, Accessible, Understandable, Linked, and Trusted (VAULT).
  • Driving Data Capabilities - Increase access to the right combination of state-of-the-art technologies needed to best utilize data.

To achieve these goals, the VAULT team created a self-service platform to onboard and extract, transform and load data, perform data analytics, machine learning and visualization, and data governance. Supporting over 50 tenants across NIPR and SIPR, adds complexity to maintaining data security while ensuring data can be shared and utilized for analytics. To meet these goals VAULT requires dynamic and granular data access controls to both mitigate data exposure (due to compromised accounts, attackers monitoring a network, and other threats) while empowering users via self-service analytics. Protection of sensitive data is key to enable VAULT to support key use cases such as personal readiness to optimally place Airmen trainees to meet production goals, increase readiness, and match trainees to their preferences.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Using Feast Feature Store with Apache Spark for Self-Served Data Sharing and Analysis for Streaming

In this presentation we will talk about how we will use available NER based sensitive data detection methods, automated record of activity processing on top of spark and feast for collaborative intelligent analytics & governed data sharing. Information sharing is the key to successful business outcomes but it's complicated by sensitive information both user centric and business centric.

Our presentation is motivated by the need to share key KPIs, outcomes for health screening data collected from various surveys to improve care and assistance. In particular, collaborative information sharing was needed to help with health data management, early detection and prevention of disease KPIs. We will present a framework or an approach we have used for these purposes.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Data + AI Summit 2022 Keynote from John Deere: Revolutionizing agriculture with AI

Hear Ganesh Jayaram, CIO of John Deere, talk about how the company is leveraging big data and AI to deliver ‘smart’ industrial solutions that are revolutionizing agriculture, driving sustainability and ultimately helping to feed the world. The John Deere Data Factory that is built upon the Databricks Lakehouse Platform is at the core of this innovation. It ingests 8 petabytes of data and trillions of records to give data teams fast, reliable access to standardized data sets to deliver over 3000 ML and analytics use cases that democratize data across John Deere, to deliver a culture of empowerment where data is everybody's responsibility.

Visit the Data + AI Summit at https://databricks.com/dataaisummit/