talk-data.com

Topic: Data Analytics

Tags: data_analysis, statistics, insights (23 tagged)

Activities (filtered by: Databricks DATA + AI Summit 2023)

Data Caching Strategies for Data Analytics and AI

The increasing popularity of data analytics and artificial intelligence (AI) has led to a dramatic increase in the volume of data used in these fields, creating a growing need for greater computational capability. Caching plays a crucial role in accelerating data and AI computations, but these domains have different data access patterns and therefore require different caching strategies. In this session, we will share our observations on data access patterns in the analytical SQL and AI training domains, based on practical experience with large-scale systems. We will discuss the results of evaluating various caching strategies for analytical SQL and AI and provide caching recommendations for different use cases. Over the years, we have learned best practices from large internet companies in the following areas:

  1. Traffic pattern for analytical SQL and cache strategy recommendation
  2. Traffic pattern for AI training and how we can measure cache efficiency for different AI training processes
  3. Cache capacity planning based on real-time metrics of the working set
  4. Adaptive caching admission and eviction for uncertain traffic patterns
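The point that different access patterns call for different cache strategies can be illustrated with a small experiment. The sketch below is not the speakers' implementation: it assumes a plain LRU eviction policy and uses a sliding-window count of distinct keys as a stand-in for a working-set metric (`LRUCache` and `working_set_size` are hypothetical names). A small, repeatedly queried hot set fits the cache well, while a cyclic scan only slightly larger than the cache gets no hits at all under LRU.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class LRUCache:
    """Minimal LRU cache that counts hits and misses (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable, load: Callable[[Hashable], object]) -> object:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = load(key)  # fetch from the slower backing store on a miss
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def working_set_size(accesses: List[Hashable], window: int) -> int:
    """Number of distinct keys touched in the most recent `window` accesses."""
    return len(set(accesses[-window:]))


if __name__ == "__main__":
    # Hot-set pattern (e.g. the same few tables queried repeatedly): fits in cache.
    hot = LRUCache(capacity=3)
    for key in [1, 2, 3] * 5:
        hot.get(key, load=lambda k: f"block-{k}")
    print(hot.hit_ratio())  # 0.8

    # Cyclic scan slightly larger than the cache (e.g. re-reading a dataset each epoch).
    scan = LRUCache(capacity=3)
    for key in [1, 2, 3, 4] * 5:
        scan.get(key, load=lambda k: f"block-{k}")
    print(scan.hit_ratio())  # 0.0

    print(working_set_size([1, 2, 3, 4] * 5, window=8))  # 4
```

When the measured working set (4 keys here) exceeds the cache capacity (3), LRU thrashes; either the capacity has to grow or a scan-resistant admission or eviction policy is needed. The specific policies evaluated in the talk are not reproduced here.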

Talk by: Chunxu Tang and Beinan Wang

Here’s more to explore: State of Data + AI Report: https://dbricks.co/44i2HBp Databricks named a Leader in 2022 Gartner® Magic Quadrant™ CDBMS: https://dbricks.co/3phw20d

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Data Democratization at Michelin

Too often, business decisions in large organizations are based on time-consuming and labor-intensive data extracts, or on fragile Excel or Access sheets that require significant manual intervention. The teams that prepare these manual reports have invaluable heuristic knowledge that, when combined with meaningful data and tools, can drive smart business decisions. Imagine a world where these business teams are empowered with tools that help them build meaningful reports despite their limited technical expertise.

In this session, we will discuss:
- The value derived from investing in developing citizen data personas within a business organization
- How we successfully built a citizen data analytics culture within Michelin
- Real examples of the impact of this initiative on the business and on the people themselves

The audience will walk away with convincing arguments for building a citizen data culture in their organization and a how-to cookbook they can use to cultivate citizen data personas. Finally, attendees can interactively uncover the key success factors from the Michelin case that can help drive a similar initiative in their own companies.

Talk by: Philippe Leonhart and Fabien Cochet

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Learnings From the Field: Migration From Oracle DW and IBM DataStage to Databricks on AWS

Legacy data warehouses are costly to maintain, unscalable and cannot deliver on data science, ML and real-time analytics use cases. Migrating from your enterprise data warehouse to Databricks lets you scale as your business needs grow and accelerate innovation by running all your data, analytics and AI workloads on a single unified data platform.

In the first part of this session, we will guide you through a well-designed process and set of tools that take you from the assessment phase to the actual implementation of an EDW migration project. We will also address ways to convert proprietary PL/SQL code to open-standard Python code, taking advantage of PySpark for ETL workloads and of the power of Databricks SQL for data analytics workloads.

The second part of this session is based on an EDW migration project for SNCF (the French national railway company), one of Databricks' major enterprise customers in France. Databricks partnered with SNCF to migrate its real estate entity from Oracle DW and IBM DataStage to Databricks on AWS. We will walk you through the customer context, the urgency of the migration, the challenges, the target architecture, the nitty-gritty details of the implementation, and the best practices, recommendations, and lessons learned for executing a successful migration project in a very accelerated time frame.

Talk by: Himanshu Arora and Amine Benhamza

Self-Service Data Analytics and Governance at Enterprise Scale with Unity Catalog

This session focuses on one of the first Unity Catalog implementations at a large-scale enterprise. The scenario is a cloud-scale analytics platform, based on the lakehouse approach, with 7,500 active users, plus a potential 1,500 further users who are subject to special governance rules. These users consume more than 600 TB of data stored in Delta Lake, which is growing continuously by more than 1 TB per day, and this volume may grow further as local country data is added. The existing data platform therefore had to be extended so that users can combine global data with local data from their own countries. This required a new approach to data management that reflects strict need-to-know information security rules. The core requirements are: read-only access to global data, write access to local data, and the ability to share the results.

Until now, a very pronounced awareness of information security, combined with a lack of technological options, made it difficult or impossible to analyze and exchange data across disciplines. As a result, a great deal of business potential could not be identified or realized.

Building on new developments in the underlying technology and on the lakehouse approach, Unity Catalog allowed us to develop a solution that meets high security and process requirements and enables globally secured, interdisciplinary data exchange and analysis at scale. This solution democratizes the data. As a result, the business can not only gain better insights for management, but also create entirely new business cases or products that require a higher degree of data integration, while encouraging a change in culture. We highlight the technical challenges and solutions, present best practices, and point out the benefits of implementing Unity Catalog for enterprises.

Talk by: Artem Meshcheryakov and Pascal van Bellen

Here’s more to explore: Data, Analytics, and AI Governance: https://dbricks.co/44gu3YU

Why a Major Japanese Financial Institution Chose Databricks To Accelerate its Data AI-Driven Journey

In this session, NTT DATA presents a case study involving one of the largest and most prominent financial institutions in Japan. The project involved a migration from the largest data analysis platform to Databricks, which required careful navigation of very strict security requirements while accommodating the needs of evolving technical solutions so that they could support a wide variety of company structures. This session is for those who want to accelerate their business by using AI as well as BI effectively.

NTT DATA is one of the largest system integrators in Japan, providing data analytics infrastructure that helps leading companies drive the democratization of data and AI, as many players in the Japanese market are now adding AI to their BI offerings.

Talk by: Yuki Saito

Sponsored: AWS-Real Time Stream Data & Vis Using Databricks DLT, Amazon Kinesis, & Amazon QuickSight

Amazon Kinesis Data Analytics is a managed service that can capture streaming data from IoT devices. The Databricks Lakehouse Platform makes it easy to process streaming and batch data using Delta Live Tables. Amazon QuickSight provides advanced visualization capabilities and integrates directly with Databricks. By combining these services, customers can easily capture, process, and visualize data from hundreds or thousands of IoT sensors.

Talk by: Venkat Viswanathan

Here’s more to explore: Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV The Data Team's Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Streaming Data Analytics with Power BI and Databricks

This session consists of a series of end-to-end technical demos illustrating the synergy between Databricks and Power BI for streaming use cases, along with considerations for when to choose each scenario:

Scenario 1: DLT + Power BI Direct Query and Auto Refresh

Scenario 2: Structured Streaming + Power BI streaming datasets

Scenario 3: DLT + Power BI composite datasets

Talk by: Liping Huang and Marius Panga

Writing Data-Sharing Apps Using Node.js and Delta Sharing

JavaScript remains the top programming language today, with the largest number of code repositories on GitHub. However, JavaScript is evolving beyond a language for web application development into a language built for tomorrow. Everyday tasks like data wrangling, data analysis, and predictive analytics are now possible directly from a web browser. For example, many popular data analytics libraries now offer JavaScript versions, such as TensorFlow.js.

Another popular library, Danfo.js, makes it possible to wrangle data using familiar pandas-like operations, shortening the learning curve and arming the typical data engineer or data scientist with another data tool in their toolbox. In this presentation, we’ll explore using the Node.js connector for Delta Sharing to build a data analytics app that summarizes a Twitter dataset.

Talk by: Will Girten

Sponsored by: Microsoft | Next-Level Analytics with Power BI and Databricks

The widely adopted combination of Power BI and Databricks has been a game-changer in providing a comprehensive solution for modern data analytics. In this session, you’ll learn how self-service analytics combined with the Databricks Lakehouse Platform can allow users to make better-informed decisions by unlocking insights hidden in complex data. We’ll provide practical examples of how organizations have leveraged these technologies together to drive digital transformation, lower total cost of ownership (TCO), and increase revenue. By the end of the presentation and demo, you’ll understand how Power BI and Databricks can help drive real-time insights at scale for organizations in any industry.

Talk by: Bob Zhang and Mahesh Prakriya

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

  1. Office of Community Care (OCC) Financial Time Series Forecast: financial forecasting enhancements to predict claims
  2. CFO Dashboard: productivity and capability enhancements for financial and audit analytics
  3. Datasets Migrated to the Cloud: migration of on-prem datasets to the cloud for downstream analytics (includes a supply chain proof of concept)
  4. Data Science Hackathon: a hackathon to predict bad claims codes and demonstrate DAS's ability to accelerate an ML use case using Databricks AutoML

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.

Productionizing Ethical Credit Scoring Systems with Delta Lake, Feature Store and MLFlow

Fairness, Ethics, Accountability and Transparency (FEAT) are must-haves for high-stakes machine learning models. In particular, models within the Financial Services industry, such as those that assign credit scores, can impact people’s access to housing and utilities and even influence their social standing. Hence, model developers have a moral responsibility to ensure that models do not systematically disadvantage any one group. Nevertheless, implementing such models in industrial settings remains challenging. A lack of concrete guidelines, common standards, and technical templates makes evaluating models from a FEAT perspective infeasible. To address these implementation challenges, the Monetary Authority of Singapore (MAS) set up the Veritas Initiative to create a framework for operationalising the FEAT principles, so as to guide the responsible development of AIDA (Artificial Intelligence and Data Analytics) systems.

In January 2021, MAS announced the successful conclusion of Phase 1 of the Veritas Initiative. Deliverables included an assessment methodology for the Fairness principle and open-source code for applying Fairness metrics to two use cases: customer marketing and credit scoring. In this talk, we demonstrate how these open-source examples and their fairness metrics might be put into production using open-source tools such as Delta Lake and MLflow. Although the Veritas Framework was developed in Singapore, the ethical framework is applicable across geographies.

By doing this, we illustrate how ethical principles can be operationalised, monitored and maintained in production, thus moving beyond only accuracy-based metrics of model performance and towards a more holistic and principled way of developing and productionizing machine learning systems.
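To make "monitoring a fairness metric alongside accuracy" concrete, here is a minimal sketch of one widely used group-fairness measure, the demographic parity difference: the gap in positive-prediction rates between two groups. This is an assumption chosen for illustration; the Veritas methodology defines its own set of fairness metrics, and this is not code from the Veritas open-source deliverables. The function names, the toy predictions, and the group labels `A` and `B` are all hypothetical.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Share of samples in `group` that received a positive (1) prediction."""
    selected = [p for p, g in zip(preds, groups) if g == group]
    if not selected:
        raise ValueError(f"no samples for group {group!r}")
    return sum(selected) / len(selected)


def demographic_parity_difference(
    preds: Sequence[int], groups: Sequence[str], privileged: str, unprivileged: str
) -> float:
    """Positive-rate gap between two groups; 0.0 means parity on this metric."""
    return positive_rate(preds, groups, privileged) - positive_rate(
        preds, groups, unprivileged
    )


if __name__ == "__main__":
    # Toy credit decisions (1 = approved) for two hypothetical groups A and B.
    preds = [1, 1, 0, 1, 0, 0, 1, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    gap = demographic_parity_difference(preds, groups, privileged="A", unprivileged="B")
    print(gap)  # 0.5
```

In a production pipeline, a value like this could be recomputed on each scoring batch and recorded next to accuracy, for example with MLflow's `mlflow.log_metric`, so that changes in fairness are tracked over time rather than checked once at training.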

Securing Databricks on AWS Using Private Link

Minimizing data transfers over the public internet is a top priority for organizations of any size, for both security and cost reasons. Modern cloud-native data analytics platforms need to support deployment architectures that meet this objective. For Databricks on AWS, such an architecture is realized through AWS PrivateLink, which allows computing resources deployed in different virtual private clouds (VPCs) and different AWS accounts to communicate securely without ever crossing the public internet.

In this session, we provide a brief introduction to AWS PrivateLink and its main use cases in the context of a Databricks deployment: securing communications between the control plane and the data plane, and securely connecting to the Databricks web UI. We then provide a step-by-step walkthrough of setting up PrivateLink connections for a Databricks deployment and demonstrate how to automate that process using AWS CloudFormation or Terraform templates.

In this presentation, we will cover the following topics:
- A brief introduction to AWS PrivateLink
- How you can use PrivateLink to secure your Databricks on AWS deployment
- A step-by-step walkthrough of how to set up PrivateLink
- How to automate and scale the setup using AWS CloudFormation or Terraform

Simon Whiteley + Denny Lee Live Ask Me Anything

Simon and Denny Build A Thing is a live webshow, where Simon Whiteley (Advancing Analytics) and Denny Lee (Databricks) are building out a TV Ratings Analytics tool, working through the various challenges of building out a Data Lakehouse using Databricks. In this session, they'll be talking through their Lakehouse Platform, revisiting various pieces of functionality, and answering your questions, Live!

This is your chance to ask questions around structuring a lake for enterprise data analytics, the various ways we can use Delta Live Tables to simplify ETL or how to get started serving out data using Databricks SQL. We have a whole load of things to talk through, but we want to hear YOUR questions, which we can field from industry experience, community engagement and internal Databricks direction. There's also a chance we'll get distracted and talk about the Expanse for far too long.

Customer-centric Innovation to Scale Data & AI Everywhere

Imagine a world where you have the flexibility to infuse intelligence into every application, from edge to cloud. In this session, you will learn how Intel is enabling customer-centric innovation and delivering the simplicity, productivity, and performance that developers need to scale their data and AI solutions everywhere. An overview of Intel's end-to-end data analytics and AI technologies and developer tools will be presented, along with examples of customer use cases.

Correlation Over Causation: Cracking the Relationship Between User Engagement and User Happiness

As head of product on the Confluence team at Atlassian, I own the metrics associated with user happiness. This is a common area of ownership for heads of product, GMs, and CEOs. But how do you actually use data to move the needle on user happiness, and how do you convert user activity and engagement insights into clear actions that positively impact user happiness? In this talk, I will share the approach we developed jointly with our data analytics team to understand, operationalize, and report on our journey to make Confluence users happier. This talk will be useful for data analytics and data science practitioners, product executives, and anyone faced with the task of operationalizing the improvement of a "fuzzy" metric like NPS or CSAT.

The Future is Open - a Look at Google Cloud’s Open Data Ecosystem

Join Anagha Khanolkar and Mansi Maharana, both Cloud Customer Engineers specialized in Advanced Analytics, to learn about Open Data Analytics on Google Cloud. This session will cover Google Data Cloud's Open Data Analytics portfolio, value proposition, customer stories, trends, and more, including Databricks on GCP.

The Future of Data - What’s Next with Google Cloud

Join Bruno Aziza, Head of Data and Analytics, Google Cloud, for an in-depth look at what he is seeing in the future of data and emerging trends. He will also cover Google Cloud’s data analytics practice, including insights into the Data Cloud Alliance, BigLake, and our strategic partnership with Databricks.

Delta Live Tables: Modern Software Engineering and Management for ETL

Data engineers have the difficult task of cleansing complex, diverse data, and transforming it into a usable source to drive data analytics, data science, and machine learning. They need to know the data infrastructure platform in depth, build complex queries in various languages and stitch them together for production. Join this talk to learn how Delta Live Tables (DLT) simplifies the complexity of data transformation and ETL. DLT is the first ETL framework to use modern software engineering practices to deliver reliable and trusted data pipelines at any scale. Discover how analysts and data engineers can innovate rapidly with simple pipeline development and maintenance, how to remove operational complexity by automating administrative tasks and gaining visibility into pipeline operations, how built-in quality controls and monitoring ensure accurate BI, data science, and ML, and how simplified batch and streaming can be implemented with self-optimizing and auto-scaling data pipelines.

Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code

In this talk, we'll first introduce Sigma Workbooks along with its technical design motivations and architectural details. Sigma Workbooks is an interactive visual data analytics system that enables business users to easily perform complex ad-hoc analysis over data in cloud data warehouses (CDWs). We'll then demonstrate the expressivity, scalability, and ease-of-use of Sigma Workbooks through real-life use cases over datasets stored in Delta Lake. We’ll conclude the talk by sharing the lessons that we have learned throughout the design and implementation iterations of Sigma Workbooks.

Evolution of Data Architectures and How to Build a Lakehouse

Data architectures are a key part of the larger picture of building robust analytical and AI applications. One must take a holistic view of the entire data analytics realm when planning data science initiatives.

Through this talk, learn about the evolution of the data landscape and why lakehouses are becoming the de facto standard for organizations building scalable data architectures. A lakehouse architecture combines the data management capabilities of the data warehouse, including reliability, integrity, and quality, with the low cost and open approach of data lakes, and supports all data workloads, including BI and AI.

Data practitioners will also learn some core concepts of building an efficient lakehouse with Delta Lake.
