talk-data.com

Topic: Analytics (data_analysis, insights, metrics), 170 tagged

Activity trend: 398 peak/qtr, 2020-Q1 to 2026-Q1

Activities (filtered by: Databricks DATA + AI Summit 2023)
Discover Data Lakehouse With End-to-End Lineage

Data lineage is key to managing change, ensuring data quality, and implementing data governance in an organization. It serves several use cases:

• Data Governance: For compliance and regulatory purposes, our customers are required to prove that the data and reports they submit came from a trusted and verified source. This typically means identifying the tables and datasets used in a report or dashboard and tracing the sources of those tables and fields. Another governance use case is understanding how sensitive data has spread within the lakehouse.
• Data Discovery: Data analysts who want to self-serve and build their own analytics and models typically spend time exploring and understanding the data in their lakehouse. Lineage is a key piece of information that improves the understanding and trustworthiness of the data the analyst plans to use.
• Problem Identification: Data teams are often called on to resolve errors in analysts' dashboards and reports ("Why is the total number of widgets in this report different from the one I built?"). This usually leads to an expensive forensic exercise by the data engineering (DE) team to understand the sources of the data and the transformations applied to it before it reaches the report.
• Change Management: It is not uncommon for data sources to change: a source may stop delivering data, or a field in the source system may change its semantics. In this scenario, the DE team wants to understand the downstream impact of the change, that is, how many datasets and users will be affected. This helps them gauge the impact, manage user expectations, and address issues ahead of time.

In this talk, we will describe in detail how we capture table and column lineage for Spark, Delta, and Unity Catalog for our customers, and how users can leverage data lineage to serve the use cases above.
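The talk does not publish its lineage-capture code. As a rough illustration of the source-tracing and change-management use cases, here is a minimal sketch assuming lineage has already been captured as a table-level directed graph. The table names and the `downstream`/`upstream` helpers are hypothetical, and real Unity Catalog lineage is captured and queried through its own tooling, not this code.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical table-level lineage graph: each table maps to the tables
# built directly from it. Names are illustrative, not from the talk.
LINEAGE: Dict[str, List[str]] = {
    "raw.orders": ["silver.orders"],
    "raw.customers": ["silver.customers"],
    "silver.orders": ["gold.daily_sales", "gold.widget_totals"],
    "silver.customers": ["gold.daily_sales"],
    "gold.daily_sales": ["dash.sales_report"],
    "gold.widget_totals": ["dash.widget_report"],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search for everything reachable from `start`
    (change management: what is affected if `start` changes?)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Everything `target` is derived from (governance and problem
    identification: where did this report's data come from?)."""
    reverse: Dict[str, List[str]] = {}
    for parent, children in graph.items():
        for child in children:
            reverse.setdefault(child, []).append(parent)
    return downstream(reverse, target)


if __name__ == "__main__":
    print(sorted(downstream(LINEAGE, "raw.orders")))
    print(sorted(upstream(LINEAGE, "dash.widget_report")))
```

In this toy graph, a change to `raw.orders` reaches five downstream objects, including both dashboards, while `dash.widget_report` traces back to `raw.orders` as its only raw source. Column-level lineage could use the same traversal with (table, column) pairs as nodes.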

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Enable Production ML with Databricks Feature Store

Productionizing ML models is hard. In fact, very few ML projects make it to production, and one of the hardest problems is data! Most AI platforms are disconnected from the data platform, making it challenging to keep features constantly updated and available in real time. Offline/online skew prevents models from being used in real time or, worse, introduces bugs and biases in production. Building systems to enable real-time inference requires valuable production engineering resources. As a result of these challenges, most ML models never see the light of day.

Learn how you can simplify production ML using Databricks Feature Store, the first feature store built on the data lakehouse. Data sources for features are drawn from a central data lakehouse, and the feature tables themselves are tables in the lakehouse, accessible in Spark and SQL for both machine learning and analytics use cases. Features, data pipelines, source data, and models can all be co-governed in a central platform. Feature Store is seamlessly integrated with Apache Spark™, enabling automatic lineage tracking, and with MLflow, enabling models to look up feature values automatically at inference time. See these capabilities in action and learn how you can use them in your ML projects.

Enabling Business Users to Perform Interactive Ad-Hoc Analysis over Delta Lake with No Code

In this talk, we'll first introduce Sigma Workbooks along with its technical design motivations and architectural details. Sigma Workbooks is an interactive visual data analytics system that enables business users to easily perform complex ad-hoc analysis over data in cloud data warehouses (CDWs). We'll then demonstrate the expressivity, scalability, and ease-of-use of Sigma Workbooks through real-life use cases over datasets stored in Delta Lake. We’ll conclude the talk by sharing the lessons that we have learned throughout the design and implementation iterations of Sigma Workbooks.

Evolution of Data Architectures and How to Build a Lakehouse

Data architecture is a key part of the larger picture when building robust analytics and AI applications. One must take a holistic view of the entire data analytics landscape when planning data science initiatives.

In this talk, learn about the evolution of the data landscape and why lakehouses are becoming the de facto standard for organizations building scalable data architectures. A lakehouse architecture combines the data management capabilities of the data warehouse, including reliability, integrity, and quality, with the low cost and open approach of data lakes, and supports all data workloads, including BI and AI.

Data practitioners will also learn core concepts for building an efficient lakehouse with Delta Lake.

How AARP Services, Inc. automated SAS transformation to Databricks using LeapLogic

While SAS has been a standard for analytics and data science use cases, it is not cloud-native and does not scale well. Join us to learn how AARP automated the conversion of hundreds of complex data processing, model scoring, and campaign workloads to Databricks using LeapLogic, an intelligent code transformation accelerator that can transform any and all legacy ETL, analytics, data warehouse, and Hadoop workloads to modern data platforms.

In this session, experts from AARP and Impetus will share how they collaborated with Databricks and how they were able to:
• Automate modernization of SAS marketing analytics based on coding best practices
• Establish a rich library of Spark and Python equivalent functions on Databricks with the same capabilities as SAS procedures, DATA step operations, macros, and functions
• Leverage Databricks-native services like Delta Live Tables to implement waterfall techniques for campaign execution and simplify pipeline monitoring

How to Automate the Modernization and Migration of Your Data Warehousing Workloads to Databricks

The logic in your data is the heartbeat of your organization’s reports, analytics, dashboards and applications. But that logic is often trapped in antiquated technologies that can’t take advantage of the massive scalability in the Databricks Lakehouse.

In this session, BladeBridge will show how to automate the conversion of this metadata and code into Databricks PySpark and DBSQL. BladeBridge will demonstrate how its tooling can be configured for N legacy technologies, enabling an automated path not just for a single modernization project but for a factory approach to company-wide modernization.

BladeBridge will also present how you can empirically size your migration project to determine the level of effort required.

In this session you will learn:
• What BladeBridge Converter is
• What BladeBridge Analyzer is
• How BladeBridge configures Readers and Writers
• How to size a conversion effort
• How to accelerate adoption of Databricks

AI-Fueled Forecasting: The Next Generation of Financial Planning

In an age of data abundance and digital disruption, CFOs are adopting next-generation planning capabilities to drive strategic decision-making in real time. The future of forecasting is AI-driven. PrecisionView™, Deloitte’s proprietary forecasting solution, combines data aggregation technologies with predictive analytics and machine learning capabilities to help businesses improve forecasting accuracy.

Attend this webinar to hear about:
• AI-powered financial planning that helps generate high-impact insights by incorporating the organization’s internal data and a myriad of external macroeconomic factors
• Examples of how companies have achieved success using scenario modelling
• Databricks’ compute capabilities that allow for parallel processing, which helps generate near-real-time forecasts at the most granular levels

Big Data in the Age of Moneyball

Data and predictions have permeated sports, and our conversations about them, since the beginning. Who will win the big game this weekend? How many points will your favorite player score? How much money will be guaranteed in the next free-agent contract? One could argue that data-driven decisions in sports started with Moneyball in baseball in 2003. In the two decades since, data and technology have exploded onto the scene. The Texas Rangers are using modern cloud software, such as Databricks, to make sense of this data and provide actionable information to build a World Series team on the field. From computer vision, pose analytics, and player tracking to pitch design, base-stealing likelihood, and more, come see how the Texas Rangers are using innovative cloud technologies to create action-driven reports from today's sea of big data. Finally, this talk will demonstrate how the Texas Rangers use MLflow and the MLflow Model Registry inside Databricks to organize their predictive models.

Detecting Financial Crime Using an Azure Advanced Analytics Platform and MLOps Approach

As gatekeepers of the financial system, banks play a crucial role in reporting possible instances of financial crime. At the same time, criminals continuously reinvent their approaches to hide their activities among dense transaction data. In this talk, we describe the challenges of detecting money laundering and outline why employing machine learning via MLOps is critically important to identify complex and ever-changing patterns.

In anti-money laundering (AML), machine learning answers a pressing need for vigilance and efficiency where previous-generation systems fall short. We will demonstrate how our open platform facilitates a gradual migration toward a model-driven landscape, using transaction monitoring as an example to showcase applications of supervised and unsupervised learning, human explainability, and model monitoring. This environment enables us to drive change from the ground up in how the bank understands its clients in order to detect financial crime.

From PostGIS to Spark SQL: The History and Future of Spatial SQL

In this talk, we'll review the major milestones that have defined Spatial SQL as the powerful tool for geospatial analytics that it is today.

From the early foundations of the JTS Topology Suite and GEOS, and their application in the PostGIS extension for PostgreSQL, to the latest implementation in Spark SQL using libraries such as the CARTO Analytics Toolbox for Databricks, Spatial SQL has been a key component of many geospatial analytics products and solutions. It leverages the computing power of different databases with SQL as the lingua franca, allowing easy adoption by data scientists, analysts, and engineers.

The latest innovation in this area is the CARTO Spatial Extension for Databricks, which makes the most of the near-unlimited scalability provided by Spark and the cutting-edge geospatial capabilities that CARTO offers.

Graph-based stream processing

Understanding the complex relationships and interdependencies between data points is crucial to many decision-making processes.

Graph analytics have found their way into every major industry, from marketing and financial services to transportation. Fraud detection, recommendation engines and process optimization are some of the use cases where real-time decisions are mission-critical, and the underlying domain can be easily modeled as a graph.
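The abstract names fraud detection as a real-time graph use case but does not describe the speakers' system. As a minimal sketch of one common streaming-graph building block, assuming edges (for example, an account using a device) arrive one at a time as (source, destination) pairs, the code below maintains connected components incrementally with union-find and raises an alert when a new edge links a clean component to one containing a flagged vertex. The vertex names, `UnionFind`, and `process_stream` are hypothetical illustrations, not the talk's approach.

```python
from typing import Dict, Hashable, Iterable, Iterator, Set, Tuple


class UnionFind:
    """Incremental connected components (union by size + path compression)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        # Register unseen vertices lazily as singleton components.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint every vertex on the path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> Hashable:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger one
        self.size[ra] += self.size[rb]
        return ra


def process_stream(
    edges: Iterable[Tuple[Hashable, Hashable]], flagged: Set[Hashable]
) -> Iterator[Tuple[Hashable, Hashable]]:
    """Consume edges one at a time and yield each edge that connects a
    clean component to a component containing a flagged vertex."""
    uf = UnionFind()
    tainted = {uf.find(v) for v in flagged}  # roots of flagged components
    for src, dst in edges:
        ra, rb = uf.find(src), uf.find(dst)
        if ra == rb:
            continue  # edge inside an existing component: nothing new
        a_bad, b_bad = ra in tainted, rb in tainted
        root = uf.union(ra, rb)
        tainted.discard(ra)
        tainted.discard(rb)
        if a_bad or b_bad:
            tainted.add(root)
        if a_bad != b_bad:
            yield (src, dst)


if __name__ == "__main__":
    stream = [("acct_A", "acct_B"), ("acct_B", "device_1"),
              ("acct_X", "device_2"), ("acct_C", "device_2"),
              ("device_1", "acct_C")]
    for alert in process_stream(stream, {"acct_X"}):
        print("alert:", alert)
```

Union-find keeps each update nearly constant-time, which suits per-event processing, but it only supports edge insertions; expiring old edges or sliding time windows would need a different structure.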

Hassle-Free Data Ingestion into the Lakehouse

Ingesting data from hundreds of different data sources is critical before organizations can execute advanced analytics, data science, and machine learning. Unfortunately, ingesting and unifying this data to create a reliable single source of truth is usually extremely time-consuming and costly. In this session, discover how Databricks simplifies low-latency data ingestion with SQL-only ingestion capabilities. We will discuss and demonstrate how you can easily and quickly ingest any data into the lakehouse. The session will also cover newly released features and tools that make data ingestion even simpler on the Databricks Lakehouse Platform.

How Robinhood Built a Streaming Lakehouse to Bring Data Freshness from 24h to Less Than 15 Mins

Robinhood’s data lake is the bedrock foundation that powers business analytics, product experimentation, and other machine learning applications throughout our organization. Come join this session where we will share our journey of building a scalable streaming data lakehouse with Spark, Postgres and other leading open source technologies.

We will lay out our architecture in depth and describe how we perform change data capture (CDC) streaming ingestion and incremental processing of thousands of Postgres tables into our data lake.
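The abstract does not detail Robinhood's pipeline. As a minimal in-memory sketch of the core idea behind incremental CDC ingestion, assuming each change event carries an operation, a primary key, and a monotonically increasing log sequence number (LSN): collapse each micro-batch to the latest change per key, then upsert or delete, skipping events older than what has already been applied. The `ChangeEvent` schema and `apply_batch` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change from a source table (hypothetical schema)."""
    op: str                               # "insert", "update" or "delete"
    key: int                              # primary key of the source row
    lsn: int                              # monotonically increasing log position
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


def apply_batch(
    table: Dict[int, Dict[str, Any]],
    versions: Dict[int, int],
    events: Iterable[ChangeEvent],
) -> Dict[int, Dict[str, Any]]:
    """Apply one micro-batch of CDC events to `table` in place.

    `versions` records the last LSN applied per key, so replayed or
    out-of-order events older than what is already applied are skipped.
    """
    # 1. Collapse the batch to the newest event per key.
    latest: Dict[int, ChangeEvent] = {}
    for ev in events:
        if ev.key not in latest or ev.lsn > latest[ev.key].lsn:
            latest[ev.key] = ev
    # 2. Upsert or delete, ignoring anything already applied.
    for key, ev in latest.items():
        if ev.lsn <= versions.get(key, -1):
            continue
        if ev.op == "delete":
            table.pop(key, None)
        else:  # insert and update are both upserts of the new row image
            table[key] = dict(ev.row or {})
        versions[key] = ev.lsn
    return table
```

Keeping the last applied LSN per key makes the step idempotent, so a replayed batch after a failure leaves the table unchanged. In a Delta Lake pipeline, the same two steps would plausibly map to deduplicating each micro-batch and running a `MERGE INTO` keyed on the primary key.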

How the Largest County in the US is Transforming Hiring with a Modern Data Lakehouse

Los Angeles County’s Department of Human Resources (DHR) is responsible for attracting a diverse workforce for the 37 departments it supports. Each year, DHR processes upwards of 400,000 job applications for the County, one of the largest employers in the nation. Managing a hiring process at this scale is complex, involving factors such as background checks and skills examinations. If not managed properly, these processes can create bottlenecks and a poor experience for both candidates and hiring managers.

To identify areas for improvement, DHR set out to build detailed operational metrics across each stage of the hiring process. DHR used to conduct high-level analysis manually using Excel and other disparate tools. The data itself was limited and difficult to obtain and analyze. In addition, it took analysts weeks to manually pull data from half a dozen siloed systems into Excel for cleansing and analysis. This process was labor-intensive, inefficient, and prone to human error.

To overcome these challenges, DHR, in partnership with the Internal Services Department (ISD), adopted a modern data architecture in the cloud. Powered by the Azure Databricks Lakehouse, DHR was able to bring its diverse data together into a single platform for data analytics. Manual ETL processes that took weeks are now automated and run in 10 minutes or less. With this new architecture, DHR has built business intelligence dashboards that unpack the hiring process, showing clearly where the bottlenecks are and tracking how quickly candidates move through it. The dashboards allow County departments to innovate and make changes that improve the experience of potential job seekers and the timeliness of securing highly qualified and diverse County personnel at all employment levels.

In this talk, we’ll discuss DHR’s journey towards building a data-driven hiring process, the architecture decisions that enabled this transformation and the types of analytics that we’ve deployed to improve hiring efforts.

Interactive Analytics on a Massive Scale Using Delta Lake

Interactive, near-real-time analytics is a common requirement for many data teams across different fields.

In the field of web security, interactive analytics allows end users to get real-time or historical insights about the state of their protected resources at any point in time and to take action accordingly.

One of the hardest aspects of enabling interactive, near-real-time analytics at massive scale is keeping response times low. Scanning hundreds of terabytes of non-aggregated event data stored in Delta Lake and still returning an answer within a few seconds can be a major challenge.

In this talk, we will cover:
• How we built a 5 PB Delta Lake of non-aggregated security events
• What challenges we encountered along the way (reducing Delta log scans, improving cache affinity, reducing storage throttling errors, etc.)
• How we overcame them one by one

Analytics Engineering and the Great Convergence | Tristan Handy | Keynote Data + AI Summit 2022

We've come a long way from the way data analysis used to be done. The emergence of the analytics engineering workflow, with dbt at its center, has helped usher in a new era of productivity. Not quite data engineering or data analysis, analytics engineering has enabled new levels of collaboration between two key sets of practitioners.

But that's not the only coming together happening right now. Enabled by the open lakehouse, the worlds of data analysis and AI/ML are also converging under a single roof, hinting at a new future of intertwined workloads and silo-free collaboration. It's a future that's tantalizing, and entirely within reach. Let's talk about making it happen.

Day 1 Afternoon Keynote | Data + AI Summit 2022

• Supercharging our data architecture at Coinbase using Databricks Lakehouse | Eric Sun | Keynote
• Partner Connect & Ecosystem Strategy | Zaheera Valani
• What are ELT and CDC, and why are all the cool kids doing it? | George Fraser
• Analytics without Compromise | Francois Ajenstat
• Fireside Chat with Zhamak Dehghani and Arsalan Tavakoli

Day 1 Morning Keynote | Data + AI Summit 2022

• Welcome & "Destination Lakehouse" | Ali Ghodsi
• Apache Spark Community Update | Reynold Xin
• Streaming Lakehouse | Karthik Ramasamy
• Delta Lake | Michael Armbrust
• How Adobe migrated to a unified and open data Lakehouse to deliver personalization at unprecedented scale | Dave Weinstein
• Data Governance and Sharing on Lakehouse | Matei Zaharia
• Analytics Engineering and the Great Convergence | Tristan Handy
• Data Warehousing | Shant Hovsepian
• Unlocking the power of data, AI & analytics: Amgen’s journey to the Lakehouse | Kerby Johnson

Get insights on how to launch a successful lakehouse architecture in Rise of the Data Lakehouse by Bill Inmon, the father of the data warehouse. Download the ebook: https://dbricks.co/3ER9Y0K

Financial Services Experience at Data + AI Summit 2022

The future of Financial Services is open, with data and AI at its core. Welcome, data teams and executives in Financial Services! This year’s Data + AI Summit is jam-packed with talks, demos, and discussions on how Financial Services leaders are harnessing the power of data and analytics to digitally transform, minimize risk, accelerate time to market, and drive sustainable value creation. To help you take full advantage of the Financial Services industry experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Financial Services Industry Forum: Our flagship event for Financial Services attendees at Summit, featuring keynotes and panel discussions with ADP, Northwestern Mutual, Point72 Asset Management, S&P Global and EY, followed by networking. More details in the agenda below.
• Financial Services Lounge: Stop by our lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including Accenture, Avanade, Deloitte and others.
• Session Talks: Over 15 technical talks and demos on topics including hyper-personalization, AI-fueled forecasting, enterprise analytics in cloud, scaling privacy and cybersecurity, MLOps in cryptocurrency, ethical credit scoring and more.

Health Care and Life Sciences Experience at Data + AI Summit 2022

Welcome, data teams and executives in the Healthcare and Life Sciences industry! This year’s Data + AI Summit is jam-packed with talks, demos, and discussions on the biggest innovations in patient care and drug R&D. To help you take full advantage of the Healthcare and Life Sciences experience at Summit, we’ve curated all the programs in one place.

Highlights at this year’s Summit:

• Healthcare and Life Sciences Industry Forum: Our capstone event for Healthcare and Life Sciences attendees at Summit, featuring keynotes and panel discussions with Walgreens, Takeda, Optum, and Humana, followed by networking. More details in the agenda below.
• Healthcare and Life Sciences Lounge: Stop by our industry lounge located outside the Expo floor to meet with Databricks’ industry experts and see solutions from our partners including ZS Associates, John Snow Labs and others.
• Session Talks: Over 10 technical talks on topics including healthcare NLP, knowledge graphs for R&D, commercial analytics, and predicting hospital readmissions.
