talk-data.com

Topic: Cloud Computing

Tags: infrastructure, saas, iaas

Tagged activities: 99

Activity Trend (2020-Q1 to 2026-Q1): peak of 471 activities per quarter

Activities

Showing filtered results

Filtering by: Databricks DATA + AI Summit 2023
Building & Managing a Data Platform for a Delta Lake Exceeding 13PB & 1000s of Users: AT&T's Story

Data runs AT&T’s business, just like it runs most businesses these days. Data can lead to a greater understanding of a business and, when translated correctly into information, can give human and business systems valuable insights for making better decisions. Unique to AT&T are the volume of data we support, how much of our work is driven by AI, and the scale at which data and AI drive value for our customers and stakeholders.

Our cloud migration journey includes making data and AI more accessible to employees throughout AT&T so they can use their deep business expertise to leverage data more easily and rapidly. We always had to balance this data democratization and desire for speed with keeping our data private and secure. We loved the open ecosystem model of the Lakehouse, which enables data, BI, and ML tools to be seamlessly integrated in a single pane of glass; it simplifies the architecture and reduces dependencies between technologies in the cloud. Being clear in our architecture guidelines and patterns was very important to our success.

We are seeing more interest from our business unit partners and are continuing to build out our AI-as-a-service capability to support more citizen data scientists. To scale up our Lakehouse journey, we built a Databricks center of excellence (CoE) function in AT&T, which today has more than 1,400 active members. The CoE concentrates existing expertise and resources in the ML/AI discipline to collaborate on all things Databricks, such as technical support, training, FAQs, and best practices, to attain and sustain world-class performance and drive business value for AT&T. Join us to learn more about how we process and manage over 10 petabytes of network data in our Lakehouse with Delta Lake and Databricks.

What’s New in Unity Catalog -- With Live Demos

Join the Unity Catalog product team and dive into the cutting-edge world of data, analytics, and AI governance. With Unity Catalog’s unified governance solution for data, analytics, and AI on any cloud, you’ll discover the latest enhancements we’re shipping, including fine-grained governance with row/column filtering, automated data lineage, and governance for ML assets.
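
For a flavor of what row/column filtering looks like in practice, here is a minimal sketch of the row-filter and column-mask DDL run from a notebook; the catalog, table, column, and group names are all hypothetical:

    # Minimal sketch of fine-grained governance in Unity Catalog.
    # All object and group names below are hypothetical placeholders.

    # Row filter: admins see every row, everyone else sees only EMEA rows.
    spark.sql("""
    CREATE OR REPLACE FUNCTION main.sales.emea_only(region STRING)
    RETURNS BOOLEAN
    RETURN is_account_group_member('admins') OR region = 'EMEA'
    """)
    spark.sql("ALTER TABLE main.sales.orders SET ROW FILTER main.sales.emea_only ON (region)")

    # Column mask: redact card numbers for users outside the payments team.
    spark.sql("""
    CREATE OR REPLACE FUNCTION main.sales.mask_card(card STRING)
    RETURNS STRING
    RETURN CASE WHEN is_account_group_member('payments_team') THEN card ELSE '****' END
    """)
    spark.sql("ALTER TABLE main.sales.orders ALTER COLUMN card_number SET MASK main.sales.mask_card")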

In this demo-packed session, you’ll learn how new capabilities in Unity Catalog can further simplify your data governance and accelerate your analytics and AI initiatives. Plus, get an exclusive sneak peek at our upcoming roadmap. And don’t forget, you’ll have the chance to ask the product teams themselves any burning questions you have about the best governance solution for the lakehouse. Don’t miss out on this exciting opportunity to level up your data game with Unity Catalog.

Talk by: Paul Roome

Live from the Lakehouse: Data sharing, Databricks marketplace, and Fivetran & cloud data platforms

Hear from two guests. First, Zaheera Valani (Sr. Director, Engineering at Databricks) discusses data sharing and the Databricks Marketplace. Second, Taylor Brown (COO and co-founder, Fivetran) discusses cloud data platforms, automating data ingestion from thousands of disparate data sources, and how Fivetran and Databricks partner. Hosted by Holly Smith (Sr. Resident Solutions Architect, Databricks) and Jimmy Obeyeni (Strategic Account Executive, Databricks).

Cloud and Data Science Modernization of Veterans Affairs Financial Service Center with Azure Databricks

The Department of Veterans Affairs (VA) is home to over 420,000 employees, provides health care for 9.16 million enrollees and manages the benefits of 5.75 million recipients. The VA also hosts an array of financial management, professional, and administrative services at their Financial Service Center (FSC), located in Austin, Texas. The FSC is divided into various service groups organized around revenue centers and product lines, including the Data Analytics Service (DAS). To support the VA mission, in 2021 FSC DAS continued to press forward with their cloud modernization efforts, successfully achieving four key accomplishments:

• Office of Community Care (OCC) Financial Time Series Forecast - Financial forecasting enhancements to predict claims
• CFO Dashboard - Productivity and capability enhancements for financial and audit analytics
• Datasets Migrated to the Cloud - Migration of on-prem datasets to the cloud for downstream analytics (includes a supply chain proof of concept)
• Data Science Hackathon - A hackathon to predict bad claims codes and demonstrate DAS’s ability to accelerate an ML use case using Databricks AutoML (see the sketch after this list)
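
As a sketch of the AutoML piece, the Databricks AutoML Python API can train a baseline classifier in a few lines; the table and label column below are hypothetical stand-ins, not the actual claims data:

    # Hypothetical Databricks AutoML sketch (placeholder table/column names).
    # Assumes a Databricks ML runtime where `spark` and databricks.automl exist.
    from databricks import automl

    claims = spark.table("fsc_das.claims_training")  # placeholder training table
    summary = automl.classify(
        dataset=claims,
        target_col="is_bad_claim",  # placeholder label column
        timeout_minutes=30,
    )
    print(summary.best_trial.model_path)  # best model is logged to MLflow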

This talk discusses FSC DAS’ cloud and data science modernization accomplishments in 2021, lessons learned, and what’s ahead.

Secure Data Distribution and Insights with Databricks on AWS

Every industry must satisfy some form of compliance or data security requirement in order to operate. As data becomes more mission-critical to the organization, so does the need to protect and secure it.

Public Sector organizations are responsible for securing sensitive data sets and complying with regulatory programs such as HIPAA, FedRAMP, and StateRAMP.

This comes as no surprise given the many attacks targeting the industry and the extremely sensitive nature of the large volumes of data stored and analyzed. For a product owner or DBA, this can be extremely overwhelming, with a security team issuing more restrictions while data access becomes an increasingly common request among business users. It can be difficult to find an effective governance model that democratizes data while also managing compliance across your hybrid estate.

In this session, we will discuss challenges faced in the public sector when expanding to the AWS cloud. We will review best practices for managing access and data integrity for a cloud-based data lakehouse with Databricks and discuss recommended approaches for securing your AWS Cloud environment. We will highlight ways to enable compliance by developing a continuous monitoring strategy and provide tips for implementing defense in depth. This guide provides critical questions to ask, an overall strategy, and specific recommendations for security leaders and data engineers in the Public Sector.

This talk is intended to educate on security design considerations when extending your data warehouse to the cloud. This guidance is expected to grow and evolve as new standards and offerings emerge for local, state, and federal government.

So Fresh and So Clean: Learn How to Build Real-Time Warehouses on Lakehouse

Warehouses? Where we are going, we won't need warehouses! Join Dillon, Franco, and Shannon as they take TPC-DI, an industry-standard data warehouse integration benchmark modeled on a typical '80s-style data warehouse, and bring it into the future. We will review how to implement standard data warehousing practices on the Lakehouse and show you how to deliver optimal price/performance in the cloud while keeping your data so fresh and so clean. We will take an assortment of structured, semi-structured, and unstructured data in the form of CSV, TXT, XML, and fixed-width files, and transform them warehouse-style into the Lakehouse with a historical load and incremental CDC loads.
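
As a rough sketch of that incremental pattern (our illustration, not the speakers' code), Auto Loader can ingest raw change files and a Delta MERGE can apply them; the paths, table, and schema are hypothetical:

    # Hypothetical sketch: incremental CDC load into a Delta table.
    from delta.tables import DeltaTable

    # Ingest newly arriving CSV change files incrementally with Auto Loader.
    changes = (spark.readStream.format("cloudFiles")
               .option("cloudFiles.format", "csv")
               .option("header", "true")
               .schema("customer_id INT, name STRING, op STRING")  # placeholder schema
               .load("/raw/tpcdi/customers/"))                     # placeholder path

    def upsert_batch(batch_df, batch_id):
        # Apply each micro-batch warehouse-style: update matches, insert new rows.
        target = DeltaTable.forName(spark, "warehouse.dim_customer")
        (target.alias("t")
           .merge(batch_df.alias("s"), "t.customer_id = s.customer_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())

    changes.writeStream.foreachBatch(upsert_batch).start()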

A Case Study in Rearchitecting an On-Premises Pipeline in the Cloud

This talk gives a detailed discussion of some of the many considerations that must be taken into account when rebuilding on-premises data pipelines in the cloud. I will give an initial overview of the original pipeline and the reasons we chose to migrate it to Azure. Next, I will discuss the decisions that led to the architecture we used to replace the original pipeline and give a thorough overview of the new cloud pipeline, including design components and networking. I will also discuss the many lessons we learned along the way to successfully migrating this pipeline.

How To Make Apache Spark on Kubernetes Run Reliably on Spot Instances

Since the general availability of Apache Spark’s native support for running on Kubernetes with Spark 3.1 in March 2021, the Spark community has increasingly chosen to run on k8s to benefit from containerization, efficient resource sharing, and the tools of the cloud-native ecosystem.

Data teams are faced with complexities in this transition, including how to leverage spot VMs. These instances enable up to 90% cost savings but are not guaranteed to be available and face the risk of termination. This session will cover concrete guidelines on how to make Spark run reliably on spot instances, with code examples from real-world use cases.

Main topics:
• Using spot nodes for Spark executors
• Mixing instance types and sizes to reduce the risk of spot interruptions (cluster autoscaling)
• Spark 3.0: graceful decommissioning - preserve shuffle files on executor shutdown
• Spark 3.1: PVC reuse on executor restart - disaggregate compute and shuffle storage (see the configuration sketch after this list)
• What to look for in future Spark releases
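
As a hedged illustration of the decommissioning and PVC-reuse bullets, these are the Spark properties involved; exact availability varies by Spark version, and the storage class and sizes below are assumptions:

    # Sketch: a SparkSession configured for spot-friendly execution on Kubernetes.
    from pyspark.sql import SparkSession

    pvc = "spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1"
    spark = (SparkSession.builder
        # Graceful decommissioning: migrate shuffle/RDD blocks before spot reclaim.
        .config("spark.decommission.enabled", "true")
        .config("spark.storage.decommission.enabled", "true")
        .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
        .config("spark.storage.decommission.rddBlocks.enabled", "true")
        # PVC reuse: keep shuffle data on PVCs and reattach when executors restart.
        .config("spark.kubernetes.driver.ownPersistentVolumeClaim", "true")
        .config("spark.kubernetes.driver.reusePersistentVolumeClaim", "true")
        .config(pvc + ".options.claimName", "OnDemand")
        .config(pvc + ".options.storageClass", "gp2")   # assumption
        .config(pvc + ".options.sizeLimit", "100Gi")    # assumption
        .config(pvc + ".mount.path", "/data")
        .getOrCreate())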

How To Use Databricks SQL for Analytics on Your Lakehouse

Most organizations run complex cloud data architectures that silo applications, users, and data. As a result, most analysis is performed on stale data, and there isn’t a single source of truth for analytics.

Join this interactive follow-along deep dive demo to learn how Databricks SQL allows you to operate a multicloud lakehouse architecture that delivers data warehouse performance at data lake economics — with up to 12x better price/performance than traditional cloud data warehouses. Now data analysts and scientists can work with the freshest and most complete data and quickly derive new insights for accurate decision-making.

Here’s what we’ll cover:
• Managing data access and permissions, and monitoring how data is being used and accessed in real time across your entire lakehouse infrastructure
• Configuring and managing compute resources for fast performance, low latency, and high user concurrency against your data lake
• Creating and working with queries, dashboards, query refresh, troubleshooting features, and alerts
• Creating connections to third-party BI and database tools (Power BI, Tableau, DbVisualizer, etc.) so that you can query your lakehouse without changing your analytical and dashboarding workflows (see the connection sketch after this list)
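
For the third-party-tools bullet, the programmatic equivalent is the databricks-sql-connector package; the hostname, HTTP path, and token below are placeholders you would take from your own SQL warehouse:

    # Sketch: querying a Databricks SQL warehouse from Python (placeholder creds).
    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
        http_path="/sql/1.0/warehouses/abcdef1234567890",              # placeholder
        access_token="dapi...",                                        # placeholder
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute("SELECT current_catalog(), current_timestamp()")
            print(cursor.fetchall())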

AI powered Assortment Planning Solution

For shop owners to maximize revenue, they need to ensure that the right products are available on the right shelf at the right time. So, how does one assort the right mix of products to maximize profit and reduce inventory pressure? Today, these decisions are led by human knowledge of trends and inputs from salespeople. This is error-prone and cannot scale with a growing product assortment and varying demand patterns. Mindtree has analyzed this problem and built a cloud-based AI/ML solution that provides contextual, real-time insights and optimizes inventory management. In this presentation, you will hear our solution approach to help global CPG organizations promote new products, increase demand across their product offerings, and drive impactful insights. You will also learn about the technical solution architecture, the orchestration of product and KPI generation using Databricks, the AI/ML models, and heterogeneous cloud platform options for deployment and rollout, scale-up, and scale-out.

A Practitioner's Guide to Unity Catalog—A Technical Deep Dive

As a practitioner, managing and governing data assets and ML models in the data lakehouse is critical for your business initiatives to be successful. With Databricks Unity Catalog, you have a unified governance solution for all data and AI assets in your lakehouse, giving you much better performance, management, and security on any cloud. When deploying Unity Catalog for your lakehouse, you must be prepared with best practices to ensure a smooth governance implementation. This session will cover key considerations for a successful implementation, such as:
• How to manage Unity Catalog’s metastore and understand various usage patterns
• How to use identity federation to assign account principals to a Databricks workspace
• Best practices for leveraging cloud storage, managed tables, and external tables with Unity Catalog (see the setup sketch after this list)
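
To make the managed-versus-external distinction concrete, here is a minimal, hypothetical setup sketch; the catalog, schema, storage-credential, and path names are all assumptions:

    # Hypothetical Unity Catalog setup sketch (all names are placeholders).
    # Assumes a notebook attached to a UC-enabled Databricks workspace.

    # An external location binds a cloud storage path to a storage credential.
    spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS finance_landing
    URL 's3://acme-finance-landing/'
    WITH (STORAGE CREDENTIAL acme_cred)
    """)

    # Managed table: Unity Catalog owns the underlying files.
    spark.sql("CREATE CATALOG IF NOT EXISTS finance")
    spark.sql("CREATE SCHEMA IF NOT EXISTS finance.core")
    spark.sql("CREATE TABLE IF NOT EXISTS finance.core.ledger (id INT, amount DOUBLE)")

    # External table: the data lives at a path you manage yourself.
    spark.sql("""
    CREATE TABLE IF NOT EXISTS finance.core.raw_ledger (id INT, amount DOUBLE)
    LOCATION 's3://acme-finance-landing/raw_ledger'
    """)

    # Governance is plain SQL grants on the securable objects.
    spark.sql("GRANT SELECT ON TABLE finance.core.ledger TO `analysts`")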

Meshing About with Databricks

Large enterprises are increasingly de-centralizing their data teams to increase overall business agility. The cloud has been a big enabler for teams to become more autonomous in the data products they prioritize, the technology they choose, and the ability to attribute costs granularly.

In order for organizations to successfully realize such aspirations, it is in their best interest to shift from centralized teams and centralized technology to a more distributed ecosystem built around business domains.

The data mesh is an architecture paradigm that many enterprises are looking to adopt to realize this vision. It proposes that distributed autonomous domains leverage self-serve data infrastructure as a platform to enable their work of creating and maintaining sharable data products.

This session will explain how Databricks can be used to implement a Data Mesh across an enterprise.

We will demonstrate how:
- A new data team can be onboarded quickly
- Consumers can discover data products and their lineage
- Domains can publish data products and set governance policies (see the sketch after this list)
- Data can be accessed within and external to the enterprise
- Analysis can be shared
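
One plausible way a domain publishes a data product for consumers inside and outside the enterprise is Delta Sharing; this sketch uses hypothetical share, table, and recipient names:

    # Hypothetical sketch: a domain publishes a data product via Delta Sharing.
    spark.sql("CREATE SHARE IF NOT EXISTS marketing_products")
    spark.sql("ALTER SHARE marketing_products ADD TABLE marketing.gold.campaign_metrics")

    # An external recipient is granted read access to the share.
    spark.sql("CREATE RECIPIENT IF NOT EXISTS partner_agency")
    spark.sql("GRANT SELECT ON SHARE marketing_products TO RECIPIENT partner_agency")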

Migrate and Modernize your Data Platform with Confluent and Databricks

Moving and building in the cloud to accelerate analytics development requires enterprises to rethink their data infrastructure. Whether you are moving from an on-prem legacy system or you were born in the cloud, businesses are turning to Confluent and Databricks to help them unlock new real-time customer experiences and intelligence for their backend operations.

Join us to see how Confluent and Databricks enable companies to set data in motion across any system, at any scale, in near real-time. Connecting Confluent with Databricks allows companies to migrate and connect data from on-prem databases and data warehouses like Netezza, Oracle, and Cloudera to Databricks in the cloud to power real-time analytics.
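
A common shape for the Confluent-to-Databricks hop is a Structured Streaming job that reads a Kafka topic and lands it in a Delta bronze table; the broker address, topic, and paths below are placeholders (a real Confluent Cloud connection would also need SASL auth options):

    # Sketch: stream a Kafka topic into a Delta bronze table (placeholders).
    from pyspark.sql import functions as F

    events = (spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092")
        .option("subscribe", "orders")             # placeholder topic
        .option("startingOffsets", "earliest")
        .load())

    (events
        .select(F.col("key").cast("string"),
                F.col("value").cast("string"),
                "timestamp")
        .writeStream
        .format("delta")
        .option("checkpointLocation", "/chk/orders_bronze")  # placeholder path
        .toTable("bronze.orders"))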

Power to the (SQL) People: Python UDFs in DBSQL

Databricks SQL (DB SQL) allows customers to leverage the simple and powerful Lakehouse architecture with up to 12x better price/performance compared to traditional cloud data warehouses. Analysts can use standard SQL to easily query data and share insights using a query editor, dashboards or a BI tool of their choice, and analytics engineers can build and maintain efficient data pipelines, including with tools like dbt.

While SQL is great at querying and transforming data, sometimes you need to extend its capabilities with the power of Python, a full programming language. Users of Databricks notebooks already enjoy seamlessly mixing SQL, Python and several other programming languages. Use cases include masking or encrypting and decrypting sensitive data, complex transformation logic, using popular open source libraries or simply reusing code that has already been written elsewhere in Databricks. In many cases, it is simply prohibitive or even impossible to rewrite the logic in SQL.

Until now, there was no way to use Python from within DBSQL. We are removing this restriction with the introduction of Python user-defined functions (UDFs). DBSQL users can now create, manage, and use Python UDFs using standard SQL. UDFs are registered in Unity Catalog, which means they can be governed and used throughout Databricks, including in notebooks.
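
The shape of such a UDF, as a sketch (the function name and masking logic are hypothetical), is standard SQL DDL wrapping a Python body:

    # Hypothetical sketch of a Python UDF created with standard SQL and then
    # called like any other SQL function.
    spark.sql("""
    CREATE OR REPLACE FUNCTION main.default.redact_email(email STRING)
    RETURNS STRING
    LANGUAGE PYTHON
    AS $$
      # Plain Python: mask everything before the @ sign.
      if email is None:
          return None
      local, _, domain = email.partition('@')
      return '***@' + domain
    $$
    """)

    spark.sql("SELECT main.default.redact_email('jane.doe@example.com')").show()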

Amgen’s Journey To Building a Global 360 View of its Customers with the Lakehouse

Serving patients in over 100 countries, Amgen is a leading global biotech company focused on developing therapies that have the power to save lives. Delivering on this mission requires our commercial teams to regularly meet with healthcare providers to discuss new treatments that can help patients in need. With the onset of the pandemic, when face-to-face interactions with doctors and other Healthcare Providers (HCPs) were severely impacted, Amgen had to rethink these interactions. With that in mind, the Amgen Commercial Data and Analytics team leveraged a modern data and AI architecture built on the Databricks Lakehouse to help accelerate its digital and data insights capabilities. This foundation enabled Amgen’s teams to develop a comprehensive, customer-centric view to support flexible go-to-market models and provide personalized experiences to our customers.

In this presentation, we will share our recent journey: how we took an agile approach to bringing together over 2.2 petabytes of internally generated and externally sourced vendor data and onboarded it into our AWS Cloud and Databricks environments to enable standardized, scalable, and robust capabilities that meet the business requirements of our fast-changing life sciences environment. We will share use cases of how we harmonized and managed our diverse sets of data to deliver efficiency, simplification, and performance outcomes for the business. We will cover the following aspects of our journey, along with best practices we learned over time:

• Our architecture supporting Amgen’s Commercial Data & Analytics processing, running constantly around the globe
• Engineering best practices for building large-scale data lakes and analytics platforms, such as team organization, data ingestion and data quality frameworks, a DevOps toolkit, maturity frameworks, and more
• Databricks capabilities adopted, such as Delta Lake, workspace policies, SQL workspace endpoints, and MLflow for model registry and deployment, as well as various tools built for Databricks workspace administration
• Databricks capabilities being explored for the future, such as multi-task orchestration, container-based Apache Spark processing, Feature Store, Repos for Git integration, etc.
• The types of commercial analytics use cases we are building on the Databricks Lakehouse platform

Attendees building global, enterprise-scale data engineering solutions to meet diverse sets of business requirements will benefit from learning about our journey. Technologists will learn how we addressed specific business problems via reusable capabilities built to maximize value.

Introduction to Flux and OSS Replication

In this breakout session we’ll learn about Flux, the data scripting and query language for InfluxDB, the leading time series database platform. With Flux you can perform time series lifecycle management tasks, data preparation and analytics, alerting tasks, and more. InfluxDB has two offerings: InfluxDB Cloud and InfluxDB OSS. Finally, we’ll learn how you can use Flux and the replication tool to consolidate data from OSS instances running at the edge into InfluxDB Cloud.
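
For a taste of Flux (the query is illustrative, and the bucket, measurement, URL, org, and token are made up), a downsampling query can be run through the InfluxDB Python client:

    # Sketch: running an illustrative Flux query via the InfluxDB Python client.
    # pip install influxdb-client ; URL, org, and token are placeholders.
    from influxdb_client import InfluxDBClient

    flux = '''
    from(bucket: "edge_sensors")
      |> range(start: -1h)
      |> filter(fn: (r) => r._measurement == "temperature")
      |> aggregateWindow(every: 5m, fn: mean)
    '''

    with InfluxDBClient(url="https://us-east-1-1.aws.cloud2.influxdata.com",
                        token="my-token", org="my-org") as client:
        for table in client.query_api().query(flux):
            for record in table.records:
                print(record.get_time(), record.get_value())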

Migrate Your Existing DAGs to Databricks Workflows

In this session, you will learn the benefits of orchestrating your business-critical ETL and ML workloads within the lakehouse, as well as how to migrate and consolidate your existing workflows to Databricks Workflows - a fully managed lakehouse orchestration service that allows you to run workflows on any cloud. We’ll walk you through different migration scenarios and share lessons learned and recommendations to help you reap the benefits of orchestration with Databricks Workflows.
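
For orientation, a DAG of tasks in Databricks Workflows is expressed as a job with tasks and depends_on edges. This sketch mirrors the Jobs API 2.1 payload; the notebook paths, cluster settings, and names are hypothetical:

    # Hypothetical sketch of a two-task DAG as a Databricks Jobs 2.1 payload,
    # e.g. to POST to /api/2.1/jobs/create with a valid workspace token.
    job_spec = {
        "name": "nightly_etl",
        "tasks": [
            {
                "task_key": "ingest",
                "notebook_task": {"notebook_path": "/Repos/etl/ingest"},  # placeholder
                "job_cluster_key": "etl_cluster",
            },
            {
                "task_key": "transform",
                "depends_on": [{"task_key": "ingest"}],  # runs after ingest succeeds
                "notebook_task": {"notebook_path": "/Repos/etl/transform"},
                "job_cluster_key": "etl_cluster",
            },
        ],
        "job_clusters": [
            {
                "job_cluster_key": "etl_cluster",
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder
                    "node_type_id": "i3.xlarge",          # placeholder
                    "num_workers": 2,
                },
            }
        ],
    }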

Migrating Complex SAS Processes to Databricks - Case Study

Many federal agencies use SAS software for critical operational data processes. While SAS has historically been a leader in analytics, data analysts have often used it for ETL purposes as well. However, modern data science demands ever-increasing volumes and types of data, requiring a shift to modern cloud architectures and modern data management tools and paradigms for ETL/ELT. In this presentation, we provide a case study from the Centers for Medicare and Medicaid Services (CMS) detailing the approach and results of migrating a large, complex legacy SAS process to modern, open-source/open-standard technology - Spark SQL and Databricks - to produce results ~75% faster without reliance on proprietary constructs of the SAS language, with more scalability, and in a manner that can more easily ingest old rules and better govern the inclusion of new rules and data definitions. This session describes the significant technical and business benefits derived from the modernization effort.
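
To illustrate the kind of translation involved (this example is ours, not from the CMS codebase), a SAS DATA step with conditional logic maps naturally onto Spark SQL:

    # Illustrative translation of a SAS-style DATA step into Spark SQL.
    # SAS (conceptually):
    #   data claims_flagged;
    #     set claims;
    #     if paid_amount > 10000 then high_cost = 1; else high_cost = 0;
    #   run;
    spark.sql("""
    CREATE OR REPLACE TEMP VIEW claims_flagged AS
    SELECT *,
           CASE WHEN paid_amount > 10000 THEN 1 ELSE 0 END AS high_cost
    FROM claims
    """)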

Modern Architecture of a Cloud-Enabled Data and Analytics Platform

In today’s IT organization, whether it is the delivery of a sophisticated analytical model, a product advancement decision, or understanding the behavior of a customer, the fact remains that in every instance we rely on data to make good, informed decisions. Given this backdrop, having an architecture that supports the ability to efficiently collect data from a wide range of sources within the company remains an important goal of every data organization.

In this session we will explain how Bayer has deployed a hybrid data platform that integrates key legacy data systems of the past while taking full advantage of what a modern cloud data platform has to offer in terms of scalability and flexibility. We will elaborate on the platform’s most significant component, Databricks, which provides not only a sophisticated data pipelining solution but also a complete ecosystem for teams to create data and analytical solutions in a flexible and agile way.

Near Real-Time Analytics with Event Streaming, Live Tables, and Delta Sharing

Microservices is an increasingly popular architecture, much loved by application teams because it allows services to be developed and scaled independently. Data teams, though, often need a centralized repository where data from all the different services comes together to be joined and aggregated. The data platform can serve as a single source of company facts, enable near-real-time analytics, and secure the sharing of massive data sets across clouds.

A viable microservices ingestion pattern is Change Data Capture (CDC), using AWS Database Migration Service or Debezium. CDC proves to be a scalable solution ideal for stable platforms, but it poses several challenges for evolving services: frequent schema changes, complex or unsupported DDL during migration, and automated deployments are but a few. An event streaming architecture can address these challenges.

Confluent, for example, provides a schema registry service where all services can register their event schemas. Schema registration helps verify that events are published according to the agreed contracts between data producers and consumers. It also provides a separation between internal service logic and the data consumed downstream. The services write their events to Kafka using the registered schemas, with a specific topic per event type.

Data teams can leverage Spark jobs to ingest Kafka topics into Bronze tables in the Delta Lake. On ingestion, the registered schema from the schema registry is used to validate each payload against the expected version. A merge operation is sometimes applied to translate events into the final state of each record, per business requirements.

Data teams can take advantage of Delta Live Tables on streaming datasets to produce Silver and Gold tables in near real time. Each input data source also has a set of expectations to enforce data quality and business rules. The pipeline allows engineering and analytics teams to collaborate by mixing Python and SQL. The refined data sets are then fed into AutoML for discovery and baseline modeling.
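
A minimal Delta Live Tables sketch of the expectations idea (table names and rules are hypothetical):

    # Hypothetical Delta Live Tables snippet with data-quality expectations.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(name="orders_silver")
    @dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop bad rows
    @dlt.expect("positive_amount", "amount > 0")  # violations tracked, rows kept
    def orders_silver():
        # Refine the streaming Bronze table in near real time.
        return (dlt.read_stream("orders_bronze")
                  .withColumn("ingested_at", F.current_timestamp()))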

To expose Gold tables to more consumers, especially non-Spark users across clouds, data teams can implement Delta Sharing. Recipients can access Silver tables from a different cloud and build their own analytics data sets. Analytics teams can also access Gold tables via the pandas Delta Sharing client and BI tools.
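
The consumer side of that looks roughly like this; the profile file and share coordinates are placeholders that a provider would hand out:

    # Sketch: reading a shared Gold table with the delta-sharing pandas client.
    # pip install delta-sharing ; the .share profile file comes from the provider.
    import delta_sharing

    profile = "/path/to/config.share"                        # placeholder profile
    table_url = profile + "#sales_share.gold.daily_revenue"  # share.schema.table

    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())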
