talk-data.com

Topic

Cyber Security

cybersecurity information_security data_security privacy

2078 tagged

Activity Trend: 297 peak activities/quarter, 2020-Q1 to 2026-Q1

Activities

2078 activities · Newest first

Building an Operational Machine Learning Organization from Zero and Leveraging ML for Crypto Security

BlockFi is a cryptocurrency platform that allows its clients to grow wealth through various financial products, including loans, trading, and interest accounts. In this presentation, we will showcase our journey adopting Databricks to build an operational nerve center for analytics across the company. We will demonstrate how to build a cross-functional organization and solve key business problems to earn executive buy-in. We will showcase two of the early successes we've had using machine learning and data science to solve key business challenges in the domains of cyber security and IT operations. In the security domain, we will show how we use graph analytics to analyze millions of blockchain transactions to identify dust attacks and account takeovers and to flag risky transactions. The IT operations use case will show how we use SARIMAX to forecast platform usage patterns and scale our infrastructure, using hourly crypto prices and financial indicators as inputs.
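
To make the forecasting idea concrete, here is a minimal sketch of SARIMAX with an exogenous regressor, using synthetic data and statsmodels; the column names and data are illustrative, not BlockFi's actual pipeline:

```python
# Hedged sketch: forecast hourly platform usage with hourly crypto prices as
# an exogenous regressor (synthetic data, illustrative only).
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = pd.date_range("2022-01-01", periods=24 * 14, freq="H")
price = pd.Series(100 + np.cumsum(np.random.randn(len(rng))), index=rng)
hour = pd.Series(rng.hour, index=rng)
# Usage depends on price plus a daily seasonal cycle, plus noise.
usage = 50 + 0.3 * price + 10 * np.sin(2 * np.pi * hour / 24) + np.random.randn(len(rng))

# Daily seasonality (period 24) with price as the exogenous term.
model = SARIMAX(usage, exog=price, order=(1, 1, 1), seasonal_order=(1, 0, 1, 24))
res = model.fit(disp=False)

# Forecast the next 24 hours; future exog values must be supplied (here: flat prices).
future_index = pd.date_range(rng[-1] + pd.Timedelta(hours=1), periods=24, freq="H")
future_price = pd.Series(price.iloc[-1], index=future_index)
print(res.forecast(steps=24, exog=future_price))
```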


Building Enterprise Scale Data and Analytics Platforms at Amgen

Amgen has developed a suite of enterprise data and analytics platforms, powered by modern, cloud-native, and open-source technologies, that have played a vital role in building game-changing analytics capabilities within the organization. Our platforms include a mature Data Lake with extensive self-service capabilities, a Data Fabric with semantically connected data, a Data Marketplace for advanced cataloging, and an intelligent enterprise search, among others, to solve a range of high-value business problems. In this talk, we - Amgen and our partner ZS Associates - will share learnings from our journey so far, present best practices for building enterprise-scale data and analytics platforms, and describe several business use cases and how we leverage modern technologies such as Databricks to enable our business teams. We will cover use cases related to Delta Lake, microservices, platform monitoring, fine-grained security, and more.


Practical Data Governance in a Large Scale Databricks Environment

Learn from two governance and data practitioners what it takes to do data governance at enterprise scale. This matters because the power of data science lies in the ability to tap into any type of data source and turn it into pure value, yet that power is often at odds with its key enablers, scale and governance, and we must keep finding new ways to bring the focus back to unlocking the insights inside the data. In this session, we will share new agile practices for rolling out governance policies that balance governance and scale. We will unpack how to deliver centralized, fine-grained governance for ML and data transformation workloads that actually empowers data scientists in an enterprise Databricks environment, while ensuring privacy and compliance across hundreds of datasets. Because automation is key to scale, we will also explore how we successfully automated security and governance.


Protecting PII/PHI Data in Data Lake via Column Level Encryption

Data breaches are a concern for any company that collects data, including Northwestern Mutual. Every measure is taken to protect our customers from identity theft and fraud; however, these measures are not sufficient if the security around them is not updated periodically. Multiple layers of encryption are the most common approach to avoiding breaches, but unauthorized internal access to this sensitive data still poses a threat.

This presentation will walk you through the following steps: - Designing encryption at the column level - Protecting PII data that is used as a key for joins - Enabling authorized users to decrypt data at run time - Rotating the encryption keys if needed

At Northwestern Mutual, a combination of Fernet and AES encryption libraries, user-defined functions (UDFs), and Databricks secrets was used to develop a process to encrypt PII information. Access was only provided to those with a business need to decrypt it, which helps mitigate the internal threat. This was also done without duplicating data or metadata (views/tables). Our goal is to help you understand how you can build a secure data lake for your organization that eliminates the threat of data breaches, both internal and external. Associated blog: https://databricks.com/blog/2020/11/20/enforcing-column-level-encryption-and-avoiding-data-duplication-with-pii.html
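
As a rough illustration of the approach, here is a minimal PySpark sketch of a Fernet encryption UDF. The secret scope and column names are hypothetical, and note that Fernet ciphertexts are non-deterministic, so columns used as join keys need a deterministic scheme (e.g., keyed hashing) instead:

```python
# Minimal sketch, not Northwestern Mutual's actual pipeline: encrypt a PII
# column with Fernet via UDFs. In Databricks the key would come from a secret
# scope, e.g. dbutils.secrets.get("pii", "fernet-key") (hypothetical names).
from cryptography.fernet import Fernet
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
key = Fernet.generate_key()  # demo only; load from a secret store in practice

# Construct Fernet inside the lambda so only the key bytes are shipped to executors.
encrypt_pii = udf(
    lambda s: Fernet(key).encrypt(s.encode()).decode() if s is not None else None,
    StringType(),
)
decrypt_pii = udf(
    lambda s: Fernet(key).decrypt(s.encode()).decode() if s is not None else None,
    StringType(),
)

df = spark.createDataFrame([("123-45-6789",)], ["ssn"])
stored = df.withColumn("ssn", encrypt_pii("ssn"))        # what lands in the lake
readable = stored.withColumn("ssn", decrypt_pii("ssn"))  # run-time decryption for authorized users
readable.show()
```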


Securing Databricks on AWS Using PrivateLink

Minimizing data transfers over the public internet is among the top priorities for organizations of any size, for both security and cost reasons. Modern cloud-native data analytics platforms need to support deployment architectures that meet this objective. For Databricks on AWS, such an architecture is realized thanks to AWS PrivateLink, which allows computing resources deployed in different virtual private clouds (VPCs) and different AWS accounts to communicate securely without ever crossing the public internet.

In this session, we will provide a brief introduction to AWS PrivateLink and its main use cases in the context of a Databricks deployment: securing communications between the control and data planes and securely connecting to the Databricks web UI. We will then provide a step-by-step walkthrough of setting up PrivateLink connections with a Databricks deployment and demonstrate how to automate that process using AWS CloudFormation or Terraform templates.

In this presentation we will cover the following topics: - A brief introduction to AWS PrivateLink - How you can use PrivateLink to secure your AWS Databricks deployment - A step-by-step walkthrough of how to set up PrivateLink (see the sketch after this list) - How to automate and scale the setup using AWS CloudFormation or Terraform
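
To give a feel for the setup, here is a hedged boto3 sketch of creating the two VPC interface endpoints such a deployment needs (front-end/REST API and secure cluster connectivity relay). The vpce-svc service names, VPC, subnet, and security-group IDs below are placeholders; the real per-region service names are published in the Databricks PrivateLink documentation:

```python
# Hedged sketch, not a definitive implementation: create interface VPC
# endpoints for a Databricks PrivateLink deployment with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def create_endpoint(service_name, vpc_id, subnet_ids, sg_ids):
    """Create one interface endpoint and return its ID."""
    resp = ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=service_name,
        SubnetIds=subnet_ids,
        SecurityGroupIds=sg_ids,
        PrivateDnsEnabled=True,
    )
    return resp["VpcEndpoint"]["VpcEndpointId"]

# Placeholder service names -- substitute the published per-region values.
rest_api_ep = create_endpoint(
    "com.amazonaws.vpce.us-east-1.vpce-svc-WORKSPACE-PLACEHOLDER",
    "vpc-0abc0000", ["subnet-0aaa0000", "subnet-0bbb0000"], ["sg-0ccc0000"],
)
relay_ep = create_endpoint(
    "com.amazonaws.vpce.us-east-1.vpce-svc-RELAY-PLACEHOLDER",
    "vpc-0abc0000", ["subnet-0aaa0000", "subnet-0bbb0000"], ["sg-0ccc0000"],
)
print(rest_api_ep, relay_ep)
```

The same two endpoints can equally be declared in CloudFormation or Terraform, which is what the session demonstrates for automation at scale.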


Security Best Practices for Lakehouse

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

As you embark on a lakehouse project or evolve your existing data lake, you may want to improve your security posture and take advantage of new security features (there may even be a security team at your company that demands it!). Databricks has worked with thousands of customers to securely deploy the Databricks Platform to meet their architecture and security requirements. While organizations deploy security differently, we have found a common set of guidelines and features among those that require a high level of security. In this talk, we will detail the security features and architectural choices frequently used by these organizations and walk through a series of threat models for the risks that most concern security teams. This session is great for people who already know Databricks, but don't worry: that knowledge isn't required.

You will walk away with a full handbook detailing all of the concepts, configurations, and code from the session so that you can make immediate progress when you get back to the office. Security can be hard, but we've collected the hard work already done by some of the best in the industry to make it easier. Come learn how.


Complete Data Security and Governance Powered by Unity Catalog and Immuta

Join Immuta and Databricks to learn how combining the Databricks Unity Catalog and Immuta’s industry-leading data access platform enables complete data governance with granular security. This new integration makes Immuta-orchestrated attribute-based access control (ABAC) policies even more powerful and non-invasive, taking the solution to new levels and empowering your data platform teams.

During this session, you'll also learn: - Why ABAC is essential for modern data stacks - How customers use an ABAC model to orchestrate complex policies at scale - Details on the Unity Catalog primitives for row- and column-level security (sketched after this list) - How Immuta will scale Unity Catalog enforcement primitives through ABAC and abstractions
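
For context, here is a rough sketch of the Unity Catalog row-filter and column-mask primitives the session refers to, issued from PySpark. The catalog, schema, table, and group names are hypothetical; Immuta orchestrates policies on top of primitives like these:

```python
# Hedged sketch of Unity Catalog row filters and column masks
# (hypothetical names; requires a Unity Catalog-enabled workspace).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Row filter: non-US rows are hidden from everyone outside 'global-sales'.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.demo.us_only(region STRING)
  RETURN is_account_group_member('global-sales') OR region = 'US'
""")
spark.sql("ALTER TABLE main.demo.sales SET ROW FILTER main.demo.us_only ON (region)")

# Column mask: only members of 'hr' see raw SSNs.
spark.sql("""
  CREATE OR REPLACE FUNCTION main.demo.mask_ssn(ssn STRING)
  RETURN CASE WHEN is_account_group_member('hr') THEN ssn ELSE '***-**-****' END
""")
spark.sql("ALTER TABLE main.demo.employees ALTER COLUMN ssn SET MASK main.demo.mask_ssn")
```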


Spark Inception: Exploiting the Apache Spark REPL to Build Streaming Notebooks

Join Scott Haines (Databricks Beacon) as he teaches you to write your own notebook-style service (like Jupyter, Zeppelin, or Databricks) for fun (and profit?). Because haven't we all been a little curious about how notebook environments work? From the outside things probably seem magical, but just below the surface there is a literal world of possibilities waiting to be exploited (both figuratively and literally) to assist in building unimaginable new creations. Curiosity is, of course, the foundation for creativity and novel ideation, and armed with the knowledge you'll pick up in this session, you'll gain an additional perspective and way of thinking (a mental model) for solving complex problems using dynamic, procedural (on-the-fly) code compilation.

Did I mention you'll use Spark Structured Streaming in order to generate a "live" communication channel between your Notebook service and the "outside world"?

Overview: During this session you'll learn to build your own notebook-style service on top of Apache Spark and the Scala ILoop. Along the way, you'll uncover how to harness the SparkContext to manage, drive, and scale your own procedurally defined Apache Spark applications by mixing core configuration and other "magic". As we move through the steps needed to achieve this end result, you'll learn to run individual paragraphs, or an entire synchronous waterfall of paragraphs, leading to the dynamic generation of applications.
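
To make the paragraph-execution idea concrete, here is a tiny Python analogue using the standard library's code.InteractiveInterpreter in place of the Scala ILoop the talk actually uses; the class and method names are illustrative, not from the talk:

```python
# Tiny analogue of the notebook pattern: run "paragraphs" of source code in
# one shared interpreter so state persists between them. Python stand-in for
# the Scala ILoop; illustrative only.
import code

class MiniNotebook:
    """Executes paragraphs of source code in one shared interpreter state."""
    def __init__(self):
        self.interp = code.InteractiveInterpreter(locals={})

    def run_paragraph(self, source: str) -> None:
        # runsource compiles and executes; bindings persist across paragraphs.
        self.interp.runsource(source, symbol="exec")

nb = MiniNotebook()
nb.run_paragraph("x = 21")
nb.run_paragraph("print(x * 2)")  # prints 42: earlier paragraph state is visible
```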

We'll dive deep into the world of possibilities that opens up from a solid understanding of procedurally generated, on-the-fly code compilation (live injection) and its security ramifications (because of course this is unsafe!), and you'll come away with a new mental model focused on architecting composite or auto-generated applications.


Unity Catalog: Journey to Unified Governance for Your Data and AI Assets on Lakehouse

Modern data assets take many forms: not just files or tables, but dashboards, ML models, and unstructured data like video and images, none of which can be governed and managed by legacy data governance solutions. Join this session to learn how data teams can use Unity Catalog to centrally manage all data and AI assets with a common governance model based on familiar ANSI SQL, ensuring better native performance and security. Built-in automated data lineage provides end-to-end visibility into how data flows from source to consumption, so that organizations can identify and diagnose the impact of data changes. Unity Catalog delivers the flexibility to leverage existing data catalogs and solutions and to establish future-proof, centralized governance without expensive migration costs. It also creates detailed audit reports for data compliance and security, while ensuring data teams can quickly discover and reference data for BI, analytics, and ML workloads, accelerating time to value.
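
As a flavor of that ANSI SQL-based model, here is a minimal, hypothetical example of Unity Catalog grants issued from PySpark; the catalog, schema, table, and group names are made up:

```python
# Minimal sketch of Unity Catalog's ANSI SQL grant model (hypothetical names;
# requires a Unity Catalog-enabled workspace).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `data-analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `data-analysts`")
```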


Data Mesh in Action – Building Data Mesh Architecture Pattern with LTI Canvas Alcazar

Data is no longer considered an asset to be protected within teams, but one to be democratized and made available to everyone in the organization in a secure and governed manner. Data Mesh is an evolving data architecture pattern that helps organizations break down data silos and respond quickly to market changes through decentralized data ownership combined with centralized governance and security.

This talk will provide details on, and demonstrate, how to use Databricks Delta Lake with Unity Catalog to implement and operationalize the Data Mesh architecture pattern. The demo includes the LTI Canvas Alcazar solution, which helps accelerate data mesh implementations with Databricks.


Data On-Board: The Aerospace Revolution

From airplanes to satellites, through mission systems, the aerospace industry generates a huge amount of data waiting to be exploited. All this information is shaping new concepts and capabilities that, thanks to artificial intelligence, will forever change the industry: autonomous flight, fault prediction, automatic problem detection, and energy efficiency, among many others. To achieve this, we face countless challenges, such as rigorous AI certification and trustworthiness, safety, data integrity, and security, on this exciting Airbus journey: welcome to the aerospace revolution!


How McAfee Leverages Databricks on AWS at Scale

McAfee, a global leader in online protection, enables home users and businesses to stay ahead of fileless attacks, viruses, malware, and other online threats. Learn how McAfee leverages Databricks on AWS to create a centralized data platform as a single source of truth to power customer insights. We will also describe how McAfee uses additional AWS services, specifically Kinesis and CloudWatch, to provide real-time data streaming and to monitor and optimize its Databricks on AWS deployment. Finally, we'll discuss business benefits and lessons learned during McAfee's petabyte-scale migration to Databricks on AWS using Databricks Delta clone technology coupled with network, compute, and storage optimizations on AWS.


How to Build a Complete Security and Governance Solution Using Unity Catalog

Unity Catalog unifies governance and security for Databricks in one place. It can store data classifications and privileges and enforce them.

This talk will go into the details of Unity Catalog and explain its core building blocks for security and governance. I will also explain how Privacera translates Apache Ranger policies into native Unity Catalog policies, how audits are collected from Unity Catalog and imported into the centralized audit store of Apache Ranger, and how Privacera can extend Unity Catalog.


Interactive Analytics on a Massive Scale Using Delta Lake

Interactive, near-real-time analytics is a common requirement for data teams across many different fields.

In the field of web security, interactive analytics allows end users to get real-time or historical insights about the state of their protected resources at any point in time and take action accordingly.

One of the hardest aspects of enabling interactive, near-real-time analytics at massive scale is achieving a low response time. Scanning hundreds of terabytes of data over a non-aggregated stream of events (a Delta Lake) and still returning an answer within just a few seconds can be a major challenge.

In this talk we will cover: • How we built a 5 PB Delta Lake of non-aggregated security events • The challenges we hit along the way: reducing Delta log scans, improving cache affinity, reducing storage throttling errors, etc. • How we overcame them one by one (see the sketch after this list)
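
For a flavor of the layout work involved, here is a hedged PySpark sketch of the standard Delta Lake levers for low-latency point queries: date partitioning plus Databricks' OPTIMIZE ... ZORDER BY for data skipping. The paths, table, and column names are hypothetical, not the speakers' exact setup:

```python
# Hedged sketch of common Delta Lake layout levers for low-latency queries on
# a large non-aggregated event table. Requires a Databricks runtime for
# OPTIMIZE ... ZORDER BY; names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.read.json("/raw/security-events")  # hypothetical source path

# Partition by date so time-windowed queries prune whole directories.
(events.write.format("delta")
    .partitionBy("event_date")
    .mode("append")
    .save("/delta/security_events"))

# Cluster files by the hot lookup key so data skipping prunes most files.
spark.sql("OPTIMIZE delta.`/delta/security_events` ZORDER BY (customer_id)")

# A point query now reads only a handful of files instead of the full table.
spark.sql("""
  SELECT * FROM delta.`/delta/security_events`
  WHERE event_date = '2022-06-01' AND customer_id = 'c-123'
""").show()
```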


Privacy Preserving Machine Learning and Big Data Analytics Using Apache Spark

In recent years, new privacy laws and regulations have brought a fundamental shift in the protection of data and privacy, posing new challenges to data applications. To resolve these privacy and security challenges in the big data ecosystem without impacting existing applications, several hardware TEE (Trusted Execution Environment) solutions have been proposed for Apache Spark, e.g., PySpark with SCONE, and Opaque. However, to the best of our knowledge, none of them provides full protection to the data pipelines in Spark applications: an adversary may still get sensitive information from unprotected components and stages. Furthermore, some of them greatly narrow the set of supported applications, e.g., supporting only Spark SQL. In this presentation, we will present a new PPMLA (privacy preserving machine learning and analytics) solution built on top of Apache Spark, BigDL, Occlum, and Intel SGX. It ensures all Spark components and pipelines are fully protected by Intel SGX, and existing Spark applications written in Scala, Java, or Python can be migrated to our platform without any code change. We will demonstrate how to build distributed end-to-end Spark ML/Spark SQL workloads with our solution in an untrusted cloud environment and share real-world use cases for PPMLA.


Simplify Global DataOps and MLOps Using Okta’s FIG Automation Library

Think for a moment about an ML pipeline that you have created. Was it tedious to write? Did you have to familiarize yourself with technology outside your normal domain? Did you find many bugs? Did you give up with a "good enough" solution? Even simple ML pipelines are tedious, and with complex ones, even teams that include data engineers and ML engineers still end up with delays and bugs. Okta's FIG (Feature Infrastructure Generator) simplifies this with a configuration language for data scientists that produces scalable and correct ML pipelines, even highly complex ones. FIG is "just a library" in the sense that you can pip install it. Once installed, FIG will configure your AWS account, creating ETL jobs, workflows, and ML training and scoring jobs. Data scientists then use FIG's configuration language to specify features and model integrations. With a single function call, FIG will run an ML pipeline to generate feature data, train models, and create scoring data. Feature generation is performed in a scalable, efficient, and temporally correct manner. Model training artifacts and scoring are automatically labeled and traced. This greatly simplifies the ML prototyping experience. Once it is time to productionize a model, FIG is able to use the same configuration to coordinate with Okta's deployment infrastructure to configure production AWS accounts, register build and model artifacts, and set up monitoring. This talk will demo the use of FIG in the development of Okta's next-generation security infrastructure. The demo includes a walkthrough of the configuration language and how it is translated into AWS during a prototyping session, and will also briefly cover how FIG interacts with Okta's deployment system to make productionization seamless.


US Air Force: Safeguarding Personnel Data at Enterprise Scale

The US Air Force VAULT platform is a cloud-native enterprise data platform designed to provide the Department of the Air Force (DAF) with a robust, interoperable, and secure data environment. The strategic goals of VAULT include:

  • Leading Data Culture - Increase data use and literacy to improve efficiency and effectiveness of decisions, readiness, mission operations, and cybersecurity.
  • A Catalyst for Sharing Data - Make data Visible, Accessible, Understandable, Linked, and Trusted (VAULT).
  • Driving Data Capabilities - Increase access to the right combination of state-of-the-art technologies needed to best utilize data.

To achieve these goals, the VAULT team created a self-service platform to onboard data; extract, transform, and load it; perform data analytics, machine learning, and visualization; and govern data. Supporting over 50 tenants across NIPR and SIPR adds complexity to maintaining data security while ensuring data can be shared and utilized for analytics. To meet these goals, VAULT requires dynamic and granular data access controls that both mitigate data exposure (due to compromised accounts, attackers monitoring a network, and other threats) and empower users via self-service analytics. Protection of sensitive data is key to enabling VAULT to support key use cases such as personnel readiness: optimally placing Airmen trainees to meet production goals, increase readiness, and match trainees to their preferences.


Welcome to the latest episode of SaaS Scaled, where we're talking to Patrick Parker, CEO of SaaS Partners, a company aimed at helping people build and scale their own SaaS businesses. Patrick talks about his experience in consulting and software security, and what ultimately led him to start building SaaS Partners. We address why new startups so often fail and discuss some of the key challenges they need to overcome. What are some simple things new startup founders can do to boost their chances of success? We discuss the importance of focusing on one key problem at a time, and Patrick talks about the value of reusing and copying certain successful models of building a startup. We also dive into the impact of emerging technologies on the SaaS space, such as AI, machine learning, and the metaverse. Finally, we talk about Web3, cryptocurrency, and blockchain, what this growing trend means for SaaS and the world as a whole, and how long it will take to become truly mainstream. This episode is brought to you by Qrvey: the tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you're ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com. Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

IBM TS7700 Release 5.2.2 Guide

This IBM® Redbooks® publication covers IBM TS7700 R5.2. The IBM TS7700 is part of a family of IBM Enterprise tape products. This book is intended for system architects and storage administrators who want to integrate their storage systems for optimal operation. Building on 25 years of experience, the R5.2 release includes many features that enable improved performance, usability, and security. Highlights include IBM TS7700 Advanced Object Store, an all-flash TS7770, grid resiliency enhancements, and Logical WORM retention.

By using the same hierarchical storage techniques, the TS7700 (TS7770 and TS7760) can also offload to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. As of this writing, the TS7700C supports offloading to IBM Cloud® Object Storage, Amazon S3, and RSTOR.

This publication explains features and concepts that are specific to the IBM TS7700 as of release R5.2. The R5.2 microcode level provides IBM TS7700 Cloud Storage Tier enhancements, IBM DS8000® Object Storage enhancements, Management Interface dual control security, and other smaller enhancements. The R5.2 microcode level can be installed on the IBM TS7770 and IBM TS7760 models only. Note: Release 5.2 was split into two phases, R5.2 Phase 1 and R5.2 Phase 2.

The TS7700 provides tape virtualization for the IBM z environment, and offloading to physical tape behind a TS7700 is used by hundreds of organizations around the world. Tape virtualization can help satisfy many requirements in a data processing environment. New and existing capabilities of the TS7700 5.2.2 release include the following highlights:

- Eight-way Grid Cloud, which consists of up to three generations of TS7700
- Synchronous and asynchronous replication of virtual tape and TCT objects
- Grid access to all logical volume and object data, independent of where it exists
- An all-flash TS7770 option for improved performance
- Full Advanced Object Store Grid Cloud support of DS8000 Transparent Cloud Tier
- Full AES256 encryption for data in flight and at rest
- Tight integration with IBM Z® and DFSMS policy management
- DS8000 Object Store AES256 in-flight encryption and compression
- Regulatory compliance through Logical WORM and LWORM Retention support
- Cloud Storage Tier support for archive, logical volume versioning, and disaster recovery
- Optional integration with physical tape
- 16 Gb IBM FICON® throughput that exceeds 5 GBps per TS7700 cluster
- Grid resiliency support with Control Unit Initiated Reconfiguration (CUIR)
- IBM Z hosts view up to 3,968 common devices per TS7700 grid
- TS7770 Cache On-demand feature based on capacity licensing
- TS7770 support of SSD within the VED server

The TS7700T writes data by policy to physical tape through attachment to high-capacity, high-performance IBM TS1160, IBM TS1150, and IBM TS1140 tape drives that are installed in an IBM TS4500 or TS3500 tape library. The TS7770 models are based on high-performance and redundant IBM POWER9™ technology. They provide improved performance for most IBM Z tape workloads when compared to previous generations of the IBM TS7700.

Fundamentals of Data Engineering

Data engineering has grown rapidly in the past decade, leaving many software engineers, data scientists, and analysts looking for a comprehensive view of this practice. With this practical book, you'll learn how to plan and build systems to serve the needs of your organization and customers by evaluating the best technologies available through the framework of the data engineering lifecycle. Authors Joe Reis and Matt Housley walk you through the data engineering lifecycle and show you how to stitch together a variety of cloud technologies to serve the needs of downstream data consumers. You'll understand how to apply the concepts of data generation, ingestion, orchestration, transformation, storage, and governance that are critical in any data environment regardless of the underlying technology.

This book will help you:
- Get a concise overview of the entire data engineering landscape
- Assess data engineering problems using an end-to-end framework of best practices
- Cut through marketing hype when choosing data technologies, architecture, and processes
- Use the data engineering lifecycle to design and build a robust architecture
- Incorporate data governance and security across the data engineering lifecycle