Obfuscating Sensitive Information from Spark UI and Logs
Description
The Spark UI and logs contain useful information, but they also include sensitive data that needs to be obfuscated.
To obfuscate this data, at Workday we have implemented methods for Apache Spark in which the string representations of the TreeNode class can be configured to be obfuscated or non-obfuscated. To do this, we added a custom TreeNode printer for the UI and a custom Log4j appender that uses a list of rules, based on class name, package name, or log message regexes, to decide whether to obfuscate output from third-party libraries. In the Spark UI and in the logs, this results in the obfuscation of Spark plans and column names.
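The rule-based decision described above can be illustrated with a minimal sketch. This is not the Workday implementation and it does not use Log4j: it is a Python analogue built on the standard `logging` module, where the logger name stands in for the class or package name. The names `Rule`, `ObfuscatingFilter`, `apply_rules`, `RULES`, the `<obfuscated>` placeholder, the first-match-wins ordering, and the example logger names are all assumptions made for illustration.

```python
import logging
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: match on a logger-name prefix and/or a message regex."""
    logger_prefix: Optional[str] = None
    message_regex: Optional[re.Pattern] = None
    obfuscate: bool = True

    def matches(self, record: logging.LogRecord, message: str) -> bool:
        # A criterion left as None is treated as "matches anything".
        if self.logger_prefix is not None and not record.name.startswith(self.logger_prefix):
            return False
        if self.message_regex is not None and not self.message_regex.search(message):
            return False
        return True


class ObfuscatingFilter(logging.Filter):
    """Rewrites a record's message when the first matching rule says to obfuscate."""

    def __init__(self, rules, placeholder="<obfuscated>"):
        super().__init__()
        self.rules = list(rules)
        self.placeholder = placeholder

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for rule in self.rules:
            if rule.matches(record, message):
                if rule.obfuscate:
                    record.msg = self.placeholder
                    record.args = ()
                break  # assumption: the first matching rule wins
        return True  # never drop the record, only rewrite its message


# Illustrative rules only; the prefixes and regex are assumptions.
RULES = [
    Rule(logger_prefix="org.apache.spark.sql.execution", obfuscate=True),
    Rule(message_regex=re.compile(r"== Physical Plan =="), obfuscate=True),
    Rule(logger_prefix="org.apache.spark", obfuscate=False),
]


def apply_rules(rules, logger_name: str, message: str) -> str:
    """Runs one message through the filter and returns the resulting text."""
    record = logging.LogRecord(logger_name, logging.INFO, "", 0, message, (), None)
    ObfuscatingFilter(rules).filter(record)
    return record.getMessage()
```

In real use, the filter would be attached to a handler with `handler.addFilter(ObfuscatingFilter(RULES))`. In this sketch, messages that match no rule pass through unchanged; a production setup might prefer to obfuscate by default and allow-list known-safe sources instead.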
In this talk, we will go over the steps we took to implement these obfuscation methods and show what the results look like in the Spark UI and logs. The methods have worked well in production at Workday, and other companies can also benefit from implementing them.