Almost five years into GDPR enforcement, the courts and the supervisory authorities have worked through a considerable number of decisions, and more are expected while this description is being written.
Topic: GDPR/CCPA
In the age of GDPR, CCPA, and HIPAA, along with the widespread proliferation of ad blockers and Apple’s ITP, it has never been harder to measure the effectiveness of your digital marketing efforts while protecting your customers' data.
Cybersecurity and privacy compliance are critical to protecting organizations from data breaches and fines. In the "Cybersecurity and Privacy Law Handbook," you'll find practical, beginner-friendly guidance to understand standards, identify gaps, and implement policies to secure your workplace effectively.

What this book will help you do:
- Understand international cybersecurity standards such as ISO 27001 and NIST.
- Identify and analyze security gaps using gap analysis and business impact methodologies.
- Ensure compliance with privacy laws like GDPR, HIPAA, and FTC regulations.
- Develop and implement effective cybersecurity policies and procedures.
- Navigate complex US-specific privacy regulations and their implications.

Author(s): Rocchi is an experienced author and practitioner in cybersecurity and privacy. With extensive knowledge of international compliance standards, they excel at breaking down complex topics into digestible and actionable content. Their practical, approachable writing style makes tackling the technical and legal facets of cybersecurity straightforward and engaging.

Who is it for? This book is tailored for professionals new to cybersecurity and privacy who wish to understand and implement fundamental practices in this domain. It is ideal for managers, students, or experts from other fields looking to manage security functions effectively. No deep prior technical knowledge is required, making it friendly for beginners.
We talked about:
Christiaan’s background
Usual ways of collecting and curating data
Getting buy-in from experts and executives
Starting an annotation booklet
Pre-labeling
Dataset collection
Human-level baseline and feedback
Using the annotation booklet to boost annotation productivity
Putting yourself in the shoes of annotators (and measuring performance)
Active learning
Distant supervision
Weak labeling
Dataset collection in career positioning and project portfolios
IPython widgets
GDPR compliance and non-English NLP
Finding Christiaan online
Links:
My personal blog: https://useml.net/
Comtura, my company: https://comtura.ai/
LinkedIn: https://www.linkedin.com/in/christiaan-swart-51a68967/
Twitter: https://twitter.com/swartchris8/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Get a practical introduction to CockroachDB. This book starts with installation and foundational concepts and takes you through to creating clusters that are ready for production environments. You will learn how to create, optimize, and operate CockroachDB clusters in single- and multi-region environments. You will encounter anti-patterns to avoid, as well as testing techniques for integration and load testing. The book explains why CockroachDB exists, goes over its major benefits, and quickly transitions into installing and configuring CockroachDB. Just as quickly, you’ll be creating databases, getting data into those databases, and querying that data from your applications. You’ll progress to data privacy laws such as GDPR and CCPA, and learn how CockroachDB’s global distribution features can help you comply with ever-shifting data sovereignty regulations. From there, you’ll move into deployment topologies, guidance on integration testing and load testing, best practices, and a readiness checklist for production deployments.

What You Will Learn:
- Deploy and interact with CockroachDB
- Design and optimize databases and tables
- Choose the correct data types for modeling your data
- Protect data with database and table encryption
- Achieve compliance with international data privacy regulations
- Scale your databases in a way that enhances their performance
- Monitor changes to the data and health of your databases

Who This Book Is For: Developers and database administrators who want to provide a secure, reliable, and effortlessly distributed home for their data; those who wish to use a modern tool to tackle the kinds of scaling challenges that have previously required dedicated teams of people to solve; anyone who wants to leverage their database to solve non-trivial, real-world challenges while protecting their data and users.
Within Wehkamp we needed a uniform way to provide reliable, on-time data to the business while keeping that access GDPR-compliant. Unlocking all the data sources scattered across the company and democratizing data access was of the utmost importance, allowing us to empower the business with more, better, and faster data.
Focusing on open source technologies, we've built a data platform almost from the ground up around three levels of data curation - bronze, silver, and gold - following the Lakehouse architecture. Ingestion into bronze is where PII fields are pseudonymized, which makes use of the data within the delta lake compliant; since no user data is visible there, everyone can use the entire delta lake for exploration and new use cases. Naturally, specific teams are allowed to see the user data that is necessary for their use cases. Beyond the standard architecture, we've developed a library that lets us ingest a new data source simply by adding a JSON config file describing its characteristics; a sketch of this approach follows below. Combined with the ACID transactions that Delta provides and the efficient Structured Streaming ingestion that Auto Loader enables, this has allowed a small team to maintain 100+ streams with negligible downtime.
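To make the config-driven ingestion concrete, here is a minimal sketch of what such a pipeline could look like. The JSON fields, paths, table names, and the SHA-256 pseudonymization are illustrative assumptions, not Wehkamp's actual library:

```python
# Illustrative sketch (not Wehkamp's actual library): ingest a new source into
# bronze from a JSON config, pseudonymizing PII columns on the way in.
import json

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Onboarding a new source means dropping a config file like this one.
config = json.loads("""
{
  "source_path": "s3://landing/orders/",
  "format": "json",
  "target_table": "bronze.orders",
  "checkpoint_path": "s3://checkpoints/bronze/orders/",
  "pii_columns": ["email", "customer_name"]
}
""")

# Auto Loader incrementally picks up new files from the landing location.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", config["format"])
    .option("cloudFiles.schemaLocation", config["checkpoint_path"])
    .load(config["source_path"])
)

# Hash PII before it ever lands in the delta lake; a production scheme would
# use a keyed/salted transformation so authorized teams can re-identify.
for column in config["pii_columns"]:
    df = df.withColumn(column, F.sha2(F.col(column), 256))

(
    df.writeStream
    .option("checkpointLocation", config["checkpoint_path"])
    .toTable(config["target_table"])
)
```

Because the pseudonymization happens at the bronze boundary, every downstream silver and gold consumer inherits compliance for free.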
Some other components of this platform are the following:
- Alerting to Slack
- Data quality checks
- CI/CD
- Stream processing with the Delta engine
The feedback so far has been encouraging, as more and more teams across the company are starting to use the new platform and take advantage of all its perks. It will still be a while before we can turn off some of the components of the old data platform, but it has come a long way.
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data...
Instagram: https://www.instagram.com/databricksinc/
We talked about:
Rahul’s background
What do data engineering managers do and why do we need them?
Balancing engineering and management
Rahul’s transition into data engineering management
The importance of updating your skill set
Planning the transition to manager and other challenges
Setting expectations for the team and measuring success
Data reconciliation
GDPR compliance
Data modeling for Big Data
Advice for people transitioning into data engineering management
Staying on top of trends and enabling team members
The qualities of a good data engineering team
The qualities of a good data engineer candidate (interview advice)
The difference between having knowledge and stuffing a CV with buzzwords
Advice for students and fresh graduates
An overview of an end-to-end data engineering process
Links:
Rahul's LinkedIn: https://www.linkedin.com/in/16rahuljain/
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
Understand the different access control paradigms available in the Snowflake Data Cloud and learn how to implement access control in support of data privacy and compliance with regulations such as GDPR, APPI, CCPA, and SOX. The information in this book will help you and your organization adhere to privacy requirements that are important to consumers and are becoming codified in law. You will learn to protect your valuable data from those who should not see it while making it accessible to the analysts whom you trust to mine the data and create business value for your organization. Snowflake is increasingly the choice for companies looking to move to a data warehousing solution, and security is an increasing concern due to recent high-profile attacks. This book shows how to use Snowflake's wide range of features that support access control, making it easier to protect data access from the data origination point all the way to the presentation and visualization layer. Reading this book helps you embrace the benefits of securing data and provide valuable support for data analysis while also protecting the rights and privacy of the consumers and customers with whom you do business.

What You Will Learn:
- Identify data that is sensitive and should be restricted
- Implement access control in the Snowflake Data Cloud
- Choose the right access control paradigm for your organization
- Comply with CCPA, GDPR, SOX, APPI, and similar privacy regulations
- Take advantage of recognized best practices for role-based access control
- Prevent upstream and downstream services from subverting your access control
- Benefit from access control features unique to the Snowflake Data Cloud

Who This Book Is For: Data engineers, database administrators, and engineering managers who want to improve their access control model; those whose access control model is not meeting privacy and regulatory requirements; those new to Snowflake who want to benefit from access control features that are unique to the platform; and technology leaders in organizations that have just gone public and are now required to conform to SOX reporting requirements.
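As a flavor of the kind of access control the book covers, here is a minimal sketch of restricting a PII column with a masking policy and role grants, issued through the snowflake-connector-python driver. The connection parameters, roles, table, and policy names are invented for illustration:

```python
# Hedged sketch: dynamic data masking plus role grants in Snowflake.
# All object names and credentials below are illustrative assumptions.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="security_admin",
    password="...",  # use a secrets manager in practice
    role="SECURITYADMIN",
    warehouse="ADMIN_WH",
    database="SALES",
    schema="PUBLIC",
)

statements = [
    # Only the (hypothetical) PII_ANALYST role sees the raw value.
    """
    CREATE MASKING POLICY IF NOT EXISTS email_mask AS (val STRING)
      RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() = 'PII_ANALYST' THEN val
           ELSE '*** MASKED ***' END
    """,
    "ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask",
    # Everyone with ANALYST can still query the table, just not the raw PII.
    "GRANT SELECT ON TABLE customers TO ROLE ANALYST",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
finally:
    cur.close()
    conn.close()
```

The appeal of this pattern is that analysts keep querying the same table; the policy, not the query, decides who sees raw PII.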
Design for large-scale, high-performance queries using Snowflake’s query processing engine to empower data consumers with timely, comprehensive, and secure access to data. This book also helps you protect your most valuable data assets using built-in security features such as end-to-end encryption for data at rest and in transit. It demonstrates key features in Snowflake and shows how to exploit those features to deliver a personalized experience to your customers. It also shows how to ingest the high volumes of both structured and unstructured data that are needed for game-changing business intelligence analysis. Mastering Snowflake Solutions starts with a refresher on Snowflake’s unique architecture before getting into the advanced concepts that make Snowflake the market-leading product it is today. Progressing through each chapter, you will learn how to leverage storage, query processing, cloning, data sharing, and continuous data protection features. This approach allows for greater operational agility in responding to the needs of modern enterprises, for example in supporting agile development techniques via database cloning. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system with little to no administrative overhead. The result will be a deep understanding of Snowflake that enables you to take full advantage of its architecture and deliver valuable analytics insight to your business.

What You Will Learn:
- Optimize performance and costs associated with your use of the Snowflake data platform
- Enable data security to help in complying with consumer privacy regulations such as CCPA and GDPR
- Share data securely both inside your organization and with external partners
- Gain visibility into each interaction with your customers using continuous data feeds from Snowpipe
- Break down data silos to gain complete visibility into your business-critical processes
- Transform customer experience and product quality through real-time analytics

Who This Book Is For: Data engineers, scientists, and architects who have had some exposure to the Snowflake data platform or bring some experience from working with another relational database. This book is for those beginning to struggle with new challenges as their Snowflake environment begins to mature, becoming more complex with ever-increasing amounts of data, users, and requirements. New problems require a new approach, and this book aims to arm you with the practical knowledge required to take advantage of Snowflake’s unique architecture to get the results you need.
Every marketer wants to accurately measure the impact of their advertising spend. And “digital” was supposed to make that really easy (especially for digital advertising). But that promise has rarely been realized: it’s becoming increasingly difficult to track users across touchpoints, thanks to privacy regulations (GDPR, CCPA, etc.) and browser updates that block or aggressively expire cookies. In this session, we will review four different approaches to marketing attribution: heuristic modeling (first touch, last touch, linear, time decay, etc.), algorithmic modeling, media mix modeling (MMM), and randomized controlled trials (RCTs). As part of the review, we will venture lightly, but profoundly, into some foundational statistical concepts: we WILL use the terms 'counterfactual' and 'potential outcome,' and probably even 'unobserved heterogeneity'!
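For readers unfamiliar with the heuristic family, here is a toy sketch of how first-touch, last-touch, linear, and time-decay rules split one conversion's credit across a path. The channels, timings, and half-life parameter are invented for illustration:

```python
# Toy illustration of heuristic attribution models; not any vendor's actual
# implementation. Channels, timings, and the half-life are invented.

def heuristic_attribution(touchpoints, model="linear", half_life_days=7.0):
    """touchpoints: ordered list of (channel, days_before_conversion)."""
    n = len(touchpoints)
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "time_decay":
        # Credit halves for every `half_life_days` between touch and conversion.
        raw = [0.5 ** (days / half_life_days) for _, days in touchpoints]
        weights = [w / sum(raw) for w in raw]
    else:
        raise ValueError(f"unknown model: {model}")

    credit = {}
    for (channel, _), weight in zip(touchpoints, weights):
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit

path = [("display", 20), ("paid_search", 5), ("email", 1)]
for model in ("first_touch", "last_touch", "linear", "time_decay"):
    print(model, heuristic_attribution(path, model=model))
```

Note that none of these rules estimates a counterfactual; that gap is exactly what the algorithmic, MMM, and RCT approaches discussed in the session try to address.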
Big Data Europe, onsite and online, 22-25 November 2022. Learn more about the conference: https://bit.ly/3BlUk9q
Join our next Big Data Europe conference on 22-25 November 2022, where you will be able to learn from global experts giving technical talks and hands-on workshops in the fields of Big Data, High Load, Data Science, Machine Learning, and AI. This time, the conference will be held in a hybrid setting, allowing you to attend workshops and listen to expert talks on-site or online.
Protect business value, stay compliant with global regulations, and meet stakeholder demands with this privacy how-to. Privacy, Regulations, and Cybersecurity: The Essential Business Guide is your guide to understanding what “privacy” really means in a corporate environment: how privacy is different from cybersecurity, why privacy is essential for your business, and how to build privacy protections into your overall cybersecurity plan. First, author Chris Moschovitis walks you through our evolving definitions of privacy, from the ancient world all the way to the General Data Protection Regulation (GDPR). He then explains, in friendly, accessible language, how to orient your preexisting cybersecurity program toward privacy, and how to make sure your systems are compliant with current regulations. This book, a sequel to Moschovitis’ well-received Cybersecurity Program Development for Business, explains which regulations apply in which regions, how they relate to the end goal of privacy, and how to build privacy into both new and existing cybersecurity programs. Keeping up with swiftly changing technology and business landscapes is no easy task. Moschovitis provides down-to-earth, actionable advice on how to avoid dangerous privacy leaks and protect your valuable data assets.

- Learn how to design your cybersecurity program with privacy in mind
- Apply lessons from the GDPR and other landmark laws
- Remain compliant and even get ahead of the curve, as privacy grows from a buzzword to a business must
- Learn how to protect what’s of value to your company and your stakeholders, regardless of business size or industry
- Understand privacy regulations from a business standpoint, including which regulations apply and what they require
- Think through what privacy protections will mean in the post-COVID environment

Whether you’re new to cybersecurity or already have the fundamentals, this book will help you design and build a privacy-centric, regulation-compliant cybersecurity program.
This pocket guide will help you understand the Regulation, the broader principles of data protection, and what the GDPR means for businesses in Europe and beyond. Please visit https://www.itgovernancepublishing.co.uk/topic/uk-gdpr-supplemental-material to download your free Brexit supplement.
This bestselling guide is the ideal companion for anyone carrying out a GDPR (General Data Protection Regulation) compliance project. It provides comprehensive guidance and practical advice on complying with the Regulation. Visit https://www.itgovernancepublishing.co.uk/topic/uk-gdpr-supplemental-material to download your free Brexit supplement.
Create a world-class MongoDB cluster that is scalable, reliable, and secure. Comply with mission-critical regulatory regimes such as the European Union’s General Data Protection Regulation (GDPR). Whether you are thinking of migrating to MongoDB or need to meet legal requirements for an existing self-managed cluster, this book has you covered. It begins with the basics of replication and sharding, and quickly scales up to cover everything you need to know to control your data and keep it safe from unexpected data loss or downtime. This book covers best practices for stable MongoDB deployments. For example, a well-designed MongoDB cluster should have no single point of failure. The book covers common use cases when only one or two data centers are available. It goes into detail about creating geopolitical sharding configurations to cover the most stringent data protection regulation compliance. The book also covers different tools and approaches for automating and monitoring a cluster with Kubernetes, Docker, and popular cloud provider containers.

What You Will Learn:
- Get started with the basics of MongoDB clusters
- Protect and monitor a MongoDB deployment
- Deepen your expertise around replication and sharding
- Keep effective backups and plan ahead for disaster recovery
- Recognize and avoid problems that can occur in distributed databases
- Build optimal MongoDB deployments within hardware and data center limitations

Who This Book Is For: Solutions architects, DevOps architects and engineers, automation and cloud engineers, and database administrators who are new to MongoDB and distributed databases or who need to scale up simple deployments. This book is a complete guide to planning a deployment for optimal resilience, performance, and scaling, and covers all the details required to meet the new set of data protection regulations such as the GDPR. This book is particularly relevant for large global organizations such as financial and medical institutions, as well as government departments that need to control data in the whole stack and are prohibited from using managed cloud services.
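As a taste of the geopolitical sharding idea, here is a hedged PyMongo sketch that pins EU documents to EU-hosted shards via zone sharding. The shard names, shard key, and region codes are illustrative assumptions, not the book's exact configuration:

```python
# Hedged sketch: MongoDB zone sharding for data residency. Shard names,
# database/collection names, and the shard key are illustrative assumptions.
from bson import MaxKey, MinKey
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.com:27017")
admin = client.admin

# Shard the users collection on a region-prefixed key.
admin.command("enableSharding", "app")
admin.command("shardCollection", "app.users", key={"region": 1, "user_id": 1})

# Tag shards by the data center where their hardware physically lives.
admin.command("addShardToZone", "shard-frankfurt", zone="EU")
admin.command("addShardToZone", "shard-virginia", zone="NA")

# Pin all EU-region documents to EU shards to satisfy residency requirements.
admin.command(
    "updateZoneKeyRange",
    "app.users",
    min={"region": "EU", "user_id": MinKey()},
    max={"region": "EU", "user_id": MaxKey()},
    zone="EU",
)
```

Once the zone ranges are in place, the balancer keeps EU chunks on EU shards automatically, which is the mechanism behind the "geopolitical sharding" compliance pattern the book describes.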
Summary

Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management, the team at Upsolver added a SQL interface to their data lake platform. In this episode Upsolver CEO Ori Rafael and CTO Yoni Iny describe how they have grown their platform deliberately to allow for layering SQL on top of a robust foundation for creating and operating a data lake, how to bring more people on board to work with the data being collected, and the unique benefits that a data lake provides. This was an interesting look at the impact that the interface to your data can have on who is empowered to work with it.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

You listen to this show because you love working with data and want to keep your skills up to date. Machine learning is finding its way into every aspect of the data landscape. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to dataengineeringpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.

Your host is Tobias Macey and today I’m interviewing Ori Rafael and Yoni Iny about building a data lake for the DBA at Upsolver
Interview
Introduction
How did you get involved in the area of data management?
Can you start by sharing your definition of what a data lake is and what it is comprised of?
We talked last in November of 2018. How has the landscape of data lake technologies and adoption changed in that time?
How has Upsolver changed or evolved since we last spoke?
How has the evolution of the underlying technologies impacted your implementation and overall product strategy?
What are some of the common challenges that accompany a data lake implementation?
How do those challenges influence the adoption or viability of a data lake?
How does the introduction of a universal SQL layer change the staffing requirements for building and maintaining a data lake?
What are the advantages of a data lake over a data warehouse if everything is being managed via SQL anyway?
What are some of the underlying realities of the data systems that power the lake which will eventually need to be understood by the operators of the platform?
How is the SQL layer in Upsolver implemented?
What are the most challenging or complex aspects of managing the underlying technologies to provide automated partitioning, indexing, etc.?
What are the main concepts that you need to educate your customers on?
What are some of the pitfalls that users should be aware of?
What features of your platform are often overlooked or underutilized which you think should be more widely adopted?
What have you found to be the most interesting, unexpected, or challenging lessons learned while building the technical and business elements of Upsolver?
What do you have planned for the future?
Contact Info
Ori
Yoni: yoniiny on GitHub, LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
Upsolver
Podcast Episode
DBA == Database Administrator
IDF == Israel Defense Forces
Data Lake
Eventual Consistency
Apache Spark
Redshift Spectrum
Azure Synapse Analytics
SnowflakeDB
Podcast Episode
BigQuery
Presto
Podcast Episode
Apache Kafka
Cartesian Product
kSQLDB
Podcast Episode
Eventador
Podcast Episode
Materialize
Podcast Episode
Common Table Expressions
Lambda Architecture
Kappa Architecture
Apache Flink
Podcast Episode
Reinforcement Learning
CloudFormation
GDPR
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
Before the COVID-19 crisis, we were already acutely aware of the need for a broader conversation around data privacy: look no further than the Snowden revelations, Cambridge Analytica, the New York Times Privacy Project, the General Data Protection Regulation (GDPR) in Europe, and the California Consumer Privacy Act (CCPA). In the age of COVID-19, these issues are far more acute. We also know that governments and businesses exploit crises to consolidate and rearrange power, claiming that citizens need to give up privacy for the sake of security. But is this tradeoff a false dichotomy? And what type of tools are being developed to help us through this crisis? In this episode, Katharine Jarmul, Head of Product at Cape Privacy, a company building systems to leverage secure, privacy-preserving machine learning and collaborative data science, will discuss all this and more, in conversation with Dr. Hugo Bowne-Anderson, data scientist and educator at DataCamp.

Links from the show:
FROM THE INTERVIEW
Katharine on Twitter
Katharine on LinkedIn
Contact Tracing in the Real World (by Ross Anderson)
The Price of the Coronavirus Pandemic (by Nick Paumgarten)
Do We Need to Give Up Privacy to Fight the Coronavirus? (by Julia Angwin)
Introducing the Principles of Equitable Disaster Response (by Greg Bloom)
Cybersecurity During COVID-19 (by Bruce Schneier)
It's a never-ending game of whack-a-mole; Peter Swire was already talking about technological escalation wars back in 2012, when he quit the W3C Do Not Track (DNT) effort.
Even before GDPR, savvy analysts were aware of privacy considerations and concerns as they went about their work of analyzing consumer behavior. But, increasingly, the convergence of regulatory changes (GDPR, CCPA, ...), technological changes (ITP, ETP, ...), publicized data breaches (Desjardins, Adobe, Capital One, ...), and growing consumer awareness and action (ad blocker installations, browser configurations) has limited the data that is automatically available to marketers and analysts. Simultaneously, the rise of CDPs, machine learning, and AI is driving an increased desire to gather and activate broader, richer, and more integrated customer data. In this session, the panelists will grapple with the tensions between these competing forces.
Don’t be afraid of the GDPR wolf! How can your business easily comply with the new data protection and privacy laws and avoid fines of up to $27M? GDPR For Dummies sets out in simple steps how small business owners can comply with the complex General Data Protection Regulation (GDPR). The Regulation applies to all businesses established in the EU and to businesses established outside the EU insofar as they process personal data about people within the EU. Inside, you’ll discover how the GDPR applies to your business in the context of marketing, employment, providing your services, and using service providers. Learn how to avoid fines, regulatory investigations, customer complaints, and brand damage, while gaining a competitive advantage and increasing customer loyalty by putting privacy at the heart of your business.

- Find out what constitutes personal data and special category data
- Gain consent for online and offline marketing
- Put your Privacy Policy in place
- Report a data breach before being fined

79% of U.S. businesses haven’t figured out how they’ll report breaches in a timely fashion, provide customers the right to be forgotten, conduct privacy impact assessments, and more. If you are one of those businesses that hasn't put a plan in place, then GDPR For Dummies is for you.