talk-data.com

Topic

GDPR/CCPA

data_privacy compliance regulations

37 tagged

Activity Trend

Peak: 9 activities per quarter (2020-Q1 to 2026-Q1)

Activities

37 activities · Newest first

Data storytelling isn't just about presenting numbers—it's about creating shared wisdom that drives better decision-making. In our increasingly polarized world, we often miss that most people actually have reasonable views hidden behind the loudest voices. But how can technology help us cut through the noise and build genuine understanding? What if AI could help us share stories across different communities and contexts, making our collective knowledge more accessible? From reducing unnecessary meetings to enabling more effective collaboration, the way we exchange information is evolving rapidly. Are you prepared for a future where AI helps us communicate more effectively rather than replacing human judgment?

Professor Alex “Sandy” Pentland is a leading computational scientist, co-founder of the MIT Media Lab and Media Lab Asia, and a HAI Fellow at Stanford. Recognized by Forbes as one of the world’s most powerful data scientists, he played a key role in shaping the GDPR through the World Economic Forum and contributed to the UN’s Sustainable Development Goals as one of the Secretary General’s “Data Revolutionaries.” His accolades include MIT’s Toshiba Chair, election to the U.S. National Academy of Engineering, the Harvard Business Review McKinsey Award, and the DARPA 40th Anniversary of the Internet Award. Pentland has served on advisory boards for organizations such as the UN Secretary General, UN Foundation, Consumers Union, and formerly for the OECD, Google, AT&T, and Nissan. Companies originating from his lab have driven major innovations, including India’s Aadhaar digital identity system, Alibaba’s news and advertising arm, and the world’s largest rural health service network. His more recent ventures span mental health (Ginger.io), AI interaction management (Cogito), delivery optimization (Wise Systems), financial privacy (Akoya), and fairness in social services (Prosperia).

A mentor to over 80 PhD students—many now leading in academia, research, or entrepreneurship—Pentland helped pioneer fields such as computational social science, wearable computing, and modern biometrics. His books include Social Physics, Honest Signals, Building the New Economy, and Trusted Data.

In the episode, Richie and Sandy explore the role of storytelling in data and AI, how technology reshapes our narratives, the impact of AI on decision-making, the importance of shared wisdom in communities, and much more.

Links Mentioned in the Show:

MIT Media Lab

Sandy’s Books

deliberation.io

Connect with Sandy

Skill Track: Artificial Intelligence (AI) Leadership

Related Episode: The Human Element of AI-Driven Transformation with Steve Lucas, CEO at Boomi

Rewatch RADAR AI

New to DataCamp? Learn on the go using the DataCamp mobile app. Empower your business with world-class data and AI skills with DataCamp for business.

AI is transforming industries, but it’s also raising complex questions about data protection and privacy. EDPB Opinion 28/2024 provides guidance specifically for GDPR practitioners dealing with AI.

00:00 Introduction to AI and GDPR

00:33 Understanding Anonymity in AI Models

01:53 Framework for Determining Anonymity

03:30 Practical Steps for GDPR Compliance

06:16 Exploring Legitimate Interests

07:19 The Three-Step Test for Legitimate Interests

10:18 Navigating Legitimate Interests

10:34 Understanding the Balancing Test

11:17 Risks and Rights in AI Data Processing

14:59 Mitigating Measures for Data Protection

17:16 Web Scraping and Data Protection

18:24 Consequences of Unlawful Data Processing

20:13 Key Takeaways for GDPR Practitioners
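The episode's distinction between anonymous and merely pseudonymized data can be sketched in code. This is an illustrative example only (the field names and key handling are my own, not from the episode): replacing a direct identifier with a keyed hash is pseudonymization, which under GDPR is still personal data as long as re-identification remains reasonably possible.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    Note: whoever holds secret_key can link records back to the person,
    so this is pseudonymization, not anonymization. True anonymization
    requires that re-identification is no longer reasonably possible.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical record; the key should be stored separately from the data.
key = b"rotate-me-and-store-me-separately"
record = {"email": "alice@example.com", "purchase": "book"}
record["email"] = pseudonymize(record["email"], key)
```

The same input and key always produce the same token, so joins across datasets still work; deleting the key is what moves the data toward anonymity.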

This episode explores the complexities of international data transfers under GDPR, detailing the criteria established by the European Data Protection Board. It outlines the three criteria to determine when data crossing EU borders qualifies as a transfer under Chapter V of GDPR, along with discussions on adequacy decisions, the EU-US Data Privacy Framework, and practical applications of standard contractual clauses (SCCs). Binding corporate rules (BCRs) and limited exceptions, or derogations, are also explained as methods for legitimate data transfers without adequacy.

00:00 Introduction to International Data Transfers

00:34 Understanding GDPR's Transfer Criteria

01:36 Real-World Examples of Data Transfers

02:08 When Transfers Don't Count

03:23 Green Lights for Data Transfers: Adequacy Decisions

04:00 The EU-US Data Privacy Framework

05:34 Safeguards for Data Transfers

05:49 Standard Contractual Clauses (SCCs) and Binding Corporate Rules (BCRs)

07:20 Exceptions and Derogations

11:47 The Importance of Documentation

13:00 Risk Awareness and Conclusion


Think GDPR is just corporate jargon? Think again! This episode breaks down exactly what GDPR means for YOU, from its core principles to how it actually protects your data. We'll unpack your rights, how to hold companies accountable, and real-world examples of GDPR in action. Get ready to take charge of your digital life!

Links:

GDPR Aware Handbook by Siarhei Varankevich CIPP/E, CIPM, CIPT, FIP: https://data-privacy-office.eu/usefull-materials/gdpr-aware-handbook/

Tired of Surface-Level Data Privacy Training?

This episode explores DPO Europe's Global Data Privacy Manager course and why it's different. We break down how this course goes beyond the basics to help you take action, build a program, and become a confident leader in data privacy. If you're ready to move past the "now what?" and tackle real-world challenges, tune in!

Starting your privacy journey? This episode unpacks the essentials of GDPR: why it matters, how it works, and what it means for your career. We'll explore real-world scenarios, practical skills, and why this training is key for any aspiring privacy pro.

Links:

GDPR DPP Training - https://data-privacy-office.eu/courses/gdpr-data-privacy-professional/

Ever feel like you're clicking "agree" online without really understanding what you're signing up for? The EU feels the same way. In this episode, we explore how the EU is tackling data protection in the digital age. From those pesky cookie banners to the stealthy world of device fingerprinting, we break down what's at stake and how the EU is fighting to give you back control of your data. Join us as we unpack the EU's e-Privacy Directive, its upcoming revamp, and what it all means for you. Get ready to become a more informed and empowered digital citizen.

Episode Summary: In this episode, we dive into the exciting world of AI and Large Language Models (LLMs) and how they're revolutionizing marketing. Gone are the days of generic campaigns and guesswork. With AI, marketing is becoming highly personalized, insight-driven, and responsive to individual customer needs, all in real time.

Key Points Covered:

* The Shift from Data-Driven to Insight-Driven Marketing: Discover how marketing is evolving from simply collecting data to understanding the "why" behind customer behavior. AI allows marketers to predict customer preferences, making campaigns more targeted and effective.
* AI-Powered Personalization at Scale: Learn how AI can dig into customer data to deliver hyper-personalized experiences, like suggesting a product based on your previous purchases, time of day, or even the weather in your location.
* Customer Journey Mapping with AI: AI is now capable of mapping every step of a customer’s interaction with a brand, from the first website visit to the final purchase, helping marketers identify friction points and optimize the entire journey.
* The Power of Real-Time AI Dashboards: Forget the overwhelming spreadsheets! AI-powered dashboards are the new standard, delivering clear, actionable insights in real time across all marketing channels.
* Ethical Considerations in AI-Driven Marketing: With great power comes great responsibility. We explore how marketers can walk the fine line between personalization and privacy, and why transparency and trust are critical in this AI-powered era.
* The Future of AI in Customer Experience: From chatbots that truly understand your needs to online shopping experiences that adapt to you, AI is poised to make our everyday interactions with brands smoother and more enjoyable.

Memorable Quote: "It’s like having a dedicated marketing team for every single customer."

Ethical Discussion: We discuss the responsibility marketers have in ensuring AI respects data privacy and builds trust with consumers. Regulations like GDPR are setting important standards, but it’s up to each brand to find the balance between personalization and privacy.

Final Thought: As AI continues to reshape the marketing landscape, it's crucial for brands and customers alike to stay informed, ask questions, and participate in the conversation about how these technologies are used.

Have thoughts on how AI is transforming marketing? Share your insights with us, and stay curious for the next episode as we dive deeper into the world of AI, marketing, and beyond. Send me an email at [email protected]

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit mukundansankar.substack.com

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 002, host Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) chats with Deepti Surabattula (Principal Data Product Manager and AI Delivery & Support Workstream Lead at Pfizer). They discuss the importance of user and stakeholder involvement in data product management and effective relationship management. Deepti shares experiences and challenges with different implementation processes and how to enjoy and find reward in creating valuable data products. About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn.

About our guest Deepti Surabattula: Deepti is a product leader with a strong engineering background. She has proven success across Life Sciences, Aerospace, and Medical Devices, leading AI, data, and regulatory-compliant products from inception to delivery. Deepti is an expert in regulatory guidelines for data integrity and product compliance (21 CFR part 11, GDPR, MHRA, ICH, EMA) and is passionate about strategy, technology innovation, and quality solutions to improve human lives. Connect with Deepti on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn.  

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.

Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode:

Slack's Data Practices: Discussing Slack's use of customer data to build models, the risks of global data leakage, and the impact of GDPR and AI regulations.

ChatGPT's Data Analysis Improvements: Discussing new features in ChatGPT that let you interrogate your data like a pro.

The Loneliness of Data Scientists: Why being a lone data wolf is tough, and how collaboration is the key to success.

Rustworkx for Graph Computation: Evaluating Rustworkx as a robust tool for graphs compared to Networkx.

Dolt - Git for Data: Comparing Dolt and DVC as tools for data version control. Check it out.

Veo by Google DeepMind: An overview of Google's Veo technology and its potential applications.

Ilya Sutskever’s Departure from OpenAI: What does Ilya Sutskever’s exit mean for OpenAI with Jakub Pachocki stepping in?

Hot Takes - No Data Engineering Roadmap? Debating the necessity of a data engineering roadmap and the prominence of SQL skills.

Cookies were invented to help online shoppers, simply as an identifier so that online carts weren’t lost to the ether. Marketers quickly saw the power of using cookies for more than just maintaining session states, and moved to use them as part of their targeted advertising. Before we knew it, our online habits were being tracked, without our clear consent. The unregulated cookie boom lasted until 2018 with the advent of GDPR and the CCPA. Since then, marketers have been evolving their practices, looking for alternatives to cookie tracking that will perform comparably, and with the cookie being phased out in 2024, technologies like fingerprinting and new privacy-centric marketing strategies will play a huge role in how products meet users in the future.

Cory Munchbach has spent her career on the cutting edge of marketing technology and brings years working with Fortune 500 clients from various industries to BlueConic. Prior to BlueConic, she was an analyst at Forrester Research where she covered business and consumer technology trends and the fast-moving marketing tech landscape. A sought-after speaker and industry voice, Cory’s work has been featured in Financial Times, Forbes, Raconteur, AdExchanger, The Drum, Venture Beat, Wired, AdAge, and Adweek. A life-long Bostonian, Cory has a bachelor’s degree in political science from Boston College and spends a considerable amount of her non-work hours on various volunteer and philanthropic initiatives in the greater Boston community.

In the episode, Richie and Cory cover successful marketing strategies and their use of data, the types of data used in marketing, how data is leveraged during different stages of the customer life cycle, the impact of privacy laws on data collection and marketing strategies, tips on how to use customer data while protecting privacy and adhering to regulations, the importance of data skills in marketing, the future of marketing analytics, and much more.
Links Mentioned in the Show:

BlueConic

Mattel Creations

Google: Prepare for third-party cookie restrictions

Data Clean Rooms

[Course] Marketing Analytics for Business
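As a minimal illustration of the original, session-keeping role of cookies described above, here is a server-side sketch of issuing a session cookie (the identifier and values are hypothetical):

```python
from http import cookies

# A server issues a session identifier so that a shopping cart can be
# recognized across stateless HTTP requests.
c = cookies.SimpleCookie()
c["session_id"] = "abc123"           # hypothetical opaque identifier
c["session_id"]["max-age"] = 3600    # expire after one hour
c["session_id"]["httponly"] = True   # not readable by page JavaScript

# The Set-Cookie header value the server would send:
header = c["session_id"].OutputString()
```

The tracking practices the episode discusses build on exactly this mechanism, but with identifiers set by third-party domains rather than the shop itself.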

Kai Zenner has been working on the EU AI Act for a while, and we chat about his perspective on its evolution, challenges, and potential. Along the way, we discuss why the EU AI Act differs from GDPR, why regulating a quasi-global piece of legislation is very difficult, and much more.

I admit, politics and regulation are way outside my wheelhouse, and I learned a ton in this discussion. Given how the EU AI Act will affect the work of everyone involved with data, I think you'll learn a thing or two about not just the act itself, but also how the "sausage is made", so to speak. Enjoy!

LinkedIn: https://www.linkedin.com/in/kzenner/

Twitter: https://twitter.com/ZennerBXL

Site: https://www.kaizenner.eu


If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

We talked about:

Katharine's background

Katharine's ML privacy startup

GDPR, CCPA, and the “opt-in as the default” approach

What is data privacy?

Finding Katharine's book – Practical Data Privacy

The various definitions of data privacy and “user profiles”

Privacy engineering and privacy-enhancing technologies

Why data privacy is important

What is differential privacy?

The importance of keeping privacy in mind when designing systems

Data privacy on the example of ChatGPT

Katharine's resource suggestions for learning about data privacy
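Since the conversation asks "what is differential privacy?", here is a minimal, illustrative sketch of the Laplace mechanism for a private count (the function and parameter names are my own, not from the episode):

```python
import math
import random

def dp_count(values, predicate, epsilon, rng=random):
    """Differentially private count via the Laplace mechanism.

    A counting query changes by at most 1 when one person is added or
    removed (L1 sensitivity = 1), so the noise scale is 1/epsilon.
    Smaller epsilon means more noise and stronger privacy.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Inverse-CDF sample from Laplace(0, 1/epsilon)
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# E.g. "how many users are under 50?" released with privacy budget epsilon = 0.5
noisy = dp_count(range(100), lambda x: x < 50, epsilon=0.5)
```

The released value is close to the true count but noisy enough that no single individual's presence can be confidently inferred from it.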

Links:

LinkedIn: https://www.linkedin.com/in/katharinejarmul/

Twitter: https://twitter.com/kjam

Free data engineering course: https://github.com/DataTalksClub/data-engineering-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Summary

With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native product analytics service that allows you to gain powerful insights into your customers and their needs by combining your event streams with the rest of your business data. In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Join in with the event for the global data community, Data Council Austin. From March 28-30th 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount of 20% off your ticket by using the promo code dataengpod20. Don't miss out on their only event this year! Visit: dataengineeringpodcast.com/data-council today!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder

Your host is Tobias Macey and today I'm interviewing Priyendra Deshwal about how NetSpring is using the data warehouse to deliver a more flexible and detailed view of your product analytics

Interview

Introduction

How did you get involved in the area of data management?

Can you describe what NetSpring is and the story behind it?

What are the activities that constitute "product analytics" and what are the roles/teams involved in those activities?

When teams first come to you, what are the common challenges that they are facing and what are the solutions that they have attempted to employ?

Can you describe some of the challenges involved in bringing product analytics into enterprise or highly regulated environments/industries?

How does a warehouse-native approach simplify that effort?

There are many different players (both commercial and open source) in the product analytics space. Can you share your view on the role that NetSpring plays in that ecosystem?

How is the NetSpring platform implemented to be able to best take advantage of modern warehouse technologies and the associated data stacks?

What are the pre-requisites for an organization's infrastructure/data maturity for being able to benefit from NetSpring?

How have the goals and implementation of the NetSpring platform evolved from when you first started working on it?

Can you describe the steps involved in integrating NetSpring with an organization's existing warehouse?

What are the signals that NetSpring uses to understand the customer journeys of different organizations?

How do you manage the variance of the data models in the warehouse while providing a consistent experience for your users?

Given that you are a product organization, how are you using NetSpring to power NetSpring?

What are the most interesting, innovative, or unexpected ways that you have seen NetSpring used?

What are the most interesting, unexpected, or challenging lessons that you have learned while working on NetSpring?

When is NetSpring the wrong choice?

What do you have planned for the future of NetSpring?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

NetSpring

ThoughtSpot

Product Analytics

Amplitude

Mixpanel

Customer Data Platform

GDPR

CCPA

Segment

Podcast Episode

Rudderstack

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

TimeXtender: TimeXtender is a holistic, metadata-driven solution for data integration, optimized for agility. TimeXtender provides all the features you need to build a future-proof infrastructure for ingesting, transforming, modelling, and delivering clean, reliable data in the fastest, most efficient way possible.

You can't optimize for everything all at once. That's why we take a holistic approach to data integration that optimizes for agility instead of fragmentation. By unifying each layer of the data stack, TimeXtender empowers you to build data solutions 10x faster while reducing costs by 70%-80%. We do this for one simple reason: because time matters.

Go to dataengineeringpodcast.com/timextender today to get started for free!

Rudderstack:

RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team.

RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again.

Visit dataengineeringpodcast.com/rudderstack to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.

Data Council: Join us at the event for the global data community, Data Council Austin. From March 28-30th 2023, we'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data. As a listener to the Data Engineering Podcast you can get a special discount off tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit: dataengineeringpodcast.com/data-council Promo Code: dataengpod20

Support Data Engineering Podcast

Send us a text Datatopics is a podcast presented by Kevin Missoorten to talk about the fuzzy and misunderstood concepts in the world of data, analytics, and AI and get to the bottom of things.

In this episode Kevin is joined by Ruben Lasuy - a fellow consultant in the space of GDPR, data governance and data strategy - to explore the so-called "Collaborative Data Ecosystems", a datatopic surfing the Solid-protocol wave. But are Solid and its Solid Pods really the trigger for this new concept, or is there more at play?

Datatopics is brought to you by Dataroots.

Music: The Gentlemen - DivKid

The thumbnail is generated by Midjourney

We talked about:

Christiaan’s background

Usual ways of collecting and curating data

Getting the buy-in from experts and executives

Starting an annotation booklet

Pre-labeling

Dataset collection

Human level baseline and feedback

Using the annotation booklet to boost annotation productivity

Putting yourself in the shoes of annotators (and measuring performance)

Active learning

Distance supervision

Weak labeling

Dataset collection in career positioning and project portfolios

IPython widgets

GDPR compliance and non-English NLP

Finding Christiaan online

Links:

My personal blog: https://useml.net/

Comtura, my company: https://comtura.ai/

LI: https://www.linkedin.com/in/christiaan-swart-51a68967/

Twitter: https://twitter.com/swartchris8/

ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Rahul’s background

What do data engineering managers do and why do we need them?

Balancing engineering and management

Rahul’s transition into data engineering management

The importance of updating your skill set

Planning the transition to manager and other challenges

Setting expectations for the team and measuring success

Data reconciliation

GDPR compliance

Data modeling for Big Data

Advice for people transitioning into data engineering management

Staying on top of trends and enabling team members

The qualities of a good data engineering team

The qualities of a good data engineer candidate (interview advice)

The difference between having knowledge and stuffing a CV with buzzwords

Advice for students and fresh graduates

An overview of an end-to-end data engineering process

Links:

Rahul's LinkedIn: https://www.linkedin.com/in/16rahuljain/

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Summary

Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management the team at Upsolver added a SQL interface to their data lake platform. In this episode Upsolver CEO Ori Rafael and CTO Yoni Iny describe how they have grown their platform deliberately to allow for layering SQL on top of a robust foundation for creating and operating a data lake, how to bring more people on board to work with the data being collected, and the unique benefits that a data lake provides. This was an interesting look at the impact that the interface to your data can have on who is empowered to work with it.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

You listen to this show because you love working with data and want to keep your skills up to date. Machine learning is finding its way into every aspect of the data landscape. Springboard has partnered with us to help you take the next step in your career by offering a scholarship to their Machine Learning Engineering career track program. In this online, project-based course every student is paired with a Machine Learning expert who provides unlimited 1:1 mentorship support throughout the program via video conferences. You’ll build up your portfolio of machine learning projects and gain hands-on experience in writing machine learning algorithms, deploying models into production, and managing the lifecycle of a deep learning prototype. Springboard offers a job guarantee, meaning that you don’t have to pay for the program until you get a job in the space. The Data Engineering Podcast is exclusively offering listeners 20 scholarships of $500 to eligible applicants. It only takes 10 minutes and there’s no obligation. Go to dataengineeringpodcast.com/springboard and apply today! Make sure to use the code AISPRINGBOARD when you enroll.

Your host is Tobias Macey and today I’m interviewing Ori Rafael and Yoni Iny about building a data lake for the DBA at Upsolver

Interview

Introduction

How did you get involved in the area of data management?

Can you start by sharing your definition of what a data lake is and what it is comprised of?

We talked last in November of 2018. How has the landscape of data lake technologies and adoption changed in that time?

How has Upsolver changed or evolved since we last spoke?

How has the evolution of the underlying technologies impacted your implementation and overall product strategy?

What are some of the common challenges that accompany a data lake implementation?

How do those challenges influence the adoption or viability of a data lake?

How does the introduction of a universal SQL layer change the staffing requirements for building and maintaining a data lake?

What are the advantages of a data lake over a data warehouse if everything is being managed via SQL anyway?

What are some of the underlying realities of the data systems that power the lake which will eventually need to be understood by the operators of the platform?

How is the SQL layer in Upsolver implemented?

What are the most challenging or complex aspects of managing the underlying technologies to provide automated partitioning, indexing, etc.?

What are the main concepts that you need to educate your customers on?

What are some of the pitfalls that users should be aware of?

What features of your platform are often overlooked or underutilized which you think should be more widely adopted?

What have you found to be the most interesting, unexpected, or challenging lessons learned while building the technical and business elements of Upsolver?

What do you have planned for the future?

Contact Info

Ori

LinkedIn

Yoni

yoniiny on GitHub

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Upsolver

Podcast Episode

DBA == Database Administrator

IDF == Israel Defense Forces

Data Lake

Eventual Consistency

Apache Spark

Redshift Spectrum

Azure Synapse Analytics

SnowflakeDB

Podcast Episode

BigQuery

Presto

Podcast Episode

Apache Kafka

Cartesian Product

kSQLDB

Podcast Episode

Eventador

Podcast Episode

Materialize

Podcast Episode

Common Table Expressions

Lambda Architecture

Kappa Architecture

Apache Flink

Podcast Episode

Reinforcement Learning

CloudFormation

GDPR

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Before the COVID-19 crisis, we were already acutely aware of the need for a broader conversation around data privacy: look no further than the Snowden revelations, Cambridge Analytica, the New York Times Privacy Project, the General Data Protection Regulation (GDPR) in Europe, and the California Consumer Privacy Act (CCPA). In the age of COVID-19, these issues are far more acute. We also know that governments and businesses exploit crises to consolidate and rearrange power, claiming that citizens need to give up privacy for the sake of security. But is this tradeoff a false dichotomy? And what type of tools are being developed to help us through this crisis? In this episode, Katharine Jarmul, Head of Product at Cape Privacy, a company building systems to leverage secure, privacy-preserving machine learning and collaborative data science, will discuss all this and more, in conversation with Dr. Hugo Bowne-Anderson, data scientist and educator at DataCamp.

Links from the show

FROM THE INTERVIEW

Katharine on Twitter

Katharine on LinkedIn

Contact Tracing in the Real World (By Ross Anderson)

The Price of the Coronavirus Pandemic (By Nick Paumgarten)

Do We Need to Give Up Privacy to Fight the Coronavirus? (By Julia Angwin)

Introducing the Principles of Equitable Disaster Response (By Greg Bloom)

Cybersecurity During COVID-19 (By Bruce Schneier)