talk-data.com

Topic: Cyber Security

Tags: cybersecurity, information_security, data_security, privacy

2078 activities tagged

Activity Trend: peak of 297 activities per quarter, 2020-Q1 to 2026-Q1

Activities

2078 activities · Newest first

Enhanced Cyber Resilience Solution by Threat Detection using IBM Cloud Object Storage System and IBM QRadar SIEM

This Solution Redpaper™ publication explains how the features of IBM Cloud® Object Storage System reduce the effect of incidents on business data when combined with the log analysis, deep inspection, and threat detection that IBM QRadar SIEM provides. This paper also demonstrates how to integrate IBM Cloud Object Storage access logs with IBM QRadar SIEM so that an administrator can monitor, inspect, detect, and derive insights to identify potential threats to the data that is stored on IBM Cloud Object Storage. Based on threat detection, IBM QRadar SIEM can also proactively and remotely trigger a cyber resiliency workflow in IBM Cloud Object Storage to protect the data. This publication is intended for chief technology officers, solution and security architects, and systems administrators.
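
The log-analysis integration described above can be pictured with a small sketch. This is a hedged illustration only: the log layout, field positions, and alert threshold below are assumptions made for the example, not the actual IBM Cloud Object Storage access-log format or a QRadar rule.

```python
# Hypothetical sketch: scan object-storage access logs for bursts of reads
# by a single requester, the kind of signal a SIEM rule might alert on.
# The log format and threshold are illustrative assumptions, not IBM Cloud
# Object Storage's or QRadar's actual formats.
from collections import Counter

def suspicious_requesters(log_lines, threshold=100):
    """Return requester IDs whose GET count exceeds the threshold."""
    reads = Counter()
    for line in log_lines:
        # Assumed layout: "<timestamp> <requester_id> <operation> <object_key>"
        parts = line.split()
        if len(parts) >= 3 and parts[2] == "GET":
            reads[parts[1]] += 1
    return [who for who, count in reads.items() if count > threshold]

if __name__ == "__main__":
    sample = ["2021-01-01T00:00:00Z user-a GET reports/q1.csv"] * 150
    print(suspicious_requesters(sample))  # ['user-a']
```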

Securing Your Critical Workloads with IBM Hyper Protect Services

Many organizations must protect their mission-critical applications in production, but security threats can also surface during the development and pre-production phases. Also, during deployment and production, insiders who manage the infrastructure that hosts critical applications can pose a threat given their super-user credentials and level of access to secrets or encryption keys. Organizations must incorporate secure design practices in their development operations and embrace DevSecOps to protect their applications from the vulnerabilities and threat vectors that can compromise their data and potentially threaten their business. IBM® Cloud Hyper Protect Services provide built-in data-at-rest and data-in-flight protection to help developers easily build secure cloud applications by using a portfolio of cloud services that are powered by IBM LinuxONE. The LinuxONE platform ensures that client data is always encrypted, whether at rest or in transit. This feature gives customers complete authority over sensitive data and associated workloads (which restricts access, even for cloud admins) and helps them meet regulatory compliance requirements. LinuxONE also allows customers to build mission-critical applications that require quick time to market and dependable rapid expansion.

The purpose of this IBM Redbooks® publication is to:

Introduce the IBM Hyper Protect Services that are running on IBM LinuxONE on the IBM Cloud™ and on-premises
Provide high-level design architectures
Describe deployment best practices
Provide guides to getting started and examples of the use of the Hyper Protect Services

The target audience for this book is IBM Hyper Protect Virtual Services technical specialists, IT architects, and system administrators.
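
As a generic illustration of the data-at-rest protection concept (not the Hyper Protect Services API), here is a minimal Python sketch using the open-source cryptography package; key handling is deliberately simplified for the example.

```python
# Generic illustration of data-at-rest encryption; this uses the open-source
# `cryptography` package, not the IBM Hyper Protect Services API.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, kept in a key manager or HSM
cipher = Fernet(key)

ciphertext = cipher.encrypt(b"customer record: acct=1234")
assert cipher.decrypt(ciphertext) == b"customer record: acct=1234"
print("round trip ok; ciphertext is opaque without the key")
```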

Intelligent Data Analytics for Terror Threat Prediction

Intelligent data analytics for terror threat prediction is an emerging field of research at the intersection of information science and computer science, bringing with it tremendous opportunities and challenges thanks to the abundance of readily available criminal data for analysis. This book provides innovative insights that will help shape interventions for emerging, dynamic scenarios of criminal activity, and it presents emerging issues, challenges, and management strategies in public safety and crime control across various domains, playing a part in improving human life by helping to maintain the security of society. Researchers and practitioners working in the fields of data mining, machine learning, and artificial intelligence will benefit greatly from this book, which is a good addition to the collected state-of-the-art approaches for intelligent data analytics. It will also be valuable for those who are new to the field and need to become acquainted quickly with the best-performing methods; with this book they will be able to compare different approaches and carry their research forward in the most important areas of this field. No other book currently on the market offers such a collection of state-of-the-art methods for intelligent data analytics-based models for terror threat prediction: intelligent data analytics is a newly emerging field, and research in data mining and machine learning for this purpose is still at an early stage of development.

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract

Hosted by Al Martin, VP, IBM Expert Services Delivery, Making Data Simple provides the latest thinking on big data, AI, and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Suj Perepa and Richard Darden. Suj is a distinguished engineer who specializes in AI and machine learning technology in the financial sector. Suj has also been a lead for security programs and a member of the IBM Academy of Technology leadership team. Richard is a distinguished engineer at IBM in Cloud and Cognitive for the public sector, an evangelist for digital humans for North America government, and a former chief architect for federal government agencies.

Show Notes

6:30 – What is a data framework?
9:16 – What is a framework?
11:00 – Governance and people
14:15 – Process and architecture
19:15 – Bringing it all into a playbook
22:56 – Who is the client?
24:36 – How does bias play into it?
27:02 – What products are you using?
27:54 – Explainability
30:44 – Ethics of AI
37:13 – Suj and Richard’s most important lessons in AI: The Discipline of Technology

Connect with the Team

Producer Kate Brown - LinkedIn
Producer Steve Templeton - LinkedIn
Host Al Martin - LinkedIn and Twitter

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

podcast_episode
by Kyle Polich, Eli Goldweber (University of Michigan)

Eli Goldweber, a graduate student at the University of Michigan, comes on today to share his work in applying formal verification to systems, and a modification to the Paxos protocol discussed in the paper On the Significance of Consecutive Ballots in Paxos.

Works Mentioned:

Previous episode on Paxos: https://dataskeptic.com/blog/episodes/2020/distributed-consensus
Paper: On the Significance of Consecutive Ballots in Paxos, by Eli Goldweber, Nuda Zhang, and Manos Kapritsos

Thanks to our sponsor: NordVPN: 68% off a 2-year plan and one month free! With NordVPN, all the data you send and receive online travels through an encrypted tunnel, so no one can get their hands on your private information. NordVPN is quick and easy to use to protect the privacy and security of your data. Check them out at nordvpn.com/dataskeptic
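
For readers new to Paxos, a toy sketch of the acceptor's ballot-ordering rule may help; it shows only textbook single-decree behavior, not the consecutive-ballot modification the paper analyzes.

```python
# Toy sketch of Paxos-style ballot handling for a single acceptor. It shows
# the core rule (never accept a ballot lower than one already promised), not
# the consecutive-ballot modification analyzed in the paper.
class Acceptor:
    def __init__(self):
        self.promised = -1      # highest ballot promised so far
        self.accepted = None    # (ballot, value) last accepted, if any

    def prepare(self, ballot):
        """Phase 1: promise to ignore lower ballots; report prior accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("reject", self.promised)

    def accept(self, ballot, value):
        """Phase 2: accept only if no higher ballot has been promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "rejected"

a = Acceptor()
print(a.prepare(1))       # ('promise', None)
print(a.accept(1, "x"))   # 'accepted'
print(a.prepare(0))       # ('reject', 1) -- stale ballot is ignored
```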

Privileged Access Management for Secure Storage Administration: IBM Spectrum Scale with IBM Security Verify Privilege Vault

There is a growing insider security risk to organizations. Human error, privilege misuse, and cyberespionage are considered the top insider threats. One of the most dangerous internal security threats is the privileged user with access to critical data, the "crown jewels" of the organization. Because this data lives on storage, storage administrators hold critical privileged access that can cause major security breaches and jeopardize the safety of sensitive assets. Organizations must maintain tight control over to whom they grant privileged identity status for storage administration. Extra storage administration access must be shared with support and services teams when required. There is also a need to audit access to critical resources, as required for compliance with standards and regulations. IBM® Security™ Verify Privilege Vault On-Premises (Verify Privilege Vault), formerly known as IBM Security™ Secret Server, is the next-generation privileged account management offering that integrates with IBM Storage to ensure that access to IBM Storage administration sessions is secure and monitored in real time, with the recording that is required for audit and compliance. Privileged access to storage administration sessions is centrally managed, and each session can be time-bound with remote monitoring. You also can use remote termination and an approval workflow for the session. In this IBM Redpaper, we demonstrate the integration of IBM Spectrum® Scale and IBM Elastic Storage® Server (IBM ESS) with Verify Privilege Vault, and show how to use privileged access management (PAM) for secure storage administration. This paper is targeted at storage and security administrators, storage and security architects, and chief information security officers.
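
The time-bound, audited sessions described above can be sketched in a few lines. All names here are hypothetical illustrations of the concept, not the Verify Privilege Vault API.

```python
# Illustrative sketch of a time-bound privileged session with an audit trail.
# Class and field names are hypothetical; this is not the Verify Privilege
# Vault API, just the concept of expiring, logged privileged access.
import time

class PrivilegedSession:
    def __init__(self, user, ttl_seconds, audit_log):
        self.user = user
        self.expires_at = time.time() + ttl_seconds
        self.audit_log = audit_log
        self.audit_log.append((time.time(), user, "session opened"))

    def run(self, command):
        if time.time() > self.expires_at:
            self.audit_log.append((time.time(), self.user, "denied: expired"))
            raise PermissionError("privileged session expired")
        self.audit_log.append((time.time(), self.user, f"ran: {command}"))
        return f"executed {command}"

audit = []
s = PrivilegedSession("storage-admin", ttl_seconds=60, audit_log=audit)
print(s.run("mmlsfs all"))   # example IBM Spectrum Scale admin command
print(len(audit), "audit entries recorded")
```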

Summary One of the core responsibilities of data engineers is to manage the security of the information that they process. The team at Satori has a background in cybersecurity and they are using the lessons that they learned in that field to address the challenge of access control and auditing for data governance. In this episode co-founder and CTO Yoav Cohen explains how the Satori platform provides a proxy layer for your data, the challenges of managing security across disparate storage systems, and their approach to building a dynamic data catalog based on the records that your organization is actually using. This is an interesting conversation about the intersection of data and security and the lessons that can be learned in each direction.
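
As a rough picture of what a data-access proxy layer does, here is a toy masking sketch; the column list and masking policy are assumptions made for illustration, not Satori's implementation.

```python
# Toy illustration of the proxy idea: query results pass through a layer that
# masks sensitive columns before reaching the client. The column names and
# masking policy are assumptions, not Satori's implementation.
import re

MASK_COLUMNS = {"email", "ssn"}

def mask_row(row: dict) -> dict:
    masked = {}
    for col, val in row.items():
        if col in MASK_COLUMNS and isinstance(val, str):
            masked[col] = re.sub(r"[^@.]", "*", val)  # keep separators only
        else:
            masked[col] = val
    return masked

row = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789"}
print(mask_row(row))
# {'name': 'Ada', 'email': '***@*******.***', 'ssn': '***********'}
```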

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Your host is Tobias Macey and today I’m interviewing Yoav Cohen about Satori, a data access service to monitor, classify, and control access to sensitive data.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what you have built at Satori?

What is the story behind the product and company?

How does Satori compare to other tools and products for managing access control and governance for data assets?
What are the biggest challenges that organizations face in establishing and enforcing policies for their data?
What are the main goals for the Satori product and what use cases does it enable?
Can you describe how the Satori platform is architected?

How has the design of the platform evolved since you first began working on it?

How have your experiences working in cyber security informed your approach to data governance?
How does the design of the Satori platform simplify the technical aspects of data governance?

What aspects of governance do you delegate to other systems or platforms?

What elements of data infrastructure does Satori integrate with?

For someone who is adopting Satori, what is involved in getting it deployed and set up with their existing data platforms?

What do you see as being the most complex or underserved aspects of data governance?

How much of that complexity is inherent to the problem vs. being a result of how the industry has evolved?

What are some of the most interesting, innovative, or unexpected ways that you have seen the Satori platform used?
What are the most interesting, unexpected, or challenging lessons that you have learned while building Satori?
When is Satori the wrong choice?
What do you have planned for the future of the platform?

Contact Info

LinkedIn
@yoavcohen on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat

Links

Satori
Data Governance
Data Masking
TLS == Transport Layer Security

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

MongoDB Fundamentals

This book, "MongoDB Fundamentals", is the ideal hands-on guide to learning MongoDB. By starting from the basics of NoSQL databases and progressing to cloud integration using MongoDB Atlas, you will gain practical experience managing, querying, and visualizing data effectively for real-world applications. What this Book will help me do Set up and manage a MongoDB database with both local and cloud environments. Master querying and modifying data using the aggregation framework for complex operations. Implement effective database architecture with replication and sharding techniques. Ensure data security and resilience through user management and efficient backup/restore methods. Visualize data insights through dynamic reports and charts using MongoDB Charts. Author(s) Amit Phaltankar, Juned Ahsan, Michael Harrison, and Liviu Nedov are seasoned professionals in the field of database management systems, each bringing extensive experience working with MongoDB and cloud technologies. They excel at translating technical concepts into accessible, actionable insights, and have a passion for enabling IT professionals to create high-performance database solutions. Who is it for? "MongoDB Fundamentals" is tailored for developers, database administrators, system administrators, and cloud architects who are new to MongoDB but are looking to integrate it into their data processing workflows. It's perfect for those who aim to enhance their skills in handling data within cloud computing environments and have some basic programming or database experience.

Send us a text Abstract Hosted by Al Martin, VP, Data and AI Expert Services and Learning at IBM, Making Data Simple provides the latest thinking on big data, A.I., and the implications for the enterprise from a range of experts.

This week on Making Data Simple, we have Claude Yusti. Claude is a Partner in IBM Global Business Services and is responsible for leading the Watson AI and Data Platform Practice for the Public Sector. His primary clients are in Health and Human Services. He has over 30 years of experience in the Government and Healthcare industries and has worked with a number of industries.

Show Notes

6:19 – Working with government
9:58 – In what context are you defining AI?
11:04 – How do you see the protection of data?
18:05 – Use cases
25:01 – Skills, security, data set control, and how the government differs from the private sector

Claude Yusti - LinkedIn

Connect with the Team

Producer Kate Brown - LinkedIn
Producer Steve Templeton - LinkedIn
Host Al Martin - LinkedIn and Twitter

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Summary As a data engineer you’re familiar with the process of collecting data from databases, customer data platforms, APIs, etc. At YipitData they rely on a variety of alternative data sources to inform investment decisions by hedge funds and businesses. In this episode Andrew Gross, Bobby Muldoon, and Anup Segu describe the self service data platform that they have built to allow data analysts to own the end-to-end delivery of data projects and how that has allowed them to scale their output. They share the journey that they went through to build a scalable and maintainable system for web scraping, how to make it reliable and resilient to errors, and the lessons that they learned in the process. This was a great conversation about real world experiences in building a successful data-oriented business.
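
One of the resilience tactics the episode touches on is retrying flaky fetches with backoff. The sketch below shows only that general pattern; the URL and limits are placeholders, not YipitData's platform code.

```python
# Sketch of one resilience tactic discussed in the episode: retrying flaky
# HTTP fetches with exponential backoff. The URL and limits are placeholders;
# this is not YipitData's platform, just the general pattern.
import time
import requests

def fetch_with_retries(url, attempts=4, backoff=1.0):
    for attempt in range(attempts):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            if attempt == attempts - 1:
                raise
            sleep_for = backoff * (2 ** attempt)   # 1s, 2s, 4s, ...
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {sleep_for}s")
            time.sleep(sleep_for)

html = fetch_with_retries("https://example.com/")
print(len(html), "bytes fetched")
```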

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
Your host is Tobias Macey and today I’m interviewing Andrew Gross, Bobby Muldoon, and Anup Segu about how they are building pipelines at YipitData.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of what YipitData does?
What kinds of data sources and data assets are you working with?
What is the composition of your data teams and how are they structured?
Given the use of your data products in the financial sector, how do you handle monitoring and alerting around data quality?

Pro SQL Server Relational Database Design and Implementation: Best Practices for Scalability and Performance

Learn effective and scalable database design techniques in SQL Server 2019 and other recent SQL Server versions. This book is revised to cover additions to SQL Server that include SQL graph enhancements, in-memory online transaction processing, temporal data storage, row-level security, and other design-related features. This book will help you design OLTP databases that are high-quality, protect the integrity of your data, and perform fast on premises, in the cloud, or in hybrid configurations. Designing an effective and scalable database using SQL Server is a task requiring skills that have been around for well over 30 years, applied to technology that is constantly changing. This book covers everything from design logic that business users will understand to the physical implementation of the design in a SQL Server database. Grounded in best practices and a solid understanding of the underlying theory, author Louis Davidson shows you how to "get it right" in SQL Server database design and lay a solid groundwork for the future use of valuable business data.

What You Will Learn:

Develop conceptual models of client data using interviews and client documentation
Implement designs that work on premises, in the cloud, or in a hybrid approach
Recognize and apply common database design patterns
Normalize data models to enhance the integrity and scalability of your databases for the long-term use of valuable data
Translate conceptual models into high-performing SQL Server databases
Secure and protect data integrity as part of meeting regulatory requirements
Create effective indexing to speed query performance
Understand the concepts of concurrency

Who This Book Is For: Programmers and database administrators of all types who want to use SQL Server to store transactional data. The book is especially useful to those wanting to learn the latest database design features in SQL Server 2019 (features that include graph objects, in-memory OLTP, temporal data support, and more). Chapters on fundamental concepts, the language of database modeling, SQL implementation, and the normalization process lay a solid groundwork for readers who are just entering the field of database design. More advanced chapters serve the seasoned veteran by tackling the latest physical implementation features that SQL Server has to offer. The book has been carefully revised to cover all the design-related features that are new in SQL Server 2019.
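
As a small taste of the implementation side, the sketch below drives SQL Server from Python via pyodbc to create a normalized parent/child pair with a supporting index; the connection string and schema are illustrative assumptions, not examples taken from the book.

```python
# Sketch of the kind of design the book teaches, driven from Python via
# pyodbc: a normalized parent/child pair with a foreign key plus an index
# to support a common lookup. The connection string is a placeholder.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=Sandbox;Trusted_Connection=yes;"
)
cursor = conn.cursor()
cursor.execute("""
    CREATE TABLE dbo.Customer (
        CustomerId INT IDENTITY PRIMARY KEY,
        Email      NVARCHAR(256) NOT NULL UNIQUE
    );
    CREATE TABLE dbo.CustomerOrder (
        OrderId    INT IDENTITY PRIMARY KEY,
        CustomerId INT NOT NULL REFERENCES dbo.Customer (CustomerId),
        PlacedAt   DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    -- Nonclustered index to speed the frequent "orders for a customer" query.
    CREATE INDEX IX_CustomerOrder_CustomerId
        ON dbo.CustomerOrder (CustomerId);
""")
conn.commit()
```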

Exam Ref PL-900 Microsoft Power Platform Fundamentals

Prepare for Microsoft Exam PL-900: Demonstrate your real-world knowledge of the fundamentals of Microsoft Power Platform, including its business value, core components, and the capabilities and advantages of Power BI, Power Apps, Power Automate, and Power Virtual Agents. Designed for business users, functional consultants, and other professionals, this Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified: Power Platform Fundamentals level.

Focus on the expertise measured by these objectives:

Describe the business value of Power Platform
Identify the core components of Power Platform
Demonstrate the capabilities of Power BI
Demonstrate the capabilities of Power Apps
Demonstrate the capabilities of Power Automate
Demonstrate the capabilities of Power Virtual Agents

This Microsoft Exam Ref organizes its coverage by exam objectives, features strategic, what-if scenarios to challenge you, and assumes you are a business user, functional consultant, or other professional who wants to improve productivity by automating business processes, analyzing data, creating simple app experiences, or developing business enhancements to Microsoft cloud solutions.

About the Exam: Exam PL-900 focuses on the knowledge needed to describe the value of Power Platform services and of extending solutions; describe Power Platform administration and security; describe Common Data Service, Connectors, and AI Builder; identify common Power BI components; connect to and consume data; build basic dashboards with Power BI; identify common Power Apps components; build basic canvas and model-driven apps; describe Power Apps portals; identify common Power Automate components; build basic flows; describe Power Virtual Agents capabilities; and build and publish basic chatbots.

About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Power Platform Fundamentals certification, demonstrating your understanding of Power Platform's core capabilities, from business value and core product capabilities to building simple apps, connecting data sources, automating basic business processes, creating dashboards, and creating chatbots. With this certification, you can move on to earn specialist certifications covering more advanced aspects of Power Apps and Power BI, including Microsoft Certified: Power Platform App Maker Associate and Power Platform Data Analyst Associate. See full details at: microsoft.com/learn

Summary Building data products is complicated by the fact that there are so many different stakeholders with competing goals and priorities. It is also challenging because of the number of roles and capabilities that are necessary to go from idea to delivery. Different organizations have tried a multitude of organizational strategies to improve the success rate of these data teams, with varying levels of success. In this episode Jesse Anderson shares the lessons that he has learned while working with dozens of businesses across industries to determine the team structures and communication styles that have generated the best results. If you are struggling to deliver value from big data, or just starting down the path of building the organizational capacity to turn raw information into valuable products, then this is a conversation that you don’t want to miss.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Today’s episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between Data Engineering, Operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. If you start a trial and install Datadog’s agent, Datadog will send you a free T-shirt.
Your host is Tobias Macey and today I’m interviewing Jesse Anderson about best practices for organizing and managing data teams.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of how you view the mission and responsibilities of a data team?

What are the critical elements of a successful data team?
Beyond the core pillars of data science, data engineering, and operations, what other specialized roles do you find helpful?

Legal and Privacy Issues in Information Security, 3rd Edition

Thoroughly revised and updated to address the many changes in this evolving field, the third edition of Legal and Privacy Issues in Information Security addresses the complex relationship between the law and the practice of information security. Information systems security and legal compliance are required to protect critical governmental and corporate infrastructure, intellectual property created by individuals and organizations alike, and information that individuals believe should be protected from unreasonable intrusion. Organizations must build numerous information security and privacy responses into their daily operations to protect the business itself, fully meet legal requirements, and meet the expectations of employees and customers.

Instructor Materials for Legal Issues in Information Security include:

PowerPoint Lecture Slides
Instructor's Guide
Sample Course Syllabus
Quiz & Exam Questions
Case Scenarios/Handouts

New to the Third Edition:

Includes discussions of amendments in several relevant federal and state laws and regulations since 2011
Reviews relevant court decisions that have come to light since the publication of the first edition
Includes numerous information security data breaches highlighting new vulnerabilities

Summary The first stage of every good pipeline is data integration. With the increasing pace of change and the need for up-to-date analytics, the need to integrate that data in near real time is growing. With the improvements and increased variety of options for streaming data engines, and improved tools for change data capture, it is possible for data teams to make that goal a reality. However, despite all of the tools and managed distributions of those streaming engines, it is still a challenge to build a robust and reliable pipeline for streaming data integration, especially if you need to expose those capabilities to non-engineers. In this episode Ido Friedman, CTO of Equalum, explains how they have built a no-code platform to make integration of streaming data and change data capture feeds easier to manage. He discusses the challenges that are inherent in the current state of CDC technologies, how they have architected their system to integrate well with existing data platforms, and how to build an appropriate level of abstraction for such a complex problem domain. If you are struggling with streaming data integration and change data capture then this interview is definitely worth a listen.
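
The downstream half of change data capture can be pictured with a toy sketch: applying a stream of insert/update/delete events to a target table. The event shape here is an assumption made for illustration, not Equalum's format.

```python
# Toy illustration of the downstream side of change data capture: applying a
# stream of insert/update/delete events to a target table held in memory.
# The event shape is an assumption for illustration, not Equalum's format.
target = {}  # primary key -> row

def apply_change(event):
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("insert", "update"):
        target[key] = row          # upsert keeps the target idempotent on replay
    elif op == "delete":
        target.pop(key, None)

stream = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "insert", "key": 2, "row": {"name": "Grace", "plan": "free"}},
    {"op": "delete", "key": 2},
]
for event in stream:
    apply_change(event)
print(target)   # {1: {'name': 'Ada', 'plan': 'pro'}}
```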

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.

Deployment and Usage Guide for Running AI Workloads on Red Hat OpenShift and NVIDIA DGX Systems with IBM Spectrum Scale

This IBM® Redpaper publication describes the architecture, installation procedure, and results for running a typical training application that works on an automotive data set in an orchestrated and secured environment that provides horizontal scalability of GPU resources across physical node boundaries for deep neural network (DNN) workloads. This paper is mostly relevant for systems engineers, system administrators, or system architects that are responsible for data center infrastructure management and typical day-to-day operations such as system monitoring, operational control, asset management, and security audits. This paper also describes IBM Spectrum® LSF® as a workload manager and IBM Spectrum Discover as a metadata search engine to find the right data for an inference job and automate the data science workflow. With the help of this solution, the data location, which may be on different storage systems, and time of availability for the AI job can be fully abstracted, which provides valuable information for data scientists.
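
For orientation, the sketch below shows the shape of a distributed DNN training step of the kind such a cluster scales across GPUs and nodes. It uses PyTorch DistributedDataParallel with stand-in data and is meant to be launched with torchrun; it is not taken from the Redpaper's actual automotive workload.

```python
# Hedged sketch of a distributed DNN training step; launch with torchrun,
# e.g. `torchrun --nproc_per_node=2 train_sketch.py`. Model and batch are
# stand-ins, not the Redpaper's automotive workload.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")
    rank = dist.get_rank()
    device = torch.device(f"cuda:{rank % torch.cuda.device_count()}" if use_cuda else "cpu")

    # DDP synchronizes gradients across all processes (and nodes) each step.
    model = DDP(torch.nn.Linear(32, 2).to(device))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    x = torch.randn(64, 32, device=device)           # stand-in batch
    y = torch.randint(0, 2, (64,), device=device)    # stand-in labels
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                  # gradients all-reduced here
    optimizer.step()
    if rank == 0:
        print("step done, loss:", loss.item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```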

Exposed

Discover why privacy is a counterproductive, if not obsolete, concept in this startling new book. It's only a matter of time: the modern notion of privacy is quickly evaporating because of technological advancement and social engagement. Whether we like it or not, all our actions and communications are going to be revealed for everyone to see. Exposed: How Revealing Your Data and Eliminating Privacy Increases Trust and Liberates Humanity takes a controversial and insightful look at the concept of privacy and persuasively argues that preparing for a post-private future is better than exacerbating the painful transition by attempting to delay the inevitable. Security expert and author Ben Malisow systematically dismantles common notions of privacy and explains how:

Most arguments in favor of increased privacy are wrong
Privacy in our personal lives leaves us more susceptible to being bullied or blackmailed
Governmental and military privacy leads to an imbalance of power between citizen and state
Military supremacy based on privacy is an obsolete concept

Perfect for anyone interested in the currently raging debates about governmental, institutional, corporate, and personal privacy, and the proper balance between the public and the private, Exposed also belongs on the shelves of security practitioners and policymakers everywhere.

Pro Microsoft Power BI Administration: Creating a Consistent, Compliant, and Secure Corporate Platform for Business Intelligence

Manage Power BI within organizations. This book helps you systematize administration as Microsoft shifts Power BI from a self-service tool to an enterprise tool. You will learn best practices for many Power BI administrator tasks, and you will know how to manage artifacts such as reports, users, workspaces, apps, and gateways. The book also provides experience-based guidance on governance, licensing, and managing capacities. Good management includes policies and procedures that can be applied consistently, and even automatically, across a broad user base. This book provides a strategic road map for the creation and implementation of policies and procedures that support Power BI best practices in enterprises. Effective governance depends not only on good policies, but also on the active and timely monitoring of adherence to those policies. This book helps you evaluate the tools to automate and simplify the most common administrative and monitoring tasks, freeing up administrators to provide greater value to the organization through better user training and awareness initiatives.

What You Will Learn:

Recognize the roles and responsibilities of the Power BI administrator
Manage users and their workspaces
Know when to consider using Power BI Premium
Govern your Power BI implementation and manage Power BI tenants
Create an effective security strategy for Power BI in the enterprise
Collaborate and share consistent views of the data across all users
Follow a life cycle management strategy for the rollout of dashboards and reports
Create internal training resources backed up by accurate documentation
Monitor Power BI to better understand risks and compliance, manage costs, and track implementation

Who This Book Is For: IT professionals tasked with maintaining their corporate Power BI environments, Power BI administrators and power users interested in rolling out Power BI more widely in their organizations, and IT governance professionals tasked with ensuring adherence to policies and regulations.
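
One monitoring task of the kind the book discusses, listing workspaces through the Power BI admin REST API, can be sketched as follows; acquiring the Azure AD access token is out of scope here and simply assumed to have happened elsewhere.

```python
# Hedged sketch: list workspaces via the Power BI admin REST API. Getting an
# Azure AD token with tenant-admin permissions is assumed to have happened
# elsewhere; ACCESS_TOKEN below is a placeholder, not a real credential.
import requests

ACCESS_TOKEN = "<azure-ad-token-with-tenant-admin-scope>"

resp = requests.get(
    "https://api.powerbi.com/v1.0/myorg/admin/groups",
    params={"$top": 100},  # the admin endpoint requires an explicit page size
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
for workspace in resp.json().get("value", []):
    print(workspace.get("id"), workspace.get("name"))
```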

Summary One of the oldest aphorisms about data is "garbage in, garbage out", which is why the current boom in data quality solutions is no surprise. With the growth in projects, platforms, and services that aim to help you establish and maintain control of the health and reliability of your data pipelines, it can be overwhelming to stay up to date with how they all compare. In this episode Egor Gryaznov, CTO of Bigeye, joins the show to explore the landscape of data quality companies, the general strategies that they are using, and the problems they solve. He also shares how his own product is designed and the challenges that are involved in building a system to help data engineers manage the complexity of a data platform. If you are wondering how to get better control of your own pipelines and the traps to avoid, then this episode is definitely worth a listen.
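
Two of the most basic data-quality checks discussed in this space, null rates and row-count anomalies, fit in a short sketch; the thresholds are illustrative assumptions, not Bigeye's implementation.

```python
# Sketch of two basic data-quality checks of the kind discussed in the
# episode: null rate on a column and a z-score test on daily row counts.
# Thresholds are illustrative; this is not Bigeye's implementation.
from statistics import mean, stdev

def null_rate(values):
    return sum(v is None for v in values) / len(values)

def row_count_anomaly(history, today, z_threshold=3.0):
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > z_threshold

emails = ["a@x.com", None, "b@y.com", None, None]
print(f"null rate: {null_rate(emails):.0%}")            # 60%
print(row_count_anomaly([1000, 990, 1010, 1005], 400))  # True: sudden drop
```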

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
What are the pieces of advice that you wish you had received early in your career of data engineering? If you hand a book to a new data engineer, what wisdom would you add to it? I’m working with O’Reilly on a project to collect the 97 things that every data engineer should know, and I need your help. Go to dataengineeringpodcast.com/97things to add your voice and share your hard-earned expertise.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance? How much time could you save if those tasks were automated across your cloud platforms? Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud. Our comprehensive data-level security, auditing and de-identification features eliminate the need for time-consuming manual processes and our focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms. Learn how we streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta.
Your host is Tobias Macey and today I’m interviewing Egor Gryaznov about the state of the industry for data quality management and what he is building at Bigeye.