talk-data.com

Topic

Cyber Security

Tags: cybersecurity, information_security, data_security, privacy

Activity Trend: 297 peak/qtr (2020-Q1 to 2026-Q1)

Activities

2078 activities · Newest first

Data Democratization with Domo

Discover how to leverage the full potential of Domo, a robust cloud-based business intelligence platform, in your organization. This comprehensive guide walks you through data integration, transformation, visualization, and governance techniques, enabling you to deliver impactful, data-driven results quickly and effectively.

What this Book will help me do: Understand and utilize Domo's cloud data architecture for comprehensive data analysis. Seamlessly acquire and manage data using Domo connectors and tools. Create and customize dashboards that communicate data insights effectively. Build and deploy Python applications and machine learning models on Domo. Securely govern your organization's data with robust Domo features.

Author(s): The author, Burtenshaw, is an expert in business intelligence and data platforms. With years of experience working with data integration tools, their writing combines technical thoroughness with practical insights. They aim to empower professionals with the skills to excel in data-driven decision making, reflecting their passion for making technology accessible and actionable.

Who is it for? This book is ideal for business intelligence professionals, including developers and analysts, looking to elevate their understanding of Domo. It is suited for those with a fundamental knowledge of data platforms seeking advanced skills in data management and visualization. BI managers will gain insights into governance and security, while analysts will find inspiration for data storytelling. If you're aiming to master the possibilities of Domo, this book is for you.

Advanced Analytics with PySpark

The amount of data being generated today is staggering and growing. Apache Spark has emerged as the de facto tool to analyze big data and is now a critical part of the data science toolbox. Updated for Spark 3.0, this practical guide brings together Spark, statistical methods, and real-world datasets to teach you how to approach analytics problems using PySpark, Spark's Python API, and other best practices in Spark programming. Data scientists Akash Tandon, Sandy Ryza, Uri Laserson, Sean Owen, and Josh Wills offer an introduction to the Spark ecosystem, then dive into patterns that apply common techniques (including classification, clustering, collaborative filtering, and anomaly detection) to fields such as genomics, security, and finance. This updated edition also covers NLP and image processing. If you have a basic understanding of machine learning and statistics and you program in Python, this book will get you started with large-scale data analysis. Familiarize yourself with Spark's programming model and ecosystem. Learn general approaches in data science. Examine complete implementations that analyze large public datasets. Discover which machine learning tools make sense for particular problems. Explore code that can be adapted to many uses.

Eyal Waldman, Co-founder and former President, CEO, and Member of the Board of Mellanox Technologies, joins Molly Presley to discuss how data has transformed the world, which companies to keep an eye on in the future of data, and the importance of security for global data communication.

Data #DecentralizedData #Business #21stCentury #HammerSpace #Podcast #futurist

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Summary

The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that the second-best way is to build a storage system dedicated to securely managing your sensitive information and making it easy to integrate with your applications and data systems. In this episode Sean Falconer explains the idea of a data privacy vault and how this new architectural element can drastically reduce the potential for making a mistake with how you manage regulated or personally identifiable information.
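The vault pattern Sean describes can be sketched in a few lines of Python. This is a toy illustration of tokenization only, not Skyflow's actual API; the class and method names below are invented for the example.

```python
import secrets

class PrivacyVault:
    """Toy data privacy vault: sensitive values live only inside the
    vault, while applications store opaque tokens in their place."""

    def __init__(self):
        self._store = {}  # token -> sensitive value

    def tokenize(self, value):
        # Swap the sensitive value for a random, meaningless token.
        token = "tok_" + secrets.token_hex(8)
        self._store[token] = value
        return token

    def detokenize(self, token):
        # Only callers with vault access can recover the original value.
        return self._store[token]

vault = PrivacyVault()
record = {"name": "Jane Doe", "ssn": "123-45-6789"}

# The application database keeps only tokens, never the raw SSN, so a
# leak of the application data exposes nothing sensitive.
safe_record = {"name": record["name"], "ssn": vault.tokenize(record["ssn"])}
print(safe_record["ssn"])                     # e.g. tok_9f3a...
print(vault.detokenize(safe_record["ssn"]))   # 123-45-6789
```

The point of the pattern is that regulated data lives in exactly one hardened place; everything downstream can be built, shared, and debugged without ever touching it.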

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!

Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.

Your host is Tobias Macey and today I’m interviewing Sean Falconer about the idea of a data privacy vault and how the Skyflow team are working to make it turn-key.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Skyflow is and the story behind it?
What is a "data privacy vault" and how does it differ from strategies such as privacy engineering or existing data governance patterns?
What are the primary use cases and capabilities that you are focused on solving for with Skyflow?

Who is the target customer for Skyflow (e.g. how does it enter an organization)?

How is the Skyflow platform architected?

How have the design and goals of the system changed or evolved over time?

Can you describe the process of integrating with Skyflow at the application level?
For organizations that are building analytical capabilities on top of the data managed in their applications, what are the interactions with Skyflow at each of the stages in the data lifecycle?
One of the perennial problems with distributed systems is the challenge of joining data across machine boundaries. How do you mitigate that problem?
On your website there are different "vaults" advertised in the form of healthcare, fintech, and PII. What are the different requirements across each of those problem domains?

What are the commonalities?

As a relatively new company in an emerging product category, what are some of the customer education challenges that you are facing?
What are the most interesting, innovative, or unexpected ways that you have seen Skyflow used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Skyflow?
When is Skyflow the wrong choice?
What do you have planned for the future of Skyflow?

Contact Info

LinkedIn
@seanfalconer on Twitter
Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Skyflow
Privacy Engineering
Data Governance
Homomorphic Encryption
Polymorphic Encryption

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

In this episode of SaaS Scaled, we’re talking to Curtis Barker. Curtis is the VP of Product Strategy at Rezilion, a platform built to help organizations take control of their actual attack surface.

We talk about how Rezilion came to be, how it works, and what problems it solves. Curtis discusses how cloud computing in general has changed over the years and the specific regulations and frameworks that have evolved. How have these standards shaped the cloud space, for better or worse?

Curtis talks about the differences between product strategy and product management, as well as the kind of personality and skills needed to be successful in each, based on his own experience.

We discuss how SaaS looks set to change in the coming years and where it needs to change the most, specifically regarding security. And Curtis shares his thoughts on the future of technologies like machine learning and artificial intelligence.

Elasticsearch 8.x Cookbook - Fifth Edition

"Elasticsearch 8.x Cookbook" is your go-to resource for harnessing the full potential of Elasticsearch 8. This book provides over 180 hands-on recipes to help you efficiently implement, customize, and scale Elasticsearch solutions in your enterprise. Whether you're handling complex queries, analytics, or cluster management, you'll find practical insights to enhance your capabilities.

What this Book will help me do: Understand the advanced features of Elasticsearch 8.x, including X-Pack, for improving functionality and security. Master advanced indexing and query techniques to perform efficient and scalable data operations. Implement and manage Elasticsearch clusters effectively, including monitoring performance via Kibana. Integrate Elasticsearch seamlessly into Java, Scala, Python, and big data environments. Develop custom plugins and extend Elasticsearch to meet unique project requirements.

Author(s): Alberto Paro is a seasoned Elasticsearch expert with years of experience in search technologies and enterprise solution development. As a professional developer and consultant, he has worked with numerous organizations to implement Elasticsearch at scale. Alberto brings his deep technical knowledge and hands-on approach to this book, ensuring readers gain practical insights and skills.

Who is it for? This book is perfect for software engineers, data professionals, and developers working with Elasticsearch in enterprise environments. If you're seeking to advance your Elasticsearch knowledge, enhance your query-writing abilities, or integrate it into big data workflows, this book will be invaluable. Regardless of whether you're deploying Elasticsearch in e-commerce, applications, or for analytics, you'll find the content purposeful and engaging.

IBM z16 Technical Introduction

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform that is built with the IBM Telum processor: the IBM z16 server. The IBM Z platform is recognized for its security, resiliency, performance, and scale. It is relied on for mission-critical workloads and as an essential element of hybrid cloud infrastructures. The IBM z16 server adds capabilities and value with innovative technologies that are needed to accelerate the digital transformation journey. This book explains how the IBM z16 server uses innovations and traditional IBM Z strengths to satisfy the growing demand for cloud, analytics, and a more flexible infrastructure. With the IBM z16 servers as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

COVID, inflation, broken supply chains, and not-so-distant war make this a turbulent time for the modern consumer. During times like these, families tend to their nests, which leads to lots of home-improvement projects…which means lots of painting.

Today we explore the case study of a Fortune 500 producer of the paints and stains that coat many households, consumer products, and even mechanical vehicles. While business expands, this company needs to carefully align the records that track hundreds of suppliers, thousands of storefronts, and millions of customers.

Business expansion and complex supply chains make it particularly important—and challenging—for enterprises such as this paint producer, which we’ll call Bright Colors, to accurately describe the entities that make up their business. They need governed, validated data to describe entities such as their products, locations, and customers. Master data management, also known as MDM, streamlines operations and assists data governance by reconciling disparate data records into golden records and, ideally, a single source of truth.
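The golden-record idea can be illustrated with a toy reconciliation in Python. The records, field names, and survivorship rules below are invented for the example; real MDM platforms apply far richer matching, validation, and governance logic.

```python
# Toy MDM reconciliation: merge duplicate supplier records into one
# "golden record" using simple survivorship rules.
records = [
    {"id": 1, "name": "ACME Paint Supply",      "phone": None,       "city": "Dallas"},
    {"id": 2, "name": "Acme Paint Supply",      "phone": "555-0101", "city": None},
    {"id": 3, "name": "ACME Paint Supply Inc.", "phone": "555-0101", "city": "Dallas"},
]

def golden_record(duplicates):
    merged = {}
    for field in ("name", "phone", "city"):
        # Survivorship rules: drop nulls, then take the first surviving
        # value; for names, prefer the longest (most complete) variant.
        values = [r[field] for r in duplicates if r[field] is not None]
        merged[field] = max(values, key=len) if field == "name" else values[0]
    return merged

print(golden_record(records))
# {'name': 'ACME Paint Supply Inc.', 'phone': '555-0101', 'city': 'Dallas'}
```

Downstream systems then reference the single merged record instead of three conflicting ones, which is the "single source of truth" the episode discusses.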

We’re excited to share our conversation with an industry expert who helps Bright Colors and other Fortune 2000 enterprises navigate turbulent times with effective strategies for MDM and data governance.

Dave Wilkinson is chief technology officer with D3Clarity, a global strategy and implementation services firm that seeks to ensure digital certainty, security, and trust. D3Clarity is a partner of Semarchy, whose Intelligent Data Hub software helps enterprises govern and manage master data, reference data, data quality, enrichment, and workflows. Semarchy sponsored this podcast.

Microsoft Power BI Performance Best Practices

"Microsoft Power BI Performance Best Practices" is a thorough guide to operating Power BI solutions efficiently. This book walks you through optimizing every layer of a Power BI project, from data transformations to architecture, equipping you with the ability to create robust and scalable analytics solutions.

What this Book will help me do: Understand how to set realistic performance goals for Power BI projects and implement ongoing performance monitoring. Apply effective architectural and configuration strategies to improve Power BI solution efficiency. Learn practices for constructing and optimizing data models and implementing Row-Level Security effectively. Utilize tools like DAX Studio and VertiPaq Analyzer to detect and resolve common performance bottlenecks. Gain deep knowledge of Power BI Premium and techniques for handling large-scale data solutions using Azure.

Author(s): Bhavik Merchant is a recognized expert in business intelligence and analytics solutions. With extensive experience in designing and implementing Power BI solutions across industries, he brings a pragmatic approach to solving performance issues in Power BI. Bhavik's writing style reflects his passion for teaching, ensuring readers gain practical knowledge they can directly apply to their work.

Who is it for? This book is designed for data analysts, BI developers, and data professionals who have foundational knowledge of Power BI and aim to elevate their skills to construct high-performance analytics solutions. It is particularly suited to individuals seeking guidance on best practices and tools for optimizing Power BI applications.

In this episode of SaaS Scaled, we’re talking to Daniel Saks. Daniel is the president and co-founder of AppDirect, a platform that allows businesses to access all the tools and capabilities needed to thrive in a rapidly evolving digital world.

Daniel talks about how AppDirect got started, the problems it solves, and the story so far. We talk about the growth of the digital economy in recent decades and the changes that Daniel has noticed over time.

We talk about the rise of SaaS companies, and what the future holds as some companies move from direct to indirect selling, and single-channel to multi-channel. Daniel shares some of the various factors that could bring down the cost of sales for SaaS companies.

Finally, Daniel talks about his own podcast and shares one of his favorite books.

This episode is brought to you by Qrvey. The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com. Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

Bioinformatics and Medical Applications

The main topics addressed in this book are big data analytics problems in bioinformatics research such as microarray data analysis, sequence analysis, genomics-based analytics, disease network analysis, techniques for big data analytics, and health information technology. Bioinformatics and Medical Applications: Big Data Using Deep Learning Algorithms analyses massive biological datasets using computational approaches and the latest cutting-edge technologies to capture and interpret biological data. The book delivers various bioinformatics computational methods used to identify diseases at an early stage by assembling cutting-edge resources into a single collection designed to enlighten the reader on topics focusing on computer science, mathematics, and biology. In modern biology and medicine, bioinformatics is critical for data management. This book explains the bioinformatician’s important tools and examines how they are used to evaluate biological data and advance disease knowledge. The editors have curated a distinguished group of perceptive and concise chapters that present the current state of medical treatments and systems and offer emerging solutions for a more personalized approach to healthcare. Applying deep learning techniques to health information enables automated analysis that is better suited to the problems arising from medical and health-related data.

Audience: The primary audience for the book includes specialists, researchers, postgraduates, designers, experts, and engineers who are occupied with biometric research and security-related issues.

Visualizing Google Cloud

Easy-to-follow visual walkthrough of every important part of the Google Cloud Platform.

The Google Cloud Platform incorporates dozens of specialized services that enable organizations to offload technological needs onto the cloud. From routine IT operations like storage to sophisticated new capabilities including artificial intelligence and machine learning, the Google Cloud Platform offers enterprises the opportunity to scale and grow efficiently. In Visualizing Google Cloud: Illustrated References for Cloud Engineers & Architects, Google Cloud expert Priyanka Vergadia delivers a fully illustrated, visual guide to matching the best Google Cloud Platform services to your own unique use cases. After a brief introduction to the major categories of cloud services offered by Google, the author offers approximately 100 solutions divided into eight categories of services included in Google Cloud Platform:

Compute
Storage
Databases
Data Analytics
Data Science, Machine Learning and Artificial Intelligence
Application Development and Modernization with Containers
Networking
Security

You’ll find richly illustrated flowcharts and decision diagrams with straightforward explanations in each category, making it easy to adopt and adapt Google’s cloud services to your use cases. With coverage of the major categories of cloud models—including infrastructure-, containers-, platforms-, functions-, and serverless—and discussions of storage types, databases and Machine Learning choices, Visualizing Google Cloud: Illustrated References for Cloud Engineers & Architects is perfect for every Google Cloud enthusiast, of course. It is for anyone who is planning a cloud migration or new cloud deployment, for anyone preparing for cloud certification, and for anyone looking to make the most of Google Cloud. It is for cloud solutions architects, IT decision-makers, and cloud data and ML engineers. In short, this book is for YOU.

Summary

Any time that you are storing data about people there are a number of privacy and security considerations that come with it. Privacy engineering is a growing field in data management that focuses on how to protect attributes of personal data so that the containing datasets can be shared safely. In this episode Gretel co-founder and CTO John Myers explains how they are building tools for data engineers and analysts to incorporate privacy engineering techniques into their workflows and validate the safety of their data against re-identification attacks.
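A few of the basic privacy-engineering transformations in this space (suppression, generalization, masking, and noise injection) can be sketched in plain Python. This is an illustrative toy, not Gretel's actual API; the field names and rules are invented for the example.

```python
import random

random.seed(0)  # deterministic noise for the example

def mask_row(row):
    """Apply toy privacy transformations to one record so that the
    result is safer to share while staying analytically useful."""
    return {
        "name": "REDACTED",                                   # suppression
        "age": (row["age"] // 10) * 10,                       # generalization: 10-year bucket
        "zip": row["zip"][:3] + "**",                         # partial masking
        "salary": row["salary"] + random.randint(-500, 500),  # noise injection
    }

raw = {"name": "Jane Doe", "age": 37, "zip": "94107", "salary": 88000}
print(mask_row(raw))
```

Real privacy engineering goes further, for example generating fully synthetic rows and then testing the output against re-identification attacks, as discussed in the episode.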

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl

Are you looking for a structured and battle-tested approach for learning data engineering? Would you like to know how you can build proper data infrastructures that are built to last? Would you like to have a seasoned industry expert guide you and answer all your questions? Join Pipeline Academy, the world’s first data engineering bootcamp. Learn in small groups with like-minded professionals for 9 weeks part-time to level up in your career. The course covers the most relevant and essential data and software engineering topics that enable you to start your journey as a professional data engineer or analytics engineer. Plus we have AMAs with world-class guest speakers every week! The next cohort starts in April 2022. Visit dataengineeringpodcast.com/academy and apply now!

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.

Your host is Tobias Macey and today I’m interviewing John Myers about privacy engineering and use cases for synthetic data.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Gretel is and the story behind it?
How do you define "privacy engineering"?

In an organization or data team, who is typically responsible for privacy engineering?

How would you characterize the current state of the art and adoption for privacy engineering?
Who are the target users of Gretel and how does that inform the features and design of the product?
What are the stages of the data lifecycle where Gretel is used?
Can you describe a typical workflow for integrating Gretel into data pipelines for business analytics or ML model training?
How is the Gretel platform implemented?

How have the design and goals of the system changed or evolved since you started working on it?

What are some of the nuances of synthetic data generation or masking that data engineers/data analysts need to be aware of as they start using Gretel?
What are the most interesting, innovative, or unexpected ways that you have seen Gretel used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Gretel?
When is Gretel the wrong choice?
What do you have planned for the future of Gretel?

Contact Info

LinkedIn
@jtm_tech on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.init, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.

Links

Gretel
Privacy Engineering
Weights and Biases
Red Team/Blue Team
Generative Adversarial Network
Capture The Flag in application security
CVE == Common Vulnerabilities and Exposures
Machine Learning Cold Start Problem
Faker
Mockaroo
Kaggle
Sentry

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

In this episode of SaaS Scaled, we’re talking to Brian Dreyer, VP of Product Management at SightCall. Brian is here to talk about his experience in SaaS product management, share what he’s learned over the years, and tell us how things have changed.

Brian talks about how he would do product management today if he had to start a company from scratch, and why. We talk about how to successfully pivot and restart products and the challenges involved. Brian also mentions how SaaS has changed over the last couple of decades and the new challenges that have arisen.

We also dive into how the relationship between product and marketing has changed over the years, and Brian talks about how cloud computing has evolved and where it’s headed. Finally, he shares some recommendations for further reading for anyone interested in SaaS product management.

This episode is brought to you by Qrvey. The tools you need to take action with your data, on a platform built for maximum scalability, security, and cost efficiencies. If you’re ready to reduce complexity and dramatically lower costs, contact us today at qrvey.com. Qrvey, the modern no-code analytics solution for SaaS companies on AWS.

Blockchain technology, cryptocurrencies and decentralised finance are described by some as massively disruptive technologies that will turn our existing financial system on its head. For the traditional financial services industry, these technologies have the potential to create huge efficiency gains and democratise more complex financial services for individual users. On the other hand, DeFi also reduces – and potentially removes – the need for trusted intermediaries, which makes the model unsettling to some operators in the current financial system. DeFi also opens the opportunity for global financial inclusion of enterprises and private individuals in developing markets – a very large group whose needs are typically unmet by traditional finance.

With all this huge potential about to be released, we had better learn why these technologies are so revolutionary and what they will do for us now and in the future. To answer these questions and many more relating to DeFi, I recently spoke to Daniel Liebau. Dan is the Chief Investment Officer, Blockchain Strategy at Modular Asset Management and the Founding Chairman of Lightbulb Capital, a DeFi investment and consulting firm.

In this episode of Leaders of Analytics, Dan and I discuss:

Why DeFi is so revolutionary, and the opportunities and risks that lie within this space for individual users, corporations and nation states
The difference between Payment, Utility and Security tokens and how these are likely to be used in our future financial system
The utility of NFTs and their future as an asset category
How blockchains, cryptocurrencies and DeFi will be part of our lives in 5, 10 and 20 years respectively
What Dan is teaching his FinTech, crypto and DeFi students, and much more.

Daniel Liebau on LinkedIn: https://www.linkedin.com/in/liebauda/
Lightbulb Capital: https://www.lightbulbcap.com/

PostgreSQL 14 Administration Cookbook

PostgreSQL 14 Administration Cookbook provides a hands-on guide to mastering the administration of PostgreSQL 14. With over 175 recipes, this book equips you with practical techniques to manage, secure, and optimize your PostgreSQL databases, ensuring they are robust and high-performing.

What this Book will help me do: Master managing PostgreSQL databases both on-premises and in the cloud efficiently. Implement effective backup and recovery strategies to secure your data. Leverage the latest features of PostgreSQL 14 to enhance your database workflows. Understand and apply best practices for maintaining high availability and performance. Troubleshoot real-world challenges with guided solutions and expert insights.

Author(s): Simon Riggs and Gianni Ciolli are seasoned database experts with years of experience working with PostgreSQL. Simon is a PostgreSQL core team member, contributing his technical knowledge towards building robust database solutions, while Gianni brings a wealth of expertise in database administration and support. Together, they share a passion for making complex database concepts accessible and actionable.

Who is it for? This book is for database administrators, data architects, and developers who manage PostgreSQL databases and are looking to deepen their knowledge. It is suitable for professionals with some experience in PostgreSQL who aim to maximize their database's performance and security, as well as for those new to the system seeking a comprehensive start. Readers with an interest in practical, problem-solving approaches to database management will greatly benefit from this cookbook.

The Internet of Medical Things (IoMT)

Providing an essential addition to the reference material available in the field of IoMT, this timely publication covers a range of applied research on healthcare, biomedical data mining, and the security and privacy of health records. With their ability to collect, analyze and transmit health data, IoMT tools are rapidly changing healthcare delivery. For patients and clinicians, these applications are playing a central part in tracking and preventing chronic illnesses — and they are poised to evolve the future of care. In this book, the authors explore the potential applications of a wave of sensor-based tools—including wearables and stand-alone devices for remote patient monitoring—and the marriage of internet-connected medical devices with patient information that ultimately sets the IoMT ecosystem apart. This book demonstrates how the connectivity between medical devices and sensors is streamlining clinical workflow management and leading to an overall improvement in patient care, both inside care facilities and in remote locations.

Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor. Privacera is an enterprise-grade solution for cloud and hybrid data governance built on top of the robust and battle-tested Apache Ranger project. In this episode Balaji Ganesan shares how his experiences building and maintaining Ranger in previous roles helped him understand the needs of organizations and engineers as they define and evolve their data governance policies and practices.
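To make the access-control side of governance concrete, here is a toy, framework-free sketch of the kind of resource-level policy evaluation a tool in the Ranger/Privacera family performs. All names and the policy shape are invented for illustration; real policies are far richer (tag-based rules, column masking, row filters, audit logging):

```python
# Hypothetical resource-level access policies: each entry grants a
# group a set of actions on a named resource. Not Privacera's API.
POLICIES = [
    {"resource": "sales_db.customers", "group": "analysts", "allow": {"read"}},
    {"resource": "sales_db.customers", "group": "admins", "allow": {"read", "write"}},
]

def is_allowed(resource: str, group: str, action: str) -> bool:
    """Return True if any policy grants `group` the `action` on `resource`."""
    return any(
        p["resource"] == resource and p["group"] == group and action in p["allow"]
        for p in POLICIES
    )

print(is_allowed("sales_db.customers", "analysts", "read"))   # True
print(is_allowed("sales_db.customers", "analysts", "write"))  # False
```

Centralizing checks like this in one enforcement layer, rather than in each data store, is what makes consistent governance across cloud and hybrid systems tractable.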

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show! This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy to consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. The most important piece of any data project is the data itself, which is why it is critical that your data source is high quality.
PostHog is your all-in-one product analytics suite including product analysis, user funnels, feature flags, experimentation, and it’s open source so you can host it yourself or let them do it for you! You have full control over your data and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms. Give it a try today with their generous free tier at dataengineeringpodcast.com/posthog Your host is Tobias Macey and today I’m interviewing Balaji Ganesan about his work at Privacera and his view on the state of data governance, access control, and security in the cloud

Interview

Introduction How did you get involved in the area of data management? Can you describe what Privacera is and the story behind it? What is your working definition of "data governance" and how does that influence your product focus and priorities? What are some of the lessons that you learned from your work on Apache Ranger that helped with your efforts at Privacera? How would you characterize your position in the market for data governance/data security tools? What are the unique constraints and challenges that come into play when managing data in cloud platforms? Can you explain how the Privacera platform is architected?

How have the design and goals of the system changed or evolved since you started working on it?

What is the workflow for an operator integrating Privacera into a data platform?

How do you provide feedback to users about the level of coverage for discovered data assets?

How does Privacera fit into the workflow of the different personas working with data?

What are some of the security and privacy controls that Privacera introduces?

How do you mitigate the potential for anyone to bypass Privacera’s controls by interacting directly with the underlying systems? What are the most interesting, innovative, or unexpected ways that you have seen Privacera used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Privacera? When is Privacera the wrong choice? What do you have planned for the future of Privacera?

Contact Info

LinkedIn @Balaji_Blog on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers

Links

Privacera Hadoop Hortonworks Apache Ranger Oracle Teradata Presto/Trino Starburst

Podcast Episode

Ahana

Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Acryl

The modern data stack needs a reimagined metadata management platform. Acryl Data’s vision is to bring clarity to your data through its next generation multi-cloud metadata management platform. Founded by the leaders that created projects like LinkedIn DataHub and Airbnb Dataportal, Acryl Data enables delightful search and discovery, data observability, and federated governance across data ecosystems. Sign up for the SaaS product today at dataengineeringpodcast.com/acryl

Support Data Engineering Podcast

Grokking Streaming Systems

A friendly, framework-agnostic tutorial that will help you grok how streaming systems work—and how to build your own! In Grokking Streaming Systems you will learn how to: Implement and troubleshoot streaming systems Design streaming systems for complex functionalities Assess parallelization requirements Spot networking bottlenecks and resolve back pressure Group data for high-performance systems Handle delayed events in real-time systems Grokking Streaming Systems is a simple guide to the complex concepts behind streaming systems. This friendly and framework-agnostic tutorial teaches you how to handle real-time events, and even design and build your own streaming job that’s a perfect fit for your needs. Each new idea is carefully explained with diagrams, clear examples, and fun dialogue between perplexed personalities! About the Technology Streaming systems minimize the time between receiving and processing event data, so they can deliver responses in real time. For applications in finance, security, and IoT where milliseconds matter, streaming systems are a requirement. And streaming is hot! Skills on platforms like Spark, Heron, and Kafka are in high demand. About the Book Grokking Streaming Systems introduces real-time event streaming applications in clear, reader-friendly language. This engaging book illuminates core concepts like data parallelization, event windows, and backpressure without getting bogged down in framework-specific details. As you go, you’ll build your own simple streaming tool from the ground up to make sure all the ideas and techniques stick. The helpful and entertaining illustrations make streaming systems come alive as you tackle relevant examples like real-time credit card fraud detection and monitoring IoT services. 
What's Inside Implement and troubleshoot streaming systems Design streaming systems for complex functionalities Spot networking bottlenecks and resolve backpressure Group data for high-performance systems About the Reader No prior experience with streaming systems is assumed. Examples in Java. About the Authors Josh Fischer and Ning Wang are Apache Committers, and part of the committee for the Apache Heron distributed stream processing engine. Quotes Very well-written and enjoyable. I recommend this book to all software engineers working on data processing. - Apoorv Gupta, Facebook Finally, a much-needed introduction to streaming systems—a must-read for anyone interested in this technology. - Anupam Sengupta, Red Hat Tackles complex topics in a very approachable manner. - Marc Roulleau, GIRO A superb resource for helping you grasp the fundamentals of open-source streaming systems. - Simon Verhoeven, Cronos Explains all the main streaming concepts in a friendly way. Start with this one! - Cicero Zandona, Calypso Technologies
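The event-window concept the book covers can be illustrated in a few lines. This is a toy, framework-free sketch (the function name and sample data are invented, not taken from the book): events carrying timestamps are bucketed into fixed-size tumbling windows, and occurrences of each key are counted per window, the same shape of computation used in a real-time fraud-detection job.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size):
    """Bucket (timestamp, key) events into fixed-size tumbling windows
    and count occurrences of each key per window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each event falls into the window starting at the largest
        # multiple of window_size not exceeding its timestamp.
        window_start = (ts // window_size) * window_size
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in sorted(windows.items())}

events = [(1, "card_a"), (2, "card_a"), (3, "card_b"), (11, "card_a")]
print(tumbling_window_counts(events, 10))
# {0: {'card_a': 2, 'card_b': 1}, 10: {'card_a': 1}}
```

A production streaming engine adds what this sketch omits: parallel operators, handling of late or delayed events, and backpressure when downstream stages cannot keep up.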

Simplify Big Data Analytics with Amazon EMR

Simplify Big Data Analytics with Amazon EMR is a thorough guide to harnessing Amazon's EMR service for big data processing and analytics. From distributed computation pipelines to real-time streaming analytics, this book provides hands-on knowledge and actionable steps for implementing data solutions efficiently. What this Book will help me do Understand the architecture and key components of Amazon EMR and how to deploy it effectively. Learn to configure and manage distributed data processing pipelines using Amazon EMR. Implement security and data governance best practices within the Amazon EMR ecosystem. Master batch ETL and real-time analytics techniques using technologies like Apache Spark. Apply optimization and cost-saving strategies to scalable data solutions. Author(s) Sakti Mishra is a seasoned data professional with extensive expertise in deploying scalable analytics solutions on cloud platforms like AWS. With a background in big data technologies and a passion for teaching, Sakti ensures practical insights accompany every concept. Readers will find his approach thorough, hands-on, and highly informative. Who is it for? This book is perfect for data engineers, data scientists, and other professionals looking to leverage Amazon EMR for scalable analytics. If you are familiar with Python, Scala, or Java and have some exposure to Hadoop or AWS ecosystems, this book will empower you to design and implement robust data pipelines efficiently.