talk-data.com

Topic: Cyber Security

Tags: cybersecurity, information_security, data_security, privacy

2078 tagged activities

Activity Trend: 297 peak/qtr (2020-Q1 to 2026-Q1)

Activities (2078 activities · Newest first)

In this podcast, @BesaBauta from MercyFirst talks about the compliance and privacy challenges faced in a hyper-regulated industry. Drawing on her experience in health informatics, Besa shares best practices and challenges faced by data science groups in health informatics and other similar groups in the regulated space. This podcast is great for anyone looking to learn about data science compliance and privacy challenges.

TIMELINE: 0:28 Besa's journey. 6:05 Besa's current role. 9:30 Privacy and compliance in health informatics. 14:44 Are the current privacy regulations sufficient? 16:15 Data management in different organizations. 22:37 The negatives for compliance policies on data. 26:28 Hiring a good chief data officer. 30:20 Vetting a company as a CDO. 32:38 Challenges for a startup in the healthcare sector. 36:25 Common challenges for data officers in the healthcare sector. 38:29 Millennials and technology. 40:05 Leadership dealing with compliance policies. 46:26 Requirements for working in health informatics. 49:18 Ingredients of a perfect hire. 50:40 Besa's success mantra. 52:35 How does Besa stay updated? 54:37 Besa's favorite read. 57:04 Key takeaway. Besa's Recommended Read: The Art of War by Sun Tzu and Lionel Giles https://amzn.to/2Jx2PYm

Podcast Link: https://futureofdata.org/compliance-and-privacy-in-health-informatics-by-besabauta/

Besa's BIO: Dr. Besa Bauta is the Chief Data Officer and Chief Compliance Officer for MercyFirst, a social service organization providing health and mental health services to children and adolescents in New York City. She oversees the Research, Evaluation, Analytics, and Compliance for Health (REACH) division, including data governance and security measures, analytics, risk mitigation, and policy initiatives. She is also an Adjunct Assistant Professor at NYU and previously worked as a Research Director for a USAID project in Afghanistan and as the Senior Director of Research and Evaluation at the Center for Evidence-Based Implementation and Research (CEBIR). She holds a Ph.D. in implementation science with a focus on health services, an MPH in Global Health, and an MSW. Her research has focused on health systems, mental health, and technology integration to improve population-level outcomes.

About #Podcast:

FutureOfData podcast is a conversation starter that brings together leaders, influencers, and lead practitioners to discuss their journeys toward creating a data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Expert GeoServer

"Expert GeoServer" guides readers through the process of building, optimizing, and securing GeoServer-powered web mapping applications. By exploring concepts like spatial analysis platforms, tile caching, and secure authentication, this book equips you to create highly performant and secure geospatial applications. What this Book will help me do Learn to develop spatial analysis platforms using web processing services. Master tile caching to significantly enhance the speed of your mapping applications. Implement secure authentication to protect sensitive geospatial data. Optimize GeoServer for improved performance and resource utilization. Deploy your GeoServer-backed applications on modern cloud-hosting infrastructures. Author(s) None Mearns is an experienced software developer and geospatial technology expert. With a strong background in GeoServer implementation, None has helped organizations optimize and secure their geospatial platforms. Their writing aims to provide clear and actionable instructions for professionals and learners alike. Who is it for? This book is perfect for geospatial developers and professionals aiming to take their GeoServer skills to the next level. A basic understanding of GeoServer is assumed, as this guide tackles advanced topics like performance optimization and security. If you are looking to enhance the speed, usability, and security of your mapping applications, this is for you. Those aiming to confidently deploy production-ready applications will find it invaluable.

Mastering Kibana 6.x

Mastering Kibana 6.x is your guide to leveraging Kibana for creating impactful data visualizations and insightful dashboards. From setting up basic visualizations to exploring advanced analytics and machine learning integrations, this book equips you with the skills to dive deep into your data and gain actionable insights at scale. You'll also learn to effectively manage and monitor data with powerful tools such as X-Pack and Beats.

What this book will help me do:
- Build sophisticated dashboards to visualize Elastic Stack data effectively.
- Understand and utilize Timelion expressions for analyzing time series data.
- Incorporate X-Pack capabilities to enhance security and monitoring in Kibana.
- Extract, analyze, and visualize data from Elasticsearch for advanced analytics.
- Set up monitoring and alerting using Beats components for reliable data operations.

Author(s): With extensive experience in big data technologies, the author brings a practical approach to teaching advanced Kibana topics. Having worked on real-world data analytics projects, their aim is to make complex concepts accessible while showing how to tackle analytics challenges using Kibana.

Who is it for? This book is ideal for data engineers, DevOps professionals, and data scientists who want to optimize large-scale data visualizations. If you're looking to manage Elasticsearch data through insightful dashboards and visual analytics, or enhance your data operations with features like machine learning, then this book is perfect for you. A basic understanding of the Elastic Stack is helpful, though not required.
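For flavor, the sketch below (not from the book) runs the kind of date-histogram aggregation that a Kibana time-series panel is built on, using the Elasticsearch Python client; the index pattern, field names, and a 6.x-era cluster are assumptions.

```python
# Illustrative sketch: the date_histogram aggregation underlying a typical
# Kibana time-series visualization, issued directly with the Elasticsearch
# Python client (6.x-era API). Index pattern and field names are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,  # only aggregation buckets are needed, not raw hits
    "aggs": {
        "events_over_time": {
            "date_histogram": {"field": "@timestamp", "interval": "1h"},
            "aggs": {"avg_cpu": {"avg": {"field": "system.cpu.total.pct"}}},
        }
    },
}

result = es.search(index="metricbeat-*", body=query)
for bucket in result["aggregations"]["events_over_time"]["buckets"]:
    print(bucket["key_as_string"], bucket["avg_cpu"]["value"])
```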

Professional Azure SQL Database Administration

Learn everything you need to manage Azure SQL Database with 'Professional Azure SQL Database Administration'. This book covers critical tasks such as migration, performance optimization, security, and disaster recovery. Perfect for those transitioning to the cloud, it equips you with the skills to ensure your database runs smoothly and efficiently.

What this book will help me do:
- Effectively migrate on-premises SQL Server databases to Azure.
- Master backup, restore, and security operations with Azure SQL Database.
- Optimize performance and scalability using monitoring and tuning techniques.
- Implement high availability and disaster recovery strategies.
- Simplify database management through automation and advanced techniques.

Author(s): Ahmad Osama is a seasoned database administrator and Azure expert with extensive experience in SQL Server and cloud database management. As a consultant and trainer, he has guided numerous organizations through cloud transitions. Ahmad's teaching philosophy blends practical insights with clear instruction.

Who is it for? This book is intended for database administrators and developers looking to transition their skills to Azure SQL Database. If you have some experience with on-premises SQL Server and are familiar with PowerShell, you'll find this guide invaluable. Ideal for those wanting to develop, migrate, or manage Azure SQL solutions.
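As a hedged illustration of a post-migration sanity check, this sketch connects to an Azure SQL Database with pyodbc and reads back its service objective; the server, database, and credentials are placeholders.

```python
# Minimal connectivity sketch, assuming the ODBC Driver 17 for SQL Server is
# installed; server, database, and credentials below are placeholders.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net,1433;"
    "DATABASE=mydb;UID=dbadmin;PWD=<password>;"
    "Encrypt=yes;TrustServerCertificate=no;"
)
with pyodbc.connect(conn_str, timeout=30) as conn:
    cur = conn.cursor()
    # Service tier/performance level: a useful check after migration.
    cur.execute(
        "SELECT DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective'), @@VERSION"
    )
    tier, version = cur.fetchone()
    print("Service objective:", tier)
    print(version)
```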

Getting Started with Kudu

Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to choose solutions using the least common denominator: either fast analytics at the cost of slow data ingestion, or fast data ingestion at the cost of slow analytics. There is an answer to this problem. With the Apache Kudu column-oriented data store, you can easily perform fast analytics on fast data. This practical guide shows you how. Begun as an internal project at Cloudera, Kudu is an open source solution compatible with many data processing frameworks in the Hadoop environment. In this book, current and former solutions professionals from Cloudera provide use cases, examples, best practices, and sample code to help you get up to speed with Kudu.

- Explore Kudu’s high-level design, including how it spreads data across servers
- Fully administer a Kudu cluster, enable security, and add or remove nodes
- Learn Kudu’s client-side APIs, including how to integrate Apache Impala, Spark, and other frameworks for data manipulation
- Examine Kudu’s schema design, including basic concepts and primitives necessary to make your project successful
- Explore case studies for using Kudu for real-time IoT analytics, predictive modeling, and in combination with another storage engine
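To ground the client-API discussion, here is a small sketch using the kudu-python client (not taken from the book's sample code); the master address, table name, and schema are illustrative assumptions.

```python
# Hedged sketch using the kudu-python client (pip install kudu-python); master
# address, table name, and schema are placeholders.
import kudu
from kudu.client import Partitioning

client = kudu.connect(host="kudu-master.example.com", port=7051)

# Define a simple schema: a primary key plus a metric column.
builder = kudu.schema_builder()
builder.add_column("device_id").type(kudu.int64).nullable(False).primary_key()
builder.add_column("reading").type(kudu.double)
schema = builder.build()

if not client.table_exists("iot_readings"):
    client.create_table(
        "iot_readings",
        schema,
        Partitioning().add_hash_partitions(column_names=["device_id"], num_buckets=4),
    )

table = client.table("iot_readings")
session = client.new_session()
session.apply(table.new_insert({"device_id": 42, "reading": 3.14}))
session.flush()  # writes are buffered until flushed
```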

Summary

Data integration and routing is a constantly evolving problem, one that is fraught with edge cases and complicated requirements. The Apache NiFi project models this problem as a collection of data flows that are created through a self-service graphical interface. This framework provides a flexible platform for building a wide variety of integrations that can be managed and scaled easily to fit your particular needs. In this episode project members Kevin Doran and Andy LoPresto discuss the ways that NiFi can be used, how to start using it in your environment, and plans for future development. They also explain how it fits into the broad landscape of data tools, the interesting and challenging aspects of the project, and how to build new extensions.
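Although NiFi flows are built in the GUI, the same flow model is exposed over a REST API; the sketch below (not from the episode) lists the processors in the root process group of an assumed unsecured local instance.

```python
# Illustrative sketch against NiFi's REST API, which the graphical interface
# itself uses. Host is a placeholder; an unsecured local instance is assumed.
import requests

NIFI = "http://localhost:8080/nifi-api"

resp = requests.get(f"{NIFI}/flow/process-groups/root", timeout=10)
resp.raise_for_status()
flow = resp.json()["processGroupFlow"]["flow"]

# Summarize the components of the root-level data flow.
print("Processors:", len(flow.get("processors", [])))
print("Connections:", len(flow.get("connections", [])))
for proc in flow.get("processors", []):
    comp = proc["component"]
    print(f"- {comp['name']} ({comp['type']}): {comp['state']}")
```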

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Kevin Doran and Andy LoPresto about Apache NiFi.

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what NiFi is? What is the motivation for building a GUI as the primary interface for the tool when the current trend is to represent everything as code? How did you get involved with the project?

Where does it sit in the broader landscape of data tools?

Does the data that is processed by NiFi flow through the servers that it is running on (à la Spark/Flink/Kafka), or does it orchestrate actions on other systems (à la Airflow/Oozie)?

How do you manage versioning and backup of data flows, as well as promoting them between environments?

One of the advertised features is tracking provenance for data flows that are managed by NiFi. How is that data collected and managed?

What types of reporting are available across this information?

What are some of the use cases or requirements that lend themselves well to being solved by NiFi?

When is NiFi the wrong choice?

What is involved in deploying and scaling a NiFi installation?

What are some of the system/network parameters that should be considered? What are the scaling limitations?

What have you found to be some of the most interesting, unexpected, and/or challenging aspects of building and maintaining the NiFi project and community? What do you have planned for the future of NiFi?

Contact Info

Kevin Doran

@kevdoran on Twitter Email

Andy LoPresto

@yolopey on Twitter Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

NiFi HortonWorks DataFlow HortonWorks Apache Software Foundation Apple CSV XML JSON Perl Python Internet Scale Asset Management Documentum DataFlow NSA (National Security Agency) 24 (TV Show) Technology Transfer Program Agile Software Development Waterfall Spark Flink Kafka Oozie Luigi Airflow FluentD ETL (Extract, Transform, and Load) ESB (Enterprise Service Bus) MiNiFi Java C++ Provenance Kubernetes Apache Atlas Data Governance Kibana K-Nearest Neighbors DevOps DSL (Domain Specific Language) NiFi Registry Artifact Repository Nexus NiFi CLI Maven Archetype IoT Docker Backpressure NiFi Wiki TLS (Transport Layer Security) Mozilla TLS Observatory NiFi Flow Design System Data Lineage GDPR (General Data Protection Regulation)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Apache Hive Essentials - Second Edition

"Apache Hive Essentials" provides a focused guide to mastering the essential techniques of processing and analyzing big data with Apache Hive. What this Book will help me do Set up and configure a Hive environment for big data analysis. Compose effective queries using Hive's SQL-like language to extract insights. Optimize Hive performance to handle complex datasets efficiently. Implement data security and user-defined functions to extend capabilities. Integrate Hive with Hadoop tools for comprehensive data solutions. Author(s) Dayong Du, the author of "Apache Hive Essentials," has years of experience working with big data technologies and tools. With hands-on expertise in Hadoop and the entire ecosystem, he brings a practical and informed perspective to this complex field. His approach is to make these technologies accessible to developers and analysts of all levels. Who is it for? This book is perfect for data analysts, developers, or professionals familiar with SQL who are looking to start with Apache Hive for big data processing. It is suitable for those acquainted with Hadoop and its environment and want to expand their skills into efficient data querying and management. Readers should have an interest in how to leverage big data tools for real-world solutions.

IBM Db2 11.1 Certification Guide

Delve into the IBM Db2 11.1 Certification Guide to comprehensively prepare for the IBM C2090-600 exam and master database programming and administration tasks in Db2 environments. Across its insightful chapters, this guide provides practical steps, expert guidance, and over 150 practice questions aimed at ensuring your success.

What this book will help me do:
- Master Db2 server management, including configuration and maintenance tasks, to ensure optimized performance.
- Implement advanced features such as BLU Acceleration and Db2 pureScale to enhance database functionality.
- Gain expertise in security protocols, including data encryption and integrity enforcement, for secure database environments.
- Troubleshoot common Db2 issues using advanced diagnostic tools like db2pd and dsmtop, improving efficiency and uptime.
- Develop skills in creating and altering database objects, enabling robust database design and management.

Author(s): The authors, Collins and Saraswatipura, are seasoned database professionals with vast experience in administering and optimizing Db2 environments. Their expertise in guiding students and professionals shines through in the accessible language and practical approach of the book. They bring a blend of theoretical and hands-on insights to ensure learners not only understand but also apply the knowledge effectively.

Who is it for? This book is ideal for database administrators, architects, and application developers who are pursuing certification in Db2. It caters to readers with a basic Db2 understanding seeking to advance their skills. Whether you're aiming for professional growth or practical expertise, this guide serves your goals by covering certification essentials while enriching your practical knowledge.
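For a sense of working with Db2 programmatically, the sketch below uses the ibm_db driver to read instance metadata from a SYSIBMADM administrative view; the connection details are placeholders and the snippet is illustrative, not exam material.

```python
# Hedged sketch with the ibm_db driver (pip install ibm_db); connection details
# are placeholders. SYSIBMADM views are part of Db2's administrative monitoring.
import ibm_db

dsn = (
    "DATABASE=SAMPLE;HOSTNAME=db2host.example.com;PORT=50000;"
    "PROTOCOL=TCPIP;UID=db2inst1;PWD=<password>;"
)
conn = ibm_db.connect(dsn, "", "")

stmt = ibm_db.exec_immediate(
    conn, "SELECT SERVICE_LEVEL, FIXPACK_NUM FROM SYSIBMADM.ENV_INST_INFO"
)
row = ibm_db.fetch_assoc(stmt)
print("Service level:", row["SERVICE_LEVEL"], "Fix pack:", row["FIXPACK_NUM"])
ibm_db.close(conn)
```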

Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point.

What You'll Learn:
- Get to know data lake architecture and design principles
- Implement data capture and streaming strategies
- Implement data processing strategies in Hadoop
- Understand the data lake security framework and availability model

Who This Book Is For: Big data architects and solution architects
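As a minimal illustration of the change-data-capture pattern mentioned above, this sketch consumes row-change events from a Kafka topic before they land in the lake; the topic name, brokers, and event shape are assumptions.

```python
# Sketch of a CDC consumer: read row-change events from Kafka ahead of landing
# them in the data lake. Topic, brokers, and event fields are hypothetical.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders.cdc",                      # hypothetical CDC topic
    bootstrap_servers=["broker1:9092"],
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
    enable_auto_commit=True,
)

for message in consumer:
    event = message.value
    op = event.get("op")               # e.g. insert / update / delete
    key = event.get("key")
    print(f"offset={message.offset} op={op} key={key}")
    # ...apply the change to the lake's storage layer here...
```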

Security on IBM z/VSE

Abstract

One of a firm’s most valuable resources is its data: client lists, accounting data, employee information, and so on. This critical data must be securely managed and controlled, and simultaneously made available to those users authorized to see it. The IBM® z/VSE® system features extensive capabilities to simultaneously share the firm’s data among multiple users and protect it. Threats to this data come from various sources. Insider threats and malicious hackers are not only difficult to detect and prevent, they might also be using resources without the business being aware. This IBM Redbooks® publication was written to assist z/VSE support and security personnel in providing the enterprise with a safe, secure, and manageable environment. This book provides an overview of the security that is provided by z/VSE and the processes for the implementation and configuration of z/VSE security components: Basic Security Manager (BSM), IBM CICS® security, TCP/IP security, single sign-on using LDAP, and connector security.

Microsoft SQL Server 2017 on Linux

Essential Microsoft® SQL Server® 2017 installation, configuration, and management techniques for Linux. Foreword by Kalen Delaney, Microsoft SQL Server MVP.

This comprehensive guide shows, step-by-step, how to set up, configure, and administer SQL Server 2017 on Linux for high performance and high availability. Written by a SQL Server expert and respected author, Microsoft SQL Server 2017 on Linux teaches valuable Linux skills to Windows-based SQL Server professionals. You will get clear coverage of both Linux and SQL Server and complete explanations of the latest features, tools, and techniques. The book offers clear instruction on adaptive query processing, automatic tuning, disaster recovery, security, and much more.

- Understand how SQL Server 2017 on Linux works
- Install and configure SQL Server on Linux
- Run SQL Server on Docker containers
- Learn Linux administration
- Troubleshoot and tune query performance in SQL Server
- Learn what is new in SQL Server 2017
- Work with adaptive query processing and automatic tuning techniques
- Implement high availability and disaster recovery for SQL Server on Linux
- Learn the security features available in SQL Server
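To make the Docker scenario concrete, here is a hedged sketch that connects to a containerized SQL Server 2017 instance from Python and confirms it is running on Linux; the container command, credentials, and pymssql driver choice are assumptions, not the book's own examples.

```python
# Illustrative sketch: verify a containerized SQL Server 2017 instance from
# Python. Assumes the container was started with something like:
#   docker run -e ACCEPT_EULA=Y -e SA_PASSWORD=<YourStrong!Passw0rd> \
#       -p 1433:1433 -d mcr.microsoft.com/mssql/server:2017-latest
import pymssql

conn = pymssql.connect(
    server="localhost", port=1433,
    user="sa", password="<YourStrong!Passw0rd>",  # placeholder
    database="master",
)
cur = conn.cursor()
cur.execute("SELECT @@VERSION, host_platform FROM sys.dm_os_host_info")
version, platform = cur.fetchone()
print("Platform:", platform)   # expected: Linux
print(version)
conn.close()
```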

IBM z14 Model ZR1 Technical Guide

Abstract

This IBM® Redbooks® publication describes the new member of the IBM Z® family, IBM z14™ Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and can infuse insight for faster and more accurate business decisions.

The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity, in an industry standard footprint. A data-centric infrastructure must always be available with a 99.999% or better availability, have flawless data integrity, and be secured from misuse. It also must be an integrated infrastructure that can support new applications. Finally, it must have integrated capabilities that can provide new mobile capabilities with real-time analytics that are delivered by a secure cloud infrastructure.

IBM z14 ZR1 servers are designed with improved scalability, performance, security, resiliency, availability, and virtualization. The superscalar design allows z14 ZR1 servers to deliver a record level of capacity over the previous IBM Z platforms. In its maximum configuration, z14 ZR1 is powered by up to 30 client characterizable microprocessors (cores) running at 4.5 GHz. This configuration can run more than 29,000 million instructions per second and supports up to 8 TB of client memory. The IBM z14 Model ZR1 is estimated to provide up to 54% more total system capacity than the IBM z13s® Model N20.

This Redbooks publication provides information about IBM z14 ZR1 and its functions, features, and associated software support. More information is offered in areas that are relevant to technical planning. It is intended for systems engineers, consultants, planners, and anyone who wants to understand the functions of IBM Z servers and plan for their usage. It is not intended as an introduction to mainframes; readers are expected to be generally familiar with IBM Z technology and terminology.

Summary

Using a multi-model database in your applications can greatly reduce the amount of infrastructure and complexity required. ArangoDB is a storage engine that supports document, key/value, and graph data formats, as well as being fast and scalable. In this episode Jan Steemann and Jan Stücke explain where Arango fits in the crowded database market, how it works under the hood, and how you can start working with it today.
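As a taste of working with Arango from code, the sketch below (not from the episode) uses the python-arango driver to store a document and query it back with AQL; the endpoint, credentials, and collection name are placeholders.

```python
# Illustrative sketch with the python-arango driver (pip install python-arango);
# database endpoint, credentials, and collection name are placeholders.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("_system", username="root", password="")  # placeholder creds

if not db.has_collection("people"):
    db.create_collection("people")
people = db.collection("people")
# overwrite=True makes the insert idempotent across reruns.
people.insert({"_key": "alice", "name": "Alice"}, overwrite=True)

# AQL works across the document, key/value, and graph views of the data.
cursor = db.aql.execute(
    "FOR p IN people FILTER p.name == @name RETURN p",
    bind_vars={"name": "Alice"},
)
print([doc["_key"] for doc in cursor])
```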

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Jan Stücke and Jan Steemann about ArangoDB, a multi-model distributed database for graph, document, and key/value storage.

Interview

Introduction How did you get involved in the area of data management? Can you give a high level description of what ArangoDB is and the motivation for creating it?

What is the story behind the name?

How is ArangoDB constructed?

How does the underlying engine store the data to allow for the different ways of viewing it?

What are some of the benefits of multi-model data storage?

When does it become problematic?

For users who are accustomed to a relational engine, how do they need to adjust their approach to data modeling when working with Arango? How does it compare to OrientDB? What are the options for scaling a running system?

What are the limitations in terms of network architecture or data volumes?

One of the unique aspects of ArangoDB is the Foxx framework for embedding microservices in the data layer. What benefits does that provide over a three tier architecture?

What mechanisms do you have in place to prevent data breaches from security vulnerabilities in the Foxx code? What are some of the most interesting or surprising uses of this functionality that you have seen?

What are some of the most challenging technical and business aspects of building and promoting ArangoDB? What do you have planned for the future of ArangoDB?

Contact Info

Jan Steemann

jsteemann on GitHub @steemann on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

ArangoDB Köln Multi-model Database Graph Algorithms Apache 2 C++ ArangoDB Foxx Raft Protocol Target Partners RocksDB AQL (ArangoDB Query Language) OrientDB PostgreSQL OrientDB Studio Google Spanner 3-Tier Architecture Thomson Reuters Arango Search Dell EMC Google S2 Index ArangoDB Geographic Functionality JSON Schema

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Visualizing Streaming Data

While tools for analyzing streaming and real-time data are gaining adoption, the ability to visualize these data types has yet to catch up. Dashboards are good at conveying daily or weekly data trends at a glance, though capturing snapshots when data is transforming from moment to moment is more difficult—but not impossible. With this practical guide, application designers, data scientists, and system administrators will explore ways to create visualizations that bring context and a sense of time to streaming text data. Author Anthony Aragues guides you through the concepts and tools you need to build visualizations for analyzing data as it arrives. Determine your company’s goals for visualizing streaming data Identify key data sources and learn how to stream them Learn practical methods for processing streaming data Build a client application for interacting with events, logs, and records Explore common components for visualizing streaming data Consider analysis concepts for developing your visualization Define the dashboard’s layout, flow direction, and component movement Improve visualization quality and productivity through collaboration Explore use cases including security, IoT devices, and application data
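As a minimal sketch of the windowing idea behind streaming visualizations, the snippet below keeps only the most recent samples so the display reflects the current moment rather than all history; the event source is simulated.

```python
# Minimal sketch of rolling-window aggregation for a streaming display; the
# event source here is simulated rather than a real socket or log tail.
import random
import time
from collections import deque

WINDOW = 30                      # keep the last 30 samples
samples = deque(maxlen=WINDOW)   # old samples fall off automatically

def read_event():
    """Stand-in for a real stream (socket, Kafka topic, log tail...)."""
    return random.gauss(50, 10)

for _ in range(100):
    samples.append(read_event())
    avg = sum(samples) / len(samples)
    bar = "#" * int(avg // 2)    # crude text sparkline for the rolling mean
    print(f"rolling avg {avg:6.2f} {bar}")
    time.sleep(0.05)
```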

Learning PHP, MySQL & JavaScript, 5th Edition

Build interactive, data-driven websites with the potent combination of open source technologies and web standards, even if you have only basic HTML knowledge. In this update to this popular hands-on guide, you’ll tackle dynamic web programming with the latest versions of today’s core technologies: PHP, MySQL, JavaScript, CSS, HTML5, and key jQuery libraries. Web designers will learn how to use these technologies together and pick up valuable web programming practices along the way—including how to optimize websites for mobile devices. At the end of the book, you’ll put everything together to build a fully functional social networking site suitable for both desktop and mobile browsers. Explore MySQL, from database structure to complex queries Use the MySQLi extension, PHP’s improved MySQL interface Create dynamic PHP web pages that tailor themselves to the user Manage cookies and sessions and maintain a high level of security Enhance the JavaScript language with jQuery and jQuery mobile libraries Use Ajax calls for background browser-server communication Style your web pages by acquiring CSS2 and CSS3 skills Implement HTML5 features, including geolocation, audio, video, and the canvas element Reformat your websites into mobile web apps

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers it might be best to use a hosted service to handle the plumbing so that you can focus on the parts that actually matter for your business. In this episode CTO and co-founder of Alooma, Yair Weinberger, explains how the platform addresses the common needs of data collection, manipulation, and storage while allowing for flexible processing. He describes the motivation for starting the company, how their infrastructure is architected, and the challenges of supporting multi-tenancy and a wide variety of integrations.
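Alooma's Code Engine runs user-supplied Python transforms over each event in the pipeline; the sketch below shows a function of roughly that shape, with the event fields invented for illustration.

```python
# Sketch of a per-event transform in the style Alooma's Code Engine accepts
# (a Python function over one event); the field names here are hypothetical.
def transform(event):
    """Receive one event as a dict; return the (possibly modified) event,
    or None to drop it from the pipeline."""
    # Drop internal test traffic before it reaches the warehouse.
    if event.get("user_email", "").endswith("@example-internal.com"):
        return None

    # Normalize a field name across differently-shaped sources.
    if "ts" in event and "timestamp" not in event:
        event["timestamp"] = event.pop("ts")

    # Mask PII ahead of storage (relevant to the compliance discussion).
    if "credit_card" in event:
        event["credit_card"] = "****" + str(event["credit_card"])[-4:]
    return event
```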

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking, and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service.

Interview

Introduction How did you get involved in the area of data management? What is Alooma and what is the origin story? How is the Alooma platform architected?

I want to go into stream vs. batch here. What are the most challenging components to scale?

How do you manage the underlying infrastructure to support your SLA of 5 nines? What are some of the complexities introduced by processing data from multiple customers with various compliance requirements?

How do you sandbox user’s processing code to avoid security exploits?

What are some of the potential pitfalls for automatic schema management in the target database? Given the large number of integrations, how do you maintain the

What are some of the challenges when creating integrations? Isn’t it simply a matter of conforming to an external API?

For someone getting started with Alooma what does the workflow look like? What are some of the most challenging aspects of building and maintaining Alooma? What are your plans for the future of Alooma?

Contact Info

LinkedIn @yairwein on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Alooma Convert Media Data Integration ESB (Enterprise Service Bus) Tibco Mulesoft ETL (Extract, Transform, Load) Informatica Microsoft SSIS OLAP Cube S3 Azure Cloud Storage Snowflake DB Redshift BigQuery Salesforce Hubspot Zendesk Spark The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps RDBMS (Relational Database Management System) SaaS (Software as a Service) Change Data Capture Kafka Storm Google Cloud PubSub Amazon Kinesis Alooma Code Engine Zookeeper Idempotence Kafka Streams Kubernetes SOC2 Jython Docker Python Javascript Ruby Scala PII (Personally Identifiable Information) GDPR (General Data Protection Regulation) Amazon EMR (Elastic Map Reduce) Sequoia Capital Lightspeed Investors Redis Aerospike Cassandra MongoDB

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Ethics is challenging because right and good are not always clear. More data, more kinds of data, and advanced analysis of data often conflict with concerns of data privacy, security, anonymity, and ownership. Resolving these conflicts requires acknowledgment, discussion, and the hard work of defining ethics-based policies and creating a culture of ethical conduct.

Originally published at https://www.eckerson.com/articles/data-ethics-the-new-data-governance-challenge

A Deep Dive into NoSQL Databases: The Use Cases and Applications

A Deep Dive into NoSQL Databases: The Use Cases and Applications, Volume 109, the latest release in the Advances in Computers series (first published in 1960), presents detailed coverage of innovations in computer hardware, software, theory, design, and applications. In addition, it provides contributors with a medium in which they can explore their subjects in greater depth and breadth. This update includes sections on NoSQL and NewSQL databases for big data analytics and distributed computing, NewSQL databases and scalable in-memory analytics, a NoSQL web crawler application, NoSQL security, a comparative study of different in-memory (No/New)SQL databases, hands-on work with four NoSQL databases, the Hadoop ecosystem, and more.

- Provides a comprehensive, yet compact, book on the popular domain of NoSQL databases for IT professionals, practitioners, and professors
- Articulates and accentuates big data analytics and how it is simplified and streamlined by NoSQL database systems
- Sets a stimulating foundation with all the relevant details for NoSQL database researchers, developers, and administrators

Consolidation Planning Workbook: Practical Migration from x86 to IBM LinuxONE

IBM LinuxONE™ is a portfolio of hardware, software, and solutions for an enterprise-grade Linux environment. It is designed to run more transactions faster and with more security and reliability specifically for the open community. It fully embraces open source-based technology.

This IBM® Redbooks® publication provides a technical sample workbook for IT organizations that are considering a migration from their x86 distributed servers to IBM LinuxONE. This book provides you with checklists for each facet of your migration to IBM LinuxONE. This IBM Redbooks workbook assists you by providing the following information:

- Choosing workloads to migrate
- Analysis of how to size workloads for migration
- Financial benefits of a migration
- Project definition
- Planning checklists

Implementing IBM FlashSystem V9000 AE3

Abstract

The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with several areas that affect your business:

- Highly virtualized environments
- Cloud computing
- Mobile and social systems of engagement
- In-depth, real-time analytics

Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate when they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets.

The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources, and achieve a simpler, more scalable, and cost-efficient IT infrastructure.

This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V8.1. It describes the core product architecture, software, hardware, and implementation, and provides hints and tips. The underlying basic hardware and software architecture and features of the IBM FlashSystem V9000 AC3 control enclosure and of IBM Spectrum Virtualize 8.1 software are described in these publications:

- Implementing IBM FlashSystem 900 Model AE3, SG24-8414
- Implementing the IBM System Storage SAN Volume Controller V7.4, SG24-7933

Using IBM FlashSystem V9000 software functions, management tools, and interoperability combines the performance of IBM FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment.

This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.