Cloud Computing

Evolutionary Computation in Scheduling

2020-05-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ali Emrouznejad , Amir H. Gandomi , Mo M. Jamshidi , Iman Rahimi , Kalyanmoy Deb

data data-science

Presents current developments in the field of evolutionary scheduling and demonstrates the applicability of evolutionary computational techniques to solving scheduling problems This book provides insight into the use of evolutionary computations (EC) in real-world scheduling, showing readers how to choose a specific evolutionary computation and how to validate the results using metrics and statistics. It offers a spectrum of real-world optimization problems, including applications of EC in industry and service organizations such as healthcare scheduling, aircraft industry, school timetabling, manufacturing systems, and transportation scheduling in the supply chain. It also features problems with different degrees of complexity, practical requirements, user constraints, and MOEC solution approaches. Evolutionary Computation in Scheduling starts with a chapter on scientometric analysis to analyze scientific literature in evolutionary computation in scheduling. It then examines the role and impacts of ant colony optimization (ACO) in job shop scheduling problems, before presenting the application of the ACO algorithm in healthcare scheduling. Other chapters explore task scheduling in heterogeneous computing systems and truck scheduling using swarm intelligence, application of sub-population scheduling algorithm in multi-population evolutionary dynamic optimization, task scheduling in cloud environments, scheduling of robotic disassembly in remanufacturing using the bees algorithm, and more. This book: Provides a representative sampling of real-world problems currently being tackled by practitioners Examines a variety of single-, multi-, and many-objective problems that have been solved using evolutionary computations, including evolutionary algorithms and swarm intelligence Consists of four main parts: Introduction to Scheduling Problems, Computational Issues in Scheduling Problems, Evolutionary Computation, and Evolutionary Computations for Scheduling Problems Evolutionary Computation in Scheduling is ideal for engineers in industries, research scholars, advanced undergraduates and graduate students, and faculty teaching and conducting research in Operations Research and Industrial Engineering.

Power Up Your PostgreSQL Analytics With Swarm64

2020-05-18 · Data Engineering Podcast Listen

podcast_episode

by Thomas Richter (Swarm64) , Tobias Macey

AI/ML Analytics Data Engineering Data Management DataOps DWH ETL/ELT Kubernetes PagerDuty Webhooks postgresql

Summary The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choice for high performance analytics. Swarm64 aims to change that by adding support for advanced hardware capabilities like FPGAs and optimized usage of modern SSDs. In this episode CEO and co-founder Thomas Richter discusses his motivation for creating an extension to optimize Postgres hardware usage, the benefits of running your analytics on the same platform as your application, and how it works under the hood. If you are trying to get more performance out of your database then this episode is for you!

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You monitor your website to make sure that you’re the first to know when something goes wrong, but what about your data? Tidy Data is the DataOps monitoring platform that you’ve been missing. With real time alerts for problems in your databases, ETL pipelines, or data warehouse, and integrations with Slack, Pagerduty, and custom webhooks you can fix the errors before they become a problem. Go to dataengineeringpodcast.com/tidydata today and get started for free with no credit card required. Your host is Tobias Macey and today I’m interviewing Thomas Richter about Swarm64, a PostgreSQL extension to improve parallelism and add support for FPGAs

Interview

Introduction How did you get involved in the area of data management? Can you start by explaining what Swarm64 is?

How did the business get started and what keeps you motivated?

What are some of the common bottlenecks that users of postgres run into? What are the use cases and workloads that gain the most benefit from increased parallelism in the database engine? By increasing the processing throughput of the database, how does that impact disk I/O and what are some options for avoiding bottlenecks in the persistence layer? Can you describe how Swarm64 is implemented?

How has the product evolved since you first began working on it?

How has the evolution of postgres impacted your product direction?

What are some of the notable challenges that you have dealt with as a result of upstream changes in postgres?

How has the hardware landscape evolved and how does that affect your prioritization of features and improvements? What are some of the other extensions in the postgres ecosystem that are most commonly used alongside Swarm64?

Which extensions conflict with yours and how does that impact potential adoption?

In addition to your work to optimize performance of the postres engine, you also provide support for using an FPGA as a co-processor. What are the benefits that an FPGA provides over and above a CPU or GPU architecture?

What are the available options for provisioning hardware in a datacenter or the cloud that has access to an FPGA? Most people are familiar with the relevant attributes for selecting a CPU or GPU, what are the specifications that they should be looking at when selecting an FPGA?

For users who are adopting Swarm64, how does it impact the way they should be thinking of their data models? What is involved in migrating an existing database to use Swarm64? What are some of the most interesting, unexpected, or

Optimize the Value of Your Data with Oracle and IBM Flash Storage Solutions

2020-05-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

IBM Oracle Cyber Security data data-engineering

In this multicloud and cognitive era, information continues to grow rapidly. By 2025, IDC says worldwide data will grow by 61% to 175 zettabytes, with as much of the data in data centers as in the cloud. IT environments with Oracle deployments will need to accommodate that data growth, including storing, copying, mirroring, and protecting the data. When IT budgets are constrained but data keeps growing, storage costs can consume more than their fair share of the IT budget. The leading-edge portfolio of storage solutions and essential technologies of IBM® can help organizations stay ahead of the information explosion. Designed with built-in efficiency, these solutions represent preferred practices that address the following main storage objectives for hybrid multicloud environments: Stop storing so much Store more with what you have. Move Oracle and related data to balance performance and efficiency IBM offers true enterprise class storage support for Oracle deployments at a low total cost of ownership (TCO). With flash disk, tape, storage network hardware, consolidated management console, software-defined storage solutions, and security software, IBM can provide Oracle customers the full spectrum of products to meet their availability, retention, security, and compliance requirements.

IBM AIX Enhancements and Modernization

2020-05-13 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Liviu Rosca , Scott Vetter , Navdeep Dhaliwal , Ahmed Mashhour , Armin Röll

IBM Cyber Security Terraform data data-engineering

This IBM® Redbooks publication is a comprehensive guide that covers the IBM AIX® operating system (OS) layout capabilities, distinct features, system installation, and maintenance, which includes AIX security, trusted environment, and compliance integration, with the benefits of IBM Power Virtualization Management (PowerVM®) and IBM Power Virtualization Center (IBM PowerVC), which includes cloud capabilities and automation types. The objective of this book is to introduce IBM AIX modernization features and integration with different environments: General AIX enhancements AIX Live Kernel Update individually or using Network Installation Manager (NIM) AIX security features and integration AIX networking enhancements PowerVC integration and features for cloud environments AIX deployment using IBM Terraform and IBM Cloud Automation Manager AIX automation that uses configuration management tools PowerVM enhancements and features Latest disaster recovery (DR) solutions AIX Logical Volume Manager (LVM) and Enhanced Journaled File System (JFS2) AIX installation and maintenance techniques

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

2020-05-11 · Data Engineering Podcast Listen

podcast_episode

by Sijie Guo (StreamNative) , Tobias Macey

AI/ML Data Engineering Data Management DataOps DWH ETL/ELT Kafka Kubernetes PagerDuty Data Streaming Webhooks

Summary There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community. His most recent endeavor at StreamNative is focused on combining the capabilities of Pulsar with the cloud native movement to make it easier to build and scale real time messaging systems with built in event processing capabilities. This was a great conversation about the strengths of the Pulsar project, how it has evolved in recent years, and some of the innovative ways that it is being used. Pulsar is a well engineered and robust platform for building the core of any system that relies on durable access to easily scalable streams of data.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You monitor your website to make sure that you’re the first to know when something goes wrong, but what about your data? Tidy Data is the DataOps monitoring platform that you’ve been missing. With real time alerts for problems in your databases, ETL pipelines, or data warehouse, and integrations with Slack, Pagerduty, and custom webhooks you can fix the errors before they become a problem. Go to dataengineeringpodcast.com/tidydata today and get started for free with no credit card required. Your host is Tobias Macey and today I’m interviewing Sijie Guo about the current state of the Pulsar framework for stream processing and his experiences building a managed offering for it at StreamNative

Interview

Introduction How did you get involved in the area of data management? Can you start by giving an overview of what Pulsar is?

How did you get involved with the project?

What is Pulsar’s role in the lifecycle of data and where does it fit in the overall ecosystem of data tools? How has the Pulsar project evolved or changed over the past 2 years?

How has the overall state of the ecosystem influenced the direction that Pulsar has taken?

One of the critical elements in the success of a piece of technology is the ecosystem that grows around it. How has the community responded to Pulsar, and what are some of the barriers to adoption?

How are you and other project leaders addressing those barriers?

You were a co-founder at Streamlio, which was built on top of Pulsar, and now you have founded StreamNative to offer Pulsar as a service. What did you learned from your time at Streamlio that has been most helpful in your current endeavor?

How would you characterize your relationship with the project and community in each role?

What motivates you to dedicate so much of your time and enery to Pulsar in particular, and the streaming data ecosystem in general?

Why is streaming data such an important capability? How have projects such as Kafka and Pulsar impacted the broader software and data landscape?

What are some of the most interesting, innovative, or unexpected ways that you have seen Pulsar used? When is Pulsar the wrong choice? What do you have planned for the future of S

Implementing IBM Spectrum Virtualize for Public Cloud Version 8.3

2020-05-11 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jordan Fincher , Nicolo Lorenzoni , Angelo Bernasconi , Jimmy John , Gucer Vasfi , Eric Goodall , Jackson Shea , Pierluigi Buratti

AWS IBM data data-engineering

IBM® Spectrum Virtualize is a key member of the IBM Spectrum™ Storage portfolio. It is a highly flexible storage solution that enables rapid deployment of block storage services for new and traditional workloads, on-premises, off-premises and in a combination of both. IBM Spectrum Virtualize™ for Public Cloud provides the IBM Spectrum Virtualize functionality in IBM Cloud™. This new capability provides a monthly license to deploy and use Spectrum Virtualize in IBM Cloud to enable hybrid cloud solutions, offering the ability to transfer data between on-premises private clouds or data centers and the public cloud. This IBM Redpaper™ publication gives a broad understanding of IBM Spectrum Virtualize for Public Cloud architecture and provides planning and implementation details of the common use cases for this product. This publication helps storage and networking administrators plan and implement install, tailor, and configure IBM Spectrum Virtualize for Public Cloud offering. It also provides a detailed description of troubleshooting tips. IBM Spectrum Virtualize is also available on AWS. For more information, see Implementation guide for IBM Spectrum Virtualize for Public Cloud on AWS, REDP-5534.

Introducing Microsoft SQL Server 2019

2020-04-27 · O'Reilly SQL Books O'Reilly Amazon

book

by James Rowland-Jones , Mitchell Pearson , Arun Sirpal , Dave Noderer , Dustin Ryan , Kellyn Gorman , Buck Woody , Allan Hirt

Analytics Azure BI Big Data Data Management Docker Hadoop HDFS Kubernetes Microsoft NoSQL Power BI +4 more

Introducing Microsoft SQL Server 2019 is the must-have guide for database professionals eager to leverage the latest advancements in SQL Server 2019. This book covers the features and capabilities that make SQL Server 2019 a powerful tool for managing and analyzing data both on-premises and in the cloud. What this Book will help me do Understand the new features introduced in SQL Server 2019 and their practical applications. Confidently manage and analyze relational, NoSQL, and big data within SQL Server 2019. Implement containerization for SQL Server using Docker and Kubernetes. Migrate and integrate your databases effectively to use Power BI Report Server. Query data from Hadoop Distributed File System with Azure Data Studio. Author(s) The authors of 'Introducing Microsoft SQL Server 2019' are subject matter experts including Kellyn Gorman, Allan Hirt, and others. With years of professional experience in database management and SQL Server, they bring a wealth of practical insight and knowledge to the book. Their experience spans roles as administrators, architects, and educators in the field. Who is it for? This book is aimed at database professionals such as DBAs, architects, and big data engineers who are currently using earlier versions of SQL Server or other database platforms. It is particularly well-suited for professionals aiming to understand and implement SQL Server 2019's new features. Readers should have basic familiarity with SQL Server and RDBMS concepts. If you're looking to explore SQL Server 2019 to improve data management and analytics in your organization, this book is for you.

Streaming Integration

2020-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alok Pareek (Striim) , Steve Wilkes

Hadoop Data Streaming data data-engineering streaming-architecture streaming-messaging

Data is being generated at an unrelenting pace, and data storage capacity can’t keep up. Enterprises must modernize the way they use and manage data by collecting, processing, and analyzing it in real time—in other words, streaming. This practical report explains everything organizations need to know to begin their streaming integration journey and make the most of their data. Authors Steve Wilkes and Alok Pareek detail the key attributes and components of an enterprise-grade streaming integration platform, along with stream processing and analysis techniques that will help companies reap immediate value from their data and solve their most pressing business challenges. Learn how to collect and handle large volumes of data at scale See how streams move data between threads, processes, servers, and data centers Get your data in the form you need and analyze it in real time Dive into the pros and cons of data targets such as databases, Hadoop, and cloud services for specific use cases Ensure your streaming integration infrastructure scales, is secure, works 24/7, and can handle failure

The Evolving Role of the Data Engineer

2020-04-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andy Oram

Data Engineering data data-engineering

Companies working to become data driven often view data scientists as heroes, but that overlooks the vital role that data engineers play in the process. While data scientists focus on finding new insights from datasets, data engineers deal with preparation—obtaining, cleaning, and creating enhanced versions of the data an organization needs. In this report, Andy Oram examines how the role of data engineer has quickly evolved. DBAs, software engineers, developers, and students will explore the responsibilities of modern data engineers and the skills and tools necessary to do the job. You’ll learn how to deal with software engineering concepts such as rapid and continuous development, automation and orchestration, modularity, and traceability. Decision makers considering a move to the cloud will also benefit from the in-depth discussion this report provides. This report covers: Major tasks of data engineers today The different levels of structure in data and ways to maximize its value Capabilities of third-party cloud options Tools for ingestion, transfer, and enrichment Using containers and VMs to run the tools Software engineering development Automation and orchestration of data engineering

IBM DS8000 Encryption for data at rest, Transparent Cloud Tiering, and Endpoint Security (DS8000 Release 9.0)

2020-04-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Anreas Reinhardt , Tony Eriksson. Lisa Martinez , Bert Dufrasne

Cloud Storage IBM Cyber Security data data-engineering

IBM® experts recognize the need for data protection, both from hardware or software failures, and from physical relocation of hardware, theft, and retasking of existing hardware. The IBM DS8000® supports encryption-capable hard disk drives (HDDs) and flash drives. These Full Disk Encryption (FDE) drive sets are used with key management services that are provided by IBM Security Key Lifecycle Manager software or Gemalto SafeNet KeySecure to allow encryption for data at rest. Use of encryption technology involves several considerations that are critical for you to understand to maintain the security and accessibility of encrypted data. Failure to follow the requirements that are described in the IBM Redpaper can result in an encryption deadlock. Starting with Release 8.5 code, the DS8000 also supports Transparent Cloud Tiering (TCT) data object encryption. With TCT encryption, data is encrypted before it is transmitted to the cloud. The data remains encrypted in cloud storage and is decrypted after it is transmitted back to the IBM DS8000. Starting with DS8000 Release 9.0, the DS8900F provides Fibre Channel Endpoint Security when communicating with an IBM z15™, which supports link authentication and the encryption of data that is in-flight. For more information, see IBM Fibre Channel Endpoint Security for IBM DS8900F and IBM Z, SG24-8455. This edition focuses on IBM Security Key Lifecycle Manager Version 3.0.1.3 or later, which enables support Key Management Interoperability Protocol (KMIP) with the DS8000 Release 9.0 code or later and updated DS GUI for encryption functions.

IBM z15 Technical Introduction

2020-04-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bill White

Agile/Scrum Analytics IBM data data-engineering

This IBM® Redbooks® publication introduces the latest member of the IBM Z® platform, the IBM z15™. It includes information about the Z environment and how it helps integrate data and transactions more securely. It also provides insight for faster and more accurate business decisions. The z15 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z15 is designed for enhanced modularity, and occupies an industry-standard footprint. It is offered as a single air-cooled 19-inch frame called the z15 T02, or as a multi-frame (1 to 4 19-inch frames) called the z15 T01. Both z15 models excel at the following tasks: Using hybrid multicloud integration services Securing and protecting data with encryption everywhere Providing resilience with key to zero downtime Transforming a transactional platform into a data powerhouse Getting more out of the platform with IT Operational Analytics Accelerating digital transformation with agile service delivery Revolutionizing business processes Blending open source and IBM Z technologies This book explains how this system uses innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z15 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

IBM Spectrum Scale CSI Driver for Container Persistent Storage

2020-04-10 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muthu Muthiah , Pravin P. Kudav , Abhishek Jain , Deepak Ghuge , Kedar Karmarkar , Andrew Beattie , Daniel de Souza Casali , Sandeep R Patil , Harald Seipp , Yadavendra Yadav , Smita Raut

Data Management HDFS IBM Kubernetes MongoDB data data-engineering postgresql

IBM® Spectrum Scale is a proven, scalable, high-performance data and file management solution. It provides world-class storage management with extreme scalability, flash accelerated performance, automatic policy-based storage that has tiers of flash through disk to tape. It also provides support for various protocols, such as NFS, SMB, Object, HDFS, and iSCSI. Containers can leverage the performance, information lifecycle management (ILM), scalability, and multisite data management to give the full flexibility on storage as they experience on the runtime. Container adoption is increasing in all industries, and they sprawl across multiple nodes on a cluster. The effective management of containers is necessary because their number will probably reach a far greater number than virtual machines today. Kubernetes is the standard container management platform currently being used. Data management is of ultimate importance, and often is forgotten because the first workloads containerized are ephemeral. For data management, many drivers with different specifications were available. A specification named Container Storage Interface (CSI) was created and is now adopted by all major Container Orchestrator Systems available. Although other container orchestration systems exist, Kubernetes became the standard framework for container management. It is a very flexible open source platform used as the base for most cloud providers and software companies' container orchestration systems. Red Hat OpenShift is one of the most reliable enterprise-grade container orchestration systems based on Kubernetes, designed and optimized to easily deploy web applications and services. OpenShift enables developers to focus on the code, while the platform takes care of all of the complex IT operations and processes. This IBM Redbooks® publication describes how the CSI Driver for IBM file storage enables IBM Spectrum® Scale to be used as persistent storage for stateful applications running in Kubernetes clusters. Through the Container Storage Interface Driver for IBM file storage, Kubernetes persistent volumes (PVs) can be provisioned from IBM Spectrum Scale. Therefore, the containers can be used with stateful microservices, such as database applications (MongoDB, PostgreSQL, and so on).

Advice for a Successful Career Path feat. Alyse Daghelian

2020-04-08 · Making Data Simple Listen

podcast_episode

by Alyse Daghelian (IBM) , Al Martin (IBM)

IBM

Send us a text NOTE: This episode was recorded before the COVID-19 outbreak. Any comments made in this episode on travel are no longer relevant or took place during ordered quarantines. Please stay home and be safe. Want to be featured as a guest on Making Data Simple? Reach out to us at [[email protected]] and tell us why you should be next. Abstract Our guest this week is Alyse Daghelian, Global Vice President Cloud Expert Services at IBM. Alyse has gone through many role transitions, from starting in engineering to being the lead for global sales at IBM. She provides some insightful advice that can help you take the next step towards your career goals. Tune-in to learn more. Connect with Alyse LinkedIn Twitter Show Notes 01:26 - You should always be looking for a new job. Click here for a Forbes article that explains why. 10:57 - Check out this article on medium that explains the impact of dehydration on the brain. 27:49 - Learn more about the fundamental life skills that poker can tech you here. Connect with the Team Producer Liam Seston - LinkedIn. Producer Lana Cosic - LinkedIn. Producer Meighann Helene - LinkedIn. Producer Kate Brown - LinkedIn. Producer Allison Proctor - LinkedIn. Producer Mark Simmonds - LinkedIn. Producer Michael Sestak - LinkedIn. Host Al Martin - LinkedIn and Twitter. Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 4

2020-04-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IBM

DevOps IBM Cyber Security data data-engineering

IBM Storage for Red Hat OpenShift is a comprehensive container-ready solution that includes all the hardware & software components necessary to setup and/or expand your Red Hat OpenShift environment. This blueprint includes Red Hat OpenShift Container Platform and uses Container Storage Interface (CSI) standards. IBM Storage brings enterprise data services to containers. In this blueprint, learn how to: · Combine the benefits of IBM Systems with the performance of IBM Storage solutions so that you can deliver the right services to your clients today! · Build a 24 by 7 by 365 enterprise class private cloud with Red Hat OpenShift Container Platform utilizing new open source Container Storage interface (CSI) drivers · Leverage enterprise class services such as NVMe based flash performance, high data availability, and advanced container security IBM Storage for Red Hat OpenShift Container Platform is designed for your DevOps environment for on-premises deployment with easy-to-consume components built to perform and scale for your enterprise. Simplify your journey to cloud with pre-tested and validated blueprints engineered to enable rapid deployment and peace of mind as you move to a hybrid multicloud environment. You now have the capabilities.

Behind The Scenes Of The Linode Object Storage Service

2020-03-23 · Data Engineering Podcast Listen

podcast_episode

by Will Smith (Linode) , Tobias Macey

AI/ML API Big Data Data Engineering Data Management GitHub Kubernetes Linux S3 Data Streaming

Summary There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at Linode recently completed to bring a fast and reliable S3 compatible object storage to production for your benefit. He discusses the challenges of running object storage for public usage, some of the interesting ways that it was stress tested internally, and the lessons that he learned along the way.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Will Smith about his work on building object storage for the Linode cloud platform

Interview

Introduction How did you get involved in the area of data management? Can you start by giving an overview of the current state of your object storage product?

What was the motivating factor for building and managing your own object storage system rather than building an integration with another offering such as Wasabi or Backblaze?

What is the scale and scope of usage that you had to design for? Can you describe how your platform is implemented?

What was your criteria for deciding whether to use an available platform such as Ceph or MinIO vs building your own from scratch? How have your initial assumptions about the operability and maintainability of your installation been challenged or updated since it has been released to the public?

What have been the biggest challenges that you have faced in designing and deploying a system that can meet the scale and reliability requirements of Linode? What are the most important capabilities for the underlying hardware that you are running on? What supporting systems and tools are you using to manage the availability and durability of your object storage? How did you approach the rollout of Linode’s object storage to gain the confidence that you needed to feel comfortable with full scale usage? What are some of the benefits that you have gained internally at Linode from having an object storage system available to your product teams? What are your thoughts on the state of the S3 API as a de facto standard for object storage? What is your main focus now that object storage is being rolled out to more data centers?

Contact Info

Dorthu on GitHub dorthu22 on Twitter LinkedIn Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Linode Object Storage Xen Hypervisor KVM (Linux K

Building A New Foundation For CouchDB

2020-03-17 · Data Engineering Podcast Listen

podcast_episode

by Adam Kocoloski , Tobias Macey

AI/ML Analytics Big Data ClickHouse Data Engineering Data Lake Data Management DevOps DWH GitHub IBM Kubernetes +3 more

Summary CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and a HTTP interface it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being addressed with a refactored architecture based on FoundationDB. In this episode Adam Kocoloski shares the history of the project, how it works under the hood, and how the new design will improve the project for our new era of computation. This was an interesting conversation about the challenges of maintaining a large and mission critical project and the work being done to evolve it.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! Are you spending too much time maintaining your data pipeline? Snowplow empowers your business with a real-time event data pipeline running in your own cloud account without the hassle of maintenance. Snowplow takes care of everything from installing your pipeline in a couple of hours to upgrading and autoscaling so you can focus on your exciting data projects. Your team will get the most complete, accurate and ready-to-use behavioral web and mobile data, delivered into your data warehouse, data lake and real-time streams. Go to dataengineeringpodcast.com/snowplow today to find out why more than 600,000 websites run Snowplow. Set up a demo and mention you’re a listener for a special offer! Setting up and managing a data warehouse for your business analytics is a huge task. Integrating real-time data makes it even more challenging, but the insights you obtain can make or break your business growth. You deserve a data warehouse engine that outperforms the demands of your customers and simplifies your operations at a fraction of the time and cost that you might expect. You deserve ClickHouse, the open-source analytical database that deploys and scales wherever and whenever you want it to and turns data into actionable insights. And Altinity, the leading software and service provider for ClickHouse, is on a mission to help data engineers and DevOps managers tame their operational analytics. Go to dataengineeringpodcast.com/altinity for a free consultation to find out how they can help you today. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today. Your host is Tobias Macey and today I’m interviewing Adam Kocoloski about CouchDB and the work being done to migrate the storage layer to FoundationDB

Interview

Introduction How did you get involved in the area of data management? Can you starty by describing what CouchDB is?

How did you get involved in the CouchDB project and what is your current role in the community?

What are the use cases that it is well suited for? Can you share some of the history of CouchDB and its role in the NoSQL movement? How is CouchDB currently architected and how has it evolved since it was first introduced? What have been the benefits and challenges of Erlang as the runtime for CouchDB? How is the current storage engine implemented and what are its shortcomings? What problems are you trying to solve by replatforming on a new storage layer?

What were the selection criteria for the new storage engine and how did you structure the decision making process? What was the motivation for choosing FoundationDB as opposed to other options such as rocksDB, levelDB, etc.?

How is the adoption of FoundationDB going to impact the overall architecture and implementation of CouchDB? How will the use of FoundationDB impact the way that the current capabilities are implemented, such as data replication? What will the migration path be for people running an existing installation? What are some of the biggest challenges that you are facing in rearchitecting the codebase? What new capabilities will the FoundationDB storage layer enable? What are some of the most interesting/unexpected/innovative ways that you have seen CouchDB used?

What new capabilities or use cases do you anticipate once this migration is complete?

What are some of the most interesting/unexpected/challenging lessons that you have learned while working with the CouchDB project and community? What is in store for the future of CouchDB?

Contact Info

LinkedIn @kocolosk on Twitter kocolosk on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Apache CouchDB FoundationDB

Podcast Episode

IBM Cloudant Experimental Particle Physics FPGA == Field Programmable Gate Array Apache Software Foundation CRDT == Conflict-free Replicated Data Type

Podcast Episode

Erlang Riak RabbitMQ Heisenbug Kubernetes Property Based Testing

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

IBM DS8000 and Transparent Cloud Tiering

2020-03-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Eddie Lin , Andreas Reinhardt , Bert Dufrasne , Alexander Warmuth

Cloud Storage IBM data data-engineering

This IBM® Redbooks® publication gives a broad understanding of storage clouds and the initial functionality that was introduced for mainframes to have Transparent Cloud Tiering. IBM DFSMS and the IBM DS8000 added functionality to provide elements of serverless data movement, and for IBM z/OS® to communicate with a storage cloud. The function is known as Transparent Cloud Tiering and is composed of the following key elements: A gateway in the DS8000, which allows the movement of data to and from Object Storage by using a network connection, with the option to encrypt data in the Cloud. DFSMShsm enhancements to support Migrate and Recall functions to and from the Object Storage. Other commands were enhanced to monitor and report on the new functionality. DFSMShsm uses the Web Enablement toolkit for z/OS to create and access the metadata for specific clouds, containers, and objects. DFSMSdss enhancements to provide some basic backup and restore functions to and from the cloud. The IBM TS7700 can also be set up to act as if it were cloud storage from the DS8000 perspective. This IBM Redbooks publication is divided into the following parts: Part 1 provides you with an introduction to clouds. Part 2 shows you how we set up the Transparent Cloud Tiering in a controlled laboratory and how the new functions work. We provide points to consider to help you set up your storage cloud and integrate it into your operational environment. Part 3 shows you how we used the new functionality to communicate with the cloud and to send data and retrieve data from it..

IBM TS7700 R5.0 Cloud Storage Tier Guide

2020-03-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alberto Barajas Ortiz , Larry Coyne , Taisei Takai , Tomoaki Ogino , Michael Scott , Derek Erdmann , Joe Hew , Joe Swingler , Sosuke Matsui

Cloud Storage IBM S3 cloud-storage data data-engineering storage-repositories

Building on over 20 years of virtual tape experience, the TS7700 (TS7760, TS7770) now supports the ability to store virtual tape volumes in an object store. This IBM® Redpaper publication helps you set up and configure the cloud object storage support for IBM Cloud™ Object Storage (COS) or Amazon Simple Storage Service (Amazon S3). The TS7700 supported off loading to physical tape for over two decades. Off loading to physical tape behind a TS7700 is used by hundreds of organizations around the world. By using the same hierarchical storage techniques, the TS7700 can also off load to object storage. Because object storage is cloud-based and accessible from different regions, the TS7700 Cloud Storage Tier support essentially allows the cloud to be an extension of the grid. In this IBM Redpaper publication, we provide a brief overview of cloud technology with an emphasis on Object Storage. Object Storage is used by a broad set of technologies, including those technologies that are exclusive to IBM Z®. The aim of this publication is to provide a basic understanding of cloud, Object Storage, and different ways it can be integrated into your environment. This Redpaper is intended for system architects and storage administrators with TS7700 experience who want to add the support of a Cloud Storage Tier to their TS7700 solution. Note: As of this writing, the TS7700C supports the ability to offload to on-premise cloud with IBM Cloud Object Storage and public cloud with Amazon S3.

SQL Server 2019 Administration Inside Out

2020-03-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by William Assaf , Sven Aelterman , Melody Zacharias , Randolph West , Joseph D'Antoni , Louis Davidson

API Azure Big Data Linux PowerShell SQL data data-engineering microsoft-sql-server relational-databases

Conquer SQL Server 2019 administration–from the inside out Dive into SQL Server 2019 administration–and really put your SQL Server DBA expertise to work. This supremely organized reference packs hundreds of timesaving solutions, tips, and workarounds–all you need to plan, implement, manage, and secure SQL Server 2019 in any production environment: on-premises, cloud, or hybrid. Six experts thoroughly tour DBA capabilities available in SQL Server 2019 Database Engine, SQL Server Data Tools, SQL Server Management Studio, PowerShell, and Azure Portal. You’ll find extensive new coverage of Azure SQL, big data clusters, PolyBase, data protection, automation, and more. Discover how experts tackle today’s essential tasks–and challenge yourself to new levels of mastery. Explore SQL Server 2019’s toolset, including the improved SQL Server Management Studio, Azure Data Studio, and Configuration Manager Design, implement, manage, and govern on-premises, hybrid, or Azure database infrastructures Install and configure SQL Server on Windows and Linux Master modern maintenance and monitoring with extended events, Resource Governor, and the SQL Assessment API Automate tasks with maintenance plans, PowerShell, Policy-Based Management, and more Plan and manage data recovery, including hybrid backup/restore, Azure SQL Database recovery, and geo-replication Use availability groups for high availability and disaster recovery Protect data with Transparent Data Encryption, Always Encrypted, new Certificate Management capabilities, and other advances Optimize databases with SQL Server 2019’s advanced performance and indexing features Provision and operate Azure SQL Database and its managed instances Move SQL Server workloads to Azure: planning, testing, migration, and post-migration

Implementing and Managing a High-performance Enterprise Infrastructure with Nutanix on IBM Power Systems

2020-03-03 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dino Quintero , Ricardo Dobelin Barros , Slav Martinski , Gabriel Padilla Jimenez , Alan Verdugo Munoz , Ismael Solis Moreno , Luis Daniel Gonzalez Diaz , Miguel Gomez Gonzalez

Agile/Scrum AI/ML Analytics Big Data IBM data data-engineering ibm-power-systems

This IBM® Redbooks® publication describes how to implement and manage a hyperconverged private cloud solution by using theoretical knowledge, hands-on exercises, and documenting the findings by way of sample scenarios. This book also is a guide about how to implement and manage a high-performance enterprise infrastructure and private cloud platform for big data, artificial intelligence, and transactional and analytics workloads on IBM Power Systems. This book use available documentation, hardware, and software resources to meet the following goals: Document the web-scale architecture that demonstrates the simple and agile nature of public clouds. Showcase the hyperconverged infrastructure to help cloud native applications mine cognitive analytics workloads. Conduct and document implementation case studies. Document guidelines to help provide an optimal system configuration, implementation, and management. This publication addresses topics for developers, IT architects, IT specialists, sellers, and anyone that wants to implement and manage a high-performance enterprise infrastructure and private cloud platform on IBM Power Systems. This book also provides documentation to transfer the how-to-skills to the technical teams, and solution guidance to the sales team. This book compliments any documentation that is available in IBM Knowledge Center, and aligns with the educational materials that are provided by the IBM Systems Software Education (SSE).

talk-data.com

Activity Trend

Top Events

Top Speakers

Evolutionary Computation in Scheduling

Power Up Your PostgreSQL Analytics With Swarm64

Optimize the Value of Your Data with Oracle and IBM Flash Storage Solutions

IBM AIX Enhancements and Modernization

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Implementing IBM Spectrum Virtualize for Public Cloud Version 8.3

Introducing Microsoft SQL Server 2019

Streaming Integration

The Evolving Role of the Data Engineer

IBM DS8000 Encryption for data at rest, Transparent Cloud Tiering, and Endpoint Security (DS8000 Release 9.0)

IBM z15 Technical Introduction

IBM Spectrum Scale CSI Driver for Container Persistent Storage

Advice for a Successful Career Path feat. Alyse Daghelian

IBM Storage for Red Hat OpenShift Blueprint Version 1 Release 4

Behind The Scenes Of The Linode Object Storage Service

Building A New Foundation For CouchDB

IBM DS8000 and Transparent Cloud Tiering

IBM TS7700 R5.0 Cloud Storage Tier Guide

SQL Server 2019 Administration Inside Out

Implementing and Managing a High-performance Enterprise Infrastructure with Nutanix on IBM Power Systems