talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

4055 tagged

Activity Trend

471 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4055 activities · Newest first

Securing Your Cloud: IBM Security for LinuxONE

As workloads are offloaded to IBM® LinuxONE based cloud environments, it is important to ensure that these workloads and environments are secure. This IBM Redbooks® publication describes the necessary steps to secure your environment, from the hardware level through all of the components of a LinuxONE cloud infrastructure that uses Linux and IBM z/VM®. The audience for this book is IT architects, IT specialists, and users who plan to use LinuxONE for their cloud environments.

Deploying a Database Instance in an IBM Cloud Private Cluster on IBM Z

This IBM® Redpaper™ publication shows you how to deploy a database instance within a container using an IBM Cloud™ Private cluster on IBM Z®. A preinstalled IBM Spectrum™ Scale 5.0.3 cluster file system provides back-end storage for the persistent volumes bound to the database. A container is a standard unit of software that packages code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. By default, containers are ephemeral. However, stateful applications, such as databases, require some type of persistent storage that can survive service restarts or container crashes. IBM provides several products that help organizations build an environment on an IBM Z infrastructure to develop and manage containerized applications, including dynamic provisioning of persistent volumes. As an example of a stateful application, this paper describes how to deploy the relational database MariaDB using a Helm chart, with the IBM Spectrum Scale V5.0.3 cluster file system providing back-end storage for the persistent volumes.

This document provides step-by-step guidance on how to install and configure the following components:

- IBM Cloud Private 3.1.2 (including Kubernetes)
- Docker 18.03.1-ce
- IBM Storage Enabler for Containers 2.0.0 and 2.1.0

This Redpaper demonstrates how we set up the example for a stateful application in our lab and gives you insights for planning your own implementation. IBM Z server hardware, the IBM Z hypervisor z/VM®, and the IBM Spectrum Scale cluster file system are prerequisites for the example environment. The Redpaper is written with the assumption that you have familiarity with and basic knowledge of the software products used in setting up the environment. The intended audience includes storage administrators, IT/cloud administrators, technologists, and IT specialists.

Operationalizing the Data Lake

Big data and advanced analytics have increasingly moved to the cloud as organizations pursue actionable insights and data-driven products using the growing amounts of information they collect. But few companies have truly operationalized data so it’s usable for the entire organization. With this pragmatic ebook, engineers, architects, and data managers will learn how to build and extract value from a data lake in the cloud and leverage the compute power and scalability of a cloud-native data platform to put your company’s vast data trove into action. Holden Ackerman and Jon King of Qubole take you through the basics of building a data lake operation, from people to technology, employing multiple technologies and frameworks in a cloud-native data platform. You'll dive into the tools and processes you need for the entire lifecycle of a data lake, from data preparation, storage, and management to distributed computing and analytics. You’ll also explore the unique role that each member of your data team needs to play as you migrate to your cloud-native data platform.

- Leverage your data effectively through a single source of truth
- Understand the importance of building a self-service culture for your data lake
- Define the structure you need to build a data lake in the cloud
- Implement financial governance and data security policies for your data lake through a cloud-native data platform
- Identify the tools you need to manage your data infrastructure
- Delineate the scope, usage rights, and best tools for each team working with a data lake: analysts, data scientists, data engineers, and security professionals, among others

Rebuilding Reliable Data Pipelines Through Modern Tools

When data-driven applications fail, identifying the cause is both challenging and time-consuming, especially as data pipelines become more and more complex. Hunting for the root cause of application failure in messy, raw, and distributed logs is difficult for performance experts and a nightmare for data operations teams. This report examines DataOps processes and tools that enable you to manage modern data pipelines efficiently. Author Ted Malaska describes a data operations framework and shows you the importance of testing and monitoring to plan, rebuild, automate, and then manage robust data pipelines, whether in the cloud, on premises, or in a hybrid configuration. You’ll also learn ways to apply performance monitoring software and AI to your data pipelines in order to keep your applications running reliably.

You’ll learn:

- How performance management software can reduce the risk of running modern data applications
- Methods for applying AI to provide insights, recommendations, and automation to operationalize big data systems and data applications
- How to plan, migrate, and operate big data workloads and data pipelines in the cloud and in hybrid deployment models

What are your major concerns before flying on a trip? Would you ever give up your seat due to overbooking? How do airlines predict weather patterns and take proactive action to minimize delays? In this episode of Making Data Simple, Yianni Gamvros, Global Data Science Enablement Leader for IBM Watson and Cloud Platform, talks about how to use data science to better manage passenger flight experiences.

Show Notes
00:30 Connect with Al Martin on Twitter (@amartin_v) and LinkedIn (linkedin.com/in/al-martin-ku)
00:40 Connect with Yianni Gamvros on Twitter (@YGamvros), LinkedIn (linkedin.com/in/gamvros), or at datascience.ibm.com
00:50 Copyright and all rights reserved to Yanni. Song from Yanni Concert 2006 can be found here: http://bit.ly/2iSlJJG
1:00 Learn more about Yanni the composer at http://www.yanni.com/welcome
08:25 Check the latest in weather and storm reports at https://weather.com/en-CA/
23:50 Discover what Watson is doing in aviation at https://ibm.co/2yHDm5g or check out this video to see how data science and Watson can be used to better your flight experience: http://bit.ly/2j9kdGN
28:45 Find Negotiation Genius: How to Overcome Obstacles and Achieve Brilliant Results at the Bargaining Table and Beyond by Deepak Malhotra & Max Bazerman here: http://amzn.to/2zox6Dy
29:10 Find The Goal: A Process of Ongoing Improvement by Eliyahu M. Goldratt here: http://amzn.to/2iETTkd
29:40 Learn more about the IBM Watson Data Platform at https://ibm.co/2reqLmN or http://bit.ly/2iEUbYl
30:40 Copyright YanniVEVO, all rights reserved to Yanni. Song Name: The Rain Must Fall. Listen here: http://bit.ly/2iU3uDU

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

Professional Azure SQL Database Administration - Second Edition

Professional Azure SQL Database Administration serves as your comprehensive guide to mastering the management and optimization of cloud-based Azure SQL Database solutions. Covering the differences and unique features of Azure SQL Database compared to on-premises SQL Server, this book offers a clear roadmap to efficiently migrate, secure, scale, and maintain these databases in the cloud.

What this Book will help me do

- Understand the differences between Azure SQL Database and on-premises SQL Server and their practical implications.
- Learn techniques to migrate existing SQL Server databases to Azure SQL Database seamlessly.
- Discover advanced ways to optimize database performance and scalability by leveraging cloud capabilities.
- Master security strategies for Azure SQL databases, including backup, disaster recovery, and automated tasks.
- Develop proficiency in using tools such as PowerShell to automate and manage routine database administration tasks.

Author(s)

Ahmad Osama is an experienced database professional and author specializing in SQL Server and Azure SQL Database administration. With a robust background in database migration, maintenance, and performance tuning, Ahmad expertly bridges the gap between theory and practice. His approachable writing style makes complex database topics accessible to professionals seeking to expand their expertise.

Who is it for?

Professional Azure SQL Database Administration is an essential resource for database administrators, developers, and IT professionals keen on developing their knowledge of Azure SQL Database administration and cloud database solutions. Whether you're transitioning from traditional SQL Server environments or looking to optimize your database strategies in the cloud, this book caters to professionals with intermediate to advanced experience in database management and programming with SQL.

Data Science with Python and Dask

Dask is a native parallel analytics tool designed to integrate seamlessly with the libraries you’re already using, including Pandas, NumPy, and Scikit-learn. With Dask you can crunch and work with huge datasets, using the tools you already have. And Data Science with Python and Dask is your guide to using Dask for your data projects without changing the way you work!

About the Technology

An efficient data pipeline means everything for the success of a data science project. Dask is a flexible library for parallel computing in Python that makes it easy to build intuitive workflows for ingesting and analyzing large, distributed datasets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas, and Scikit-learn, enabling users to scale their code from a single laptop to a cluster of hundreds of machines with ease.

About the Book

Data Science with Python and Dask teaches you to build scalable projects that can handle massive datasets. After meeting the Dask framework, you’ll analyze data in the NYC Parking Ticket database and use DataFrames to streamline your process. Then, you’ll create machine learning models using Dask-ML, build interactive visualizations, and build clusters using AWS and Docker.

What's Inside

- Working with large, structured and unstructured datasets
- Visualization with Seaborn and Datashader
- Implementing your own algorithms
- Building distributed apps with Dask Distributed
- Packaging and deploying Dask apps

About the Reader

For data scientists and developers with experience using Python and the PyData stack.

About the Author

Jesse Daniel is an experienced Python developer. He taught Python for Data Science at the University of Denver and leads a team of data scientists at a Denver-based media technology company. We interviewed Jesse as part of our Six Questions series.
Quotes

"The most comprehensive coverage of Dask to date, with real-world examples that made a difference in my daily work." - Al Krinker, United States Patent and Trademark Office
"An excellent alternative to PySpark for those who are not on a cloud platform. The author introduces Dask in a way that speaks directly to an analyst." - Jeremy Loscheider, Panera Bread
"A greatly paced introduction to Dask with real-world datasets." - George Thomas, R&D Architecture, Manhattan Associates
"The ultimate resource to quickly get up and running with Dask and parallel processing in Python." - Gustavo Patino, Oakland University William Beaumont School of Medicine
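The dynamic task scheduling that the book description above attributes to Dask is built around a simple idea: a computation expressed as a dictionary mapping keys to values or to tasks that depend on other keys. The sketch below is a hypothetical, simplified illustration of that task-graph model, not the real Dask API; names like `get` and the `(callable, *dependency_keys)` tuple convention are assumptions for illustration.

```python
# Minimal sketch of a dict-based task graph: each key maps either to a
# literal value or to a tuple of (callable, *dependency_keys).
def get(graph, key):
    """Resolve `key` by recursively computing its dependencies."""
    task = graph[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *deps = task
        # Compute each dependency first, then apply the function.
        return func(*(get(graph, d) for d in deps))
    return task  # a literal value

# Example: (1 + 2) * 10 expressed as a task graph
graph = {
    "x": 1,
    "y": 2,
    "sum": (lambda a, b: a + b, "x", "y"),
    "result": (lambda s: s * 10, "sum"),
}
print(get(graph, "result"))  # 30
```

A real scheduler would topologically sort the graph, cache intermediate results, and dispatch independent tasks to parallel workers, but the dependency-resolution logic is the same shape.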

On this episode of Making Data Simple, host Al Martin chats with David Townsend, the head of design for IBM Data and AI. Before IBM, David worked as the design director for brand components and user experience at General Motors. His areas of design include multi-cloud, machine learning, AI, data visualization, and augmented reality. Listen to learn about IBM design thinking and more.

Check us out on: - YouTube - Apple Podcasts - Google Play Music - Spotify - TuneIn - Stitcher

Show notes:
00:00 - Check out Making Data Simple on YouTube and SoundCloud.
00:05 - Connect with Producer Steve Moore on LinkedIn and Twitter.
00:10 - Connect with Producer Liam Seston on LinkedIn and Twitter.
00:15 - Connect with Producer Rachit Sharma on LinkedIn.
00:20 - Connect with Producer Lana Cosic on LinkedIn.
00:25 - Connect with Host Al Martin on LinkedIn and Twitter.
00:40 - Connect with David Townsend on LinkedIn.
06:35 - What is ICP for Data?
18:10 - What is Offering Management (OM)?
34:30 - What is Design Thinking?

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next. The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.

IBM Spectrum Scale: Big Data and Analytics Solution Brief

This IBM® Redguide™ publication describes big data and analytics deployments that are built on IBM Spectrum Scale™. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash-accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs by up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT architects, and IT specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of using IBM Spectrum Scale as an alternative to HDFS.

Summary Successful machine learning and artificial intelligence projects require large volumes of data that is properly labeled. The challenge is that most data is not clean and well annotated, requiring a scalable data labeling process. Ideally this process can be done using the tools and systems that already power your analytics, rather than sending data into a black box. In this episode Mark Sears, CEO of CloudFactory, explains how he and his team built a platform that provides valuable service to businesses and meaningful work to developing nations. He shares the lessons learned in the early years of growing the business, the strategies that have allowed them to scale and train their workforce, and the benefits of working within their customer’s existing platforms. He also shares some valuable insights into the current state of the art for machine learning in the real world.
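One common building block in the kind of scalable labeling process described above is aggregating labels from multiple annotators per item, for example by majority vote, and routing disagreements back for review. The sketch below is an illustrative stand-in for that pattern, not anything from CloudFactory's platform; the function name and data shapes are assumptions.

```python
from collections import Counter

def aggregate_labels(annotations):
    """Combine labels from several annotators per item by majority vote.
    Items with a tie (no clear winner) are marked None for human review."""
    results = {}
    for item, labels in annotations.items():
        counts = Counter(labels)
        (top, n), *rest = counts.most_common()
        if rest and rest[0][1] == n:
            # Annotators disagree evenly; flag the item for review.
            results[item] = None
        else:
            results[item] = top
    return results

annotations = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "cat"],        # tie -> needs review
    "img_003": ["cat", "cat", "cat"],
}
print(aggregate_labels(annotations))
# {'img_001': 'cat', 'img_002': None, 'img_003': 'cat'}
```

Production systems typically weight votes by each annotator's historical accuracy rather than counting them equally, but the consensus-plus-escalation structure is the same.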

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! Integrating data across the enterprise has been around for decades, and so have the techniques to do it. But a new way of integrating data and improving streams has evolved. By integrating each silo independently, data is able to integrate without any direct relation. At CluedIn they call it “eventual connectivity”. If you want to learn more about how to deliver fast access to your data across the enterprise leveraging this new method, and the technologies that make it possible, get a demo or presentation of the CluedIn Data Hub by visiting dataengineeringpodcast.com/cluedin. And don’t forget to thank them for supporting the show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference.
Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced, and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey, and today I’m interviewing Mark Sears about CloudFactory, masters of the art and science of labeling data for machine learning and more.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what CloudFactory is and the story behind it?
What are some of the common requirements

IBM FlashSystem 900 Model AE3 Product Guide

Today's global organizations depend on the ability to unlock business insights from massive volumes of data. Now, with IBM® FlashSystem 900 Model AE3, they can make faster decisions based on real-time insights and unleash the power of demanding applications, including:

- Online transaction processing (OLTP) and analytical databases
- Virtual desktop infrastructures (VDIs)
- Technical computing applications
- Cloud environments

Easy to deploy and manage, IBM FlashSystem® 900 Model AE3 is designed to accelerate the applications that drive your business. Powered by IBM FlashCore® technology, IBM FlashSystem 900 Model AE3 provides the following capabilities:

- Accelerates business-critical workloads, real-time analytics, and cognitive applications with the consistent microsecond latency and extreme reliability of IBM FlashCore technology
- Improves performance and helps lower cost with new inline data compression
- Helps reduce capital and operational expenses with IBM enhanced 3D triple-level cell (3D TLC) flash
- Protects critical data assets with patented IBM Variable Stripe RAID™
- Powers faster insights with IBM FlashCore, including hardware-accelerated nonvolatile memory (NVM) architecture, purpose-engineered IBM MicroLatency® modules, and advanced flash management

FlashSystem 900 Model AE3 can be configured in capacity points from as low as 14.4 TB to 180 TB usable, and up to 360 TB effective capacity after RAID 5 protection and compression. You can couple this product with 16 Gbps or 8 Gbps Fibre Channel, 16 Gbps NVMe over Fibre Channel, or 40 Gbps InfiniBand connectivity. Thus, the IBM FlashSystem 900 Model AE3 provides extreme performance to existing and next-generation infrastructure.

IBM Hybrid Solution for Scalable Data Solutions using IBM Spectrum Scale

This document is intended to facilitate the deployment of the scalable hybrid cloud solution for data agility and collaboration using IBM® Spectrum Scale across multiple public clouds. To complete the tasks it describes, you must understand IBM Spectrum Scale and IBM Spectrum Scale Active File Management (AFM). The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM Spectrum Scale Active File Management are supported and entitled, and where the issues are specific to a blueprint implementation.

Multicloud Storage as a Service using vRealize Automation and IBM Spectrum Storage

This document is intended to facilitate the deployment of the Multicloud Solution for Business Continuity and Storage as a Service by using IBM Spectrum Virtualize for Public Cloud on Amazon Web Services (AWS). To complete the tasks it describes, you must understand IBM FlashSystem 9100, IBM Spectrum Virtualize for Public Cloud, IBM Spectrum Connect, VMware vRealize Orchestrator, vRealize Automation, and AWS Cloud. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Storwize or IBM FlashSystem storage devices are supported and entitled, and where the issues are specific to a blueprint implementation.

Summary Building a data platform that works equally well for data engineering and data science is a task that requires familiarity with the needs of both roles. Data engineering platforms have a strong focus on stateful execution and tasks that are strictly ordered based on dependency graphs. Data science platforms provide an environment that is conducive to rapid experimentation and iteration, with data flowing directly between stages. Jeremiah Lowin has gained experience in both styles of working, leading him to be frustrated with all of the available tools. In this episode he explains his motivation for creating a new workflow engine that marries the needs of data engineers and data scientists, how it helps to smooth the handoffs between teams working on data projects, and how the design lets you focus on what you care about while it handles the failure cases for you. It is exciting to see a new generation of workflow engine that is learning from the benefits and failures of previous tools for processing your data pipelines.
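The episode summary above describes a workflow engine that "handles the failure cases for you", what the Prefect blog calls negative engineering. A toy illustration of absorbing one such failure case, transient task failures, is sketched below. This is a hypothetical stand-in, not the actual Prefect API; the `run_with_retries` name and the flaky-task setup are assumptions for illustration.

```python
import time

def run_with_retries(task, *args, retries=3, delay=0.0):
    """Run `task`, retrying on any exception up to `retries` attempts.
    This is the kind of 'negative engineering' a workflow engine
    absorbs so pipeline authors can focus on the happy path."""
    for attempt in range(1, retries + 1):
        try:
            return task(*args)
        except Exception:
            if attempt == retries:
                raise  # out of attempts; surface the failure
            time.sleep(delay)

# A flaky task that fails twice before succeeding
calls = {"n": 0}
def flaky_extract():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return [1, 2, 3]

data = run_with_retries(flaky_extract, retries=5)
print(data)  # [1, 2, 3], after two failed attempts
```

Real engines layer on exponential backoff, state tracking, and notifications, but the division of labor is the point: the engine owns failure handling, and the task owns business logic.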

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced, and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey, and today I’m interviewing Jeremiah Lowin about Prefect, a workflow platform for data engineering.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by explaining what Prefect is and your motivation for creating it?
What are the axes along which a workflow engine can differentiate itself, and which of those have you focused on for Prefect?
In some of your blog posts and your PyData presentation you discuss the concept of negative vs. positive engineering. Can you briefly outline what you mean by that and the ways that Prefect handles the negative cases for you?
How is Prefect itself implemented and what tools or systems have you relied on most heavily for inspiration?
How do you manage passing data between stages in a pipeline when they are running across distributed nodes?
What was your decision making process when deciding to use Dask as your supported execution engine?

For tasks that require specific resources or dependencies how do you approach the idea of task affinity?

Does Prefect support managing tasks that bridge network boundaries?
What are some of the features or capabilities of Prefect that are misunderstood or overlooked by users which you think should be exercised more often?
What are the limitations of the open source core as compared to the cloud offering that you are building?
What were your assumptions going into this project, and how have they been challenged or updated as you dug deeper into the problem domain and received feedback from users?
What are some of the most interesting/innovative/unexpected ways that you have seen Prefect used?
When is Prefect the wrong choice?
In your experience working on Airflow and Prefect, what are some of the common challenges and anti-patterns that arise in data engineering projects?

What are some best practices and industry trends that you are most excited by?

What do you have planned for the future of the Prefect project and company?

Contact Info

LinkedIn @jlowin on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Prefect Airflow Dask

Podcast Episode

Prefect Blog PyData Presentation Tensorflow Workflow Engine

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Support Data Engineering Podcast

Streaming Data

Managers and staff responsible for planning, hiring, and allocating resources need to understand how streaming data can fundamentally change their organizations. Companies everywhere are disrupting business, government, and society by using data and analytics to shape their business. Even if you don’t have deep knowledge of programming or digital technology, this high-level introduction brings data streaming into focus. You won’t find math or programming details here, or recommendations for particular tools in this rapidly evolving space. But you will explore the decision-making technologies and practices that organizations need to process streaming data and respond to fast-changing events. By describing the principles and activities behind this new phenomenon, author Andy Oram shows you how streaming data provides hidden gems of information that can transform the way your business works. Learn where streaming data comes from and how companies put it to work Follow a simple data processing project from ingesting and analyzing data to presenting results Explore how (and why) big data processing tools have evolved from MapReduce to Kubernetes Understand why streaming data is particularly useful for machine learning projects Learn how containers, microservices, and cloud computing led to continuous integration and DevOps
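The book above deliberately avoids code, but the core activity it describes, processing a stream of events and responding as they arrive, can be made concrete with one basic building block: grouping events into fixed time windows and aggregating within each. The sketch below is an illustrative example only (function and event names are assumptions, and real stream processors do this incrementally rather than over a finished list).

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Group (timestamp, key) events into fixed, non-overlapping time
    windows and count occurrences of each key per window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Round the timestamp down to the start of its window.
        window_start = ts - (ts % window_seconds)
        windows[window_start][key] += 1
    return {w: dict(counts) for w, counts in windows.items()}

events = [
    (0, "click"), (10, "click"), (59, "view"),   # window starting at 0
    (61, "click"), (119, "view"),                # window starting at 60
]
print(tumbling_window_counts(events))
# {0: {'click': 2, 'view': 1}, 60: {'click': 1, 'view': 1}}
```

Streaming frameworks differ mainly in what they add around this kernel: handling events that arrive late or out of order, overlapping (sliding) windows, and emitting results continuously instead of at the end.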

Summary Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allows for generating significant value, but it can also lead to anti-patterns and inconsistent quality in your analytics. Delta Lake is an open source, opinionated framework built on top of Spark for interacting with and maintaining data lake platforms that incorporates the lessons learned at Databricks from countless customer use cases. In this episode Michael Armbrust, the lead architect of Delta Lake, explains how the project is designed, how you can use it for building a maintainable data lake, and some useful patterns for progressively refining the data in your lake. This conversation was useful for getting a better idea of the challenges that exist in large scale data analytics, and the current state of the tradeoffs between data lakes and data warehouses in the cloud.
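The ACID guarantee mentioned in the episode summary rests on one key mechanism: an append-only transaction log of numbered commits, with the table's current state reconstructed by replaying the log in order. The sketch below illustrates that idea in miniature; it is a simplified, hypothetical model, not Delta Lake's actual log format or API.

```python
import json
import os
import tempfile

def commit(log_dir, actions):
    """Append one atomic commit: a numbered JSON file of actions."""
    version = len(os.listdir(log_dir))
    path = os.path.join(log_dir, f"{version:020d}.json")
    with open(path, "w") as f:
        json.dump(actions, f)
    return version

def snapshot(log_dir):
    """Replay add/remove actions in order to get the live set of files."""
    files = set()
    for name in sorted(os.listdir(log_dir)):
        with open(os.path.join(log_dir, name)) as f:
            for action in json.load(f):
                if action["op"] == "add":
                    files.add(action["file"])
                elif action["op"] == "remove":
                    files.discard(action["file"])
    return files

log_dir = tempfile.mkdtemp()
commit(log_dir, [{"op": "add", "file": "part-0.parquet"}])
commit(log_dir, [{"op": "add", "file": "part-1.parquet"}])
# One commit can atomically replace a file: both actions land together.
commit(log_dir, [{"op": "remove", "file": "part-0.parquet"},
                 {"op": "add", "file": "part-2.parquet"}])
print(sorted(snapshot(log_dir)))  # ['part-1.parquet', 'part-2.parquet']
```

Because each commit is a single file that either exists or does not, readers never observe a half-applied change, and replaying a prefix of the log yields an earlier version of the table, which is the basis for time travel.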

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show! And to keep track of how your team is progressing on building new pipelines and tuning their workflows, you need a project management system designed by engineers, for engineers. Clubhouse lets you craft a workflow that fits your style, including per-team tasks, cross-project epics, a large suite of pre-built integrations, and a simple API for crafting your own. With such an intuitive tool it’s easy to make sure that everyone in the business is on the same page. Data Engineering Podcast listeners get 2 months free on any plan by going to dataengineeringpodcast.com/clubhouse today and signing up for a free trial. Support the show and get your data projects in order! You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference.
Coming up this fall are the combined events of Graphorum and the Data Architecture Summit. The agendas have been announced, and super early bird registration for up to $300 off is available until July 26th, with early bird pricing for up to $200 off through August 30th. Use the code BNLLC to get an additional 10% off any pass when you register. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch. To help other people find the show, please leave a review on iTunes and tell your friends and co-workers. Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat. Your host is Tobias Macey, and today I’m interviewing Michael Armbrust about Delta Lake, an open source storage layer that brings ACID transactions to Apache Spark and big data workloads.

Interview

Introduction
How did you get involved in the area of data management?
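Delta Lake provides its ACID guarantees through an ordered transaction log (the `_delta_log` directory), where each commit is a numbered JSON file and readers reconstruct table state by replaying the log in order. The toy, stdlib-only sketch below illustrates that commit-log idea; it is not Delta Lake's actual API, and the class and file names are hypothetical.

```python
import json
import os
import tempfile

class ToyDeltaLog:
    """Toy version of Delta Lake's _delta_log idea: each commit is a
    zero-padded, numbered JSON file; a commit becomes visible atomically."""

    def __init__(self, table_dir):
        self.log_dir = os.path.join(table_dir, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _version_path(self, version):
        return os.path.join(self.log_dir, f"{version:020d}.json")

    def latest_version(self):
        versions = [int(name.split(".")[0]) for name in os.listdir(self.log_dir)
                    if name.endswith(".json")]
        return max(versions, default=-1)

    def commit(self, actions):
        """Atomically publish the next version; a concurrent writer that
        claims the same version number fails (optimistic concurrency)."""
        version = self.latest_version() + 1
        path = self._version_path(version)
        # Write to a temp file first, then link it into place so the
        # commit appears all at once or not at all.
        fd, tmp = tempfile.mkstemp(dir=self.log_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(actions, f)
        try:
            os.link(tmp, path)  # raises FileExistsError on a conflict
        finally:
            os.unlink(tmp)
        return version

    def snapshot(self):
        """Replay every commit in order to derive the current file set."""
        files = set()
        for v in range(self.latest_version() + 1):
            with open(self._version_path(v)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        files.add(action["path"])
                    elif action["op"] == "remove":
                        files.discard(action["path"])
        return files
```

Committing `add`/`remove` actions and then calling `snapshot()` shows how a reader always sees a consistent table version, even while writers append new commits.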

IBM Storage Solutions for Blockchain Platform Version 1.2

This Blueprint is intended to define the infrastructure that is required for a blockchain remote peer and to facilitate the deployment of IBM Blockchain Platform on IBM Cloud Private using that infrastructure. This infrastructure includes the necessary document handler components, such as IBM Blockchain Document Store, and covers the required storage for on-chain and off-chain blockchain data. To complete these tasks, you must have a basic understanding of each of the components used or have access to the correct educational material to gain that knowledge.
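A common pattern behind the on-chain/off-chain split mentioned above is to anchor only a cryptographic hash of each document on the ledger while the document body lives in off-chain storage. The minimal Python sketch below illustrates that pattern under stated assumptions; it is not the IBM Blockchain Document Store API, and all class and function names are hypothetical.

```python
import hashlib

class OffChainStore:
    """Hypothetical off-chain document store, keyed by content hash."""
    def __init__(self):
        self._docs = {}

    def put(self, doc_bytes):
        digest = hashlib.sha256(doc_bytes).hexdigest()
        self._docs[digest] = doc_bytes
        return digest

    def get(self, digest):
        return self._docs[digest]

class ToyLedger:
    """Stand-in for the blockchain: only small hash records go on-chain."""
    def __init__(self):
        self.transactions = []

    def anchor(self, doc_id, digest):
        self.transactions.append({"doc_id": doc_id, "sha256": digest})

def verify(ledger, store, doc_id):
    """A document is intact if its off-chain bytes still hash to the
    digest that was anchored on-chain."""
    record = next(t for t in ledger.transactions if t["doc_id"] == doc_id)
    doc = store.get(record["sha256"])
    return hashlib.sha256(doc).hexdigest() == record["sha256"]
```

The design point is that the ledger stays small and immutable while bulk data (and any data that may need deletion) stays off-chain, with the on-chain hash providing tamper evidence.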

IBM FlashSystem A9000 Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000 Model 425, with IBM FlashSystem A9000 Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking.

IBM FlashSystem A9000 storage system uses the IBM FlashCore® technology to help realize higher capacity and improved response times over disk-based systems and other competing flash and solid-state drive (SSD)-based storage. The extreme performance of IBM FlashCore technology with a grid architecture and comprehensive data reduction creates one powerful solution. Whether you are a service provider who requires highly efficient management or an enterprise that is implementing cloud on a budget, FlashSystem A9000 provides consistent and predictable microsecond response times and the simplicity that you need. The A9000 features always-on data reduction and now offers intelligent capacity management for deduplication.

As a cloud-optimized solution, FlashSystem A9000 suits the requirements of public and private cloud providers who require features such as inline data deduplication, multi-tenancy, and quality of service. It also uses powerful software-defined storage capabilities from IBM Spectrum™ Accelerate, such as Hyper-Scale technology, VMware integration, and storage container integration.
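Inline data deduplication of the kind described here generally works by fingerprinting incoming data blocks and storing each unique block only once, with duplicates reduced to references. The simplified Python sketch below illustrates that mechanism; it is not FlashSystem's actual implementation, and the fixed 4 KiB block size is an assumption for illustration.

```python
import hashlib

class DedupStore:
    """Simplified inline block deduplication: each unique block is stored
    once; a duplicate write only bumps a reference count."""
    BLOCK_SIZE = 4096  # hypothetical fixed block size

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes (physical storage)
        self.refcount = {}  # fingerprint -> number of logical references

    def write(self, data):
        """Split data into blocks, dedup inline, return the fingerprints
        needed to read the data back."""
        fingerprints = []
        for i in range(0, len(data), self.BLOCK_SIZE):
            block = data[i:i + self.BLOCK_SIZE]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:  # only new content consumes space
                self.blocks[fp] = block
            self.refcount[fp] = self.refcount.get(fp, 0) + 1
            fingerprints.append(fp)
        return fingerprints

    def read(self, fingerprints):
        return b"".join(self.blocks[fp] for fp in fingerprints)

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(b) for b in self.blocks.values())
```

Writing the same data twice leaves the logical view intact while the physical footprint stays at one copy, which is the effect the "data reduction" figures in these product guides measure.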

IBM FlashSystem A9000R Product Guide (Version 12.3.2)

This IBM® Redbooks® Product Guide is an overview of the main characteristics, features, and technology that are used in IBM FlashSystem® A9000R Model 415 and Model 425, with IBM FlashSystem A9000R Software V12.3.2. Software version 12.3.2, with Hyper-Scale Manager version 5.6 or later, introduces support for VLAN tagging and port trunking.

IBM FlashSystem A9000R is a grid-scale, all-flash storage platform designed for industry leaders with rapidly growing cloud storage and mixed workload environments to help drive your business into the cognitive era. FlashSystem A9000R provides consistent, extreme performance for dynamic data at scale, integrating the microsecond latency and high availability of IBM FlashCore® technology. The rack-based offering comes integrated with the world-class software features that are built with IBM Spectrum™ Accelerate. For example, comprehensive data reduction, including inline pattern removal, data deduplication, and compression, helps lower total cost of ownership (TCO), while the grid architecture and IBM Hyper-Scale framework simplify and automate storage administration. The A9000R features always-on data reduction and now offers intelligent capacity management for deduplication.

Ready for the cloud and well suited for large deployments, FlashSystem A9000R delivers predictable high performance and ultra-low latency, even under heavy workloads with full data reduction enabled. The grid-scale architecture maintains this performance by automatically self-optimizing workloads across all storage resources without manual intervention.

How did companies like Facebook and Airbnb get so big so fast? What can we learn from them? Why is data so important for growth? Nancy Hensley, Director of Strategy & Growth for IBM Hybrid Cloud, has the answers in this episode of Making Data Simple. Learn how you can use growth hacking strategies to build your business and why growth hacking isn't just for startups.

Show Notes
00:25 Connect with Al Martin on Twitter (@amartin_v) and LinkedIn (linkedin.com/in/al-martin-ku)
00:36 Connect with Nancy Hensley on Twitter (@nancykoppdw) and LinkedIn (linkedin.com/in/nancyhensley)
03:30 Explore The Growth Hacker: The next VP of Marketing by Andrew Chen here: http://bit.ly/104Xa0r
03:55 Read Hacking Growth by Sean Ellis & Morgan Brown here: http://growthhacker.com/
04:55 Visit the Jagermeister website for more information on their company and product: https://www.jagermeister.com/en-CA (must be legal age)
22:15 Find Hooked: How to Build Habit-Forming Products by Nir Eyal here: http://amzn.to/2geOTlp
31:50 Find Rework by Jason Fried here: http://amzn.to/2xIU08B

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

The Making Data Simple Podcast is hosted by Al Martin, WW VP Technical Sales, IBM, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun.