talk-data.com

Topic

Cloud Computing

infrastructure saas iaas

4055 tagged

Activity Trend

471 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4055 activities · Newest first

Hands-On Data Warehousing with Azure Data Factory

Dive into the world of ETL (extract, transform, load) with 'Hands-On Data Warehousing with Azure Data Factory'. This book guides readers through the essential techniques for working with Azure Data Factory and SQL Server Integration Services to design, implement, and optimize ETL solutions for both on-premises and cloud data environments.

What this Book will help me do:
- Understand and use Azure Data Factory and SQL Server Integration Services to build ETL solutions.
- Design scalable, high-performance ETL architectures tailored to modern data problems.
- Integrate Azure services such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark into your workflows.
- Troubleshoot and optimize ETL pipelines and address common challenges in data processing.
- Create insightful Power BI dashboards to visualize and interact with data from your ETL workflows.

Authors: Christian Cote, Michelle Gutzait, and Giuseppe Ciaburro bring a wealth of experience in data engineering and cloud technologies to this practical guide. Combining expertise in the Azure ecosystem with hands-on data warehousing experience, they deliver actionable insights for working professionals.

Who is it for? This book is crafted for software professionals working in data engineering, especially those specializing in ETL processes. Readers with foundational knowledge of SQL Server and cloud infrastructures will benefit most. If you aspire to implement state-of-the-art ETL pipelines or enhance existing workflows with ADF and SSIS, this book is an ideal resource.
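
The extract-transform-load flow the book centers on can be sketched in plain Python. The snippet below is a minimal illustration of the pattern only, not Azure Data Factory or SSIS code; the source file, column names, and staging table are hypothetical.

```python
import sqlite3

import pandas as pd

# Extract: read raw records from a hypothetical source file.
raw = pd.read_csv("sales_raw.csv")  # assumed columns: order_id, region, amount

# Transform: drop malformed rows and normalize a text column.
clean = raw.dropna(subset=["order_id", "amount"])
clean["region"] = clean["region"].str.strip().str.upper()

# Load: write the cleaned frame into a warehouse staging table.
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("stg_sales", conn, if_exists="replace", index=False)
```

In ADF, comparable steps would typically be expressed as activities in a pipeline rather than as a script.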

Introducing Microsoft Flow: Automating Workflows Between Apps and Services

Use Microsoft Flow in your business to improve productivity through automation with this step-by-step introductory text from a Microsoft Flow expert. You’ll see the prerequisites to get started with this cloud-based service, including how to create a flow and how to use different connectors. Introducing Microsoft Flow takes you through connecting with SharePoint, creating approval flows, and using mobile apps. This vital information gives you a head start when planning your Microsoft Flow implementation. The second half of the book continues with managing connections and gateways, where you’ll cover the configuration, creation, and deletion of connectors and how to connect to a data gateway. The final topic is Flow administration and techniques to manage the environment. After reading this book, you will be able to create and manage Flow from desktop, laptop, or mobile devices and connect with multiple services such as SharePoint, Twitter, Facebook, and other networking sites.

What You Will Learn:
- Create flows from built-in and blank templates
- Manage flows, connections, and gateways
- Create approvals, connect with multiple services, and use mobile apps

Who This Book Is For: Administrators and those who are interested in creating automated workflows using templates and connecting with multiple services without writing a single line of code.

Summary

Building an ETL pipeline is a common need across businesses and industries. It’s easy to get one started but difficult to manage as new requirements are added and greater scalability becomes necessary. Rather than duplicating the efforts of other engineers, it might be best to use a hosted service to handle the plumbing so that you can focus on the parts that actually matter for your business. In this episode Alooma CTO and co-founder Yair Weinberger explains how the platform addresses the common needs of data collection, manipulation, and storage while allowing for flexible processing. He describes the motivation for starting the company, how their infrastructure is architected, and the challenges of supporting multi-tenancy and a wide variety of integrations.
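
One theme in the conversation is idempotence (see the Links below): a pipeline that may re-deliver events after a failure must be able to replay them without creating duplicates. Here is a minimal sketch of that idea, keyed on a unique event ID; the table and event shape are invented for illustration and are not Alooma's API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")

def load(event):
    # INSERT OR IGNORE turns a replayed event into a no-op, so the
    # pipeline can safely re-deliver a whole batch after a failure.
    conn.execute(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        (event["id"], event["body"]),
    )

# The same event delivered twice still produces exactly one row.
for ev in [{"id": "e1", "body": "signup"}, {"id": "e1", "body": "signup"}]:
    load(ev)
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # -> 1
```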

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service.

Interview

Introduction
How did you get involved in the area of data management?
What is Alooma and what is the origin story?
How is the Alooma platform architected?

I want to go into stream vs. batch here.
What are the most challenging components to scale?

How do you manage the underlying infrastructure to support your SLA of 5 nines?
What are some of the complexities introduced by processing data from multiple customers with various compliance requirements?

How do you sandbox users’ processing code to avoid security exploits?

What are some of the potential pitfalls for automatic schema management in the target database?
Given the large number of integrations, how do you maintain the

What are some challenges when creating integrations? Isn’t it simply a matter of conforming to an external API?

For someone getting started with Alooma, what does the workflow look like?
What are some of the most challenging aspects of building and maintaining Alooma?
What are your plans for the future of Alooma?

Contact Info

LinkedIn
@yairwein on Twitter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Alooma Convert Media Data Integration ESB (Enterprise Service Bus) Tibco Mulesoft ETL (Extract, Transform, Load) Informatica Microsoft SSIS OLAP Cube S3 Azure Cloud Storage Snowflake DB Redshift BigQuery Salesforce Hubspot Zendesk Spark The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay Kreps RDBMS (Relational Database Management System) SaaS (Software as a Service) Change Data Capture Kafka Storm Google Cloud PubSub Amazon Kinesis Alooma Code Engine Zookeeper Idempotence Kafka Streams Kubernetes SOC2 Jython Docker Python Javascript Ruby Scala PII (Personally Identifiable Information) GDPR (General Data Protection Regulation) Amazon EMR (Elastic Map Reduce) Sequoia Capital Lightspeed Investors Redis Aerospike Cassandra MongoDB

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Support Data Engineering Podcast

In this podcast, Justin Borgman talks about his journey of starting a data science startup, exiting it, and jumping into another one. The session is filled with insights for leaders looking for entrepreneurial wisdom to get on a data-driven journey.

Timeline: 0:28 Justin's journey. 3:22 Taking the plunge to start a new company. 5:49 Perception vs. reality of starting a data warehouse company. 8:15 Bringing in something new to the IT legacy. 13:20 Getting your first few customers. 16:16 The right moment for a data warehouse company to look for a new venture. 18:20 The right person to have as a co-founder. 20:29 Advantages of going seed vs. series A. 22:13 When is a company ready for seed or series A? 24:40 Who's a good adviser? 26:35 Exiting Teradata. 28:54 Teradata to starting a new company. 31:24 Excitement of starting something from scratch. 32:24 What is Starburst? 37:15 Presto, a great engine for cloud platforms. 40:30 How a company can get started with Presto. 41:50 Health of enterprise data. 44:15 Where does Presto not fit in? 45:19 Future of enterprise data. 46:36 Drawing parallels between the proprietary space and the open source space. 49:02 Does aligning with open source give a company a better chance at seed funding? 51:44 Justin's ingredients for success. 54:05 Justin's favorite reads. 55:01 Key takeaways.

Paul's Recommended Read: The Outsiders by S. E. Hinton (amzn.to/2Ai84Gl)

Podcast Link: https://futureofdata.org/running-a-data-science-startup-one-decision-at-a-time-futureofdata-podcast/

Justin's BIO: Justin has spent the better part of a decade in senior executive roles building new businesses in the data warehousing and analytics space. Before co-founding Starburst, Justin was Vice President and General Manager at Teradata (NYSE: TDC), where he was responsible for the company’s portfolio of Hadoop products. Prior to joining Teradata, Justin was co-founder and CEO of Hadapt, the pioneering "SQL-on-Hadoop" company that transformed Hadoop from a file system into an analytic database accessible to anyone with a BI tool. Teradata acquired Hadapt in 2014.

Justin earned a BS in Computer Science from the University of Massachusetts at Amherst and an MBA from the Yale School of Management.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and lead practitioners to discuss their journeys to create the data-driven future.

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Matplotlib for Python Developers - Second Edition

"Matplotlib for Python Developers" is your comprehensive guide to creating interactive and informative data visualizations using the Matplotlib library in Python. This book covers all the essentials-from building static plots to integrating dynamic graphics with web applications. What this Book will help me do Design and customize stunning data visualizations including heatmaps and scatter plots. Integrate Matplotlib visualization seamlessly into GUI applications using GTK3 or Qt. Utilize advanced plotting libraries like Seaborn and GeoPandas for enhanced visual representation. Develop web-based dashboards and plots that dynamically update using Django. Master techniques to prepare your Matplotlib projects for deployment in a cloud-based environment. Author(s) Authors Aldrin Yim, Claire Chung, and Allen Yu are seasoned developers and data scientists with extensive experience in Python and data visualization. They bring a practical touch to technical concepts, aiming to bridge theory with hands-on applications. With such a skilled team behind this book, you'll gain both foundational knowledge and advanced insights into Matplotlib. Who is it for? This book is the ideal resource for Python developers and data analysts looking to enhance their data visualization skills. If you're familiar with Python and want to create engaging, clear, and dynamic visualizations, this book will give you the tools to achieve that. Designed for a range of expertise, from beginners understanding the basics to experienced users diving into complex integrations, this book has something for everyone. You'll be guided through every step, ensuring you build the confidence and skills needed to thrive in this area.

In this podcast, Wayne Eckerson and Joe Caserta discuss data migration, compare cloud offerings from Amazon, Google, and Microsoft, and define and explain artificial intelligence.

You can contact Caserta by visiting caserta.com or by emailing [email protected]. Follow him on Twitter @joe_caserta.

Caserta is President of a New York City-based consulting firm he founded in 2001 and a longtime data guy. In 2004, Joe teamed up with data warehousing legend Ralph Kimball to write the book The Data Warehouse ETL Toolkit. Today he is one of the leading authorities on big data implementations. This makes Joe one of the few individuals with in-the-trenches experience on both sides of the data divide: traditional data warehousing on relational databases and big data implementations on Hadoop and the cloud.

Creating a Data-Driven Enterprise in Media

The data-driven revolution is finally hitting the media and entertainment industry. For decades, broadcast television and print media relied on traditional delivery channels for solvency and growth, but those channels fragmented as cable, streaming, and digital devices stole the show. In this ebook, you’ll learn about the trends, challenges, and opportunities facing players in this industry as they tackle big data, advanced analytics, and DataOps. You’ll explore best practices and lessons learned from three real-world media companies (Sling TV, Turner Broadcasting, and Comcast) as they proceed on their data-driven journeys. Along the way, authors Ashish Thusoo and Joydeep Sen Sarma explain how DataOps breaks down silos and connects everyone who handles data, including engineers, data scientists, analysts, and business users. Big-data-as-a-service provider Qubole provides a five-step maturity model that outlines the phases a company typically goes through when it first encounters big data.

Case studies include:
- Sling TV: this live streaming content platform delivers live TV and on-demand entertainment instantly to a variety of smart televisions, tablets, game consoles, computers, smartphones, and streaming devices
- Turner Broadcasting System: this Time Warner division recently created the Turner Data Cloud to support direct-to-consumer services, including FilmStruck, Boom (for kids), and NBA League Pass
- Comcast: the largest broadcasting and cable TV company is building a single integrated big data platform to deliver internet, TV, and voice to more than 28 million customers

Implementing IBM FlashSystem V9000 AE3

Abstract
The success or failure of businesses often depends on how well organizations use their data assets for competitive advantage. Deeper insights from data require better information technology. As organizations modernize their IT infrastructure to boost innovation rather than limit it, they need a data storage system that can keep pace with several areas that affect your business:
- Highly virtualized environments
- Cloud computing
- Mobile and social systems of engagement
- In-depth, real-time analytics

Making the correct decision on storage investment is critical. Organizations must have enough storage performance and agility to innovate when they need to implement cloud-based IT services, deploy virtual desktop infrastructure, enhance fraud detection, and use new analytics capabilities. At the same time, future storage investments must lower IT infrastructure costs while helping organizations to derive the greatest possible value from their data assets.

The IBM® FlashSystem V9000 is the premier, fully integrated, Tier 1, all-flash offering from IBM. It has changed the economics of today's data center by eliminating storage bottlenecks. Its software-defined storage features simplify data management, improve data security, and preserve your investments in storage. The IBM FlashSystem® V9000 SAS expansion enclosures provide new tiering options with read-intensive SSDs or nearline SAS HDDs. IBM FlashSystem V9000 includes IBM FlashCore® technology and advanced software-defined storage available in one solution in a compact 6U form factor. IBM FlashSystem V9000 improves business application availability. It delivers greater resource utilization so you can get the most from your storage resources and achieve a simpler, more scalable, and cost-efficient IT infrastructure.

This IBM Redbooks® publication provides information about IBM FlashSystem V9000 Software V8.1. It describes the core product architecture, software, hardware, and implementation, and provides hints and tips. The underlying hardware and software architecture and features of the IBM FlashSystem V9000 AC3 control enclosure and the IBM Spectrum Virtualize 8.1 software are described in these publications: Implementing IBM FlashSystem 900 Model AE3, SG24-8414, and Implementing the IBM System Storage SAN Volume Controller V7.4, SG24-7933.

IBM FlashSystem V9000 software functions, management tools, and interoperability combine the performance of the IBM FlashSystem architecture with the advanced functions of software-defined storage to deliver performance, efficiency, and functions that meet the needs of enterprise workloads that demand IBM MicroLatency® response time. This book offers IBM FlashSystem V9000 scalability concepts and guidelines for planning, installing, and configuring, which can help environments scale up and out to add more flash capacity and expand virtualized systems. Port utilization methodologies are provided to help you maximize the full potential of IBM FlashSystem V9000 performance and low latency in your scalable environment.

This book is intended for pre-sales and post-sales technical support professionals, storage administrators, and anyone who wants to understand how to implement this exciting technology.

IBM z14 Model ZR1 Technical Introduction

Abstract
This IBM® Redbooks® publication introduces the latest member of the IBM Z platform, the IBM z14 Model ZR1 (Machine Type 3907). It includes information about the Z environment and how it helps integrate data and transactions more securely, and provides insight for faster and more accurate business decisions. The z14 ZR1 is a state-of-the-art data and transaction system that delivers advanced capabilities, which are vital to any digital transformation. The z14 ZR1 is designed for enhanced modularity in an industry-standard footprint. This system excels at the following tasks:
- Securing data with pervasive encryption
- Transforming a transactional platform into a data powerhouse
- Getting more out of the platform with IT Operational Analytics
- Providing resilience towards zero downtime
- Accelerating digital transformation with agile service delivery
- Revolutionizing business processes
- Mixing open source and Z technologies

This book explains how this system uses new innovations and traditional Z strengths to satisfy growing demand for cloud, analytics, and open source technologies. With the z14 ZR1 as the base, applications can run in a trusted, reliable, and secure environment that improves operations and lessens business risk.

In this podcast, Wayne Eckerson and James Serra discuss myths of modern data management. Some of the myths discussed include 'all you need is a data lake', 'the data warehouse is dead', 'we don’t need OLAP cubes anymore', 'cloud is too expensive and latency is too slow', 'you should always use a NoSQL product over a RDBMS.'

Serra is a big data and data warehousing solutions architect at Microsoft with over thirty years of IT experience. He is a popular blogger and speaker and has presented at dozens of Microsoft PASS and other events. Prior to Microsoft, Serra was an independent data warehousing and business intelligence architect and developer.

Summary

Cloud computing and ubiquitous virtualization have changed the ways that our applications are built and deployed. This new environment requires a new way of tracking and addressing the security of our systems. ThreatStack is a platform that collects all of the data that your servers generate and monitors for unexpected anomalies in behavior that would indicate a breach and notifies you in near-realtime. In this episode ThreatStack’s director of operations, Pete Cheslock, and senior infrastructure security engineer, Patrick Cable, discuss the data infrastructure that supports their platform, how they capture and process the data from client systems, and how that information can be used to keep your systems safe from attackers.
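
The detection idea described here (learn a behavioral baseline, alert on deviations) can be sketched independently of ThreatStack's actual platform. The snippet below flags readings that sit far outside a rolling baseline; the window size, threshold, and signal (e.g. syscalls per second reported by a host agent) are invented for illustration.

```python
from collections import deque
from statistics import mean, stdev

WINDOW, THRESHOLD = 20, 3.0  # invented tuning values
history = deque(maxlen=WINDOW)

def is_anomalous(value: float) -> bool:
    """Flag a reading more than THRESHOLD standard deviations
    from the rolling baseline of recent readings."""
    anomalous = False
    if len(history) == WINDOW:
        mu, sigma = mean(history), stdev(history)
        anomalous = sigma > 0 and abs(value - mu) / sigma > THRESHOLD
    if not anomalous:
        history.append(value)  # only normal readings update the baseline
    return anomalous

# Steady readings train the baseline; the spike at the end is flagged.
for reading in [12, 11, 13, 12, 12, 11, 13, 12, 11, 13] * 2 + [95]:
    if is_anomalous(reading):
        print("alert: unusual activity level:", reading)
```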

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Pete Cheslock and Pat Cable about the data infrastructure and security controls at ThreatStack.

Interview

Introduction
How did you get involved in the area of data management?
Why don’t you start by explaining what ThreatStack does?

What was lacking in the existing options (services and self-hosted/open source) that ThreatStack solves for?

Can you describe the type(s) of data that you collect and how it is structured?
What is the high level data infrastructure that you use for ingesting, storing, and analyzing your customer data?

How do you ensure a consistent format of the information that you receive?
How do you ensure that the various pieces of your platform are deployed using the proper configurations and operating as intended?
How much configuration do you provide to the end user in terms of the captured data, such as sampling rate or additional context?

I understand that your original architecture used RabbitMQ as your ingest mechanism, which you then migrated to Kafka. What was your initial motivation for that change?

How much of a benefit has that been in terms of overall complexity and cost (both time and infrastructure)?

How do you ensure the security and provenance of the data that you collect as it traverses your infrastructure?
What are some of the most common vulnerabilities that you detect in your clients’ infrastructure?
For someone who wants to start using ThreatStack, what does the setup process look like?
What have you found to be the most challenging aspects of building and managing the data processes in your environment?
What are some of the projects that you have planned to improve the capacity or capabilities of your infrastructure?

Contact Info

Pete Cheslock

@petecheslock on Twitter
Website
petecheslock on GitHub

Patrick Cable

@patcable on Twitter
Website
patcable on GitHub

ThreatStack

Website
@threatstack on Twitter
threatstack on GitHub

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

ThreatStack SecDevOps

Modern Big Data Processing with Hadoop

Delve into the world of big data with 'Modern Big Data Processing with Hadoop.' This comprehensive guide introduces you to the powerful capabilities of Apache Hadoop and its ecosystem to solve data processing and analytics challenges. By the end, you will have mastered the techniques necessary to architect innovative, scalable, and efficient big data solutions.

What this Book will help me do:
- Master the principles of building an enterprise-level big data strategy with Apache Hadoop.
- Learn to integrate Hadoop with tools such as Apache Spark, Elasticsearch, and more for comprehensive solutions.
- Set up and manage your big data architecture, including deployment on cloud platforms with Apache Ambari.
- Develop real-time data pipelines and enterprise search solutions.
- Leverage advanced visualization tools like Apache Superset to make sense of data insights.

Authors: R. Patil, Kumar, and Shindgikar are experienced big data professionals and accomplished authors. With years of hands-on experience in implementing and managing Apache Hadoop systems, they bring a depth of expertise to their writing. Their dedication lies in making complex technical concepts accessible while demonstrating real-world best practices.

Who is it for? This book is designed for data professionals aiming to advance their expertise in big data solutions using Apache Hadoop. Ideal readers include engineers and project managers involved in data architecture and those aspiring to become big data architects. Some prior exposure to big data systems is beneficial to fully benefit from this book's insights and tutorials.
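
Since the book leans on Apache Spark for processing, a minimal PySpark batch job gives a feel for the style. This is a generic sketch, not from the book; the HDFS path and column name are hypothetical, and it assumes a working local Spark installation.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("click-counts").getOrCreate()

# Hypothetical input: CSV files landed in HDFS by an ingest job.
df = spark.read.csv("hdfs:///data/clicks/*.csv", header=True, inferSchema=True)

# A typical batch aggregation: events per page, largest first.
(df.groupBy("page")
   .agg(F.count("*").alias("hits"))
   .orderBy(F.desc("hits"))
   .show(10))

spark.stop()
```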

IBM Open Platform for DBaaS on IBM Power Systems

Abstract
This IBM Redbooks publication describes how to implement an Open Platform for Database as a Service (DBaaS) on IBM Power Systems environment for Linux, and demonstrates the open source tools, optimization, and best-practices guidelines for it. Open Platform for DBaaS on Power Systems is an on-demand, secure, and scalable self-service database platform that automates the provisioning and administration of databases to support new business applications and information insights.

This publication addresses topics to help sellers, architects, brand specialists, distributors, resellers, and anyone offering a secure and scalable Open Platform for DBaaS on Power Systems solution with APIs that are consistent across heterogeneous open database types. An Open Platform for DBaaS on Power Systems solution can accelerate business success by providing an infrastructure and tools, leveraging open source and OpenStack software, engineered to optimize hardware and software between workloads and resources so that you have a responsive and adaptive environment. Moreover, this publication provides documentation to transfer the how-to skills for cloud-oriented operational management of the Open Platform for DBaaS on Power Systems service and its underlying infrastructure to the technical teams.

The Open Platform for DBaaS on Power Systems mission is to provide scalable and reliable cloud database-as-a-service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully featured and extensible open source framework. For example, Trove is a database as a service for OpenStack. It is designed to run entirely on OpenStack, with the goal of allowing users to quickly and easily utilize the features of a relational or non-relational database without the burden of handling complex administrative tasks. Cloud users and database administrators can provision and manage multiple database instances as needed. Initially, the service focuses on providing resource isolation at high performance while automating complex administrative tasks including deployment, configuration, patching, backups, restores, and monitoring.

In the context of this publication, the monitoring tool implemented is Nagios Core, which is an open source monitoring tool. Hence, when you see a reference to Nagios in this book, Nagios Core is the open source monitoring solution implemented. Also note that the implementation of Open Platform for DBaaS on IBM Power Systems is based on open source solutions.

This book is targeted toward sellers, architects, brand specialists, distributors, resellers, and anyone developing and implementing Open Platform for DBaaS on Power Systems solutions.
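
The Trove provisioning described above is driven through a REST API. The call below only sketches the general shape of an instance-create request; the endpoint, token, flavor ID, and datastore values are placeholders, and the exact contract should be taken from the Trove API reference rather than from this sketch.

```python
import requests

# Placeholder endpoint and token; real values come from Keystone auth.
TROVE_URL = "https://cloud.example.com:8779/v1.0/<tenant-id>"
HEADERS = {"X-Auth-Token": "<keystone-token>"}

# Ask Trove to provision a small MySQL instance with a 5 GB volume.
body = {
    "instance": {
        "name": "orders-db",
        "flavorRef": "2",                       # placeholder flavor ID
        "volume": {"size": 5},
        "datastore": {"type": "mysql", "version": "5.7"},
    }
}

resp = requests.post(f"{TROVE_URL}/instances", json=body, headers=HEADERS)
resp.raise_for_status()
print(resp.json()["instance"]["id"])  # poll this instance until it is ACTIVE
```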

Mastering the SAS DS2 Procedure

Enhance your SAS data-wrangling skills with high-precision and parallel data manipulation using the DS2 programming language. Now in its second edition, this book addresses the DS2 programming language from SAS, which combines the precise procedural power and control of the Base SAS DATA step language with the simplicity and flexibility of SQL. DS2 provides simple, safe syntax for performing complex data transformations in parallel and enables manipulation of native database data types at full precision. It also covers PROC FEDSQL, a modernized SQL language that blends perfectly with DS2. You will learn to harness the power of parallel processing to speed up CPU-intensive computing processes in Base SAS and how to achieve even more speed by processing DS2 programs on massively parallel database systems. Techniques for leveraging internet APIs to acquire data, avoiding large data movements when working with data from disparate sources, and leveraging DS2's new data types for full-precision numeric calculations are presented, with examples of why these techniques are essential for the modern data wrangler.

Here's what's new in this edition:
- how to significantly improve performance by using the new SAS Viya architecture with its SAS Cloud Analytic Services (CAS)
- how to declare private variables and methods in a package
- the new PROC DSTODS2
- the PCRXFIND and PCRXREPLACE packages

While working through the code samples provided with this book, you will build a library of custom, reusable, and easily shareable DS2 program modules, execute parallelized DATA step programs to speed up a CPU-intensive process, and conduct advanced data transformations using hash objects and matrix math operations. This book is part of the SAS Press Series.

In this podcast Stephen Gatchell (@stephengatchell) from @Dell talks about the ingredients of a successful data scientist. He sheds light on the importance of data governance and compliance in defining a robust data science strategy, suggests tactical steps that executives could take in starting their journey to a robust governance framework, and talks about how to take the scare out of governance. He also gives insights on some of the things leaders could do today to build robust data science teams and frameworks. This podcast is great for leaders seeking tactical insights into building a robust data science framework.

Timeline:

0:29 Stephen's journey. 4:45 Dell's customer experience journey. 7:39 Suggestions for a startup in regard to customer experience. 12:02 Building a center of excellence around data. 15:29 Data ownership. 19:18 Fixing data governance. 24:02 Fixing the data culture. 29:40 Distributed data ownership and data lakes. 32:50 Understanding data lakes. 35:50 Common pitfalls and opportunities in data governance. 38:50 Pleasant surprises in data governance. 41:30 Ideal data team. 44:04 Hiring the right candidates for data excellence. 46:13 How do I know the "why"? 49:05 Stephen's success mantra. 50:56 Stephen's best read.

Steve's Recommended Read: Big Data MBA: Driving Business Strategies with Data Science by Bill Schmarzo http://amzn.to/2HWjOyT

Podcast Link: https://futureofdata.org/want-to-fix-datascience-fix-governance-by-stephengatchell-futureofdata/

Steve's BIO: Stephen is currently Chief Data Officer, Engineering & Data Lake at Dell and serves on the Dell Information Quality Governance Office and the Dell IT Technology Advisory Board, developing Dell’s corporate strategies for the Business Data Lake, Advanced Analytics, and Information Asset Management. Stephen also serves as a Customer Insight Analyst for the Chief Technology Office, analyzing customer technology challenges and requirements. Stephen has been awarded the People’s Choice Award by the Dell Total Customer Experience Team for the Data Governance and Business Data Lake project, and was a Chief Technology Officer Innovation finalist for using advanced analytics on customer configurations to improve product development and product test coverage. Prior to his current role, Stephen managed Dell’s Global Product Development Lab Operations team developing internal cloud orchestration and automation environments, was an Information Systems Executive at IBM leading acquisition conversion efforts, and was VP of Enterprise Systems and Operations managing mission-critical information systems for Telelogic (a Swedish public software firm). Stephen has an MBA from Southern New Hampshire University and a BSBA and an AS in Finance from Northeastern University.

About #Podcast:

The FutureOfData podcast is a conversation starter that brings together leaders, influencers, and lead practitioners to discuss their journeys to create the data-driven future.

Want to join? If you or anyone you know wants to join in, register your interest @ http://play.analyticsweek.com/guest/

Want to sponsor? Email us @ [email protected]

Keywords:

#FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy


IBM TS4500 R4 Tape Library Guide

Abstract
The IBM® TS4500 (TS4500) tape library is a next-generation tape solution that offers higher storage density and more integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires, with the cost-effectiveness and the manageability to grow with business data needs while preserving existing investments in IBM tape library products. Now, you can achieve both a low cost per terabyte (TB) and a high TB density per square foot, because the TS4500 can store up to 8.25 petabytes (PB) of uncompressed data in a single-frame library or scale up at 1.5 PB per square foot to over 263 PB, which is more than 4 times the capacity of the IBM TS3500 tape library.

The TS4500 offers these benefits:
- High-availability dual active accessors with integrated service bays to reduce inactive service space by 40%. The Elastic Capacity option can be used to completely eliminate inactive service space.
- Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position.
- Increased capacity: The TS4500 can grow from a single L frame up to an additional 17 expansion frames with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from the existing TS3500 library can be redeployed in a TS4500.
- Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations.
- Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library.
- Support for the IBM TS1155, while also supporting the TS1150 and TS1140 tape drives: The TS1155 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1155 offers high-performance, flexible data storage with support for data encryption. Also, this enhanced fifth-generation drive can help protect investments in tape automation by offering compatibility with existing automation. The new TS1155 Tape Drive Model 55E delivers a 10 Gb Ethernet host attachment interface optimized for cloud-based and hyperscale environments. The TS1155 Tape Drive Model 55F delivers a native data rate of 360 MBps, the same load/ready and locate speeds and access times as the TS1150, and includes dual-port 8 Gb Fibre Channel support.
- Support for the IBM Linear Tape-Open (LTO) Ultrium 8 tape drive: The LTO Ultrium 8 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 7, while still protecting your investment in the previous technology.
- Support for the LTO-8 Type M cartridge (M8): The LTO Program is introducing a new capability with LTO-8 drives: the ability to write 9 TB on a brand-new LTO-7 cartridge instead of the 6 TB specified by the LTO-7 format. Such a cartridge is called an LTO-7 initialized LTO-8 Type M cartridge.
- Integrated TS7700 back-end Fibre Channel (FC) switches are available.
- Up to four library-managed encryption (LME) key paths per logical library are available.

This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, the new integrated management console (IMC), and the command-line interface (CLI). You learn how to accomplish several specific tasks:
- Improve storage density with increased expansion-frame capacity of up to 2.4 times and support for 33% more tape drives per frame.
- Manage storage by using the ALMS feature.
- Improve business continuity and disaster recovery with dual active accessors, automatic control path failover, and data path failover.
- Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media.
- Support IBM LTO Ultrium 8, 7, 6, and 5 and IBM TS1155, TS1150, and TS1140 tape drives.
- Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow.
- Reduce the storage footprint and simplify cabling with 10U of rack space on top of the library.

This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Summary

Search is a common requirement for applications of all varieties. Elasticsearch was built to make it easy to include search functionality in projects built in any language. From that foundation, the rest of the Elastic Stack has been built, expanding to many more use cases in the process. In this episode Philipp Krenn describes the various pieces of the stack, how they fit together, and how you can use them in your infrastructure to store, search, and analyze your data.
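
A quick way to see the Elasticsearch piece of the stack in action is an index-and-search round trip with the official Python client. This sketch assumes a local, security-disabled development cluster and an index name invented for the example; the call shapes follow the 8.x client.

```python
from elasticsearch import Elasticsearch

# Assumes a local single-node dev cluster with security disabled.
es = Elasticsearch("http://localhost:9200")

# Index a document; the index is created on first write.
es.index(index="articles", id="1",
         document={"title": "Intro to the Elastic Stack", "views": 42})
es.indices.refresh(index="articles")  # make it searchable immediately

resp = es.search(index="articles", query={"match": {"title": "elastic"}})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```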

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute. For complete visibility into the health of your pipeline, including deployment tracking and powerful alerting driven by machine learning, DataDog has got you covered. With their monitoring, metrics, and log collection agent, including extensive integrations and distributed tracing, you’ll have everything you need to find and fix performance bottlenecks in no time. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial and get a sweet new T-shirt. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. Your host is Tobias Macey and today I’m interviewing Philipp Krenn about the Elastic Stack and the ways that you can use it in your systems.

Interview

Introduction
How did you get involved in the area of data management?
The Elasticsearch product has been around for a long time and is widely known, but can you give a brief overview of the other components that make up the Elastic Stack and how they work together?
Beyond the common pattern of using Elasticsearch as a search engine connected to a web application, what are some of the other use cases for the various pieces of the stack?
What are the common scaling bottlenecks that users should be aware of when they are dealing with large volumes of data?
What do you consider to be the biggest competition to the Elastic Stack as you expand the capabilities and target usage patterns?
What are the biggest challenges that you are tackling in the Elastic Stack, technical or otherwise?
What are the biggest challenges facing Elastic as a company in the near to medium term?
Open source as a business model: https://www.elastic.co/blog/doubling-down-on-open?utm_source=rss&utm_medium=rss
What is the vision for Elastic and the Elastic Stack going forward and what new features or functionality can we look forward to?

Contact Info

@xeraa on Twitter
xeraa on GitHub
Website
Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Elastic Vienna – Capital of Austria What Is Developer Advocacy? NoSQL MongoDB Elasticsearch Cassandra Neo4J Hazelcast Apache Lucene Logstash Kibana Beats X-Pack ELK Stack Metrics APM (Application Performance Monitoring) GeoJSON Split Brain Elasticsearch Ingest Nodes PacketBeat Elastic Cloud Elasticon Kibana Canvas SwiftType

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA.

Support Data Engineering Podcast

Beginning PostgreSQL on the Cloud: Simplifying Database as a Service on Cloud Platforms

Get started with PostgreSQL on the cloud and discover the advantages, disadvantages, and limitations of the cloud services from Amazon, Rackspace, Google, and Azure. Once you have chosen your cloud service, you will focus on securing it and developing a back-up strategy for your PostgreSQL instance as part of your long-term plan. Beginning PostgreSQL on the Cloud covers other essential topics such as setting up replication and high availability; encrypting your saved cloud data; creating a connection pooler for your database; and monitoring PostgreSQL on the cloud. The book concludes by showing you how to install and configure some of the tools that will help you get started with PostgreSQL on the cloud. This book shows you how database as a service enables you to spread your data across multiple data centers, ensuring that it is always accessible. You’ll discover that this model does not expect you to install and maintain databases yourself because the database service provider does it for you. You no longer have to worry about the scalability and high availability of your database.

What You Will Learn:
- Migrate PostgreSQL to the cloud
- Choose the best configuration and specifications of cloud instances
- Set up a backup strategy that enables point-in-time recovery
- Use connection pooling and load balancing on cloud environments
- Monitor database environments on the cloud

Who This Book Is For: Those who are looking to migrate to PostgreSQL on the Cloud. It will also help database administrators in setting up a cloud environment in an optimized way and help them with their day-to-day tasks.
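
The connection-pooling topic can be sketched with psycopg2's built-in pool module. This is a generic illustration, not from the book; the host, credentials, and pool sizes are placeholders for a cloud-hosted instance.

```python
from psycopg2 import pool

# Placeholder connection details for a hypothetical cloud-hosted instance.
db_pool = pool.SimpleConnectionPool(
    1, 10,  # min and max pooled connections
    host="mydb.example-cloud.com", port=5432,
    dbname="appdb", user="app", password="secret", sslmode="require",
)

conn = db_pool.getconn()        # borrow a connection from the pool
try:
    with conn.cursor() as cur:
        cur.execute("SELECT version()")
        print(cur.fetchone()[0])
finally:
    db_pool.putconn(conn)       # always return it, even on error

db_pool.closeall()
```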

Mastering Qlik Sense

Mastering Qlik Sense is a comprehensive guide designed to empower you to utilize Qlik Sense for advanced data analytics and dynamic visualizations. This book provides detailed insights into creating seamless Business Intelligence solutions tailored to your needs. Whether you're building dashboards, optimizing data models, or exploring Qlik Cloud functionalities, this book has you covered. What this Book will help me do Build interactive and insightful dashboards using Qlik Sense's intuitive tools. Learn to model data efficiently and apply best practices for optimized performance. Master the Qlik Sense APIs and create advanced custom extensions. Understand enterprise security measures including role-based access controls. Gain expertise in migrating from QlikView to Qlik Sense effectively Author(s) Juan Ignacio Vitantonio is an experienced expert in Business Intelligence solutions and data analytics. With a profound understanding of Qlik technologies, Juan has developed and implemented impactful BI solutions across various industries. His writing reflects his practical knowledge and passion for empowering users with actionable insights into data. Who is it for? This book is perfect for BI professionals, data analysts, and organizations aiming to leverage Qlik Sense for advanced analytics. Ideal for those with a foundational grasp of Qlik Sense, it also provides comprehensive guidance for QlikView users transitioning to Qlik Sense. If you want to improve your BI solutions and data-driven decision-making skills, this book is for you.