talk-data.com

Topic

data · 5765 tagged

Activity Trend

2020-Q1 to 2026-Q1 · peak 3/qtr

Activities

5765 activities · Newest first

Advanced Elasticsearch 7.0

Dive deep into the advanced capabilities of Elasticsearch 7.0 with this expert-level guide. In this book, you will explore the most effective techniques and tools for building, indexing, and querying advanced distributed search engines. Whether optimizing performance, scaling applications, or integrating with big data analytics, this guide empowers you with practical skills and insights.

What this book will help me do: Master ingestion pipelines and preprocess documents for faster and more efficient indexing. Model search data optimally for complex and varied real-world applications. Perform exploratory data analyses using Elasticsearch's robust features. Integrate Elasticsearch with modern analytics platforms like Kibana and Logstash. Leverage Elasticsearch with Apache Spark and machine learning libraries for real-time advanced analytics.

Author(s): Wong is a seasoned Elasticsearch expert with years of real-world experience developing enterprise-grade search and analytics systems. With a passion for innovation and teaching, Wong enjoys breaking down complex technical concepts into digestible learning experiences. His work reflects a pragmatic and results-driven approach to teaching Elasticsearch.

Who is it for? This book is ideal for Elasticsearch developers and data engineers with some prior experience who are looking to elevate their skills to an advanced level. It suits professionals seeking to enhance their expertise in building scalable search and analytics solutions. If you aim to master sophisticated Elasticsearch operations and real-time integrations, this book is tailored for you.
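
As a hedged illustration of the ingestion-pipeline material this book covers, here is a minimal sketch using the official Elasticsearch Python client against a 7.x node. The pipeline id, index, and field names are hypothetical, not taken from the book.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a local 7.x node

# Define an ingest pipeline that preprocesses documents before indexing:
# lowercase a field and stamp the ingest time.
es.ingest.put_pipeline(
    id="normalize-products",
    body={
        "description": "Lowercase product names and stamp ingest time",
        "processors": [
            {"lowercase": {"field": "name"}},
            {"set": {"field": "ingested_at", "value": "{{_ingest.timestamp}}"}},
        ],
    },
)

# Documents routed through the pipeline are transformed server-side
# before they are written to the index.
es.index(
    index="products",
    pipeline="normalize-products",
    body={"name": "Mechanical Keyboard", "price": 89.0},
)
```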

Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions

Publisher's Note: Products purchased from third-party sellers are not guaranteed by the publisher for quality, authenticity, or access to any online entitlements included with the product.

Use machine learning to understand your customers, frame decisions, and drive value. The business analytics world has changed, and data scientists are taking over. Business Data Science takes you through the steps of using machine learning to implement best-in-class business data science. Whether you are a business leader with a desire to go deep on data, or an engineer who wants to learn how to apply machine learning to business problems, you'll find the information, insight, and tools you need to flourish in today's data-driven economy. You'll learn how to: • Use the key building blocks of machine learning: sparse regularization, out-of-sample validation, and latent factor and topic modeling • Understand how to use ML tools in real-world business problems, where causation matters more than correlation • Solve data science problems by scripting in the R programming language. Today's business landscape is driven by data and constantly shifting. Companies live and die on their ability to make and implement the right decisions quickly and effectively. Business Data Science is about doing data science right. It's about the exciting things being done around big data to run a flourishing business. It's about the precepts, principles, and best practices that you need to know for best-in-class business data science.
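
The book's own code is in R; purely as a hedged Python analogue, the sketch below shows two of the building blocks it names, sparse (L1) regularization and out-of-sample validation, on synthetic data with scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                             # 20 candidate predictors...
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=500)   # ...only 2 actually matter

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LassoCV picks the L1 regularization strength by cross-validation on the
# training data; the held-out test set provides the out-of-sample check.
model = LassoCV(cv=5).fit(X_train, y_train)
print("nonzero coefficients:", np.sum(model.coef_ != 0))
print("out-of-sample R^2:", model.score(X_test, y_test))
```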

Mastering SQL Server 2017

Leverage the power of SQL Server 2017 Integration Services to build data integration solutions with ease.

Key Features: Work with temporal tables to access information stored in a table at any time. Get familiar with the latest features in SQL Server 2017 Integration Services. Program and extend your packages to enhance their functionality.

Book Description: Microsoft SQL Server 2017 uses the power of R and Python for machine learning and containerization-based deployment on Windows and Linux. By learning how to use the features of SQL Server 2017 effectively, you can build scalable apps and easily perform data integration and transformation. You'll start by brushing up on the features of SQL Server 2017. This Learning Path will then demonstrate how you can use Query Store, columnstore indexes, and In-Memory OLTP in your apps. You'll also learn to integrate Python code in SQL Server and graph database implementations for development and testing. Next, you'll get up to speed with designing and building SQL Server Integration Services (SSIS) data warehouse packages using SQL Server Data Tools. Toward the concluding chapters, you'll discover how to develop SSIS packages designed to maintain a data warehouse using the data flow and other control flow tasks. By the end of this Learning Path, you'll be equipped with the skills you need to design efficient, high-performance database applications with confidence. This Learning Path includes content from the following Packt books: SQL Server 2017 Developer's Guide by Milos Radivojevic, Dejan Sarka, et al., and SQL Server 2017 Integration Services Cookbook by Christian Cote, Dejan Sarka, et al.

What you will learn: Use columnstore indexes to make storage and performance improvements. Extend database design solutions using temporal tables. Exchange JSON data between applications and SQL Server. Migrate historical data to Microsoft Azure by using Stretch Database. Design the architecture of a modern Extract, Transform, and Load (ETL) solution. Implement ETL solutions using Integration Services for both on-premises and Azure data.

Who this book is for: This Learning Path is for database developers and solution architects looking to develop ETL solutions with SSIS, and explore the new features in SSIS 2017. Advanced analysis practitioners, business intelligence developers, and database consultants dealing with performance tuning will also find this book useful. Basic understanding of database concepts and T-SQL is required to get the best out of this Learning Path.
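
As a hedged sketch of one feature this Learning Path highlights, the snippet below queries a system-versioned temporal table "as of" a past instant from Python via pyodbc. The connection string, table, and column names are illustrative placeholders.

```python
import pyodbc

# Placeholder connection to a local SQL Server 2017 instance.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=SalesDb;Trusted_Connection=yes;"
)
cursor = conn.cursor()

# FOR SYSTEM_TIME AS OF returns rows as they existed at that instant,
# pulled transparently from the temporal table's history table.
cursor.execute(
    "SELECT ProductID, Price FROM dbo.Products "
    "FOR SYSTEM_TIME AS OF '2017-06-01T00:00:00'"
)
for product_id, price in cursor.fetchall():
    print(product_id, price)
```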

R Data Science Quick Reference: A Pocket Guide to APIs, Libraries, and Packages

In this handy, practical book you will cover each concept concisely, with many illustrative examples. You'll be introduced to several R data science packages, with examples of how to use each of them. In this book, you'll learn about the following APIs and packages that deal specifically with data science applications: readr, tibble, forcats, lubridate, stringr, tidyr, magrittr, dplyr, purrr, ggplot2, modelr, and more. After using this handy quick reference guide, you'll have the code, APIs, and insights to write data science-based applications in the R programming language. You'll also be able to carry out data analysis.

What You Will Learn: Import data with readr. Work with categories using forcats, time and dates with lubridate, and strings with stringr. Format data using tidyr and then transform that data using magrittr and dplyr. Write functions with R for data science, data mining, and analytics-based applications. Visualize data with ggplot2 and fit data to models using modelr.

Who This Book Is For: Programmers new to R's data science, data mining, and analytics packages. Some prior coding experience with R in general is recommended.

Beginning Oracle SQL for Oracle Database 18c: From Novice to Professional

Start developing with Oracle SQL. This book is a one-stop introduction to everything you need to know about getting started developing with Oracle Database. You'll learn about foundational concepts, setting up a simple schema, adding data, reading data from the database, and making changes. No experience with databases is required to get started. Examples in the book are built around Oracle Live SQL, a freely available online sandbox for practicing and experimenting with SQL statements, and Oracle Express Edition, a free version of Oracle Database that is available for download. A marquee feature of Beginning Oracle SQL for Oracle Database 18c is the small chapter size. Content is divided into easily digestible chunks that can be read and practiced in very short intervals of time, making this the ideal book for a busy professional to learn from. Even just a 15-20 minute block of free time can be put to good use. Author Ben Brumm begins by helping you understand what a database is, and getting you set up with a sandbox in which to practice the SQL that you are learning. From there, easily digestible chapters cover, point by point, the different aspects of writing queries to get data out of a database. You'll also learn about creating tables and getting data into the database. Crucial topics such as working with nulls and writing analytic queries are given the attention they deserve, helping you to avoid pitfalls when writing queries for production use.

What You'll Learn: Create, update, and delete tables in an Oracle database. Add, update, and delete data in those database tables. Query and view data stored in your database. Manipulate and transform data using built-in database functions and features. Correctly choose when to use Oracle-specific syntax and features.

Who This Book Is For: Those new to Oracle who are planning to develop software using Oracle as the back-end data store. The book is also for those who are getting started in software development and realize they need to learn some kind of database language. Those who are learning software development on the side of their normal job, or learning it as a college student, who are ready to learn what a database is and how to use it will also find this book useful.
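
A minimal sketch of the create, insert, and query cycle the book teaches, executed here through the python-oracledb driver rather than Oracle Live SQL. The connection details and the table are hypothetical placeholders.

```python
import oracledb

# Placeholder credentials; assumes a local Oracle Express Edition install.
conn = oracledb.connect(user="demo", password="demo", dsn="localhost/XEPDB1")
cur = conn.cursor()

# Create a table, add a row, then read it back.
cur.execute("CREATE TABLE employees (id NUMBER PRIMARY KEY, name VARCHAR2(50))")
cur.execute("INSERT INTO employees (id, name) VALUES (:1, :2)", [1, "Ada"])
conn.commit()

cur.execute("SELECT id, name FROM employees ORDER BY id")
for row in cur:
    print(row)
```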

IBM Spectrum Scale Immutability Introduction, Configuration Guidance, and Use Cases

This IBM Redpaper™ publication introduces the IBM Spectrum Scale immutability function. It shows how to set it up and presents different ways of managing immutable and append-only files. This publication also provides guidance for implementing IT security aspects in an IBM Spectrum Scale cluster by addressing regulatory requirements. It also describes two typical use cases for managing immutable files. One use case involves applications that manage file immutability; the other presents a solution to automatically set files to immutable within an IBM Spectrum Scale immutable fileset.

Implementing the IBM Storwize V5000 Gen2 (including the Storwize V5010, V5020, and V5030) with IBM Spectrum Virtualize V8.2.1

Organizations of all sizes face the challenge of managing massive volumes of increasingly valuable data. But storing this data can be costly, and extracting value from the data is becoming more difficult. IT organizations have limited resources but must stay responsive to dynamic environments and act quickly to consolidate, simplify, and optimize their IT infrastructures. The IBM® Storwize® V5000 Gen2 system provides a smarter solution that is affordable, easy to use, and self-optimizing, which enables organizations to overcome these storage challenges. The Storwize V5000 Gen2 delivers efficient, entry-level configurations that are designed to meet the needs of small and midsize businesses. Designed to provide organizations with the ability to consolidate and share data at an affordable price, the Storwize V5000 Gen2 offers advanced software capabilities that are found in more expensive systems. This IBM Redbooks® publication is intended for pre-sales and post-sales technical support professionals and storage administrators. It applies to the Storwize V5030, V5020, and V5010, and to IBM Spectrum Virtualize™ V8.2.1.

Securing Your Cloud: IBM Security for LinuxONE

As workloads are being offloaded to IBM® LinuxONE based cloud environments, it is important to ensure that these workloads and environments are secure. This IBM Redbooks® publication describes the necessary steps to secure your environment from the hardware level through all of the components that are involved in a LinuxONE cloud infrastructure that use Linux and IBM z/VM®. The audience for this book is IT architects, IT Specialists, and those users who plan to use LinuxONE for their cloud environments.

Deploying a Database Instance in an IBM Cloud Private Cluster on IBM Z

This IBM® Redpaper™ publication shows you how to deploy a database instance within a container using an IBM Cloud™ Private cluster on IBM Z®. A preinstalled IBM Spectrum™ Scale 5.0.3 cluster file system provides back-end storage for the persistent volumes bound to the database. A container is a standard unit of software that packages code and all its dependencies, so the application runs quickly and reliably from one computing environment to another. By default, containers are ephemeral. However, stateful applications, such as databases, require some type of persistent storage that can survive service restarts or container crashes. IBM provides several products that help organizations build an environment on an IBM Z infrastructure to develop and manage containerized applications, including dynamic provisioning of persistent volumes. As an example of a stateful application, this paper describes how to deploy the relational database MariaDB using a Helm chart, with the IBM Spectrum Scale V5.0.3 cluster file system providing back-end storage for the persistent volumes. This document provides step-by-step guidance for installing and configuring the following components: IBM Cloud Private 3.1.2 (including Kubernetes), Docker 18.03.1-ce, and IBM Storage Enabler for Containers 2.0.0 and 2.1.0. This Redpaper demonstrates how we set up the example for a stateful application in our lab and gives you insights about planning for your implementation. IBM Z server hardware, the IBM Z hypervisor z/VM®, and the IBM Spectrum Scale cluster file system are prerequisites for setting up the example environment. The Redpaper is written with the assumption that you are familiar with the software products used in setting up the environment. The intended audience includes storage administrators, IT/cloud administrators, technologists, and IT specialists.

Hands-On Data Analysis with Pandas

Hands-On Data Analysis with Pandas provides an intensive dive into mastering the pandas library for data science and analysis using Python. Through a combination of conceptual explanations and practical demonstrations, readers will learn how to manipulate, visualize, and analyze data efficiently.

What this book will help me do: Understand and apply the pandas library for efficient data manipulation. Learn to perform data wrangling tasks such as cleaning and reshaping datasets. Create effective visualizations using pandas and libraries like matplotlib and seaborn. Grasp the basics of machine learning and implement solutions with scikit-learn. Develop reusable data analysis scripts and modules in Python.

Author(s): Stefanie Molin is a seasoned data scientist and software engineer with extensive experience in Python and data analytics. She specializes in leveraging the latest data science techniques to solve real-world problems. Her engaging and detailed writing draws from her practical expertise, aiming to make complex concepts accessible to all.

Who is it for? This book is ideal for data analysts and aspiring data scientists who are at the beginning stages of their careers or looking to enhance their toolset with pandas and Python. It caters to Python developers eager to delve into data analysis workflows. Readers should have some programming knowledge to fully benefit from the examples and exercises.
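
As a hedged taste of the workflow the book covers, here is a short pandas sketch of the clean, reshape, and plot cycle. The CSV path and column names are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Load a (hypothetical) sales file, parsing the date column.
df = pd.read_csv("sales.csv", parse_dates=["date"])

df = df.dropna(subset=["amount"])          # cleaning: drop rows missing amounts
monthly = (df.set_index("date")
             .resample("M")["amount"]
             .sum())                       # reshaping: monthly totals

monthly.plot(title="Monthly sales")        # visualization via pandas
plt.show()
```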

Data Warehousing with Greenplum, 2nd Edition

Data professionals are confronting the most disruptive change since relational databases appeared in the 1980s. SQL is still a major tool for data analytics, but conventional relational database management systems can't handle the increasing size and complexity of today's datasets. This updated edition teaches you best practices for Greenplum Database, the open source massively parallel processing (MPP) database that accommodates large sets of nonrelational and relational data. Marshall Presser, field CTO at Pivotal, introduces Greenplum's approach to data analytics and data-driven decisions, beginning with its shared-nothing architecture. IT managers, developers, data analysts, system architects, and data scientists will all gain from exploring data organization and storage, data loading, running queries, and learning to perform analytics in the database. Discover how MPP and Greenplum will help you go beyond the traditional data warehouse. This ebook covers: Greenplum features, use case examples, and techniques for optimizing use; four Greenplum deployment options to help you balance security, cost, and time to usability; why each networked node in Greenplum's architecture includes an independent operating system, memory, and storage; and additional tools for monitoring, managing, securing, and optimizing query responses in the Pivotal Greenplum commercial database.
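
Greenplum speaks the PostgreSQL wire protocol, so a standard Python driver can reach it. The hedged sketch below creates a table with an explicit distribution key, the mechanism behind the shared-nothing architecture described above; all connection details and names are placeholders.

```python
import psycopg2

# Placeholder connection to a Greenplum master host.
conn = psycopg2.connect(host="gp-master", dbname="analytics",
                        user="gpadmin", password="secret")
cur = conn.cursor()

# DISTRIBUTED BY hashes rows on customer_id, so each shared-nothing segment
# node owns its own slice of the data (and its own CPU, memory, and storage).
cur.execute("""
    CREATE TABLE page_views (
        customer_id BIGINT,
        url         TEXT,
        viewed_at   TIMESTAMP
    ) DISTRIBUTED BY (customer_id)
""")
conn.commit()
```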

Fundamentals of Programming in SAS

Unlock the essentials of SAS programming! Fundamentals of Programming in SAS: A Case Studies Approach gives a complete introduction to SAS programming. Perfect for students, novice SAS users, and programmers studying for their Base SAS certification, this book covers all the basics, including working with data, creating visualizations, data validation, and good programming practices. Experienced programmers know that real-world scenarios require practical solutions. Designed for use in the classroom and for self-guided learners, this book takes a novel approach to learning SAS programming by following a single case study throughout the text and circling back to previous concepts to reinforce material. Readers will benefit from the variety of exercises, including both multiple choice questions and in-depth case studies. Additional case studies are also provided online for extra practice. This approach mirrors the way good SAS programmers develop their skills: through hands-on work with an eye toward developing the knowledge necessary to tackle more difficult tasks. After reading this book, you will gain the skills and confidence to take on larger challenges with the power of SAS.

Getting DataOps Right

Many large organizations have accumulated dozens of disconnected data sources to serve different lines of business over the years. These applications might be useful to one area of the enterprise, but they're usually inaccessible to other data consumers in the organization. In this short report, five data industry thought leaders explore DataOps, the automated, process-oriented methodology for making clean, reliable data available to teams throughout your company. Andy Palmer, Michael Stonebraker, Nik Bates-Haus, Liam Cleary, and Mark Marinelli from Tamr use real-world examples to explain how DataOps works. DataOps is as much about changing people's relationship to data as it is about technology, infrastructure, and process. This report provides an organizational approach to implementing this discipline in your company, including various behavioral, process, and technology changes. Through individual essays, you'll learn how to: move toward scalable data unification (Michael Stonebraker); understand DataOps as a discipline (Nik Bates-Haus); explore the key principles of a DataOps ecosystem (Andy Palmer); learn the key components of a DataOps ecosystem (Andy Palmer); build a DataOps toolkit (Liam Cleary); and build a team and prepare for future trends (Mark Marinelli).

Operationalizing the Data Lake

Big data and advanced analytics have increasingly moved to the cloud as organizations pursue actionable insights and data-driven products using the growing amounts of information they collect. But few companies have truly operationalized data so it's usable for the entire organization. With this pragmatic ebook, engineers, architects, and data managers will learn how to build and extract value from a data lake in the cloud and leverage the compute power and scalability of a cloud-native data platform to put your company's vast data trove into action. Holden Ackerman and Jon King of Qubole take you through the basics of building a data lake operation, from people to technology, employing multiple technologies and frameworks in a cloud-native data platform. You'll dive into the tools and processes you need for the entire lifecycle of a data lake, from data preparation, storage, and management to distributed computing and analytics. You'll also explore the unique role that each member of your data team needs to play as you migrate to your cloud-native data platform. You'll learn how to: leverage your data effectively through a single source of truth; understand the importance of building a self-service culture for your data lake; define the structure you need to build a data lake in the cloud; implement financial governance and data security policies for your data lake through a cloud-native data platform; identify the tools you need to manage your data infrastructure; and delineate the scope, usage rights, and best tools for each team working with a data lake, such as analysts, data scientists, data engineers, and security professionals.

Rebuilding Reliable Data Pipelines Through Modern Tools

When data-driven applications fail, identifying the cause is both challenging and time-consuming, especially as data pipelines become more and more complex. Hunting for the root cause of application failure in messy, raw, and distributed logs is difficult for performance experts and a nightmare for data operations teams. This report examines DataOps processes and tools that enable you to manage modern data pipelines efficiently. Author Ted Malaska describes a data operations framework and shows you the importance of testing and monitoring to plan, rebuild, automate, and then manage robust data pipelines, whether in the cloud, on premises, or in a hybrid configuration. You'll also learn ways to apply performance monitoring software and AI to your data pipelines in order to keep your applications running reliably. You'll learn: how performance management software can reduce the risk of running modern data applications; methods for applying AI to provide insights, recommendations, and automation to operationalize big data systems and data applications; and how to plan, migrate, and operate big data workloads and data pipelines in the cloud and in hybrid deployment models.

Professional Azure SQL Database Administration - Second Edition

Professional Azure SQL Database Administration serves as your comprehensive guide to mastering the management and optimization of cloud-based Azure SQL Database solutions. Covering the differences and unique features of Azure SQL Database compared to on-premises SQL Server, this book offers a clear roadmap to efficiently migrate, secure, scale, and maintain these databases in the cloud.

What this book will help me do: Understand the differences between Azure SQL Database and on-premises SQL Server and their practical implications. Learn techniques to migrate existing SQL Server databases to Azure SQL Database seamlessly. Discover advanced ways to optimize database performance and scalability leveraging cloud capabilities. Master security strategies for Azure SQL databases, including backup, disaster recovery, and automated tasks. Develop proficiency in using tools such as PowerShell to automate and manage routine database administration tasks.

Author(s): Ahmad Osama is an experienced database professional and author specializing in SQL Server and Azure SQL Database administration. With a robust background in database migration, maintenance, and performance tuning, Ahmad expertly bridges the gap between theory and practice. His approachable writing style makes complex database topics accessible to professionals seeking to expand their expertise.

Who is it for? Professional Azure SQL Database Administration is an essential resource for database administrators, developers, and IT professionals keen on developing their knowledge of Azure SQL Database administration and cloud database solutions. Whether you're transitioning from traditional SQL Server environments or looking to optimize your database strategies in the cloud, this book caters to professionals with intermediate to advanced experience in database management and programming with SQL.
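
As a hedged example of a routine administration check (the book itself leans on PowerShell for automation), the snippet below reads the current edition and service objective of an Azure SQL database from Python. The server, database, and credentials are placeholders.

```python
import pyodbc

# Placeholder connection to an Azure SQL Database; Encrypt=yes is required.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver.database.windows.net;DATABASE=mydb;"
    "UID=admin_user;PWD=...;Encrypt=yes;"
)
cur = conn.cursor()

# DATABASEPROPERTYEX reports the edition and service objective, the settings
# you adjust when scaling an Azure SQL database up or down.
cur.execute("""
    SELECT DATABASEPROPERTYEX(DB_NAME(), 'Edition'),
           DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective')
""")
print(cur.fetchone())
```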

Data Science with Python and Dask

Dask is a native parallel analytics tool designed to integrate seamlessly with the libraries you're already using, including Pandas, NumPy, and Scikit-learn. With Dask you can crunch and work with huge datasets, using the tools you already have. And Data Science with Python and Dask is your guide to using Dask for your data projects without changing the way you work!

About the Technology: An efficient data pipeline means everything for the success of a data science project. Dask is a flexible library for parallel computing in Python that makes it easy to build intuitive workflows for ingesting and analyzing large, distributed datasets. Dask provides dynamic task scheduling and parallel collections that extend the functionality of NumPy, Pandas, and Scikit-learn, enabling users to scale their code from a single laptop to a cluster of hundreds of machines with ease.

About the Book: Data Science with Python and Dask teaches you to build scalable projects that can handle massive datasets. After meeting the Dask framework, you'll analyze data in the NYC Parking Ticket database and use DataFrames to streamline your process. Then, you'll create machine learning models using Dask-ML, build interactive visualizations, and build clusters using AWS and Docker.

What's Inside: Working with large structured and unstructured datasets. Visualization with Seaborn and Datashader. Implementing your own algorithms. Building distributed apps with Dask Distributed. Packaging and deploying Dask apps.

About the Reader: For data scientists and developers with experience using Python and the PyData stack.

About the Author: Jesse Daniel is an experienced Python developer. He taught Python for Data Science at the University of Denver and leads a team of data scientists at a Denver-based media technology company.

Quotes: "The most comprehensive coverage of Dask to date, with real-world examples that made a difference in my daily work." (Al Krinker, United States Patent and Trademark Office) "An excellent alternative to PySpark for those who are not on a cloud platform. The author introduces Dask in a way that speaks directly to an analyst." (Jeremy Loscheider, Panera Bread) "A greatly paced introduction to Dask with real-world datasets." (George Thomas, R&D Architecture, Manhattan Associates) "The ultimate resource to quickly get up and running with Dask and parallel processing in Python." (Gustavo Patino, Oakland University William Beaumont School of Medicine)
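
A minimal, hedged sketch of the Dask DataFrame workflow the book builds on: lazy, partitioned reads and a parallel aggregation triggered by .compute(). The file pattern and column names are hypothetical, loosely echoing the NYC parking ticket example.

```python
import dask.dataframe as dd

# Reads lazily; each CSV chunk becomes a pandas partition processed in parallel.
df = dd.read_csv("nyc_parking_tickets_*.csv",
                 dtype={"Violation Code": "float64"})

# Nothing executes until .compute(); Dask builds a task graph of the
# groupby, count, and top-k steps and schedules it across workers.
top_violations = (df.groupby("Violation Code")
                    .size()
                    .nlargest(10)
                    .compute())
print(top_violations)
```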

IBM Spectrum Virtualize: Hot-Spare Node and NPIV Target Ports

The use of N_Port ID Virtualization (NPIV) to provide host-only ports (NPIV target ports), together with spare nodes, improves host failover characteristics: host communications are separated from other communication tasks on the same ports, and standby hardware can be introduced into the cluster automatically to restore redundancy. Because the host ports are not used for internode communications, they can move freely between nodes, including spare nodes that are added to the cluster automatically. This IBM® Redpaper™ publication describes the use of the IBM Spectrum™ Virtualize Hot-Spare Node function to provide a highly available storage infrastructure. This paper focuses on the functional behavior of a hot-spare node when subjected to various failure conditions. It does not provide the details necessary to implement the reference architectures, although some implementation detail is provided.

IBM Spectrum Scale: Big Data and Analytics Solution Brief

This IBM® Redguide™ publication describes big data and analytics deployments that are built on IBM Spectrum Scale™. IBM Spectrum Scale is a proven enterprise-level distributed file system that is a high-performance and cost-effective alternative to Hadoop Distributed File System (HDFS) for Hadoop analytics services. IBM Spectrum Scale includes NFS, SMB, and Object services and meets the performance that is required by many industry workloads, such as technical computing, big data, analytics, and content management. IBM Spectrum Scale provides world-class, web-based storage management with extreme scalability, flash accelerated performance, and automatic policy-based storage tiering from flash through disk to the cloud, which reduces storage costs up to 90% while improving security and management efficiency in cloud, big data, and analytics environments. This Redguide publication is intended for technical professionals (analytics consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for providing Hadoop analytics services and are interested in learning about the benefits of the use of IBM Spectrum Scale as an alternative to HDFS.

SAP HANA on IBM Power Systems: High Availability and Disaster Recovery Implementation Updates

This IBM® Redbooks® publication updates Implementing High Availability and Disaster Recovery Solutions with SAP HANA on IBM Power Systems, REDP-5443, with the latest technical content that describes how to implement an SAP HANA on IBM Power Systems™ high availability (HA) and disaster recovery (DR) solution by using theoretical knowledge and sample scenarios. This book describes how all the pieces of the reference architecture work together (IBM Power Systems servers, IBM Storage servers, IBM Spectrum™ Scale, IBM PowerHA® SystemMirror® for Linux, IBM VM Recovery Manager DR for Power Systems, and Linux distributions) and demonstrates the resilience of SAP HANA with IBM Power Systems servers. This publication is for architects, brand specialists, distributors, resellers, and anyone developing and implementing SAP HANA on IBM Power Systems integration, automation, HA, and DR solutions. It provides documentation that transfers how-to skills to technical teams, as well as documentation for the sales team.