talk-data.com

Topic: data (5765 tagged)

Activity trend: 2020-Q1 to 2026-Q1 (peak 3/qtr)

Activities

5765 activities · Newest first

Metaheuristics for String Problems in Bio-informatics

So-called string problems are abundant in bioinformatics and computational biology. New optimization problems dealing with DNA or protein sequences arise constantly, and researchers need efficient optimization techniques for solving them. One obstacle for optimization practitioners is the atypical nature of these problems, which demands an interdisciplinary approach in order to solve them efficiently and accurately.

Real World SQL and PL/SQL: Advice from the Experts

Master the Underutilized Advanced Features of SQL and PL/SQL

This hands-on guide from Oracle Press shows how to fully exploit lesser-known but extremely useful SQL and PL/SQL features, and how to effectively use both languages together. Written by a team of Oracle ACE Directors, Real World SQL and PL/SQL: Advice from the Experts features best practices, detailed examples, and insider tips that clearly demonstrate how to write, troubleshoot, and implement code for a wide variety of practical applications. The book thoroughly explains underutilized SQL and PL/SQL functions and lays out essential development strategies. Data modeling, advanced analytics, database security, secure coding, and administration are covered in complete detail.

Learn how to:
• Apply advanced SQL and PL/SQL tools and techniques
• Understand SQL and PL/SQL functionality and determine when to use which language
• Develop accurate data models and implement business logic
• Run PL/SQL in SQL and integrate complex datasets
• Handle PL/SQL instrumenting and profiling
• Use Oracle Advanced Analytics and Oracle R Enterprise
• Build and execute predictive queries
• Secure your data using encryption, hashing, redaction, and masking
• Defend against SQL injection and other code-based attacks
• Work with Oracle Virtual Private Database

Code examples in the book are available for download at www.MHProfessional.com. For a complete list of Oracle Press titles, visit www.OraclePressBooks.com.
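
The book's examples target Oracle SQL and PL/SQL and are not reproduced here, but the injection defense it lists rests on one portable idea: bind untrusted input as a parameter instead of splicing it into SQL text. A minimal sketch using Python's standard-library sqlite3 (the table and input are hypothetical, not from the book):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

# Untrusted input crafted to subvert a naively string-built query.
user_input = "' OR '1'='1"

# Vulnerable: string concatenation lets the input rewrite the WHERE clause.
vulnerable = conn.execute(
    "SELECT name FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Safe: a bound parameter is treated as data, never as SQL text.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(vulnerable)  # every row leaks: [('alice',), ('bob',)]
print(safe)        # no row matches the literal string: []
```

The same bind-variable discipline applies in PL/SQL, where it also helps cursor sharing.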

IBM TS4500 R3 Tape Library Guide

The IBM® TS4500 tape library is a next-generation tape solution that offers higher storage density and more integrated management than previous solutions. This IBM Redbooks® publication gives you a close-up view of the new IBM TS4500 tape library. In the TS4500, IBM delivers the density that today's and tomorrow's data growth requires, with the cost-effectiveness and manageability to grow with business data needs while preserving existing investments in IBM tape library products. You can achieve both a low cost per terabyte (TB) and a high TB density per square foot, because the TS4500 can store up to 5.5 petabytes (PB) of data in a single 10-square-foot library frame, up to 3.4 times more capacity than the IBM TS3500 tape library.

The TS4500 offers these benefits:
• High availability: Dual active accessors with integrated service bays reduce inactive service space by 40%. The Elastic Capacity option can be used to eliminate inactive service space entirely.
• Flexibility to grow: The TS4500 library can grow from both the right side and the left side of the first L frame because models can be placed in any active position.
• Increased capacity: The TS4500 can grow from a single L frame by up to 17 expansion frames, with a capacity of over 23,000 cartridges. High-density (HD) generation 1 frames from an existing TS3500 library can be redeployed in a TS4500.
• Capacity on demand (CoD): CoD is supported through entry-level, intermediate, and base-capacity configurations.
• Advanced Library Management System (ALMS): ALMS supports dynamic storage management, which enables users to create and change logical libraries and configure any drive for any logical library.
• Support for the IBM TS1150 tape drive: The TS1150 gives organizations an easy way to deliver fast access to data, improve security, and provide long-term retention, all at a lower cost than disk solutions. The TS1150 offers high-performance, flexible data storage with support for data encryption. This fifth-generation drive can also help protect investments in tape automation by offering compatibility with existing automation.
• Support for the IBM Linear Tape-Open (LTO) Ultrium 7 tape drive: The LTO Ultrium 7 offering represents significant improvements in capacity, performance, and reliability over the previous generation, LTO Ultrium 6, while still protecting your investment in the previous technology.
• Integrated TS7700 back-end Fibre Channel (FC) switches.
• Up to four library-managed encryption (LME) key paths per logical library.

This book describes the TS4500 components, feature codes, specifications, supported tape drives, encryption, the new integrated management console (IMC), and the command-line interface (CLI). You learn how to accomplish several specific tasks:
• Improve storage density with increased expansion frame capacity (up to 2.4 times) and support for 33% more tape drives per frame.
• Manage storage by using the ALMS feature.
• Improve business continuity and disaster recovery with dual active accessors, automatic control path failover, and data path failover.
• Help ensure security and regulatory compliance with tape-drive encryption and Write Once Read Many (WORM) media.
• Support IBM LTO Ultrium 7, 6, and 5 and IBM TS1150 and TS1140 tape drives.
• Provide a flexible upgrade path for users who want to expand their tape storage as their needs grow.
• Reduce the storage footprint and simplify cabling with 10U of rack space on top of the library.

This guide is for anyone who wants to understand more about the IBM TS4500 tape library. It is particularly suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists.

Machine Learning

Machine learning, one of the top emerging sciences, has an extremely broad range of applications. However, many books on the subject provide only a theoretical approach, making it difficult for a newcomer to grasp the material. This book takes a more practical approach: it explains the concepts behind machine learning algorithms, describes each algorithm's areas of application, and uses simple practical examples to demonstrate each algorithm and show how the issues that arise with these algorithms are handled in practice.
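
In the spirit of the simple practical examples the description promises, here is a sketch (not from the book) of one of the simplest machine learning algorithms, nearest-neighbour classification, in plain Python; the dataset and labels are invented for illustration:

```python
import math

def nearest_neighbor(train, query):
    """Classify `query` with the label of the closest training point (1-NN)."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    # Pick the (point, label) pair whose point is nearest to the query.
    point, label = min(train, key=lambda pl: dist(pl[0], query))
    return label

# Toy 2-D dataset: two clusters with hypothetical labels.
train = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
         ((5.0, 5.0), "blue"), ((4.8, 5.2), "blue")]

print(nearest_neighbor(train, (1.1, 0.9)))  # red
print(nearest_neighbor(train, (5.1, 4.9)))  # blue
```

The algorithm needs no training phase at all, which is exactly the kind of issue (cheap to fit, expensive to query) a practical treatment would point out.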

A Primer on Nonparametric Analysis, Volume I

Nonparametric statistics provide a scientific methodology for cases where customary statistics are not applicable. Nonparametric statistics are used when the requirements for parametric analysis fail, such as when data are not normally distributed or the sample size is too small. The method provides an alternative for such cases and is often nearly as powerful as parametric statistics. Another advantage of nonparametric statistics is that they offer analytical methods that are not otherwise available. Nonparametric methods are intuitive and simple to comprehend, which helps researchers in the social sciences understand them even without the mathematical rigor customarily required of analytical methods in science. This is a methodology book: it bypasses theoretical proofs while providing comprehensive explanations of the logic behind the methods, along with ample examples, all solved both by direct computation and by using Stata. It is arranged into two integrated volumes. Although each volume, and for that matter each chapter, can be used separately, it is advisable to read as much of both volumes as possible, because familiarity with what is applicable to different problems will enhance your capabilities.
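
One of the simplest nonparametric procedures of the kind such a book covers is the sign test for paired data, which makes no normality assumption at all. A minimal pure-Python sketch (the paired scores are hypothetical; the book's own worked examples use direct computation and Stata):

```python
from math import comb

def sign_test(before, after):
    """Two-sided sign test for paired samples (ties dropped).

    Under the null hypothesis each within-pair difference is positive or
    negative with probability 1/2, so the count of positive signs follows
    a Binomial(n, 0.5) distribution.
    """
    diffs = [b - a for a, b in zip(before, after) if b != a]
    n = len(diffs)
    pos = sum(d > 0 for d in diffs)
    k = min(pos, n - pos)
    # Two-sided p-value: double the smaller binomial tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical paired scores: 9 of 10 subjects improved.
before = [12, 14, 11, 15, 13, 12, 16, 14, 13, 12]
after  = [15, 16, 13, 18, 14, 15, 17, 13, 15, 14]
print(round(sign_test(before, after), 4))  # 0.0215
```

With 9 positive signs out of 10, the p-value of about 0.02 rejects "no change" at the usual 5% level, using nothing but counts.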

A Primer on Nonparametric Analysis, Volume II

Nonparametric statistics provide a scientific methodology for cases where customary statistics are not applicable. Nonparametric statistics are used when the requirements for parametric analysis fail, such as when data are not normally distributed or the sample size is too small. The method provides an alternative for such cases and is often nearly as powerful as parametric statistics. Another advantage of nonparametric statistics is that they offer analytical methods that are not otherwise available. Nonparametric methods are intuitive and simple to comprehend, which helps researchers in the social sciences understand them even without the mathematical rigor customarily required of analytical methods in science. This is a methodology book: it bypasses theoretical proofs while providing comprehensive explanations of the logic behind the methods, along with ample examples, all solved both by direct computation and by using Stata. It is arranged into two integrated volumes. Although each volume, and for that matter each chapter, can be used separately, it is advisable to read as much of both volumes as possible, because familiarity with what is applicable to different problems will enhance your capabilities.

Data Visualization Toolkit: Using JavaScript, Rails™, and Postgres to Present Data and Geospatial Information

Create Beautiful Visualizations that Free Your Data to Tell Powerful Truths

“The depth of Barrett Clark’s knowledge shines through in his writing: clear, concise, and confident. Barrett has been practicing all of this stuff in his day job for many years–Postgres, D3, GIS, all of it. The knowledge in this book is real-world and hard-earned!” –From the Foreword by Obie Fernandez

Data Visualization Toolkit is your hands-on, practical, and holistic guide to the art of visualizing data. You’ll learn how to use Rails, jQuery, D3, Leaflet, PostgreSQL, and PostGIS together, creating beautiful visualizations and maps that give your data a voice and make it “dance.” Barrett Clark teaches through real-world problems and examples developed specifically to illuminate every technique you need to generate stunningly effective visualizations. You’ll move from the absolute basics toward deep dives, mastering diverse visualizations and discovering when to use each. Along the way, you’ll build three start-to-finish visualization applications, using actual real estate, weather, and travel datasets. Clark addresses every component of data visualization: your data, database, application server, visualization libraries, and more. He explains data transformations; presents expert techniques in JavaScript, Ruby, and SQL; and illuminates key concepts associated with both descriptive statistics and geospatial data. Throughout, everything is aimed at one goal: to help you cut through the clutter and let your data tell all it can.
This guide will help you:
• Explore and understand the data visualization technology stack
• Master the thought process and steps involved in importing data
• Extract, transform, and load data in usable, reliable form
• Handle spotty data, or data that doesn’t line up with what your chart expects
• Use D3 to build pie and bar charts, scatter and box plots, and more
• Work effectively with time-series data
• Tweak Ruby and SQL to optimize performance with large datasets
• Use raw SQL in Rails: window functions, subqueries, and common table expressions
• Build chord diagrams and time-series aggregates
• Use separate databases or schemas for reporting databases
• Integrate geographical data via geospatial SQL queries
• Construct maps with Leaflet and Rails
• Query geospatial data the “Rails way” and the “raw SQL way”
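
The "raw SQL in Rails" item refers to window functions, subqueries, and CTEs. As an illustration of a window function outside Rails, here is a RANK() query run through Python's standard-library sqlite3 (requires SQLite 3.25+; the listings table and data are invented, loosely echoing the book's real-estate dataset):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (city TEXT, price INTEGER)")
conn.executemany("INSERT INTO listings VALUES (?, ?)",
                 [("Austin", 300), ("Austin", 450), ("Austin", 350),
                  ("Dallas", 200), ("Dallas", 500)])

# RANK() partitioned by city: the kind of raw-SQL window query the book
# describes sending from Rails, shown here against stdlib SQLite.
rows = conn.execute("""
    SELECT city, price,
           RANK() OVER (PARTITION BY city ORDER BY price DESC) AS price_rank
    FROM listings
    ORDER BY city, price_rank
""").fetchall()

for row in rows:
    print(row)
```

The window function ranks within each city without collapsing the rows, which a plain GROUP BY cannot do.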

Demand Forecasting for Managers

Most decisions and plans in a firm require a forecast. Matching supply with demand can make or break any business, and that is why forecasting is so valuable. Forecasting can seem a frightening topic, with many arcane equations to master. For this reason, the authors start from the very basics and provide a non-technical overview of common forecasting techniques as well as the organizational aspects of creating a robust forecasting process. The book also discusses how to measure forecast accuracy to hold people accountable and guide continuous improvement. This book does not require prior knowledge of higher mathematics, statistics, or operations research. It is designed to serve as a first introduction for the non-expert, such as a manager overseeing a forecasting group, or an MBA student who needs to be familiar with the broad outlines of forecasting without specializing in it.
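
Two of the forecast-accuracy measures a book like this typically introduces, mean absolute error (MAE) and mean absolute percentage error (MAPE), are simple enough to sketch directly; the demand figures below are hypothetical, not from the book:

```python
def mae(actual, forecast):
    """Mean absolute error: the average miss, in the series' own units."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error (actuals must be nonzero)."""
    return 100 * sum(abs(a - f) / a
                     for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly demand versus a forecast.
actual   = [100, 120, 110, 130]
forecast = [ 90, 125, 115, 120]

print(mae(actual, forecast))             # (10+5+5+10)/4 = 7.5
print(round(mape(actual, forecast), 2))  # 6.6
```

MAE keeps the units of the data (units sold), while MAPE scales each miss by the actual value, which makes it comparable across products of very different volumes.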

Sams Teach Yourself Apache Spark™ in 24 Hours

Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems, and one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that leverage Spark’s amazing speed, scalability, simplicity, and versatility. This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark–now, and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success. Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you advance your career or embark on a new career in the booming area of Big Data.

Learn how to:
• Discover what Apache Spark does and how it fits into the Big Data landscape
• Deploy and run Spark locally or in the cloud
• Interact with Spark from the shell
• Make the most of the Spark Cluster Architecture
• Develop Spark applications with Scala and functional Python
• Program with the Spark API, including transformations and actions
• Apply practical data engineering/analysis approaches designed for Spark
• Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
• Optimize Spark solution performance
• Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
• Leverage cutting-edge functional programming techniques
• Extend Spark with streaming, R, and Sparkling Water
• Start building Spark-based machine learning and graph-processing applications
• Explore advanced messaging technologies, including Kafka
• Preview and prepare for Spark’s next generation of innovations

Instructions walk you through common questions, issues, and tasks; Q-and-As, quizzes, and exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.

Architecting for Access

Fragmented, disparate backend data systems have become the norm in today’s enterprise, where you’ll find a mix of relational databases, Hadoop stores, and NoSQL engines, with access and analytics tools bolted on every which way. This mishmash of options presents a real challenge when it comes to choosing frontend analytics and visualization tools. How did we get here? In this O’Reilly report, IT veteran Rich Morrow takes you through the rapid changes to both backend storage and frontend analytics over the past decade, and provides a pragmatic list of requirements for an analytics stack that will centralize access to all of these data systems. You’ll examine current analytics platforms, including Looker, a new breed of analytics and visualization tools built specifically to handle our fragmented data space.

This report will help you:
• Understand why and how data became so fractured so quickly
• Explore the tangled web of data and backend tools in today’s enterprises
• Learn the tool requirements for accessing and analyzing the full spectrum of data
• Examine the relative strengths of popular analytics and visualization tools, including Looker, Tableau, and MicroStrategy
• Inspect Looker’s unique focus on both the frontend and backend

In Search of Database Nirvana

The database pendulum is in full swing. Ten years ago, web-scale companies began moving away from proprietary relational databases to handle big data use cases with NoSQL and Hadoop. Now, for a variety of reasons, the pendulum is swinging back toward SQL-based solutions. What many companies really want is a system that can handle all of their operational, OLTP, BI, and analytic workloads. Could such an all-in-one database exist? This O’Reilly report examines this quest for database nirvana, or what Gartner recently dubbed Hybrid Transaction/Analytical Processing (HTAP). Author Rohit Jain takes an in-depth look at the possibilities and the challenges for companies that long for a single query engine to rule them all.

With this report, you’ll explore:
• The challenges of having one query engine support operational, BI, and analytical workloads
• Efforts to produce a query engine that supports multiple storage engines
• Attempts to support multiple data models with the same query engine
• Why an HTAP database engine needs to provide enterprise-caliber capabilities, including high availability, security, and manageability
• How to assess various options for meeting workload requirements with one database engine, or a combination of query and storage engines

Interactive Spark using PySpark

Apache Spark is an in-memory framework that allows data scientists to explore and interact with big data much more quickly than with Hadoop. Python users can work with Spark through an interactive shell called PySpark.

Why is it important? PySpark makes the large-scale data processing capabilities of Apache Spark accessible to data scientists who are more familiar with Python than Scala or Java. It also allows reuse of a wide variety of Python libraries for machine learning, data visualization, numerical analysis, and more.

What you'll learn—and how you can apply it:
• Compare the different components provided by Spark and the use cases they fit.
• Learn how to use RDDs (resilient distributed datasets) with PySpark.
• Write Spark applications in Python and submit them to the cluster as Spark jobs.
• Get an introduction to the Spark computing framework.
• Apply this approach to a worked example to determine the most frequent airline delays in a specific month and year.

This lesson is for you because…
• You're a data scientist, familiar with Python coding, who needs to get up and running with PySpark
• You're a Python developer who needs to leverage the distributed computing resources available on a Hadoop cluster, without learning Java or Scala first

Prerequisites:
• Familiarity with writing Python applications
• Some familiarity with bash command-line operations
• Basic understanding of how to use simple functional programming constructs in Python, such as closures, lambdas, and maps

Materials or downloads needed in advance: Apache Spark

This lesson is taken from Data Analytics with Hadoop by Jenny Kim and Benjamin Bengfort.
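
The lesson's worked example, finding the most frequent airline delays for a given month and year, boils down to a filter-then-count aggregation. Here is a pure-Python sketch of that logic; the records are invented, and in the lesson itself the equivalent pipeline runs over an RDD on a cluster:

```python
from collections import Counter

# Hypothetical (carrier, year, month, delay_minutes) records standing in
# for the lesson's airline dataset; PySpark would hold these in an RDD.
flights = [
    ("AA", 2015, 6, 42), ("DL", 2015, 6, 0), ("AA", 2015, 6, 15),
    ("UA", 2015, 6, 75), ("AA", 2015, 7, 30), ("DL", 2015, 6, 12),
]

# Filter to one month and year, then count delayed flights per carrier --
# the same filter/map/count shape the lesson builds with RDD operations.
june_2015_delays = Counter(
    carrier
    for carrier, year, month, delay in flights
    if year == 2015 and month == 6 and delay > 0
)

print(june_2015_delays.most_common(1))  # [('AA', 2)]
```

In PySpark the Counter step would become a `map` to `(carrier, 1)` pairs followed by `reduceByKey`, so the counting happens in parallel across partitions.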

Writing code for R packages

R packages are a great way to create and share code that you and others can use over and over again.

Why is it important? Developing R code for inclusion in a package is different from simply writing R scripts.

What you'll learn—and how you can apply it:
• Learn best practices for writing R code for packages: organizing your functions, code style recommendations, and understanding and planning for how your code will be run.
• Plan for the "unknowns" once you release a package to the world.
• Get hints for submitting a package to CRAN.

This lesson is for you because…
• You're an R developer and need to package code so that others can reuse it
• You want to prepare a package to submit to CRAN

Prerequisites: Some familiarity with the R language

Materials or downloads needed in advance: Install R and RStudio

This lesson is taken from R Packages by Hadley Wickham.

The Data and Analytics Playbook

The Data and Analytics Playbook: Proven Methods for Governed Data and Analytic Quality explores the way in which data continues to dominate budgets, along with the varying efforts made across a variety of business enablement projects, including applications, web and mobile computing, big data analytics, and traditional data integration. The book teaches readers how to use proven methods and accelerators to break through data obstacles and provide faster, higher-quality delivery of mission-critical programs. Drawing upon years of practical experience, and using numerous examples and an easy-to-understand playbook, Lowell Fryman, Gregory Lampshire, and Dan Meers discuss a simple, proven approach to the execution of multiple data-oriented activities. They also present a clear set of methods to provide reliable governance, controls, risk, and exposure management for enterprise data and the programs that rely upon it, and discuss a cost-effective approach to providing sustainable governance and quality outcomes that enhance project delivery while ensuring ongoing controls. Example activities, templates, outputs, resources, and roles are explored, along with different organizational models in common use today and the ways they can be mapped to leverage playbook data governance throughout the organization.

This book:
• Provides a mature and proven playbook approach (methodology) to enabling data governance that supports agile implementation
• Features specific examples of current industry challenges in enterprise risk management, including anti-money laundering and fraud prevention
• Describes business benefit measures and funding approaches using exposure-based cost models that augment risk models for cost avoidance analysis, and accelerated delivery approaches using data integration sprints for application, integration, and information delivery success

IBM Tape Library Guide for Open Systems

This IBM® Redbooks® publication presents a general introduction to the latest IBM tape and tape library technologies. Featured tape technologies include the IBM LTO Ultrium and Enterprise 3592 tape drives, and their implementation in IBM tape libraries. This 13th edition includes information about the latest enhancements to the IBM TS4500 enterprise tape library. In particular, it includes details about the latest TS4500 High Availability feature and its elastic capacity option. This book also provides details about the new TS7650G IBM ProtecTIER® gateway model DD6, contains technical information about each IBM tape product for open systems, and includes generalized sections about Small Computer System Interface (SCSI) and Fibre Channel connections and multipath architecture configurations. This book also covers tools and techniques for library management. It is intended for anyone who wants to understand more about IBM tape products and their implementation. It is suitable for IBM clients, IBM Business Partners, IBM specialist sales representatives, and technical specialists. If you do not have a background in computer tape storage products, you might need to read other sources of information. In the interest of being concise, topics that are generally understood are not covered in detail.

Practical Hadoop Migration: How to Integrate Your RDBMS with the Hadoop Ecosystem and Re-Architect Relational Applications to NoSQL

Re-architect relational applications to NoSQL, integrate relational database management systems with the Hadoop ecosystem, and transform and migrate relational data to and from Hadoop components. This book covers best-practice design approaches to re-architecting your relational applications and transforming your relational data to optimize concurrency, security, denormalization, and performance. Winner of IBM's 2012 Gerstner Award for his implementation of big data and data warehouse initiatives, and author of Practical Hadoop Security, Bhushan Lakhe walks you through the entire transition process. First, he lays out the criteria for deciding what blend of re-architecting, migration, and integration between RDBMS and HDFS best meets your transition objectives. Then he demonstrates how to design your transition model. Lakhe proceeds to cover the selection criteria for ETL tools, the implementation steps for migration with Sqoop- and Flume-based data transfers, and transition optimization techniques for tuning partitions, scheduling aggregations, and redesigning ETL. Finally, he assesses the pros and cons of data lakes and Lambda architecture as integrative solutions and illustrates their implementation with real-world case studies. Hadoop/NoSQL solutions do not offer by default certain relational technology features such as role-based access control, locking for concurrent updates, and various tools for measuring and enhancing performance. Practical Hadoop Migration shows how to use open-source tools to emulate such relational functionalities in Hadoop ecosystem components.
What You'll Learn:
• Decide whether you should migrate your relational applications to big data technologies or integrate them
• Transition your relational applications to Hadoop/NoSQL platforms in terms of logical design and physical implementation
• Discover RDBMS-to-HDFS integration, data transformation, and optimization techniques
• Consider when to use Lambda architecture and data lake solutions
• Select and implement Hadoop-based components and applications to speed transition, optimize integrated performance, and emulate relational functionalities

Who This Book Is For: Database developers, database administrators, enterprise architects, Hadoop/NoSQL developers, and IT leaders. Its secondary readership is project and program managers and advanced students of database and management information systems.

Enabling Real-time Analytics on IBM z Systems Platform

For online transaction processing (OLTP) workloads, the IBM® z Systems™ platform, with IBM DB2®, data sharing, Workload Manager (WLM), geoplex, and other high-end features, is the widely acknowledged leader. Most customers now integrate business analytics with OLTP, for example by running scoring functions in a transactional context for real-time analytics, or by applying machine-learning algorithms to enterprise data that is kept on the mainframe. As a result, IBM continues to invest so that clients can keep the complete lifecycle of data analysis, modeling, and scoring under z Systems control in a cost-efficient way, preserving the availability, security, and reliability qualities of service that z Systems solutions offer. Because of the changed architecture and tighter integration, IBM has shown, in a customer proof of concept, that a particular client was able to achieve an orders-of-magnitude improvement in performance, allowing that client’s data scientists to investigate the data in a more interactive process. Open technologies, such as Predictive Model Markup Language (PMML), can help customers update single components instead of being forced to replace everything at once. As a result, you can combine your preferred tool for model generation (such as SAS Enterprise Miner or IBM SPSS® Modeler) with a different technology for model scoring (such as Zementis, a company focused on PMML scoring). IBM SPSS Modeler is a leading data mining workbench that can apply various algorithms in data preparation, cleansing, statistics, visualization, machine learning, and predictive analytics. It builds on over 20 years of experience and continued development, and is integrated with z Systems. With IBM DB2 Analytics Accelerator 5.1 and SPSS Modeler 17.1, complete predictive model creation, including data transformation, can be done within DB2 Analytics Accelerator.
So, instead of moving the data to a distributed environment, algorithms can be pushed to the data, using the cost-efficient DB2 Analytics Accelerator for the required resource-intensive operations. This IBM Redbooks® publication explains the overall z Systems architecture: how the components can be installed and customized, how the new IBM DB2 Analytics Accelerator loader can help load z Systems data and external data efficiently, how in-database transformation, in-database modeling, and in-transaction real-time scoring can be used, and what other related technologies are available. This book is intended for technical specialists, architects, and data scientists who want to use the technology on the z Systems platform. Most of the technologies described in this book require IBM DB2 for z/OS®. For acceleration of the data investigation, data transformation, and data modeling process, DB2 Analytics Accelerator is required. The most value can be achieved if most of the data already resides on z Systems platforms, although adding external data (such as from social sources) poses no problem at all.

Implementing or Migrating to an IBM Gen 5 b-type SAN

The IBM® b-type Gen 5 Fibre Channel directors and switches provide reliable, scalable, and secure high-performance foundations for high-density server virtualization, cloud architectures, and next generation flash and SSD storage. They are designed to meet the demands of highly virtualized private cloud storage and data center environments. This IBM Redbooks® publication helps administrators learn how to implement or migrate to an IBM Gen 5 b-type SAN. It provides an overview of the key hardware and software products and explains how to install, monitor, tune, and troubleshoot your storage area network (SAN). Read this publication to learn about fabric design, managing and monitoring your network, key tools such as IBM Network Advisor and Fabric Vision, and troubleshooting.

T-SQL Fundamentals, Third Edition

Effectively query and modify data using Transact-SQL. Master T-SQL fundamentals and write robust code for Microsoft SQL Server and Azure SQL Database. Itzik Ben-Gan explains key T-SQL concepts and helps you apply your knowledge with hands-on exercises. The book first introduces T-SQL’s roots and underlying logic. Next, it walks you through core topics such as single-table queries, joins, subqueries, table expressions, and set operators. Then the book covers more advanced data-query topics such as window functions, pivoting, and grouping sets. The book also explains how to modify data, work with temporal tables, and handle transactions, and provides an overview of programmable objects.

Microsoft Data Platform MVP Itzik Ben-Gan shows you how to:
• Review core SQL concepts and their mathematical roots
• Create tables and enforce data integrity
• Perform effective single-table queries by using the SELECT statement
• Query multiple tables by using joins, subqueries, table expressions, and set operators
• Use advanced query techniques such as window functions, pivoting, and grouping sets
• Insert, update, delete, and merge data
• Use transactions in a concurrent environment
• Get started with programmable objects–from variables and batches to user-defined functions, stored procedures, triggers, and dynamic SQL
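
Pivoting, one of the advanced topics listed, can be expressed in standard SQL with conditional aggregation (T-SQL additionally offers a PIVOT operator). A sketch of the conditional-aggregation form, run through Python's standard-library sqlite3 rather than SQL Server; the orders table and data are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (custid TEXT, yr INTEGER, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("A", 2021, 10), ("A", 2022, 5), ("A", 2021, 7),
                  ("B", 2022, 20), ("B", 2021, 3)])

# Pivot years into columns with SUM over CASE: each CASE expression
# routes a row's qty into exactly one output column.
rows = conn.execute("""
    SELECT custid,
           SUM(CASE WHEN yr = 2021 THEN qty ELSE 0 END) AS qty_2021,
           SUM(CASE WHEN yr = 2022 THEN qty ELSE 0 END) AS qty_2022
    FROM orders
    GROUP BY custid
    ORDER BY custid
""").fetchall()

print(rows)  # [('A', 17, 5), ('B', 3, 20)]
```

The same query shape works in T-SQL unchanged, and the book contrasts it with the proprietary PIVOT syntax.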

A Recipe for Success Using SAS University Edition

Filled with helpful examples and real-life projects of SAS users, A Recipe for Success Using SAS University Edition is an easy guide to applying the analytical power of SAS to real-world scenarios.

This book shows you:
• how to start using analytics
• how to use SAS to accomplish a project goal
• how to effectively apply SAS in your community or school
• how users like you implemented SAS to solve their analytical problems

A beginner’s guide to creating and completing your first analytics project using SAS University Edition, this book is broken down into easy-to-read chapters that also include quick takeaway tips. It introduces you to the vocabulary and structure of the SAS language, shows you how to plan and execute a successful project, introduces you to basic statistics, and walks you through case studies to inspire and motivate you to complete your own projects. Following the recipe for success in this book, harness the power of SAS to plan and complete your first analytics project!