data

Statistical Intervals, 2nd Edition

2017-04-10 · O'Reilly Data Science Books O'Reilly Amazon

book

by Gerald J. Hahn , William Q. Meeker , Luis A. Escobar

data-science data-science-tasks statistics

Describes statistical intervals to quantify sampling uncertainty,focusing on key application needs and recently developed methodology in an easy-to-apply format Statistical intervals provide invaluable tools for quantifying sampling uncertainty. The widely hailed first edition, published in 1991, described the use and construction of the most important statistical intervals. Particular emphasis was given to intervals—such as prediction intervals, tolerance intervals and confidence intervals on distribution quantiles—frequently needed in practice, but often neglected in introductory courses. Vastly improved computer capabilities over the past 25 years have resulted in an explosion of the tools readily available to analysts. This second edition—more than double the size of the first—adds these new methods in an easy-to-apply format. In addition to extensive updating of the original chapters, the second edition includes new chapters on: • Likelihood-based statistical intervals • Nonparametric bootstrap intervals • Parametric bootstrap and other simulation-based intervals • An introduction to Bayesian intervals • Bayesian intervals for the popular binomial, Poisson and normal distributions • Statistical intervals for Bayesian hierarchical models • Advanced case studies, further illustrating the use of the newly described methods New technical appendices provide justification of the methods and pathways to extensions and further applications. A webpage directs readers to current readily accessible computer software and other useful information. Statistical Intervals: A Guide for Practitioners and Researchers, Second Edition is an up-to-date working guide and reference for all who analyze data, allowing them to quantify the uncertainty in their results using statistical intervals. William Q. Meeker is Professor of Statistics and Distinguished Professor of Liberal Arts and Sciences at Iowa State University. He is co-author of Statistical Methods for Reliability Data (Wiley, 1998) and of numerous publications in the engineering and statistical literature and has won many awards for his research. Gerald J. Hahn served for 46 years as applied statistician and manager of an 18-person statistics group supporting General Electric and has co-authored four books. His accomplishments have been recognized by GE’s prestigious Coolidge Fellowship and 19 professional society awards. Luis A. Escobar is Professor of Statistics at Louisiana State University. He is co-author of Statistical Methods for Reliability Data (Wiley, 1998) and several book chapters. His publications have appeared in the engineering and statistical literature and he has won several research and teaching awards.

Sams Teach Yourself Hadoop in 24 Hours

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Jeffrey Aven

API Big Data Cloud Computing Hadoop HDFS Hive Java Spark data-engineering

Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more: Understanding Hadoop and the Hadoop Distributed File System (HDFS) Importing data into Hadoop, and process it there Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts Making the most of Apache Pig and Apache Hive Implementing and administering YARN Taking advantage of the full Hadoop ecosystem Managing Hadoop clusters with Apache Ambari Working with the Hadoop User Environment (HUE) Scaling, securing, and troubleshooting Hadoop environments Integrating Hadoop into the enterprise Deploying Hadoop in the cloud Getting started with Apache Spark Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

2017-04-07 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by George Tillmann

Big Data Cassandra Data Modelling Hadoop NoSQL Oracle SQL data-engineering data-models

Design great databases—from logical data modeling through physical schema definition. You will learn a framework that finally cracks the problem of merging data and process models into a meaningful and unified design that accounts for how data is actually used in production systems. Key to the framework is a method for taking the logical data model that is a static look at the definition of the data, and merging that static look with the process models describing how the data will be used in actual practice once a given system is implemented. The approach solves the disconnect between the static definition of data in the logical data model and the dynamic flow of the data in the logical process models. The design framework in this book can be used to create operational databases for transaction processing systems, or for data warehouses in support of decision support systems. The information manager can be a flat file, Oracle Database, IMS, NoSQL, Cassandra, Hadoop, or any other DBMS. Usage-Driven Database Design emphasizes practical aspects of design, and speaks to what works, what doesn't work, and what to avoid at all costs. Included in the book are lessons learned by the author over his 30+ years in the corporate trenches. Everything in the book is grounded on good theory, yet demonstrates a professional and pragmatic approach to design that can come only from decades of experience. Presents an end-to-end framework from logical data modeling through physical schema definition. Includes lessons learned, techniques, and tricks that can turn a database disaster into a success. Applies to all types of database management systems, including NoSQL such as Cassandra and Hadoop, and mainstream SQL databases such as Oracle and SQL Server What You'll Learn Create logical data models that accurately reflect the real world of the user Create usage scenarios reflecting how applications will use a new database Merge static data models with dynamic process models to create resilient yet flexible database designs Support application requirements by creating responsive database schemas in any database architecture Cope with big data and unstructured data for transaction processing and decision support systems Recognize when relational approaches won't work, and when to turn toward NoSQL solutions such as Cassandra or Hadoop Who This Book Is For System developers, including business analysts, database designers, database administrators, and application designers and developers who must design or interact with database systems

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

2017-04-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Itzik Ben-Gan

Azure Cloud Computing Data Management JSON Microsoft SQL XML data-engineering microsoft-sql-server relational-databases transact-sql

Prepare for Microsoft Exam 70-761–and help demonstrate your real-world mastery of SQL Server 2016 Transact-SQL data management, queries, and database programming. Designed for experienced IT professionals ready to advance their status, Exam Ref focuses on the critical-thinking and decision-making acumen needed for success at the MCSA level. Focus on the expertise measured by these objectives: Filter, sort, join, aggregate, and modify data Use subqueries, table expressions, grouping sets, and pivoting Query temporal and non-relational data, and output XML or JSON Create views, user-defined functions, and stored procedures Implement error handling, transactions, data types, and nulls This Microsoft Exam Ref: Organizes its coverage by exam objectives Features strategic, what-if scenarios to challenge you Assumes you have experience working with SQL Server as a database administrator, system engineer, or developer Includes downloadable sample database and code for SQL Server 2016 SP1 (or later) and Azure SQL Database Querying Data with Transact-SQL About the Exam Exam 70-761 focuses on the skills and knowledge necessary to manage and query data and to program databases with Transact-SQL in SQL Server 2016. About Microsoft Certification Passing this exam earns you credit toward a Microsoft Certified Solutions Associate (MCSA) certification that demonstrates your mastery of essential skills for building and implementing on-premises and cloud-based databases across organizations. Exam 70-762 (Developing SQL Databases) is also required for MCSA: SQL 2016 Database Development certification. See full details at: microsoft.com/learning

Oracle SQL Tuning with Oracle SQLTXPLAIN: Oracle Database 12c Edition, Second Edition

2017-04-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Stelios Charalambides

Oracle SQL data-engineering

Learn through this practical guide to SQL tuning how Oracle's own experts do it, using a freely downloadable tool called SQLTXPLAIN. This new edition has been expanded to include AWR, Oracle 12c Statistics, interpretation of SQL Monitor reports, Parallel execution, and Exadata-related features. Reading this book and using SQL helps you learn to tune even the most complex SQL, and you'll learn to do it quickly, without the huge learning curve usually associated with tuning as a whole. Firmly based in real-world problems, this book helps you reclaim system resources and avoid the most common bottleneck in overall performance, badly tuned SQL. You'll learn how the optimizer works, how to take advantage of its latest features, and when it's better to turn them off. Best of all, the book is updated to cover the very latest feature set in Oracle Database 12c. Covers AWR report integration Helps with SQL Monitor Report Interpretation Provides a reliable method that is repeatable Shows the very latest tuning features in Oracle Database 12c Enables the building of test cases without affecting production What You Will Learn Identify how and why complex SQL has gone wrong Correctly interpret AWR reports generated via SQLTXPLAIN Collect the best statistics for your environment Know when to invoke built-in tuning facilities Recognize when tuning is not the solution Spot the steps in a SQL statement's execution plan that are critical to performance of that statement Modify your SQL to solve performance problems and increase the speed and throughput of production database systems Who This Book Is For Anyone who deals with SQL and SQL tuning. Both developers and DBAs will benefit from learning how to use the SQLTXPLAIN tool, and from the problem solving methodology in this book.

Myths of PR

2017-04-03 · O'Reilly Data Science Books O'Reilly Amazon

book

by Rich Leigh

data-science data-science-as-a-profession

Explore some of the most widely-accepted myths that permeate PR with this fascinating examination of the industry.

D3.js: Cutting-edge Data Visualization

2017-03-31 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Ændrew H. Rininsland , Michael Heydt , Pablo Navarro Castillo

DataViz HTML JavaScript d3 data-science data-science-tasks data-visualization

Turn your raw data into real knowledge by creating and deploying complex data visualizations with D3.js About This Book Understand how to best represent your data by developing the right kind of visualization Explore the concepts of D3.js through examples that enable you to quickly create visualizations including charts, network diagrams, and maps Get practical examples of visualizations using real-world data sets that show you how to use D3.js to visualize and interact with information to glean its underlying meaning Who This Book Is For Whether you are new to data and data visualization, a seasoned data scientist, or a computer graphics specialist, this Learning Path will provide you with the skills you need to create web-based and interactive data visualizations. Some basic JavaScript knowledge is expected, but no prior experience with data visualization or D3 is required What You Will Learn Gain a solid understanding of the common D3 development idioms Find out how to write basic D3 code for servers using Node.js Install and use D3.js to create HTML elements within a document Create and style graphical elements such as circles, ellipses, rectangles, lines, paths, and text using SVG Turn your data into bar and scatter charts, and add margins, axes, labels, and legends Use D3.js generators to perform the magic of creating complex visualizations from data Add interactivity to your visualizations, including tool-tips, sorting, hover-to-highlight, and grouping and dragging of visuals Write, test, and distribute a D3-based charting package Make a real-time application with Node and D3 In Detail D3 has emerged as one of the leading platforms to develop beautiful, interactive visualizations over the web. We begin the course by setting up a strong foundation, then build on this foundation as we take you through the entire world of reimagining data using interactive, animated visualizations created in D3.js. In the first module, we cover the various features of D3.js to build a wide range of visualizations. We also focus on the entire process of representing data through visualizations. By the end of this module, you will be ready to use D3 to transform any data into a more engaging and sophisticated visualization. In the next module, you will learn to master the creation of graphical elements from data. Using practical examples provided, you will quickly get to grips with the features of D3.js and use this learning to create your own spectacular data visualizations with D3.js. Over the last leg of this course, you will get acquainted with how to integrate D3 with mapping libraries to provide reverse geocoding and interactive maps among many other advanced features of D3. This module culminates by showing you how to create enterprise-level dashboards to display real-time data. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Learning D3.js Data Visualization, Second Edition by Andrew H. Rininsland D3.js By Example by Michael Heydt Mastering D3.js by Pablo Navarro Castillo Style and approach This course provides a comprehensive explanation of how to leverage the power of D3.js to create powerful and creative visualizations through step-by-step instructions in the form of modules. Each module help you skill up a level in creating meaningful visualizations. Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

R: Predictive Analysis

2017-03-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tony Fischetti , Eric Mayor , Rui Miguel Forte

Analytics DataViz JSON XML data-science data-science-tools r

Master the art of predictive modeling About This Book Load, wrangle, and analyze your data using the world's most powerful statistical programming language Familiarize yourself with the most common data mining tools of R, such as k-means, hierarchical regression, linear regression, Naïve Bayes, decision trees, text mining and so on. We emphasize important concepts, such as the bias-variance trade-off and over-fitting, which are pervasive in predictive modeling Who This Book Is For If you work with data and want to become an expert in predictive analysis and modeling, then this Learning Path will serve you well. It is intended for budding and seasoned practitioners of predictive modeling alike. You should have basic knowledge of the use of R, although it’s not necessary to put this Learning Path to great use. What You Will Learn Get to know the basics of R’s syntax and major data structures Write functions, load data, and install packages Use different data sources in R and know how to interface with databases, and request and load JSON and XML Identify the challenges and apply your knowledge about data analysis in R to imperfect real-world data Predict the future with reasonably simple algorithms Understand key data visualization and predictive analytic skills using R Understand the language of models and the predictive modeling process In Detail Predictive analytics is a field that uses data to build models that predict a future outcome of interest. It can be applied to a range of business strategies and has been a key player in search advertising and recommendation engines. The power and domain-specificity of R allows the user to express complex analytics easily, quickly, and succinctly. R offers a free and open source environment that is perfect for both learning and deploying predictive modeling solutions in the real world. This Learning Path will provide you with all the steps you need to master the art of predictive modeling with R. We start with an introduction to data analysis with R, and then gradually you’ll get your feet wet with predictive modeling. You will get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics. You will be able to solve the difficulties relating to performing data analysis in practice and find solutions to working with “messy data”, large data, communicating results, and facilitating reproducibility. You will then perform key predictive analytics tasks using R, such as train and test predictive models for classification and regression tasks, score new data sets and so on. By the end of this Learning Path, you will have explored and tested the most popular modeling techniques in use on real-world data sets and mastered a diverse range of techniques in predictive analytics. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Data Analysis with R, Tony Fischetti Learning Predictive Analytics with R, Eric Mayor Mastering Predictive Analytics with R, Rui Miguel Forte Style and approach Learn data analysis using engaging examples and fun exercises, and with a gentle and friendly but comprehensive "learn-by-doing" approach. This is a practical course, which analyzes compelling data about life, health, and death with the help of tutorials. It offers you a useful way of interpreting the data that’s specific to this course, but that can also be applied to any other data. This course is designed to be both a guide and a reference for moving beyond the basics of predictive modeling. Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Mastering Spark for Data Science

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Matthew Hallett , David George , Antoine Amend (Databricks) , Andrew Morgan

AI/ML Analytics API Big Data Data Science Spark SQL Data Streaming apache-spark data-engineering

Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products About This Book Develop and apply advanced analytical techniques with Spark Learn how to tell a compelling story with data science using Spark’s ecosystem Explore data at scale and work with cutting edge data science methods Who This Book Is For This book is for those who have beginner-level familiarity with the Spark architecture and data science applications, especially those who are looking for a challenge and want to learn cutting edge techniques. This book assumes working knowledge of data science, common machine learning methods, and popular data science tools, and assumes you have previously run proof of concept studies and built prototypes. What You Will Learn Learn the design patterns that integrate Spark into industrialized data science pipelines See how commercial data scientists design scalable code and reusable code for data science services Explore cutting edge data science methods so that you can study trends and causality Discover advanced programming techniques using RDD and the DataFrame and Dataset APIs Find out how Spark can be used as a universal ingestion engine tool and as a web scraper Practice the implementation of advanced topics in graph processing, such as community detection and contact chaining Get to know the best practices when performing Extended Exploratory Data Analysis, commonly used in commercial data science teams Study advanced Spark concepts, solution design patterns, and integration architectures Demonstrate powerful data science pipelines In Detail Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to operate at this level you need to build data science solutions of substance –solutions that solve real problems. Spark has emerged as the big data platform of choice for data scientists due to its speed, scalability, and easy-to-use APIs. This book deep dives into using Spark to deliver production-grade data science solutions. This process is demonstrated by exploring the construction of a sophisticated global news analysis service that uses Spark to generate continuous geopolitical and current affairs insights.You will learn all about the core Spark APIs and take a comprehensive tour of advanced libraries, including Spark SQL, Spark Streaming, MLlib, and more. You will be introduced to advanced techniques and methods that will help you to construct commercial-grade data products. Focusing on a sequence of tutorials that deliver a working news intelligence service, you will learn about advanced Spark architectures, how to work with geographic data in Spark, and how to tune Spark algorithms so they scale linearly. Style and approach This is an advanced guide for those with beginner-level familiarity with the Spark architecture and working with Data Science applications. Mastering Spark for Data Science is a practical tutorial that uses core Spark APIs and takes a deep dive into advanced libraries including: Spark SQL, visual streaming, and MLlib. This book expands on titles like: Machine Learning with Spark and Learning Spark. It is the next learning curve for those comfortable with Spark and looking to improve their skills.

PostgreSQL High Performance Cookbook

2017-03-29 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dinesh Kumar , Chitij Chauhan

data-engineering postgresql relational-databases

This book is your definitive guide to understanding and improving PostgreSQL database performance. You'll learn about query optimization, database monitoring, and advanced memory and configuration techniques. With examples and clear explanations, you'll gain the skills to identify performance bottlenecks and make your database system highly efficient. What this Book will help me do Effectively optimize PostgreSQL queries to enhance response times. Utilize robust server monitoring techniques to identify and address inefficiencies. Implement memory optimization strategies to maximize server performance. Master replication and failover methods for high availability. Build strategies for secure and efficient database migrations. Author(s) Dinesh Kumar and None Chauhan are experienced database professionals with years of expertise in working with PostgreSQL. They have been involved in database design, optimization, and innovations in open-source database technologies. Their teaching approach is clear and actionable, making the topics accessible to both beginners and seasoned professionals. Who is it for? This book is ideally suited for developers and database administrators with a basic understanding of PostgreSQL. If you're seeking practical guidance to enhance your PostgreSQL performance tuning and maintenance skills, this book is designed for you. It covers concepts for professionals looking to advance their database expertise. Beginners in database management who are motivated to learn advanced techniques will also find it approachable.

Learning Apache Spark 2

2017-03-28 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Muhammad Asif Abbasi

AI/ML Analytics Big Data Data Analytics Scala Spark SQL Data Streaming apache-spark data-engineering

Dive into the world of Big Data with "Learning Apache Spark 2". This book introduces you to the powerful Apache Spark framework, tailored for real-time data analytics and machine learning. Through practical examples and real-world use-cases, you'll gain hands-on experience in leveraging Spark's capabilities for your data processing needs. What this Book will help me do Master the fundamentals of Apache Spark 2 and its new features. Effectively use Spark SQL, MLlib, RDDs, GraphX, and Spark Streaming to tackle real-world challenges. Gain skills in data processing, transformation, and analysis with Spark. Deploy and operate your Spark applications in clustered environments. Develop your own recommendation engines and predictive analytics models with Spark. Author(s) None Abbasi brings a wealth of expertise in Big Data technologies with a keen focus on simplifying complex concepts for learners. With substantial experience working in data processing frameworks, their approach to teaching creates an engaging and practical learning experience. With "Learning Apache Spark 2", None empowers readers to confidently tackle challenges in Big Data processing and analytics. Who is it for? This book is ideal for aspiring Big Data professionals seeking an accessible introduction to Apache Spark. Beginners in Spark will find step-by-step guidance, while those familiar with earlier versions will appreciate the insights into Spark 2's new features. Familiarity with Big Data concepts and Scala programming is recommended for optimal understanding.

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

2017-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Richard Niemiec

Cloud Computing Oracle Cyber Security SQL data-engineering oracle-database-solutions

Proven Database Optimization Solutions―Fully Updated for Oracle Database 12c Release 2 Systematically identify and eliminate database performance problems with help from Oracle Certified Master Richard Niemiec. Filled with real-world case studies and best practices, Oracle Database 12c Release 2 Performance Tuning Tips and Techniques details the latest monitoring, troubleshooting, and optimization methods. Find out how to identify and fix bottlenecks on premises and in the cloud, configure storage devices, execute effective queries, and develop bug-free SQL and PL/SQL code. Testing, reporting, and security enhancements are also covered in this Oracle Press guide. • Properly index and partition Oracle Database 12c Release 2 • Work effectively with Oracle Cloud, Oracle Exadata, and Oracle Enterprise Manager • Efficiently manage disk drives, ASM, RAID arrays, and memory • Tune queries with Oracle SQL hints and the Trace utility • Troubleshoot databases using V$ views and X$ tables • Create your first cloud database service and prepare for hybrid cloud • Generate reports using Oracle’s Statspack and Automatic Workload Repository tools • Use sar, vmstat, and iostat to monitor operating system statistics

SQL Server 2016 Developer's Guide

2017-03-22 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Milo≈° Radivojeviƒá , William Durkin , Dejan Sarka

Analytics JSON Cyber Security SQL data-engineering microsoft-sql-server relational-databases

SQL Server 2016 Developer's Guide provides an in-depth overview of the new features and enhancements introduced in SQL Server 2016 that can significantly improve your development process. This book covers robust techniques for building high-performance, secure database applications while leveraging cutting-edge functionalities such as Stretch Database, temporal tables, and enhanced In-Memory OLTP capabilities. What this Book will help me do Master the new development features introduced in SQL Server 2016 and understand their applications. Use In-Memory OLTP enhancements to significantly boost application performance. Efficiently manage and analyze data using temporal tables and JSON integration. Explore SQL Server security enhancements to ensure data safety and access control. Gain insights into integrating R with SQL Server 2016 for advanced analytics. Author(s) None Radivojević, Dejan Sarka, and William Durkin are experienced database developers and architects with a strong focus on SQL Server technologies. They bring years of practical experience and a clear, insightful approach to teaching complex concepts. Their expertise shines in this comprehensive guide, providing readers with both foundational knowledge and advanced techniques. Who is it for? This guide is perfect for database developers and solution architects looking to harness the full potential of SQL Server 2016's new features. It's intended for professionals with prior experience in SQL Server or similar platforms who aim to develop efficient, high-performance applications. You'll benefit from this book if you are keen to master SQL Server 2016 and elevate your development skills.

An Introduction to SAS Visual Analytics

2017-03-21 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tricia Aanderud , Rob Collum , Ryan Kumpfmiller

Analytics BI Dashboard SAS analytics-platforms data-science

When it comes to business intelligence and analytical capabilities, SAS Visual Analytics is the premier solution for data discovery, visualization, and reporting. An Introduction to SAS Visual Analytics will show you how to make sense of your complex data with the goal of leading you to smarter, data-driven decisions without having to write a single line of code – unless you want to! You will be able to use SAS Visual Analytics to access, prepare, and present your data to anyone anywhere in the world. SAS Visual Analytics automatically highlights key relationships, outliers, clusters, trends and more. These abilities will guide you to critical insights that inspire action from your data. With this book, you will become proficient using SAS Visual Analytics to present data and results in customizable, robust visualizations, as well as guided analyses through auto-charting. With interactive dashboards, charts, and reports, you will create visualizations which convey clear and actionable insights for any size and type of data. This book largely focuses on the version of SAS Visual Analytics on SAS 9.4, although it is available on both 9.4 and SAS Viya platforms. Each version is considered the latest release, with subsequent releases planned to continue on each platform; hence, the Viya version works similarly to the 9.4 version and will look familiar. This book covers new features of each and important differences between the two. With this book, you will learn how to: Build your first report using the SAS Visual Analytics Designer Prepare a dashboard and determine the best layout Effectively use geo-spatial objects to add location analytics to reports Understand and use the elements of data visualizations Prepare and load your data with the SAS Visual Analytics Data Builder Analyze data with a variety of options, including forecasting, word clouds, heat maps, correlation matrix, and more Understand administration activities to keep SAS Visual Analytics humming along Optimize your environment for considerations such as scalability, availability, and efficiency between components of your SAS software deployment and data providers

Introduction to Bayesian Estimation and Copula Models of Dependence

2017-03-20 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Arkady Shemyakin , Alexander Kniazev

Microsoft Monte Carlo data-engineering data-models

Presents an introduction to Bayesian statistics, presents an emphasis on Bayesian methods (prior and posterior), Bayes estimation, prediction, MCMC,Bayesian regression, and Bayesian analysis of statistical modelsof dependence, and features a focus on copulas for risk management Introduction to Bayesian Estimation and Copula Models of Dependence emphasizes the applications of Bayesian analysis to copula modeling and equips readers with the tools needed to implement the procedures of Bayesian estimation in copula models of dependence. This book is structured in two parts: the first four chapters serve as a general introduction to Bayesian statistics with a clear emphasis on parametric estimation and the following four chapters stress statistical models of dependence with a focus of copulas. A review of the main concepts is discussed along with the basics of Bayesian statistics including prior information and experimental data, prior and posterior distributions, with an emphasis on Bayesian parametric estimation. The basic mathematical background of both Markov chains and Monte Carlo integration and simulation is also provided. The authors discuss statistical models of dependence with a focus on copulas and present a brief survey of pre-copula dependence models. The main definitions and notations of copula models are summarized followed by discussions of real-world cases that address particular risk management problems. In addition, this book includes: • Practical examples of copulas in use including within the Basel Accord II documents that regulate the world banking system as well as examples of Bayesian methods within current FDA recommendations • Step-by-step procedures of multivariate data analysis and copula modeling, allowing readers to gain insight for their own applied research and studies • Separate reference lists within each chapter and end-of-the-chapter exercises within Chapters 2 through 8 • A companion website containing appendices: data files and demo files in Microsoft® Office Excel®, basic code in R, and selected exercise solutions Introduction to Bayesian Estimation and Copula Models of Dependence is a reference and resource for statisticians who need to learn formal Bayesian analysis as well as professionals within analytical and risk management departments of banks and insurance companies who are involved in quantitative analysis and forecasting. This book can also be used as a textbook for upper-undergraduate and graduate-level courses in Bayesian statistics and analysis. ARKADY SHEMYAKIN, PhD, is Professor in the Department of Mathematics and Director of the Statistics Program at the University of St. Thomas. A member of the American Statistical Association and the International Society for Bayesian Analysis, Dr. Shemyakin's research interests include informationtheory, Bayesian methods of parametric estimation, and copula models in actuarial mathematics, finance, and engineering. ALEXANDER KNIAZEV, PhD, is Associate Professor and Head of the Department of Mathematics at Astrakhan State University in Russia. Dr. Kniazev's research interests include representation theory of Lie algebras and finite groups, mathematical statistics, econometrics, and financial mathematics.

Statistical Analysis with R For Dummies

2017-03-20 · O'Reilly Data Science Books O'Reilly Amazon

book

by Joseph Schmuller

Analytics Data Science data-science data-science-tools r

Understanding the world of R programming and analysis has never been easier Most guides to R, whether books or online, focus on R functions and procedures. But now, thanks to Statistical Analysis with R For Dummies, you have access to a trusted, easy-to-follow guide that focuses on the foundational statistical concepts that R addresses—as well as step-by-step guidance that shows you exactly how to implement them using R programming. People are becoming more aware of R every day as major institutions are adopting it as a standard. Part of its appeal is that it's a free tool that's taking the place of costly statistical software packages that sometimes take an inordinate amount of time to learn. Plus, R enables a user to carry out complex statistical analyses by simply entering a few commands, making sophisticated analyses available and understandable to a wide audience. Statistical Analysis with R For Dummies enables you to perform these analyses and to fully understand their implications and results. Gets you up to speed on the #1 analytics/data science software tool Demonstrates how to easily find, download, and use cutting-edge community-reviewed methods in statistics and predictive modeling Shows you how R offers intel from leading researchers in data science, free of charge Provides information on using R Studio to work with R Get ready to use R to crunch and analyze your data—the fast and easy way!

Designing Data-Intensive Applications

2017-03-16 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Martin Kleppmann

NoSQL RDBMS data-engineering

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications. Peer under the hood of the systems you already use, and learn how to use and operate them more effectively Make informed decisions by identifying the strengths and weaknesses of different tools Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity Understand the distributed systems research upon which modern databases are built Peek behind the scenes of major online services, and learn from their architectures

Creating a Data-Driven Enterprise with DataOps

2017-03-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ashish Thusoo , Joydeep Sen Sarma

Big Data Cloud Computing DataOps DevOps Hive data-science

Many companies are busy collecting massive amounts of data, but few are taking advantage of this treasure horde to build a truly data insights-driven organization. To do so, the data team must democratize both data and the insights in a way that provides real-time access to all employees in the organization. This report explores DataOps, the process, culture, tools, and people required to scale big data pervasively across the enterprise. Just as DevOps has enabled organizations to improve coordination between developers and the operations team, DataOps closely connects everyone who handles data, including engineers, data scientists, analysts, and business users. Democratizing data with this approach requires removing barriers typical of siloed data, teams, and systems. In this report, Apache Hive creators Ashish Thusoo and Joydeep Sen Sarma examine the characteristics of a data-driven organization that supports a self-service model. Explore related topics such as data lakes, metadata, cloud architecture, and data-infrastructure-as-a-service Examine conclusions from a survey of more than 400 senior executives whose companies are in various stages of data maturity Learn how data pioneers at Facebook, Uber, LinkedIn, Twitter, and eBay created data-driven cultures and self-service data infrastructures for their organizations

DS8000 Copy Services

2017-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Lukasz Drózda , Warren Stanley , Roland Wolf , Lisa Gundy , Alcides Bertazi , Axel Westphal , Michael Frankenberg , Bert Dufrasne , Cay-Uwe Kulzer

IBM data-engineering

Abstract This IBM® Redbooks® publication helps you plan, install, tailor, configure, and manage Copy Services on the IBM DS8000® operating in an IBM z Systems® or Open Systems environment. This book helps you design and implement a new Copy Services installation or migrate from an existing installation. It includes hints and tips to maximize the effectiveness of your installation, and information about tools and products to automate Copy Services functions. It is intended for anyone who needs a detailed and practical understanding of the DS8000 Copy Services.

Understanding Metadata

2017-03-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Scott Gidley , Federico Castanedo

Big Data Data Governance Data Lake IBM Informatica Teradata Trifacta data-engineering metadata

One viable option for organizations looking to harness massive amounts of data is the data lake, a single repository for storing all the raw data, both structured and unstructured, that floods into the company. But that isn’t the end of the story. The key to making a data lake work is data governance, using metadata to provide valuable context through tagging and cataloging. This practical report examines why metadata is essential for managing, migrating, accessing, and deploying any big data solution. Authors Federico Castanedo and Scott Gidley dive into the specifics of analyzing metadata for keeping track of your data—where it comes from, where it’s located, and how it’s being used—so you can provide safeguards and reduce risk. In the process, you’ll learn about methods for automating metadata capture. This report also explains the main features of a data lake architecture, and discusses the pros and cons of several data lake management solutions that support metadata. These solutions include: Traditional data integration/management vendors such as the IBM Research Accelerated Discovery Lab Tooling from open source projects, including Teradata Kylo and Informatica Startups such as Trifacta and Zaloni that provide best of breed technology

talk-data.com

Activity Trend

Top Events

Top Speakers

Statistical Intervals, 2nd Edition

Sams Teach Yourself Hadoop in 24 Hours

Usage-Driven Database Design: From Logical Data Modeling through Physical Schema Definition

Exam Ref 70-761 Querying Data with Transact-SQL, 1st Edition

Oracle SQL Tuning with Oracle SQLTXPLAIN: Oracle Database 12c Edition, Second Edition

Myths of PR

D3.js: Cutting-edge Data Visualization

R: Predictive Analysis

Mastering Spark for Data Science

PostgreSQL High Performance Cookbook

Learning Apache Spark 2

Oracle Database 12c Release 2 Performance Tuning Tips & Techniques

SQL Server 2016 Developer's Guide

An Introduction to SAS Visual Analytics

Introduction to Bayesian Estimation and Copula Models of Dependence

Statistical Analysis with R For Dummies

Designing Data-Intensive Applications

Creating a Data-Driven Enterprise with DataOps

DS8000 Copy Services

Understanding Metadata