O'Reilly Data Science Books

Fundamentals of Predictive Analytics with JMP, Second Edition

2017-12-19 O'Reilly Amazon

book

Ron Klimberg , B. D. McCullough

data data-science web-analytics google-analytics Analytics BI

Written for students in undergraduate and graduate statistics courses, as well as for the practitioner who wants to make better decisions from data and models, this updated and expanded second edition of Fundamentals of Predictive Analytics with JMP® bridges the gap between courses on basic statistics, which focus on univariate and bivariate analysis, and courses on data mining and predictive analytics. Going beyond the theoretical foundation, this book gives you the technical knowledge and problem-solving skills that you need to perform real-world multivariate data analysis. First, this book teaches you to recognize when it is appropriate to use a tool, what variables and data are required, and what the results might be. Second, it teaches you how to interpret the results and then, step-by-step, how and where to perform and evaluate the analysis in JMP. Using JMP 13 and JMP 13 Pro, this book offers the following new and enhanced features in an example-driven format: an add-in for Microsoft Excel, Graph Builder, dirty data, visualization, regression, ANOVA, logistic regression, principal component analysis, LASSO, elastic net, cluster analysis, decision trees, k-nearest neighbors, neural networks, bootstrap forests, boosted trees, text mining, association rules, and model comparison. With today’s emphasis on business intelligence, business analytics, and predictive analytics, this second edition is invaluable to anyone who needs to expand his or her knowledge of statistics and to apply real-world, problem-solving analysis. This book is part of the SAS Press program.

A Data Scientist's Guide to Acquiring, Cleaning, and Managing Data in R

2017-12-18 O'Reilly Amazon

book

Lyn R. Whitaker , Samuel E. Buttrey

data data-science Big Data

The only how-to guide offering a unified, systemic approach to acquiring, cleaning, and managing data in R
Every experienced practitioner knows that preparing data for modeling is a painstaking, time-consuming process. Adding to the difficulty is that most modelers learn the steps involved in cleaning and managing data piecemeal, often on the fly, or they develop their own ad hoc methods. This book helps simplify their task by providing a unified, systematic approach to acquiring, modeling, manipulating, cleaning, and maintaining data in R. Starting with the very basics, data scientists Samuel E. Buttrey and Lyn R. Whitaker walk readers through the entire process. From what data looks like and what it should look like, they progress through all the steps involved in getting data ready for modeling. They describe best practices for acquiring data from numerous sources; explore key issues in data handling, including text/regular expressions, big data, parallel processing, merging, matching, and checking for duplicates; and outline highly efficient and reliable techniques for documenting data and recordkeeping, including audit trails, getting data back out of R, and more.
• The only single-source guide to R data and its preparation, it describes best practices for acquiring, manipulating, cleaning, and maintaining data
• Begins with the basics and walks readers through all the steps necessary to get data ready for the modeling process
• Provides expert guidance on how to document the processes described so that they are reproducible
• Written by seasoned professionals, it provides both introductory and advanced techniques
• Features case studies with supporting data and R code, hosted on a companion website
A Data Scientist's Guide to Acquiring, Cleaning and Managing Data in R is a valuable working resource/bench manual for practitioners who collect and analyze data, lab scientists and research associates of all levels of experience, and graduate-level data mining students.

Data Mining Algorithms in C++: Data Patterns and Algorithms for Modern Applications

2017-12-15 O'Reilly Amazon

book

Timothy Masters

data data-science data-science-tasks web-scraping

Discover hidden relationships among the variables in your data, and learn how to exploit these relationships. This book presents a collection of data-mining algorithms that are effective in a wide variety of prediction and classification applications. All algorithms include an intuitive explanation of operation, essential equations, references to more rigorous theory, and commented C++ source code. Many of these techniques are recent developments, still not in widespread use. Others are standard algorithms given a fresh look. In every case, the focus is on practical applicability, with all code written in such a way that it can easily be included into any program. The Windows-based DATAMINE program lets you experiment with the techniques before incorporating them into your own work.
What You'll Learn
• Use Monte-Carlo permutation tests to provide statistically sound assessments of relationships present in your data
• Discover how combinatorially symmetric cross validation reveals whether your model has true power or has just learned noise by overfitting the data
• Work with feature weighting as regularized energy-based learning to rank variables according to their predictive power when there is too little data for traditional methods
• See how the eigenstructure of a dataset enables clustering of variables into groups that exist only within meaningful subspaces of the data
• Plot regions of the variable space where there is disagreement between marginal and actual densities, or where contribution to mutual information is high
Who This Book Is For
Anyone interested in discovering and exploiting relationships among variables. Although all code examples are written in C++, the algorithms are described in sufficient detail that they can easily be programmed in any language.

Pandas for Everyone: Python Data Analysis, First Edition

2017-12-15 O'Reilly Amazon

book

Daniel Y. Chen

data data-science data-science-tools Pandas AI/ML Matplotlib

The Hands-On, Example-Rich Introduction to Pandas Data Analysis in Python
Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets. Pandas for Everyone brings together practical knowledge and insight for solving real problems with Pandas, even if you’re new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world problems. Chen gives you a jumpstart on using Pandas with a realistic dataset and covers combining datasets, handling missing data, and structuring datasets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes. Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability, and introduces you to the wider Python data analysis ecosystem.
• Work with DataFrames and Series, and import or export data
• Create plots with matplotlib, seaborn, and pandas
• Combine datasets and handle missing data
• Reshape, tidy, and clean datasets so they’re easier to work with
• Convert data types and manipulate text strings
• Apply functions to scale data manipulations
• Aggregate, transform, and filter large datasets with groupby
• Leverage Pandas’ advanced date and time capabilities
• Fit linear models using statsmodels and scikit-learn libraries
• Use generalized linear modeling to fit models with different response variables
• Compare multiple models to select the “best”
• Regularize to overcome overfitting and improve performance
• Use clustering in unsupervised machine learning
Register your product at informit.com/register for convenient access to downloads, updates, and/or corrections as they become available.

The Power of Connection

2017-12-11 O'Reilly Amazon

book

Rik Rushton

data data-science business-intelligence microsoft-power-platform power-bi

A simple communication framework to begin practising today
We all carry around the technology to stay connected 24/7, yet many of us are disengaged and challenged with our lack of communication skills. The Power of Connection provides you with practical, real-world solutions for improving your professional performance, your personal relationships and your outlook — one conversation at a time. Becoming a confident and compelling communicator might be the most important skill for leaders in the modern business landscape, parents in the modern home and individuals who use ‘self-talk’ to help shape their world. By adopting the simple strategies revealed in every chapter, you can become an unshakeable success at what you set out to do. This book is designed to help you start communicating better today, so start reading and start practicing with your very next conversation!
• Understand your communication strengths and weaknesses
• Become a better listener to build a deeper connection
• Learn how communication sits at the heart of all relationships
• Develop the skills to connect, inspire, engage and empower
We are surrounded by noise, yet no one is actually saying anything we can connect with — or are we just not listening? Communication is a two-way street, and involves so much more than just speaking. The Power of Connection offers a quick and easy road map for your personal journey of growth and development that will make you a better parent, friend, spouse and employee. It's the right message for this time considering there's never a wrong time to level up your skills and become more effective at work, at home and in life.

Pro Power BI Desktop

2017-12-08 O'Reilly Amazon

book

Adam Aspin

data data-science business-intelligence microsoft-power-platform power-bi BI

Deliver eye-catching Business Intelligence with Microsoft Power BI Desktop. This new edition has been updated to cover all the latest features, including combo charts, Cartesian charts, trend lines, use of gauges, and more. Also covered are Top-N features, the ability to bin data into groupings and chart the groupings, and new techniques for detecting and handling outlier data points. You can take data from virtually any source and use it to produce stunning dashboards and compelling reports that will seize your audience’s attention. Slice and dice the data with remarkable ease and then add metrics and KPIs to project the insights that create your competitive advantage. Make raw data into clear, accurate, and interactive information with Microsoft’s free self-service business intelligence tool. Pro Power BI Desktop shows you how to choose from a wide range of built-in and third-party visualization types so that your message is always enhanced. You’ll be able to deliver those results on the PC, tablets, and smartphones, as well as share results via the cloud. This book helps you save time by preparing the underlying data correctly without needing an IT department to prepare it for you.
What You'll Learn
• Deliver attention-grabbing information, turning data into insight
• Mash up data from multiple sources into a cleansed and coherent data model
• Create dashboards that help in monitoring key performance indicators of your business
• Build interdependent charts, maps, and tables to deliver visually stunning information
• Share business intelligence in the cloud without involving IT
• Deliver visually stunning and interactive charts, maps, and tables
• Find new insights as you chop and tweak your data as never before
• Adapt delivery to mobile devices such as phones and tablets
Who This Book Is For
Everyone from CEOs and Business Intelligence developers to power users and IT managers

D3.js in Action, Second Edition

2017-12-07 O'Reilly Amazon

book

Elijah Meeks

data data-science data-science-tasks data-visualization d3 API

D3.js in Action, Second Edition is completely revised and updated for D3 v4 and ES6. It's a practical tutorial for creating interactive graphics and data-driven applications using D3.
About the Technology
Visualizing complex data is hard. Visualizing complex data on the web is darn near impossible without D3.js. D3 is a JavaScript library that provides a simple but powerful data visualization API over HTML, CSS, and SVG. Start with a structure, dataset, or algorithm; mix in D3; and you can programmatically generate static, animated, or interactive images that scale to any screen or browser. It's easy, and after a little practice, you'll be blown away by how beautiful your results can be!
About the Book
D3.js in Action, Second Edition is a completely updated revision of Manning's bestselling guide to data visualization with D3. You'll explore dozens of real-world examples in full-color, including force and network diagrams, workflow illustrations, geospatial constructions, and more! Along the way, you'll pick up best practices for building interactive graphics, animations, and live data representations. You'll also step through a fully interactive application created with D3 and React.
What's Inside
• Rich full-color diagrams and illustrations
• Updated for D3 v4 and ES6
• Reusable layouts and components
• Geospatial data visualizations
• Mixed-mode rendering
About the Reader
Suitable for web developers with HTML, CSS, and JavaScript skills. No specialized data science skills required.
About the Author
Elijah Meeks is a senior data visualization engineer at Netflix.
Quotes
From basic to complex, this book gives you the tools to create beautiful data visualizations. - Claudio Rodriguez, Cox Media Group
The best reference for one of the most useful DataViz tools. - Jonathan Rioux, TD Insurance
From toy examples to techniques for real projects. Shows how all the pieces fit together. - Scott McKissock, USAID
A clever way to immerse yourself in the D3.js world. - Felipe Vildoso Castillo, University of Chile

Business Research Reporting

2017-12-05 O'Reilly Amazon

book

Dr. Dorinda Clippinger

data data-science business-intelligence Data Collection

Business Research Reporting addresses the essential activities of locating, collecting, evaluating, analyzing, interpreting, and reporting business data. It highlights the value of primary and secondary research to making business decisions and solving business problems. It aims to help business managers, MBA candidates, and upper-level college students boost their research skills and report research with confidence. This book discusses primary data collection, sampling concepts, and the use of measurement and scales in preparing instruments. Also, this book explores statistical and non-statistical analysis of qualitative and quantitative data and data interpretation (findings, conclusions, and recommendations). The author shows how to locate, evaluate, and extract secondary data found on the web and in brick-and-mortar libraries, including optimized searching, evaluating, and recording. Plus, the book demonstrates how to avoid copyright infringement and plagiarism, use online citation software, and cite sources when writing and presenting. Two glossaries—one each for primary and secondary research—round out the content. Business Research Reporting can be your go-to guidebook for years to come. Reading through it in a couple of hours, you can pick up ample information to apply instantly. Then keep it handy and refer to it in your ongoing research activities.

Learning Pentaho Data Integration 8 CE - Third Edition

2017-12-05 O'Reilly Amazon

book

Diethard Steiner , María Carina Roldán , Pablo Castagnaro , Miguel Gaspar , Paula Clemente , Paulo Alexandre de Oliveira Rodrigues Pires , Dan Keeley

data data-science analytics-platforms pentaho BI Data Management

"Learning Pentaho Data Integration 8 CE" is your comprehensive guide to mastering data manipulation and integration using Pentaho Data Integration (PDI) 8 Community Edition. Through step-by-step instructions and practical examples, you'll learn to explore, transform, validate, and integrate data from multiple sources, equipping you to handle real-world data challenges efficiently. What this Book will help me do Effectively install and understand the foundational concepts of Pentaho Data Integration 8 Community Edition. Efficiently organize, clean, and transform raw data from various sources into useful formats. Perform advanced data operations like metadata injection, managing relational databases, and implementing ETL solutions. Design, create, and deploy comprehensive data warehouse solutions using modern best practices. Streamline daily data processing tasks with flexibility and accuracy while handling errors gracefully. Author(s) The author, Carina Roldán, is an experienced professional in the field of data science and ETL (Extract, Transform, Load) development. Her expertise in leveraging tools like Pentaho Data Integration has allowed her to contribute significantly to BI and data management projects. Her approach in writing this book reflects her commitment to simplifying complex topics for aspiring professionals. Who is it for? This book is ideal for software developers, data analysts, business intelligence professionals, and IT students aiming to enhance their skills in ETL processes using Pentaho Data Integration. Beginners who wish to learn PDI comprehensively and professionals looking to deepen their expertise will both find value in this resource. It's also suitable for individuals involved in data warehouse design and implementation. This book will equip you with the skills to handle diverse data transformation tasks effectively.

Learning D3.js 5 Mapping - Second Edition

2017-11-30 O'Reilly Amazon

book

Lars Verspohl , Thomas Newton , Oscar Villarreal

data data-science data-science-tasks data-visualization d3 DataViz

This book, "Learning D3.js 5 Mapping", guides developers through the process of creating dynamic and interactive data visualizations. With a focus on D3.js, you'll learn to harness the power of JavaScript to create maps and graphical objects that inform and engage. What this Book will help me do Gain expertise in working with SVG geometric shapes to design compelling graphics. Learn techniques to manage, process, and use geographic data effectively. Master adding interactivity to visual maps to provide an immersive user experience. Understand how to optimize and manipulate geoJSON files using topoJSON. Learn to create varied map types, such as hexbins and globes, using D3.js and Canvas. Author(s) Thomas Newton and Oscar Villarreal, among others, collaborated to author this guide. They are experienced in front-end development and data visualization, bringing a practical and hands-on approach to learning through this book. Their backgrounds ensure the book addresses common challenges faced during implementation, offering thoughtful solutions. Who is it for? "Learning D3.js 5 Mapping" is perfect for web developers familiar with HTML, CSS, and JavaScript who want to expand their expertise in data visualization and mapping. If you're looking to incorporate interactive charts or maps into your web applications, this book will provide practical guidance and solid fundamentals. No prior experience with D3.js is necessary.

R Data Mining

2017-11-29 O'Reilly Amazon

book

Andrea Cirillo , Enrico Pegoraro

data data-science data-science-tools r Data Science

Dive into the world of data mining with 'R Data Mining' and discover how to utilize R's vast tools for uncovering insights in data. This hands-on guide immerses you in real-world cases, teaching both foundational concepts and advanced techniques like regression models and text mining. You'll emerge with a sharp understanding of how to transform raw data into actionable information. What this Book will help me do Gain proficiency in R packages such as dplyr and ggplot2 for data manipulation and visualization. Master the CRISP-DM methodology to systematically approach data mining projects. Develop skillsets in data cleaning and validation to ensure quality data analysis. Understand and implement multiple regression and classification techniques effectively. Learn to use ensemble learning methods and produce reporting with R Markdown. Author(s) Andrea Cirillo brings extensive expertise in data science and R programming as the author of 'R Data Mining.' Their practical approach, drawing from professional experiences in various industries, makes complex techniques accessible and engaging. Their passion for teaching translates into a meticulously crafted learning journey for aspiring data miners. Who is it for? This book is ideal for beginner to intermediate-level data analysts or aspiring data scientists eager to delve into the field of data mining using R. If you're familiar with the basics of programming in R and want to expand into practical applications of data mining methodologies, this is the resource for you. Gain hands-on experience by engaging with real-world datasets and scenarios.

Introduction to MATLAB for Engineers and Scientists: Solutions for Numerical Computation and Modeling

2017-11-27 O'Reilly Amazon

book

Sandeep Nagar

data data-science data-science-tools MATLAB DataViz

Familiarize yourself with MATLAB using this concise, practical tutorial that is focused on writing code to learn concepts. Starting from the basics, this book covers array-based computing, plotting and working with files, numerical computation formalism, and the primary concepts of approximations. Introduction to MATLAB is useful for industry engineers, researchers, and students who are looking for open-source solutions for numerical computation. In this book you will learn by doing, avoiding technical jargon, which makes the concepts easy to learn. First you’ll see how to run basic calculations, absorbing technical complexities incrementally as you progress toward advanced topics. Throughout, the language is kept simple to ensure that readers at all levels can grasp the concepts.
What You'll Learn
• Apply sample code to your engineering or science problems
• Work with MATLAB arrays, functions, and loops
• Use MATLAB’s plotting functions for data visualization
• Solve numerical computing and computational engineering problems with a MATLAB case study
Who This Book Is For
Engineers, scientists, researchers, and students who are new to MATLAB. Some prior programming experience would be helpful but not required.

Big Data Analytics with SAS

2017-11-23 O'Reilly Amazon

book

David Pope , Subhashini S Tripathi

data data-science analytics-platforms SAS Analytics Big Data

Discover how to leverage the power of SAS for big data analytics in 'Big Data Analytics with SAS.' This book helps you unlock key techniques for preparing, analyzing, and reporting on big data effectively using SAS. Whether you're exploring integration with Hadoop and Python or mastering SAS Studio, you'll advance your analytics capabilities. What this Book will help me do Set up a SAS environment for performing hands-on data analytics tasks efficiently. Master the fundamentals of SAS programming for data manipulation and analysis. Use SAS Studio and Jupyter Notebook to interface with SAS efficiently and effectively. Perform preparatory data workflows and advanced analytics, including predictive modeling and reporting. Integrate SAS with platforms like Hadoop, SAP HANA, and Cloud Foundry for scaling analytics processes. Author(s) David Pope is a seasoned data analytics expert with extensive experience in SAS and big data platforms. With a passion for demystifying complex data workflows, Pope teaches SAS techniques in an approachable way. Their expert insights and practical examples empower readers to confidently analyze and report on data. Who is it for? If you're a SAS professional or a data analyst looking to expand your skills in big data analysis, this book is for you. It suits readers aiming to integrate SAS into diverse tech ecosystems or seeking to learn predictive modeling and reporting with SAS. Both beginners and those familiar with SAS can benefit.

Analyzing Multidimensional Well-Being

2017-11-20 O'Reilly Amazon

book

Satya R. Chakravarty

data data-science data-science-tasks statistics

“An indispensable reference for all researchers interested in the measurement of social welfare. . .” —François Bourguignon, Emeritus Professor at Paris School of Economics, Former Chief Economist of the World Bank.
“. . .a detailed, insightful, and pedagogical presentation of the theoretical grounds of multidimensional well-being, inequality, and poverty measurement. Any student, researcher, and practitioner interested in the multidimensional approach should begin their journey into such a fascinating theme with this wonderful book.” —François Maniquet, Professor, Catholic University of Louvain, Belgium.
A Review of the Multidimensional Approaches to the Measurement of Welfare, Inequality, and Poverty
Analyzing Multidimensional Well-Being: A Quantitative Approach offers a comprehensive approach to the measurement of well-being that includes characteristics such as income, health, literacy, and housing. The author presents a systematic comparison of the alternative approaches to the measurement of multidimensional welfare, inequality, poverty, and vulnerability. The text contains real-life applications of some multidimensional aggregations (most of which have been designed by international organizations such as the United Nations Development Program and the Organization for Economic Co-operation and Development) that help to judge the performance of a country in the various dimensions of well-being. The text offers an evaluation of how well a society is doing with respect to achievements of all the individuals in the dimensions considered and clearly investigates how achievements in the dimensions can be evaluated from different perspectives. The author includes a detailed scrutiny of alternative techniques for setting weights to individual dimensional metrics and offers an extensive analysis into both the descriptive and welfare theoretical approaches to the concerned multi-attribute measurement and related issues.
This important resource:
• Contains a synthesis of multidimensional welfare, inequality, poverty, and vulnerability analysis
• Examines aggregations of achievement levels in the concerned dimensions of well-being from various standpoints
• Shows how to measure poverty using panel data instead of restricting attention to a single period and when we have imprecise information on dimensional achievements
• Argues that multidimensional analysis is intrinsically different from marginal distributions-based analysis
Written for students, teachers, researchers, and scholars, Analyzing Multidimensional Well-Being: A Quantitative Approach puts the focus on various approaches to the measurement of the many aspects of well-being and quality of life. Satya R. Chakravarty is a Professor of Economics at the Indian Statistical Institute, Kolkata, India. He is an Editor of Social Choice and Welfare and a member of the Editorial Board of Journal of Economic Inequality.

Functional Data Structures in R: Advanced Statistical Programming in R

2017-11-17 O'Reilly Amazon

book

Thomas Mailund

data data-science data-science-tools r Big Data Data Science

Get an introduction to functional data structures using R and write more effective code and gain performance for your programs. This book teaches you workarounds because data in functional languages is not mutable: for example you’ll learn how to change variable-value bindings by modifying environments, which can be exploited to emulate pointers and implement traditional data structures. You’ll also see how, by abandoning traditional data structures, you can manipulate structures by building new versions rather than modifying them. You’ll discover how these so-called functional data structures are different from the traditional data structures you might know, but are worth understanding to do serious algorithmic programming in a functional language such as R. By the end of Functional Data Structures in R, you’ll understand the choices to make in order to most effectively work with data structures when you cannot modify the data itself. These techniques are especially applicable for algorithmic development important in big data, finance, and other data science applications.
What You'll Learn
• Carry out algorithmic programming in R
• Use abstract data structures
• Work with both immutable and persistent data
• Emulate pointers and implement traditional data structures in R
• Build new versions of traditional data structures that are known
Who This Book Is For
Experienced or advanced programmers with at least a comfort level with R. Some experience with data structures recommended.

R Data Analysis Projects

2017-11-17 O'Reilly Amazon

book

Mark Hodnett , Gopi Subramanian

data data-science data-science-tools r Analytics

Step into the world of advanced data analysis with 'R Data Analysis Projects.' In this hands-on guide, you will learn to build efficient analytics systems and pipelines using R for practical applications in finance, social media, and more. By following real-world projects, you'll enhance your data analysis skills, from implementing recommender systems to performing time-series modeling. What this Book will help me do Develop end-to-end data analysis and visualization solutions using R. Create scalable predictive analytics systems with actionable insights. Leverage RShiny to build interactive dashboards for effective communication. Master popular R packages like dplyr, ggplot2, and recommenderlab. Tackle real-world data challenges in varied domains such as finance and social networks. Author(s) Gopi Subramanian is an experienced data scientist and educator with an extensive background in statistical modeling and analytics. With years of hands-on practice and teaching, Gopi specializes in making complex concepts accessible through practical examples. His passion for R programming and real-world applications shines in his approachable style, making learning empowering and engaging. Who is it for? This book is designed for readers with a foundational understanding of R and data analysis, aiming to advance their skills to a professional level. Ideal for data analysts, R programmers, and aspiring data scientists seeking practical experience in building analytics systems. Whether you're transitioning to or deepening your expertise in R, this guide offers actionable knowledge to enhance your projects.

Statistics for Data Science

2017-11-17 O'Reilly Amazon

book

James C. Mott , Shaikh Salamatullah , Vijayakumar Ramdoss , Rajprasath Subramanian , James D. Miller

data data-science data-science-tasks statistics AI/ML Data Science

Dive into the world of statistics specifically tailored for the needs of data science with 'Statistics for Data Science'. This book guides you from the fundamentals of statistical concepts to their practical application in data analysis, machine learning, and neural networks. Learn with clear explanations and practical R examples to fully grasp statistical methods for data-driven challenges. What this Book will help me do Understand foundational statistical concepts such as variance, standard deviation, and probability. Gain proficiency in using R for programmatically performing statistical computations for data science. Learn techniques for applying statistics in data cleaning, mining, and analysis tasks. Master methods for executing linear regression, regularization, and model assessment. Explore advanced techniques like boosting, SVMs, and neural network applications. Author(s) James D. Miller brings years of experience as a data scientist and educator. He has a deep understanding of how statistics foundationally supports data science and has worked across multiple industries applying these principles. Dedicated to teaching, James simplifies complex statistical concepts into approachable and actionable knowledge for developers aspiring to master data science applications. Who is it for? This book is intended for developers aiming to transition into the field of data science. If you have some basic programming knowledge and a desire to understand statistics essentials for data science applications, this book is designed for you. It's perfect for those who wish to apply statistical methods to practical tasks like data mining and analysis. A prior hands-on experience with R is helpful but not mandatory, as the book explains R methodologies comprehensively.

Practical Data Wrangling

2017-11-15 O'Reilly Amazon

book

Allan Visochek

data data-science data-science-tools Pandas Analytics Python

"Practical Data Wrangling" provides a comprehensive guide to cleaning and preparing data for analysis, focusing on techniques in Python and R. As you progress through the book, you'll learn how to handle various datasets, reshape their formats, and prepare them for insights, empowering you to derive more value from your data. What this Book will help me do: Understand the data wrangling process and its importance in the data analysis pipeline. Learn how to retrieve, parse, and shape raw data into structured formats. Master packages and tools in Python and R to efficiently clean and manipulate data. Gain proficiency in using regular expressions for text data preparation. Acquire skills to analyze, merge, and transform datasets to meet analytics needs. Author(s): Allan Visochek has years of experience working with data and analytics, with expertise in using Python and R for solving real-world data challenges. Their teaching approach emphasizes practical examples and accessible explanations, ensuring complex concepts are easy to understand. Who is it for? This book is for data scientists, analysts, or statisticians who work with real-world data and want to optimize their data preparation process. It is ideal for professionals with basic knowledge of Python and R looking to enhance their skills in data wrangling and data preparation techniques. If you're seeking to streamline your data analysis workflow through better wrangling techniques, this book is for you.

The State of Data Analytics and Visualization Adoption

2017-11-15 O'Reilly Amazon

book

Matthew D. Sarrel

data data-science data-science-tasks data-visualization Analytics Big Data

Businesses regardless of industry or company size increasingly rely on data analytics and visualization to gain competitive advantage. That’s why organizations today are racing to gather, store, and analyze data from many sources in a wide range of formats. In the spring of 2017, Zoomdata commissioned an O’Reilly survey to assess the state of data analytics and visualization technology adoption across several industries, including manufacturing, financial services, and healthcare. Roughly 875 respondents answered questions online about their industry, job role, company size, and reasons for using analytics, as well as technologies they use in analytics programs, the perceived value of analytics programs, and many other topics. This report reveals: the industries furthest along in adopting big data analytics and visualization technologies; the most commonly analyzed sources of big data; the most commonly used technologies for analyzing streaming data; which analytics skills are in most demand; the most valued characteristic of big data across all industries; and the types of users big data analytics and visualization projects typically target. If you’re a technology decision maker, a product manager looking to embed analytics, a business user relying on analytics, or a developer pursuing the most marketable skills, this report provides valuable details on today’s data analytics trends.

Principles of Electromagnetic Waves and Materials, 2nd Edition

2017-11-14 O'Reilly Amazon

book

Dikshitulu K. Kalluri

data data-science data-science-tools MATLAB

The book takes an integrative approach to the subject of electromagnetics by supplementing quintessential "old school" information and methods with instruction in the use of new commercial software such as MATLAB.

Basic Applied Bioinformatics

2017-11-13 O'Reilly Amazon

book

Mir Asif Iquebal , Ratan Kumar Choudhary , Chandra Sekhar Mukhopadhyay

data data-science data-science-domains bioinformatics

An accessible guide that introduces students in all areas of life sciences to bioinformatics. Basic Applied Bioinformatics provides practical guidance in bioinformatics and helps students to optimize parameters for data analysis and then to draw accurate conclusions from the results. In addition to parameter optimization, the text will also familiarize students with relevant terminology. Basic Applied Bioinformatics is written as an accessible guide for graduate students studying bioinformatics, biotechnology, and other related sub-disciplines of the life sciences. This accessible text outlines the basics of bioinformatics, including pertinent information such as downloading molecular sequences (nucleotide and protein) from databases; BLAST analyses; primer designing and its quality checking; multiple sequence alignment (global and local using freely available software); phylogenetic tree construction (using UPGMA, NJ, MP, ME, FM algorithm and MEGA7 suite); prediction of protein structures and genome annotation; RNASeq data analyses and identification of differentially expressed genes; and similar advanced bioinformatics analyses. The authors Chandra Sekhar Mukhopadhyay, Ratan Kumar Choudhary, and Mir Asif Iquebal are noted experts in the field and have come together to provide updated information on bioinformatics. Salient features of this book include: accessible and updated information on bioinformatics tools; a practical step-by-step approach to molecular-data analyses; information pertinent to the study of a variety of disciplines including biotechnology, zoology, bioinformatics and other related fields; and worked examples, glossary terms, problems and solutions. Basic Applied Bioinformatics gives students studying bioinformatics, agricultural biotechnology, animal biotechnology, medical biotechnology, microbial biotechnology, and zoology an updated introduction to the growing field of bioinformatics.

Measuring Agreement

2017-11-13 O'Reilly Amazon

book

Haikady N. Nagaraja , Pankaj K. Choudhary

data data-science data-science-tasks statistics

Presents statistical methodologies for analyzing common types of data from method comparison experiments and illustrates their applications through detailed case studies. Measuring Agreement: Models, Methods, and Applications features statistical evaluation of agreement between two or more methods of measurement of a variable with a primary focus on continuous data. The authors view the analysis of method comparison data as a two-step procedure where an adequate model for the data is found, and then inferential techniques are applied for appropriate functions of parameters of the model. The presentation is accessible to a wide audience and provides the necessary technical details and references. In addition, the authors present chapter-length explorations of data from paired measurements designs, repeated measurements designs, and multiple methods; data with covariates; and heteroscedastic, longitudinal, and categorical data. The book also: • Strikes a balance between theory and applications • Presents parametric as well as nonparametric methodologies • Provides a concise introduction to Cohen’s kappa coefficient and other measures of agreement for binary and categorical data • Discusses sample size determination for trials on measuring agreement • Contains real-world case studies and exercises throughout • Provides a supplemental website containing the related datasets and R code. Measuring Agreement: Models, Methods, and Applications is a resource for statisticians and biostatisticians engaged in data analysis, consultancy, and methodological research. It is a reference for clinical chemists, ecologists, and biomedical and other scientists who deal with development and validation of measurement methods. This book can also serve as a graduate-level text for students in statistics and biostatistics.

Python for R Users

2017-11-13 O'Reilly Amazon

book

Ajay Ohri

data data-science data-science-tools r AI/ML Analytics

The definitive guide for statisticians and data scientists who understand the advantages of becoming proficient in both R and Python. The first book of its kind, Python for R Users: A Data Science Approach makes it easy for R programmers to code in Python and Python users to program in R. Short on theory and long on actionable analytics, it provides readers with a detailed comparative introduction and overview of both languages and features concise tutorials with command-by-command translations (complete with sample code) of R to Python and Python to R. Following an introduction to both languages, the author cuts to the chase with step-by-step coverage of the full range of pertinent programming features and functions, including data input, data inspection/data quality, data analysis, and data visualization. Statistical modeling, machine learning, and data mining (including supervised and unsupervised data mining methods) are treated in detail, as are time series forecasting, text mining, and natural language processing. • Features a quick-learning format with concise tutorials and actionable analytics • Provides command-by-command translations of R to Python and vice versa • Incorporates Python and R code throughout to make it easier for readers to compare and contrast features in both languages • Offers numerous comparative examples and applications in both programming languages • Designed for practitioners and students who know one language and want to learn the other • Supplies slides useful for teaching and learning either software on a companion website. Python for R Users: A Data Science Approach is a valuable working resource for computer scientists and data scientists who know R and would like to learn Python or are familiar with Python and want to learn R. It also functions as a textbook for students of computer science and statistics. A. Ohri is the founder of Decisionstats.com and currently works as a senior data scientist. He has advised multiple startups in analytics off-shoring, analytics services, and analytics education, as well as using social media to enhance buzz for analytics products. Mr. Ohri's research interests include spreading open source analytics, analyzing social media manipulation with mechanism design, simpler interfaces for cloud computing, and investigating climate change and knowledge flows. His other books include R for Business Analytics and R for Cloud Computing.

Engineering Biostatistics

2017-11-06 O'Reilly Amazon

book

Brani Vidakovic

data data-science data-science-tasks statistics MATLAB

Provides a one-stop resource for engineers learning biostatistics using MATLAB® and WinBUGS. Through its scope and depth of coverage, this book addresses the needs of the vibrant and rapidly growing bio-oriented engineering fields while implementing software packages that are familiar to engineers. The book is heavily oriented to computation and hands-on approaches so readers understand each step of the programming. Another dimension of this book is in parallel coverage of both Bayesian and frequentist approaches to statistical inference. It avoids taking sides on the classical vs. Bayesian paradigms, and many examples in this book are solved using both methods. The results are then compared and commented upon. Readers have the choice of MATLAB® for classical data analysis and WinBUGS/OpenBUGS for Bayesian data analysis. Every chapter starts with a box highlighting what is covered in that chapter and ends with exercises, a list of software scripts, datasets, and references. Engineering Biostatistics: An Introduction using MATLAB® and WinBUGS also includes: parallel coverage of classical and Bayesian approaches, where appropriate; substantial coverage of Bayesian approaches to statistical inference; material that has been classroom-tested in an introductory statistics course in bioengineering over several years; exercises at the end of each chapter; and an accompanying website with full solutions and hints to some exercises, as well as additional materials and examples. Engineering Biostatistics: An Introduction using MATLAB® and WinBUGS can serve as a textbook for introductory-to-intermediate applied statistics courses, as well as a useful reference for engineers interested in biostatistical approaches.

Developing Business Acumen

2017-10-27 O'Reilly Amazon

book

Jennifer Currence

data data-science analytics-platforms microstrategy

Learn how small business HR professionals can build business acumen to become stronger strategic partners.
