talk-data.com

Topic

Data Quality

data_management data_cleansing data_validation

26 tagged

Activity Trend

Peak: 82 activities per quarter (2020-Q1 to 2026-Q1)

Activities

Showing filtered results

Filtering by: O'Reilly Data Science Books
CompTIA Data+ Study Guide, 2nd Edition

Prepare for the CompTIA Data+ exam, as well as a new career in data science, with this effective study guide.

In the newly revised second edition of CompTIA Data+ Study Guide: Exam DA0-002, veteran IT professionals Mike Chapple and Sharif Nijim provide a powerful, one-stop resource for anyone planning to pursue the CompTIA Data+ certification and go on to an exciting new career in data science. The authors walk you through the information you need to succeed on the exam and in your first day at a data science-focused job. Complete with two online practice tests, this book comprehensively covers every objective tested by the updated DA0-002 exam, including databases and data acquisition, data quality, data analysis and statistics, data visualization, and data governance.

You'll also find:
• Efficient and comprehensive content, helping you get up to speed as quickly as possible
• Bite-size chapters that break down essential topics into manageable and accessible lessons
• Complimentary access to Sybex's famous online learning environment, with practice questions, a complete glossary of common industry terminology, hundreds of flashcards, and more

A practical and hands-on pathway to the CompTIA Data+ certification, as well as a new career in data science, the CompTIA Data+ Study Guide, Second Edition offers the foundational knowledge, skills, and abilities you need to get started in an exciting and rewarding new career.

Fundamentals of Analytics Engineering

Master the art and science of analytics engineering with Fundamentals of Analytics Engineering. This book takes you on a comprehensive journey from understanding foundational concepts to implementing end-to-end analytics solutions. You'll gain not just theoretical knowledge but practical expertise in building scalable, robust data platforms to meet organizational needs.

What this Book will help me do
• Design and implement effective data pipelines leveraging modern tools like Airbyte, BigQuery, and dbt.
• Adopt best practices for data modeling and schema design to enhance system performance and develop clearer data structures.
• Learn advanced techniques for ensuring data quality, governance, and observability in your data solutions (a short sketch follows this entry).
• Master collaborative coding practices, including version control with Git and strategies for maintaining well-documented codebases.
• Automate and manage data workflows efficiently using CI/CD pipelines and workflow orchestrators.

Author(s)
Dumky De Wilde, alongside six co-authors who are experienced professionals from various facets of the analytics field, delivers a cohesive exploration of analytics engineering. The authors blend their expertise in software development, data analysis, and engineering to offer actionable advice and insights. Their approachable style makes complex concepts understandable.

Who is it for?
This book is a perfect fit for data analysts and engineers curious about transitioning into analytics engineering. Aspiring professionals, as well as seasoned analytics engineers looking to deepen their understanding of modern practices, will find guidance here. It is tailored for individuals aiming to boost their career trajectory in data engineering roles, addressing fundamental to advanced topics.
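As a rough illustration of the data quality and observability theme in this entry: the book demonstrates these ideas with analytics-engineering tools such as dbt, whereas the sketch below is only a minimal Python analogue with made-up table and column names (orders, order_id, customer_id, amount).

```python
# A minimal, hypothetical sketch of post-load data quality checks; the table
# and column names are illustrative only (the book works with tools like dbt).
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of data quality problems found in an orders table."""
    problems = []
    if df.empty:
        problems.append("table is empty")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")
    null_rate = df["customer_id"].isna().mean()
    if null_rate > 0.01:                      # tolerate at most 1% missing keys
        problems.append(f"customer_id null rate {null_rate:.1%} exceeds 1%")
    if (df["amount"] < 0).any():
        problems.append("negative order amounts")
    return problems

if __name__ == "__main__":
    orders = pd.DataFrame(
        {"order_id": [1, 2, 2], "customer_id": [10, None, 12], "amount": [99.5, -5.0, 20.0]}
    )
    for p in check_orders(orders):
        print("FAILED:", p)
```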

Data Cleaning with Power BI

Delve into the powerful world of data cleaning with Microsoft Power BI in this detailed guide. You'll learn how to connect, transform, and optimize data from various sources, setting a strong foundation for insightful data-driven decisions. Equip yourself with the skills to master data quality, leverage DAX and Power Query, and produce actionable insights with improved efficiency.

What this Book will help me do
• Master connecting to various data sources and importing data effectively into Power BI.
• Learn to use the Query Editor to clean and transform data efficiently.
• Understand how to use the M language to perform advanced data transformations.
• Gain expertise in creating optimized data models and handling relationships within Power BI.
• Explore insights-driven exploratory data analysis using Power BI's powerful tools.

Author(s)
Frazer is an experienced data professional with a deep knowledge of business intelligence tools and analytics processes. With a strong background in data science and years of hands-on experience using Power BI, Frazer brings practical advice to help users improve their data preparation and analysis skills. Known for creating resources that are both comprehensive and approachable, Frazer is dedicated to empowering readers in their data journey.

Who is it for?
This book is ideal for data analysts, business intelligence professionals, and business analysts who work regularly with data. If you have a basic understanding of BI tools and concepts and are looking to deepen your skills, especially in Power BI, this book will guide you effectively. It will also help data scientists and other professionals interested in data cleaning to build a robust basis for data quality and analysis. Whether you're addressing common data challenges or seeking to enhance your BI capabilities, this guide is tailored to accommodate your needs.

Fundamentals of Data Science

Fundamentals of Data Science: Theory and Practice presents basic and advanced concepts in data science along with real-life applications. The book provides students, researchers and professionals at different levels a good understanding of the concepts of data science, machine learning, data mining and analytics. Users will find the authors' research experiences and achievements in data science applications, along with in-depth discussions on topics that are essential for data science projects, including pre-processing, which is carried out before applying predictive and descriptive data analysis tasks, and proximity measures for numeric, categorical and mixed-type data. The book includes a systematic presentation of many predictive and descriptive learning algorithms, including recent developments that have successfully handled large datasets with high accuracy. In addition, a number of descriptive learning tasks are included.

• Presents the foundational concepts of data science along with advanced concepts and real-life applications for applied learning
• Includes coverage of a number of key topics such as data quality and pre-processing, proximity and validation, predictive data science, descriptive data science, ensemble learning, association rule mining, Big Data analytics, as well as incremental and distributed learning
• Provides updates on key applications of data science techniques in areas such as Computational Biology, Network Intrusion Detection, Natural Language Processing, Software Clone Detection, Financial Data Analysis, and Scientific Time Series Data Analysis
• Covers computer program code for implementing descriptive and predictive algorithms
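To make the "proximity measures for mixed-type data" topic concrete, here is a small Gower-style distance sketch in Python. It is not code from the book; the features, ranges, and records are invented for illustration.

```python
# A rough sketch of a Gower-style proximity measure for mixed-type records:
# range-scaled difference for numeric features, 0/1 mismatch for categorical ones.
def mixed_distance(a: dict, b: dict, numeric_ranges: dict) -> float:
    """Average per-feature dissimilarity between two records."""
    total = 0.0
    for key in a:
        if key in numeric_ranges:                       # numeric feature
            total += abs(a[key] - b[key]) / numeric_ranges[key]
        else:                                           # categorical feature
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)

x = {"age": 34, "income": 52000, "city": "Austin"}
y = {"age": 41, "income": 48000, "city": "Boston"}
ranges = {"age": 60, "income": 100000}                  # observed ranges in the data
print(round(mixed_distance(x, y, ranges), 3))           # 0.386
```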

Mastering Tableau 2023 - Fourth Edition

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau.

What this Book will help me do
• Master the latest Tableau 2023 features and use cases to tackle analytics challenges.
• Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation.
• Integrate Tableau with programming languages such as Python and R to enhance analytics.
• Create engaging, visually impactful dashboards for effective data storytelling.
• Understand and apply data governance to ensure data quality and compliance.

Author(s)
Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing.

Who is it for?
This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

CompTIA Data+: DAO-001 Certification Guide

The "CompTIA Data+: DAO-001 Certification Guide" is your complete resource to approaching and passing the CompTIA Data+ certification exam. This book offers clear explanations, step-by-step exercises, and practical examples designed to help you master the domain concepts essential for the DAO-001 exam. Prepare confidently and expand your career opportunities in data analytics. What this Book will help me do Understand and apply the five domains covered in the DAO-001 certification exam. Learn data preparation techniques such as collection, cleaning, and wrangling. Master descriptive statistical methods and hypothesis testing to analyze data. Create insightful visualizations and professional reports for stakeholders. Grasp the fundamentals of data governance, including data quality standards. Author(s) Cameron Dodd is an experienced data analyst and educator passionate about breaking down complex concepts. With years of teaching and hands-on analytics expertise, he has developed a student-centric approach to helping professionals achieve certification and career advancement. His structured yet relatable writing style makes learning intuitive. Who is it for? The ideal readers of this book are data professionals aiming to achieve CompTIA Data+ certification (DAO-001 exam), individuals entering the growing field of data analytics, and professionals looking to validate or expand their skills. Whether you're starting from scratch or solidifying your knowledge, this book is designed for all levels.

Data Literacy in Practice

"Data Literacy in Practice" teaches readers to unlock the power of data for making smarter decisions. You'll learn how to understand and work with data, gain the ability to derive actionable insights, and develop the skills required for data-informed decision-making. What this Book will help me do Understand the basics of data literacy and the importance of data in decision-making. Learn to visualize data effectively using charts and graphs tailored to your audience. Master the application of the four-pillar model for organizational data literacy advancement. Develop proficiency in managing data environments and assessing data quality. Become competent in deriving actionable insights and critical questioning for better analysis. Author(s) Angelika Klidas and Kevin Hanegan are pioneers in the field of data literacy with extensive experience in data analytics. Both are seasoned educators at top universities and bring their expertise to this book to help readers understand and leverage the power of data. Who is it for? "Data Literacy in Practice" is ideal for data analysts, professionals, and teams looking to enhance their data literacy skills. Readers should have a desire to utilize data effectively in their roles, regardless of prior experience. The book is designed to guide both beginners starting out and those who aim to deepen their knowledge.

Cleaning Data for Effective Data Science

Dive into the intricacies of data cleaning, a crucial aspect of any data science and machine learning pipeline, with 'Cleaning Data for Effective Data Science.' This comprehensive guide walks you through tools and methodologies like Python, R, and command-line utilities to prepare raw data for analysis. Learn practical strategies to manage, clean, and refine data encountered in the real world.

What this Book will help me do
• Understand and utilize various data formats such as JSON, SQL, and PDF for data ingestion and processing.
• Master key tools like pandas, SciPy, and Tidyverse to manipulate and analyze datasets efficiently.
• Develop heuristics and methodologies for assessing data quality, detecting bias, and identifying irregularities.
• Apply advanced techniques like feature engineering and statistical adjustments to enhance data usability.
• Gain confidence in handling time series data by employing methods for de-trending and interpolating missing values (sketched after this entry).

Author(s)
David Mertz has years of experience as a Python programmer and data scientist. Known for his engaging and accessible teaching style, David has authored numerous technical articles and books. He emphasizes not only the technicalities of data science tools but also the critical thinking that approaches solutions creatively and effectively.

Who is it for?
'Cleaning Data for Effective Data Science' is designed for data scientists, software developers, and educators dealing with data preparation. Whether you're an aspiring data enthusiast or an experienced professional looking to refine your skills, this book provides essential tools and frameworks. Prior programming knowledge, particularly in Python or R, coupled with an understanding of statistical fundamentals, will help you make the most of this resource.
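The time series bullet above is the kind of task pandas handles in a few lines. The sketch below is not drawn from the book; it uses a synthetic daily series to show time-aware interpolation followed by a simple linear de-trend.

```python
# Interpolate missing values in a synthetic daily series, then remove a linear trend.
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=10, freq="D")
values = pd.Series([1.0, 2.1, np.nan, 4.2, 5.0, np.nan, 7.1, 8.0, 9.2, 10.1], index=idx)

filled = values.interpolate(method="time")        # fill gaps using time-aware interpolation

# De-trend by fitting a first-degree polynomial to the filled series and subtracting it.
t = np.arange(len(filled))
slope, intercept = np.polyfit(t, filled.values, deg=1)
detrended = filled - (slope * t + intercept)

print(detrended.round(2))
```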

The Data Detective's Toolkit

Reduce the cost and time of cleaning, managing, and preparing research data while also improving data quality! Have you ever wished there was an easy way to reduce your workload and improve the quality of your data? The Data Detective’s Toolkit: Cutting-Edge Techniques and SAS Macros to Clean, Prepare, and Manage Data will help you automate many of the labor-intensive tasks needed to turn raw data into high-quality, analysis-ready data. You will find the right tools and techniques in this book to reduce the amount of time needed to clean, edit, validate, and document your data. These tools include SAS macros as well as ingenious ways of using SAS procedures and functions. The innovative logic built into the book’s macro programs enables you to monitor the quality of your data using information from the formats and labels created for the variables in your data set. The book explains how to harmonize data sets that need to be combined and automate data cleaning tasks to detect errors in data including out-of-range values, inconsistent flow through skip paths, missing data, no variation in values for a variable, and duplicates. By the end of this book, you will be able to automatically produce codebooks, crosswalks, and data catalogs.
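The book implements its checks as SAS macros; purely for illustration, here is a rough Python analogue of the same kinds of checks it automates (out-of-range values, missing data, variables with no variation, and duplicates) on an invented survey extract.

```python
# Python analogue of common data-cleaning checks on a made-up survey extract.
import pandas as pd

survey = pd.DataFrame({
    "respondent_id": [101, 102, 102, 104],
    "age": [34, 210, 28, None],          # 210 is out of range, None is missing
    "country": ["US", "US", "US", "US"], # no variation
})

report = {
    "duplicate_ids": int(survey["respondent_id"].duplicated().sum()),
    "age_out_of_range": int(((survey["age"] < 0) | (survey["age"] > 120)).sum()),
    "age_missing": int(survey["age"].isna().sum()),
    "constant_columns": [c for c in survey.columns if survey[c].nunique(dropna=True) <= 1],
}
print(report)
# {'duplicate_ids': 1, 'age_out_of_range': 1, 'age_missing': 1, 'constant_columns': ['country']}
```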

Learning Tableau 2020 - Fourth Edition

"Learning Tableau 2020" is a comprehensive resource designed to strengthen your understanding of Tableau. It takes you from mastering the fundamentals to achieving proficiency in advanced visualization and data handling techniques. Through this book, you will gain the ability to create impactful data visualizations and interactive dashboards, effectively leveraging the capabilities of Tableau 2020. What this Book will help me do Effectively utilize Tableau 2020 features to develop data visualizations and dashboards. Apply advanced Tableau techniques, such as LOD and table calculations, to solve complex data analysis problems. Clean and structure data using Tableau Prep, enhancing data quality and reliability. Incorporate mapping and geospatial visualization for geographic data insights. Master storytelling with data by constructing engaging and interactive dashboards. Author(s) Joshua N. Milligan, the author of "Learning Tableau 2020," is an experienced Tableau training consultant and professional. With extensive years in the data visualization and analytics field, Joshua brings a practical perspective to the book. He excels at breaking down complex topics into accessible learning paths, making advanced Tableau concepts approachable for learners of all levels. Who is it for? This book is perfect for aspiring data analysts, IT professionals, and data enthusiasts who aim to understand and create compelling business intelligence reports. Beginners in Tableau will find the learning process straightforward due to its structured and incremental lessons. Advanced users can refine their skills with the wide range of complex examples covered. A basic familiarity with working with data is beneficial, though not required.

The Data Wrangling Workshop - Second Edition

The Data Wrangling Workshop is your beginner's guide to the essential techniques and practices of data manipulation using Python. Throughout the book, you will progressively build your skills, learning key concepts such as extracting, cleaning, and transforming data into actionable insights. By the end, you'll be confident in handling various data wrangling tasks efficiently.

What this Book will help me do
• Understand and apply the fundamentals of data wrangling using Python.
• Combine and aggregate data from diverse sources like web data, SQL databases, and spreadsheets (a brief sketch follows this entry).
• Use descriptive statistics and plotting to examine dataset properties.
• Handle missing or incorrect data effectively to maintain data quality.
• Gain hands-on experience with Python's powerful data science libraries like Pandas, NumPy, and Matplotlib.

Author(s)
Brian Lipp, Shubhadeep Roychowdhury, and Dr. Tirthajyoti Sarkar are experienced educators and professionals in the fields of data science and engineering. Their collective expertise spans years of teaching and working with data technologies. They aim to make data wrangling accessible and comprehensible, focusing on practical examples to equip learners with real-world skills.

Who is it for?
The Data Wrangling Workshop is ideal for developers, data analysts, and business analysts aiming to become data scientists or analytics experts. If you're just getting started with Python, you will find this book guiding you step by step. A basic understanding of Python programming, as well as relational databases and SQL, is recommended for smooth learning.
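As a brief sketch of two of the tasks listed above, combining sources and handling missing values, the following pandas snippet (not from the book) merges an invented "spreadsheet" table with an invented "database" table and imputes a missing value before aggregating.

```python
# Combine two small, made-up sources and handle a missing value before aggregating.
import pandas as pd

sales = pd.DataFrame({"store": ["A", "B", "C"], "units": [120, None, 95]})              # "spreadsheet"
regions = pd.DataFrame({"store": ["A", "B", "C"], "region": ["East", "West", "East"]})  # "database"

merged = sales.merge(regions, on="store", how="left")
merged["units"] = merged["units"].fillna(merged["units"].median())   # impute the missing units value

print(merged.groupby("region")["units"].sum())
```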

Prepare Your Data for Tableau: A Practical Guide to the Tableau Data Prep Tool

Focus on the most important and most often overlooked factor in a successful Tableau project: data. Without a reliable data source, you will not achieve the results you hope for in Tableau. This book does more than teach the mechanics of data preparation. It teaches you how to look at data in a new way, to recognize the most common issues that hinder analytics, and how to mitigate those factors one by one. Tableau can change the course of business, but the old adage of "garbage in, garbage out" is the hard truth that hides behind every Tableau sales pitch. That amazing sales demo does not work as well with bad data. The unfortunate reality is that almost all data starts out in a less-than-perfect state. Data prep is hard. Traditionally, we were forced into the world of the database, where complex ETL (Extract, Transform, Load) operations created by the data team did all the heavy lifting for us. Fortunately, we have moved past those days. With the introduction of the Tableau Data Prep tool you can now handle most of the common data prep and cleanup tasks on your own, at your desk, and without the help of the data team.

This essential book will guide you through:
• The layout and important parts of the Tableau Data Prep tool
• Connecting to data
• Data quality and consistency
• The shape of the data: is the data oriented in columns or rows? How do you decide? Why does it matter? (A pandas illustration of this idea follows this entry.)
• The level of detail in the source data, and why it is important
• Combining source data to bring in more fields and rows
• Saving the data flow and the results of your data prep work
• Common cleanup and setup tasks in Tableau Desktop

What You Will Learn
• Recognize data sources that are good candidates for analytics in Tableau
• Connect to local, server, and cloud-based data sources
• Profile data to better understand its content and structure
• Rename fields, adjust data types, group data points, and aggregate numeric data
• Pivot data
• Join data from local, server, and cloud-based sources for unified analytics
• Review the steps and results of each phase of the data prep process
• Output new data sources that can be reviewed in Tableau or any other analytics tool

Who This Book Is For
Tableau Desktop users who want to connect to data, profile the data to identify common issues, clean up those issues, join to additional data sources, and save the newly cleaned, joined data so that it can be used more effectively in Tableau.
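On the "columns or rows" question raised above: the book does this reshaping in Tableau Prep, but the idea is easy to see in a short pandas sketch with invented sales data, pivoting a wide table (one column per year) into the long shape most analytics tools prefer.

```python
# Pivot a wide table (one column per year) into long form: one row per region-year.
import pandas as pd

wide = pd.DataFrame({
    "region": ["East", "West"],
    "2022": [150, 200],
    "2023": [175, 210],
})

long = wide.melt(id_vars="region", var_name="year", value_name="sales")
print(long)
# Four rows in long form: (East, 2022, 150), (West, 2022, 200),
# (East, 2023, 175), (West, 2023, 210)
```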

Practical DataOps: Delivering Agile Data Science at Scale

Gain a practical introduction to DataOps, a new discipline for delivering data science at scale inspired by practices at companies such as Facebook, Uber, LinkedIn, Twitter, and eBay. Organizations need more than the latest AI algorithms, hottest tools, and best people to turn data into insight-driven action and useful analytical data products. Processes and thinking employed to manage and use data in the 20th century are a bottleneck for working effectively with the variety of data and advanced analytical use cases that organizations have today. This book provides the approach and methods to ensure continuous rapid use of data to create analytical data products and steer decision making.

Practical DataOps shows you how to optimize the data supply chain from diverse raw data sources to the final data product, whether the goal is a machine learning model or other data-orientated output. The book provides an approach to eliminate wasted effort and improve collaboration between data producers, data consumers, and the rest of the organization through the adoption of lean thinking and agile software development principles. This book helps you to improve the speed and accuracy of analytical application development through data management and DevOps practices that securely expand data access, and rapidly increase the number of reproducible data products through automation, testing, and integration. The book also shows how to collect feedback and monitor performance to manage and continuously improve your processes and output.

What You Will Learn
• Develop a data strategy for your organization to help it reach its long-term goals
• Recognize and eliminate barriers to delivering data to users at scale
• Work on the right things for the right stakeholders through agile collaboration
• Create trust in data via rigorous testing and effective data management
• Build a culture of learning and continuous improvement through monitoring deployments and measuring outcomes
• Create cross-functional self-organizing teams focused on goals, not reporting lines
• Build robust, trustworthy data pipelines in support of AI, machine learning, and other analytical data products

Who This Book Is For
Data science and advanced analytics experts, CIOs, CDOs (chief data officers), chief analytics officers, business analysts, business team leaders, and IT professionals (data engineers, developers, architects, and DBAs) supporting data teams who want to dramatically increase the value their organization derives from data. The book is ideal for data professionals who want to overcome the challenges of long delivery times, poor data quality, high maintenance costs, and scaling difficulties in getting data science output and machine learning into customer-facing production.

Applied Health Analytics and Informatics Using SAS

Leverage health data into insight! Applied Health Analytics and Informatics Using SAS describes health anamatics, a result of the intersection of data analytics and health informatics. Healthcare systems generate nearly a third of the world's data, and analytics can help to eliminate medical errors, reduce readmissions, provide evidence-based care, demonstrate quality outcomes, and deliver cost-efficient care. This comprehensive textbook includes data analytics and health informatics concepts, along with applied experiential learning exercises and case studies using SAS Enterprise Miner within the healthcare industry setting.

Topics covered include:
• Sampling and modeling health data, both structured and unstructured
• Exploring health data quality
• Developing health administration and health data assessment procedures
• Identifying future health trends
• Analyzing high-performance health data mining models

Applied Health Analytics and Informatics Using SAS is intended for professionals, lifelong learners, senior-level undergraduates, and graduate-level students in professional development courses, health informatics courses, health analytics courses, and specialized industry track courses. This textbook is accessible to a wide variety of backgrounds and specialty areas, including administrators, clinicians, and executives. This book is part of the SAS Press program.

Python for R Users

The definitive guide for statisticians and data scientists who understand the advantages of becoming proficient in both R and Python. The first book of its kind, Python for R Users: A Data Science Approach makes it easy for R programmers to code in Python and Python users to program in R. Short on theory and long on actionable analytics, it provides readers with a detailed comparative introduction and overview of both languages and features concise tutorials with command-by-command translations, complete with sample code, of R to Python and Python to R. Following an introduction to both languages, the author cuts to the chase with step-by-step coverage of the full range of pertinent programming features and functions, including data input, data inspection and data quality, data analysis, and data visualization. Statistical modeling, machine learning, and data mining, including supervised and unsupervised methods, are treated in detail, as are time series forecasting, text mining, and natural language processing.

• Features a quick-learning format with concise tutorials and actionable analytics
• Provides command-by-command translations of R to Python and vice versa
• Incorporates Python and R code throughout to make it easier for readers to compare and contrast features in both languages
• Offers numerous comparative examples and applications in both programming languages
• Designed for practitioners and students who know one language and want to learn the other
• Supplies slides useful for teaching and learning either software on a companion website

Python for R Users: A Data Science Approach is a valuable working resource for computer scientists and data scientists who know R and would like to learn Python, or who are familiar with Python and want to learn R. It also functions as a textbook for students of computer science and statistics.

A. Ohri is the founder of Decisionstats.com and currently works as a senior data scientist. He has advised multiple startups in analytics off-shoring, analytics services, and analytics education, as well as on using social media to enhance buzz for analytics products. Mr. Ohri's research interests include spreading open source analytics, analyzing social media manipulation with mechanism design, simpler interfaces for cloud computing, and investigating climate change and knowledge flows. His other books include R for Business Analytics and R for Cloud Computing.
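For a flavor of the command-by-command translation approach described above, here are a few common R data-inspection commands paired with their usual pandas counterparts. These pairs are standard equivalences, not excerpts from the book.

```python
# Common R data-inspection commands and their usual pandas counterparts.
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": ["a", "b", "b", None]})

# R: head(df)
print(df.head())
# R: str(df) -- structure, dtypes, and non-null counts
df.info()
# R: summary(df) -- basic descriptive statistics
print(df.describe(include="all"))
# R: colSums(is.na(df)) -- missing values per column, a quick data quality check
print(df.isna().sum())
```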

Total Survey Error in Practice

Featuring a timely presentation of total survey error (TSE), this edited volume introduces valuable tools for understanding and improving survey data quality in the context of evolving large-scale data sets. This book provides an overview of the TSE framework and current TSE research as related to survey design, data collection, estimation, and analysis. It recognizes that survey data affects many public policy and business decisions and thus focuses on the framework for understanding and improving survey data quality. The book also addresses issues with data quality in official statistics and in social, opinion, and market research as these fields continue to evolve, leading to larger and messier data sets. This perspective challenges survey organizations to find ways to collect and process data more efficiently without sacrificing quality. The volume consists of the most up-to-date research and reporting from over 70 contributors representing the best academics and researchers from a range of fields. The chapters are broken out into five main sections: The Concept of TSE and the TSE Paradigm, Implications for Survey Design, Data Collection and Data Processing Applications, Evaluation and Improvement, and Estimation and Analysis. Each chapter introduces and examines multiple error sources, such as sampling error, measurement error, and nonresponse error, which often pose the greatest risks to data quality, while also encouraging readers not to lose sight of the less commonly studied error sources, such as coverage error, processing error, and specification error. The book also notes the relationships between errors and the ways in which efforts to reduce one type can increase another, resulting in an estimate with larger total error.

This book:
• Features various error sources, and the complex relationships between them, in 25 high-quality chapters on the most up-to-date research in the field of TSE
• Provides comprehensive reviews of the literature on error sources as well as data collection approaches and estimation methods to reduce their effects
• Presents examples of recent international events that demonstrate the effects of data error, the importance of survey data quality, and the real-world issues that arise from these errors
• Spans the four pillars of the total survey error paradigm (design, data collection, evaluation, and analysis) to address key data quality issues in official statistics and survey research

Total Survey Error in Practice is a reference for survey researchers and data scientists in research areas that include social science, public opinion, public policy, and business. It can also be used as a textbook or supplementary material for a graduate-level course in survey research methods.

Paul P. Biemer, PhD, is distinguished fellow at RTI International and associate director of Survey Research and Development at the Odum Institute, University of North Carolina, USA. Edith de Leeuw, PhD, is professor of survey methodology in the Department of Methodology and Statistics at Utrecht University, the Netherlands. Stephanie Eckman, PhD, is fellow at RTI International, USA. Brad Edwards is vice president, director of Field Services, and deputy area director at Westat, USA. Frauke Kreuter, PhD, is professor and director of the Joint Program in Survey Methodology, University of Maryland, USA; professor of statistics and methodology at the University of Mannheim, Germany; and head of the Statistical Methods Research Department at the Institute for Employment Research, Germany. Lars E. Lyberg, PhD, is senior advisor at Inizio, Sweden. N. Clyde Tucker, PhD, is principal survey methodologist at the American Institutes for Research, USA. Brady T. West, PhD, is research associate professor in the Survey Research Center at the University of Michigan, USA.
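For readers new to the TSE framework discussed above, the following is a common textbook summary of how total survey error is quantified; it is not a formula quoted from this volume, and it treats source-level biases as approximately additive, which is a simplifying assumption.

```latex
% Common textbook summary of total survey error (not quoted from this volume):
% the mean squared error of a survey estimate combines variance and squared bias,
% with contributions accumulated across error sources.
\[
  \operatorname{MSE}(\hat{\theta})
    = \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^{2},
  \qquad
  \operatorname{Bias}(\hat{\theta}) \approx \sum_{s \in S} b_{s},
\]
% where S indexes error sources (sampling, coverage, nonresponse, measurement,
% processing, specification) and b_s is the net bias contributed by source s.
```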

SAS Data Analytic Development

Design quality SAS software and evaluate SAS software quality SAS Data Analytic Development is the developer’s compendium for writing better-performing software and the manager’s guide to building comprehensive software performance requirements. The text introduces and parallels the International Organization for Standardization (ISO) software product quality model, demonstrating 15 performance requirements that represent dimensions of software quality, including: reliability, recoverability, robustness, execution efficiency (i.e., speed), efficiency, scalability, portability, security, automation, maintainability, modularity, readability, testability, stability, and reusability. The text is intended to be read cover-to-cover or used as a reference tool to instruct, inspire, deliver, and evaluate software quality. A common fault in many software development environments is a focus on functional requirements—the what and how—to the detriment of performance requirements, which specify instead how well software should function (assessed through software execution) or how easily software should be maintained (assessed through code inspection). Without the definition and communication of performance requirements, developers risk either building software that lacks intended quality or wasting time delivering software that exceeds performance objectives—thus, either underperforming or gold-plating, both of which are undesirable. Managers, customers, and other decision makers should also understand the dimensions of software quality both to define performance requirements at project outset as well as to evaluate whether those objectives were met at software completion. As data analytic software, SAS transforms data into information and ultimately knowledge and data-driven decisions. Not surprisingly, data quality is a central focus and theme of SAS literature; however, code quality is far less commonly described and too often references only the speed or efficiency with which software should execute, omitting other critical dimensions of software quality. SAS® software project definitions and technical requirements often fall victim to this paradox, in which rigorous quality requirements exist for data and data products yet not for the software that undergirds them. By demonstrating the cost and benefits of software quality inclusion and the risk of software quality exclusion, stakeholders learn to value, prioritize, implement, and evaluate dimensions of software quality within risk management and project management frameworks of the software development life cycle (SDLC). Thus, SAS Data Analytic Development recalibrates business value, placing code quality on par with data quality, and performance requirements on par with functional requirements.

Designing and Conducting Survey Research: A Comprehensive Guide, 4th Edition

The industry standard guide, updated with new ideas and SPSS analysis techniques. Designing and Conducting Survey Research: A Comprehensive Guide, Fourth Edition is the industry standard resource that covers all major components of the survey process, updated to include new data analysis techniques and SPSS procedures with sample data sets online. The book offers practical, actionable guidance on constructing the instrument, administering the process, and analyzing and reporting the results, providing extensive examples and worksheets that demonstrate the appropriate use of survey and data techniques. By clarifying complex statistical concepts and modern analysis methods, this guide enables readers to conduct a survey research project from initial focus concept to the final report.

Public and nonprofit managers with survey research responsibilities need to stay up to date on the latest methods, techniques, and best practices for optimal data collection, analysis, and reporting. Designing and Conducting Survey Research is a complete resource, answering the "what", "why", and "how" every step of the way, and providing the latest information about technological advancements in data analysis. The updated fourth edition contains step-by-step SPSS data entry and analysis procedures, as well as SPSS examples throughout the text, using real data sets from real-world studies. Other new topics include:
• Nonresponse error/bias
• Ethical concerns and special populations
• Cell phone samples in telephone surveys
• Subsample screening and complex skip patterns

The fourth edition also contains new information on the growing importance of focus groups, and places a special emphasis on data quality, including size and variability. Those who employ survey research methods will find that Designing and Conducting Survey Research contains all the information needed to better design, conduct, and analyze a more effective survey.

Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP and SAS

Improve efficiency while reducing costs in clinical trials with centralized monitoring techniques using JMP and SAS.

International guidelines recommend that clinical trial data should be actively reviewed or monitored; the well-being of trial participants and the validity and integrity of the final analysis results are at stake. Traditional interpretation of this guidance for pharmaceutical trials has led to extensive on-site monitoring, including 100% source data verification. On-site review is time consuming, expensive (estimated at up to a third of the cost of a clinical trial), prone to error, and limited in its ability to provide insight for data trends across time, patients, and clinical sites. In contrast, risk-based monitoring (RBM) makes use of central computerized review of clinical trial data and site metrics to determine if and when clinical sites should receive more extensive quality review or intervention.

Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP and SAS presents a practical implementation of methodologies within JMP Clinical for the centralized monitoring of clinical trials. Focused on intermediate users, this book describes analyses for RBM that incorporate and extend the recommendations of TransCelerate BioPharma Inc., methods to detect potential patient or investigator misconduct, snapshot comparisons to more easily identify new or modified data, and other novel visual and analytical techniques to enhance safety and quality reviews. Further discussion highlights recent regulatory guidance documents on risk-based approaches, addresses the requirements for CDISC data, and describes methods to supplement analyses with data captured external to the study database.

Given the interactive, dynamic, and graphical nature of JMP Clinical, any individual from the clinical trial team - including clinicians, statisticians, data managers, programmers, regulatory associates, and monitors - can make use of this book and the numerous examples contained within to streamline, accelerate, and enrich their reviews of clinical trial data.

The analytical methods described in Risk-Based Monitoring and Fraud Detection in Clinical Trials Using JMP and SAS enable the clinical trial team to take a proactive approach to data quality and safety to streamline clinical development activities and address shortcomings while the study is ongoing.

This book is part of the SAS Press program.

Using OpenRefine

Using OpenRefine provides a comprehensive guide to managing and cleaning large datasets efficiently. By following a practical, recipe-based approach, this book ensures readers can quickly master OpenRefine's features to enhance their data handling skills. Whether dealing with transformations, entity recognition, or dataset linking, you'll gain the tools to make your data work for you.

What this Book will help me do
• Import and structure various formats of data for seamless processing.
• Apply both basic and advanced transformations to optimize data quality (a rough Python analogue of one such cleanup follows this entry).
• Utilize regular expressions for sophisticated filtering and partitioning.
• Perform named-entity extraction and advanced reconciliation tasks.
• Master the General Refine Expression Language for powerful data operations.

Author(s)
The author is an experienced data analyst and educator, specializing in data preparation and transformation for real-world applications. Their approach combines a thorough technical understanding with an accessible teaching style, ensuring that complex concepts are easy to grasp.

Who is it for?
This book is crafted for anyone working with large datasets, from novices learning to handle and clean data to experienced practitioners seeking advanced techniques. If you aim to improve your data management skills or deliver quality insights from messy data, this book is for you.
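OpenRefine performs this sort of cleanup through facets, clustering, and GREL expressions; the snippet below is only a rough Python/regex analogue of one such transformation (normalizing case, punctuation, and whitespace so near-duplicate values cluster together).

```python
# Normalize messy string values so near-duplicates group under one key.
import re

raw_names = ["  Acme Corp ", "acme  corp", "ACME CORP.", "Globex Inc"]

def normalise(name: str) -> str:
    name = re.sub(r"[.\s]+", " ", name).strip()   # collapse dots and runs of whitespace
    return name.lower()

clusters = {}
for name in raw_names:
    clusters.setdefault(normalise(name), []).append(name)

print(clusters)
# {'acme corp': ['  Acme Corp ', 'acme  corp', 'ACME CORP.'], 'globex inc': ['Globex Inc']}
```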