data

Beginning R 4: From Beginner to Pro

2020-10-17 · O'Reilly Data Science Books O'Reilly Amazon

book

by Joshua F. Wiley , Matt Wiley

Big Data DataViz data-science data-science-tools r

Learn how to use R 4, write and save R scripts, read in and write out data files, use built-in functions, and understand common statistical methods. This in-depth tutorial includes key R 4 features including a new color palette for charts, an enhanced reference counting system (useful for big data), and new data import settings for text (as well as the statistical methods to model text-based, categorical data). Each chapter starts with a list of learning outcomes and concludes with a summary of any R functions introduced in that chapter, along with exercises to test your new knowledge. The text opens with a hands-on installation of R and CRAN packages for both Windows and macOS. The bulk of the book is an introduction to statistical methods (non-proof-based, applied statistics) that relies heavily on R (and R visualizations) to understand, motivate, and conduct statistical tests and modeling. Beginning R 4 shows the use of R in specific cases such as ANOVA analysis, multiple and moderated regression, data visualization, hypothesis testing, and more. It takes a hands-on, example-based approach incorporating best practices with clear explanations of the statistics being done. You will: Acquire and install R and RStudio Import and export data from multiple file formats Analyze data and generate graphics (including confidence intervals) Interactively conduct hypothesis testing Code multiple and moderated regression solutions Who This Book Is For Programmers and data analysts who are new to R. Some prior experience in programming is recommended.

SQL Server Data Automation Through Frameworks: Building Metadata-Driven Frameworks with T-SQL, SSIS, and Azure Data Factory

2020-10-17 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Kent Bradshaw , Andy Leonard

Azure ADF Cloud Computing DevOps ETL/ELT Microsoft SQL SSIS data-engineering microsoft-sql-server relational-databases

Learn to automate SQL Server operations using frameworks built from metadata-driven stored procedures and SQL Server Integration Services (SSIS). Bring all the power of Transact-SQL (T-SQL) and Microsoft .NET to bear on your repetitive data, data integration, and ETL processes. Do this for no added cost over what you’ve already spent on licensing SQL Server. The tools and methods from this book may be applied to on-premises and Azure SQL Server instances. The SSIS framework from this book works in Azure Data Factory (ADF) and provides DevOps personnel the ability to execute child packages outside a project, functionality not natively available in SSIS. Frameworks not only reduce the time required to deliver enterprise functionality, but can also accelerate troubleshooting and problem resolution. You'll learn in this book how frameworks also improve code quality by using metadata to drive processes. Much of the work performed by data professionals can be classified as “drudge work”: tasks that are repetitive and template-based. The frameworks-based approach shown in this book helps you to avoid that drudgery by turning repetitive tasks into "one and done" operations. Frameworks as described in this book also support enterprise DevOps with built-in logging functionality. What You Will Learn Create a stored procedure framework to automate SQL process execution Base your framework on a working system of stored procedures and execution logging Create an SSIS framework to reduce the complexity of executing multiple SSIS packages Deploy stored procedure and SSIS frameworks to Azure Data Factory environments in the cloud Who This Book Is For Database administrators and developers who are involved in enterprise data projects built around stored procedures and SQL Server Integration Services (SSIS). Readers should have a background in programming along with a desire to optimize their data efforts by implementing repeatable processes that support enterprise DevOps.

Mastering SAS Programming for Data Warehousing

2020-10-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Monika Wahi

DataViz DWH ETL/ELT SAS analytics-platforms data-science

"Mastering SAS Programming for Data Warehousing" dives into the effective use of SAS for handling large-scale data environments like data warehouses and data lakes. You will learn to design and manage ETL processes using SAS, standardize workflows with macros and arrays, and connect SAS to other systems to enhance reporting and data visualization. What this Book will help me do Master efficient data input/output management in SAS environments. Design and maintain robust ETL pipelines using SAS macros and arrays. Identify and address data warehouse user requirements. Utilize Output Delivery System (ODS) to create professional reports. Integrate SAS with external systems for optimized data processing. Author(s) Monika Wahi brings extensive SAS programming experience coupled with a strong background in data warehousing and data analysis. Her insightful approach demystifies complex topics, focusing on equipping readers with practical skills. Her collaborative writing style makes advanced concepts accessible and applicable to real-world scenarios. Who is it for? This book is designed for data professionals such as architects, managers leading data-intensive projects, and SAS programmers or developers. It's ideal for those with foundational SAS experience who aspire to manage, maintain, or develop data lakes, marts, or warehouses effectively. The book offers a logical progression from basic concepts to advanced implementations, tailored for ambitious learners.

EU GDPR – An international guide to compliance

2020-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Alan Calder

GDPR/CCPA data-engineering data-security-privacy eu-general-data-protection-regulation-gdpr eu general data protection regulation (gdpr)

This pocket guide will help you understand the Regulation, the broader principles of data protection, and what the GDPR means for businesses in Europe and beyond. Please visit https://www.itgovernancepublishing.co.uk/topic/uk-gdpr-supplemental-material to download your free Brexit supplement.

EU General Data Protection Regulation (GDPR) – An implementation and compliance guide, fourth edition

2020-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by IT Governance Privacy Team

GDPR/CCPA data-engineering data-security-privacy eu-general-data-protection-regulation-gdpr eu general data protection regulation (gdpr)

This bestselling guide is the ideal companion for anyone carrying out a GDPR (General Data Protection Regulation) compliance project. It provides comprehensive guidance and practical advice on complying with the Regulation. Visit https://www.itgovernancepublishing.co.uk/topic/uk-gdpr-supplemental-material to download your free Brexit supplement.

Security and Privacy Issues in IoT Devices and Sensor Networks

2020-10-15 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Narayan C. Debnath , Sudhir Kumar Sharma , Bharat Bhushan

AI/ML Analytics Cloud Computing IoT Cyber Security data-engineering data-security-privacy data security & privacy

Security and Privacy Issues in IoT Devices and Sensor Networks investigates security breach issues in IoT and sensor networks, exploring various solutions. The book follows a two-fold approach, first focusing on the fundamentals and theory surrounding sensor networks and IoT security. It then explores practical solutions that can be implemented to develop security for these elements, providing case studies to enhance understanding. Machine learning techniques are covered, as well as other security paradigms, such as cloud security and cryptocurrency technologies. The book highlights how these techniques can be applied to identify attacks and vulnerabilities, preserve privacy, and enhance data security. This in-depth reference is ideal for industry professionals dealing with WSN and IoT systems who want to enhance the security of these systems. Additionally, researchers, material developers and technology specialists dealing with the multifarious aspects of data privacy and security enhancement will benefit from the book's comprehensive information. Provides insights into the latest research trends and theory in the field of sensor networks and IoT security Presents machine learning-based solutions for data security enhancement Discusses the challenges to implement various security techniques Informs on how analytics can be used in security and privacy

Artificial Intelligence in Finance

2020-10-14 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Yves Hilpisch

AI/ML Data Science Python ai-ml artificial-intelligence-ai artificial intelligence (ai)

The widespread adoption of AI and machine learning is revolutionizing many industries today. Once these technologies are combined with the programmatic availability of historical and real-time financial data, the financial industry will also change fundamentally. With this practical book, you'll learn how to use AI and machine learning to discover statistical inefficiencies in financial markets and exploit them through algorithmic trading. Author Yves Hilpisch shows practitioners, students, and academics in both finance and data science practical ways to apply machine learning and deep learning algorithms to finance. Thanks to lots of self-contained Python examples, you'll be able to replicate all results and figures presented in the book. In five parts, this guide helps you: Learn central notions and algorithms from AI, including recent breakthroughs on the way to artificial general intelligence (AGI) and superintelligence (SI) Understand why data-driven finance, AI, and machine learning will have a lasting impact on financial theory and practice Apply neural networks and reinforcement learning to discover statistical inefficiencies in financial markets Identify and exploit economic inefficiencies through backtesting and algorithmic trading--the automated execution of trading strategies Understand how AI will influence the competitive dynamics in the financial industry and what the potential emergence of a financial singularity might bring about

Oracle Database Transactions and Locking Revealed: Building High Performance Through Concurrency

2020-10-14 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Darl Kuhn , Thomas Kyte

Oracle data-engineering oracle-database-solutions

Access much-needed information for building scalable, high-concurrency applications and deploying them against the Oracle Database. This new edition is updated to be current with Oracle Database 19. It includes a new chapter with troubleshooting recipes to help you quickly diagnose and resolve locking problems that are urgent and block production. Good transaction design is an important facet of highly-concurrent applications that are run by hundreds, even thousands, of users who are executing transactions at the same time. Transaction design, in turn, relies on a good understanding of how the database engine manages the locking of resources to prevent access conflicts and data loss that might otherwise result from concurrent access to data in the database. This book provides a solid and accurate explanation of how locking and concurrency are dealt with by Oracle Database. You will learn how the Oracle Database architecture accommodates user transactions, and how you can write code to mesh with the way in which Oracle Database is designed to operate. Oracle Database Transactions and Locking Revealed covers in detail the various lock types, and also different locking schemes such as pessimistic and optimistic locking. Then you will learn about transaction isolation and multi-version concurrency, and how the various lock types support Oracle Database’s transactional features. You will learn tips for transaction design, as well as some bad practices and habits to avoid. Coverage is also given to redo and undo, and their role in concurrency. The book is loaded with insightful code examples that drive home each concept. This is an important book that anyone developing highly-concurrent applications will want to have handy on their shelf. 
What You Will Learn Avoid application lockups due to conflicts over accessing the same resource Understand how Oracle prevents one application from overwriting another’s modifications Create transaction designs that mesh with how Oracle Database is designed Build high-throughput applications supporting thousands of concurrent users Design applications to take full advantage of Oracle’s powerful database engine Gain a fundamental knowledge of Oracle’s transaction and locking architecture Develop techniques to quickly diagnose and resolve common locking issues Who This Book Is For Oracle developers and database administrators faced with troubleshooting and solving deadlocks, locking contention, and similar problems that are encountered in high-concurrency environments; and application developers wanting to design their applications to excel at multi-user concurrency by taking full advantage of Oracle Database’s multi-versioning and concurrency support

Advanced Analytics in Power BI with R and Python: Ingesting, Transforming, Visualizing

2020-10-13 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ryan Wade (Blue Granite)

AI/ML Analytics BI Data Analytics Data Science IBM Microsoft Power BI Python SQL business-intelligence data-science +2 more

This easy-to-follow guide provides R and Python recipes to help you learn and apply the top languages in the field of data analytics to your work in Microsoft Power BI. Data analytics expert and author Ryan Wade shows you how to use R and Python to perform tasks that are extremely hard, if not impossible, to do using native Power BI tools. For example, you will learn to score Power BI data using custom data science models and powerful models from Microsoft Cognitive Services. The R and Python languages are powerful complements to Power BI. They enable advanced data transformation techniques that are difficult to perform in Power BI in its default configuration but become easier by leveraging the capabilities of R and Python. If you are a business analyst, data analyst, or a data scientist who wants to push Power BI and transform it from being just a business intelligence tool into an advanced data analytics tool, then this is the book to help you do that. What You Will Learn Create advanced data visualizations via R using the ggplot2 package Ingest data using R and Python to overcome some limitations of Power Query Apply machine learning models to your data using R and Python without the need of Power BI premium capacity Incorporate advanced AI in Power BI without the need of Power BI premium capacity via Microsoft Cognitive Services, IBM Watson Natural Language Understanding, and pre-trained models in SQL Server Machine Learning Services Perform advanced string manipulations not otherwise possible in Power BI using R and Python Who This Book Is For Power users, data analysts, and data scientists who want to go beyond Power BI’s built-in functionality to create advanced visualizations, transform data in ways not otherwise supported, and automate data ingestion from sources such as SQL Server and Excel in a more concise way

Practical Data Migration, 3rd Edition

2020-10-12 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Johny Morris

data-engineering data-migration

This book is for executives and practitioners tasked with the movement of data from old systems to a new repository. It uses a series of steps guaranteed to get the reader from an empty new system to one that is working and backed by the user population. Using this proven methodology will vastly increase the chances of a successful migration.

Predictive Intelligence in Biomedical and Health Informatics

2020-10-12 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ashish Khanna , Rajshree Srivastava , Nhu Gia Nguyen , Siddhartha Bhattacharyya

bioinformatics data-science data-science-domains

Predictive Intelligence in Biomedical and Health Informatics focuses on imaging, computer-aided diagnosis and therapy as well as intelligent biomedical image processing and analysis. It develops computational models, methods and tools for biomedical engineering related to computer-aided diagnostics (CAD), computer-aided surgery (CAS), computational anatomy and bioinformatics. Large volumes of complex data are often a key feature of biomedical and engineering problems and computational intelligence helps to address such problems. Practical and validated solutions to hard biomedical and engineering problems can be developed by the applications of neural networks, support vector machines, reservoir computing, evolutionary optimization, biosignal processing, pattern recognition methods and other techniques to address complex problems of the real world.

Stochastic Dynamics of Economic Cycles

2020-10-12 · O'Reilly Data Science Books O'Reilly Amazon

book

by Viacheslav Karmalita

data-science data-science-tasks statistics

This book includes discussions related to solutions of such tasks as: probabilistic description of the investment function; recovering the income function from GDP estimates; development of models for the economic cycles; selecting the time interval of pseudo-stationarity of cycles; estimating characteristics/parameters of cycle models; analysis of accuracy of model factors. All of the above constitute the general principles of a theory explaining the phenomenon of economic cycles and provide mathematical tools for their quantitative description. The introduced theory is applicable to macroeconomic analyses as well as econometric estimations of economic cycles.

Learn PostgreSQL

2020-10-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Luca Ferrari (Bending Spoons) , Enrico Pirozzi

RDBMS SQL data-engineering postgresql relational-databases

Dive into the world of PostgreSQL, one of the most powerful and versatile open-source relational databases! This book guides you through all the essentials of PostgreSQL versions 12 and 13, from installation to high-performance database deployments. You'll learn how to design schemas, perform database operations efficiently, and implement advanced functionalities. What this Book will help me do Install, configure, and monitor a PostgreSQL server for optimal performance. Implement SQL and PL/pgSQL scripts to build complex database solutions. Analyze and optimize database schemas and indexes for efficiency. Secure a PostgreSQL database and manage roles and permissions effectively. Set up high-availability configurations through replication techniques. Author(s) Luca Ferrari and Enrico Pirozzi are seasoned database professionals with extensive experience in PostgreSQL. They bring practical expertise and a real-world perspective to the subject, ensuring you get hands-on knowledge and apply it effectively. Their approachable writing style simplifies even the most complex database concepts. Who is it for? This book is perfect for database professionals, developers, or tech enthusiasts looking to gain mastery over PostgreSQL. Whether you are new to PostgreSQL or have a fundamental understanding of databases, you'll find this book highly insightful in achieving your database management goals.

Data Lake Analytics on Microsoft Azure: A Practitioner's Guide to Big Data Engineering

2020-10-08 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Pankaj Khattar , Harsh Chawla

AI/ML Analytics Azure Big Data Cosmos Data Analytics Data Engineering Data Lake Data Science Databricks DWH Hadoop +7 more

Get a 360-degree view of how the journey of data analytics solutions has evolved from monolithic data stores and enterprise data warehouses to data lakes and modern data warehouses. This book includes comprehensive coverage of how: To architect data lake analytics solutions by choosing suitable technologies available on Microsoft Azure The advent of microservices applications covering ecommerce or modern solutions built on IoT and how real-time streaming data has completely disrupted this ecosystem These data analytics solutions have been transformed from solely understanding the trends from historical data to building predictions by infusing machine learning technologies into the solutions Data platform professionals who have been working on relational data stores, non-relational data stores, and big data technologies will find the content in this book useful. The book also can help you start your journey into the data engineer world as it provides an overview of advanced data analytics and touches on data science concepts and various artificial intelligence and machine learning technologies available on Microsoft Azure. What Will You Learn You will understand the: Concepts of data lake analytics, the modern data warehouse, and advanced data analytics Architecture patterns of the modern data warehouse and advanced data analytics solutions Phases (such as Data Ingestion, Store, Prep and Train, and Model and Serve) of data analytics solutions and technology choices available on Azure under each phase In-depth coverage of real-time and batch mode data analytics solutions architecture Various managed services available on Azure such as Synapse analytics, event hubs, Stream analytics, CosmosDB, and managed Hadoop services such as Databricks and HDInsight Who This Book Is For Data platform professionals, database architects, engineers, and solution architects

Discrete Signals and Systems with MATLAB®, 3rd Edition

2020-10-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Taan S. ElAli

MATLAB data-science data-science-tools

The subject of Discrete Signals and Systems is broad and deserves a single book devoted to it. The objective of this textbook is to present all the required material that an undergraduate student will need to master this subject matter and the use of MATLAB.

Pro Microsoft Power Platform: Solution Building for the Citizen Developer

2020-10-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mitchell Pearson , Manuel Quintana , Brian Knight , Devin Knight

BI Microsoft Power BI business-intelligence data-science microsoft-power-platform

Become a self-sufficient citizen developer by learning the tools within the Microsoft Power Platform and how they can be used together to drive change and multiply your productivity. Learn about PowerApps for building applications, Power Automate for automating business processes across those applications, and Power BI for analyzing results and communicating business intelligence through compelling visuals. By understanding the purpose and capabilities of these tools, you will be able to enhance your organization’s visibility into key areas and make informed business decisions in a timely manner. This book is divided into four parts and begins in Part I by showing you how to build applications through PowerApps. You will learn about screens and controls, application sharing and administration, and how to make your applications accessible from mobile devices such as phones and tablets. Part II is about creating workflows using Power Automate that implement business logic across your applications. Part III brings in dashboards and data analysis, showing you how to connect to a data source, cleanse the data from that source, and drive decision making through interactive reports and storytelling. Part IV brings together all the pieces by showing the integrations that are possible when all three tools are combined into a single solution.
What You Will Learn Understand the need for the citizen developer in today’s business environment Organize and plan the building of line-of-business applications with PowerApps solutions Replace wasteful paper processes with automated applications built in PowerApps Automate workflows across processes with Power Automate Communicate analytical results through visualizations and storytelling Integrate PowerApps, Power Automate, and Power BI into solutions that multiply productivity Who This Book Is For Power users and analysts with strong Excel skills who need a more comprehensive set of tools that can better help them accomplish their vision on projects, those familiar with one of the Power Platform tools who wish to learn how all three can fit together, and those who are seen as “rogue IT” problem solvers who get things done when others have tried but failed

IBM Storage Solutions for SAS Analytics using IBM Spectrum Scale and IBM Elastic Storage System 3000 Version 1 Release 1

2020-10-06 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Sanjay Sudam

Analytics ELK IBM Linux SAS data-engineering

This IBM® Redpaper® publication is a blueprint for configuration, testing results, and tuning guidelines for running SAS workloads on Red Hat Enterprise Linux that use IBM Spectrum® Scale and IBM Elastic Storage® System (ESS) 3000. IBM lab validation was conducted with the Red Hat Linux nodes running with the SAS simulator scripts that are connected to the IBM Spectrum Scale and IBM ESS 3000. Simultaneous workloads are simulated across multiple x-86 nodes running with Red Hat Linux to determine scalability against the IBM Spectrum Scale clustered file system and ESS 3000 array. This paper outlines the architecture, configuration details, and performance tuning to maximize SAS application performance with the IBM Spectrum Scale 5.0.4.3 and IBM ESS 3000. This document is intended to facilitate the deployment and configuration of the SAS applications that use IBM Spectrum Scale and IBM Elastic Storage System (ESS) 3000. The information in this document is distributed on an "as is" basis without any warranty that is either expressed or implied. Support assistance for the use of this material is limited to situations where IBM Spectrum Scale or IBM ESS 3000 are supported and entitled and where the issues are specific to a blueprint implementation.

The A-Z of Careers and Jobs, 26th Edition

2020-10-03 · O'Reilly Data Science Books O'Reilly Amazon

book

by Kogan Page Editorial

data-science data-science-as-a-profession

Find your dream job with this handy and informative reference guide, packed with accessible advice on over 300 positions, including details on entry routes, qualifications, salary expectations and useful contacts.

AI and Machine Learning for Coders

2020-10-01 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Laurence Moroney

AI/ML Cloud Computing NLP TensorFlow ai-ml machine-learning

If you're looking to make a career move from programmer to AI specialist, this is the ideal place to start. Based on Laurence Moroney's extremely successful AI courses, this introductory book provides a hands-on, code-first approach to help you build confidence while you learn key topics. You'll understand how to implement the most common scenarios in machine learning, such as computer vision, natural language processing (NLP), and sequence modeling for web, mobile, cloud, and embedded runtimes. Most books on machine learning begin with a daunting amount of advanced math. This guide is built on practical lessons that let you work directly with the code. You'll learn: How to build models with TensorFlow using skills that employers desire The basics of machine learning by working with code samples How to implement computer vision, including feature detection in images How to use NLP to tokenize and sequence words and sentences Methods for embedding models in Android and iOS How to serve models over the web and in the cloud with TensorFlow Serving

Creating Good Data: A Guide to Dataset Structure and Data Representation

2020-10-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Harry J. Foxwell

Analytics Data Analytics data-science data-science-tasks data-visualization

Create good data from the start, rather than fixing it after it is collected. By following the guidelines in this book, you will be able to conduct more effective analyses and produce timely presentations of research data. Data analysts are often presented with datasets for exploration and study that are poorly designed, leading to difficulties in interpretation and to delays in producing meaningful results. Much data analytics training focuses on how to clean and transform datasets before serious analyses can even be started. Inappropriate or confusing representations, unit of measurement choices, coding errors, missing values, outliers, etc., can be avoided by using good dataset design and by understanding how data types determine the kinds of analyses which can be performed. This book discusses the principles and best practices of dataset creation, and covers basic data types and their related appropriate statistics and visualizations. A key focus of the book is why certain data types are chosen for representing concepts and measurements, in contrast to the typical discussions of how to analyze a specific data type once it has been selected. What You Will Learn Be aware of the principles of creating and collecting data Know the basic data types and representations Select data types, anticipating analysis goals Understand dataset structures and practices for analyzing and sharing Be guided by examples and use cases (good and bad) Use cleaning tools and methods to create good data Who This Book Is For Researchers who design studies and collect data and subsequently conduct and report the results of their analyses can use the best practices in this book to produce better descriptions and interpretations of their work. In addition, data analysts who explore and explain data of other researchers will be able to create better datasets.

talk-data.com
