talk-data.com

Topic

CSV

Comma-Separated Values (CSV)

tabular_data text_based human_readable


Activities

Filtering by: O'Reilly Data Science Books

DuckDB: Up and Running

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool. Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL.

- Understand the purpose of DuckDB and its main functions
- Conduct data analytics tasks using DuckDB
- Integrate DuckDB with pandas, Polars, and JupySQL
- Use DuckDB to query your data
- Perform spatial analytics using DuckDB's spatial extension
- Work with a diverse range of data including Parquet, CSV, and JSON

DuckDB in Action

Dive into DuckDB and start processing gigabytes of data with ease, all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB, all through hands-on examples.

Open up DuckDB in Action and learn how to:

- Read and process data from CSV, JSON and Parquet sources, both local and remote
- Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables
- Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames
- Prepare, ingest and query large datasets
- Build cloud data pipelines
- Extend DuckDB with custom functionality

Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation; you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines.

About the Technology: DuckDB makes data analytics fast and fun! You don’t need to set up Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres.

About the Book: DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action.

What's Inside:

- Prepare, ingest and query large datasets
- Build cloud data pipelines
- Extend DuckDB with custom functionality
- Fast-paced SQL recap: From simple queries to advanced analytics

About the Reader: For data pros comfortable with Python and CLI tools.

About the Authors: Mark Needham is a blogger and video creator at @LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j.

Quotes:

I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck

An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks

Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee

Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable

Pandas Workout

Practice makes perfect pandas! Work out your pandas skills against dozens of real-world challenges, each carefully designed to build an intuitive knowledge of essential pandas tasks.

In Pandas Workout you’ll learn how to:

- Clean your data for accurate analysis
- Work with rows and columns for retrieving and assigning data
- Handle indexes, including hierarchical indexes
- Read and write data with a number of common formats, such as CSV and JSON
- Process and manipulate textual data from within pandas
- Work with dates and times in pandas
- Perform aggregate calculations on selected subsets of data
- Produce attractive and useful visualizations that make your data come alive

Pandas Workout hones your pandas skills to a professional level through two hundred exercises, each designed to strengthen your pandas skills. You’ll test your abilities against common pandas challenges such as importing and exporting, data cleaning, visualization, and performance optimization. Each exercise utilizes a real-world scenario based on real-world data, from tracking the parking tickets in New York City, to working out which country makes the best wines. You’ll soon find your pandas skills becoming second nature, with no more trips to StackOverflow for what is now a natural part of your skillset.

About the Technology: Python’s pandas library can massively reduce the time you spend analyzing, cleaning, exploring, and manipulating data. And the only path to pandas mastery is practice, practice, and, you guessed it, more practice. In this book, Python guru Reuven Lerner is your personal trainer and guide through over 200 exercises guaranteed to boost your pandas skills.

About the Book: Pandas Workout is a thoughtful collection of practice problems, challenges, and mini-projects designed to build your data analysis skills using Python and pandas. The workouts use realistic data from many sources: the New York taxi fleet, Olympic athletes, SAT scores, oil prices, and more. Each can be completed in ten minutes or less. You’ll explore pandas’ rich functionality for string and date/time handling, complex indexing, and visualization, along with practical tips for every stage of a data analysis project.

What's Inside:

- Clean data with less manual labor
- Retrieving and assigning data
- Process and manipulate text
- Calculations on selected data subsets

About the Reader: For Python programmers and data analysts.

About the Author: Reuven M. Lerner teaches Python and data science around the world and publishes the “Bamboo Weekly” newsletter. He is the author of Manning’s Python Workout (2020).

Quotes:

A carefully crafted tour through the pandas library, jam-packed with wisdom that will help you become a better pandas user and a better data scientist. - Kevin Markham, Founder of Data School, Creator of pandas in 30 days

Will help you apply pandas to real problems and push you to the next level. - Michael Driscoll, RFA Engineering, creator of Teach Me Python

The explanations, paired with Reuven’s storytelling and personal tone, make the concepts simple. I’ll never get them wrong again! - Rodrigo Girão Serrão, Python developer and educator

The definitive source! - Kiran Anantha, Amazon

Data Science Fundamentals with R, Python, and Open Data

An introduction to the essential concepts and techniques of R and Python needed to start data science projects.

Organized with a strong focus on open data, Data Science Fundamentals with R, Python, and Open Data discusses concepts, techniques, tools, and first steps to carry out data science projects, with a focus on Python and RStudio, reflecting a clear industry trend toward integrating the two. The text examines intricacies and inconsistencies often found in real data, explaining how to recognize them and guiding readers through possible solutions, and enables readers to handle real data confidently and apply transformations to reorganize, index, aggregate, and elaborate. This book is full of reader interactivity, with a companion website hosting supplementary material including datasets used in the examples and complete running code (R scripts and Jupyter notebooks) of all examples. Exam-style and multiple-choice questions support readers’ active learning. Each chapter presents one or more case studies.

Written by a highly qualified academic, Data Science Fundamentals with R, Python, and Open Data discusses sample topics such as:

- Data organization and operations on data frames, covering reading CSV datasets and common errors, and slicing, creating, and deleting columns in R
- Logical conditions and row selection, covering selection of rows with logical conditions and operations on dates, strings, and missing values
- Pivoting operations and wide form-long form transformations, indexing by groups with multiple variables, and indexing by group and aggregations
- Conditional statements and iterations, multicolumn functions and operations, data frame joins, and handling data in list/dictionary format

Data Science Fundamentals with R, Python, and Open Data is a highly accessible learning resource for students from heterogeneous disciplines where Data Science and quantitative, computational methods are gaining popularity, along with hard sciences not closely related to computer science, and medical fields using stochastic and quantitative models.

Graph Algorithms for Data Science

Practical methods for analyzing your data with graphs, revealing hidden connections and new insights. Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implementation and deployment. You don’t need any graph experience to start benefiting from this insightful guide. These powerful graph algorithms are explained in clear, jargon-free text and illustrations that make them easy to apply to your own projects.

In Graph Algorithms for Data Science you will learn:

- Labeled-property graph modeling
- Constructing a graph from structured data such as CSV or SQL
- NLP techniques to construct a graph from unstructured data
- Cypher query language syntax to manipulate data and extract insights
- Social network analysis algorithms like PageRank and community detection
- How to translate graph structure to an ML model input with node embedding models
- Using graph features in node classification and link prediction workflows

Graph Algorithms for Data Science is a hands-on guide to working with graph-based data in applications like machine learning, fraud detection, and business data analysis. It’s filled with fascinating and fun projects, demonstrating the ins-and-outs of graphs. You’ll gain practical skills by analyzing Twitter, building graphs with NLP techniques, and much more.

About the Technology: A graph, put simply, is a network of connected data. Graphs are an efficient way to identify and explore the significant relationships naturally occurring within a dataset. This book presents the most important algorithms for graph data science with examples from machine learning, business applications, natural language processing, and more.

About the Book: Graph Algorithms for Data Science shows you how to construct and analyze graphs from structured and unstructured data. In it, you’ll learn to apply graph algorithms like PageRank, community detection/clustering, and knowledge graph models by putting each new algorithm to work in a hands-on data project. This cutting-edge book also demonstrates how you can create graphs that optimize input for AI models using node embedding.

What's Inside:

- Creating knowledge graphs
- Node classification and link prediction workflows
- NLP techniques for graph construction

About the Reader: For data scientists who know machine learning basics. Examples use the Cypher query language, which is explained in the book.

About the Author: Tomaž Bratanič works at the intersection of graphs and machine learning. Arturo Geigel was the technical editor for this book.

Quotes:

Undoubtedly the quickest route to grasping the practical applications of graph algorithms. Enjoyable and informative, with real-world business context and practical problem-solving. - Roger Yu, Feedzai

Brilliantly eases you into graph-based applications. - Sumit Pal, Independent Consultant

I highly recommend this book to anyone involved in analyzing large network databases. - Ivan Herreros, talentsconnect

Insightful and comprehensive. The author’s expertise is evident. Be prepared for a rewarding journey. - Michal Štefaňák, Volke

Data Science at the Command Line, 2nd Edition

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools, useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on text, CSV, HTML, XML, and JSON files
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow
- Create your own tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines
- Model data with dimensionality reduction, regression, and classification algorithms
- Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

Getting Started with SAS Programming

Get up and running with SAS using Ron Cody’s easy-to-follow, step-by-step guide. Aimed at beginners, Getting Started with SAS Programming: Using SAS Studio in the Cloud uses short examples to teach SAS programming from the basics to more advanced topics in the point-and-click interactive environment of SAS Studio. To begin, you will learn how to register for SAS OnDemand for Academics, an online delivery platform for teaching and learning statistical analysis that provides free access to SAS software via the cloud. The first part of the book shows you how to use SAS Studio built-in tasks to produce a report, summarize data, and create charts and graphs. It also describes how you can perform basic statistical tests using the interactive point-and-click environment. The second part of the book uses easy-to-follow examples to show you how to write your own SAS programs and how to use SAS procedures to perform a variety of tasks. This part of the book also explains how to read data from a variety of sources: text files, Excel workbooks, and CSV files. In order to get familiar with the SAS Studio environment, this book also shows you how to access dozens of interesting data sets that are included with the SAS OnDemand for Academics platform.

Learn RStudio IDE: Quick, Effective, and Productive Data Science

Discover how to use the popular RStudio IDE as a professional tool that includes code refactoring support, debugging, and Git version control integration. This book gives you a tour of RStudio and shows you how it helps you do exploratory data analysis; build data visualizations with ggplot; and create custom R packages and web-based interactive visualizations with Shiny. In addition, you will cover common data analysis tasks including importing data from diverse sources such as SAS files, CSV files, and JSON. You will map out the features in RStudio so that you will be able to customize RStudio to fit your own style of coding. Finally, you will see how to save a ton of time by adopting best practices and using packages to extend RStudio. Learn RStudio IDE is a quick, no-nonsense tutorial of RStudio that will give you a head start to develop the insights you need in your data science projects.

What You Will Learn:

- Quickly, effectively, and productively use RStudio IDE for building data science applications
- Install RStudio and program your first Hello World application
- Adopt the RStudio workflow
- Make your code reusable using RStudio
- Use RStudio and Shiny for data visualization projects
- Debug your code with RStudio
- Import CSV, SPSS, SAS, JSON, and other data

Who This Book Is For: Programmers who want to start doing data science, but don’t know what tools to focus on to get up to speed quickly.

Learn Chart.js

This book, 'Learn Chart.js', serves as a comprehensive guide to mastering Chart.js for creating stunning web-based data visualizations. By combining JavaScript, HTML5 Canvas, and Chart.js, you will understand how to turn raw data into interactive visual stories.

What this Book will help me do:

- Develop skills to create interactive and engaging data visualizations using the Chart.js library.
- Learn to efficiently load, parse, and handle data from external formats like CSV and JSON.
- Understand different chart types offered by Chart.js and learn when to best use each one.
- Gain the ability to customize Chart.js charts, such as adjusting properties for styling or animations.
- Acquire hands-on experience with practical examples, equipping you to apply what you learn in real-world scenarios.

Author(s): Helder da Rocha brings his extensive experience in programming and software development to this book, offering readers a clear and practical approach to mastering Chart.js. With a deep understanding of data visualization and web technologies, he conveys complex concepts in a straightforward way.

Who is it for? This book is ideal for web developers, data analysts, and designers who have basic proficiency in HTML, CSS, and JavaScript. It is particularly suited for professionals looking to create impactful web-based data visualizations using open-source tools. Additionally, the book assumes no prior knowledge of the Canvas element, making it accessible for Chart.js beginners.

Kibana 7 Quick Start Guide

Dive into the world of Kibana 7 with this hands-on guide that simplifies the process of visualizing and analyzing data using Elasticsearch. From fundamental concepts to advanced tools, this book enables you to create intuitive dashboards and leverage powerful machine learning capabilities effectively. Discover how to transform your data into actionable insights with ease.

What this Book will help me do:

- Configure Logstash to fetch and process CSV data for visualization.
- Master creating and managing index patterns within Kibana for efficient data navigation.
- Effectively apply filters to refine data presentations and insights.
- Develop and utilize machine learning jobs in Kibana to identify trends and anomalies.
- Create, customize, and share impactful visualizations and dashboards to drive data-driven decisions.

Author(s): Srivastava is a technical expert in data visualization and Elasticsearch tools, with practical experience implementing and teaching about the Elastic Stack. The author brings a hands-on approach to this book, simplifying complex concepts for ease of understanding. Their expertise ensures that the book serves both as a learning guide and a practical reference.

Who is it for? This book is ideal for developers and IT professionals who are either new to Kibana or looking to deepen their understanding of its visualization capabilities. It is suitable for individuals working with the Elastic Stack or seeking to leverage Kibana for data analysis purposes. Even if you are progressing from a novice to an intermediate level, this guide will provide future-proof skills to optimize your workflow.

Learning Apache Drill

Get up to speed with Apache Drill, an extensible distributed SQL query engine that reads massive datasets in many popular file formats such as Parquet, JSON, and CSV. Drill reads data in HDFS or in cloud-native storage such as S3 and works with Hive metastores along with distributed databases such as HBase, MongoDB, and relational databases. Drill works everywhere: on your laptop or in your largest cluster. In this practical book, Drill committers Charles Givre and Paul Rogers show analysts and data scientists how to query and analyze raw data using this powerful tool. Data scientists today spend about 80% of their time just gathering and cleaning data. With this book, you’ll learn how Drill helps you analyze data more effectively to drive down time to insight.

- Use Drill to clean, prepare, and summarize delimited data for further analysis
- Query file types including logfiles, Parquet, JSON, and other complex formats
- Query Hadoop, relational databases, MongoDB, and Kafka with standard SQL
- Connect to Drill programmatically using a variety of languages
- Use Drill even with challenging or ambiguous file formats
- Perform sophisticated analysis by extending Drill’s functionality with user-defined functions
- Facilitate data analysis for network security, image metadata, and machine learning

An Introduction to SAS University Edition

SAS ® OnDemand for Academics is now the primary software choice for learners. SAS OnDemand for Academics is available for free access to SAS for individual learners as well as university educators and students. Access to SAS University Edition will end Aug. 2, 2021; users will no longer be able to download it after Apr. 30, 2021. Get up and running with the SAS University Edition using Ron Cody’s easy-to-follow, step-by-step guide. Aimed at beginners who have downloaded the free SAS University Edition and want to either use the point-and-click interactive environment of SAS Studio, or who want to write their own SAS programs, or both, An Introduction to SAS University Edition, begins by showing you how to obtain the SAS University Edition, and how you can run SAS on a PC or Macintosh computer. The first part of the book shows you how to perform basic tasks, such as producing a report, summarizing data, producing charts and graphs, and using the SAS Studio built-in tasks. The first part also describes how you can perform basic statistical tests using the interactive point-and-click environment. The second part of the book shows you how to write your own SAS programs, and how to use SAS procedures to perform a variety of tasks. This part of the book also explains how to read data from a variety of sources: text files, Excel workbooks, and CSV files. In order to get familiar with the SAS Studio environment, this book also shows you how to access dozens of interesting data sets that are included with the product.

Preparing Data for Analysis with JMP

Access and clean up data easily using JMP®! Data acquisition and preparation commonly consume approximately 75% of the effort and time of total data analysis. JMP provides many visual, intuitive, and even innovative data-preparation capabilities that enable you to make the most of your organization's data. Preparing Data for Analysis with JMP® is organized within a framework of statistical investigations and model-building and illustrates the new data-handling features in JMP, such as the Query Builder. Useful to students and programmers with little or no JMP experience, or those looking to learn the new data-management features and techniques, it uses a practical approach to getting started with plenty of examples. Using step-by-step demonstrations and screenshots, this book walks you through the most commonly used data-management techniques that also include lots of tips on how to avoid common problems.

With this book, you will learn how to:

- Manage database operations using the JMP Query Builder
- Get data into JMP from other formats, such as Excel, csv, SAS, HTML, JSON, and the web
- Identify and avoid problems with the help of JMP’s visual and automated data-exploration tools
- Consolidate data from multiple sources with Query Builder for tables
- Deal with common issues and repairs that include the following tasks:
  - reshaping tables (stack/unstack)
  - managing missing data with techniques such as imputation and Principal Components Analysis
  - cleaning and correcting dirty data
  - computing new variables
  - transforming variables for modelling
  - reconciling time and date
- Subset and filter your data
- Save data tables for exchange with other platforms

Learning Pentaho CTools

Learning Pentaho CTools is a comprehensive guide to building sophisticated and custom analytics dashboards using the powerful capabilities of Pentaho CTools. This book walks you through the process of creating interactive dashboards, integrating data sources, and applying data visualization best practices. You'll quickly gain the expertise needed to create impactful dashboards with ease.

What this Book will help me do:

- Master installing and configuring CTools for Pentaho to jumpstart dashboard development.
- Harness diverse data sources and deliver data in formats like CSV, JSON, and XML for customized analytics.
- Design and implement dynamic, visually stunning dashboards using Community Dashboard Framework (CDF).
- Deploy and integrate plugins, leverage widgets, and manage dashboards effectively with version control.
- Enhance interactivity by customizing dashboard components, charts, and filters to suit unique requirements.

Author(s): Gaspar, an expert in Pentaho and its tools, has been a Senior Consultant at Pentaho, where he gained in-depth experience crafting analytics solutions. He brings to this book his teaching passion and field expertise, combining theoretical insights with practical applications. His approachable style ensures readers can follow technical concepts effectively.

Who is it for? This book is ideal for developers who are looking to enhance their understanding of Pentaho's CTools portfolio to build advanced dashboards. A working knowledge of JavaScript and CSS will enable readers to get the most out of this guide. Whether you aim to extend your analytics capabilities or learn the tools from scratch, this book bridges the gap between learning and application.

Data Science at the Command Line

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, whether you’re on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on plain text, CSV, HTML/XML, and JSON
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow using Drake
- Create reusable tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines using GNU Parallel
- Model data with dimensionality reduction, clustering, regression, and classification algorithms

PROC DOCUMENT by Example Using SAS

PROC DOCUMENT by Example Using SAS demonstrates the practical uses of the DOCUMENT procedure, a part of the Output Delivery System, in SAS 9.3. Michael Tuchman explains how to work with PROC DOCUMENT, which is designed to store your SAS procedure output for replay at a later time without having to rerun your original SAS code. You’ll learn how to:

- save a collection of procedure output, descriptive text, and supporting graphs that can be replayed as a single unit
- save output once and distribute that same output in a variety of ODS formats such as HTML, CSV, and PDF
- create custom reports by comparing output from the same procedure run at different points in time
- create a table of contents for your output
- modify the appearance of both textual and graphical ODS output even if the original data is no longer available or easily accessible
- manage your tabular and graphical output by using descriptive labels, titles, and footnotes
- rearrange the original order of output in a procedure to suit your needs

After using this book, you’ll be able to quickly and easily create libraries of professional-looking output that are accessible at any time.

This book is part of the SAS Press program.

SAS Server Pages

SAS Server Pages have been used by SAS developers as a way of creating custom user interfaces for Web-based applications. This enhanced book offers information on how to create SAS Server Pages using the SAS 9.3 experimental procedure PROC STREAM, providing users with a foundation technology that greatly expands the capabilities of SAS for dynamic and rich content generation. By combining PROC STREAM and the Macro facility, SAS can now more easily generate any type of markup or text-based content such as HTML, XML, and CSV.

Exclusively available in electronic format, this book provides more extensive and flexible ways to develop applications using video examples of a wide range of PROC STREAM and SAS Server Pages techniques, including both Web applications and Base SAS implementations. Users can see results immediately and can access additional content and information online through embedded links. It also offers basic how-to documentation on PROC STREAM and an overview of a Portal Reporting Framework that illustrates creating custom user interfaces for stored processes within the SAS Portal.

Ideal for SAS programmers who have some knowledge of the Macro facility as well as BI users, SAS Server Pages: Generating Dynamic Content removes the difficulties associated with HTML-based content creation while providing a resource on using PROC STREAM in a dynamic, enhanced format.