talk-data.com

Topic: data (2093 tagged)

Activity Trend: peak of 3 per quarter (2020-Q1 to 2026-Q1)

Activities

Showing filtered results

Filtering by: O'Reilly Data Science Books

Mastering Python Data Analysis

Mastering Python Data Analysis provides a comprehensive roadmap for Python developers to enhance their data analysis skills and tackle real-world problems. This book delves into advanced statistical analysis, covering tools, models, and methods to transform raw data into valuable insights. What this Book will help me do Effectively handle and preprocess data using Python and pandas. Explore statistical models to identify patterns and gain insights from data. Learn clustering approaches to detect data groupings and predict outcomes. Utilize Bayesian methods for quantifying causal relationships. Generate professional reports and visualizations with Python tools like Jupyter Notebook. Author(s) Vilhelm Persson is a seasoned software developer and data analyst with expertise in leveraging Python for sophisticated data analysis and machine learning tasks. Drawing from years of experience in the tech industry, Persson provides practical, real-world insights throughout the book. His approachable writing style ensures technical concepts are conveyed with clarity, making data analysis accessible to developers at varying skill levels. Who is it for? This book is ideal for intermediate Python developers seeking to elevate their data analysis skills. If you are familiar with Python libraries and have an interest in solving complex data problems, this guide will serve as a stepping stone to mastery. Advanced beginners with a curiosity for statistical methods and a desire to learn through practical examples will find this book invaluable. It is also perfect for professionals aiming to integrate Python-based statistical techniques into their workflow.

Theory and Methods of Statistics

Theory and Methods of Statistics covers essential topics for advanced graduate students and professional research statisticians. This comprehensive resource covers many important areas in one manageable volume, including core subjects such as probability theory, mathematical statistics, and linear models, and various special topics, including nonparametrics, curve estimation, multivariate analysis, time series, and resampling. The book presents subjects such as "maximum likelihood and sufficiency," and is written with an intuitive, heuristic approach to build reader comprehension. It also includes many probability inequalities that are not only useful in the context of this text, but also as a resource for investigating convergence of statistical procedures. Codifies foundational information in many core areas of statistics into a comprehensive and definitive resource Serves as an excellent text for select master’s and PhD programs, as well as a professional reference Integrates numerous examples to illustrate advanced concepts Includes many probability inequalities useful for investigating convergence of statistical procedures

Applied Regression and Modeling

The book is divided into three parts: (1) prerequisites to regression analysis, followed by a discussion of simple regression; (2) multiple regression analysis with applications; and (3) regression and modeling, including second-order models, nonlinear regression, and interaction models in regression. All these sections provide examples with complete computer analysis and instructions commonly used in modeling and analyzing these problems. The book deals with detailed analysis and interpretation of computer results, which will help readers appreciate the power of the computer in applying regression models. Readers will find that understanding computer results is critical to implementing regression and modeling in real-world situations. The book is written for juniors, seniors, and graduate students in business, MBAs, professional MBAs, and working people in business and industry. Managers, practitioners, professionals, quality professionals, quality engineers, and anyone involved in data analysis, business analytics, quality, and Six Sigma will find the book to be a valuable resource.

Advancing Procurement Analytics

One area where data analytics can have a profound effect is your company’s procurement process. Some organizations spend more than two thirds of their revenue buying goods and services, which makes procurement, out of all business activities, a key element in achieving cost reduction. This report examines how your company can significantly improve procurement analytics to solve business questions quickly and effectively. Author Federico Castanedo, Chief Data Scientist at WiseAthena.com, explains how a probabilistic, bottom-up approach can significantly increase the quality, speed, and scalability of your data preparation operations, whether you’re integrating datasets or cleaning and classifying them. You’ll learn how new solutions, including the Tamr platform, leverage automation and machine learning, and how they help you take advantage of several data-driven actions for procurement, including compliance, price arbitrage, and spend recovery.

The Data Industry

Provides an introduction to the data industry for the field of economics. This book bridges the gap between economics and data science to help data scientists understand the economics of big data, and enable economists to analyze the data industry. It begins by explaining data resources and introduces the data asset. This book defines a data industry chain, enumerates data enterprises’ business models versus operating models, and proposes a mode of industrial development for the data industry. The author describes five types of enterprise agglomerations, and multiple industrial cluster effects. A discussion on the establishment and development of laws and regulations related to the data industry is provided. In addition, this book discusses several scenarios on how to convert data driving forces into productivity that can then serve society. This book is designed to serve as a reference and training guide for data scientists, data-oriented managers and executives, entrepreneurs, scholars, and government employees. Defines and develops the concept of a “Data Industry,” and explains the economics of data to data scientists and statisticians Includes numerous case studies and examples from a variety of industries and disciplines Serves as a useful guide for practitioners and entrepreneurs in the business of data technology The Data Industry: The Business and Economics of Information and Big Data is a resource for practitioners in the data science industry, government, and students in economics, business, and statistics. CHUNLEI TANG, Ph.D., is a research fellow at Harvard University. She is the co-founder of Fudan’s Institute for Data Industry and proposed the concept of the “data industry”. She received a Ph.D. in Computer and Software Theory in 2012 and a Master of Software Engineering in 2006 from Fudan University, Shanghai, China.

Python: Real-World Data Science

Unleash the power of Python and its robust data science capabilities About This Book Unleash the power of Python 3 objects Learn to use powerful Python libraries for effective data processing and analysis Harness the power of Python to analyze data and create insightful predictive models Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics Who This Book Is For Entry-level analysts who want to enter the data science world will find this course very useful for getting acquainted with Python's data science capabilities for real-world data analysis. What You Will Learn Install and set up Python Implement objects in Python by creating classes and defining methods Get acquainted with NumPy to use it with arrays and array-oriented computing in data analysis Create effective visualizations for presenting your data using Matplotlib Process and analyze data using the time series capabilities of pandas Interact with different kinds of database systems, such as file, disk format, Mongo, and Redis Apply data mining concepts to real-world problems Compute on big data, including real-time data from the Internet Explore how to use different machine learning models to ask different questions of your data In Detail The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules, and each module is a mini course in its own right; as you complete each one, you'll have gained key skills and be ready for the material in the next module. The course begins with getting your Python fundamentals nailed down. After getting familiar with Python core concepts, it's time to dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical and example-driven way. 
The third module will teach you how to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis to more complex data types including text, images, and graphs. Machine learning and predictive analytics have become the most important approaches to uncover data gold mines. In the final module, we'll discuss the necessary details regarding machine learning concepts, offering intuitive yet informative explanations on how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls. Style and approach This course includes all the resources that will help you jump into the data science field with Python and learn how to make sense of data. The aim is to create a smooth learning path that will teach you how to get started with powerful Python libraries and perform various data science techniques in depth.

Understanding and Applying Basic Statistical Methods Using R

Features a straightforward and concise resource for introductory statistical concepts, methods, and techniques using R Understanding and Applying Basic Statistical Methods Using R uniquely bridges the gap between advances in the statistical literature and methods routinely used by non-statisticians. Providing a conceptual basis for understanding the relative merits and applications of these methods, the book features modern insights and advances relevant to basic techniques in terms of dealing with non-normality, outliers, heteroscedasticity (unequal variances), and curvature. Featuring a guide to R, the book uses R programming to explore introductory statistical concepts and standard methods for dealing with known problems associated with classic techniques. Thoroughly classroom-tested, the book includes sections that focus on either R programming or computational details to help the reader become acquainted with basic concepts and principles essential to understanding and applying the many methods currently available. Covering relevant material from a wide range of disciplines, Understanding and Applying Basic Statistical Methods Using R also includes: Numerous illustrations and exercises that use data to demonstrate the practical importance of multiple perspectives Discussions of common mistakes such as eliminating outliers and applying standard methods based on means using the remaining data Detailed coverage of R programming with descriptions of how to apply both classic and more modern methods using R A companion website with the data and solutions to all of the exercises Understanding and Applying Basic Statistical Methods Using R is an ideal textbook for undergraduate and graduate-level statistics courses in science and/or social science departments. The book can also serve as a reference for professional statisticians and other practitioners looking to better understand modern statistical methods as well as R programming.

Learning Pentaho CTools

Learning Pentaho CTools is a comprehensive guide to building sophisticated and custom analytics dashboards using the powerful capabilities of Pentaho CTools. This book walks you through the process of creating interactive dashboards, integrating data sources, and applying data visualization best practices. You'll quickly gain the expertise needed to create impactful dashboards with ease. What this Book will help me do Master installing and configuring CTools for Pentaho to jumpstart dashboard development. Harness diverse data sources and deliver data in formats like CSV, JSON, and XML for customized analytics. Design and implement dynamic, visually stunning dashboards using Community Dashboard Framework (CDF). Deploy and integrate plugins, leverage widgets, and manage dashboards effectively with version control. Enhance interactivity by customizing dashboard components, charts, and filters to suit unique requirements. Author(s) Gaspar, an expert in Pentaho and its tools, has been a Senior Consultant at Pentaho, where he gained in-depth experience crafting analytics solutions. He brings to this book his teaching passion and field expertise, combining theoretical insights with practical applications. His approachable style ensures readers can follow technical concepts effectively. Who is it for? This book is ideal for developers who are looking to enhance their understanding of Pentaho's CTools portfolio to build advanced dashboards. A working knowledge of JavaScript and CSS will enable readers to get the most out of this guide. Whether you aim to extend your analytics capabilities or learn the tools from scratch, this book bridges the gap between learning and application.

Network Reliability

In engineering theory and applications, we think and operate in terms of logic and models under some acceptable and reasonable assumptions. The present text aims to provide modelling and analysis techniques for evaluating reliability measures (2-terminal, all-terminal, k-terminal reliability) of systems whose structure can be described as a probabilistic graph. Among the several approaches to network reliability evaluation, the multiple-variable-inversion sum-of-disjoint-products approach finds a well-deserved niche because it provides the reliability or unreliability expression in an efficient and compact form. However, it requires efficiently enumerated minimal inputs (minimal paths, spanning trees, minimal k-trees, minimal cuts, minimal global cuts, minimal k-cuts), depending on the desired reliability measure. The book covers these two aspects in detail through descriptions of several algorithms devised by the ‘reliability fraternity’, explained through solved examples, to obtain and evaluate 2-terminal, k-terminal and all-terminal network reliability/unreliability measures; this coverage could be its unique selling point. The accompanying web-based supplementary information, containing modifiable Matlab® source code for the algorithms, is another feature of this book. A concerted effort has been made to keep the book suitable for a first course, or even for a novice stepping into the area of network reliability. The mathematical treatment is kept to a minimum, on the assumption that readers have basic knowledge of graph theory, probability laws, Boolean laws and set theory.

Cyber-Risk Informatics

This book provides a scientific modeling approach for conducting metrics-based quantitative risk assessments of cybersecurity vulnerabilities and threats. The author builds from a common understanding based on previous class-tested works to introduce the reader to current and new, innovative approaches for addressing the maliciously-by-human-created (rather than by-chance-occurring) vulnerability and threat, and the related cost-effective management needed to mitigate such risk. This book is purely statistical and data-oriented (not deterministic) and employs computationally intensive techniques, such as Monte Carlo and Discrete Event Simulation. The enriched, ready-to-go Java applications and solutions to exercises provided by the author at the book’s dedicated website will enable readers to work through the course-related problems. • Enables the reader to use the applications on the book's website to implement and see results, and to use them in a way that makes ‘budgetary’ sense • Utilizes a data analytical approach and provides clear entry points for readers of varying skill sets and backgrounds • Developed out of necessity from the author's real in-class experience while teaching advanced undergraduate and graduate courses Cyber-Risk Informatics is a resource for undergraduate students, graduate students, and practitioners in the field of Risk Assessment and Management regarding Security and Reliability Modeling. Mehmet Sahinoglu, a Professor (1990) Emeritus (2000), is the founder of the Informatics Institute (2009) and its SACS-accredited (2010) and NSA-certified (2013) flagship Cybersystems and Information Security (CSIS) graduate program (the first such full degree in-class program in Southeastern USA) at AUM, Auburn University’s metropolitan campus in Montgomery, Alabama. 
He is a fellow member of the SDPS Society, a senior member of the IEEE, and an elected member of ISI. Sahinoglu is the recipient of Microsoft's Trustworthy Computing Curriculum (TCC) award and the author of Trustworthy Computing (Wiley, 2007).

Mastering the SAS DS2 Procedure

Enhance your SAS® data wrangling skills with high precision and parallel data manipulation using the new DS2 programming language.

This book addresses the new DS2 programming language from SAS, which combines the precise procedural power and control of the Base SAS DATA step language with the simplicity and flexibility of SQL. DS2 provides simple, safe syntax for performing complex data transformations in parallel and enables manipulation of native database data types at full precision. It also introduces PROC FEDSQL, a modernized SQL language that blends perfectly with DS2. You will learn to harness the power of parallel processing to speed up CPU-intensive computing processes in Base SAS and how to achieve even more speed by processing DS2 programs on massively parallel database systems. Techniques for leveraging Internet APIs to acquire data, avoiding large data movements when working with data from disparate sources, and leveraging DS2’s new data types for full-precision numeric calculations are presented, with examples of why these techniques are essential for the modern data wrangler.

While working through the code samples provided with this book, you will build a library of custom, reusable, and easily shareable DS2 program modules, execute parallelized DATA step programs to speed up a CPU-intensive process, and conduct advanced data transformations using hash objects and matrix math operations.

Threat Forecasting

Drawing upon years of practical experience and using numerous examples and illustrative case studies, Threat Forecasting: Leveraging Big Data for Predictive Analysis discusses important topics, including the danger of using historic data as the basis for predicting future breaches, how to use security intelligence as a tool to develop threat forecasting techniques, and how to use threat data visualization techniques and threat simulation tools. Readers will gain valuable security insights into unstructured big data, along with tactics on how to use the data to their advantage to reduce risk. Presents case studies and actual data to demonstrate threat data visualization techniques and threat simulation tools Explores the usage of kill chain modelling to inform actionable security intelligence Demonstrates a methodology that can be used to create a full threat forecast analysis for enterprise networks of any size

The Evolution of Analytics

Machine learning is a hot topic in business. Even data-driven organizations that have spent years developing successful data analysis platforms, with many accurate statistical models in place, are now looking into this decades-old discipline. But how can companies turn hyped opportunities for machine learning into real business value? This report examines the growing momentum of machine learning in the analytics landscape, the challenges machine learning presents to businesses, and examples of how organizations are actively seeking to incorporate modern machine learning techniques into their production data infrastructures. Authors Patrick Hall, Wen Phan, and Katie Whitson look at two companies in depth—one in healthcare and one in finance—that are seeing the real impact of machine learning. Discover how machine learning can help your organization: Analyze and generate insights from large amounts of varied, messy, and unstructured data unfit for traditional statistical analysis Increase the predictive accuracy beyond what was previously possible Augment aging analytical processes and other decision-making tools

2016 Software Development Salary Survey

Early this year, more than 5000 software engineers, developers, and other programming professionals participated in O’Reilly Media’s first Software Development Salary Survey. Participants included professionals from large and small companies in a variety of industries across 51 countries and all 50 US states. With the complete survey results in this in-depth report, you’ll be able to explore the world of software development, and the careers that propel it, in great detail. With this report, you’ll learn: The top programming languages that respondents currently use professionally Where programmers make the highest salaries, by country and by regions in the US Salary ranges by industry and by specific programming language The difference in earnings between programmers who work on tiny teams vs. those who work on larger teams The most common programming languages that respondents no longer use in their work The most common languages that respondents intend to learn within the next couple of years Pick up a copy of this report and find out where you stand in the programming world. We encourage you to plug in your own data points to our survey model to see how you compare to other programming professionals in your industry.

Regression Analysis Microsoft® Excel®

This is today’s most complete guide to regression analysis with Microsoft® Excel for any business analytics or research task. Drawing on 25 years of advanced statistical experience, Microsoft MVP Conrad Carlberg shows how to use Excel’s regression-related worksheet functions to perform a wide spectrum of practical analyses. Carlberg clearly explains all the theory you’ll need to avoid mistakes, understand what your regressions are really doing, and evaluate analyses performed by others. From simple correlations and t-tests through multiple analysis of covariance, Carlberg offers hands-on, step-by-step walkthroughs using meaningful examples. He discusses the consequences of using each option and argument, points out idiosyncrasies and controversies associated with Excel’s regression functions, and shows how to use them reliably in fields ranging from medical research to financial analysis to operations. You don’t need expensive software or a doctorate in statistics to work with regression analyses. Microsoft Excel has all the tools you need—and this book has all the knowledge! Understand what regression analysis can and can’t do, and why Master regression-based functions built into all recent versions of Excel Work with correlation and simple regression Make the most of Excel’s improved LINEST() function Plan and perform multiple regression Distinguish the assumptions that matter from the ones that don’t Extend your analysis options by using regression instead of traditional analysis of variance Add covariates to your analysis to reduce bias and increase statistical power

Introducing Data Science

Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science. About the Technology Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started. About the Book Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You'll explore data visualization, graph databases, the use of NoSQL, and the data science process. You'll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you'll have the solid foundation you need to start a career in data science. What's Inside Handling large data Introduction to machine learning Using Python to work with data Writing data science algorithms About the Reader This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required. About the Authors Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors. 
Quotes Read this book if you want to get a quick overview of data science, with lots of examples to get you started! - Alvin Raj, Oracle The map that will help you navigate the data science oceans. - Marius Butuc, Shopify Covers the processes involved in data science from end to end… A complete overview. - Heather Campbell, Kainos A must-read for anyone who wants to get into the data science world. - Hector Cuesta, Big Data Bootcamp

A Course in Statistics with R

Integrates the theory and applications of statistics using R A Course in Statistics with R has been written to bridge the gap between theory and applications and explain how mathematical expressions are converted into R programs. The book has been primarily designed as a useful companion for a Masters student during each semester of the course, but will also help applied statisticians in revisiting the underpinnings of the subject. With this dual goal in mind, the book begins with R basics and quickly covers visualization and exploratory analysis. Probability and statistical inference, inclusive of classical, nonparametric, and Bayesian schools, is developed with definitions, motivations, mathematical expression and R programs in a way which will help the reader to understand the mathematical development as well as R implementation. Linear regression models, experimental designs, multivariate analysis, and categorical data analysis are treated in a way which makes effective use of visualization techniques and the related statistical techniques underlying them through practical applications, and hence helps the reader to achieve a clear understanding of the associated statistical models. Key features: Integrates R basics with statistical concepts Provides graphical presentations inclusive of mathematical expressions Aids understanding of limit theorems of probability with and without the simulation approach Presents detailed algorithmic development of statistical models from scratch Includes practical applications with over 50 data sets

Regression for Economics, Second Edition

Regression analysis can be used to establish causal relationships between factors and the response variable. However, in order to be able to do so, economic theory must be used to provide the causal relationship and then regression analysis is applied to verify the validity of the theory. Regression analysis is the most commonly used analytical tool and can be understood without complex mathematics. This book simplifies and demystifies regression analysis. All the examples are from economics and in almost all the cases, real data is used to show the application of the method. By limiting the use of mathematical symbols, the author enables a logical reader to learn regression, without shortchanging the subject. The book is targeted to all business students and executives who need to understand the concept of regression for practical and professional purposes.

Learning Probabilistic Graphical Models in R

Explore the fundamentals of probabilistic graphical models (PGM) with hands-on examples using R. This book helps you translate theoretical concepts into practical solutions, addressing complex problems with Bayesian and Markov networks. It's written to demystify PGMs, equipping you to create robust models for inference, learning, and prediction. What this Book will help me do Understand and implement probabilistic graphical models, including Bayesian and Markov networks, directly in R. Learn to use various R packages for performing inference and analyzing probabilistic models. Master the essentials of Bayesian methods, transitioning to advanced concepts with clear, step-by-step guidance. Familiarize yourself with methods like PCA and ICA for analyzing and reducing complex data dimensions. Develop practical skills to apply PGM techniques to machine learning challenges and real-world data problems. Author(s) The authors bring diverse expertise in probabilistic modeling, R programming, and applied machine learning. They are passionate educators and technical writers, focusing on breaking down complex theories into accessible knowledge. Their writing emphasizes practical demonstration, leveraging their industry and academic experiences. Who is it for? This book is designed for data scientists, engineers, and machine learning enthusiasts who wish to enhance their understanding of probabilistic graphical models. Whether you're curious about Bayesian methods or looking to apply PGM approaches to data-rich challenges, this guide is perfect for learners at an intermediate level, offering practical insights and real-world applications.

Practical Data Analysis Cookbook

Practical Data Analysis Cookbook takes you on a comprehensive journey to mastering data exploration and analysis using Python. From data cleaning and transformation to building predictive and classification models, this book provides practical recipes for tackling real-world data challenges and extracting valuable insights. What this Book will help me do Efficiently clean, transform, and explore datasets using tools like pandas and OpenRefine. Develop predictive models for time series and other datasets using Python libraries such as scikit-learn and Statsmodels. Apply clustering and classification techniques to real-world data problems to gain actionable insights. Explore advanced topics like natural language processing and graph theory concepts using specialized tools. Build the skills to solve practical data modeling problems encountered in a data science role. Author(s) Drabas is an experienced data scientist and author who specializes in Python-based data analysis. With a background in tackling intricate data-driven problems, Drabas brings real-world experience to the readers. In this cookbook, Drabas adopts a step-by-step approach, making complex techniques accessible to learners of all backgrounds. Who is it for? If you are a data analyst, data scientist, or someone interested in exploring Python for practical data problems, this book is for you. It suits beginners starting their data journey and intermediate professionals looking to enhance their toolset. With clear instructions, it's ideal for anyone willing to build practical skills and tackle real-world challenges in data analysis.