O'Reilly Data Science Books

Learning R for Geospatial Analysis

2014-12-26 O'Reilly Amazon

book

Michael Dorman

data data-science data-science-tools r Data Science GIS

Learn how to leverage the power of R for geospatial analysis in this comprehensive guide. Whether you're processing spatial datasets, creating publication-quality maps, or performing GIS operations, this book covers the necessary tools and techniques for effective analysis, without requiring prior programming knowledge.
What this Book will help me do: Discover how to manipulate and analyze geospatial data effectively using R. Gain proficiency in loading, reshaping, and visualizing spatial data. Master key concepts like spatial queries and overlays for GIS tasks. Learn to automate spatial data workflows using reproducible R scripts. Create high-quality visualizations and maps tailored to your datasets.
Author(s): Michael Dorman, the author of this book, is an experienced data science educator and practitioner with a particular focus on geospatial data analysis in R. With years of teaching and applied geospatial research, Dorman brings expertise in making advanced topics approachable. Their practical approach ensures readers can immediately put concepts into practice.
Who is it for? This book is ideal for GIS analysts, geospatial researchers, educators, and students looking to enhance their skillset with R programming. It's particularly suited for those familiar with geographic concepts like coordinates but new to programming or R. If you aim to efficiently analyze spatial data and produce professional-grade visualizations and GIS analyses, this book is for you.

R Recipes: A Problem-Solution Approach

2014-12-24 O'Reilly Amazon

book

Larry A. Pace

data data-science data-science-tools r Analytics Cloud Computing

R Recipes is your handy problem-solution reference for learning and using the popular R programming language for statistics and other numerical analysis. Packed with hundreds of code and visual recipes, this book helps you to quickly learn the fundamentals and explore the frontiers of programming, analyzing and using R. R Recipes provides textual and visual recipes for easy and productive templates for use and re-use in your day-to-day R programming and data analysis practice. Whether you're in finance, cloud computing, big or small data analytics, or other applied computational and data science - R Recipes should be a staple for your code reference library.

Web and Network Data Science: Modeling Techniques in Predictive Analytics

2014-12-21 O'Reilly Amazon

book

Thomas W. Miller

data data-science web-analytics google-analytics Analytics Data Modelling

Master modern web and network data modeling: both theory and applications. In Web and Network Data Science, a top faculty member of Northwestern University’s prestigious analytics program presents the first fully-integrated treatment of both the business and academic elements of web and network modeling for predictive analytics. Some books in this field focus either entirely on business issues (e.g., Google Analytics and SEO); others are strictly academic (covering topics such as sociology, complexity theory, ecology, applied physics, and economics). This text gives today's managers and students what they really need: integrated coverage of concepts, principles, and theory in the context of real-world applications. Building on his pioneering Web Analytics course at Northwestern University, Thomas W. Miller covers usability testing, Web site performance, usage analysis, social media platforms, search engine optimization (SEO), and many other topics. He balances this practical coverage with accessible and up-to-date introductions to both social network analysis and network science, demonstrating how these disciplines can be used to solve real business problems.

Data Scientists at Work

2014-12-15 O'Reilly Amazon

book

Sebastian Gutierrez

data data-science data-science-as-a-profession AI/ML Big Data Cloud Computing

Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. "Data scientist is the sexiest job in the 21st century," according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (André Karpištšenko, Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Each of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion.
Data Scientists at Work parts the curtain on the interviewees’ earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.

Data Science at the Command Line

2014-10-02 O'Reilly Amazon

book

Jeroen Janssens

data data-science Agile/Scrum API CSV Data Science

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, whether you’re on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.
Obtain data from websites, APIs, databases, and spreadsheets
Perform scrub operations on plain text, CSV, HTML/XML, and JSON
Explore data, compute descriptive statistics, and create visualizations
Manage your data science workflow using Drake
Create reusable tools from one-liners and existing Python or R code
Parallelize and distribute data-intensive pipelines using GNU Parallel
Model data with dimensionality reduction, clustering, regression, and classification algorithms

Modeling Techniques in Predictive Analytics: Business Problems and Solutions with R, Revised and Expanded Edition

2014-10-01 O'Reilly Amazon

book

Thomas W. Miller

data data-science data-science-tools r Analytics Big Data

To succeed with predictive analytics, you must understand it on three levels: strategy and management, methods and models, and technology and code. This up-to-the-minute reference thoroughly covers all three categories. Now fully updated, this uniquely accessible book will help you use predictive analytics to solve real business problems and drive real competitive advantage. If you’re new to the discipline, it will give you the strong foundation you need to get accurate, actionable results. If you’re already a modeler, programmer, or manager, it will teach you crucial skills you don’t yet have. Unlike competitive books, this guide illuminates the discipline through realistic vignettes and intuitive data visualizations, not complex math. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, guides you through defining problems, identifying data, crafting and optimizing models, writing effective R code, interpreting results, and more. Every chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work and maximize their value. Reflecting extensive student and instructor feedback, this edition adds five classroom-tested case studies, updates all code for new versions of R, explains code behavior more clearly and completely, and covers modern data science methods even more effectively.
All data sets, extensive R code, and additional examples available for download at http://www.ftpress.com/miller
If you want to make the most of predictive analytics, data science, and big data, this is the book for you. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, appealing to managers, analysts, programmers, and students alike.
Miller addresses multiple business cases and challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures; and then modeling it again with realistic R programs that deliver actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Throughout, Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. This edition adds five new case studies, updates all code for the newest versions of R, adds more commenting to clarify how the code works, and offers a more detailed and up-to-date primer on data science methods.
Gain powerful, actionable, profitable insights about:
Advertising and promotion
Consumer preference and choice
Market baskets and related purchases
Economic forecasting
Operations management
Unstructured text and language
Customer sentiment
Brand and price
Sports team performance
And much more

Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science

2014-10-01 O'Reilly Amazon

book

Thomas W. Miller

data data-science Analytics Big Data Data Science DataViz

Master predictive analytics, from start to finish.
Start with strategy and management
Master methods and build models
Transform your models into highly-effective code, in both Python and R
This one-of-a-kind book will help you use predictive analytics, Python, and R to solve real business problems and drive real competitive advantage. You’ll master predictive analytics through realistic case studies, intuitive data visualizations, and up-to-date code for both Python and R, not complex math. Step by step, you’ll walk through defining problems, identifying data, crafting and optimizing models, writing effective Python and R code, interpreting results, and more. Each chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work and maximize their value. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, addresses everything you need to succeed: strategy and management, methods and models, and technology and code. If you’re new to predictive analytics, you’ll gain a strong foundation for achieving accurate, actionable results. If you’re already working in the field, you’ll master powerful new skills. If you’re familiar with either Python or R, you’ll discover how these languages complement each other, enabling you to do even more.
All data sets, extensive Python and R code, and additional examples available for download at http://www.ftpress.com/miller/
Python and R offer immense power in predictive analytics, data science, and big data. This book will help you leverage that power to solve real business problems, and drive real competitive advantage. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, illuminating each technique with carefully explained code for the latest versions of Python and R. If you’re new to predictive analytics, Miller gives you a strong foundation for achieving accurate, actionable results.
If you’re already a modeler, programmer, or manager, you’ll learn crucial skills you don’t already have. Using Python and R, Miller addresses multiple business challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures; and then modeling it again with realistic code that delivers actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. Appendices include five complete case studies, and a detailed primer on modern data science methods.
Use Python and R to gain powerful, actionable, profitable insights about:
Advertising and promotion
Consumer preference and choice
Market baskets and related purchases
Economic forecasting
Operations management
Unstructured text and language
Customer sentiment
Brand and price
Sports team performance
And much more

Guerrilla Analytics

2014-09-25 O'Reilly Amazon

book

Enda Ridge

data data-science business-intelligence prescriptive-analytics Analytics Data Science

Doing data science is difficult. Projects are typically very dynamic with requirements that change as data understanding grows. The data itself arrives piecemeal, is added to, replaced, contains undiscovered flaws and comes from a variety of sources. Teams also have mixed skill sets and tooling is often limited. Despite these disruptions, a data science team must get off the ground fast and begin demonstrating value with traceable, tested work products. This is when you need Guerrilla Analytics. In this book, you will learn about: The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting. Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny. Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research. Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions. Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects.

Hands-On Programming with R

2014-07-23 O'Reilly Amazon

book

Garrett Grolemund

data data-science data-science-tools r Data Science R

Learn how to program by diving into the R language, and then use your newfound skills to solve practical data science problems. With this book, you’ll learn how to load data, assemble and disassemble data objects, navigate R’s environment system, write your own functions, and use all of R’s programming tools. RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modeling data. You’ll gain valuable programming skills and support your work as a data scientist at the same time.
Work hands-on with three practical data analysis projects based on casino games
Store, retrieve, and change data values in your computer’s memory
Write programs and simulations that outperform those written by typical R users
Use R programming tools such as if else statements, for loops, and S3 classes
Learn how to write lightning-fast vectorized R code
Take advantage of R’s package system and debugging tools
Practice and apply R programming concepts as you learn them

Discovering Knowledge in Data: An Introduction to Data Mining, 2nd Edition

2014-07-08 O'Reilly Amazon

book

Daniel T. Larose

data data-science data-science-tasks exploratory-data-analysis Analytics BI

The field of data mining lies at the confluence of predictive analytics, statistical analysis, and business intelligence. Due to the ever-increasing complexity and size of data sets and the wide range of applications in computer science, business, and health care, the process of discovering knowledge in data is more relevant than ever before. This book provides the tools needed to thrive in today's big data world. The author demonstrates how to leverage a company's existing databases to increase profits and market share, and carefully explains the most current data science methods and techniques. The reader will "learn data mining by doing data mining". By adding chapters on data modelling preparation, imputation of missing data, and multivariate statistical analysis, Discovering Knowledge in Data, Second Edition remains the eminent reference on data mining.
The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis
Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and Visualization
Offers extensive coverage of the R statistical programming language
Contains 280 end-of-chapter exercises
Includes a companion website with further resources for all readers, and Powerpoint slides, a solutions manual, and suggested projects for instructors who adopt the book

Analytics in a Big Data World: The Essential Guide to Data Science and its Applications

2014-05-19 O'Reilly Amazon

book

Bart Baesens

data data-science Analytics Big Data Data Analytics Data Science

The guide to targeting and leveraging business opportunities using big data & analytics.
By leveraging big data & analytics, businesses create the potential to better understand, manage, and strategically exploit the complex dynamics of customer behavior. Analytics in a Big Data World reveals how to tap into the powerful tool of data analytics to create a strategic advantage and identify new business opportunities. Designed to be an accessible resource, this essential book does not include exhaustive coverage of all analytical techniques, instead focusing on analytics techniques that really provide added value in business environments. The book draws on author Bart Baesens' expertise on the topics of big data, analytics and its applications in areas such as credit risk, marketing, and fraud to provide a clear roadmap for organizations that want to use data analytics to their advantage, but need a good starting point. Baesens has conducted extensive research on big data, analytics, customer relationship management, web analytics, fraud detection, and credit risk management, and uses this experience to bring clarity to a complex topic.
Includes numerous case studies on risk management, fraud detection, customer relationship management, and web analytics
Offers the results of research and the author's personal experience in banking, retail, and government
Contains an overview of the visionary ideas and current developments on the strategic use of analytics for business
Covers the topic of data analytics in easy-to-understand terms without an undue emphasis on mathematics and the minutiae of statistical analysis
For organizations looking to enhance their capabilities via data analytics, this resource is the go-to reference for leveraging data to enhance business capabilities.

Developing Analytic Talent: Becoming a Data Scientist

2014-04-07 O'Reilly Amazon

book

Vincent Granville

data data-science data-science-as-a-profession Analytics Big Data Data Science

Learn what it takes to succeed in the most in-demand tech job.
Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code. The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one.
Explains the finer points of data science, the required skills, and how to acquire them, including analytical recipes, standard rules, source code, and a dictionary of terms
Shows what companies are looking for and how the growing importance of big data has increased the demand for data scientists
Features job interview questions, sample resumes, salary surveys, and examples of job ads
Case studies explore how data science is used on Wall Street, in botnet detection, for online advertising, and in many other business-critical situations
Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.

Practical Data Science with R

2014-03-25 O'Reilly Amazon

book

John Mount, Nina Zumel

data data-science BI Computer Science Data Science Marketing

Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support.
About the Technology: Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics.
About the Book: Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels.
What's Inside:
Data science for the business professional
Statistical analysis using the R language
Project lifecycle, from planning to delivery
Numerous instantly familiar use cases
Keys to effective data presentations
About the Reader: This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed.
About the Authors: Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm.
Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com.
Quotes:
A unique and important addition to any data scientist’s library. - From the Foreword by Jim Porzak, Cofounder Bay Area R Users Group
Covers the process end-to-end, from data exploration to modeling to delivering the results. - Nezih Yigitbasi, Intel
Full of useful gems for both aspiring and experienced data scientists. - Fred Rahmanian, Siemens Healthcare
Hands-on data analysis with real-world examples. Highly recommended. - Dr. Kostas Passadis, IPTO

2013 Data Science Salary Survey

2014-02-17 O'Reilly Amazon

book

John King, Roger Magoulas

data data-science data-science-as-a-profession Big Data Data Science Hadoop

What tools do successful data scientists and analysts use, and how much money do they make? We surveyed hundreds of attendees at the O'Reilly Strata Conferences in Santa Clara, California and New York to find out. Findings from the survey include:
Average number of tools and median income for all respondents
Distribution of responses by age, location, industry, and position
Detailed analysis of tools used by respondents and correlation to their salaries, including by tool clusters (Hadoop, SQL/Excel, and other)
Correlation of specialized big data tools usage and salary
What tools should you be learning and using? Read this valuable report to gain insight from these potentially career-changing findings.

Data Smart: Using Data Science to Transform Information into Insight

2013-11-04 O'Reilly Amazon

book

John W. Foreman

data data-science AI/ML Big Data Data Science Monte Carlo

Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straightforward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data. Each chapter will cover a different technique in a spreadsheet so you can follow along:
Mathematical optimization, including non-linear programming and genetic algorithms
Clustering via k-means, spherical k-means, and graph modularity
Data mining in graphs, such as outlier detection
Supervised AI through logistic regression, ensemble models, and bag-of-words models
Forecasting, seasonal adjustments, and prediction intervals through Monte Carlo simulation
Moving from spreadsheets into the R programming language
You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.

Doing Data Science

2013-10-24 O'Reilly Amazon

book

Rachel Schutt, Cathy O'Neil

data data-science Data Engineering Data Science DataViz Hadoop

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science. Topics include:
Statistical inference, exploratory data analysis, and the data science process
Algorithms
Spam filters, Naive Bayes, and data wrangling
Logistic regression
Financial modeling
Recommendation engines and causality
Data visualization
Social networks and data journalism
Data engineering, MapReduce, Pregel, and Hadoop
Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.

Getting Started with Greenplum for Big Data Analytics

2013-10-23 O'Reilly Amazon

book

Sunila Gollapudi

data data-science Analytics Big Data Data Analytics Data Science

This book serves as a thorough introduction to using the Greenplum platform for big data analytics. It explores key concepts for processing, analyzing, and deriving insights from big data using Greenplum, covering aspects from data integration to advanced analytics techniques like programming with R and MADlib.
What this Book will help me do: Understand the architecture and core components of the Greenplum platform. Learn how to design and execute data science projects using Greenplum. Master loading, processing, and querying big data in Greenplum efficiently. Explore programming with R and integrating it with Greenplum for analytics. Gain skills in high-availability configurations, backups, and recovery within Greenplum.
Author(s): Sunila Gollapudi is a seasoned expert in the field of big data analytics and has multiple years of experience working with platforms like Greenplum. Her real-world problem-solving expertise shapes her practical and approachable writing style, making this book not only educational but enjoyable to read.
Who is it for? This book is ideal for data scientists or analysts aiming to explore the capabilities of big data platforms like Greenplum. It suits readers with basic knowledge of data warehousing, programming, and analytics tools who want to deepen their expertise and effectively harness Greenplum for analytics.

Agile Data Science

2013-10-18 O'Reilly Amazon

book

Russell Jurney

data data-science Agile/Scrum Analytics Big Data Data Science

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps. Create analytics applications by using the agile big data development methodology Build value from your data in a series of agile sprints, using the data-value stack Gain insight by using several data structures to extract multiple features from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future, and translate predictions into action Get feedback from users after each sprint to keep your project on track

On Being a Data Skeptic

2013-09-30 O'Reilly Amazon

book

Cathy O'Neil

data data-science data-science-as-a-profession Big Data Data Science

"Data is here, it's growing, and it's powerful." Author Cathy O'Neil argues that the right approach to data is skeptical, not cynical––it understands that, while powerful, data science tools often fail. Data is nuanced, and "a really excellent skeptic puts the term 'science' into 'data science.'" The big data revolution shouldn't be dismissed as hype, but current data science tools and models shouldn't be hailed as the end-all-be-all, either.

Analyzing the Analyzers

2013-06-15 O'Reilly Amazon

book

Marck Vaisman , Sean Murphy , Harlan Harris

data data-science data-science-as-a-profession Analytics Big Data Data Science

Despite the excitement around "data science," "big data," and "analytics," the ambiguity of these terms has led to poor communication between data scientists and organizations seeking their help. In this report, authors Harlan Harris, Sean Murphy, and Marck Vaisman examine their survey of several hundred data science practitioners in mid-2012, when they asked respondents how they viewed their skills, careers, and experiences with prospective employers. The results are striking. Based on the survey data, the authors found that data scientists today can be clustered into four subgroups, each with a different mix of skillsets. Their purpose is to identify a new, more precise vocabulary for data science roles, teams, and career paths. This report describes: Four data scientist clusters: Data Businesspeople, Data Creatives, Data Developers, and Data Researchers Cases in miscommunication between data scientists and organizations looking to hire Why "T-shaped" data scientists have an advantage in breadth and depth of skills How organizations can apply the survey results to identify, train, integrate, team up, and promote data scientists

How Data Science Is Transforming Health Care

2012-08-24 O'Reilly Amazon

book

Tim O'Reilly , Colin Hill , Julia Steele , Mike Loukides

data data-science healthcare-analytics Data Science

In the early days of the 20th century, department store magnate John Wanamaker famously said, "I know that half of my advertising doesn't work. The problem is that I don't know which half." That remained basically true until Google transformed advertising with AdSense based on new uses of data and analysis. The same might be said about health care, and it's poised to go through a similar transformation as new tools, techniques, and data sources come on line. Soon we'll make policy and resource decisions based on much better understanding of what leads to the best outcomes, and we'll make medical decisions based on a patient's specific biology. The result will be better health at less cost. This paper explores how data analysis will help us structure the business of health care more effectively around outcomes, and how it will transform the practice of medicine by personalizing for each specific patient.

Statistical Learning and Data Science

2011-12-19 O'Reilly Amazon

book

Myriam Touati , Leon Bottou , Fionn Murtagh , Bernard Goldfarb , Catherine Pardoux , Mireille Gettler Summa

data data-science data-science-tasks statistics AI/ML Data Science

Driven by a vast range of applications, data analysis and learning from data are vibrant areas of research. Various methodologies, including unsupervised data analysis, supervised machine learning, and semi-supervised techniques, have continued to develop to cope with the increasing amount of data collected through modern technology. With a focus on applications, this volume presents contributions from some of the leading researchers in the different fields of data analysis. Synthesizing the methodologies into a coherent framework, the book covers a range of topics, from large-scale machine learning to synthesis objects analysis.

Building Data Science Teams

2011-09-15 O'Reilly Amazon

book

DJ Patil

data data-science Data Science

As data science evolves to become a business necessity, the importance of assembling strong and innovative data teams grows. In this in-depth report, data scientist DJ Patil explains the skills, perspectives, tools and processes that position data science teams for success.

What Is Data Science?

2011-04-10 O'Reilly Amazon

book

Mike Loukides

data data-science Data Science

We've all heard it: according to Hal Varian, statistics is the next sexy job. Five years ago, in What is Web 2.0, Tim O'Reilly said that "data is the next Intel Inside." But what does that statement mean? Why do we suddenly care about statistics and about data? This report examines the many sides of data science -- the technologies, the companies and the unique skill sets. The web is full of "data-driven apps." Almost any e-commerce application is a data-driven application. There's a database behind a web front end, and middleware that talks to a number of other databases and data services (credit card processing companies, banks, and so on). But merely using data isn't really what we mean by "data science." A data application acquires its value from the data itself, and creates more data as a result. It's not just an application with data; it's a data product. Data science enables the creation of data products.
