Python

Data Science from Scratch

2015-04-20 · O'Reilly Data Science Books O'Reilly Amazon

book

by Joel Grus

AI/ML Data Science NLP data data-science

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

Get a crash course in Python
Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
Collect, explore, clean, munge, and manipulate data
Dive into the fundamentals of machine learning
Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
Explore recommender systems, natural language processing, network analysis, MapReduce, and databases

Learning Pandas

2015-04-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Heydt

Data Science Pandas data data-science data-science-tools

"Learning Pandas" is your comprehensive guide to mastering pandas, the powerful Python library for data manipulation and analysis. In this book, you'll explore pandas' capabilities and learn to apply them to real-world data challenges. With clear explanations and hands-on examples, you'll enhance your ability to analyze, clean, and visualize data effectively.

What this Book will help me do

Understand the core concepts of pandas and how it integrates with Python.
Learn to efficiently manipulate and transform datasets using pandas.
Gain skills in analyzing and cleaning data to prepare for insights.
Explore techniques for working with time-series data and financial datasets.
Discover how to create compelling visualizations with pandas to communicate findings.

Author(s)

Michael Heydt is an experienced Python developer and data scientist with expertise in teaching technical concepts to others. With a deep understanding of the pandas library, Michael has authored several guides on data analysis and is passionate about making complex information accessible. His practical approach ensures readers can directly apply lessons to their own projects.

Who is it for?

This book is ideal for Python programmers who want to harness the power of pandas for data analysis. Whether you're a beginner in data science or looking to refine your skills, you'll find clear, actionable guidance here. Basic programming knowledge is assumed, but no prior pandas experience is necessary. If you're eager to turn data into impactful insights, this book is for you.

Computer Science Illuminated, 6th Edition

2015-01-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Nell Dale, John Lewis

Analytics C#/.NET Computer Science Google Analytics HTML Informatica Java JavaScript SQL analytics-platforms data data-science

Each new print copy includes Navigate 2 Advantage Access that unlocks a comprehensive and interactive eBook, student practice activities and assessments, a full suite of instructor resources, and learning analytics reporting tools.

Fully revised and updated, the Sixth Edition of the best-selling text Computer Science Illuminated retains the accessibility and in-depth coverage of previous editions, while incorporating all-new material on cutting-edge issues in computer science. Authored by the award-winning Nell Dale and John Lewis, Computer Science Illuminated’s unique and innovative layered approach moves through the levels of computing from an organized, language-neutral perspective.

Designed for the introductory computing and computer science course, this student-friendly Sixth Edition provides students with a solid foundation for further study, and offers non-majors a complete introduction to computing.

Key Features of the Sixth Edition include:

Access to Navigate 2 online learning materials including a comprehensive and interactive eBook, student practice activities and assessments, learning analytics reporting tools, and more
Completely revised sections on HTML and CSS
Updates regarding Top Level Domains, Social Networks, and Google Analytics
All-new section on Internet management, including ICANN control and net neutrality 
New design, including fully revised figures and tables
New and updated Did You Know callouts are included in the chapter margins
New and revised Ethical Issues and Biographies throughout emphasize the history and breadth of computing
Available in our customizable PUBLISH platform

A collection of programming language chapters are available as low-cost bundling options. Available chapters include: Java, C++, Python, Alice, SQL, VB.NET, RUBY, Perl, Pascal, and JavaScript.

With Navigate 2, technology and content combine to expand the reach of your classroom. Whether you teach an online, hybrid, or traditional classroom-based course, Navigate 2 delivers unbeatable value. Experience Navigate 2 today at www.jblnavigate.com/2

Think Stats, 2nd Edition

2014-10-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Allen B. Downey

data data-science data-science-tasks statistics

If you know how to program, you have the skills to turn data into knowledge, using tools of probability and statistics. This concise introduction shows you how to perform statistical analysis computationally, rather than mathematically, with programs written in Python. By working with a single case study throughout this thoroughly revised book, you’ll learn the entire process of exploratory data analysis, from collecting data and generating statistics to identifying patterns and testing hypotheses. You’ll explore distributions, rules of probability, visualization, and many other tools and concepts. New chapters on regression, time series analysis, survival analysis, and analytic methods will enrich your discoveries.

Develop an understanding of probability and statistics by writing and testing code
Run experiments to test statistical behavior, such as generating samples from several distributions
Use simulations to understand concepts that are hard to grasp mathematically
Import data from most sources with Python, rather than rely on data that’s cleaned and formatted for statistics tools
Use statistical inference to answer questions about real-world data

Data Science at the Command Line

2014-10-02 · O'Reilly Data Science Books O'Reilly Amazon

book

by Jeroen Janssens

Agile/Scrum API CSV Data Science HTML JSON Linux XML data data-science

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, whether you’re on Windows, OS X, or Linux, author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools. Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

Obtain data from websites, APIs, databases, and spreadsheets
Perform scrub operations on plain text, CSV, HTML/XML, and JSON
Explore data, compute descriptive statistics, and create visualizations
Manage your data science workflow using Drake
Create reusable tools from one-liners and existing Python or R code
Parallelize and distribute data-intensive pipelines using GNU Parallel
Model data with dimensionality reduction, clustering, regression, and classification algorithms

Modeling Techniques in Predictive Analytics with Python and R: A Guide to Data Science

2014-10-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Thomas W. Miller

Analytics Big Data Data Science DataViz data data-science

Master predictive analytics, from start to finish

Start with strategy and management
Master methods and build models
Transform your models into highly effective code in both Python and R

This one-of-a-kind book will help you use predictive analytics, Python, and R to solve real business problems and drive real competitive advantage. You’ll master predictive analytics through realistic case studies, intuitive data visualizations, and up-to-date code for both Python and R, not complex math. Step by step, you’ll walk through defining problems, identifying data, crafting and optimizing models, writing effective Python and R code, interpreting results, and more. Each chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work and maximize their value. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, addresses everything you need to succeed: strategy and management, methods and models, and technology and code. If you’re new to predictive analytics, you’ll gain a strong foundation for achieving accurate, actionable results. If you’re already working in the field, you’ll master powerful new skills. If you’re familiar with either Python or R, you’ll discover how these languages complement each other, enabling you to do even more. All data sets, extensive Python and R code, and additional examples are available for download at http://www.ftpress.com/miller/

Python and R offer immense power in predictive analytics, data science, and big data. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, illuminating each technique with carefully explained code for the latest versions of Python and R. If you’re already a modeler, programmer, or manager, you’ll learn crucial skills you don’t already have.

Using Python and R, Miller addresses multiple business challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures, and then modeling it again with realistic code that delivers actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. Appendices include five complete case studies, and a detailed primer on modern data science methods.

Use Python and R to gain powerful, actionable, profitable insights about:

Advertising and promotion
Consumer preference and choice
Market baskets and related purchases
Economic forecasting
Operations management
Unstructured text and language
Customer sentiment
Brand and price
Sports team performance
And much more

Computational and Visualization Techniques for Structural Bioinformatics Using Chimera

2014-07-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Forbes J. Burkowski

bioinformatics data data-science data-science-domains

A Step-by-Step Guide to Describing Biomolecular Structure

Computational and Visualization Techniques for Structural Bioinformatics Using Chimera shows how to perform computations with Python scripts in the Chimera environment. It focuses on the three core areas needed to study structural bioinformatics: biochemistry, mathematics, and computation.

Understand Important Concepts of Structural Bioinformatics

The book covers topics that deal primarily with protein structure and includes many exercises that are grounded in biological problems at the molecular level. The text encourages mathematical analysis by providing a firm foundation for computations. It analyzes numerous Python scripts for the Chimera environment, with the scripts and other material available on a supplementary website.

Build Python Scripts to Extend the Capabilities of Chimera

Through more than 60 exercises that involve the development of Python scripts, the book gives you concrete guidance on using the scripting capabilities of Chimera. You’ll gain experience in solving real problems as well as understand the various applications of linear algebra. You can also use the scripts as starting points for the development of similar applications and use classes from the StructBio toolkit for computations, such as structure overlap, data plotting, scenographics, and display of residue networks.

Learning NumPy Array

2014-06-13 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ivan Idris

NumPy data data-science data-science-tools

This book, 'Learning NumPy Array,' is the ultimate guide to mastering the fundamental library for numerical computing in Python: NumPy. Through concise explanations and practical examples, you will learn how to create and manipulate arrays, perform complex computations, and leverage NumPy's capabilities to streamline data analysis workflows.

What this Book will help me do

Install and set up NumPy in your Python environment for numerical computing.
Create and manipulate multidimensional arrays to handle and process large data sets.
Perform complex mathematical and statistical computations with NumPy's built-in methods.
Explore time series analysis and signal processing techniques using NumPy.
Optimize and improve the performance of Python code leveraging NumPy's efficient operations.

Author(s)

Ivan Idris is a seasoned programmer and data scientist with a great passion for Python and numerical computing. With years of experience working on data analysis projects, he has solidified his expertise in Python's scientific libraries, including NumPy. Ivan creates practical, reader-friendly guides that not only teach the technical how-to's but also inspire confidence in solving real-world problems.

Who is it for?

This book is ideal for Python programmers taking their first steps into the world of numerical computing or data analysis. Beginners looking to understand the basics of handling large numerical datasets in Python will find this resource highly enlightening. Developers and scientists wanting to streamline their calculations using efficient techniques will gain valuable insights. If working with Python in a data-driven environment interests you, this book is for you.

matplotlib Plotting Cookbook

2014-03-26 · O'Reilly Data Science Books O'Reilly Amazon

book

by Alexandre Devert

DataViz Matplotlib data data-science data-science-tasks data-visualization python-viz-tools

The "matplotlib Plotting Cookbook" equips you with the skills to create impactful scientific visualizations using Python's matplotlib library. Through a series of concise recipes, this book covers everything from basic plotting to advanced techniques, ensuring you can create impressive graphics for your data.

What this Book will help me do

Learn to produce standard 2D plots like line, bar, and scatter plots.
Master advanced plotting techniques such as 3D plotting and data overlays.
Enhance plots with detailed annotations, rich legends, and labeling.
Understand the use of colors, styles, and scales to maximize readability.
Use matplotlib to generate plots programmatically or integrate with applications.

Author(s)

Alexandre Devert, the author of the "matplotlib Plotting Cookbook," is an experienced data scientist with a strong foundation in Python and data visualization techniques. Alexandre has worked extensively in the field of data analysis, and his expertise is reflected in the practical examples and hands-on guidance provided throughout this book. He takes a learner-focused approach to presenting technical topics in an accessible way.

Who is it for?

This book is designed for Python developers, data scientists, and researchers who need to create clear, professional-quality visualizations. If you are at a beginner or intermediate level in using matplotlib or visualization libraries, this book will empower you with essential plotting skills. Readers looking to save time while producing meaningful insights through data visualizations will find this book valuable. It is suitable for those aiming to improve their data representation skills for presentations or publications.

Getting Started with Beautiful Soup

2014-01-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Vineeth G Nair

HTML XML data data-science data-science-tasks web-scraping

"Getting Started with Beautiful Soup" is your practical guide to website scraping using Python. It teaches you how to use Beautiful Soup and the urllib2 module to extract data from websites efficiently and effectively. Through hands-on examples and clear explanations, you'll gain the skills to navigate, search, and modify HTML content.

What this Book will help me do

Navigate and scrape web pages using the Beautiful Soup Python library.
Understand and implement the urllib2 module to access web content programmatically.
Search and analyze HTML structures efficiently to extract the needed data.
Modify and format extracted HTML and XML content effectively.
Handle encoding and manage output formats for diverse scraping requirements.

Author(s)

Vineeth G. Nair is an experienced Python developer with a strong focus on web technologies, data extraction, and automation. His expertise in Python's Beautiful Soup library has helped countless learners and professionals tackle the challenges of web scraping. Vineeth combines a methodical approach to teaching with practical examples, making complex concepts accessible and actionable.

Who is it for?

This book is ideal for Python enthusiasts, data analysts, and budding developers looking to explore web scraping. Whether you're a beginner or have some programming experience, this book will guide you through the fundamental concepts of extracting web data. If you're aiming to delve into practical, real-world implementations of web scraping, this is the book for you.

Agile Data Science

2013-10-18 · O'Reilly Data Science Books O'Reilly Amazon

book

by Russell Jurney

Agile/Scrum Analytics Big Data Data Science Hadoop JavaScript data data-science

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.

Create analytics applications by using the agile big data development methodology
Build value from your data in a series of agile sprints, using the data-value stack
Gain insight by using several data structures to extract multiple features from a single dataset
Visualize data with charts, and expose different aspects through interactive reports
Use historical data to predict the future, and translate predictions into action
Get feedback from users after each sprint to keep your project on track

Think Bayes

2013-09-24 · O'Reilly Data Science Books O'Reilly Amazon

book

by Allen B. Downey

bayesian-statistics data data-science data-science-tasks statistics

If you know how to program with Python, and know a little about probability, you’re ready to tackle Bayesian statistics. This book shows you how to use Python code instead of math to help you learn Bayesian fundamentals. Once you get the math out of the way, you’ll be able to apply these techniques to real-world problems.

SciPy and NumPy

2012-11-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Eli Bressert

NumPy SciPy data data-science data-science-tools

Are you new to SciPy and NumPy? Do you want to learn them quickly and easily through examples and a concise introduction? Then this is the book for you. You’ll cut through the complexity of online documentation and discover how easily you can get up to speed with these Python libraries. Ideal for data analysts and scientists in any field, this overview shows you how to use NumPy for numerical processing, including array indexing, math operations, and loading and saving data. You’ll learn how SciPy helps you work with advanced mathematical functions such as optimization, interpolation, integration, clustering, statistics, and other tools that take scientific programming to a whole new level. The new edition is now available, fully revised and updated in June 2013.

Learn the capabilities of NumPy arrays, element-by-element operations, and core mathematical operations
Solve minimization problems quickly with SciPy’s optimization package
Use SciPy functions for interpolation, from simple univariate to complex multivariate cases
Apply a variety of SciPy statistical tools such as distributions and functions
Learn SciPy’s spatial and cluster analysis classes
Save operation time and memory usage with sparse matrices

Python for Data Analysis

2012-10-22 · O'Reilly Data Science Books O'Reilly Amazon

book

by Wes McKinney (Posit)

Analytics Matplotlib NumPy Pandas data data-science

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language. Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing.

Use the IPython interactive shell as your primary development environment
Learn basic and advanced NumPy (Numerical Python) features
Get started with data analysis tools in the pandas library
Use high-performance tools to load, clean, transform, merge, and reshape data
Create scatter plots and static or interactive visualizations with matplotlib
Apply the pandas groupby facility to slice, dice, and summarize datasets
Measure data by points in time, whether it’s specific instances, fixed periods, or intervals
Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

The Art of R Programming

2011-10-05 · O'Reilly Data Science Books O'Reilly Amazon

book

by Norman Matloff

data data-science data-science-tools r

R is the world's most popular language for developing statistical software: Archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep economies running smoothly. The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro. Along the way, you'll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats.

You'll also learn to:

Create artful graphs to visualize complex data sets and functions
Write more efficient code using parallel R and vectorization
Interface R with C/C++ and Python for increased speed or functionality
Find new R packages for text analysis, image manipulation, and more
Squash annoying bugs with advanced debugging techniques

Whether you're designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing.

Think Stats

2011-07-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Allen B. Downey

data data-science data-science-tasks statistics

Think Stats: Probability and Statistics for Programmers is a textbook for a new kind of introductory prob-stat class. It emphasizes the use of statistics to explore large datasets. It takes a computational approach: students write programs in Python as a way of developing and testing their understanding.

Mining the Social Web

2011-02-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Matthew Russell (Digital Reasoning)

API JavaScript Redis data data-science data-science-tasks web-scraping

Popular social networks such as Facebook and Twitter generate a tremendous amount of valuable data on topics and use patterns. Who's talking to whom? What are they talking about? How often are they talking? This concise and practical book shows you how to answer these questions and more by harvesting and analyzing data using social web APIs, Python, and pragmatic storage technologies such as Redis, CouchDB, and NetworkX. With Mining the Social Web, intermediate to advanced programmers will learn how to harvest and analyze social data in a way that lends itself to hacking as well as more industrial-strength analysis. Algorithms are designed with robustness and efficiency in mind so that the approaches scale well on an ordinary piece of commodity hardware. The book is highly readable from cover to cover as content progressively grows in complexity, but also lends itself to being read in an ad-hoc fashion.

Use easily adaptable scripts to access popular social network APIs including Twitter, OpenSocial, and Facebook
Learn approaches for slicing and dicing social data that's been harvested from social web APIs as well as other common formats such as email and markup formats
Harvest data from other sources such as Freebase and other sites to enrich your analytic capabilities with additional context
Visualize and analyze data in interactive ways with tools built upon rich UI JavaScript toolkits
Get a concise and straightforward synopsis of some practical technologies from the semantic web landscape that you can incorporate into your analysis

This book is still in progress, but you can get going on this technology through our Rough Cuts edition, which lets you read the manuscript as it's being written, either online or via PDF.

21 Recipes for Mining Twitter

2011-01-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by Matthew Russell (Digital Reasoning)

API Data Streaming data data-science data-science-tasks web-scraping

Millions of public Twitter streams harbor a wealth of data, and once you mine them, you can gain some valuable insights. This short and concise book offers a collection of recipes to help you extract nuggets of Twitter information using easy-to-learn Python tools. Each recipe offers a discussion of how and why the solution works, so you can quickly adapt it to fit your particular needs.

The recipes include techniques to:

Use OAuth to access Twitter data
Create and analyze graphs of retweet relationships
Use the streaming API to harvest tweets in realtime
Harvest and analyze friends and followers
Discover friendship cliques
Summarize webpages from short URLs

This book is a perfect companion to O’Reilly's Mining the Social Web.

Bioinformatics Programming Using Python

2009-12-15 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mitchell L Model

bioinformatics data data-science data-science-domains

Powerful, flexible, and easy to use, Python is an ideal language for building software tools and applications for life science research and development. This unique book shows you how to program with Python, using code examples taken directly from bioinformatics. In a short time, you'll be using sophisticated techniques and Python modules that are particularly effective for bioinformatics programming. Bioinformatics Programming Using Python is perfect for anyone involved with bioinformatics: researchers, support staff, students, and software developers interested in writing bioinformatics applications. You'll find it useful whether you already use Python, write code in another language, or have no programming experience at all. It's an excellent self-instruction tool, as well as a handy reference when facing the challenges of real-life programming tasks.

Become familiar with Python's fundamentals, including ways to develop simple applications
Learn how to use Python modules for pattern matching, structured text processing, online data retrieval, and database access
Discover generalized patterns that cover a large proportion of how Python code is used in bioinformatics
Learn how to apply the principles and techniques of object-oriented programming
Benefit from the "tips and traps" section in each chapter

Python for Bioinformatics

2008-06-16 · O'Reilly Data Science Books O'Reilly Amazon

book

by Jason Kinser

bioinformatics data data-science data-science-domains

Python for Bioinformatics provides a clear introduction to the Python programming language and instructs beginners on the development of simple programming exercises. Important Notice: The digital edition of this book is missing some of the images or content found in the physical edition.

talk-data.com
