talk-data.com

Event

O'Reilly Data Science Books

2013-08-09 – 2026-02-25 · O'Reilly

Activities tracked

324

Collection of O'Reilly books on Data Science.

Filtering by: Data Science

Sessions & talks

Showing 251–275 of 324 · Newest first

Think Like a Data Scientist

Think Like a Data Scientist presents a step-by-step approach to data science, combining analytic, programming, and business perspectives into easy-to-digest techniques and thought processes for solving real-world data-centric problems.
About the Technology: Data collected from customers, scientific measurements, IoT sensors, and so on is valuable only if you understand it. Data scientists revel in the interesting and rewarding challenge of observing, exploring, analyzing, and interpreting this data. Getting started with data science means more than mastering analytic tools and techniques, however; the real magic happens when you begin to think like a data scientist. This book will get you there.
About the Book: Think Like a Data Scientist teaches you a step-by-step approach to solving real-world data-centric problems. By breaking down carefully crafted examples, you'll learn to combine analytic, programming, and business perspectives into a repeatable process for extracting real knowledge from data. As you read, you'll discover (or remember) valuable statistical techniques and explore powerful data science software. More importantly, you'll put this knowledge together using a structured process for data science. When you've finished, you'll have a strong foundation for a lifetime of data science learning and practice.
What's Inside:
• The data science process, step-by-step
• How to anticipate problems
• Dealing with uncertainty
• Best practices in software and scientific thinking
About the Reader: Readers need beginner programming skills and knowledge of basic statistics.
About the Author: Brian Godsey has worked in software, academia, finance, and defense and has launched several data-centric start-ups.
Quotes:
"Explains difficult concepts and techniques concisely and approachably." - Jenice Tom, CVS Health
"Goes beyond simple tools and techniques and helps you to conceptualize and solve challenging, real-world data science problems." - Casimir Saternos, Synchronoss Technologies
"A successful attempt to put the mind of a data scientist on paper." - David Krief, Altansia
"The book that changed my career path!" - Nicolas Boulet-Lavoie, DL Innov

Data Science For Dummies, 2nd Edition

Your ticket to breaking into the field of data science! Jobs in data science are projected to outpace the number of people with data science skills, making those with the knowledge to fill a data science position a hot commodity in the coming years. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of an organization's massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you'll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization.
• Provides a background in data science fundamentals and preparing your data for analysis
• Details different data visualization techniques that can be used to showcase and summarize your data
• Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
• Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark
It's a big, big data world out there. Let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.

The Data Science Handbook

A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline.
Finding a good data scientist has been likened to hunting for a unicorn: the required combination of technical skills is simply very hard to find in one person. In addition, good data science is not just rote application of trainable skill sets; it requires the ability to think flexibly about all these areas and understand the connections between them. This book provides a crash course in data science, combining all the necessary skills into a unified discipline. Unlike many analytics books, computer science and software engineering are given extensive coverage since they play such a central role in the daily work of a data scientist. The author also describes classic machine learning algorithms, from their mathematical foundations to real-world applications. Visualization tools are reviewed, and their central importance in data science is highlighted. Classical statistics is addressed to help readers think critically about the interpretation of data and its common pitfalls. The clear communication of technical results, which is perhaps the most undertrained of data science skills, is given its own chapter, and all topics are explained in the context of solving real-world data problems.
The book also features:
• Extensive sample code and tutorials using Python™ along with its technical libraries
• Core technologies of “Big Data,” including their strengths and limitations and how they can be used to solve real-world problems
• Coverage of the practical realities of the tools, keeping theory to a minimum; however, when theory is presented, it is done in an intuitive way to encourage critical thinking and creativity
• A wide variety of case studies from industry
• Practical advice on the realities of being a data scientist today, including the overall workflow, where time is spent, the types of datasets worked on, and the skill sets needed
The Data Science Handbook is an ideal resource for data analysis methodology and big data software tools. The book is appropriate for people who want to practice data science, but lack the required skill sets. This includes software professionals who need to better understand analytics and statisticians who need to understand software. Modern data science is a unified discipline, and it is presented as such. This book is also an appropriate reference for researchers and entry-level graduate students who need to learn real-world analytics and expand their skill set.
FIELD CADY is the data scientist at the Allen Institute for Artificial Intelligence, where he develops tools that use machine learning to mine scientific literature. He has also worked at Google and several Big Data startups. He has a BS in physics and math from Stanford University, and an MS in computer science from Carnegie Mellon.

Scala: Guide for Data Science Professionals

Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning.
About This Book:
• Build data science and data engineering solutions with ease
• An in-depth look at each stage of the data analysis process, from reading and collecting data to distributed analytics
• Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulations, and source code
Who This Book Is For: This learning path is perfect for those who are comfortable with Scala programming and now want to enter the field of data science. Some knowledge of statistics is expected.
What You Will Learn:
• Transfer and filter tabular data to extract features for machine learning
• Read, clean, transform, and write data to both SQL and NoSQL databases
• Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
• Load data from HDFS and Hive with ease
• Run streaming and graph analytics in Spark for exploratory analysis
• Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
• Build dynamic workflows for scientific computing
• Leverage open source libraries to extract patterns from time series
• Master probabilistic models for sequential data
In Detail: Scala is especially good for analyzing large sets of data as the scale of the task doesn’t have any significant impact on performance. Scala’s powerful functional libraries can interact with databases and build scalable frameworks, resulting in the creation of robust data pipelines. The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real-world examples, you will learn how to design scalable architecture to process and model data, starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks. Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You’ll see how Scala can be used to make sense of data through easy-to-follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning algorithms with the Spark ML library. You’ll get a sufficient understanding of Spark streaming, machine learning for streaming data, and Spark GraphX. Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science: machine learning. The final module teaches you the A to Z of machine learning with Scala. You’ll explore Scala for dependency injections and implicits, which are used to write machine learning algorithms. You’ll also explore machine learning topics such as clustering, dimensionality reduction, Naïve Bayes, regression models, SVMs, neural networks, and more.
This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products:
• Scala for Data Science, Pascal Bugnion
• Scala Data Analysis Cookbook, Arun Manivannan
• Scala for Machine Learning, Patrick R. Nicolas
Style and approach: A complete package with all the information necessary to start building useful data engineering and data science solutions straight away. It contains a diverse set of recipes that cover the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala.

2017 European Data Science Salary Survey

How do data science salaries for people in Europe compare to their counterparts in the rest of the world? Among the more than 1000 people who responded to O’Reilly’s 2016 Data Science Salary Survey, 359 live and work in various European countries as data scientists, analysts, engineers, and related professions. This report takes a deep dive into the survey results from respondents in various regions of Europe, including the tools they use, the compensation they receive, and the roles they play in their respective organizations. Even if you didn’t take part in the survey, you can still plug your own information into the survey’s simple linear model to see where you fit.
With this report, you’ll learn:
• How salaries vary by country and specific regions in Europe
• Average size of companies by region
• How salary is affected by a country’s GDP
• Top industries for data scientists, including software, banking, finance, retail, and ecommerce
• Most commonly used tools vs tools used by respondents with above-average salaries
• Primary and secondary job tasks performed by survey respondents
To stay up-to-date on this research, your participation is crucial. The survey is now open for the 2017 report; please take just 5 to 10 minutes to participate in the survey.

Strategies in Biomedical Data Science

An essential guide to healthcare data problems, sources, and solutions.
Strategies in Biomedical Data Science provides medical professionals with much-needed guidance toward managing the increasing deluge of healthcare data. Beginning with a look at our current top-down methodologies, this book demonstrates the ways in which both technological development and more effective use of current resources can better serve both patient and payer. The discussion explores the aggregation of disparate data sources, current analytics and toolsets, the growing necessity of smart bioinformatics, and more as data science and biomedical science grow increasingly intertwined. You'll dig into the unknown challenges that come along with every advance, and explore the ways in which healthcare data management and technology will inform medicine, politics, and research in the not-so-distant future. Real-world use cases and clear examples are featured throughout, and coverage of data sources, problems, and potential mitigations provides necessary insight for forward-looking healthcare professionals. Big Data has been a topic of discussion for some time, with much attention focused on problems and management issues surrounding truly staggering amounts of data. This book offers a lifeline through the tsunami of healthcare data, to help the medical community turn their data management problem into a solution.
• Consider the data challenges personalized medicine entails
• Explore the available advanced analytic resources and tools
• Learn how bioinformatics as a service is quickly becoming reality
• Examine the future of IoT and the deluge of personal device data
The sheer amount of healthcare data being generated will only increase as both biomedical research and clinical practice trend toward individualized, patient-specific care. Strategies in Biomedical Data Science provides expert insight into the kind of robust data management that is becoming increasingly critical as healthcare evolves.

Pro Tableau: A Step-by-Step Guide

Leverage the power of visualization in business intelligence and data science to make quicker and better decisions. Use statistics and data mining to make compelling and interactive dashboards. This book will help those familiar with Tableau software chart their journey to being a visualization expert.
Pro Tableau demonstrates the power of visual analytics and teaches you how to:
• Connect to various data sources such as spreadsheets, text files, relational databases (Microsoft SQL Server, MySQL, etc.), non-relational databases (NoSQL such as MongoDB, Cassandra), and R data files
• Write your own custom SQL
• Perform statistical analysis in Tableau using R
• Use a multitude of charts (pie, bar, stacked bar, line, scatter plots, dual axis, histograms, heat maps, tree maps, highlight tables, box and whisker, etc.)
What you'll learn:
• Connect to various data sources such as relational databases (Microsoft SQL Server, MySQL) and non-relational databases (NoSQL such as MongoDB, Cassandra), write your own custom SQL, and join and blend data sources
• Leverage table calculations (moving average, year over year growth, LOD (Level of Detail), etc.)
• Integrate Tableau with R
• Tell a compelling story with data by creating highly interactive dashboards
Who this book is for: All levels of IT professionals, from executives responsible for determining IT strategies to systems administrators, data analysts, and decision makers responsible for driving strategic initiatives.

Principles of Data Science

If you've ever wondered how to bridge the gap between mathematics, programming, and actionable data insights, 'Principles of Data Science' is the guide for you. This book explores the full data science pipeline, providing you with tools and knowledge to transform raw data into impactful decisions. With practical lessons and hands-on tutorials, you'll master the essential skills of a data scientist.
What this Book will help me do:
• Understand and apply the five core steps of the data science process.
• Gain insight into data cleaning, visualization, and effective communication of results.
• Learn and implement foundational machine learning models using Python or R.
• Bridge gaps between mathematics, statistics, and programming to solve data-driven problems.
• Evaluate machine learning models using key metrics for better predictive capabilities.
Author(s): The author, a seasoned data scientist with years of professional experience in analytics and software development, brings a rich perspective to the topic. Combining a strong foundation in mathematics with expertise in Python and R, they have worked on diverse real-world data projects. Their teaching philosophy emphasizes clarity and practical application, ensuring you not only gain knowledge but also know how to apply it effectively.
Who is it for? This book is intended for individuals with a basic understanding of algebra and some programming experience in Python or R. It is perfect for programmers who wish to dive into the world of data science or for those with math skills looking to apply them practically. If you seek to turn raw data into valuable insights and predictions, this book is tailored for you.

Efficient R Programming

There are many excellent R resources for visualization, data science, and package development. Hundreds of scattered vignettes, web pages, and forums explain how to use R in particular domains. But little has been written on how to simply make R work effectively—until now. This hands-on book teaches novices and experienced R users how to write efficient R code. Drawing on years of experience teaching R courses, authors Colin Gillespie and Robin Lovelace provide practical advice on a range of topics—from optimizing the set-up of RStudio to leveraging C++—that make this book a useful addition to any R user’s bookshelf.

Improve the outcome of your data experiments with A-B testing

Data scientists are faced with the need to conduct continual experiments, particularly regarding user interface and product marketing. Designing experiments is a cornerstone of the practice of statistics, with clear application to data science. In this lesson, you’ll learn about A-B testing and hypothesis (or significance) tests, which are critical aspects of experimental design for data science.
What you’ll learn, and how you can apply it: You will learn the central concepts of A-B testing, its role in designing and conducting data science experiments, and the characteristics of a proper A-B test. Through a series of sample tests, you’ll learn how to interpret results and apply that insight to your analysis of the data. Since A-B tests are typically constructed with a hypothesis in mind, you’ll also learn how to conduct various hypothesis (or significance) tests, enabling you to avoid misinterpreting randomness.
This lesson is for you because: You are a data scientist or analyst working with data, and want to gain beginner-level knowledge of key statistical concepts to improve the design and outcome of your experimental tests with data.
Prerequisites: Basic familiarity with coding in R
Materials or downloads needed: n/a
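As a rough illustration of the kind of significance test an A-B test relies on, the sketch below runs a permutation test on the difference in conversion rates between two variants. It is not taken from the lesson (which uses R); the function name `permutation_test`, the sample data, and the parameter values are all hypothetical, and Python is used here only for illustration.

```python
import random


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean of b minus mean of a) and the
    share of random relabelings whose difference is at least as extreme.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = sum(b) / len(b) - sum(a) / n_a
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the pooled outcomes, then split them back into two groups
        rng.shuffle(pooled)
        diff = sum(pooled[n_a:]) / len(b) - sum(pooled[:n_a]) / n_a
        # Small tolerance guards against floating-point rounding
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / n_permutations


# Hypothetical outcomes: 1 = converted, 0 = did not convert
variant_a = [1] * 20 + [0] * 180  # 20 of 200 visitors converted
variant_b = [1] * 35 + [0] * 165  # 35 of 200 visitors converted

observed, p_value = permutation_test(variant_a, variant_b)
print(f"observed difference: {observed:.3f}, p-value: {p_value:.3f}")
```

A small p-value suggests the observed difference would be unusual if the variant labels made no difference; a large one means the gap is consistent with randomness alone.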

R for Data Science

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You’ll get a complete, big-picture understanding of the data science cycle, along with basic tools you need to manage the details. Each section of the book is paired with exercises to help you practice what you’ve learned along the way.
You’ll learn how to:
• Wrangle: transform your datasets into a form convenient for analysis
• Program: learn powerful R tools for solving data problems with greater clarity and ease
• Explore: examine your data, generate hypotheses, and quickly test them
• Model: provide a low-dimensional summary that captures true "signals" in your dataset
• Communicate: learn R Markdown for integrating prose, code, and results

Python Data Science Handbook

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.
With this handbook, you’ll learn how to use:
• IPython and Jupyter: provide computational environments for data scientists using Python
• NumPy: includes the ndarray for efficient storage and manipulation of dense data arrays in Python
• Pandas: features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python
• Matplotlib: includes capabilities for a flexible range of data visualizations in Python
• Scikit-Learn: for efficient and clean Python implementations of the most important and established machine learning algorithms

Learning R Programming

This book provides a comprehensive introduction to R programming, a powerful tool for data science and statistics. Throughout the book, readers will explore programming constructs, data structures, and popular R packages, gaining the skills needed for practical applications and problem-solving.
What this Book will help me do:
• Understand R's foundational concepts like variables, data types, and functions.
• Learn how to use R for data analysis, visualization, and machine learning tasks.
• Develop advanced R skills such as meta-programming and performance optimization.
• Master object-oriented programming using R's S3, S4, and R6 systems.
• Gain confidence in utilizing R for creating web scraping scripts and interactive reports.
Author(s): Kun Ren, an experienced software developer and educator, specializes in languages for data analysis, including R. With years of practical experience and teaching R programming, they bring clarity and depth to complex topics. Their approachable writing style ensures learners at any level can engage effectively.
Who is it for? This book is ideal for professionals in data science, statistics, and related fields with basic programming skills looking to delve into R programming. It caters to beginners and those consolidating their knowledge of R, aiming to develop practical skills for data manipulation and analysis.

2016 Data Science Salary Survey

In this fourth edition of O’Reilly’s Data Science Salary Survey, 983 respondents working across a variety of industries answered questions about the tools they use, the tasks they engage in, and the salaries they make. This year’s survey includes data scientists, engineers, and others in the data space from 45 countries and 45 US states. The 2016 survey included new questions, most notably about specific data-related tasks that may affect salary. Plug in your own data points to the survey model and see how you compare to other data science professionals in your industry.
With this report, you’ll learn:
• Where data scientists make the highest salaries, by country and by US state
• Tools that respondents most commonly use on the job, and tools that contribute most to salary
• Two activities that contribute to higher earnings among respondents
• How gender and bargaining skills affect salaries when all other factors are equal
• Salary differences between those using open source tools vs those using proprietary tools
• Salary differences between those who rely on Python vs those who use several tools
Participate in the 2017 Survey: The survey is now open for the 2017 report. Spend just 5 to 10 minutes and take the anonymous salary survey here: https://www.oreilly.com/ideas/take-the-2017-data-science-salary-survey.

R for Data Science Cookbook

The "R for Data Science Cookbook" is your comprehensive guide to tackling data problems using R. Focusing on practical applications, you will learn data manipulation, visualization, statistical inference, and machine learning with a hands-on approach using popular R packages.
What this Book will help me do:
• Master the use of R's functional programming features to streamline your analysis workflows.
• Extract, transform, and visualize data effectively using robust R packages like dplyr and ggplot2.
• Learn to create intuitive and professional visualizations and reports that communicate insights effectively.
• Implement key statistical modeling and machine learning techniques to solve real-world problems.
• Acquire expertise in data mining techniques, including clustering and association rule mining.
Author(s): Yu-Wei Chiu, also known as David Chiu, is an experienced data scientist and educator. With a solid technical background in using R for data science, he combines theory with practical applications in his writing. David's approachable style and rich examples make complex topics accessible and engaging for learners.
Who is it for? This book is perfect for individuals who already have a foundation in R and are looking to deepen their expertise in applying R to data science tasks. Ideal readers are analysts and statisticians eager to solve real-world problems using practical tools. If you're aspiring to work effectively with large data sets or want to learn versatile data analysis techniques, this book is designed for you. It bridges the gap between theoretical knowledge and actionable skills, making it invaluable for professionals and learners alike.

AI and Medicine

Data-driven techniques have improved decision-making processes for people in industries such as finance and real estate. Yet, despite promising solutions that data analytics and artificial intelligence/machine learning (ML) tools can bring to healthcare, the industry remains largely unconvinced. In this O’Reilly report, you’ll explore the potential of, and impediments to, widespread adoption of AI and ML in the medical field. You’ll also learn how extensive government regulation and resistance from the medical community have so far stymied full-scale acceptance of sophisticated data analytics in healthcare.
Through interviews with several professionals working at the intersection of medicine and data science, author Mike Barlow examines five areas where the application of AI/ML strategies can spur a beneficial revolution in healthcare:
• Identifying risks and interventions for healthcare management of entire populations
• Closing gaps in care by designing plans for individual patients
• Supporting customized self-care treatment plans and monitoring patient health in real time
• Optimizing healthcare processes through data analysis to improve care and reduce costs
• Helping doctors and patients choose proper medications, dosages, and promising surgical options

Simulation for Data Science with R

"Simulation for Data Science with R" introduces data professionals to fundamental and advanced simulation techniques using R. You'll understand essential statistical modeling concepts and learn to apply simulation methods to tackle data challenges and enhance your decision-making skills. What this Book will help me do Master five popular simulation methodologies including Monte Carlo and Agent-Based Modeling. Learn to simulate real-world data to uncover patterns and enhance predictions. Enhance your R programming expertise by exploring its advanced statistical features. Gain hands-on experience solving statistical problems through practical examples. Develop comprehensive statistical models aimed at real-world decision support. Author(s) Matthias Templ is a seasoned data science expert with extensive experience in statistical modeling and simulations using R. His work is rooted in real-world problem solving, outlining frameworks that are practical and research-driven. With a dedication to education, Matthias conveys his knowledge in an accessible and supportive manner. Who is it for? If you're experienced in computational methods and wish to refine your understanding of R for advanced statistical simulations, this book is for you. It's ideal for analysts or scientists aiming to enhance their decision-making with simulated data models. Prior experience with R is recommended to fully engage with the rigorous concepts presented.

The Data Industry

Provides an introduction of the data industry to the field of economics.
This book bridges the gap between economics and data science to help data scientists understand the economics of big data, and enable economists to analyze the data industry. It begins by explaining data resources and introduces the data asset. This book defines a data industry chain, enumerates data enterprises’ business models versus operating models, and proposes a mode of industrial development for the data industry. The author describes five types of enterprise agglomerations, and multiple industrial cluster effects. A discussion on the establishment and development of data industry related laws and regulations is provided. In addition, this book discusses several scenarios on how to convert data driving forces into productivity that can then serve society. This book is designed to serve as a reference and training guide for data scientists, data-oriented managers and executives, entrepreneurs, scholars, and government employees.
• Defines and develops the concept of a “Data Industry,” and explains the economics of data to data scientists and statisticians
• Includes numerous case studies and examples from a variety of industries and disciplines
• Serves as a useful guide for practitioners and entrepreneurs in the business of data technology
The Data Industry: The Business and Economics of Information and Big Data is a resource for practitioners in the data science industry, government, and students in economics, business, and statistics.
CHUNLEI TANG, Ph.D., is a research fellow at Harvard University. She is the co-founder of Fudan’s Institute for Data Industry and proposed the concept of the “data industry”. She received a Ph.D. in Computer and Software Theory in 2012 and a Master of Software Engineering in 2006 from Fudan University, Shanghai, China.

Python: Real-World Data Science

Unleash the power of Python and its robust data science capabilities.
About This Book:
• Unleash the power of Python 3 objects
• Learn to use powerful Python libraries for effective data processing and analysis
• Harness the power of Python to analyze data and create insightful predictive models
• Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics
Who This Book Is For: Entry-level analysts who want to enter the data science world will find this course very useful for getting acquainted with Python's data science capabilities for doing real-world data analysis.
What You Will Learn:
• Install and set up Python
• Implement objects in Python by creating classes and defining methods
• Get acquainted with NumPy to use it with arrays and array-oriented computing in data analysis
• Create effective visualizations for presenting your data using Matplotlib
• Process and analyze data using the time series capabilities of pandas
• Interact with different kinds of database systems, such as file, disk format, Mongo, and Redis
• Apply data mining concepts to real-world problems
• Compute on big data, including real-time data from the Internet
• Explore how to use different machine learning models to ask different questions of your data
In Detail: The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules, and each module is a mini course in its own right; as you complete each one, you'll have gained key skills and be ready for the material in the next module. The course begins with getting your Python fundamentals nailed down. After getting familiar with Python core concepts, it's time to dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical and example-driven way. The third module will teach you how to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis and moving to more complex data types including text, images, and graphs. Machine learning and predictive analytics have become the most important approaches to uncover data gold mines. In the final module, we'll discuss the necessary details regarding machine learning concepts, offering intuitive yet informative explanations on how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls.
Style and approach: This course includes all the resources that will help you jump into the data science field with Python and learn how to make sense of data. The aim is to create a smooth learning path that will teach you how to get started with powerful Python libraries and perform various data science techniques in depth.

Introducing Data Science

Introducing Data Science teaches you how to accomplish the fundamental tasks that occupy data scientists. Using the Python language and common Python libraries, you'll experience firsthand the challenges of dealing with data at scale and gain a solid foundation in data science. About the Technology Many companies need developers with data science skills to work on projects ranging from social media marketing to machine learning. Discovering what you need to learn to begin a career as a data scientist can seem bewildering. This book is designed to help you get started. About the Book Introducing Data Science explains vital data science concepts and teaches you how to accomplish the fundamental tasks that occupy data scientists. You'll explore data visualization, graph databases, the use of NoSQL, and the data science process. You'll use the Python language and common Python libraries as you experience firsthand the challenges of dealing with data at scale. Discover how Python allows you to gain insights from data sets so big that they need to be stored on multiple machines, or from data moving so quickly that no single machine can handle it. This book gives you hands-on experience with the most popular Python data science libraries, Scikit-learn and StatsModels. After reading this book, you'll have the solid foundation you need to start a career in data science. What's Inside Handling large data Introduction to machine learning Using Python to work with data Writing data science algorithms About the Reader This book assumes you're comfortable reading code in Python or a similar language, such as C, Ruby, or JavaScript. No prior experience with data science is required. About the Authors Davy Cielen, Arno D. B. Meysman, and Mohamed Ali are the founders and managing partners of Optimately and Maiton, where they focus on developing data science projects and solutions in various sectors. 
Quotes Read this book if you want to get a quick overview of data science, with lots of examples to get you started! - Alvin Raj, Oracle The map that will help you navigate the data science oceans. - Marius Butuc, Shopify Covers the processes involved in data science from end to end… A complete overview. - Heather Campbell, Kainos A must-read for anyone who wants to get into the data science world. - Hector Cuesta, Big Data Bootcamp

Practical Data Analysis Cookbook

Practical Data Analysis Cookbook takes you on a comprehensive journey to mastering data exploration and analysis using Python. From data cleaning and transformation to building predictive and classification models, this book provides practical recipes for tackling real-world data challenges and extracting valuable insights. What this Book will help me do Efficiently clean, transform, and explore datasets using tools like pandas and OpenRefine. Develop predictive models for time series and other datasets using Python libraries such as scikit-learn and Statsmodels. Apply clustering and classification techniques to real-world data problems to gain actionable insights. Explore advanced topics like natural language processing and graph theory concepts using specialized tools. Build the skills to solve practical data modeling problems encountered in a data science role. Author(s) Drabas is an experienced data scientist and author who specializes in Python-based data analysis. With a background in tackling intricate data-driven problems, Drabas brings real-world experience to readers. In creating this Cookbook, Drabas adopts a step-by-step approach, making complex techniques accessible to learners of all backgrounds. Who is it for? If you are a data analyst, data scientist, or someone interested in exploring Python for practical data problems, this book is for you. It suits beginners starting their data journey and intermediate professionals looking to enhance their toolset. With clear instructions, it's ideal for anyone willing to build practical skills and tackle real-world challenges in data analysis.

R Machine Learning By Example

This book, 'R Machine Learning by Example,' offers a hands-on approach to learning about machine learning using R. You will not only understand the theoretical aspects but also learn to apply machine learning algorithms to solve real-world problems. Through guided examples, you'll explore predictive modeling, data analysis, and other machine learning techniques implemented in R. What this Book will help me do Master the use of R for advanced data handling and exploration. Visualize multidimensional data effectively to derive insights. Understand and implement key machine learning algorithms in R. Solve practical, industry-relevant problems across multiple domains using R. Learn to optimize and fine-tune machine learning models for better results. Author(s) Raghav Bali, the author, is a seasoned data scientist with expertise in machine learning. With years of experience using R in data science, he has taught both professionals and enthusiasts how to use machine learning effectively. His approachable and clear writing style ensures that learners of various skill levels can benefit from his insights and guidance. Who is it for? This book is perfect for analysts, data scientists, or enthusiasts who want to leverage R for machine learning. It is suitable for beginners familiar with basic R concepts and intermediate learners looking to deepen their understanding of machine learning applications. If you are aiming to solve practical problems using data, this book will serve as a comprehensive guide.

Data and Electric Power

Traditional engineering is built upon a world of knowledge and scientific laws, with components and systems that operate predictably. But what happens when a large number of these devices are interconnected? You get a complex system that’s no longer deterministic, but probabilistic. That’s happening today in many industries, including manufacturing, petroleum, transportation, and energy. In this O’Reilly report, Sean Patrick Murphy, Chief Data Scientist at PingThings, describes how data science is helping electric utilities make sense of a stochastic world filled with increasing uncertainty, including fundamental changes to the energy market and random phenomena such as weather and solar activity. Murphy also reviews several cutting-edge tools for storing and processing big data that he’s used in his work with electric utilities, tools that can help traditional engineers pursue a data-driven approach in many industries. Topics in this report include: Key drivers that have changed the electric grid from a deterministic machine into a probabilistic system Fundamental differences that put traditional engineering and data science at odds with one another Why the time is right for engineering organizations to adopt a complete data-driven approach Contemporary tools that traditional engineers can use to store and process big data A PingThings case study for dealing with random geomagnetic disturbances to the energy grid

Going Pro in Data Science

Digging for answers to your pressing business questions probably won’t resemble those tidy case studies that lead you step-by-step from data collection to cool insights. Data science is not so clear-cut in the real world. Instead of high-quality data with the right velocity, variety, and volume, many data scientists have to work with missing or sketchy information extracted from people in the organization. In this O’Reilly report, Jerry Overton—Distinguished Engineer at global IT leader DXC—introduces practices for making good decisions in a messy and complicated world. What he simply calls “data science that works” is a trial-and-error process of creating and testing hypotheses, gathering evidence, and drawing conclusions. These skills are far more useful for practicing data scientists than, say, mastering the details of a machine-learning algorithm. Adapted and expanded from a series of articles Overton published on O’Reilly Radar and on the CSC Blog, each chapter is ideal for current and aspiring data scientists who want to go pro, as well as IT execs and managers looking to hire in this field. The report covers: Using the scientific method to gain a competitive advantage The skill set you need to look for when choosing a data scientist Why practical induction is a key part of thinking like a data scientist Best practices for writing solid code in your data science gig How agile experimentation lets you find answers (or dead ends) much faster Advice for surviving (and even thriving) as a data scientist in your organization

Ten Signs of Data Science Maturity

How well prepared is your organization to innovate, using data science? In this report, two leading data scientists at the consulting firm Booz Allen Hamilton describe ten characteristics of a mature data science capability. After spending years helping clients such as the US government and commercial organizations worldwide build innovative data science capabilities, Peter Guerra and Dr. Kirk Borne identified these characteristics to help you measure your company’s competence in this area. This report provides a detailed discussion of each of the 10 signs of data science maturity, which—among many other things—encourage you to: Give members of your organization access to all your available data Use Agile and leverage "DataOps"—DevOps for data product development Help your data science team sharpen its skills through open or internal competitions Personify data science as a way of doing things, and not a thing to do