talk-data.com

Topic

Pandas

data_manipulation data_analysis python

187 tagged

Activity Trend: peak of 17 per quarter, 2020-Q1 to 2026-Q2

Activities

187 activities · Newest first

Elegant SciPy

Welcome to Scientific Python and its community. If you’re a scientist who programs with Python, this practical guide not only teaches you the fundamental parts of SciPy and libraries related to it, but also gives you a taste for beautiful, easy-to-read code that you can use in practice. You’ll learn how to write elegant code that’s clear, concise, and efficient at executing the task at hand. Throughout the book, you’ll work with examples from the wider scientific Python ecosystem, using code that illustrates principles outlined in the book. Using actual scientific data, you’ll work on real-world problems with SciPy, NumPy, Pandas, scikit-image, and other Python libraries. Explore the NumPy array, the data structure that underlies numerical scientific computation. Use quantile normalization to ensure that measurements fit a specific distribution. Represent separate regions in an image with a Region Adjacency Graph. Convert temporal or spatial data into frequency domain data with the Fast Fourier Transform. Solve sparse matrix problems, including image segmentations, with SciPy’s sparse module. Perform linear algebra by using SciPy packages. Explore image alignment (registration) with SciPy’s optimize module. Process large datasets with Python data streaming primitives and the Toolz library.

Learning pandas - Second Edition

Take your Python skills to the next level with 'Learning pandas,' your go-to guide for mastering data manipulation and analysis. This book walks you through the powerful tools offered by the pandas library, helping you unlock key insights from data efficiently. Whether you're handling time-series data or visualizing patterns, you'll gain the proficiency needed to make sense of complex datasets. What this Book will help me do Understand and effectively use pandas Series and DataFrame objects for data representation and manipulation. Master indexing, slicing, and combining data to perform detailed exploration and analysis. Learn to access and work with external data sources, including APIs, databases, and files, using pandas. Develop the skills to handle and analyze time-series data, managing its unique challenges. Create informative and professional data visualizations directly using pandas capabilities. Author(s) Michael Heydt is a respected author and educator in the field of Python and data analysis. With years of experience utilizing pandas in practical and professional environments, Michael offers a unique perspective that combines deep technical insight with approachable examples. His teaching philosophy emphasizes clarity, applicability, and engaging instruction, ensuring learners easily acquire valuable skills. Who is it for? This book is ideal for Python programmers looking to enhance their data analysis capabilities, as well as data analysts and scientists wanting to leverage pandas to improve their workflows. Readers are recommended to have some familiarity with Python, though prior experience with pandas is not required. If you have a keen interest in data exploration and quantitative techniques, this book is for you.

Python: Data Analytics and Visualization

Understand, evaluate, and visualize data. About This Book Learn basic steps of data analysis and how to use Python and its packages. A step-by-step guide to predictive modeling including tips, tricks, and best practices. Effectively visualize a broad set of analyzed data and generate effective results. Who This Book Is For This book is for Python Developers who are keen to get into data analysis and wish to visualize their analyzed data in a more efficient and insightful manner. What You Will Learn Get acquainted with NumPy and use arrays and array-oriented computing in data analysis. Process and analyze data using the time-series capabilities of Pandas. Understand the statistical and mathematical concepts behind predictive analytics algorithms. Data visualization with Matplotlib. Interactive plotting with NumPy, SciPy, and MKL functions. Build financial models using Monte-Carlo simulations. Create directed graphs and multi-graphs. Advanced visualization with D3. In Detail You will start the course with an introduction to the principles of data analysis and supported libraries, along with NumPy basics for statistics and data processing. Next, you will get an overview of the Pandas package and use its powerful features to solve data-processing problems. Moving on, you will get a brief overview of the Matplotlib API. Next, you will learn to manipulate time and data structures, and load and store data in a file or database using Python packages. You will learn how to apply powerful packages in Python to process raw data into pure and helpful data using examples. You will also get a brief overview of machine learning algorithms, that is, applying data analysis results to make decisions or building helpful products such as recommendations and predictions using Scikit-learn. After this, you will move on to a data analytics specialization: predictive analytics. Social media and IoT have resulted in an avalanche of data. You will get started with predictive analytics using Python.
You will see how to create predictive models from data. You will get balanced information on statistical and mathematical concepts, and implement them in Python using libraries such as Pandas, scikit-learn, and NumPy. You'll learn more about the best predictive modeling algorithms such as Linear Regression, Decision Tree, and Logistic Regression. Finally, you will master best practices in predictive modeling. After this, you will get all the practical guidance you need to help you on the journey to effective data visualization. Starting with a chapter on data frameworks, which explains the transformation of data into information and eventually knowledge, this path subsequently covers the complete visualization process using the most popular Python libraries with working examples. This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products: Getting Started with Python Data Analysis, Phuong Vo.T.H & Martin Czygan; Learning Predictive Analytics with Python, Ashish Kumar; Mastering Python Data Visualization, Kirthi Raman. Style and approach The course acts as a step-by-step guide to get you familiar with data analysis and the libraries supported by Python with the help of real-world examples and datasets. It also helps you gain practical insights into predictive modeling by implementing predictive-analytics algorithms on public datasets with Python. The course offers a wealth of practical guidance to help you on this journey to data visualization.

Summary

There is a vast constellation of tools and platforms for processing and analyzing your data. In this episode Matthew Rocklin talks about how Dask fills the gap between a task oriented workflow tool and an in memory processing framework, and how it brings the power of Python to bear on the problem of big data.

Preamble

Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch. You can help support the show by checking out the Patreon page, which is linked from the site. To help other people find the show, you can leave a review on iTunes or Google Play Music and tell your friends and co-workers. Your host is Tobias Macey, and today I’m interviewing Matthew Rocklin about Dask and the Blaze ecosystem.

Interview with Matthew Rocklin

Introduction
How did you get involved in the area of data engineering?
Dask began its life as part of the Blaze project. Can you start by describing what Dask is and how it originated?
There are a vast number of tools in the field of data analytics. What are some of the specific use cases that Dask was built for that weren’t able to be solved by the existing options?
One of the compelling features of Dask is the fact that it is a Python library that allows for distributed computation at a scale that has largely been the exclusive domain of tools in the Hadoop ecosystem. Why do you think that the JVM has been the reigning platform in the data analytics space for so long?
Do you consider Dask, along with the larger Blaze ecosystem, to be a competitor to the Hadoop ecosystem, either now or in the future?
Are you seeing many Hadoop or Spark solutions being migrated to Dask? If so, what are the common reasons?
There is a strong focus for using Dask as a tool for interactive exploration of data. How does it compare to something like Apache Drill?
For anyone looking to integrate Dask into an existing code base that is already using NumPy or Pandas, what does that process look like?
How do the task graph capabilities compare to something like Airflow or Luigi?
Looking through the documentation for the graph specification in Dask, it appears that there is the potential to introduce cycles or other bugs into a large or complex task chain. Is there any built-in tooling to check for that before submitting the graph for execution?
What are some of the most interesting or unexpected projects that you have seen Dask used for?
What do you perceive as being the most relevant aspects of Dask for data engineering/data infrastructure practitioners, as compared to the end users of the systems that they support?
What are some of the most significant problems that you have been faced with, and which still need to be overcome in the Dask project?
I know that the work on Dask is largely performed under the umbrella of PyData and sponsored by Continuum Analytics. What are your thoughts on the financial landscape for open source data analytics and distributed computation frameworks as compared to the broader world of open source projects?
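The question about cycles in a task graph can be made concrete with a small sketch. This is not Dask's own tooling (the show notes do not say whether such tooling exists). It assumes only the documented graph format, a plain dict that maps keys to tasks, where a task is a tuple whose first element is a callable. The helper name `find_cycle` is hypothetical, and the sketch looks only at top-level string keys, ignoring nested tasks and tuple keys.

```python
from graphlib import CycleError, TopologicalSorter
from operator import add


def find_cycle(graph):
    """Return a list of keys forming a cycle in a dict-style task graph, or None.

    Hypothetical helper: it inspects only top-level string arguments.
    """
    deps = {}
    for key, task in graph.items():
        # A task is a tuple that starts with a callable; anything else is a value.
        is_task = isinstance(task, tuple) and bool(task) and callable(task[0])
        args = task[1:] if is_task else (task,)
        # An argument counts as a dependency only if it names another key.
        deps[key] = {a for a in args if isinstance(a, str) and a in graph}
    try:
        # prepare() raises CycleError when no valid execution order exists.
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        return err.args[1]  # the detected cycle, first and last key identical
    return None


# Assumed example graphs, written in the dict-of-tuples format.
acyclic = {"x": 1, "y": (add, "x", 10), "z": (add, "y", "x")}
cyclic = {"a": (add, "b", 1), "b": (add, "a", 1)}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

Running a check like this before submitting a graph is cheap, because it only walks the dictionary and never calls the task functions.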

Keep in touch

@mrocklin on Twitter
mrocklin on GitHub

Links

http://matthewrocklin.com/blog/work/2016/09/22/cluster-deployments?utm_source=rss&utm_medium=rss https://opendatascience.com/blog/dask-for-institutions/?utm_source=rss&utm_medium=rss Continuum Analytics 2sigma X-Array Tornado

Website Podcast Interview

Airflow Luigi Mesos Kubernetes Spark Dryad Yarn Read The Docs XData

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

Apache Spark for Data Science Cookbook

In "Apache Spark for Data Science Cookbook," you'll delve into solving real-world analytical challenges using the robust Apache Spark framework. This book features hands-on recipes that cover data analysis, distributed machine learning, and real-time data processing. You'll gain practical skills to process, visualize, and extract insights from large datasets efficiently. What this Book will help me do Master using Apache Spark for processing and analyzing large-scale datasets effectively. Harness Spark's MLLib for implementing machine learning algorithms like classification and clustering. Utilize libraries such as NumPy, SciPy, and Pandas in conjunction with Spark for numerical computations. Apply techniques like Natural Language Processing and text mining using Spark-integrated tools. Perform end-to-end data science workflows, including data exploration, modeling, and visualization. Author(s) Nagamallikarjuna Inelu and Chitturi bring their extensive experience working with data science and distributed computing frameworks like Apache Spark. Nagamallikarjuna specializes in applying machine learning algorithms to big data problems, while Chitturi has contributed to various big data system implementations. Together, they focus on providing practitioners with practical and efficient solutions. Who is it for? This book is primarily intended for novice and intermediate data scientists and analysts who are curious about using Apache Spark to tackle data science problems. Readers are expected to have some familiarity with basic data science tasks. If you want to learn practical applications of Spark in data analysis and enhance your big data analytics skills, this resource is for you.

Python Data Science Handbook

For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you’ll learn how to use: IPython and Jupyter, which provide computational environments for data scientists using Python; NumPy, which includes the ndarray for efficient storage and manipulation of dense data arrays in Python; Pandas, which features the DataFrame for efficient storage and manipulation of labeled/columnar data in Python; Matplotlib, which includes capabilities for a flexible range of data visualizations in Python; and Scikit-Learn, for efficient and clean Python implementations of the most important and established machine learning algorithms.

Practical Data Analysis - Second Edition

Practical Data Analysis provides a hands-on guide to mastering essential data analysis techniques using tools like Pandas, MongoDB, and Apache Spark. With step-by-step instructions, you'll explore how to process diverse data types, apply machine learning methods, and uncover actionable insights that can drive innovative projects and business solutions. What this Book will help me do Master data acquisition, formatting, and visualization techniques to prepare your data for analysis. Understand and apply machine learning algorithms for tasks like classification and forecasting. Learn to analyze textual data, such as performing sentiment analysis and text classification. Effectively work with databases using tools like MongoDB and handle big data with Apache Spark. Develop data-driven applications using real-world examples like image similarity searches and social network graph analysis. Author(s) Cuesta and Dr. Sampath Kumar are experienced data scientists and educators. They have considerable experience applying data analysis techniques in various domains and a passion for teaching these skills. Their practical approach to data analysis ensures an engaging learning experience for readers. Who is it for? This book is ideal for developers and data enthusiasts aiming to incorporate practical data analysis into their projects. It is perfectly suited for readers with basic programming, statistics, and linear algebra knowledge. Even if you're new to professional data analysis, you'll find the step-by-step examples approachable. This book guides you in transforming raw data into valuable insights.

Data Visualization with Python and JavaScript

Learn how to turn raw data into rich, interactive web visualizations with the powerful combination of Python and JavaScript. With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries (including Scrapy, Matplotlib, Pandas, Flask, and D3) for crafting engaging, browser-based visualizations. As a working example, throughout the book Dale walks you through transforming Wikipedia’s table-based list of Nobel Prize winners into an interactive visualization. You’ll examine steps along the entire toolchain, from scraping, cleaning, exploring, and delivering data to building the visualization with JavaScript’s D3 library. If you’re ready to create your own web-based data visualizations, and know either Python or JavaScript, this is the book for you. Learn how to manipulate data with Python. Understand the commonalities between Python and JavaScript. Extract information from websites by using Python’s web-scraping tools, BeautifulSoup and Scrapy. Clean and explore data with Python’s Pandas, Matplotlib, and NumPy libraries. Serve data and create RESTful web APIs with Python’s Flask framework. Create engaging, interactive web visualizations with JavaScript’s D3 library.

Mastering Python Data Analysis

Mastering Python Data Analysis provides a comprehensive roadmap for Python developers to enhance their data analysis skills to tackle real-world problems. This book delves into advanced statistical analysis, covering tools, models, and methods to transform raw data into valuable insights. What this Book will help me do Effectively handle and preprocess data using Python and Pandas. Explore statistical models to identify patterns and gain insights from data. Learn clustering approaches to detect data groupings and predict outcomes. Utilize Bayesian methods for quantifying causal relationships. Generate professional reports and visualizations with Python tools like Jupyter Notebook. Author(s) Vilhelm Persson is a seasoned software developer and data analyst with expertise in leveraging Python for sophisticated data analysis and machine learning tasks. Drawing from years of experience in the tech industry, Persson provides practical, real-world insights throughout the book. His approachable writing style ensures technical concepts are conveyed with clarity, making data analysis accessible to developers at varying skill levels. Who is it for? This book is ideal for intermediate Python developers seeking to elevate their data analysis skills. If you are familiar with Python libraries and have an interest in solving complex data problems, this guide will serve as a stepping stone to mastery. Advanced beginners with a curiosity for statistical methods and a desire to learn through practical examples will find this book invaluable. It is also perfect for professionals aiming to integrate Python-based statistical techniques into their workflow.

Python: Real-World Data Science

Unleash the power of Python and its robust data science capabilities. About This Book Unleash the power of Python 3 objects. Learn to use powerful Python libraries for effective data processing and analysis. Harness the power of Python to analyze data and create insightful predictive models. Unlock deeper insights into machine learning with this vital guide to cutting-edge predictive analytics. Who This Book Is For Entry-level analysts who want to enter the data science world will find this course very useful to get themselves acquainted with Python's data science capabilities for doing real-world data analysis. What You Will Learn Install and set up Python. Implement objects in Python by creating classes and defining methods. Get acquainted with NumPy to use it with arrays and array-oriented computing in data analysis. Create effective visualizations for presenting your data using Matplotlib. Process and analyze data using the time series capabilities of pandas. Interact with different kinds of database systems, such as file, disk format, Mongo, and Redis. Apply data mining concepts to real-world problems. Compute on big data, including real-time data from the Internet. Explore how to use different machine learning models to ask different questions of your data. In Detail The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules, and each module is a mini course in its own right; as you complete each one, you'll have gained key skills and be ready for the material in the next module. The course begins with getting your Python fundamentals nailed down. After getting familiar with Python core concepts, it's time that you dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical and example-driven way.
The third module will teach you how to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis to more complex data types including text, images, and graphs. Machine learning and predictive analytics have become the most important approaches to uncover data gold mines. In the final module, we'll discuss the necessary details regarding machine learning concepts, offering intuitive yet informative explanations on how machine learning algorithms work, how to use them, and most importantly, how to avoid the common pitfalls. Style and approach This course includes all the resources that will help you jump into the data science field with Python and learn how to make sense of data. The aim is to create a smooth learning path that will teach you how to get started with powerful Python libraries and perform various data science techniques in depth.

Practical Data Analysis Cookbook

Practical Data Analysis Cookbook takes you on a comprehensive journey to mastering data exploration and analysis using Python. From data cleaning and transformation to building predictive and classification models, this book provides practical recipes for tackling real-world data challenges and extracting valuable insights. What this Book will help me do Efficiently clean, transform, and explore datasets using tools like pandas and OpenRefine. Develop predictive models for time series and other datasets using Python libraries such as scikit-learn and Statsmodels. Apply clustering and classification techniques to real-world data problems to gain actionable insights. Explore advanced topics like natural language processing and graph theory concepts using specialized tools. Build the skills to solve practical data modeling problems encountered in a data science role. Author(s) Drabas is an experienced data scientist and author who specializes in Python-based data analysis. With a background in tackling intricate data-driven problems, Drabas brings real-world experience to the readers. In creating this Cookbook, Drabas adopts a step-by-step approach, making complex techniques accessible to learners of all backgrounds. Who is it for? If you are a data analyst, data scientist, or someone interested in exploring Python for practical data problems, this book is for you. It suits beginners starting their data journey and intermediate professionals looking to enhance their toolset. With clear instructions, it's ideal for anyone willing to build practical skills and tackle real-world challenges in data analysis.

Python Business Intelligence Cookbook

Learn how to harness Python for business intelligence tasks with the 'Python Business Intelligence Cookbook.' This guide provides practical recipes that help transform raw data into actionable insights for better decision-making. From preparing and analyzing to visualizing data, you will acquire useful skills for implementing efficient BI systems within your organization. What this Book will help me do Master installing and setting up tools like Anaconda and MongoDB for BI work. Prepare datasets by cleaning, standardizing, and extracting essential data. Use Pandas and NoSQL databases to analyze data and extract insights. Build business dashboards utilizing visualization tools like Matplotlib. Gain the ability to create complete BI systems for various business needs. Author(s) Dempsey has extensive experience in Python programming and data analysis. With a passion for teaching and applied business intelligence, Dempsey writes in a straightforward and approachable style, making complex topics accessible to readers. The recipes compiled in this book are built to be both practical and intuitive. Who is it for? This book is ideal for data analysts, managers, and professionals who have a basic understanding of Python and want to apply it to business intelligence tasks. It's also helpful for those familiar with BI concepts looking to enhance or modernize their workflows with Python-based tools. If you're seeking to gain actionable insights from data in your business, this book is for you.

Mastering Python Data Visualization

Mastering Python Data Visualization provides thorough, hands-on guidance for creating impactful visual representations of data by leveraging Python's powerful libraries such as Matplotlib, Pandas, and Scikit-Learn. By following this book, you will gain proficiency in understanding data, performing analyses, and ultimately presenting your findings in a clear and engaging way. What this Book will help me do Effectively transform raw data into insightful visualizations using Python's rich ecosystem of libraries. Understand and apply best practices for selecting the most appropriate visualization techniques for different datasets and objectives. Master the use of Python for interactive plotting, regression analysis, clustering, and classification tasks. Develop a solid foundation in data visualization aesthetics and how to convey information clearly through visuals. Utilize Python for specialized fields such as finance, bioinformatics, and social network analysis, incorporating advanced computation techniques. Author(s) Kirthi Raman is an experienced data scientist and Python advocate with a strong background in technical computing and data visualization. He has hands-on experience in using Python's ecosystem to solve real-world data problems and a passion for sharing knowledge. Raman's writing focuses on blending practical insights with comprehensive explanations, ensuring readers not only learn the tools but also apply them effectively. Who is it for? This book is ideal for data analysts, data scientists, and researchers who want to deepen their knowledge of Python-based data visualization techniques. It requires readers to have a basic understanding of Python and data manipulation. If your goal is to create professional and informative visual narratives that are both visually appealing and data-driven, this book is for you.

Learning IPython for Interactive Computing and Data Visualization, Second Edition

Dive into the powerful world of interactive computing and data visualization with Python in the Jupyter Notebook. In this book, you will gain foundational skills in Python and learn how to analyze and visualize data using popular libraries like pandas, NumPy, matplotlib, and more. By the end, you will be creating efficient computations and meaningful visualizations effortlessly. What this Book will help me do Understand the installation and usage of Anaconda and coding in Python through the Jupyter Notebook. Gain practical experience in manipulating and exploring datasets with pandas. Design advanced visualizations for data representation using matplotlib and seaborn. Learn numerical computation and simulation techniques with NumPy and other tools. Accelerate performance-sensitive tasks using tools like Numba and Cython. Author(s) Cyrille Rossant, the author of this book, is a software developer and data scientist with extensive experience in Python, numerical computing, and data visualization. With a passion for making technical concepts approachable, his writing style blends clarity with practicality, ensuring readers from diverse backgrounds can successfully enhance their skills. Who is it for? This book is ideal for students, professionals, and hobbyists interested in data analysis and visualization. Beginners to Python programming will find it highly approachable. Those with some programming background but new to Python will also benefit greatly. Advanced readers will enjoy the in-depth discussions of performance optimizations and visualization customizations.

Python Data Analytics: Data Analysis and Science Using Pandas, matplotlib, and the Python Programming Language

Python Data Analytics will help you tackle the world of data acquisition and analysis using the power of the Python language. At the heart of this book lies the coverage of pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Author Fabio Nelli expertly shows the strength of the Python programming language when applied to processing, managing and retrieving information. Inside, you will see how intuitive and flexible it is to discover and communicate meaningful patterns of data using Python scripts, reporting systems, and data export. This book examines how to go about obtaining, processing, storing, managing and analyzing data using the Python programming language. You will use Python and other open source tools to wrangle data and tease out interesting and important trends in that data that will allow you to predict future patterns. Whether you are dealing with sales data, investment data (stocks, bonds, etc.), medical data, web page usage, or any other type of data set, Python can be used to interpret, analyze, and glean information from a pile of numbers and statistics. This book is an invaluable reference with its examples of storing and accessing data in a database; it walks you through the process of report generation; it provides three real world case studies or examples that you can take with you for your everyday analysis needs.

Mastering Pandas for Finance

"Mastering Pandas for Finance" takes a deep dive into applying Python and the pandas library to solve real-world financial data analysis problems. With a focus on financial modeling, backtesting trading strategies, and analyzing large datasets, this book equips you with the skills to leverage pandas effectively. What this Book will help me do Utilize pandas DataFrame for efficient financial data handling and manipulation. Develop robust time-series models and perform statistical analysis on financial data. Backtest algorithmic trading strategies including momentum and mean reversion. Price complex financial options and calculate Value at Risk for portfolio management. Optimize portfolio allocation and model financial performance using industry techniques. Author(s) Michael Heydt is an experienced software engineer and data scientist with a strong background in quantitative finance. He specializes in using Python for data analysis and has spent years teaching and writing about technical subjects. His detailed yet approachable writing style makes complex topics accessible to all. Who is it for? "Mastering Pandas for Finance" is perfect for finance professionals seeking to integrate Python into their workflows, data analysts exploring quantitative finance applications, and programmers aiming to specialize in financial analytics. Some baseline Python and pandas knowledge is recommended, but the book is structured to guide you effectively through advanced concepts too.

Learning Pandas

"Learning Pandas" is your comprehensive guide to mastering pandas, the powerful Python library for data manipulation and analysis. In this book, you'll explore pandas' capabilities and learn to apply them to real-world data challenges. With clear explanations and hands-on examples, you'll enhance your ability to analyze, clean, and visualize data effectively. What this Book will help me do Understand the core concepts of pandas and how it integrates with Python. Learn to efficiently manipulate and transform datasets using pandas. Gain skills in analyzing and cleaning data to prepare for insights. Explore techniques for working with time-series data and financial datasets. Discover how to create compelling visualizations with pandas to communicate findings. Author(s) Michael Heydt is an experienced Python developer and data scientist with expertise in teaching technical concepts to others. With a deep understanding of the pandas library, Michael has authored several guides on data analysis and is passionate about making complex information accessible. His practical approach ensures readers can directly apply lessons to their own projects. Who is it for? This book is ideal for Python programmers who want to harness the power of pandas for data analysis. Whether you're a beginner in data science or looking to refine your skills, you'll find clear, actionable guidance here. Basic programming knowledge is assumed, but no prior pandas experience is necessary. If you're eager to turn data into impactful insights, this book is for you.

Data Just Right: Introduction to Large-Scale Data & Analytics

Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on “Big Data” have been little more than business polemics or product catalogs. Data Just Right is different: It’s a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that’s where you can derive the most value. Manoochehri shows how to address each of today’s key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You’ll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today’s leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.
Coverage includes Mastering the four guiding principles of Big Data success—and avoiding common pitfalls Emphasizing collaboration and avoiding problems with siloed data Hosting and sharing multi-terabyte datasets efficiently and economically “Building for infinity” to support rapid growth Developing a NoSQL Web app with Redis to collect crowd-sourced data Running distributed queries over massive datasets with Hadoop, Hive, and Shark Building a data dashboard with Google BigQuery Exploring large datasets with advanced visualization Implementing efficient pipelines for transforming immense amounts of data Automating complex processing with Apache Pig and the Cascading Java library Applying machine learning to classify, recommend, and predict incoming information Using R to perform statistical analysis on massive datasets Building highly efficient analytics workflows with Python and Pandas Establishing sensible purchasing strategies: when to build, buy, or outsource Previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist

Python for Data Analysis

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language. Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing. Use the IPython interactive shell as your primary development environment Learn basic and advanced NumPy (Numerical Python) features Get started with data analysis tools in the pandas library Use high-performance tools to load, clean, transform, merge, and reshape data Create scatter plots and static or interactive visualizations with matplotlib Apply the pandas groupby facility to slice, dice, and summarize datasets Measure data by points in time, whether it’s specific instances, fixed periods, or intervals Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples