talk-data.com

Topic: Python

Tags: programming_language, data_science, web_development

1446 tagged activities

Activity trend, 2020-Q1 to 2026-Q1 (peak: 185 activities per quarter)

Activities

Hadoop with Python

Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you’ll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework. Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools:

- Use the Python library Snakebite to access HDFS programmatically from within Python applications
- Write MapReduce jobs in Python with mrjob, the Python MapReduce library
- Extend Pig Latin with user-defined functions (UDFs) in Python
- Use the Spark Python API (PySpark) to write Spark programs with Python
- Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts

Zachary Radtka, a platform engineer at Miner & Kasch, has extensive experience creating custom analytics that run on petabyte-scale data sets.
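The MapReduce model that tools like mrjob wrap can be sketched in plain Python. This is a minimal local simulation, not mrjob's API. The hypothetical `mapper`, `reducer`, and `run_local` functions stand in for the map phase, Hadoop's shuffle-and-sort step, and the reduce phase. The sketch assumes, as Hadoop guarantees, that the reducer receives pairs already sorted by key.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Reduce phase: sum counts per word; assumes pairs are sorted by key."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate map -> shuffle/sort -> reduce in a single process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(run_local(text))
```

Under Hadoop Streaming, the mapper and reducer would typically be separate scripts. Each one reads lines from standard input and prints tab-separated key/value pairs. The single-process version here keeps the data flow easy to test.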

Big Data for Chimps

Finding patterns in massive event streams can be difficult, but learning how to find them doesn’t have to be. This unique hands-on guide shows you how to solve this and many other problems in large-scale data processing with simple, fun, and elegant tools that leverage Apache Hadoop. You’ll gain a practical, actionable view of big data by working with real data and real problems. Perfect for beginners, this book’s approach will also appeal to experienced practitioners who want to brush up on their skills. Part I explains how Hadoop and MapReduce work, while Part II covers many analytic patterns you can use to process any data. As you work through several exercises, you’ll also learn how to use Apache Pig to process data.

- Learn the necessary mechanics of working with Hadoop, including how data and computation move around the cluster
- Dive into map/reduce mechanics and build your first map/reduce job in Python
- Understand how to run chains of map/reduce jobs in the form of Pig scripts
- Use a real-world dataset (baseball performance statistics) throughout the book
- Work with examples of several analytic patterns, and learn when and where you might use them

There's an old adage that says you cannot fit a model with more parameters than you have data points. While this is often the case, it's not a universal truth. Today's guest, Jake VanderPlas, explains this topic in detail and provides some excellent examples of when it holds and when it doesn't. Some excellent visuals illustrating the points can be found on Jake's blog, Pythonic Perambulations, specifically in his post "The Model Complexity Myth." We also touch on Jake's work as an astronomer, his noteworthy open source contributions, and his forthcoming book, Python Data Science Handbook (currently available in an Early Edition).
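Regularization offers one standard illustration of fitting more parameters than data points. It is not necessarily one of the examples from the episode or the blog post.

Fitting a line y = a + b*t to a single point (t0, y0) has infinitely many exact solutions. Adding a ridge penalty lam * (a^2 + b^2) makes the answer unique. With x = (1, t0), the penalized solution is w = x * y0 / (|x|^2 + lam). Letting lam go to 0 picks the minimum-norm exact fit. The function name below is hypothetical.

```python
from typing import Tuple


def ridge_line_through_one_point(t0: float, y0: float, lam: float) -> Tuple[float, float]:
    """Fit y = a + b*t to the single point (t0, y0) with penalty lam * (a**2 + b**2).

    The normal equations (x x^T + lam*I) w = x*y0 with x = (1, t0) are solved by
    w = x * y0 / (|x|^2 + lam); lam = 0 gives the minimum-norm exact fit.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    x = (1.0, t0)
    norm_sq = x[0] ** 2 + x[1] ** 2  # always >= 1, so no division by zero
    scale = y0 / (norm_sq + lam)
    return scale * x[0], scale * x[1]


if __name__ == "__main__":
    # Two unknowns, one observation: without a penalty, any (a, b) with a + 2b = 3 fits.
    for lam in (0.0, 1.0, 10.0):
        a, b = ridge_line_through_one_point(2.0, 3.0, lam)
        print(f"lam={lam:5.1f}  a={a:.3f}  b={b:.3f}  prediction at t0={a + 2.0 * b:.3f}")
```

With lam = 0 the line passes exactly through the point (a = 0.6, b = 1.2 for these toy values). Larger penalties shrink both coefficients toward zero, trading exact fit for a more constrained model.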

Redis Essentials

Redis Essentials is your go-to guide for understanding and mastering Redis, the leading in-memory data structure store. In this book, you will explore the powerful features offered by Redis, such as real-time data processing, highly scalable architectures, and practical implementations for web applications. You'll complete the journey equipped to handle and optimize Redis for your development projects.

What this book will help me do:

- Design analytics applications with advanced data structures like Bitmaps and HyperLogLogs
- Scale your application infrastructure using Redis Sentinel, Twemproxy, and Redis Cluster
- Develop custom Redis commands and extend its functionality with the Lua scripting language
- Implement robust security measures for Redis, including SSL encryption and firewall rules
- Master the usage of Redis client libraries in PHP, Python, Node.js, and Ruby for seamless development

Author(s): Maxwell Dayvson da Silva is an experienced software engineer and author with expertise in designing high-performance systems. With a strong focus on practical knowledge and hands-on solutions, Maxwell brings years of experience using Redis to this book. His approachable teaching style helps learners grasp complex topics while emphasizing their practical application to real-world challenges.

Who is it for? Redis Essentials is aimed at developers looking to enhance their systems' performance and scalability using Redis. Whether you're moderately familiar with key-value stores or new to Redis, this book provides the explanations and hands-on examples you need. Recommended for developers with experience in data architectures, the book bridges the gap between understanding Redis features and applying them in the real world. Start here to bring high-performance in-memory data solutions to your projects.
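The bitmap analytics pattern mentioned above can be illustrated without a server. This is a minimal in-memory sketch of what the Redis SETBIT, GETBIT, BITCOUNT, and BITOP AND commands do, using a Python int as the bit array. The `Bitmap` class and the user IDs are hypothetical. Redis orders bits within each byte differently, but that does not affect the counts.

```python
class Bitmap:
    """In-memory stand-in for a Redis bitmap (SETBIT / GETBIT / BITCOUNT / BITOP AND).

    Bits live in a Python int, least significant bit first; Redis stores bits
    most-significant-first within each byte, which does not change the counts.
    """

    def __init__(self) -> None:
        self.bits = 0

    def setbit(self, offset: int, value: int) -> None:
        # Turn the bit at `offset` on or off, like SETBIT key offset value.
        if value:
            self.bits |= 1 << offset
        else:
            self.bits &= ~(1 << offset)

    def getbit(self, offset: int) -> int:
        return (self.bits >> offset) & 1

    def bitcount(self) -> int:
        # Number of set bits, like BITCOUNT key.
        return bin(self.bits).count("1")

    def and_(self, other: "Bitmap") -> "Bitmap":
        # Intersection of two bitmaps, like BITOP AND dest key1 key2.
        result = Bitmap()
        result.bits = self.bits & other.bits
        return result


if __name__ == "__main__":
    # Hypothetical user IDs seen on two days; each ID is used as a bit offset.
    monday, tuesday = Bitmap(), Bitmap()
    for user_id in (1, 5, 9):
        monday.setbit(user_id, 1)
    for user_id in (5, 9, 12):
        tuesday.setbit(user_id, 1)
    print(monday.bitcount())                # users active on Monday
    print(monday.and_(tuesday).bitcount())  # users active on both days
```

Against a real server, the redis-py client exposes `setbit`, `bitcount`, and `bitop` methods that map onto these commands. The appeal of the pattern is that one bit per user keeps daily-activity sets very compact.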

Python Data Analytics: Data Analysis and Science Using Pandas, matplotlib, and the Python Programming Language

Python Data Analytics will help you tackle the world of data acquisition and analysis using the power of the Python language. At the heart of this book lies the coverage of pandas, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Author Fabio Nelli expertly shows the strength of the Python programming language when applied to processing, managing and retrieving information. Inside, you will see how intuitive and flexible it is to discover and communicate meaningful patterns in data using Python scripts, reporting systems, and data export. This book examines how to go about obtaining, processing, storing, managing and analyzing data using the Python programming language. You will use Python and other open source tools to wrangle data and tease out interesting and important trends that can help you predict future patterns. Whether you are dealing with sales data, investment data (stocks, bonds, etc.), medical data, web page usage, or any other type of data set, Python can be used to interpret, analyze, and glean information from a pile of numbers and statistics. This book is an invaluable reference. It includes examples of storing and accessing data in a database, walks you through the process of report generation, and provides three real-world case studies that you can take with you for your everyday analysis needs.

Programming ArcGIS with Python Cookbook, Second Edition

Dive into 'Programming ArcGIS with Python Cookbook, Second Edition,' an essential guide for automating your ArcGIS for Desktop tasks with hands-on Python recipes. Through this book, you will understand how to effectively handle GIS data, automate geoprocessing tasks, and extend ArcGIS functionalities to streamline your workflows and boost your productivity.

What this book will help me do:

- Master the management of map documents, layer files, feature classes, and tables using Python
- Automate common ArcGIS tasks such as map production, printing, and creating PDF map books programmatically
- Learn to find and correct broken data links and make your datasets reliable
- Develop custom geoprocessing tools and share them efficiently among your team or projects
- Expand your knowledge by leveraging advanced practices such as Python scripting for ArcGIS Pro and REST API integration

Author(s): Eric Pimpler is an accomplished GIS professional and Python programmer with years of practical experience in geospatial science and technology. He specializes in teaching GIS automation using Python and aims to simplify complex concepts into approachable recipes for learners. Eric's writing is marked by clarity and a methodical approach, ensuring that readers can apply their new knowledge effectively.

Who is it for? This book is aimed at GIS professionals, cartographers, or analysts who routinely work with ArcGIS and want to streamline their workflow. If you have foundational experience with ArcGIS and basic Python programming skills, this book will build upon them, offering practical recipes to extend your capabilities. It's perfect for those looking to enhance their efficiency and automate their GIS tasks. By the end of this book, readers will have skills valuable to GIS experts and data analysts alike.

Spark Cookbook

Spark Cookbook is your practical guide to mastering Apache Spark, encompassing a comprehensive set of patterns and examples. Through more than 60 recipes, you will gain actionable insights into using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX effectively for your big data needs.

What this book will help me do:

- Understand how to install and configure Apache Spark in various environments
- Build data pipelines and perform real-time analytics with Spark Streaming
- Utilize Spark SQL for interactive data querying and reporting
- Apply machine learning workflows using MLlib, including supervised and unsupervised models
- Develop optimized big data solutions and integrate them into enterprise platforms

Author(s): Yadav, the author of Spark Cookbook, is an experienced data engineer and technical expert with deep insights into big data processing frameworks. Yadav has spent years working with Spark and its ecosystem, providing practical guidance to developers and data scientists alike. This book reflects their commitment to sharing actionable knowledge.

Who is it for? This book is designed for data engineers, developers, and data scientists who work with big data systems and wish to utilize Apache Spark effectively. Whether you're looking to optimize existing Spark applications or explore its libraries for new use cases, this book will provide the guidance you need. A basic familiarity with big data concepts and programming in languages like Java or Python is recommended to make the most of this book.

Building web applications with Python and Neo4j

Expand your Python web development expertise by integrating Neo4j into your applications. Through this book, you'll journey from understanding Neo4j's fundamentals to building powerful Python-based applications using tools like Flask, Py2neo, and Django. Learn how to model, query, and update graph data effectively.

What this book will help me do:

- Gain an in-depth understanding of Neo4j installation, licensing, and tools
- Master using Cypher for querying and modifying graph data models
- Learn how to integrate Python with Neo4j effectively using Py2neo
- Build RESTful services with Flask leveraging Neo4j for structured data
- Create robust Django applications using graph-based data models with Neomodel

Author(s): Sumit Gupta is a seasoned Python developer with a strong background in graph database design and integration. He has extensive experience using Neo4j to create efficient, scalable applications for real-world problems. His hands-on approach combines practical examples with the depth of knowledge required to develop expertise.

Who is it for? This book is ideal for Python developers with an interest in enhancing their applications through graph database technology. If you possess a moderate understanding of Python and wish to explore Neo4j for creating smarter, more interconnected data-driven solutions, this book is for you. You should be comfortable with basic programming concepts to fully benefit from this book.

Web Scraping with Python

Learn web scraping and crawling techniques to access unlimited data from any web source in any format. With this practical guide, you’ll learn how to use Python scripts and web APIs to gather and process data from thousands—or even millions—of web pages at once. Ideal for programmers, security professionals, and web administrators familiar with Python, this book not only teaches basic web scraping mechanics, but also delves into more advanced topics, such as analyzing raw data or using scrapers for frontend website testing. Code samples are available to help you understand the concepts in practice.
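The basic scraping mechanic of pulling links out of a page's HTML can be sketched with only the standard library. This sketch uses `html.parser` and `urllib.parse.urljoin` rather than whatever libraries the book itself relies on. The `LinkExtractor` class and `extract_links` helper are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of every <a> tag."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names before calling this hook.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/about">About</a> <a href="https://example.org/">Elsewhere</a></p>'
    print(extract_links(page, "https://example.com/index.html"))
```

To work on a live page, you could fetch the HTML first, for example with `urllib.request.urlopen(url).read().decode("utf-8")`, assuming a UTF-8 page. A crawler would then repeat the process on the extracted links.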

Bioinformatics Data Skills

Learn the data skills necessary for turning large sequencing datasets into reproducible and robust biological findings. With this practical guide, you’ll learn how to use freely available open source tools to extract meaning from large complex biological data sets. At no other point in human history has our ability to understand life’s complexities been so dependent on our skills to work with and analyze data. This intermediate-level book teaches the general computational and data skills you need to analyze biological data. If you have experience with a scripting language like Python, you’re ready to get started.

- Go from handling small problems with messy scripts to tackling large problems with clever methods and tools
- Process bioinformatics data with powerful Unix pipelines and data tools
- Learn how to use exploratory data analysis techniques in the R language
- Use efficient methods to work with genomic range data and range operations
- Work with common genomics data file formats like FASTA, FASTQ, SAM, and BAM
- Manage your bioinformatics project with the Git version control system
- Tackle tedious data processing tasks with Bash scripts and Makefiles
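FASTA, one of the formats listed above, is simple enough to read with a few lines of a scripting language. This is a minimal Python sketch, not code from the book, which leans heavily on Unix tools and R. It assumes the usual layout: a `>` header line followed by one or more sequence lines. The `parse_fasta` and `gc_content` helpers are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


if __name__ == "__main__":
    example = [">seq1 toy example", "ACGT", "GGCC", ">seq2", "ATAT"]
    for name, seq in parse_fasta(example):
        print(name, len(seq), round(gc_content(seq), 2))
```

Because `parse_fasta` accepts any iterable of lines, an open file handle can be passed directly. For production work, Biopython's `SeqIO.parse(handle, "fasta")` covers the same ground along with many other formats.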

Bioinformatics with Python Cookbook

Dive into the intersection of biology and data science with 'Bioinformatics with Python Cookbook.' This book equips you to leverage Python and its ecosystem of libraries to tackle complex challenges in computational biology, covering topics like genomics, phylogenetics, and big data bioinformatics.

What this book will help me do:

- Understand the Python ecosystem specifically tailored for computational biology applications
- Analyze and visualize next-generation sequencing data effectively
- Explore and simulate population genetics for robust biological research
- Utilize the Protein Data Bank to extract critical insights about proteins
- Handle big genomics datasets with Python tools for large-scale bioinformatics studies

Author(s): Tiago Antao is an established bioinformatician with expertise in Python programming. With years of practical experience in computational biology, he has tailored this cookbook with detailed and actionable examples. Tiago's mission is to make bioinformatic techniques using Python accessible to researchers of varying skill levels.

Who is it for? This book is ideal for researchers, biologists, and data scientists with intermediate Python skills looking to expand their expertise in bioinformatics. It caters to professionals wanting to utilize computational tools for solving biological problems. If you're involved in work or study related to genomics, phylogenetics, or large-scale biology datasets, this guide offers practical solutions. Make the most out of Python in your research journey.

NumPy Beginner's Guide (Update)

Delve into the capabilities of NumPy, the cornerstone of mathematical computations in Python. In this guide, you will learn how to utilize NumPy to its fullest by exploring its powerful array and matrix operations, and also integrate it with other libraries like SciPy and matplotlib for advanced analysis and visualization.

What this book will help me do:

- Master the installation and configuration of the NumPy library on different systems
- Perform advanced array and matrix operations efficiently using NumPy
- Understand and utilize commonly used NumPy modules for computational tasks
- Design and generate complex plots using the matplotlib library
- Learn best practices for testing and validating numerical computations with NumPy

Author(s): Ivan Idris is an experienced data analyst and Python enthusiast, proficient in utilizing numerical and scientific libraries to address complex problems. With a strong background in mathematics and computer science, Ivan brings a practical approach to his teachings. He emphasizes clarity and hands-on practice, making expert-level concepts accessible and engaging for learners.

Who is it for? This book is perfect for scientists, engineers, and data professionals with a solid foundation in Python. It's meant for those seeking to deepen their understanding of numerical methods and scientific computing. If you want to harness the power of NumPy to streamline your computations and develop high-performance solutions, this guide is for you.

Neo4j Cookbook

Dive into Neo4j and uncover how to harness its powerful capabilities in graph data analysis with the Neo4j Cookbook. Across 75 well-structured recipes, you'll learn to apply practical techniques in modeling, querying, and visualizing graph databases, enabling you to address real-world challenges efficiently.

What this book will help me do:

- Access Neo4j from popular programming languages such as Java, Python, and Scala, enabling easier integration into your projects
- Migrate data seamlessly from various data stores, including SQL and NoSQL, into Neo4j, maintaining data consistency
- Use best practices for data modeling with Neo4j to optimize performance and scalability for your applications
- Analyze social data from sources like Facebook and Twitter, revealing valuable insights from connections and relationships
- Integrate geospatial data to enable location-based queries and nearest-point searches, opening up advanced application features

Author(s): Ankur Goel, the author of Neo4j Cookbook, is an experienced technologist with an extensive background in handling database solutions and applications. Passionate about simplifying complex systems, Ankur excels in teaching essential database concepts through clear and actionable recipes. His writing is rooted in practical insights, reflecting his hands-on experience in the industry.

Who is it for? This book is ideal for developers and data engineers who currently use or plan to integrate Neo4j into their workflows. If you are migrating from a traditional database system or delving into graph databases for the first time, this book offers structured guidance. Readers should have a fundamental understanding of programming and familiarity with database concepts for the best experience. It caters to individuals aiming to build or enhance data-driven applications using Neo4j's robust graph modeling.

Mastering Pandas for Finance

"Mastering Pandas for Finance" takes a deep dive into applying Python and the pandas library to solve real-world financial data analysis problems. With a focus on financial modeling, backtesting trading strategies, and analyzing large datasets, this book equips you with the skills to leverage pandas effectively. What this Book will help me do Utilize pandas DataFrame for efficient financial data handling and manipulation. Develop robust time-series models and perform statistical analysis on financial data. Backtest algorithmic trading strategies including momentum and mean reversion. Price complex financial options and calculate Value at Risk for portfolio management. Optimize portfolio allocation and model financial performance using industry techniques. Author(s) Michael Heydt is an experienced software engineer and data scientist with a strong background in quantitative finance. He specializes in using Python for data analysis and has spent years teaching and writing about technical subjects. His detailed yet approachable writing style makes complex topics accessible to all. Who is it for? "Mastering Pandas for Finance" is perfect for finance professionals seeking to integrate Python into their workflows, data analysts exploring quantitative finance applications, and programmers aiming to specialize in financial analytics. Some baseline Python and pandas knowledge is recommended, but the book is structured to guide you effectively through advanced concepts too.

Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python

Now, a leader of Northwestern University's prestigious analytics program presents a fully integrated treatment of both the business and academic elements of marketing applications in predictive analytics. Writing for both managers and students, Thomas W. Miller explains essential concepts, principles, and theory in the context of real-world applications. Building on Miller's pioneering program, Marketing Data Science thoroughly addresses segmentation, target marketing, brand and product positioning, new product development, choice modeling, recommender systems, pricing research, retail site selection, demand estimation, sales forecasting, customer retention, and lifetime value analysis.

Starting where Miller's widely praised Modeling Techniques in Predictive Analytics left off, he integrates crucial information and insights that were previously segregated in texts on web analytics, network science, information technology, and programming. Coverage includes:

- The role of analytics in delivering effective messages on the web
- Understanding the web by understanding its hidden structures
- Being recognized on the web, and watching your own competitors
- Visualizing networks and understanding communities within them
- Measuring sentiment and making recommendations
- Leveraging key data science methods: databases/data preparation, classical/Bayesian statistics, regression/classification, machine learning, and text analytics

Six complete case studies address exceptionally relevant issues such as separating legitimate email from spam, identifying legally relevant information for lawsuit discovery, and gleaning insights from anonymous web surfing data. This text's extensive set of web and network problems draws on rich public-domain data sources; many are accompanied by solutions in Python and/or R. Marketing Data Science will be an invaluable resource for all students, faculty, and professional marketers who want to use business analytics to improve marketing performance.

Have you ever wondered what is lost when you compress a song into an MP3? This week's guest, Ryan Maguire, did more than wonder. He worked on software to isolate the sounds that are lost when you convert a lossless digital audio recording into a compressed MP3 file. To complete his project, Ryan worked primarily in Python, using the pyo library as well as the Bregman Toolkit. Ryan mentioned that human hearing spans a frequency range of roughly 20 Hz to 20,000 Hz; if you'd like to hear those tones, check the previous link. If you'd like to know more about our guest Ryan Maguire, you can find his website at the previous link. To follow The Ghost in the MP3 project, check out their Facebook page or the site theghostinthemp3.com. A PDF of Ryan's publication-quality write-up, The Ghost in the MP3, is definitely worth the read if you'd like to know more of the technical details.
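The core idea of recovering "what is lost" can be sketched as a sample-by-sample difference between the original recording and the decoded MP3. This is a simplified time-domain sketch under stated assumptions, not Ryan's actual pyo/Bregman pipeline. It assumes both signals are already decoded to floating-point samples at the same rate. It also assumes the decoded MP3 is offset from the original by a known number of samples, since codecs typically pad the start of the stream. The `residual` and `rms` helpers are hypothetical.

```python
import math
from typing import List, Sequence


def residual(original: Sequence[float], decoded: Sequence[float], delay: int = 0) -> List[float]:
    """Return original - decoded, sample by sample, after skipping `delay` samples of decoded."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    aligned = decoded[delay:]
    n = min(len(original), len(aligned))
    return [original[i] - aligned[i] for i in range(n)]


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level, a rough measure of how much signal the residual holds."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    # Toy signals: the "decoded" copy has 2 samples of padding and small errors.
    original = [0.0, 0.5, 1.0, 0.5, 0.0]
    decoded = [0.0, 0.0, 0.0, 0.45, 0.95, 0.5, 0.05]
    lost = residual(original, decoded, delay=2)
    print([round(x, 2) for x in lost], round(rms(lost), 3))
```

If the delay is wrong, the residual is dominated by misalignment rather than by what the codec discarded, so alignment matters as much as the subtraction. A fuller treatment might compare the signals in the frequency domain instead.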

Data Science from Scratch

Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.

- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
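As an example of the from-scratch approach, here is a minimal k-nearest neighbors classifier using only the standard library. It is an illustrative sketch, not the book's code. The names `LabeledPoint`, `distance`, `majority_vote`, and `knn_classify` are hypothetical. Ties in the vote are broken by dropping the farthest neighbor and voting again, which is one common choice among several.

```python
import math
from collections import Counter
from typing import List, NamedTuple, Sequence


class LabeledPoint(NamedTuple):
    point: Sequence[float]
    label: str


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_vote(labels: List[str]) -> str:
    """Most common label; labels are ordered nearest first.

    On a tie, drop the farthest neighbor and vote again; a single label never ties.
    """
    counts = Counter(labels)
    winner, winner_count = counts.most_common(1)[0]
    num_winners = sum(1 for c in counts.values() if c == winner_count)
    if num_winners == 1:
        return winner
    return majority_vote(labels[:-1])


def knn_classify(k: int, labeled_points: List[LabeledPoint], new_point: Sequence[float]) -> str:
    """Label new_point by a majority vote among its k nearest labeled points."""
    by_distance = sorted(labeled_points, key=lambda lp: distance(lp.point, new_point))
    k_nearest_labels = [lp.label for lp in by_distance[:k]]
    return majority_vote(k_nearest_labels)


if __name__ == "__main__":
    training = [
        LabeledPoint((0.0, 0.0), "a"), LabeledPoint((0.0, 1.0), "a"),
        LabeledPoint((5.0, 5.0), "b"), LabeledPoint((6.0, 5.0), "b"),
    ]
    print(knn_classify(3, training, (0.5, 0.5)))
```

Sorting every training point is O(n log n) per query. That is fine for learning purposes, but real systems use spatial indexes or approximate search.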

Learning Pandas

"Learning Pandas" is your comprehensive guide to mastering pandas, the powerful Python library for data manipulation and analysis. In this book, you'll explore pandas' capabilities and learn to apply them to real-world data challenges. With clear explanations and hands-on examples, you'll enhance your ability to analyze, clean, and visualize data effectively. What this Book will help me do Understand the core concepts of pandas and how it integrates with Python. Learn to efficiently manipulate and transform datasets using pandas. Gain skills in analyzing and cleaning data to prepare for insights. Explore techniques for working with time-series data and financial datasets. Discover how to create compelling visualizations with pandas to communicate findings. Author(s) Michael Heydt is an experienced Python developer and data scientist with expertise in teaching technical concepts to others. With a deep understanding of the pandas library, Michael has authored several guides on data analysis and is passionate about making complex information accessible. His practical approach ensures readers can directly apply lessons to their own projects. Who is it for? This book is ideal for Python programmers who want to harness the power of pandas for data analysis. Whether you're a beginner in data science or looking to refine your skills, you'll find clear, actionable guidance here. Basic programming knowledge is assumed, but no prior pandas experience is necessary. If you're eager to turn data into impactful insights, this book is for you.

Advanced Analytics with Spark

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You’ll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques—classification, collaborative filtering, and anomaly detection among others—to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you’ll find these patterns useful for working on your own data applications.

ArcPy and ArcGIS: Geospatial Analysis with Python

"ArcPy and ArcGIS: Geospatial Analysis with Python" introduces you to streamlining geospatial analysis using the ArcPy library in Python. You'll learn to automate repetitive GIS tasks, enhance your workflow in ArcGIS, and handle geospatial data programmatically to achieve efficient and accurate results in your projects. What this Book will help me do Master the use of the ArcPy library to automate and optimize GIS workflows. Learn techniques to efficiently handle geospatial data updates and analysis in Python. Understand how to use Python scripting to dynamically create and manage maps and analyses. Gain the skills to enhance repetitive GIS tasks into custom Python tools to increase productivity. Explore advanced geospatial analysis topics using Python's ArcPy module for complex problem-solving. Author(s) Silas Toms is a seasoned GIS professional with extensive experience in Python programming for geospatial applications. With years of hands-on work in automating GIS processes and teaching others, Silas excels at making technical concepts relatable and useful for real-world applications. His practical writing style ensures readers can effectively apply what they learn. Who is it for? This book is ideal for GIS students and professionals who wish to enhance their efficiency by automating tasks in ArcGIS using Python. It also suits Python developers keen on exploring geospatial data analysis and management workflows. Suitable for those with basic GIS knowledge, the book bridges the gap to advanced GIS automation techniques. It's perfect if you aim to streamline repetitive tasks and integrate programming into your geospatial projects.