talk-data.com

Topic

Python

programming_language data_science web_development

1446 tagged

Activity Trend: peak of 185 per quarter, 2020-Q1 to 2026-Q1

Activities

1446 activities · Newest first

Getting Started with Beautiful Soup

"Getting Started with Beautiful Soup" is your practical guide to website scraping using Python. It teaches you how to use Beautiful Soup and the urllib2 module to extract data from websites efficiently and effectively. Through hands-on examples and clear explanations, you'll gain the skills to navigate, search, and modify HTML content. What this Book will help me do Navigate and scrape web pages using the Beautiful Soup Python library. Understand and implement the urllib2 module to access web content programmatically. Search and analyze HTML structures efficiently to extract the needed data. Modify and format extracted HTML and XML content effectively. Handle encoding and manage output formats for diverse scraping requirements. Author(s) Vineeth G. Nair is an experienced Python developer with a strong focus on web technologies, data extraction, and automation. His expertise in Python's Beautiful Soup library has helped countless learners and professionals tackle the challenges of web scraping. Vineeth combines a methodical approach to teaching with practical examples, making complex concepts accessible and actionable. Who is it for? This book is ideal for Python enthusiasts, data analysts, and budding developers looking to explore web scraping. Whether you're a beginner or have some programming experience, this book will guide you through the fundamental concepts of extracting web data. If you're aiming to delve into practical, real-world implementations of web scraping, this is the book for you.

Data Just Right: Introduction to Large-Scale Data & Analytics

Making Big Data Work: Real-World Use Cases and Examples, Practical Code, Detailed Solutions. Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases. Until now, however, most books on “Big Data” have been little more than business polemics or product catalogs. Data Just Right is different: It’s a completely practical and indispensable guide for every Big Data decision-maker, implementer, and strategist. Michael Manoochehri, a former Google engineer and data hacker, writes for professionals who need practical solutions that can be implemented with limited resources and time. Drawing on his extensive experience, he helps you focus on building applications, rather than infrastructure, because that’s where you can derive the most value. Manoochehri shows how to address each of today’s key Big Data use cases in a cost-effective way by combining technologies in hybrid solutions. You’ll find expert approaches to managing massive datasets, visualizing data, building data pipelines and dashboards, choosing tools for statistical analysis, and more. Throughout, the author demonstrates techniques using many of today’s leading data analysis tools, including Hadoop, Hive, Shark, R, Apache Pig, Mahout, and Google BigQuery.
Coverage includes Mastering the four guiding principles of Big Data success—and avoiding common pitfalls Emphasizing collaboration and avoiding problems with siloed data Hosting and sharing multi-terabyte datasets efficiently and economically “Building for infinity” to support rapid growth Developing a NoSQL Web app with Redis to collect crowd-sourced data Running distributed queries over massive datasets with Hadoop, Hive, and Shark Building a data dashboard with Google BigQuery Exploring large datasets with advanced visualization Implementing efficient pipelines for transforming immense amounts of data Automating complex processing with Apache Pig and the Cascading Java library Applying machine learning to classify, recommend, and predict incoming information Using R to perform statistical analysis on massive datasets Building highly efficient analytics workflows with Python and Pandas Establishing sensible purchasing strategies: when to build, buy, or outsource Previewing emerging trends and convergences in scalable data technologies and the evolving role of the Data Scientist

The Definitive Guide to MongoDB: A complete guide to dealing with Big Data using MongoDB, Second Edition

The Definitive Guide to MongoDB, Second Edition, is updated for the latest version and includes all of the latest MongoDB features, including the aggregation framework introduced in version 2.2 and hashed indexes in version 2.4. MongoDB is the most popular of the "Big Data" NoSQL database technologies, and it's still growing. David Hows from 10gen, along with experienced MongoDB authors Peter Membrey and Eelco Plugge, provide their expertise and experience in teaching you everything you need to know to become a MongoDB pro. The Definitive Guide to MongoDB, Second Edition, starts with the basics, including how to install on Windows, Linux, and OS X, and how MongoDB handles your data. Then you'll learn how to develop with MongoDB with both PHP and Python, including an example application using a PHP driver to create a blog application. Finally, you'll dig into more advanced but extremely important MongoDB features, including optimization, replication, and sharding -- load-balancing that makes MongoDB ideal for dealing with Big Data. If you're dealing with data, MongoDB should be on your must-learn list. The Definitive Guide to MongoDB, Second Edition, is just the book you need. What you'll learn Set up MongoDB on all major server platforms, including Windows, Linux, OS X, and cloud platforms like Rackspace, Azure, and Amazon EC2 Work with GridFS and the new aggregation framework Work with your data using non-SQL commands Write applications using either PHP or Python Optimize MongoDB Master MongoDB administration, including replication, replication tagging, and tag-aware sharding Who this book is for Database admins and developers who need to get up to speed on MongoDB and its Big Data, NoSQL approach to dealing with data management.

JavaScript and JSON Essentials

"JavaScript and JSON Essentials" is a focused tutorial that introduces you to the lightweight JSON data format, essential for effective data storage and transfer with JavaScript. By following this book, you'll gain the expertise to work with JSON in web applications, including tasks such as serialization, asynchronous calls, and debugging. What this Book will help me do Fully understand the structure and use of JSON and how it integrates with JavaScript. Learn to implement synchronous and asynchronous data transfers using JSON. Develop skills in creating, updating, and manipulating JSON objects effectively. Master the design of web functionalities like the Carousel application using JSON. Gain knowledge about best practices in debugging and optimizing JSON for web applications. Author(s) Sai S. Sriparasa is a seasoned developer and educator with extensive experience in JavaScript and related technologies. Having worked on numerous data-driven projects, Sai integrates real-world scenarios into his writing. His tutorials are crafted to be approachable and practical, aimed at demystifying complex concepts for a diverse audience. Who is it for? This book is ideal for web developers who are familiar with JavaScript and seek to expand their understanding of JSON. Suitable for programmers who have a basic knowledge of HTML and some exposure to server-side languages like PHP or Python. Those aiming to integrate efficient data exchange formats into their web applications will find it highly beneficial. It's also a good resource for JavaScript developers wanting to delve deeper into the synchronous and asynchronous handling of data.

Agile Data Science

Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps. Create analytics applications by using the agile big data development methodology Build value from your data in a series of agile sprints, using the data-value stack Gain insight by using several data structures to extract multiple features from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future, and translate predictions into action Get feedback from users after each sprint to keep your project on track

Introducing Geographic Information Systems with ArcGIS: A Workbook Approach to Learning GIS, 3rd Edition

An integrated approach that combines essential GIS background with a practical workbook on applying the principles in ArcGIS® 10.0 and 10.1 Introducing Geographic Information Systems with ArcGIS® integrates a broad introduction to GIS with a software-specific workbook for Esri's ArcGIS®. Where most courses make do using two separate texts, one covering GIS and another the software, this book enables students and instructors to use a single text with an integrated approach covering both in one volume with a common vocabulary and instructional style. This revised edition focuses on the latest software updates—ArcGIS® 10.0 and 10.1. In addition to its already successful coverage, the book allows students to experience publishing maps on the Internet through new exercises, and introduces the idea of programming in the language Esri has chosen for applications (i.e., Python). A DVD is packaged with the book, as in prior editions, containing data for working out all of the exercises. This complete, user-friendly coursebook: Is updated for the latest ArcGIS® releases—ArcGIS® 10.0 and 10.1 Introduces the central concepts of GIS and topics needed to understand spatial information analysis Provides a considerable ability to operate important tools in ArcGIS® Demonstrates new capabilities of ArcGIS® 10.0 and 10.1 Provides a basis for the advanced study of GIS and the study of the newly emerging field of GIScience Introducing Geographic Information Systems with ArcGIS®, Third Edition is the ideal guide for undergraduate students taking courses such as Introduction to GIS, Fundamentals of GIS, and Introduction to ArcGIS® Desktop. It is also an important guide for professionals looking to update their skills for ArcGIS® 10.0 and 10.1.

MongoDB Applied Design Patterns

Whether you’re building a social media site or an internal-use enterprise application, this hands-on guide shows you the connection between MongoDB and the business problems it’s designed to solve. You’ll learn how to apply MongoDB design patterns to several challenging domains, such as ecommerce, content management, and online gaming. Using Python and JavaScript code examples, you’ll discover how MongoDB lets you scale your data model while simplifying the development process. Many businesses launch NoSQL databases without understanding the techniques for using their features most effectively. This book demonstrates the benefits of document embedding, polymorphic schemas, and other MongoDB patterns for tackling specific big data use cases, including: Operational intelligence: Perform real-time analytics of business data Ecommerce: Use MongoDB as a product catalog master or inventory management system Content management: Learn methods for storing content nodes, binary assets, and discussions Online advertising networks: Apply techniques for frequency capping ad impressions, and keyword targeting and bidding Social networking: Learn how to store a complex social graph, modeled after Google+ Online gaming: Provide concurrent access to character and world data for a multiplayer role-playing game

Programming ArcGIS 10.1 with Python Cookbook

Programming ArcGIS 10.1 with Python Cookbook offers a comprehensive guide for GIS professionals aiming to streamline their workflows using Python scripting within ArcGIS Desktop. This book provides hands-on recipes for automating geoprocessing tasks, managing map data, and creating custom tools, making it an essential resource for mastering efficient GIS operations. What this Book will help me do Understand the fundamentals of Python programming as it applies to GIS. Learn to automate tasks such as map production and geoprocessing. Develop customized tools and add-ons to extend ArcGIS capabilities. Improve efficiencies by fixing data errors and working with feature datasets. Gain the ability to schedule and manage complex GIS workflows using Python scripts. Author(s) Donald Eric Pimpler and Eric Pimpler are seasoned professionals in geospatial analysis, with years of experience incorporating Python programming into GIS workflows. Their approach combines practical insights with easy-to-follow methods, resulting in a clear and impactful guide for advancing your GIS skills. Who is it for? The ideal readers are GIS professionals or students in geographical sciences aiming to enhance their technical skills. Prior basic programming knowledge is helpful but not mandatory. The content is tailored for those looking to automate repetitive geospatial tasks and manage complex spatial datasets efficiently in ArcGIS. This book serves as a practical guide for gaining expertise in combining Python programming with GIS.

Developing with Couchbase Server

Today’s highly interactive websites pose a challenge for traditional SQL databases—the ability to scale rapidly and serve loads of concurrent users. With this concise guide, you’ll learn how to build web applications on top of Couchbase Server 2.0, a NoSQL database that can handle websites and social media where hundreds of thousands of users read and write large volumes of information. Using food recipe information as examples, this book demonstrates how to take advantage of Couchbase’s document-oriented database design, and how to store and query data with various CRUD operations. Discover why Couchbase is better than SQL databases with memcached tiers for managing data from the most interactive portions of your application. Learn about Couchbase Server’s cluster-based architecture and how it differs from SQL databases Choose a client library for Java, .NET, Ruby, Python, PHP, or C, and connect to a cluster Structure data in a variety of formats, from serialized objects, a stream of raw bytes, or as JSON documents Learn core storage and retrieval methods, including document IDs, expiry times, and concurrent updates Create views with map/reduce and learn Couchbase mechanisms for querying and selection

SciPy and NumPy

Are you new to SciPy and NumPy? Do you want to learn it quickly and easily through examples and a concise introduction? Then this is the book for you. You’ll cut through the complexity of online documentation and discover how easily you can get up to speed with these Python libraries. Ideal for data analysts and scientists in any field, this overview shows you how to use NumPy for numerical processing, including array indexing, math operations, and loading and saving data. You’ll learn how SciPy helps you work with advanced mathematical functions such as optimization, interpolation, integration, clustering, statistics, and other tools that take scientific programming to a whole new level. The new edition is now available, fully revised and updated in June 2013. Learn the capabilities of NumPy arrays, element-by-element operations, and core mathematical operations Solve minimization problems quickly with SciPy’s optimization package Use SciPy functions for interpolation, from simple univariate to complex multivariate cases Apply a variety of SciPy statistical tools such as distributions and functions Learn SciPy’s spatial and cluster analysis classes Save operation time and memory usage with sparse matrices

Python for Data Analysis

Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language. Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It’s ideal for analysts new to Python and for Python programmers new to scientific computing. Use the IPython interactive shell as your primary development environment Learn basic and advanced NumPy (Numerical Python) features Get started with data analysis tools in the pandas library Use high-performance tools to load, clean, transform, merge, and reshape data Create scatter plots and static or interactive visualizations with matplotlib Apply the pandas groupby facility to slice, dice, and summarize datasets Measure data by points in time, whether it’s specific instances, fixed periods, or intervals Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples

Getting Started with Storm

Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives. Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing. Learn how to program Storm components: spouts for data input and bolts for data transformation Discover how data is exchanged between spouts and bolts in a Storm topology Make spouts fault-tolerant with several commonly used design strategies Explore bolts—their life cycle, strategies for design, and ways to implement them Scale your solution by defining each component’s level of parallelism Study a real-time web analytics system built with Node.js, a Redis server, and a Storm topology Write spouts and bolts with non-JVM languages such as Python, Ruby, and JavaScript

RabbitMQ in Action

RabbitMQ in Action is a fast-paced run through building and managing scalable applications using the RabbitMQ messaging server. It starts by explaining how message queuing works, its history, and how RabbitMQ fits in. Then it shows you real-world examples you can apply to your own scalability and interoperability challenges. About the Technology There's a virtual switchboard at the core of most large applications where messages race between servers, programs, and services. RabbitMQ is an efficient and easy-to-deploy queue that handles this message traffic effortlessly in all situations, from web startups to massive enterprise systems. About the Book RabbitMQ in Action teaches you to build and manage scalable applications in multiple languages using the RabbitMQ messaging server. It's a snap to get started. You'll learn how message queuing works and how RabbitMQ fits in. Then, you'll explore practical scalability and interoperability issues through many examples. By the end, you'll know how to make Rabbit run like a well-oiled machine in a 24 x 7 x 365 environment. What's Inside Learn fundamental messaging design patterns Use patterns for on-demand scalability Glue a PHP frontend to a backend written in anything Implement a PubSub-alerting service in 30 minutes flat Configure RabbitMQ's built-in clustering Monitor, manage, extend, and tune RabbitMQ About the Reader Written for developers familiar with Python, PHP, Java, .NET, or any other modern programming language. No RabbitMQ experience required. About the Authors Alvaro Videla is a developer and architect specializing in MQ-based applications. Jason J. W. Williams is CTO of DigiTar, a messaging service provider, where he directs design and development.
Quotes
In this outstanding work, two experts share their years of experience running large-scale RabbitMQ systems. - Alexis Richardson, VMware
Well-written, thoughtful, and easy to follow. - Karsten Strøbæk, Microsoft
Soup to nuts on RabbitMQ; a wide variety of in-depth examples. - Patrick Lemiuex, Voxel Internap
This book will take you to a messaging wonderland. - David Dossot, Coauthor of Mule in Action

Getting Started with Fluidinfo

Imagine a public storage system that has a place online for structured data about everything that exists—or that could exist. This book introduces Fluidinfo, a system that enables you to store information about anything, real or imaginary, in any digital form. You’ll learn how to organize and search for data, and decide who can use, modify, and extend what you’ve contributed. This guide demonstrates Fluidinfo’s potential to create social data, with facilities that encourage users and applications to share, remix, and reuse data in ways they may not have anticipated. You’ll learn how to use tools for reading and writing data, and how to use Fluidinfo in your own applications by working with its writable API and simple query language. Read and write Fluidinfo data from web applications—and reuse and build upon each other’s data Discover Fluidinfo’s permissions system for tags and namespaces Learn how to use Fish, the command-line tool for interacting with Fluidinfo data Delve into Fluidinfo’s RESTful API, and learn how to make HTTP requests Use Fluidinfo client libraries to build a simple Python utility or a JavaScript web application

Programming Pig

This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets. Programming Pig introduces new users to Pig, and provides experienced users with comprehensive coverage on key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig. Delve into Pig’s data model, including scalar and complex data types Write Pig Latin scripts to sort, group, join, project, and filter your data Use Grunt to work with the Hadoop Distributed File System (HDFS) Build complex data processing pipelines with Pig’s macros and modularity features Embed Pig Latin in Python for iterative processing and other advanced tasks Create your own load and store functions to handle data formats and storage mechanisms Get performance tips for running scripts on Hadoop clusters in less time

The Art of R Programming

R is the world's most popular language for developing statistical software: Archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep economies running smoothly. The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro. Along the way, you'll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats. You'll also learn to: • Create artful graphs to visualize complex data sets and functions • Write more efficient code using parallel R and vectorization • Interface R with C/C++ and Python for increased speed or functionality • Find new R packages for text analysis, image manipulation, and more • Squash annoying bugs with advanced debugging techniques Whether you're designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing.

MongoDB and Python

Learn how to leverage MongoDB with your Python applications, using the hands-on recipes in this book. You get complete code samples for tasks such as making fast geo queries for location-based apps, efficiently indexing your user documents for social-graph lookups, and many other scenarios. This guide explains the basics of the document-oriented database and shows you how to set up a Python environment with it. Learn how to read and write to MongoDB, apply idiomatic MongoDB and Python patterns, and use the database with several popular Python web frameworks. You’ll discover how to model your data, write effective queries, and avoid concurrency problems such as race conditions and deadlocks. The recipes will help you: Read, write, count, and sort documents in a MongoDB collection Learn how to use the rich MongoDB query language Maintain data integrity in replicated/distributed MongoDB environments Use embedding to efficiently model your data without joins Code defensively to avoid KeyErrors and other bugs Apply atomic operations to update game scores, billing systems, and more with the fast accounting pattern Use MongoDB with the Pylons 1.x, Django, and Pyramid web frameworks

Mining the Social Web

Popular social networks such as Facebook and Twitter generate a tremendous amount of valuable data on topics and use patterns. Who's talking to whom? What are they talking about? How often are they talking? This concise and practical book shows you how to answer these questions and more by harvesting and analyzing data using social web APIs, Python, and pragmatic storage technologies such as Redis, CouchDB, and NetworkX. With Mining the Social Web, intermediate to advanced programmers will learn how to harvest and analyze social data in a way that lends itself to hacking as well as more industrial-strength analysis. Algorithms are designed with robustness and efficiency in mind so that the approaches scale well on an ordinary piece of commodity hardware. The book is highly readable from cover to cover as content progressively grows in complexity, but also lends itself to being read in an ad-hoc fashion. Use easily adaptable scripts to access popular social network APIs including Twitter, OpenSocial, and Facebook Learn approaches for slicing and dicing social data that's been harvested from social web APIs as well as other common formats such as email and markup formats Harvest data from other sources such as Freebase and other sites to enrich your analytic capabilities with additional context Visualize and analyze data in interactive ways with tools built upon rich UI JavaScript toolkits Get a concise and straightforward synopsis of some practical technologies from the semantic web landscape that you can incorporate into your analysis This book is still in progress, but you can get going on this technology through our Rough Cuts edition, which lets you read the manuscript as it's being written, either online or via PDF.