talk-data.com talk-data.com

Topic

web-scraping

24

tagged

Activity Trend

1 peak/qtr
2020-Q1 2026-Q1

Activities

24 activities · Newest first

Greasemonkey Hacks

Greasemonkey Hacks is an invaluable compendium 100 ingenious hacks for power users who want to master Greasemonkey, the hot new Firefox extension that allows you to write scripts that alter the web pages you visit. With Greasemonkey, you can create scripts that make a web site more usable, fix rendering bugs that site owners can't be bothered to fix themselves, or add items to a web site's menu bar. You can alter pages so they work better with technologies that speak a web page out loud or convert it to Braille. Greasemonkey gurus can even import, combine, and alter data from different web sites to meet their own specific needs. Greasemonkey has achieved a cult-like following in its short lifespan, but its uses are just beginning to be explored. Let's say you're shopping on an e-commerce site. You can create a script that will automatically display competitive prices for that particular product from other web sites. The possibilities are limited only by your imagination and your Greasemonkey expertise. Greasemonkey Hacks can't help you with the imagination part, but it can provide the expert hacks-complete with the sample code-you need to turn your brainstorms into reality. More than just an essential collection of made-to-order Greasemonkey solutions, Greasemonkey Hacks is crammed with sample code, a Greasemonkey API reference, and a comprehensive list of resources, to ensure that every resource you need is available between its covers. Some people are content to receive information from websites passively; some people want to control it. If you are one of the latter, Greasemonkey Hacks provides all the clever customizations and cutting-edge tips and tools you need to take command of any web page you view.

Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration

Fuzzy Modeling and Genetic Algorithms for Data Mining and Exploration is a handbook for analysts, engineers, and managers involved in developing data mining models in business and government. As you’ll discover, fuzzy systems are extraordinarily valuable tools for representing and manipulating all kinds of data, and genetic algorithms and evolutionary programming techniques drawn from biology provide the most effective means for designing and tuning these systems. You don’t need a background in fuzzy modeling or genetic algorithms to benefit, for this book provides it, along with detailed instruction in methods that you can immediately put to work in your own projects. The author provides many diverse examples and also an extended example in which evolutionary strategies are used to create a complex scheduling system. Written to provide analysts, engineers, and managers with the background and specific instruction needed to develop and implement more effective data mining systems Helps you to understand the trade-offs implicit in various models and model architectures Provides extensive coverage of fuzzy SQL querying, fuzzy clustering, and fuzzy rule induction Lays out a roadmap for exploring data, selecting model system measures, organizing adaptive feedback loops, selecting a model configuration, implementing a working model, and validating the final model In an extended example, applies evolutionary programming techniques to solve a complicated scheduling problem Presents examples in C, C++, Java, and easy-to-understand pseudo-code Extensive online component, including sample code and a complete data mining workbench

Spidering Hacks

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you. Spidering Hacks takes you to the next level in Internet data retrieval--beyond search engines--by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented--you'll learn how to scrape and repurpose raw data so you can view in a way that's meaningful to you.Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to: Aggregate and associate data from disparate locations, then store and manipulate the data as you like Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites Integrate third-party data into your own applications or web sites Make your own site easier to scrape and more usable to others Keep up-to-date with your favorite comics strips, news stories, stock tips, and more without visiting the site every dayLike the other books in O'Reilly's popular Hacks series, Spidering Hacks brings you 100 industrial-strength tips and tools from the experts to help you master this technology. If you're interested in data retrieval of any type, this book provides a wealth of data for finding a wealth of data.

Mining the Web

Mining the Web: Discovering Knowledge from Hypertext Data is the first book devoted entirely to techniques for producing knowledge from the vast body of unstructured Web data. Building on an initial survey of infrastructural issues—including Web crawling and indexing—Chakrabarti examines low-level machine learning techniques as they relate specifically to the challenges of Web mining. He then devotes the final part of the book to applications that unite infrastructure and analysis to bring machine learning to bear on systematically acquired and stored data. Here the focus is on results: the strengths and weaknesses of these applications, along with their potential as foundations for further progress. From Chakrabarti's work—painstaking, critical, and forward-looking—readers will gain the theoretical and practical understanding they need to contribute to the Web mining effort. * A comprehensive, critical exploration of statistics-based attempts to make sense of Web Mining. * Details the special challenges associated with analyzing unstructured and semi-structured data. * Looks at how classical Information Retrieval techniques have been modified for use with Web data. * Focuses on today's dominant learning methods: clustering and classification, hyperlink analysis, and supervised and semi-supervised learning. * Analyzes current applications for resource discovery and social network analysis. * An excellent way to introduce students to especially vital applications of data mining and machine learning technology.