Google's Latest Moves in Information Indexing


(Page 1 of 4)

Sometimes Google does something with very little fanfare that stirs considerable interest. In this article, I’m going to discuss several of their recent moves. If you’re curious about their attempts to index more of the web or make their indexing more useful for searchers, keep reading; you’ve come to the right place.

SEOs have long known that HTML forms can be problematic. Content that a user can reach only by filling out a form will trip up search engine spiders and remain unindexed. That's perfectly fine if that's what you want. Not all online content is for sharing, and if your content is valuable enough that subscribers pay good money for it, as happens with certain medical and legal indexes, you may not want general search engines rooting around in your index and turning it up free for the asking.

Google wants to change that. In a recent post to the Google Webmaster Central Blog, the search engine revealed that "we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google." Its crawler makes certain automated entries into a form, based in part on content from the site, and "If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page."
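
Google has not published how it picks those automated entries, so the following is only a minimal sketch of the general idea of drawing candidate values from a site's own content. The function name `candidate_terms`, the tiny stop-word list, and the sample text are all assumptions made for illustration; none of them come from Google's post.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would need something far richer.
STOP_WORDS = {"that", "this", "with", "from", "your", "have", "more"}


def candidate_terms(page_text, limit=5):
    """Return the most frequent words of four or more letters in the text."""
    words = re.findall(r"[a-z]{4,}", page_text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return [word for word, _ in counts.most_common(limit)]


# Made-up text standing in for the visible content of a site.
sample = "Used cars for sale. Browse used cars, trucks and vans. Cars added daily."
print(candidate_terms(sample, limit=2))  # ['cars', 'used']
```

Words that a site uses often are at least plausible inputs for that site's own search box, which is why frequency makes a reasonable first guess in a sketch like this.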

Googlebot's new abilities stem from Google's purchase of Transformic in 2005. Transformic was working on exactly this problem. Anand Rajaraman, writing for Datawocky, mentioned working with one of Transformic's major researchers (Alon Halevy, who also made the recent Google blog post) back in 1995. He noted that Transformic's technology was attempting to solve two problems. The first was determining which web forms were worth penetrating. The second: "If we decide to crawl behind a form, how do we fill in values in the form to get at the data behind it?" Rajaraman asked. Check boxes and radio buttons were no big deal, but with "free-text inputs, the problem is quite challenging - we need to understand the semantics of the input box to guess possible valid inputs."
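
To see why inputs with a fixed set of choices are the easy case, here is a rough sketch that reads a form, collects the values its drop-downs and radio buttons offer, and lists every URL a GET submission could produce. It is not Transformic's or Google's method. The class `FormFieldCollector`, the function `candidate_urls`, and the sample form are hypothetical, and the form's action is passed in by hand rather than parsed.

```python
from html.parser import HTMLParser
from itertools import product
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect the finite value sets offered by <select> and radio inputs."""

    def __init__(self):
        super().__init__()
        self.choices = {}      # field name -> list of values the form offers
        self.text_fields = []  # free-text inputs: no values to enumerate
        self._select = None    # name of the <select> currently being read

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "select":
            self._select = attr.get("name")
            if self._select:
                self.choices.setdefault(self._select, [])
        elif tag == "option" and self._select and attr.get("value"):
            self.choices[self._select].append(attr["value"])
        elif tag == "input" and attr.get("name"):
            kind = attr.get("type", "text")
            if kind == "radio" and attr.get("value"):
                self.choices.setdefault(attr["name"], []).append(attr["value"])
            elif kind == "text":
                self.text_fields.append(attr["name"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def candidate_urls(action, choices):
    """Yield one GET URL for every combination of the enumerable values."""
    names = sorted(choices)
    for combo in product(*(choices[name] for name in names)):
        yield action + "?" + urlencode(list(zip(names, combo)))


# Made-up form with one drop-down, one radio group, and one text box.
form_html = """
<form action="/search" method="get">
  <select name="state">
    <option value="ca">California</option>
    <option value="ny">New York</option>
  </select>
  <input type="radio" name="type" value="new">
  <input type="radio" name="type" value="used">
  <input type="text" name="q">
</form>
"""

collector = FormFieldCollector()
collector.feed(form_html)
urls = list(candidate_urls("/search", collector.choices))
print(urls)
print(collector.text_fields)  # ['q']
```

The drop-down and the radio group yield four URLs between them. The text box `q` ends up in `text_fields` with nothing to enumerate, which is exactly the gap Rajaraman describes.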

This latest move is Google's way of crawling what has often been referred to as the Hidden, Deep, or Invisible Web. Google insists that it will continue to respect robots.txt files. But the move is not without its problems, and a number of observers have expressed concerns. I'll be covering those issues in the next section.
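
For site owners who would rather keep form results out of the index, the robots.txt route can be checked locally. The sketch below uses Python's standard `urllib.robotparser` module. The `/search` path and the example.com URLs are made up for illustration, and the module only models how a compliant crawler reads the file; it says nothing about Googlebot's internals.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of form results.
robots_txt = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A form-result URL under /search is off limits; other pages stay crawlable.
blocked = parser.can_fetch("Googlebot", "https://example.com/search?state=ca")
allowed = parser.can_fetch("Googlebot", "https://example.com/about.html")
print(blocked, allowed)  # False True
```

Because Disallow rules match by path prefix, a single line covers every query string the form could generate.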

More By Terri Wells
© 2003-2012 by Developer Shed. All rights reserved.