Superior Searching

Want to get the most out of your searches on Google? See these advanced tips from the book Google: The Missing Manual (O’Reilly Media, 2004, ISBN: 0-596-00613-6). Authors Sarah Milstein and Rael Dornfest talk about using Froogle, getting local results and more. (See link for Chapter 1.)

Searching the Web is like panning for gold. There’s a lot of dirt out there, and you need the right tools to get at the shiny nuggets. The previous chapter provided the sieve. But to become a real search jockey, you need tweezers. And forceps. And maybe a staple gun.

It might help to think of every search as a problem, and to bear in mind that different problems require different solutions. “How do I find out which Web sites link to mine?” needs a different approach than “I want to find sites about Miss Piggy—but only those in Urdu.”

This chapter sets you up with an array of techniques that you can use to run different kinds of searches or get more specific results from any search. Because Google’s preference settings can affect all of your results big time, this chapter starts with them.

Have it Your Way: Setting Preferences

Software programs almost always let you change some settings, like the way Microsoft Word, for example, lets you choose the standard font or turn spell checking on and off. Google lets you set some preferences, too. But unlike Word and other programs that hang out on your hard drive, Google remembers your settings with a cookie, a tiny file that a Web site can place on your computer and read back later.

You can reach Google’s settings page by clicking Preferences on the home page or at the top of any results page. Figure 2-1 shows the Preferences page, from which Google lets you control five settings: interface language, search language, filtering, number of results, and which window the results appear in. You have to click Save Preferences to activate the new settings.

Tip: If you change your settings and return to Google only to find they didn’t take, your browser could be set to reject cookies. Check your browser’s security or privacy settings. In Internet Explorer, for example, choose Tools → Options and then click the Privacy tab. You can move the slider to change the intensity with which the program blocks cookies (anything below the highest setting works for Google). Or you can click Edit to specify a Web site from which you want to allow cookies.

Buy the book! If you’ve enjoyed what you’ve seen here, or to get more information, click on the “Buy the book!” graphic. Pick up a copy today!

Visit the O’Reilly Network http://www.oreillynet.com for more online content.

Interface Language

The interface language controls the language Google uses to display tips and messages. Google sets itself to use English, but you can change it to anything from Afrikaans to Zulu. If you want to add some zing to your Google experience, try Bork, bork, bork! (the Swedish Chef), Elmer Fudd, or Hacker. For the less adventurous, Google provides Tamil and Scots Gaelic.

Search Language

The search language is different from the interface language. Instead of affecting the way Google talks to you, the Search Language limits your results to pages that are written in the language you specify. Google assumes you want sites in any language, but if you’d prefer sites only in Finnish and Catalan, this is the place to say so. Just click the blank box next to a language.

Note: You can choose as many languages at once as you like.

The tricky part is that this setting really, really limits your results. Unless you know for certain that you always want to search in one language, you’re probably better off using Google’s Language Tools (page 57) for individual searches.

SafeSearch Filtering

It’s no secret that the Web is home to a lot of sexually explicit text and pictures. If you want your search results to avoid some or possibly all of that material, you can filter it out with SafeSearch, which has three modes:

  • Moderate filtering removes results with explicit images, but not explicit language. Google comes set to Moderate.

  • Strict filtering nixes both explicit images and language.

  • No filtering blocks nothing.

Moderate filtering usually works fine for most casual Web surfers. Because Google tries to give you the most relevant pages first, a search for breast cancer or impotence is unlikely to yield inappropriately salacious results in the first few thousand listings when you’ve got moderate filtering on. But occasionally, moderate filtering can cause you to miss something important. If you don’t mind surprises every now and then, turn the filtering off. Those of you who want some spicy content—or none at all—know who you are.

Number of Results

Google is set to display ten results per page. Studies have shown that most people never click past the first page of results on any search site. You can increase your chances of finding your Holy Grail if Google shows you more results at once. Just change the setting to 20 results per page.

Note: You can change this setting for up to 100 results per page, but it can get hard to read really long results pages. And if your computer is very slow, it may take it an annoyingly long time to display a big list. Twenty to 30 results per page is probably a good range.

Results Window

Unless you tell it otherwise, Google displays your results pages in the same browser window where you ran your search. So when you click a link on the results page, Google replaces your results list with the page you’ve decided to explore. But when you turn on “Open search results in a new browser window,” Google leaves your results page intact and starts up a new copy of your browser when you click a result. (After it’s opened the second window, Google doesn’t keep opening new windows when you click results links; it just switches what you see in the second one.)

Note: If you’re using a browser with tabs (a feature that lets you open multiple Web pages in one browser window), this setting can make your search results open in a new tab, rather than a new window.

Google comes with this setting turned off, but it’s a great one to turn on, as it lets you use one window to view the Web pages you’ve found and the other to keep track of your results list. If you often explore more than one result, or if you find yourself clicking deep into sites or following links to new pages, this setting can save you a lot of hassle getting back to your results. (If your computer is getting geriatric and struggles to keep open a lot of windows, then leave this setting off.)

Tip: In most browsers, you can open additional windows on a case-by-case basis by right-clicking any link and choosing something like “Open in new window” or “Open link in new window.”

On every Web page it tracks, Google records a handful of things, including the URL, body text, links to other pages, files in particular formats, and other goodies (see Chapter 1 for details). A simple Google search tells you when those things exist on the pages in your results. But if you want to search by file type, or look for text in URLs only, you need the help of a beefed-up search feature. And if you don’t remember how to tell Google you want to search for an exact phrase or that you want to exclude a word from your search, you may also feel stuck.

Google’s Advanced Search page can help you sort things out. You can get to it by clicking the Advanced Search link on Google’s home page or at the top of any results page. Figure 2-2 shows that it’s a pretty straightforward form.

Refining Your Search

Advanced Search can be a quick way to build a complex query. It’s particularly handy when you want to run a multipart search, such as one that looks for either of two phrases but searches for them only in URLs, not in the pages’ body text, for instance. Or one that looks for PDFs in Icelandic.

Tip: Occasionally, multipart queries from Google’s Advanced Search page simply don’t work. If you run one that doesn’t, consider trying syntax (described in detail on page 60), which is sometimes more flexible. Syntax refers to words that Google understands to mean, “Run a special kind of search.” For example, the syntax define, followed by a colon and another word (as in, define:vintner), tells Google to search for a definition of the second word (in this case, vintner).

Query words

At the top of the page, in the shaded gray section labeled Find Results, Google gives you four choices for how you would like it to treat your search terms. In order from top to bottom, these mimic the results you can get by using the operators AND, quote marks, OR, and the minus sign. The helpful thing here is that you can use these puppies in combination—which is nice when you want to do something like search for two phrases simultaneously, or search for one and exclude the other. Figure 2-3 shows you an example.
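As a rough sketch of how the four Find Results boxes map onto operators, the hypothetical Python helper below assembles one query string from them. The function name and its parameters are made up for illustration and are not anything Google provides. It assumes the conventions described above: plain words are ANDed, an exact phrase goes in quote marks, alternatives are joined with OR, and excluded words get a minus sign.

```python
def build_query(all_words="", exact_phrase="", any_words="", without_words=""):
    """Combine the four search boxes into one query string.

    Hypothetical helper for illustration; the parameter names are invented.
    """
    parts = []
    # First box: plain terms, which Google ANDs together.
    parts.extend(all_words.split())
    # Second box: wrap the exact phrase in quote marks.
    if exact_phrase.strip():
        parts.append('"' + exact_phrase.strip() + '"')
    # Third box: join the alternatives with OR.
    alternatives = any_words.split()
    if alternatives:
        parts.append(" OR ".join(alternatives))
    # Fourth box: prefix each unwanted term with a minus sign.
    parts.extend("-" + word for word in without_words.split())
    return " ".join(parts)


print(build_query(all_words="muppets", exact_phrase="miss piggy",
                  any_words="urdu hindi", without_words="kermit"))
# prints: muppets "miss piggy" urdu OR hindi -kermit
```

Typing the printed string into the regular search box should behave roughly like filling in all four boxes on the form.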

Language

The language menu lets you specify whether you want your results to include pages written in any language or just one particular language.

Tip: The language tools, explained on page 57, actually give you a lot more control over this factor.

File format

The cool thing about the file format option is that not only can you search for specific file types, like Word or PowerPoint documents, but you can exclude them, too. If you’re looking for an example of a table of contents in any format other than PDF (page 31), this is the place to let Google know.

On the other hand, this Advanced Search feature only lets you specify PDF, PostScript, Word, Excel, PowerPoint, and Rich Text Format. It doesn’t let you choose from the many other file types that Google indexes. For those, you have to use the filetype syntax, described on page 57.
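For a sense of what the typed equivalent might look like, here is a minimal Python sketch that tacks a file-type restriction onto a query. The helper name is hypothetical, and the sketch assumes that the filetype operator takes a file extension and that a leading minus sign excludes it, just as the minus sign excludes an ordinary word.

```python
def with_file_format(query, extension, exclude=False):
    """Append a filetype restriction (or exclusion) to a query string.

    Hypothetical helper; assumes the operator takes a bare file extension.
    """
    operator = "-filetype:" if exclude else "filetype:"
    # No space on either side of the colon.
    return query + " " + operator + extension.lstrip(".").lower()


print(with_file_format('"table of contents"', "pdf", exclude=True))
# prints: "table of contents" -filetype:pdf
```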

Date

The Date option allows you to limit your search results to pages that Google has recorded in the last three months, six months, or year. This search has nothing to do with the date a page was created, but rather when Google indexed it. (For an explanation of Google’s indexing process, see page 2.) If you created a page on March 15 but Google didn’t record it until August 29, it shows up in a date search for August 29. (See page 67 for some suggestions on searching content creation dates.)

Note: Google rerecords pages regularly—usually every few weeks. But if a page’s content doesn’t change from one recording to the next, Google doesn’t update the index date.
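The difference between a page’s creation date and its index date can be sketched in a few lines of Python. The page records below are invented for illustration, and the filter is only an assumption about how a “past three months” style option behaves; the point is that the comparison uses the index date and never looks at the creation date.

```python
from datetime import date, timedelta

# Hypothetical records; the URLs and dates are made up.
pages = [
    {"url": "example.com/a", "created": date(2004, 3, 15), "indexed": date(2004, 8, 29)},
    {"url": "example.com/b", "created": date(2003, 11, 1), "indexed": date(2003, 12, 5)},
]


def indexed_within(pages, today, days):
    """Return URLs whose index date falls in the `days` days up to `today`."""
    cutoff = today - timedelta(days=days)
    return [p["url"] for p in pages if cutoff <= p["indexed"] <= today]


# Page "a" was created more than 90 days before September 30,
# but it was indexed inside the window, so it still shows up.
print(indexed_within(pages, today=date(2004, 9, 30), days=90))
# prints: ['example.com/a']
```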

So what’s the use of this feature? Google actually indexes the Web regularly enough that specifying a range can help filter out irrelevant results. If you’re wondering what Lance Armstrong has been up to since the Tour de France, try searching for pages in the last month (or however long it’s been since the Tour).

To zoom in on any date range beyond the last three months, six months, or year, see the discussion of the daterange operator on page 66.

Google keeps track of text in the body of a page, in the URL, in the links to other pages, and in the title (which is different from the URL). The Occurrences pop-up menu lets you tell Google when you’re looking for results from only one of those places. Here’s when you might want to use them:

In the title. A Web page’s URL is not the same thing as its title. A URL is an address that your computer can read, and sometimes you can read it, too (for example, www.npr.org). But often, URLs are super-long and contain a slew of characters and symbols that make no sense unless you’re a droid. In those cases, it’s useful when a page has a separate, readable title that a Webmaster has written to help you understand what’s on that page. The first line of a Google result is usually a page’s title, not its URL, as Figure 2-4 illustrates.

A word that’s mentioned in the title of a page is more likely to indicate what’s on that page than a word that shows up randomly in the text. For example, a page called “File-sharing for fun and profit” is more likely to explain how to go about file-sharing than a page that simply mentions it as part of another discussion. Use this feature to get a smaller, more focused list of results.

In the text. Asking Google to ignore titles, URLs, and links is useful when you want to search for keywords or phrases that are likely to show up all over the place. For example, if you want only sites that discuss those bumpkins known as yahoos, and you don’t want pages from Yahoo.com or links to that site, use this feature to filter out references to the Web site.

In the URL. Want to find out how many sites have already used the word “sneaker” in their URLs? Here’s the place to check. Happily, this feature does not limit you to simple Web addresses, like www.sneaker-nation.com; it also produces more complex results, like www.cynosure.com.au/isp/sneaker.

Note: Searching for a term within a URL only yields results with whole words. In the example above, Google would give you back www.sneaker-fetish.com or www.sneaker.fetish.com, but not www.sneakerfetish.com.
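The whole-word behavior in the note above can be illustrated with a small Python sketch. It assumes that words in a URL are separated by any character that isn’t a letter or digit; that is a guess made for illustration, not a description of how Google actually splits URLs.

```python
import re


def url_words(url):
    """Split a URL into lowercase words at every non-alphanumeric character."""
    return [w for w in re.split(r"[^a-z0-9]+", url.lower()) if w]


def url_contains_word(url, word):
    """True if `word` appears in the URL as a whole word, not inside another."""
    return word.lower() in url_words(url)


for url in ("www.sneaker-fetish.com", "www.sneaker.fetish.com", "www.sneakerfetish.com"):
    print(url, url_contains_word(url, "sneaker"))
# prints:
# www.sneaker-fetish.com True
# www.sneaker.fetish.com True
# www.sneakerfetish.com False
```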

In the links. This feature simply searches for the text in hyperlinks that connect pages. It’s useful in two situations. First, if you want to find out what pages have links to a certain person, phrase, or site, the “in links to the page” option can give you a rough idea.

Note: The text of a link may have nothing to do with the page it links to. Most commonly, you see sentences like, “To read about Barry Bonds, click here.” If “here” is the text for the link, your search for “Barry Bonds” isn’t going to bring up this page.

Second, the links search can help you find a person’s email address, because on most Web pages, an email address is a link. If the person’s name is part of the email address, or if the page says something like, “For more information, email Brad Pitt,” you’re in business.

Domain 

The domain feature lets you restrict your search to a single site or to a domain (like .edu or .com). The site restriction is useful when you want to look up specific keywords on a site that has no search function (or that has a lousy one). It’s also good for a site whose search function is seriously annoying—maybe it displays results confusingly, or it doesn’t let you use the OR operator. And sometimes a site search turns up goodies you simply can’t seem to reach through a regular onsite search. To see the difference, try running a query with NYTimes.com as the site, and then try the same search terms on the New York Times site.

Note: The site search doesn’t search related Web sites. For example, if you want to search all of Google’s sites, restricting your query to Google.com means you’ll miss anything in www.answers.google.com, http://labs.google.com, and so on. To make sure you hit those sites, too, try searching for Google in the URL, described above.

Limiting your search results to a particular domain can help, for example, sift out sites that want to sell you things. Figure 2-5 shows you what a difference it can make to limit your results to the .org domain—thus filtering out the common .coms and other flotsam.

The domain option also lets you rule out a particular site or domain—handy if your results are peppered with one site or domain that you know doesn’t contain the info you want.

SafeSearch

As described in the above discussion of preferences, SafeSearch lets you filter out explicit sexual content. If you normally keep it turned on, but you want it off for a single search that might suffer from filtering, like bra sizes, you can make the change here.

Froogle Product Search

Froogle is a Google test project that lets you compare prices for products. This feature is explained in detail in Chapter 5.

Page-Specific Search

Google lets you run two special searches for any particular page.

Similar

When you type a URL in the Similar box, Google searches for pages in that general category. For example, the pages related to www.nascar.com are things like NFL.com, MLB.com, NBA.com, ESPN.com, PGA.com, and so on.

Note: The Similar feature runs the same search as the “Similar pages” link that shows up in a Google result (see page 29).

Links

If you have a Web site, you might spend fully half your waking hours wondering who has linked to your pages. Just type in a URL here, and Google spits out a list of pages that link to it. (Obviously, this works even if the URL is not for your own site.)

Topic-Specific Searches

In a few broad categories, Google has already done a little filtering for you. With the exception of the Catalog Search (see below), the topic-specific searches are all just common subsets of the Google database and help keep your searches focused. From Apple Macintosh to the U.S. Government, the topics are self-explanatory. Note that the link for Universities takes you to a page with a handy alphabetical listing of school sites Google has recorded.

Catalogs

Google Catalogs is a collection of scanned mail-order catalogs whose pages you can search. The box on page 136 tells all about this feature.

Google provides a useful advanced search form, but you can also run more specific searches from Fagan Finder, a site that has no official relationship with Google (Figure 2-6). It works best from Internet Explorer (www.faganfinder.com/google.html), but an alternate version that works with other browsers is well worth a try, too (www.faganfinder.com/google2.html).

Fagan Finder has all the variables you can specify on Google’s advanced search page—things like the type of file you want to find, and the domain you’d like to restrict your search to. But unlike Google, Fagan Finder has put several other variables on the same page, letting you run a highly specialized search without using syntax or multiple searches. Those variables include:

A menu that lets you specify the type of Google search you want to run (it’s above the Search button). For example, you can search Google’s Directory, Groups, Images, News, Catalogs, Froogle, or a few other oddball Google searches (the Alternate option lets you run your query on other search sites like Yahoo).

What makes Fagan Finder’s system notable is that one page lets you run an advanced search in any of these special collections; on Google’s own site, each special collection has its own advanced search page, forcing you to recreate your search if you want to check in on several areas. And Google’s advanced search pages don’t include as many detailed variables as Fagan Finder’s.

A menu that lets you specify subsets of the Google search you want to run (it’s above the Feeling Lucky button).

For example, if you’re searching the Web, you can choose to run your query through Google’s keyboard shortcut page (choose “all-shortcuts”) so you can navigate your results without a mouse. Or you can choose to search just Google’s own site (handy if you’re looking for help on a particular feature), or a bunch of other narrow slices of the Web. For each type of Google search, the subsets change (the Directory search, for instance, lists the categories you’d find on Google’s main directory page), but not all searches have subsets (News, for instance, has none).

Tip: Choose your type of search and subset before setting up the rest of your search. Some of the search types and subsets don’t have the choices the regular search does, so you could waste time setting options that disappear when you pick a search type.

  • The country from which you’d like your results to hail.

  • A specific date or date range for your results.

    (Google lets you give a general range, like “past three months.”) Like Google’s date feature (on the Advanced Search form), this option searches for pages that Google indexed or re-indexed within the period you specify.

  • Special characters, handy for quests in languages other than English.

    When you click a letter on the Fagan Finder page, it appears in the general search box.

Tip: If you’re trying to type a long query into the Exact Phrase box or any of the other search boxes, formulate your query in the regular box—which gives you a lot more breathing room and lets you make changes easily—and then cut and paste into whichever search box you’d like to use.

The ability to turn off the duplicates filter. You may have noticed that from time to time when you run a search in Google, your list of results is rather small, but Google inserts a message at the bottom that says something like, “In order to show you the most relevant results, we have omitted some entries very similar to the 5 already displayed. If you like, you can repeat the search with the omitted results included.” Google gives you this option because it automatically represses sites it thinks may be duplicates. Fagan Finder lets you run a search with this filter turned off from the start, which can be useful if you frequently hit the omitted results message in Google.

A choice to open results in the same window you’re already in or in a new window.

Google lets you set this option on its preferences page.

Note: The Fagan Finder Google page has an I’m Feeling Lucky button. Usually, you don’t need this for advanced searches, but if you make Fagan Finder your home page, it’s nice to have the Lucky choice available.

Fagan Finder has a mess of non-Google features worth exploring, too. Head over to the home page, www.faganfinder.com, for a full menu of search tools.


Google’s Language Tools are a collection of features that let you fiddle with the language or location of your search. For example, you can limit an individual search to pages written in a particular language or pages from a country you specify. You can also translate text you type in or entire Web pages. And you can try out a new interface language or run a search from a Google site in another country.

To find these features, click Language Tools on Google’s homepage or any results page, or point your browser to www.google.com/language_tools. Here are the tools and what you can do with them:

Search specific languages or countries. This feature, shown in Figure 2-7, lets you narrow a search to pages written only in a certain language, with or without limiting it to pages from one country. If you’re looking to learn what Korean students at Bulgarian universities have to say about American pop stars, this is the tool to employ. Simply use the menus to change either or both choices, type in your search terms, and press Enter.

Note: This feature is like any advanced search on Google, in that it just works for one search. If you want to always search for pages written in a language other than English, change your Google preference settings, described on page 165.

Translate. You can type in a single word or lots of text into the “Translate text” box. Then use the menu to choose the languages you’re translating from and to, and then press Translate to have Google display your results in a box like the one in Figure 2-8. Alternatively, you can type a URL into the blank box below “Translate a web page,” choose the languages you’re translating from and to, and then press Translate to have Google show the original Web page—with text in the new language.

Tip: If you run a regular Google search and your results include pages in languages Google can translate, it provides a “Translate this page” link next to the title of any page it can convert for you.

Google’s translation feature, like nearly all computerized translation tools, is pretty crude. After all, a machine doesn’t know that when you want to translate an Italian newspaper page about New York Mets catcher Mike Piazza, you’re not actually looking for a story about Mike Public Square. And grammar can take a hit, too. A recent page from a German site on Michael Jackson comes out like this: “In order to become fair the music-historical value of the sieved child of the family Joseph Jackson, one can either good chronologically precious metal honors list or the absolute high point of its work with a superlativ on the point bring: Thriller.” ABC may not be easy as 1, 2, 3.

Still, the translation tool can be helpful. It can give you the gist of a page in another language, and it can usually provide reasonable interpretations of single words or phrases.

Tip: Google’s translation capabilities are limited to exchanging English with German, Spanish, French, Italian, and Portuguese, and vice versa. For a listing of more thorough translation services, click over to http://babblefish.com/babblefish/language.htm.

Use the Google interface in your language. Like Google’s preferences page, this tool lets you pick the language in which Google displays buttons, messages, and links (it has nothing to do with the language of pages on the Web). If you pick an interface language here, however, it lasts only for the current browser session—a good way to check out the different options. For a permanent change, select your interface language from your preferences page.

Note: If you don’t see your favorite lingo on the list of interface languages, you can volunteer to translate for Google. Check out https://services.google.com/tc/Welcome.html for details.

Visit Google’s site in your local domain.  Google runs sites that are primarily for searching pages whose URLs specify a country, like www.yahoo.co.jp, Yahoo’s site for Japan, and for searching sites in a country’s local language. This feature is helpful if you are going to, say, Bilbao, and you want to find only Spanish pages about the Guggenheim Museum there.

Like the interface tool, the domain choice lasts only as long as the current browser session. If you close the browser and then reopen it, you’re back to your earlier settings. To make a permanent domain change, you must install the Google toolbar, if you haven’t already done so (page 148). Then click Options to open the Toolbar Options dialog box, and at the top of the Options tab, choose your domain from the menu labeled “Use Google site.”

Being able to limit a search to sites from a particular country can help you filter out a lot of noise. But Google actually lets you limit searches to pages about a particular U.S. town—which is a total godsend. Whether you’re looking for a neighborhood magician to perform at your cat’s birthday party or for a bed and breakfast that takes pets in a town six states away, Google Local can be your online Yellow Pages.

In fact, Google Local is a hybrid of the Google index and standard phone book data. When you include in your search an address with city and state or zip code, Google crosschecks its index against various online Yellow Pages, generating a batch of results from your specified area only. Because it incorporates its own relevance rankings, too, Google sometimes lists a place ten miles away from you before a place only two miles away. Still, it’s a super-handy search tool.

You can run a Google Local search two ways:

  • From the regular search box.

    Just type in your search terms and an address with city and state or zip code, and Google includes a few local links, signaled by a compass icon, at the top of your regular results. Figure 2-9 shows you an example.

  • From the Google Local page at http://local.google.com/lochp.

    Despite its weird URL, this page (Figure 2-10) is a handy way to use Google Local. When you run a search here, you get a full page of results listings, as shown in Figure 2-9.

Tip: If Google can’t find you a business within 15 miles, it shows you a page with the choice to expand your search out to 30 or 45 miles.

If your business is listed incorrectly or is missing altogether, shoot off a note to local-listings@google.com and give them the proper info.

Getting Fancy with Syntax

When you type in a query, you can add words known by the geek term syntax, also called operators, that tell Google something specific about the search you want to conduct. For example, the operator inurl tells it to look in URLs only for your search terms. Syntax operators make honing results as easy as pie, but they’re primarily uncharted territory for the Google Underusers Club. In some cases, syntax replicates the results you can get via Google’s Advanced Search page, but it is often more specific, and it almost always saves you some clicking.

To use any syntax, simply type the operator and a colon before each of your terms, and don’t put spaces before or after the colon. For example, a search using the operator inurl (described on page 64) should look like this:

inurl:whammy

or

inurl:"double whammy"

or

inurl:double inurl:whammy

Note: A URL can’t contain any spaces, like the one between “double” and “whammy,” so if your query is inurl:"double whammy", Google automatically searches for variations like double-whammy; double.whammy; and double,whammy. All of these are perfectly kosher in URLese.

If you type in a space before or after the colon, Google can’t read your query.
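The spacing rule is easy to get wrong by hand, so here is a minimal Python sketch that formats operator queries the way this section describes. The helper names are hypothetical; the sketch simply glues the operator, a colon, and the term together with no spaces, and puts quote marks around any multiword phrase.

```python
def operator_term(operator, term):
    """Format one operator:term pair with no spaces around the colon."""
    term = term.strip()
    # A multiword term has to be quoted to be treated as a phrase.
    if " " in term:
        term = '"' + term + '"'
    return operator.strip() + ":" + term


def each_word(operator, words):
    """Repeat the operator before every word in `words`."""
    return " ".join(operator_term(operator, w) for w in words.split())


print(operator_term("inurl", "whammy"))         # prints: inurl:whammy
print(operator_term("inurl", "double whammy"))  # prints: inurl:"double whammy"
print(each_word("inurl", "double whammy"))      # prints: inurl:double inurl:whammy
```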

As explained in the Advanced Search section above, titles are different from URLs. They’re also handy to search when you want pages that really focus on your topic. Use the operator intitle, like this:

intitle:file intitle:sharing

or

intitle:"file sharing"

The first example finds titles that contain both of your words. The second example finds titles that contain the exact phrase file sharing.

A variation of this syntax, allintitle, finds pages that have all your keywords or phrases in the title, in any order. For example, this query:

allintitle:file sharing

finds titles that contain both “file” and “sharing,” without requiring an operator before each word (as in, intitle:file intitle:sharing).

Tip: If you try to mix allintitle with some other syntax and the search conks out, you can put intitle before each keyword or phrase (intitle plays better with syntax other than allintitle).
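The workaround in the tip amounts to a mechanical rewrite, sketched below in Python. The function is hypothetical and handles only single keywords (not quoted phrases); it turns any all-prefixed operator into the per-word form, which you can then combine with other syntax.

```python
def expand_all(query):
    """Rewrite 'allintitle:a b' as 'intitle:a intitle:b'.

    Hypothetical helper; queries that don't start with an all* operator
    are returned unchanged.
    """
    prefix, colon, rest = query.partition(":")
    if not colon or not prefix.isalpha() or not prefix.startswith("all"):
        return query
    operator = prefix[len("all"):]  # e.g. "allintitle" -> "intitle"
    return " ".join(operator + ":" + word for word in rest.split())


print(expand_all("allintitle:file sharing"))
# prints: intitle:file intitle:sharing
```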

Searching Text

The intext operator searches only the body text of Web pages, ignoring links, URLs, and titles. Use this syntax when you want to find a word that might crop up in zillions of URLs or links, like this:

intext:amazon

or

intext:amazon.com

Its cousin, allintext, works similarly to allintitle, but it has unpleasant issues when you try to mix it with other syntax.

Searching Anchors

“Link anchor” is HTMLese for the words and pictures on a Web page that serve as links to another page. Mostly, a link anchor is just what you think of as a link (usually a blue, underlined word or phrase that describes a related, linked page), but a lot of times they turn up as buttons or icons or images, too.

The inanchor operator searches for text in link anchors. It’s a nifty way to get an idea of which or how many pages link to a person, place, or thing. And sometimes, it can help you find a person’s email address, since most Web pages consider email addresses to be links. Use it like this:

inanchor:“Linkin Park fans”

or

inanchor:“Richard Stallman”

Not surprisingly, Google has an allinanchor option. Bear in mind that the keywords you specify for it must all appear in a link anchor in order to show up in your results.

Searching Within Sites and Domains

Like the Domain feature on the Advanced Search page (described on page 52), the site operator lets you specify a site or domain you want to search. It makes a quick and handy search function for sites that don’t have a search feature.

Unlike the previous operators, the site syntax has two parts. First, you have to attach a site name or domain name to the site: operator. Second, you have to include the keywords or phrases you want to search for. Here are a couple of examples:

site:nba.com “larry bird” magic

site:gov “agricultural subsidies”

Tip: You don’t have to include http:// or www. And you don’t have to put the site name in quotes.

You can also use site to exclude a particular Web site from your search. For example, if you want to look for sites about books, but you don’t want to wade through zillions of results from Amazon, this query—

books -site:amazon.com

—does the trick. Mostly. It doesn’t block Amazon’s international partners, like Amazon.co.uk, because that’s not the site you specified. To nix all instances of Amazon in a URL, use the inurl operator, as described on page 64.

You can’t use the site operator to search within a site’s subdirectories, which are anything at the end of a site name after a slash (/). For example, the query feedback site:ebay.com/help gets you nada. For these situations, use the inurl syntax.

The inurl operator searches solely URLs for your query words. No body text. No titles. Just URLs. Unlike the site operator, inurl doesn’t require additional query words: inurl:“great pumpkin” is perfectly acceptable. (Remember, a URL can’t contain any spaces, like the one between “great” and “pumpkin,” but if your query includes an exact phrase that has spaces, Google automatically searches for variations that work in URLs, like great.pumpkin and great-pumpkin.)

Another cool thing about inurl versus the site syntax is that inurl lets you search subdirectories. Thus, if you run the search inurl:ebay.com/help, you get a gaggle of links to eBay help pages. (You could also use “Search within results” to find, say, instances of “feedback” in the results’ body text, or you could add an intext element, like this: intext:feedback inurl:ebay.com/help).

Note: When using inurl, you can’t include http://, or Google will come up with zero results.

Inurl is also handy when you want to exclude a site from your search. For example, this search—

books -inurl:amazon

—lets you find pages that sell or discuss books, but it blocks any site with Amazon in its URL, which includes the giant retailer and its international partners.

Allinurl is a variation that finds all your keywords, but it doesn’t mix well with some other special syntax.

Who Links to Whom?

Want to find out which sites link to your Web site, or to Friendster.com, or to a particular page on Friendster? The link operator is for you (it does the same thing as the Links feature on the Advanced Search page). Just type in link:friendster.com, and Google spits out a list of pages linked to Friendster.com. One nice thing about link is that it works with subdirectories, too. Thus you can go nuts seeing who’s linked to poundy.com/journal/04-01/4-11.html.

Caching Up

Google keeps a copy of each page as it records it, called a cache, discussed in detail on page 28. The cache operator lets you view Google’s last cached copy of a page, even if the page has moved from its original URL or changed radically. Thus, this query—

cache:espn.com

—gets you the ESPN home page on the last day Google checked it. The page changes often, but the cache syntax lets you jump back in time. In fact, cache is a great way to find an earlier version of a site that changes frequently. Figure 2-11 compares a Google cache with a current site.

Tip: If your own site changes regularly, you can use the cache operator to find out the last time Google recorded it.

The cache operator does pretty much the same thing as clicking a cache link on a Google results page, but it doesn’t highlight your search terms, and it can save you some clicking around.

Daterange

The daterange operator lets you search for pages that Google indexed during a specific time frame. Like the date option on the Advanced Search page (page 48), this operator has nothing to do with the date a page was created, but rather when Google recorded it.

So seeing as Google’s Advanced Search page lets you narrow down the dates for your results, why even bother with daterange? Because the Advanced Search lets you limit your results only to the last three months, six months, or year. The daterange operator is a lot more powerful, letting you specify a single day for which you’d like results, or a date before or after which you’d like results. (For example, you can search for “terrorism” before and after September 11, 2001).

In fact, daterange can be useful in a few situations:

  • Stale results. If you tend to run searches that pull up a lot of old, useless pages, you can use daterange to ensure freshness.

  • Too much current news. For your doctoral dissertation on weightlifting and the performance of masculinity, you want to find some older writings on Arnold Schwarzenegger, but your results are all gummed up with news about his becoming California governor. Use daterange to filter out the breaking news.

  • Finding trends. You can use daterange to see how results for a particular query have changed over time. While many a pundit has noted that there’s no accuracy or true popularity ranking in the number of results for any Google query, you can still use that data to gauge trends. Plus, the content of certain queries might change over time. What did Google give you if you searched for “Gary Coleman” in 1999 versus today?

In theory, daterange is easy to use. You just type daterange:startdate-enddate keywords. You have to specify a start date and end date; if you want only one day, use the same date for start and end. In practice, daterange is a power user’s operator because the dates must be in the Julian date form, a continuous count of days since noon on January 1, 4713 BC. In the Julian scheme, July 8, 2002 is 2452463.5.

Though it may seem ridiculous to base an electronic search on a dating system that started thousands of years ago, computers like Julian dates because each is just one number, regardless of leap years, days in a month, and other things that confuse machines. People, on the other hand, tend to take to the Julian calendar like ducks to molasses. (By the way, there’s another Julian date format that Google doesn’t recognize. It’s a five-digit string, yyddd, with two numbers for the year and three for the day, up to 365.)

Fortunately, you can easily convert dates from Gregorian (the calendar you’re familiar with) to Julian, and vice versa, at several Web sites. Just run a Google search for “julian date” to get a current list of converters. Even better, you can use the Fagan Finder Google interface at www.faganfinder.com/google.html to specify dates using pop-up menus with Gregorian dates. (For more info about Fagan Finder, see page 54.)

Beyond the Julian hurdle, daterange has a couple of catches. First, Google doesn’t like decimals, so for the date above, you’d have to round up or down to 2452463 or 2452464. Second, Google doesn’t officially sanction daterange searches. So if you get funky results, you can’t complain.
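
If you’d rather not visit a converter every time, the arithmetic is simple enough to script. The Python sketch below is one way to do it, under a few assumptions: the function names are invented for this example, and it rounds up to the whole Julian day that begins at noon on the calendar date (so July 8, 2002 becomes 2452464, one of the two roundings mentioned above). Since Google doesn’t officially support daterange, treat the resulting queries as experiments.

```python
from datetime import date

# Python counts days from January 1 of year 1 (proleptic Gregorian calendar).
# Adding this offset gives the Julian day number that starts at noon on the
# same calendar date.
ORDINAL_TO_JULIAN = 1721425


def julian_day_number(d: date) -> int:
    """Whole-number Julian date for a Gregorian calendar date (no decimals)."""
    return d.toordinal() + ORDINAL_TO_JULIAN


def daterange_query(keywords: str, start: date, end: date) -> str:
    """Assemble daterange:startdate-enddate keywords (hypothetical helper)."""
    if end < start:
        raise ValueError("end date must not come before start date")
    return (
        f"daterange:{julian_day_number(start)}-{julian_day_number(end)} "
        f"{keywords}"
    )


print(julian_day_number(date(2002, 7, 8)))  # 2452464

# A single day: use the same date for start and end.
print(daterange_query("terrorism", date(2001, 9, 11), date(2001, 9, 11)))
# daterange:2452164-2452164 terrorism
```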

You can use daterange in combo with most of Google’s special operators except the link syntax. Also, the stock quote and phonebook operators, described in Chapter 1, don’t fly with daterange.

Searching by File Type

The filetype operator searches for file name extensions, like .doc or .pdf. Unlike the similar feature on Google’s Advanced Search page, filetype lets you specify HTML pages (or those encoded as htm, which is the same thing technically, but might give you different results) and Macromedia Flash files (.swf), which are usually animated sequences like those intro pages you sometimes have to sit through before getting to a real Web page. Geeks can also use filetype to search for page generators—like asp, php, and cgi—little programs that render Web pages suitable for your browser.

You use filetype with keywords or phrases, like this—

“chocolate soymilk ingredients” filetype:ppt

or

tofu filetype:xls

—to get PowerPoint slides on chocolate soymilk ingredients or Excel spreadsheets on tofu, respectively.

The related operator performs the same search as the “Similar pages” link that appears in a Google result (page 29). It’s a good way to find pages in a category, rather than with your particular keywords. For example, related:“sesame street” brings up a list of pages on children’s TV shows, while a straight search for “sesame street” yields a lot of sites selling Cookie Monster merchandise.

Synonyms

If your keywords describe a concept, you might want your results to include synonyms for your query. For example, if you’re looking for technical help, it’s useful to automatically include synonyms like “support” and “customer service” without having to type them all in.

The ~ symbol tells Google to look for synonyms. You can find this squiggle near the top of your keyboard, on the key to the left of the number 1 (type it by pressing Shift+`). Use it like this—

~help Microsoft Word

—to get a list of pages with tips on using the popular word processing program.

Most of the Kit and Caboodle

The info operator provides you with a tidy summary of the details Google can give you about a URL—including links to that page’s cache, similar pages, linked pages, and pages containing the words in your search. Figure 2-12 shows you what to expect.

Mixing Syntax

As noted above, some syntax operators work only on their own, while some play well with others. For the many you can combine, cleverly splicing them together can narrow your search results in the most satisfying manner. The trick is to experiment a lot and find out what works for your searches. The examples below can help you get started.

Tip: If you find the syntax confusing to combine at first, stick with the Advanced Search page. But take a look at the search box on your results pages, since Google converts your query into syntax (which appears in the search box), helping you learn which operators do what.

How Not to Mix Syntax

The most important rules to keep in mind are those dictating which syntax not to mix. For starters, a few operators don’t get along with any of the others: allinurl, allintitle, allintext, allinanchor and link. In addition, here are a few general principles of mixology.

Canceling yourself out

Basic safety tip: don’t mix operators that will cancel each other out. For example —

site:bluefly.com -inurl:bluefly

—tells Google that you want all your results to come from bluefly.com, but that the results should not include the word “bluefly” in the URLs. This search yields exactly zero results.

Doubling up

You can also run into trouble by doubling up a single operator. This query —

“trading spaces” site:com site:edu

—might look like you’re asking Google to give you results from either .com or .edu sites, but in fact, you’re telling it that your results should come from sites that are simultaneously part of both domains. Unfortunately, there is no such animal as a URL like www.google.com.edu.

If you want results from .edu and .com domains only, try something like this:

“trading spaces” (site:edu OR site:com)
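
If you generate queries from a list of domains, the same rule applies: join the site restrictions with OR inside parentheses rather than stringing them together. A minimal Python sketch follows; the helper name any_site is hypothetical.

```python
def any_site(keywords: str, *sites: str) -> str:
    """Restrict keywords to results from ANY of the given sites or domains.

    Hypothetical helper: it only builds the query text. Listing several
    site: operators without OR would ask for pages that belong to all of
    the domains at once, which matches nothing.
    """
    cleaned = [s.strip() for s in sites if s.strip()]
    if not cleaned:
        return keywords
    if len(cleaned) == 1:
        return f"{keywords} site:{cleaned[0]}"
    alternatives = " OR ".join(f"site:{s}" for s in cleaned)
    return f"{keywords} ({alternatives})"


print(any_site('"trading spaces"', "edu", "com"))
# "trading spaces" (site:edu OR site:com)
```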

Getting carried away

If you want to run a very narrow search, something like —

intitle:curtains site:ebay.com inurl:funkyfresh

—you’re likely to get nothing in return. Instead, start with a broader search, like this—

intitle:curtains site:ebay.com

—and then streamline it by searching within your results.

How to Mix Syntax

Some syntax combinations work very well. Here’s an example to get you started: intitle and site. Say you’d like to get a sense of the forms available from the United States Department of Agriculture; you could run this search:

intitle:form site:usda.gov

And if you want to narrow that down, you could add a keyword, like this:

cattle intitle:form site:usda.gov

Tip: You can put keywords at the beginning or the end of your query, but it’s easier to keep track of them if you put them first.

Remember that the site operator lets you specify subdomains, so if you want to know what kind of forms the National Agricultural Library has on tap, you could run a search like this:

intitle:form site:nal.usda.gov

Another classic combo is intext along with inurl, described on page 64.

Anatomy of a Google URL

As you’ve probably noticed by now, URLs are often long, complicated, and weird. You certainly don’t have to become an expert on what goes into those addresses. Indeed, millions of people live happy lives never wondering why some URLs are eighteen characters and some are longer than Beowulf.

But after you’ve run a Google search, the URL in your browser bar contains some characters that you can change on the fly to refine a quest without taking a long trip to the Advanced Search page. Plus, once you can read URLese, you can fiddle with a string to produce results you can’t get any other way.

The URL for a Google results page can vary depending on the preferences you’ve set, but mostly, they look similar (see page 45 for more on preferences). Say you run a search for the phrase “over the river”; your results URL should look something like Figure 2-13. In addition to your query, the URL contains codes for the language you’re surfing and for the number of results per page, both explained below. The rest of the stuff varies wildly and can include information about the browser you’re using, the page where you initiated your search (perhaps you ran a Google search from Amazon.com), or other factors Google doesn’t reveal.

In the middle of a Google results URL, you can usually find num=, which tells you the number of search results Google gives you per results page. You can temporarily change the number of results to anything from 1 to 100 simply by altering the number in the URL, and then pressing Enter. Most of the time, search results are easiest to read when you’ve got 10, 20, or 30 per page (page 71). But this trick is a quick way to amp up the number of results on a page for the rare times when you want to review a lot of them at once or compare, say, results 1 and 100 on one page.

Tip: If you don’t see num= in your URL, you can click the end of the URL in your browser’s address bar, add an ampersand (&) and then type in num=x. For example, to set the number of results to 50 per page, add &num=50. Actually, this works even if you can already see the num= setting in your URL, because Google reads URLs from right to left.

As long as you have the same browser window open, Google keeps your number of results at the new setting. But it doesn’t remember the change after you’ve closed the browser. To make the change permanent, see page 45 on setting preferences.

Changing the Interface Language

In addition to a setting for the number of results, the URL contains a setting for the interface language (that is, the language Google uses on things like buttons and instructions). The setting’s prefix is hl=, followed by a language code. For example, English is denoted like this:

hl=en

As with the number of results, you can alter the URL for a temporary change of interface language. This is most useful when you find yourself in front of a computer that’s set to a language you don’t know, a common problem for exchange students and diplomats. If the language is something other than English but you want English, just change the code to en.

Tip: You can also append the language code at the end of the URL, after adding an ampersand, like this: &hl=bn. No need to delete hl=en, because Google looks for the rightmost hl reference in a URL.

Here’s a list of the codes for the other languages Google knows:

Afrikaans. af                              Greek. el                Pig Latin. xx-piglatin
Albanian. sq                               Gujarati. gu             Polish. pl
Amharic. am                                Hacker. xx-hacker        Portuguese (Brazil). pt-BR
Arabic. ar                                 Hebrew. iw               Portuguese (Portugal). pt-PT
Azerbaijani. az                            Hindi. hi                Punjabi. pa
Basque. eu                                 Hungarian. hu            Romanian. ro
Belarusian. be                             Icelandic. is            Russian. ru
Bengali. bn                                Indonesian. id           Scots Gaelic. gd
Bihari. bh                                 Interlingua. ia          Serbian. sr
Bork, bork, bork! (Swedish Chef). xx-bork  Irish. ga                Sinhalese. si
Bosnian. bs                                Italian. it              Slovak. sk
Bulgarian. bg                              Japanese. ja             Slovenian. sl
Catalan. ca                                Javanese. jw             Spanish. es
Chinese (Simplified). zh-CN                Kannada. kn              Sundanese. su
Chinese (Traditional). zh-TW               Klingon. xx-klingon      Swahili. sw
Croatian. hr                               Korean. ko               Swedish. sv
Czech. cs                                  Latin. la                Tagalog. tl
Danish. da                                 Latvian. lv              Tamil. ta
Dutch. nl                                  Lithuanian. lt           Telugu. te
Elmer Fudd. xx-elmer                       Macedonian. mk           Thai. th
Esperanto. eo                              Malay. ms                Tigrinya. ti
Estonian. et                               Malayalam. ml            Turkish. tr
Faroese. fo                                Maltese. mt              Ukrainian. uk
Finnish. fi                                Marathi. mr              Urdu. ur
French. fr                                 Nepali. ne               Uzbek. uz
Frisian. fy                                Norwegian. no            Vietnamese. vi
Galician. gl                               Norwegian (Nynorsk). nn  Welsh. cy
Georgian. ka                               Occitan. oc              Xhosa. xh
German. de                                 Persian. fa              Zulu. zu

Two More URL Tricks

By now you’ve probably noticed that you can temporarily change Google’s behavior by placing an ampersand at the end of the results URL and then adding a modifier, which is simply a term or code that alters your results slightly. Here are two more handy modifiers.

  • Fresh results. By adding &as_qdr=m#, you can alter the maximum age of the results, in months. Just change the # symbol to anything between 1 and 12.
    Google’s Advanced Search page lets you say you’d like results that have been updated anytime, within the past three or six months, or year. But Google doesn’t give you any other way to specify results from within the past one, two, four, five, seven, eight, nine, ten, or eleven months. Yet, this trick can be an excellent way of narrowing results to only the freshest pages, and it’s very handy when you’re looking for a page that you’re sure has been changed recently.
  • Unsafe searching. The SafeSearch filter tells Google to remove sexually explicit links from your results (for more on the filter, see page 47). The problem is, sometimes the filter gets carried away and removes things you need. (For example, you’re searching for information on sex education, and Google hides some important sites.) Or maybe what you want is XXX sites. To make sure the filter is off, add this to the end of your URL: &safe=off. To make sure it’s on, add: &safe=on.
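
If you tweak results URLs often, you can let a short script do the editing. The Python sketch below sets any of the modifiers covered in this section (num, hl, as_qdr, safe) on a results URL. Some assumptions: the sample URL is a simplified stand-in rather than a copy of a real results URL, the q= parameter holding the query is not described in the text above, and the function names are invented for this example. Instead of relying on Google to pick the rightmost value, the sketch drops any earlier setting and appends the new one.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_modifiers(url: str, **modifiers: str) -> str:
    """Return url with each modifier set, replacing any earlier value."""
    parts = urlsplit(url)
    # Keep every existing parameter except the ones being overridden.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in modifiers
    ]
    params.extend((key, str(value)) for key, value in modifiers.items())
    return urlunsplit(parts._replace(query=urlencode(params)))


def freshness(months: int) -> str:
    """Value for as_qdr: maximum age of results, 1 to 12 months."""
    if not 1 <= months <= 12:
        raise ValueError("months must be between 1 and 12")
    return f"m{months}"


# Simplified, hypothetical results URL for the phrase "over the river".
url = "http://www.google.com/search?q=%22over+the+river%22&hl=en&num=10"

print(with_modifiers(url, num="50", hl="bn", as_qdr=freshness(2), safe="off"))
# http://www.google.com/search?q=%22over+the+river%22&num=50&hl=bn&as_qdr=m2&safe=off
```

As with editing the address bar by hand, the change lives only in the URL; nothing here touches your saved preferences.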
