Using the Google SOAP Search API

Would you like to make your computer applications able to submit queries to Google and extract the data? For example, wouldn’t it be nice to create an application that automatically queries Google with your keywords and returns the top 10 results? Then you’ll want to take advantage of the Google SOAP Search API service. This article explains how.

Introduction

The Internet provides an unparalleled amount of information. However, without the aid of search engines such as Google, navigating through this information would be quite difficult. Search engines allow people to sift through billions of pages and find the information they need.

However, the use of search engines is not limited to people. Computer applications can easily query search engines and extract data. Google has made this process particularly easy with the Google SOAP Search API service. Though currently both in beta and under a restrictive license, Google’s service is still worth taking a look at. Using the service, one can query Google as usual, but access to cached pages is also provided, as well as Google’s spelling suggestion service.

Getting Started

To use Google’s service, you first have to register and obtain a license key, which will be sent with each query:

https://www.google.com/accounts/NewAccount?continue=http://api.google.com/createkey&followup=http://api.google.com/createkey

If you have Visual Studio, create a project and then add a web reference to the SOAP Search API. To do this, right-click the project in the Solution Explorer and select “Add Web Reference.” A dialog will appear, where you can enter the WSDL file’s URL:

http://api.google.com/GoogleSearch.wsdl

Name the reference “GoogleSearchService.” You’ll need to point to this reference with a “using” statement in your code.

If you don’t have Visual Studio, run the .NET SDK’s wsdl utility:

wsdl http://api.google.com/GoogleSearch.wsdl

This will create a file called GoogleSearchService.cs. Use this file to access the SOAP Search API service.

To begin using the service, create a GoogleSearchService object. This will be used to interact with the service:

using System;
using System.Text;
using System.Text.RegularExpressions;
class GoogleSearchTest
{
    static void Main()
    {
        GoogleSearchService google = new GoogleSearchService();
    }
}

We’ll be making use of this shell (and the namespaces it uses) throughout the article.

Google searches can be conducted through the doGoogleSearch method, which returns a GoogleSearchResult object. The method, however, takes a number of arguments: a string containing your license key, a string containing the query, an integer representing the first result you wish to retrieve (starting at zero), an integer representing the maximum number of results you wish to retrieve (the maximum is ten), a boolean turning search filtering on or off (this eliminates similar results), a string restricting results to a certain country or topic, a boolean turning SafeSearch on or off, a string restricting results to a certain language, and two more strings to specify encoding—arguments which are now disregarded.

Here, we query Google with “Developer Shed” and store the results in search:

        GoogleSearchResult search = google.doGoogleSearch("x0x0",
            "Developer Shed", 0, 10, true, "", true, "", "", "");

You will, of course, have to substitute the first argument with your own license key.

The above line of code tells the SOAP Search API to return ten results starting from the very first result obtained. Search filtering is turned on, which, as stated earlier, trims down the results by eliminating near-duplicate results. SafeSearch is turned on, and the results are not restricted to any language, topic or country.

We can obtain the results of the query through search, starting with some general information:

        Console.WriteLine("Query \"{0}\" completed in {1} seconds.",
            search.searchQuery, search.searchTime);
        Console.WriteLine("Estimated number of results: " +
            search.estimatedTotalResultsCount);

Above, we display the query text, the amount of time the query took and the estimated number of results.

We can also access the results themselves (well, ten of them). Here, we iterate through the results and print a summary:

        foreach (ResultElement result in search.resultElements)
        {
            Console.WriteLine("\n" + result.title);
            Console.WriteLine(result.URL);
            Console.WriteLine(result.snippet);
        }

As you can see, resultElements contains an array of ResultElement objects. A foreach loop is used to iterate through this array. Unfortunately, a bit of HTML is also included, and we may not always want that. However, regular expressions can fix this problem easily enough:

        Regex stripHtml = new Regex("<(.+?)>");
        foreach (ResultElement result in search.resultElements)
        {
            Console.WriteLine("\n" + stripHtml.Replace(result.title, ""));
            Console.WriteLine(result.URL);
            Console.WriteLine(stripHtml.Replace(result.snippet, ""));
        }
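
The tag-stripping step can be tried without calling the service. The following is a minimal sketch, assuming a made-up sample string rather than real API output. The StripHtmlDemo class and its ToPlainText helper are hypothetical names, and the entity-decoding step is an addition that assumes the text may also contain HTML entities such as &amp;amp; (the article itself only mentions tags):

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

class StripHtmlDemo
{
    // Same pattern as above: lazily match anything between < and >.
    private static readonly Regex stripHtml = new Regex("<(.+?)>");

    // Removes tags first, then decodes HTML entities (assumed extra step).
    public static string ToPlainText(string html)
    {
        string withoutTags = stripHtml.Replace(html, "");
        return WebUtility.HtmlDecode(withoutTags);
    }

    static void Main()
    {
        // Hypothetical sample in the style of a result title.
        string title = "<b>Developer</b> <b>Shed</b> &amp; Friends";
        Console.WriteLine(ToPlainText(title));
    }
}
```

A regular expression like this is only suitable for the simple inline markup found in titles and snippets; it is not a general HTML parser.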

Now let’s say we need the next five results. To retrieve them, simply increase the starting index by ten and set the maximum number of results to five:

        search = google.doGoogleSearch("x0x0", "Developer Shed",
            10, 5, true, "", true, "", "", "");
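
Because each request returns at most ten results, fetching a longer list means issuing several requests with different starting indexes. The sketch below only works out those index pairs and does not call the service. PagingDemo and Plan are hypothetical names introduced for illustration:

```csharp
using System;
using System.Collections.Generic;

class PagingDemo
{
    // The article states that one request returns at most ten results.
    public const int MaxPerRequest = 10;

    // Splits a request for 'total' results into { start, count } pairs.
    public static List<int[]> Plan(int total)
    {
        var pages = new List<int[]>();
        for (int start = 0; start < total; start += MaxPerRequest)
        {
            int count = Math.Min(MaxPerRequest, total - start);
            pages.Add(new int[] { start, count });
        }
        return pages;
    }

    static void Main()
    {
        foreach (int[] page in Plan(15))
        {
            // Each pair would become the third and fourth arguments
            // of a doGoogleSearch call.
            Console.WriteLine("start={0}, maxResults={1}", page[0], page[1]);
        }
    }
}
```

For fifteen results, the plan consists of one request starting at 0 for ten results and one starting at 10 for five, which matches the two calls shown above.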

The results can be modified significantly by turning filtering off:

        GoogleSearchResult noFilter = google.doGoogleSearch(
            "x0x0", "Developer Shed", 0, 10, false, "", true, "", "", "");

This produces a number of similar results. In most cases, this is undesired behavior, since a smaller variety of information is returned in a single query. However, the feature can be turned off, as we did above, in case you find some reason to do so.

Making use of country, topic and language restrictions is easy. For example, the following query returns results restricted to the German language and to Germany. The country of a result is based on its top-level domain and IP address.

        GoogleSearchResult germanSearch = google.doGoogleSearch(
            "x0x0", "Geschichte", 0, 10, true, "countryDE", true, "lang_de",
            "", "");

Google also provides a few topics that searches can be restricted to: American government (“unclesam”), Linux (“linux”), Macintosh (“mac”) and FreeBSD (“bsd”). Here, we restrict results to American government:

        GoogleSearchResult govSearch = google.doGoogleSearch(
            "x0x0", "John Paul Jones", 0, 10, true, "unclesam", true, "",
            "", "");

Google provides a full list of country, topic and language restrictions on the SOAP Search API website:

http://www.google.com/apis/reference.html#2_4

The SOAP Search API service also gives developers access to Google’s cache of pages. The process is simple enough; it involves a call to the doGetCachedPage method of the GoogleSearchService class. The method takes two arguments, your license key and the page to retrieve. An array of bytes is returned:

        byte[] page = google.doGetCachedPage("x0x0",
            "http://aspfree.com");

We can easily get the size of the cached page by checking the length of the array, and converting the array into a string isn’t very complicated either. We simply call the GetString method of the encoding object exposed by the static UTF8 property of the System.Text.UTF8Encoding class:

        Console.WriteLine("Page is {0} bytes.", page.Length);
        Console.WriteLine("\n" + UTF8Encoding.UTF8.GetString(page));
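
The decoding step can be checked in isolation. This is a minimal sketch that uses a locally built byte array as a stand-in for the data returned by doGetCachedPage; DecodeDemo and Decode are hypothetical names, and the sketch assumes, as the code above does, that the page is encoded as UTF-8:

```csharp
using System;
using System.Text;

class DecodeDemo
{
    // Converts raw page bytes to text, assuming UTF-8 content.
    public static string Decode(byte[] page)
    {
        return Encoding.UTF8.GetString(page);
    }

    static void Main()
    {
        // Stand-in for the array returned by doGetCachedPage.
        byte[] page = Encoding.UTF8.GetBytes("<html><body>Caf\u00e9</body></html>");
        string text = Decode(page);

        // The byte count and the character count differ whenever the
        // page contains non-ASCII characters.
        Console.WriteLine("Page is {0} bytes.", page.Length);
        Console.WriteLine("Page is {0} characters.", text.Length);
        Console.WriteLine(text);
    }
}
```

If a cached page uses a different encoding, decoding it as UTF-8 will garble non-ASCII characters, so the encoding may need to be chosen per page.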

Spelling Suggestions

Google’s spelling suggestion service, like Google’s cache of pages, is easy to make use of. The doSpellingSuggestion method takes only two arguments. The first is, of course, your license key, and the second is the word you wish to run through Google. If Google has a spelling suggestion, that suggestion is returned as a string, as in this example:

        string spelling = google.doSpellingSuggestion("x0x0",
            "choclate");
        Console.WriteLine("\"choclate\" -> \"{0}\"", spelling);

However, if Google does not have a suggestion, then an empty string is returned, as in this next example:

        string spelling2 = google.doSpellingSuggestion("x0x0",
            "computer");
        Console.WriteLine("\"computer\" -> \"{0}\"", spelling2);
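
In practice an application usually wants to fall back to the original word when no suggestion comes back. A minimal sketch of that check follows. SuggestionDemo and BestSpelling are hypothetical names, the suggestion values are hard-coded stand-ins for doSpellingSuggestion results, and the null check is a defensive assumption on top of the empty-string case described above:

```csharp
using System;

class SuggestionDemo
{
    // Returns the suggestion when there is one, otherwise the original word.
    public static string BestSpelling(string original, string suggestion)
    {
        return string.IsNullOrEmpty(suggestion) ? original : suggestion;
    }

    static void Main()
    {
        // Hard-coded stand-ins for values returned by doSpellingSuggestion.
        Console.WriteLine(BestSpelling("choclate", "chocolate"));
        Console.WriteLine(BestSpelling("computer", ""));
    }
}
```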

So far, all of our calls to the SOAP Search API service have been synchronous. However, the API also provides asynchronous functionality. Instead of waiting for a result to be returned before performing any more work, you can send a message to the service and do something else while you wait for a reply.

The API actually provides two ways to do this. The first method involves subscribing to the appropriate event and then calling a special version of whichever “do-” method you wish to use. Here, we perform an asynchronous Google search using this approach:

using System;
using System.Text.RegularExpressions;
class GoogleAsyncTest
{
    private static Regex stripHtml = new Regex("<(.+?)>");
    static void Main()
    {
        GoogleSearchService google = new GoogleSearchService();
        google.doGoogleSearchCompleted +=
            new doGoogleSearchCompletedEventHandler(OnSearchCompleted);
        google.doGoogleSearchAsync("x0x0", "Cookie Recipe", 0,
            10, true, "", true, "", "", "");
        Console.ReadKey();
    }
    static void OnSearchCompleted(object sender,
        doGoogleSearchCompletedEventArgs e)
    {
        Console.WriteLine("Total results: " +
            e.Result.estimatedTotalResultsCount);
        foreach (ResultElement result in e.Result.resultElements)
        {
            Console.WriteLine("\n" + stripHtml.Replace(result.title, ""));
            Console.WriteLine(stripHtml.Replace(result.snippet, ""));
        }
    }
}

Above, we subscribe the static OnSearchCompleted method to the doGoogleSearchCompleted event. We then call doGoogleSearchAsync and pass all the usual arguments. When the SOAP Search API service completes the search, the OnSearchCompleted method is called with a doGoogleSearchCompletedEventArgs object, whose Result property holds the search results. There, we display the estimated total number of results and then iterate through the ten results we receive.

There is also an alternate version of the doGoogleSearchAsync method that accepts an additional argument, an object. For example, here we pass a string:

    static void Main()
    {
        …
        string state = "This represents the current state.";
        google.doGoogleSearchAsync("x0x0", "Cookie Recipe", 0, 10, true, "", true, "", "", "",
            state);
        …
    }

The object is intended to represent the current state: anything we want to make use of later on in the OnSearchCompleted method. There, we can read it from the UserState property and cast it back to its original type (here, a string):

    static void OnSearchCompleted(object sender,
        doGoogleSearchCompletedEventArgs e)
    {
        …
        string state = (string)e.UserState;
        …
    }

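In fact, the state object argument is mandatory when more than one asynchronous operation is in progress, and each outstanding operation needs its own distinct state object so that the operations can be told apart.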
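
One way to put the state object to work is to use it as a token that identifies which request has just completed. The sketch below is not part of the API; PendingRequests, Register and Complete are hypothetical names, and the class only tracks tokens locally without calling the service:

```csharp
using System;
using System.Collections.Generic;

class PendingRequests
{
    private readonly Dictionary<object, string> pending =
        new Dictionary<object, string>();
    private readonly object gate = new object();

    // Creates a fresh token to pass as the state argument of an "Async" call.
    public object Register(string description)
    {
        object token = Guid.NewGuid();
        lock (gate) { pending[token] = description; }
        return token;
    }

    // Intended to be called from a "Completed" handler with e.UserState.
    // Returns null when the token is unknown or already completed.
    public string Complete(object userState)
    {
        lock (gate)
        {
            string description;
            if (userState != null && pending.TryGetValue(userState, out description))
            {
                pending.Remove(userState);
                return description;
            }
            return null;
        }
    }

    static void Main()
    {
        PendingRequests requests = new PendingRequests();
        object first = requests.Register("search: Cookie Recipe");
        object second = requests.Register("spelling: gogles");

        // Completions may arrive in any order; the token identifies each one.
        Console.WriteLine(requests.Complete(second));
        Console.WriteLine(requests.Complete(first));
    }
}
```

The lock is there because completion handlers may run on a different thread from the one that started the request.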

Cached pages and spelling suggestions are handled in much the same way:

    static void Main()
    {
        google.doGetCachedPageCompleted +=
            new doGetCachedPageCompletedEventHandler(OnGetCachedPageCompleted);
        google.doSpellingSuggestionCompleted +=
            new doSpellingSuggestionCompletedEventHandler(OnSpellingSuggestionCompleted);
        …
        google.doGetCachedPageAsync("x0x0", "http://google.com",
            "State 1");
        google.doSpellingSuggestionAsync("x0x0", "gogles",
            "State 2");
        …
    }
    static void OnGetCachedPageCompleted(object sender,
        doGetCachedPageCompletedEventArgs e)
    {
        byte[] page = e.Result;
        // Do whatever here
    }
    static void OnSpellingSuggestionCompleted(object sender,
        doSpellingSuggestionCompletedEventArgs e)
    {
        string spelling = e.Result;
        // Do whatever here
    }

The second way to use the API asynchronously utilizes callbacks rather than events. For example, to conduct a search, the BegindoGoogleSearch method is called. It takes the usual arguments, in addition to an AsyncCallback object (which points to the method that will be called when the search is complete) and a state object. It returns an object that implements the IAsyncResult interface:

using System;
using System.Text.RegularExpressions;
class GoogleAsyncTest2
{
    private static Regex stripHtml = new Regex("<(.+?)>");
    private static GoogleSearchService google = new GoogleSearchService();
    static void Main()
    {
        IAsyncResult ar = google.BegindoGoogleSearch("x0x0",
            "Cookie Recipe", 0, 10, true, "",
            true, "", "", "", new AsyncCallback(OnSearchCompleted),
            "State");
        Console.ReadKey();
    }
(continued)

In the OnSearchCompleted method, we can retrieve the state object, but before we can see the results of our search, we must call the EnddoGoogleSearch method, which returns a GoogleSearchResult object:

    …
    static void OnSearchCompleted(IAsyncResult ar)
    {
        string state = (string) ar.AsyncState;
        GoogleSearchResult search = google.EnddoGoogleSearch(ar);
        Console.WriteLine("Total results: " +
            search.estimatedTotalResultsCount);
        foreach (ResultElement result in search.resultElements)
        {
            Console.WriteLine("\n" + stripHtml.Replace(result.title, ""));
            Console.WriteLine(stripHtml.Replace(result.snippet, ""));
        }
    }
}

Again, the process for getting cached pages or spelling suggestions is nearly identical:

    static void Main()
    {
        …
        IAsyncResult ar2 = google.BegindoGetCachedPage("x0x0",
            "http://msn.com",
            new AsyncCallback(OnGetCachedPageCompleted), "State 2");
        IAsyncResult ar3 = google.BegindoSpellingSuggestion("x0x0",
            "telephoen",
            new AsyncCallback(OnSpellingSuggestionCompleted), "State 3");
        …
    }
    static void OnGetCachedPageCompleted(IAsyncResult ar)
    {
        byte[] page = google.EnddoGetCachedPage(ar);
        // Decode the bytes before printing; printing the array itself
        // would only show its type name.
        Console.WriteLine(System.Text.Encoding.UTF8.GetString(page));
    }
    static void OnSpellingSuggestionCompleted(IAsyncResult ar)
    {
        string spelling = google.EnddoSpellingSuggestion(ar);
        Console.WriteLine(spelling);
    }

Conclusion

Though still in beta, the Google SOAP Search API appears to be very promising for developers who wish to implement a few of Google’s features into their own applications. Currently, it offers support for Web searches, retrieval of cached pages and spelling suggestions. These three popular services can be easily accessed through the API, either synchronously or asynchronously, to create more complex and informed applications with .NET.
