Multilingual Sites and Search Engines: Part II - Non-Character Languages and Cross-Language Searching
What About Right-to-left and Non-character Languages?
I believe that right-to-left languages and non-character based (that is, non-alphabetic) languages like Chinese are treated in a similar way as far as text processing and NLP (natural language processing) are concerned. The text processing stage probably differs slightly for non-alphabetic languages, but since they also have repeating patterns that carry meaning, the principle remains the same. In practice, when a search is performed, search engines first match the query string to find suitable results and do not consider language details unless a search language is specified.
On the other hand, for languages where vowels are generally omitted in writing and where several different words are therefore written the same way (like Hebrew and Arabic), simple matching of symbols tends to generate less relevant results. This in turn requires NLP to place the words in the right context. In addition, the morphological and grammatical structure of these languages tends to be more complicated than that of English, which also affects the way search engines handle them.
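To make the ambiguity concrete, here is a minimal sketch in Python. It uses English words with the vowels stripped as a rough analogy for vowel-less writing; it is not how Hebrew or Arabic are actually processed. The names `skeleton` and `build_index` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict

# Assumption for illustration: dropping English vowels stands in for
# scripts in which vowels are usually not written.
VOWELS = set("aeiou")

def skeleton(word):
    """Return the word with its vowels removed (its written form in this analogy)."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)

def build_index(words):
    """Group words by written form; one form may stand for several words."""
    index = defaultdict(list)
    for w in words:
        index[skeleton(w)].append(w)
    return dict(index)

index = build_index(["bat", "bit", "boat", "cat", "cut"])
# index["bt"] holds "bat", "bit" and "boat": a plain string match on "bt"
# cannot tell them apart, so context is needed to pick the intended word.
```

A match on the written form alone returns every word that shares it, which is why context-aware processing is needed to rank such results well.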
Mixed-Language Search Query and Cross-language Searching
It is interesting to examine what happens when the search query contains words and expressions in several languages. Since simple text processing is the first stage in retrieving results, search engines will find where these strings occur regardless of language. Even if you ask for results in a particular language only, this will not exclude sites that contain both the selected language and others. What seems to matter most in this case is that the sentence or paragraph where the search string is found is in the specified language, not that this is the only language on the page. What also seems to matter (and this is not surprising) is the order in which the search terms are arranged.
Mixed-language search queries are an interesting issue, but they are different from cross-language search. Cross-language search is a technology that lets you enter a search query in one language and get results from pages where this query occurs in other languages. In other words, the search query gets translated, which makes it possible to retrieve significantly more results. Currently, to the best of my knowledge, cross-language search is used in libraries and similar places where there is enough information, and it is not supported by search engines like Google and MSN. There are toolbars and browser add-ons that let you use cross-language search engines on the Web, or at least on a particular site, but they are still not the standard.
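The behavior described above can be sketched as a toy matcher in Python. This is only an illustration of the observation, not how any real search engine is implemented. The page names, the `Paragraph` type, and the `search` function are all hypothetical, and the language of each paragraph is assumed to be known in advance.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    lang: str   # language code, assumed to be already detected
    text: str

def search(pages, query, lang=None):
    """Return pages with a paragraph containing the query string.

    If lang is given, the matching paragraph itself must be in that
    language; other languages elsewhere on the page are ignored.
    """
    needle = query.casefold()
    hits = []
    for url, paragraphs in pages.items():
        for p in paragraphs:
            if needle in p.text.casefold() and (lang is None or p.lang == lang):
                hits.append(url)
                break
    return hits

# Hypothetical sample pages.
PAGES = {
    "mixed.example": [
        Paragraph("en", "Cheap flights to Sofia"),
        Paragraph("bg", "Евтини полети до София"),
    ],
    "bg-only.example": [Paragraph("bg", "Полети до София и Варна")],
    "en-only.example": [Paragraph("en", "Hotels in Varna")],
}

english_hits = search(PAGES, "flights", lang="en")
bulgarian_hits = search(PAGES, "София", lang="bg")
```

In this sketch the mixed-language page is returned for both the English and the Bulgarian query, because the filter looks at the paragraph that matched rather than at the page as a whole.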
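As a rough illustration of why translating the query retrieves more results, here is a minimal Python sketch. The tiny `TRANSLATIONS` table, the sample documents, and the function names are assumptions made for this example; a real cross-language system would rely on a proper translation component rather than a hand-written dictionary.

```python
# Hypothetical translation table: one query term and its equivalents.
TRANSLATIONS = {
    "library": {"de": "bibliothek", "fr": "bibliothèque"},
}

def expand_query(term):
    """Return the term together with its known translations."""
    key = term.casefold()
    variants = {key}
    variants.update(TRANSLATIONS.get(key, {}).values())
    return variants

def plain_search(docs, term):
    """Match only the query string as typed."""
    needle = term.casefold()
    return [doc_id for doc_id, text in docs.items() if needle in text.casefold()]

def cross_language_search(docs, term):
    """Match the query string or any of its translations."""
    variants = expand_query(term)
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(v in text.casefold() for v in variants)
    ]

# Hypothetical sample documents keyed by language.
DOCS = {
    "en": "The city library opens at nine.",
    "de": "Die Bibliothek öffnet um neun Uhr.",
    "fr": "La bibliothèque ouvre à neuf heures.",
    "es": "El museo abre a las nueve.",
}

plain = plain_search(DOCS, "library")
expanded = cross_language_search(DOCS, "library")
```

Here the plain search finds only the English document, while the expanded search also finds the German and French ones.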
More By Tsvetanka Stoyanova