Wednesday, November 5, 2008

Reading Notes - Week 10

-Search Engines, Part 1, David Hawking: This article was easy to understand and gave a basic overview of what web searching is and how it is done. It starts by explaining that data centers are clusters of computers; the computers have to be clustered together because there is too much information for one computer to search through alone. The programs that gather the information, running across many of these machines, are called crawlers; they crawl the web to collect pages. They can check for blocked pages, duplicated pages, and spam pages (pages that use one or more false keywords to gain more popularity).
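To make the blocked-page and duplicate-page checks concrete for myself, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not the method described in the article: the crawler name `ExampleCrawler`, the sample URLs, and the helper names are all made up, and a real crawler would fetch pages over the network and would need much smarter duplicate and spam detection than an exact-content hash.

```python
import hashlib
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt):
    """Parse the text of a robots.txt file (passed in as a string)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def filter_pages(pages, robots, user_agent="ExampleCrawler"):
    """Return URLs that are allowed by robots.txt and not exact duplicates.

    `pages` is a list of (url, body) pairs standing in for fetched pages.
    """
    seen_digests = set()
    kept = []
    for url, body in pages:
        # Blocked page: the site asked crawlers not to visit this path.
        if not robots.can_fetch(user_agent, url):
            continue
        # Duplicate page: identical content was already seen at another URL.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if digest in seen_digests:
            continue
        seen_digests.add(digest)
        kept.append(url)
    return kept


# Hypothetical sample data, for illustration only.
robots = make_robots_checker("User-agent: *\nDisallow: /private/\n")
pages = [
    ("https://site.example/index.html", "Welcome"),
    ("https://site.example/private/notes.html", "Secret"),
    ("https://site.example/copy.html", "Welcome"),
    ("https://site.example/about.html", "About us"),
]
kept = filter_pages(pages, robots)
```

In this toy run the `/private/` page is skipped because it is blocked, and `copy.html` is skipped because its content matches `index.html`.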
-Part 2: Part two picks up where part one leaves off. It explains that the number of documents and words to search is enormous, so the work is divided up so that more than one part can be searched at once. There are more searchable terms in existence than words in the English language, because people search for words in many languages, including made-up words and acronyms as well as real ones. Phrases can be searched for, but they are often subdivided so that the results come up faster. Web searching tools can also rate pages by the number of links that lead to them; pages with many links pointing to them are treated as more popular.
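The idea of rating pages by incoming links can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming a tiny made-up link graph (`toy_web`) and hypothetical function names; real search engines combine link information with many other signals, so this should not be read as how any actual engine ranks pages.

```python
from collections import Counter


def count_inbound_links(link_graph):
    """Count, for each page, how many other pages link to it.

    `link_graph` maps a page to the list of pages it links out to.
    Repeated links from the same page and self-links are counted once or not at all.
    """
    inbound = Counter()
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    return inbound


def rank_by_inbound_links(link_graph):
    """Order pages from most to fewest inbound links (ties broken by name)."""
    inbound = count_inbound_links(link_graph)
    pages = set(link_graph) | set(inbound)
    return sorted(pages, key=lambda page: (-inbound[page], page))


# Hypothetical link graph, for illustration only.
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "c.example"],
}
ranking = rank_by_inbound_links(toy_web)
```

Here `c.example` comes first because three different pages link to it, while `d.example` comes last because nothing links to it.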
-Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A., Current Developments:
This article is about the protocol created by the Open Archives Initiative (OAI) for harvesting metadata. The initiative was started two years before this article was written, and the article reviews how metadata collection has improved and progressed since then. It also comments on future work the initiative would like to complete to advance the harvesting it is doing now.
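As a rough picture of what harvesting looks like, the OAI protocol (OAI-PMH) works through ordinary web requests that carry a `verb` parameter such as `ListRecords` and a `metadataPrefix` such as `oai_dc` (Dublin Core). The Python sketch below only builds such a request URL; the repository address and the function name are my own placeholders, and it does not contact any server or handle the responses a real harvester would have to parse.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `base_url` is the repository's OAI endpoint (a placeholder in this sketch).
    `from_date` and `set_spec` are optional and narrow what gets harvested.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date  # only records changed since this date
    if set_spec is not None:
        params["set"] = set_spec    # only records in this named set
    return base_url + "?" + urlencode(params)


# Hypothetical repository endpoint, for illustration only.
url = build_list_records_url("https://repository.example.org/oai",
                             from_date="2008-01-01")
```

A harvester would request this URL, read the XML it gets back, and repeat the request for later batches until the repository reports that no more records remain.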
-Bergman: This article by Bergman was about the deep web. I found it interesting to read because it explained that web searching tools often search only the surface of the web. According to the article, the deep web holds about 500 times as much information as normal web searching usually brings up. The deep web has information that might be useful and important, but it is not often seen. The article also made an effort to break down what types of information (such as news) are being lost in the deep web. It introduced me to something I did not know before: most search engines do not search the deep web.
