Webcrawler

Updated May 14, 2018

Webcrawler was the Internet's first search engine that performed keyword searches in both the names and texts of pages on the World Wide Web. It won quick popularity and loyalty among surfers looking for information. Despite the fact that competitors like Yahoo!, AltaVista, Lycos, HotBot, Northern Light, and Infoseek have long overtaken Webcrawler in popularity and power, its name remains synonymous with search engines, and some, like Metacrawler, still pay homage to this pioneer.

Webcrawler was born in January 1994, during the Web's infancy. It was developed by Brian Pinkerton, a computer science student at the University of Washington, to cope with the complexity of the Web. Finding one's way among the first tens of thousands of individual Web pages, soon to be millions, was comparable to trying to find a book in a major library that had no indexing system or card catalog. Pinkerton's application, Webcrawler, could automatically scan individual sites on the Web, register their content, and create an index that surfers could query with keywords to find Web sites relevant to their interests.

The basic function of the program was something Pinkerton called its search engine. The search engine looked at a particular document somewhere on the Web, and used the various hyperlinks it found to lead it to other pages with similar content. It followed some of the links to new documents and repeated the process over and over. The search engine determined what general type of document it would visit as well as which individual documents it would visit. Pinkerton described Webcrawler as a "Web robot" that used the structure of the Internet itself to find documents on the Internet. In other words, it required no special, additional software; it searched for Web documents the same way a human user would—by following hyperlinks from one document to another until it found what it was looking for. Webcrawler just did it much faster. Pinkerton estimated, in the mid-1990s, that Webcrawler required about an eighth of a second to parse the query, contact the database server, perform the query, get the answer, format the results and return them to the query-maker.
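The link-following behavior described above can be illustrated with a minimal sketch. It is not Pinkerton's code: the toy web (`TOY_WEB`), the `crawl` function, and the `limit` parameter are all hypothetical, and the sketch follows every link in breadth-first order, whereas Webcrawler followed only some of the links it found.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the hyperlinks found on that page.
TOY_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
    "http://c.example/": [],
}

def crawl(start, fetch_links, limit=100):
    """Visit pages by following hyperlinks outward from `start`."""
    seen = {start}              # URLs already queued, so none is visited twice
    frontier = deque([start])   # URLs waiting to be visited
    visited = []
    while frontier and len(visited) < limit:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Assumption: a dictionary lookup stands in for fetching a page over the network.
order = crawl("http://a.example/", lambda url: TOY_WEB.get(url, []))
print(order)
```

Starting from a single page, the sketch reaches every page connected to it by links, which is the sense in which a robot uses the structure of the Web itself to find documents.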

Webcrawler's search engine performed two basic functions. First, it compiled an ongoing index of web addresses (URLs). Webcrawler retrieved and marked a document, analyzed the content of both its title and its full text, registered the relevant links it contained, and then stored the information in its database. When a user submitted a query in the form of one or more keywords, Webcrawler compared it with the information in its index and reported back any matches.
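The indexing and query steps can be sketched as a small inverted index, a structure that maps each keyword to the documents containing it. This is an illustration under stated assumptions, not a description of Webcrawler's actual database: the `Index` class, the `tokenize` helper, and the rule that a document must contain every keyword in the query are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

class Index:
    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> set of URLs containing it

    def add(self, url, title, body):
        # Both the title and the full text are indexed.
        for word in tokenize(title) + tokenize(body):
            self.postings[word].add(url)

    def query(self, keywords):
        # Assumption: a match must contain every keyword in the query.
        words = tokenize(keywords)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)

index = Index()
index.add("http://a.example/", "Nuclear research", "Design notes on reactors")
index.add("http://b.example/", "Gardening", "Design a garden")

print(index.query("design"))
print(index.query("nuclear design"))
```

Because the index is built ahead of time, answering a query is a lookup rather than a fresh scan of the Web.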

Faced with memory that could only store a fraction of the Web's total content, Pinkerton had to devise a criterion for which documents should be included in the index. The one he settled on was that the index should include documents from as many servers as possible. Webcrawler was written to ensure that new servers were visited before older ones were revisited. The protocol also guaranteed that the index would include at least one document from every server visited. That particular strategy gained Webcrawler an early reputation for broad coverage of the Web. During its first year, Webcrawler's index included about 50,000 documents—a number that would grow into the millions over the next few years—from some 9,000 servers. The index was updated every week.
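The rule that new servers are visited before older ones are revisited can be sketched as a scheduling policy over the list of pending URLs. The function names (`server_of`, `next_url`, `schedule`) and the example URLs are hypothetical, and the sketch shows only the ordering idea, not how Webcrawler actually implemented it.

```python
from urllib.parse import urlsplit

def server_of(url):
    """Return the host part of a URL, used here to identify its server."""
    return urlsplit(url).netloc

def next_url(frontier, visited_servers):
    """Pick a URL on a not-yet-visited server if one exists, else the oldest URL."""
    for i, url in enumerate(frontier):
        if server_of(url) not in visited_servers:
            return frontier.pop(i)
    return frontier.pop(0)

def schedule(urls):
    """Order URLs so that every server is reached once before any is revisited."""
    frontier = list(urls)
    visited_servers = set()
    order = []
    while frontier:
        url = next_url(frontier, visited_servers)
        visited_servers.add(server_of(url))
        order.append(url)
    return order

pending = [
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
    "http://c.example/1",
    "http://b.example/2",
]
print(schedule(pending))
```

With this ordering, an index of limited size fills with one document per server before it takes second documents from any server, which favors breadth of coverage.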

Webcrawler's second function was searching the Internet in real time for sites that matched a given query. It was carried out using exactly the same process, following links from one page to another. However, it first searched its index for the criteria in the query and from there looked for new pages. Surfers themselves were not able to use the real-time function directly, but Webcrawler performed real-time searches, added the results to the index, and returned them to the individual making the query.
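A rough sketch of this real-time search: start from the pages in the index that already match the query, follow their links, and add any newly found matches to the index. The toy pages (`PAGES`), the `matches` and `realtime_search` functions, and the `max_pages` limit are hypothetical, and the whole-word matching rule is an assumption.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, outgoing links).
PAGES = {
    "http://a.example/": ("intro to search engines", ["http://a.example/robots"]),
    "http://a.example/robots": ("web robots and search", ["http://b.example/"]),
    "http://b.example/": ("cooking recipes", []),
}

def matches(text, keywords):
    """True if every keyword appears as a whole word in the text."""
    words = set(text.lower().split())
    return all(k.lower() in words for k in keywords)

def realtime_search(keywords, index, pages, max_pages=10):
    """Seed from indexed hits, then follow links looking for unindexed matches."""
    seeds = [url for url, text in index.items() if matches(text, keywords)]
    seen = set(seeds)
    frontier = deque(seeds)
    fetched = 0
    while frontier and fetched < max_pages:
        url = frontier.popleft()
        text, links = pages[url]
        fetched += 1
        if url not in index and matches(text, keywords):
            index[url] = text  # a newly found match joins the index
        for link in links:
            if link in pages and link not in seen:
                seen.add(link)
                frontier.append(link)
    return sorted(url for url, text in index.items() if matches(text, keywords))

# Assumption: the index starts out knowing only one of the three pages.
index = {"http://a.example/": "intro to search engines"}
results = realtime_search(["search"], index, PAGES)
print(results)
```

The query returns both the page that was already indexed and the matching page discovered by following its link, and the index is larger afterward.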

For its first three months, Webcrawler was a desktop application rather than a Web-based service. It went live on the Internet on April 20, 1994, with pages indexed from about 6,000 servers. Its popularity grew rapidly. By October 1994, it was receiving 15,000 queries a day and had answered almost a quarter of a million in all. Only a month later, around the time Yahoo! was launching its search site, Webcrawler handled its millionth query, a search for "nuclear weapons design and research."

By the end of 1994, the service had found two corporate sponsors and was starting to lure advertisers. In June 1995, it was acquired for $1 million by America Online, then just an upstart itself with fewer than one million subscribers. In April 1996, when Webcrawler incorporated the human-edited GNN Select guide, its index contained 500,000 entries. The service was processing about 3 million queries daily.

In November 1996, AOL sold Webcrawler to Excite, another online search engine, for stock valued at about $19.8 million. The large purchase price was seen as evidence that search engines had been accepted as successful platforms for Web advertising. Why Excite, a search engine in its own right (now defunct), would purchase Webcrawler, however, was not immediately apparent, and was even more puzzling in light of its purchase of the then-popular Magellan search engine only four months earlier. Webcrawler maintained its own staff within Excite until mid-1997; eventually the separate Webcrawler index was done away with as well. After that, Webcrawler continued to exist essentially as a separate brand-name Web site, although its results were exactly the same as Excite's. Over the years, the Webcrawler Web site underwent various changes. In mid-2001, however, it eliminated all extraneous features (family pages, telephone directories, maps, etc.) in favor of a spare, clean, "pure" search site, in keeping with Webcrawler's slogan "It's that simple."