The History of Search – Part 1
This is the first part of a three-part series looking back at the History of Search, how it’s changed and the differences between the main search engines today. Whether you’re a student, working in the Search industry or just want to learn how search engines have evolved over the years, you should be able to take something from this.
History of Search Engines – The early days
The World Wide Web
Search engines would not exist without the World Wide Web and its websites. The first website was created by Tim Berners-Lee, a physicist, who wrote a proposal for information management describing how related information could be transferred. His idea was to transfer information over the internet using hypertext (text on a computer that leads the user to other, related information on demand). A year later Robert Cailliau, a systems engineer, joined forces with Berners-Lee to help achieve the goal of connecting personal computers within a network to share information. The web was originally designed to help physicists answer tough questions about the Universe; today it is widely used by the global community for both business and pleasure.
For this to succeed, a universal piece of software needed to be created, as the computers at Berners-Lee’s disposal were more advanced than the everyday computer. 1991 saw the testing of a “universal line mode browser” that would be able to run on any computer or terminal. It required no mouse and no graphics, only plain text, which gave anyone with an internet connection access to the information on the web.
These days the term “search engine” usually refers to searching the World Wide Web; however, before the web became the most visible part of the Internet, there were already search engines in place to help people find information on the net. Today there are upwards of 80 million websites, with many more computers connected to the Internet, and hundreds of millions of users. If households nowadays want a computer, it is not to compute, but to go on the Web.

The historic NeXT computer used by Tim Berners-Lee in 1990, on display in the Microcosm exhibition at CERN. It was the first web server, hypermedia browser and web editor.
Early Search Engines
Early search engines held an index of a few hundred thousand pages and documents, and received perhaps 1,000 to 2,000 enquiries per day. Today, a top search engine will index hundreds of millions of pages, and respond to tens of millions of queries per day.
There are three basic types of search engine: crawler-based, human-powered, and a combination of both. Search engines connect their users with relevant results when they type in a search query. This is where the web’s first robot-based engine came into play (robots are also called spiders or web crawlers because they ‘crawl’ over the web). In June 1993 Matthew Gray introduced the World Wide Web Wanderer. The spider was created initially to measure the growth of the web and to count active web servers. He soon upgraded the spider to capture actual URLs. This database became known as the Wandex. The Wanderer was as much of a problem as it was a solution because it caused system lag by accessing the same page hundreds of times a day. It did not take long for him to fix the software, but people had started to question its value.
The first engine to rely on META-style page descriptions was called ALIWEB. It allowed users to submit the pages they wanted indexed along with their own page descriptions, which meant it needed no spider to collect data and did not use excessive bandwidth. The downside of ALIWEB was that many people did not know how to submit their site. Another problem in this period was that the engines would only update themselves every six months or so. Before pay-per-click (PPC) advertising was introduced, the only way of getting your site noticed was to ensure that it was correctly designed and would be picked up by a spider.
Search engine optimisation first came along in the mid-1990s when the first search engines began cataloguing the contents of the Internet. Initially the entire procedure was fairly honest and gave a fair reflection of what content there was on the web. Sites were submitted to the search engines, where a spider crawled the content and then stored the collected data in a database that could be accessed by individuals performing a search (please see diagram below).

When a search engine spider detects new content on the Internet, it downloads the page and stores it on the engine’s own server. Once the page is on the server, a second program (known as an indexer) extracts information about the page as well as all of the links it contains. The linked pages are then added to a repository of pages to be crawled at a later date. When Google’s early system was running at peak performance using four spiders, it could crawl over 100 pages per second, generating around 600 kilobytes of data each second, or roughly 6 kilobytes per page. When the Google spider looked at an HTML page, it took note of two things: the words within the page and where the words were found.
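To make the download, store, index and queue steps above a little more concrete, here is a minimal sketch in Python. It is not how any real engine was implemented: the names (crawl, repository, FAKE_WEB) are made up for illustration, and a small in-memory dictionary stands in for fetching real pages over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkAndTextParser(HTMLParser):
    """Collects link targets and text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(fetch, seed_urls, max_pages=100):
    """Download each page, store it, index its words, queue its links."""
    repository = {}  # url -> raw HTML (the engine's own stored copy)
    index = {}       # url -> list of lowercase words found on the page
    queue = deque(seed_urls)
    seen = set(seed_urls)  # ensures no page is queued more than once

    while queue and len(repository) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched
            continue
        repository[url] = html

        # The "indexer" step: pull out the words and the outgoing links.
        parser = LinkAndTextParser()
        parser.feed(html)
        index[url] = " ".join(parser.text_parts).lower().split()

        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)

    return repository, index


# Hypothetical three-page "web" used instead of real HTTP requests.
FAKE_WEB = {
    "a.html": '<title>A</title><a href="b.html">to B</a>',
    "b.html": '<a href="a.html">back</a> <a href="c.html">to C</a>',
    "c.html": "<p>Dead end</p>",
}

repository, index = crawl(FAKE_WEB.get, ["a.html"])
print(sorted(repository))  # ['a.html', 'b.html', 'c.html']
```

The seen set is one simple way to stop a spider fetching the same page over and over within a single crawl.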
Words occurring in the title, subtitles, META tags and other positions of relative importance were noted for special consideration during a subsequent user search. The Google spider was built to index every significant word on a page, leaving out the articles “a,” “an” and “the.” Other spiders take different approaches. Some spiders keep track of the words in the title, sub-headings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to have used this approach to “spidering” the Web. Other systems, such as AltaVista, go in the other direction, indexing every single word on a page, including “a,” “an,” “the” and other “insignificant” words. The push to completeness in this approach is matched by other systems in the attention given to the unseen portion of the Web page, the META tags.
META tags allow the owner of a page to specify keywords and concepts under which the page will be indexed. This can be helpful, especially in cases in which the words on the page might have double or triple meanings: the META tags can guide the search engine in choosing which of the several possible meanings for these words is correct. There is, however, a danger in over-reliance on them, because a careless or unscrupulous page owner might add META tags that fit very popular topics but have nothing to do with the actual contents of the page (an example of this is included within the marketing section below). To protect against this, spiders will correlate META tags with page content, rejecting the META tags that don’t match the words on the page.
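The two indexing styles described above can be sketched roughly as follows. This is an illustration only, not the actual code of Google or Lycos: the function names are invented, only a title and a body are considered, and the frequency-based version is simplified (it ignores sub-headings, links and the first 20 lines of text).

```python
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the"}  # the articles left out of the index


def tokenize(text):
    """Lowercase the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_significant_words(title, body):
    """Record every word except the articles, noting where it was found."""
    postings = {}
    for location, text in (("title", title), ("body", body)):
        for position, word in enumerate(tokenize(text)):
            if word in STOP_WORDS:
                continue
            postings.setdefault(word, []).append((location, position))
    return postings


def index_frequent_words(title, body, top_n=100):
    """Keep the title words plus the top_n most frequent body words."""
    counts = Counter(tokenize(body))
    frequent = {word for word, _ in counts.most_common(top_n)}
    return set(tokenize(title)) | frequent


postings = index_significant_words(
    "The History of Search",
    "A spider crawls the web. The spider indexes a page.",
)
print(postings["spider"])   # [('body', 1), ('body', 6)]
print(postings["history"])  # [('title', 1)]
```

Storing the location next to each word is what lets a later search give extra weight to words found in the title.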
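As a rough illustration of correlating META tags with page content, the sketch below accepts a keyword only if every word in it also appears in the visible text of the page. The function name and the matching rule are assumptions made for this example; real engines used their own, more sophisticated checks.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_meta_keywords(meta_keywords, page_text):
    """Split META keywords into those the page text supports and those it does not."""
    page_words = tokenize(page_text)
    accepted, rejected = [], []
    for keyword in meta_keywords:
        keyword_words = tokenize(keyword)
        # A keyword phrase is supported only if all of its words are on the page.
        if keyword_words and keyword_words <= page_words:
            accepted.append(keyword)
        else:
            rejected.append(keyword)
    return accepted, rejected


accepted, rejected = filter_meta_keywords(
    ["search engine", "spider", "free holidays"],
    "How a search engine spider crawls and indexes the web.",
)
print(accepted)  # ['search engine', 'spider']
print(rejected)  # ['free holidays']
```

Here the popular but unrelated keyword “free holidays” is rejected because neither word appears anywhere on the page.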

An example of how META tags get picked up naturally by Search Engines
That’s it for part one; I hope it’s been of interest to you. In part two we’ll look at how companies went about marketing online in the early days and how the landscape has changed through the years.