A History of Search Engines

The mystery of search engine algorithms begins to unravel when one studies their history. From 1994 through 2003, brilliant software engineers combined mathematics, English, and organizational skills in an effort to help users find what they needed on the Internet.

We’ve Come a Loooong Way

Do you remember your first home computer? In the early 1980s, the TI-99 and Commodore 64 personal computers were all the rage. You had to type in your own programs using the BASIC language, and save the programs and data on a cassette using a cable connected to a cassette recorder. If the power happened to go out five hours into the creation of your program (or if your young daughter decided she wanted to play on your computer), you would lose it all.

Then, as a personal home computer user, you may have upgraded to an HP PC. By 1994, an HP 486 could save programs and data right on the computer and print out a copy on its own black and white printer. Amazing! And, to make it even more exciting, this computer could actually go on the Internet! The Internet: it was the new frontier, where no man had gone before!

But once online, it was like being in virtual outer space! New users had to find out how to navigate this new and mysterious World Wide Web. Quite often, they were limited to sites they already knew, and would get lost in the virtual universe. Thankfully, search engines, invented by the best and the brightest computer programming engineers, were developed to help us find what we were looking for. Search engines gave us the freedom to search the Net!

» Early Search Engines were Maintained Manually

Some early search engines were created and maintained manually, by humans. This made them very expensive, slow to change and somewhat subjective. The number of sites they could index was limited, so they had to pick and choose the sites they thought would be important or popular. And, with the phenomenal rate of growth experienced by the World Wide Web, they couldn’t possibly keep up.

The Archie Legacy

» The First Search Engine was Developed for FTP Sites

The first search engine invented was “Archie”, created in 1990 by Alan Emtage, a brilliant student at McGill University in Montreal. The name comes from “archive” with the “v” dropped. Archie predated the World Wide Web and was a search engine for FTP sites.

» Mathematics and English Unite! Statistical Analysis of Word Relationships

In February 1993, six Stanford students had the idea of using statistical analysis of word relationships to make searching more efficient. They soon received funding for their project, Architext, and in mid-1993 they released copies of their search software. This software became the basis of Excite™.

» Counting Web Servers Led to Capturing URLs and Indexing

In June 1993, Matthew Gray introduced the World Wide Web Wanderer. At first, he wanted to measure the growth of the web, and created the WWWW bot to count active web servers. It was just like a robot sent out to collect data on the moon! Later, he upgraded the bot to capture actual URLs. His database became known as the Wandex, a welcome new inhabitant of planet Earth, online.

In October of 1993, Martijn Koster created Archie-Like Indexing of the Web, or ALIWEB, in response to the Wanderer. ALIWEB indexed meta information and allowed users to submit the pages they wanted indexed, along with their own page descriptions. Because ALIWEB didn’t need a bot to collect data, it did not use a lot of bandwidth. The downside of ALIWEB was that many webmasters, including those working in Internet marketing and advertising, had no idea how to submit their sites.

Creating History with the Bot Boom

By December of 1993, three bot-fed search engines had arrived on the web: JumpStation, the World Wide Web Worm, and the Repository-Based Software Engineering (RBSE) Spider. JumpStation gathered information about the title and header from web pages and retrieved these by using a simple linear search. As the web grew, JumpStation could not keep up, and eventually slowed to a stop. The World Wide Web Worm was an improvement because it indexed both titles and URLs. Unfortunately, JumpStation and the World Wide Web Worm listed results in the order they found them. Since early search engines did not do link analysis or cache full pages, if you did not know the exact name of the site you were looking for, you might never find it.
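To see why a simple linear search could not keep up, consider this minimal Python sketch. It is not JumpStation's actual code: the Page record, the sample URLs, and the linear_search function are hypothetical names invented for illustration. Every query scans every record, and hits come back in the order the bot found them, with no ranking at all.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical index record holding only a title and headers."""
    url: str
    title: str
    headers: list[str]


def linear_search(index: list[Page], query: str) -> list[str]:
    """Scan every record and return the URLs whose title or headers contain the query."""
    needle = query.lower()
    hits = []
    for page in index:  # one pass over the whole index for every query
        fields = [page.title, *page.headers]
        if any(needle in field.lower() for field in fields):
            hits.append(page.url)  # kept in discovery order, not ranked
    return hits


# Made-up sample data, listed in the order a bot might have found it.
index = [
    Page("http://example.org/a", "Welcome", ["Mars mission notes"]),
    Page("http://example.org/b", "Venus facts", ["Atmosphere"]),
    Page("http://example.org/c", "Mars Exploration", ["Rovers"]),
]

print(linear_search(index, "mars"))
```

In this sketch the page that merely mentions Mars in a header is listed before the page that is actually titled "Mars Exploration", simply because the bot found it first. The work per query also grows in step with the size of the index.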

» Yahoo!™ Celebrated its 10-year Anniversary in March, 2005

In 1994, Yahoo! was created by Jerry Yang and David Filo. It started as a listing of their favorite websites, but each entry also had a page description in addition to the URL. With companies eager to invest in Internet opportunities, Yang and Filo received funding within a year, and Yahoo!, the corporation, was created. Yahoo! celebrated its 10-year anniversary in March 2005. Congratulations, Yahoo!

» WebCrawler—First to Index Entire Pages

On April 20, 1994, Brian Pinkerton of the University of Washington released the first crawler to index entire pages, called WebCrawler. Soon it became so popular that it could only be used at night, when fewer people were trying to use it. An appropriate nickname would have been NIGHT CRAWLER. WebCrawler was eventually bought by AOL™.

» Monstrous Lycos™ added Prefix Matching and Word Proximity to Search Results

On July 20, 1994, a monstrous new search engine named Lycos went public with a catalog of 54,000 documents. Besides providing results ranked by relevance, Lycos’ algorithm also used prefix matching and word proximity (very important in organic search engine optimization even now) to decide search engine placement. The main advantage that Lycos had was its huge size. By August of 1994, Lycos had indexed 394,000 documents; by January 1995, it had 1.5 million documents; and by November 1996, Lycos had indexed over 60 million documents. If these documents were printed out and stacked on top of one another, they would reach… well, you get the idea. This was more pages than any other web search engine had indexed up to that time.
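Prefix matching and word proximity are easier to picture with a small example. The Python sketch below is only an illustration of the two ideas, not Lycos’ real algorithm, which is not described here. The function names and the scoring formula (the number of adjacent term pairs divided by the total gap between their closest matches) are assumptions made for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_positions(tokens: list[str], term: str) -> list[int]:
    """Prefix matching: positions of every word that starts with the term."""
    return [i for i, word in enumerate(tokens) if word.startswith(term)]


def proximity_score(text: str, query: str) -> float:
    """Return 1.0 when matched words sit side by side, less as they spread out.

    Returns 0.0 if any query term has no prefix match in the text.
    """
    tokens = tokenize(text)
    terms = tokenize(query)
    if not terms:
        return 0.0
    positions = [prefix_positions(tokens, term) for term in terms]
    if any(not found for found in positions):
        return 0.0
    if len(terms) == 1:
        return 1.0
    total_gap = 0
    for left, right in zip(positions, positions[1:]):
        # Smallest distance between two different words matching adjacent terms.
        gaps = [abs(i - j) for i in left for j in right if i != j]
        if not gaps:
            return 0.0
        total_gap += min(gaps)
    return (len(terms) - 1) / total_gap


# Made-up sample documents; "eng" prefix-matches both "engine" and "engines".
for doc in [
    "Search engine history",
    "Engines were slow before search arrived",
    "Venus facts",
]:
    print(doc, "->", proximity_score(doc, "search eng"))
```

Here the query term "eng" matches both "engine" and "engines", and the document where the matched words are neighbors scores higher than the one where they are four words apart. A document missing either term scores zero.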

» Infoseek™ Offered Web Site Search Engine Submission

Infoseek, also started in 1994, offered a few extra features, such as allowing Internet marketing and advertising specialists and other webmasters to submit their own pages to the search index. In December 1995, Netscape agreed to use Infoseek as its default search engine, which instantly increased Infoseek’s popularity. Unfortunately, spammers soon took advantage of Infoseek’s generosity, and made it virtually impossible for searchers to get useful results.

» AltaVista™ Claimed Unlimited Bandwidth and Taught Search Tips

Next came AltaVista, with nearly unlimited bandwidth at the time. It was the first to have advanced searching techniques, even offering useful search tips to teach new search engine techniques and help users get better results. By November 1997, AltaVista claimed that it handled roughly 20 million queries per day.

On May 20, 1996, the Inktomi Corporation launched its HotBot search engine. HotWired listed this search engine, and it too became very popular. Yahoo! purchased Inktomi in 2003.

Yes, the Bot Boom resulted in many new automated search engines capable of handling the large number of sites and searches. Although they were an improvement, many search engine queries still returned lists of unrelated information, or “junk” results. A search for Mars could have returned sites on Venus!

Back to Natural

» Ask Jeeves™ Used Human Editors

In April of 1997, Ask Jeeves was launched as a natural language search engine. Users were invited to use a complete question to find more relevant answers. Ask Jeeves, using DirectHit technology, used human editors to try to match search queries. You could Ask Jeeves, “Does Mars Have Any Water?” if you wanted to know.

» Teoma™ Developed Clustering Techniques to Determine Site Popularity

In 2000, the Teoma search engine was released. Teoma used “clustering” to organize sites by Subject-Specific Popularity: it tried to find local web communities in order to find relevant sites. In 2001, Ask Jeeves swallowed up Teoma to replace its DirectHit search technology. In 2006, Ask Jeeves was renamed Ask, and Teoma was digested.

The History of Google™

» BackRub Analyzed Links Pointing to a Site

By January of 1996, Larry Page and Sergey Brin, both brilliant graduate students at Stanford University, had begun to work together on a search engine called BackRub. The name did not refer to what these explorers needed and deserved after the many hours they spent hunched over their keyboards; BackRub was actually named for its ability to analyze the “back links” pointing to a given website. BackRub ranked pages using “citation notation”.

In academic works, if someone cites a source, they usually think it’s important. On the web, citations are often links to a resource page. In BackRub’s PageRank algorithm, all links counted as votes, but some votes counted more than others. The ability of a given web page to rank, and the strength of its ability to vote for others, depended on how many people linked to it and how highly rated those links were (the ‘new’ term for this is Social Search).
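The idea that links are votes, and that votes from highly ranked pages count more, can be sketched in a few lines of Python. This is a simplified illustration and not the code BackRub or Google ran. The damping factor of 0.85, the fixed iteration count, the even spreading of rank from pages with no outgoing links, and the tiny four-page web are all assumptions chosen for the example.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,   # assumed value, not taken from the article
    iterations: int = 50,    # assumed fixed number of passes
) -> dict[str, float]:
    """Repeatedly pass each page's rank along its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page with no outgoing links: spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a, b and d all link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, page c collects the most votes. Page a has only one inbound link, but that link comes from the highly ranked c, so a ends up well ahead of b and d, which nobody links to.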

» Google = Googol = 10^100

In 1998, Google was launched. Google’s lofty and altruistic mission is to organize the world’s information and make it universally accessible and useful. “We chose our system name, Google, because it is a common spelling of googol, or 10^100 and fits well with our goal of building very large-scale search engines.” (Brin and Page)

» Continual Search Algorithm Enhancements: Advanced Filters, Crawling Patterns, Link Analysis

Like other search engines, Google continues to improve. On November 15, 2003, it introduced hordes of new elements into its searches. Since then, it has come out with more advanced filters and crawling patterns to help make quality links count more. SEO consultants who offer search engine submission and web site optimization services work to boost site visibility under Google’s search algorithm. In doing so, they consider a myriad of factors such as keyword usage, domain strength, inbound links, user data and, of major importance to Google, quality content. The actual Google algorithm is a closely guarded secret and remains a mystery. But we do have several hints.

» The Quest: Include Only the Best Document Links in Search Results

“People are still only willing to look at the first few tens of results. Because of this, as the collection size grows, we need tools that have very high precision (number of relevant documents returned, say in the top tens of results). Indeed, we want our notion of “relevant” to only include the very best documents since there may be tens of thousands of slightly relevant documents.” (Brin and Page)
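The precision that Brin and Page describe can be measured with a very small calculation. The Python sketch below uses the common "precision at k" form, the share of the top k results that are relevant. The function name, the sample URLs, and the choice to divide by the number of results actually returned when there are fewer than k are assumptions for this example.

```python
def precision_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the top-k results that appear in the set of relevant documents."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for url in top if url in relevant) / len(top)


# Hypothetical ranked results and a hand-made list of truly relevant pages.
results = ["mars-water", "venus-facts", "mars-rovers", "moon-base", "saturn-rings"]
relevant = {"mars-water", "mars-rovers", "mars-history"}

print(precision_at_k(results, relevant, k=5))
```

Two of the five results shown are relevant, so precision here is 2 out of 5. A searcher who only looks at the first few results benefits when that fraction is as close to 1 as possible.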

As we human Internet users continue to develop and explore uncharted territories in this new frontier, search engines are indispensable for our navigation. They have already come a very long way in a very short amount of time, and only time will tell how they, and those of us who offer SEO services, will improve to make the most relevant sites show up at the Top Of The List!

