Search Engine Fundamentals
By Nancy Anne
The term Search Engine has become a catch-all
phrase for all kinds of search services. Without these free resources it would be much
more difficult to find anything anywhere on the 'Net. Searching is fundamental to
gathering information on the Internet. You can search different areas of the 'Net such as
Usenet (commonly called newsgroups), or the World Wide Web by using different search
services. Each service has its own way of compiling and collecting information.
There are two main kinds of search services commonly used on the Web: the
index, and the directory or subject guide. One way to think of the differences between
these two kinds of engines is to think of web sites as books. An index catalogs every
word in every book it looks at, and lists for you each page that contains the word(s)
you're looking for. A directory or subject guide takes the overall subject matter of the
books it looks at and lists the front covers of the books that match your word(s).
Indexes
You've probably heard of Alta Vista and HotBot, both popular search indexes.
Indexes regularly scan the Internet for Web pages and record the HTML content and key
words. They also have the ability to follow any links associated with scanned pages and
get even more information.
The job of compiling data for indexes is done by spiders (also
called robots, bots, or crawlers, hence the names HotBot and WebCrawler),
software programmed by a human to automatically gather information from all over the 'Net
based on specific or broad search criteria. Most of the time spiders scan pages on the
fly, without the owner's knowledge or consent (if you don't want some or all of your web
pages scanned by spiders, you can write some HTML into your page to keep them out).
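The article doesn't say which HTML to write. One widely used convention is the robots meta tag, which well-behaved spiders honor voluntarily. Here is a minimal sketch in Python of how a polite spider might check for it. The names RobotsMetaParser, may_index, and PAGE are made up for illustration, and the sample page is invented.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_index(html_text):
    """Return False if the page asks spiders not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" not in parser.directives and "none" not in parser.directives


# Hypothetical page whose owner wants to keep spiders out.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Private notes</body></html>"
)

print(may_index(PAGE))
```

Keep in mind the tag is only a request. A spider that ignores the convention can still read the page.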
The advantage of this kind of service is that its databases are very large
and updated often by spiders working around the clock. These services catalog Web pages in a
computational manner without human intervention. A search engine's spider catalogs all the
pages of a given web site, listing for you only the pages that match the words or phrases
you're searching for.
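To make the idea concrete, here is a rough sketch in Python of the kind of word-by-word catalog described above, often called an inverted index. It is a toy under stated assumptions, not how any particular service works: the page addresses and text are invented, and the names build_index and search_all are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(url)
    return index


def search_all(index, query):
    """Return the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented pages standing in for what a spider has collected.
PAGES = {
    "zoo.example/arachnids.html": "Spiders have eight legs and spin webs",
    "net.example/robots.html": "Web spiders are robots that crawl pages",
    "net.example/modems.html": "Modems connect computers to the Internet",
}

INDEX = build_index(PAGES)
print(sorted(search_all(INDEX, "spiders")))
```

A one-word search for spiders matches both the zoology page and the robots page, which is exactly the flood-of-hits effect a real index gives you on a much larger scale.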
For instance, if you're looking for information about spiders,
you'll get over thirty-nine thousand hits (links to a Web page) from Alta Vista with the
word spiders in them. This means not only will you get pages referencing Internet robots,
you'll mostly get the eight-legged, living-in-your-shoe-and-going-to-bite-you kind of
spider.
A drawback to using services of this type is that sifting through so many
hits to find what you're looking for is sometimes a daunting task. Some indexes include a
number of options you can use to help narrow down your search, such as HotBot's
"search for this exact phrase" or "search for any of the words."
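The difference between those two options can be sketched in a few lines of Python. This only illustrates the general idea. How HotBot actually implements its options isn't described here, and the function names and sample sentences below are made up.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_any_word(text, query):
    """True if the text contains at least one of the query words."""
    return bool(set(tokenize(text)) & set(tokenize(query)))


def matches_exact_phrase(text, query):
    """True if the query words appear in the text side by side, in order."""
    words = tokenize(text)
    phrase = tokenize(query)
    n = len(phrase)
    if n == 0:
        return False
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


# Invented page text for illustration.
ROBOT_PAGE = "Internet spiders crawl the Web"
ZOO_PAGE = "Some spiders live far from the Internet"

print(matches_any_word(ZOO_PAGE, "internet spiders"))
print(matches_exact_phrase(ZOO_PAGE, "internet spiders"))
```

Both sample pages pass the any-word test, but only the first contains the two words next to each other, so the exact-phrase option is the one that trims the list.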
Directories/Subject Guides
On the other hand, Yahoo! and Magellan are hierarchical directories of web page
subjects. Each reference is entered and updated by a person manually, placing each web
address in a certain context, much like your telephone company's Yellow Pages directory.
People catalog the sites in a directory, so the hits often include reviews
and/or recommendations, which can guide you through the content of the pages more quickly
and easily.
To have a Web site listed in a directory you must submit it yourself, or
you can hire a company to do it for you. The directory has the last word on where it
catalogs your site. Because every listing is submitted and filed by hand, directories
contain far fewer sites than indexes do, but they are better targeted to the word(s) you
use to search.
For example, enter the same key word, spiders,
in Yahoo!, and this time you'll get a list of categories like Science: Zoology:
Animals, Insects, and Pets: Arachnids or Computers and Internet: Internet: World
Wide Web: Searching the Web: Robots, Spiders, etc. Documentation, which can narrow and
shorten your search significantly. You'll get fewer hits overall, and they will be pages
whose headings and content fit the context of the key words you enter.
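Here is a small Python sketch of a directory search, assuming the directory is nothing more than category paths with hand-filed site titles under them. The two spider category paths are the ones quoted above. The modem category, two of the site titles, and the name search_directory are invented for illustration, and this is not a description of how Yahoo! stores its data.

```python
# Category paths mapped to the site titles a person has filed under them.
DIRECTORY = {
    ("Science", "Zoology", "Animals, Insects, and Pets", "Arachnids"): [
        "Nancy's Page-O-Spiders",
    ],
    ("Computers and Internet", "Internet", "World Wide Web",
     "Searching the Web", "Robots, Spiders, etc. Documentation"): [
        "A Hypothetical Guide to Web Robots",
    ],
    ("Computers and Internet", "Hardware", "Modems"): [
        "A Hypothetical Modem Shop",
    ],
}


def search_directory(directory, keyword):
    """Return the categories whose name or filed titles mention the keyword.

    Only category names and site titles are checked, never page contents.
    """
    keyword = keyword.lower()
    hits = []
    for path, titles in directory.items():
        category = ": ".join(path)
        in_category = keyword in category.lower()
        in_titles = any(keyword in title.lower() for title in titles)
        if in_category or in_titles:
            hits.append(category)
    return hits


for category in search_directory(DIRECTORY, "spiders"):
    print(category)
```

Because the search never looks inside the pages themselves, the results are short and sorted by subject, which is why you can tell the eight-legged spiders from the software kind at a glance.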
One drawback is that Yahoo's hits are usually to home pages (the first
page of a site) only. For instance, it would hit a home page called Nancy's
Page-O-Spiders, but not a home page called Nancy's Home whose site contains a page
exclusively on spiders. Another drawback is that manually updating directories is tedious
and time-consuming, which means old sites that are no longer valid (dead links) are
often listed long after their demise.
Hybrids
Some search services use both schemes -- they are both an index and a
directory, like Infoseek and Excite. These services occasionally send out a spider to
collect and cull Web sites, alongside people cataloging sites that are submitted by Web
developers.
Yahoo's directory is one of the best on the Web, but its service is
limited. To fill the gaps, Yahoo! teamed up with Alta Vista to
automatically send your query there if your Yahoo! search finds no matching hits.
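The hand-off described above boils down to "try the directory first, and ask the index only if the directory comes up empty." A minimal Python sketch of that logic follows. The two toy search functions are stand-ins invented for illustration and have nothing to do with the real Yahoo! or Alta Vista services.

```python
def search_with_fallback(query, directory_search, index_search):
    """Try the directory first; use the index only if nothing matched."""
    hits = directory_search(query)
    if hits:
        return "directory", hits
    return "index", index_search(query)


def toy_directory(query):
    """Stand-in for a small, hand-built directory."""
    listings = {"spiders": ["Nancy's Page-O-Spiders"]}
    return listings.get(query.lower(), [])


def toy_index(query):
    """Stand-in for a large index that finds something for any word."""
    return ["a page containing the word " + query]


print(search_with_fallback("spiders", toy_directory, toy_index))
print(search_with_fallback("modems", toy_directory, toy_index))
```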
Use Them...All
As a rule of thumb, if I'm not exactly sure what I'm looking for within a broad subject, like modems,
I'll start with a directory, which will show me lots of modem brand hits and companies
that sell modems. But if I know I'm looking for information about a specific brand of
modem, I'll use an index, which will show me many sites with that particular brand name
listed somewhere on the page.
No one service catalogs the whole web. Each service logs parts of it and
there is overlap. Services also put their own spin on how they rank hits. For instance,
some advertisers pay for their sites to be listed on some services, so their sites get
priority listing and appear in your results even if they have nothing to do with
what you're looking for. Knowing this, it's a good idea to use more than one search
service when you're looking for something.
There are hundreds of search tools out there, so don't only use the big,
popular ones. There are even specific search services for special interests, such as art
or science. Some search email addresses, home addresses, or phone numbers; some search
Usenet only; some search both the Web and Usenet, and more. Look for all-in-one search
engines like Dogpile or Metafind that enter your key words into many engines at once and
list the first ten or so hits from each of several services on one seemingly unending page.
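An all-in-one engine of this kind can be sketched as a loop that sends the same key words to each service and keeps only the first few hits from each. The Python below is an illustration under assumptions, not how Dogpile or Metafind is built. The engine names, the make_toy_engine helper, and the fake hits are all invented.

```python
def metasearch(query, engines, per_engine=10):
    """Send one query to every engine and keep the first hits from each."""
    combined = []
    for name, search in engines.items():
        combined.append((name, search(query)[:per_engine]))
    return combined


def make_toy_engine(name, total_hits):
    """Build a stand-in engine that returns a fixed number of fake hits."""
    def search(query):
        return [f"{name} hit {i} for {query}" for i in range(1, total_hits + 1)]
    return search


# Hypothetical engines: one with many hits, one with only a few.
ENGINES = {
    "engine-a": make_toy_engine("engine-a", 25),
    "engine-b": make_toy_engine("engine-b", 3),
}

for name, hits in metasearch("modems", ENGINES):
    print(name, len(hits))
```

Each service's hits stay grouped under its own name, which is why the combined page gets so long once several engines are involved.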
Try using a variety of search services using your favorite hobby as the
key word(s) and you'll see the radically different hits you'll get with each directory and
index. No one service is perfect, so use as many as you have time for. Using many search
engines will also help you get a feel for how the different kinds of services work. You'll
soon find yourself using a favorite engine to find all the information you need quickly
and painlessly.
Here Today, Gone Tomorrow
The Internet is in a state of constant change. Internet addresses
disappear as fast as or faster than new ones are created. Many sites relocate without telling
anyone, and dead listings are everywhere, so finding a page that's moved requires that you
utilize more than one engine.
Happy searching!
Copyright © 1997 Nancy Anne. All rights reserved.