Making your Web pages findable in search engines

(From "Today on The World")


1.) Getting listed

One of the first questions people ask when they're putting up a Web page is "How do I get it listed in all the popular search engines?"

You don't have to fool around with <META> tags or use a special program or pay anyone money. Really. Search engines are designed to list everyone who wants to be listed -- just go to the search engine's front page and look for a link named "Add a URL", "Index My Site", "Please List Me", etc.

For instance, let's go to AltaVista:

http://www.altavista.com

At the bottom of the main page is a set of links:

> About AltaVista | Set your Preferences | Add a Page | Text-Only Version

Following the "Add a Page" link will show you a page of policies ("Please don't spam us") and give you a little form where you can type in your URL.

That's all you have to do. Give it the URL of your front page, and the "crawler" from the search engine will go there the next time it runs (usually in a day or two) and follow all your links to the other pages on your site. (Most search engines work this way.) Let's try it with HotBot:

http://www.hotbot.com

Again, the link we want is way down in the bottom right corner in the tiniest text imaginable:

> About Wired Digital | Our privacy policy | Text-only version | Add URL

This also presents a simple form where they ask for the URL of your site (as well as the common "Check this box if you want us to spam you" item that shows up on everyone's feedback forms these days).

So just go to each search engine you know and look for something you can click on to add your site. (Sometimes it's pretty well hidden.)

I love Google (www.Google.com) but Google's site does not make it easy to find their "Add A URL" link (I had to search Google's site to find it; the page is called "Add A URL to Google"). Note that Google will likely find your site by "crawling" links from other people's pages that point to it, even if you don't manually add your site to Google.

There are also other services you can use, and programs you can buy, that go to a dozen search engines and send in that URL for you, but why bother with that when it's so trivial to do it yourself?

Different search engines have different amounts of crawling on their to-do lists, and they crawl at different levels of aggressiveness, so depending on the search engine it may be a matter of hours or days or weeks before your listing is added. Also, some update their database more frequently than others, so when one of your pages changes, they may still list the old content for a while (of course, you can always re-submit your URL when you change your pages...)

And, because the search engines work by following links from page to page, they may well have found your site (or at least part of it) already if you have friends who have linked to your site from theirs. The more links there are TO your site, obviously, the more likely it is that both human beings and robots will visit you.
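To make "following links from page to page" a little more concrete, here is a minimal sketch in Python of the link-gathering step. It is only an illustration of the idea, not how any particular search engine does it; the class name LinkCollector, the function extract_links, and the www.example.com URLs are all made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the targets of <A HREF="..."> tags found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lowercase.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html_text):
    """Return every link on the page as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical front page with one relative and one absolute link.
page = ('<A HREF="honey.html">Honey</A> '
        '<a href="http://www.example.com/queens.html">Queens</a>')
links = extract_links("http://www.example.com/bees/index.html", page)
print(links)
```

A real crawler would then fetch each of those URLs and repeat the process, which is why one link to your front page is usually enough to get the rest of your site found.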

You can stop here if you're happy just being listed and don't feel the urge to tweak your site in minor ways to be more competitive with other sites.




2.) Moving upwards

So what IS the deal with all those pages that have fifty zillion copies of their keywords at the bottom of the page in semi-visible black-on-black lettering? What IS the deal with the mysterious <META> tag?

Some people want to make it more likely that their site is listed near the top of the search engine's results (usually because they're selling something or because they're offering something that is also available from several better/more popular sites) so they take advantage of the search engines' page-ranking technique to improve their site's score. Know how some search engines give you a little "confidence rating" (100%, 90%, 20%...) for each site? If you understand the algorithm the search engine is using for these calculations, you can improve your score.

Each search engine uses a different technique (with some overall similarities) and in many cases we can only speculate on how they work (the search engine makers don't want to tell the people advertising porn how to get their site listed on everyone's search results!) So please take this advice with a grain of salt of unknown size and shape.

Let's assume that your page is "Fred's library of Bee-Keeping tips". In general, search engines may be scoring your site based on these things:

<META> is the all-purpose "do with it what thou wilt" tag left purposefully undefined in the HTML specification. In the words of the W3C HTML 4.0 specification:

> The META element is a generic mechanism for specifying meta data.

Its reason for existence is that it gives you a place to put special-purpose items that people aren't meant to see. In other words, the people who write a program for creating or editing HTML can store their own special stuff there without affecting anything else. Some search engines have decided to encourage the use of <META> for lists of keywords, so that your page can be counted as containing keywords without having to have them all in the "clear text" of your page. It is not clear how many search engines can see <META>, and there is differing advice on how to format the keyword list (commas, spaces, or commas followed by spaces?), but the form used most often is:

<META NAME="keywords" CONTENT="word,word,word,word">

...with commas between the words and no spaces after the commas. An example for the front page of the bee site could be:

<META NAME="keywords" CONTENT="bees,bee,beekeeping,bee-keeping,bee keeping,
hive,hives,apiary,apiaries">

Because some search engines, when you search for "bee", will find "bees" -- but some won't -- it makes sense to list both the plural and singular versions of nouns. (Especially in the case of "apiary"/"apiaries"!) Putting "apiary" here is a good idea because some people might refer to your hives with that fifty-cent Latin-derived word instead of the "hives" you've mentioned all over your page. The <META> keywords give you a chance to include the words you didn't say in the body of the page.

This doesn't mean you should pack every word in the dictionary onto your "Keywords" list. Most search engines weight things so that if your keyword list is "bees,spatulas,pork rinds,Pez,nougat,elastic, pants,bubble gum,socks" the keywords will be considered less important than if the list was just "bees,spatulas". Also, the keywords at the start of the list may be considered of greater importance than the ones at the end of the list. I recommend listing first the keywords that differentiate this page from the other related pages on your site, followed by the keywords which are the same on all your pages, with the really general stuff last. For instance, if you have a page about honey and a page about queen bees, you might use:

<META NAME="keywords" CONTENT="honey,clover,flowers,bees,bee,
beekeeping,apiaries">

<META NAME="keywords" CONTENT="queen,queens,queen bee,queen bees,
bees,bee,beekeeping,apiaries">

Why do we have "queen", "bees", AND "queen bees"? Because the person looking for bee information might be searching for "The phrase 'queen bees'" or "'queen' and 'bees'". Compound words (including hyphenated ones) and phrases can be included both as individual words and as things with spaces (or hyphens) in them to ensure that you cover all the variants. (Don't forget variant spellings! A page about colors might want to list "colours" for the British people.)
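If you have a lot of pages, the ordering advice above (page-specific keywords first, site-wide keywords last, nothing repeated) is easy to automate. Here is a rough Python sketch; the function name build_keywords_tag and its two parameters are invented for this example, and it assumes the comma-with-no-space format shown earlier, which not every search engine necessarily expects.

```python
from html import escape


def build_keywords_tag(page_keywords, site_keywords):
    """Return a <META> keywords tag: page-specific words first, shared words last."""
    seen = set()
    ordered = []
    for word in list(page_keywords) + list(site_keywords):
        cleaned = word.strip()
        key = cleaned.lower()
        # Skip blanks and repeats; listing a word twice is unlikely to help.
        if key and key not in seen:
            seen.add(key)
            ordered.append(cleaned)
    content = ",".join(ordered)
    # Escape quotes and ampersands so the attribute value stays valid HTML.
    return '<META NAME="keywords" CONTENT="%s">' % escape(content, quote=True)


# Keywords shared by every page on the (hypothetical) bee site.
site_wide = ["bees", "bee", "beekeeping", "apiaries"]

honey_tag = build_keywords_tag(["honey", "clover", "flowers"], site_wide)
queen_tag = build_keywords_tag(
    ["queen", "queens", "queen bee", "queen bees", "bees"], site_wide)
print(honey_tag)
print(queen_tag)
```

The two tags it prints contain the same keyword lists as the honey and queen bee examples above, just on one line each.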

Note that if the list included "bees,bees,bees,bees" that would probably not help you much if someone were looking for "bees". Anyone who really wants to find pages about bees should be able to find your page by default even if you don't use every trick in the book to lure them in -- that's what the search engines are for. Using repeated words or enormous lists of words unrelated to the topic at hand will just set off the search engines' bozo detector. As HotBot says:

> If HotBot recognizes any spoofing technique, it will severely penalize a
> page's ranking.

There are a zillion other things you can do with <META>, but the only other kind of <META> relevant to the search engines (or so I'm told) is "description":

<META NAME="description" CONTENT="Beekeeping tips and tricks from
a bee person who has been stung over 1,000,000 times.">

Most search engines don't use it when generating the summary of each site they found, but a few will show it as the description of your site (whereas most will show either the <TITLE> or the first couple of lines of the body text).
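As a rough picture of what a search engine that does honor "description" might be doing, here is a small Python sketch that uses the description if there is one and falls back to the <TITLE> otherwise. This is a guess at the general idea, not any real search engine's code; SummaryParser and pick_summary are names made up for the example.

```python
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Remembers the <TITLE> text and any <META NAME="description"> content."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text between <TITLE> and </TITLE> is collected.
        if self._in_title:
            self.title += data


def pick_summary(html_text):
    """Prefer the description; otherwise fall back to the page title."""
    parser = SummaryParser()
    parser.feed(html_text)
    return parser.description or parser.title.strip()


with_description = (
    "<HTML><HEAD><TITLE>Fred's library of Bee-Keeping tips</TITLE>"
    '<META NAME="description" CONTENT="Beekeeping tips and tricks.">'
    "</HEAD><BODY>Welcome!</BODY></HTML>")
without_description = (
    "<HTML><HEAD><TITLE>Fred's library of Bee-Keeping tips</TITLE>"
    "</HEAD><BODY>Welcome!</BODY></HTML>")

print(pick_summary(with_description))
print(pick_summary(without_description))
```

Either way, a clear <TITLE> is worth having, since it is the fallback in this sketch and the thing most search engines show.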

Note that there are a lot of free or shareware or overpriced commercial programs that claim to generate <META> tags to improve your site's ranking. They basically work like this:

PLEASE TYPE IN YOUR KEYWORDS HERE:

> bee,bees,beekeeping

GREAT! NOW ADD THIS TO YOUR PAGE AND GIVE US $50:

<META NAME="keywords" CONTENT="bee,bees,beekeeping">

Big deal!




3.) Getting unlisted

There are ways to REMOVE your listing from search engines, but they vary from one search engine to the next, and it's not always possible. Most search engines follow the "robots.txt" standard for keeping robots off your site; however, the robots.txt file has to sit at the top level of the Web server, so you have to have your own domain name (www.yourname.com) to be able to do this (Home Page Alone customers on The World can't make a robots.txt file for world.std.com!)
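If you do have your own domain, you can check what a robots.txt file will actually block before you upload it. The sketch below uses the robots.txt parser that comes with Python's standard library; the rules, the /private/ directory, and the robot name "ExampleBot" are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every robot to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's contents as a list of lines, so nothing is fetched.
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is a made-up robot name; the "*" rule applies to it.
blocked = parser.can_fetch("ExampleBot", "http://www.yourname.com/private/diary.html")
allowed = parser.can_fetch("ExampleBot", "http://www.yourname.com/index.html")
print(blocked)   # False: the URL is under /private/
print(allowed)   # True: nothing disallows the front page
```

Keep in mind that robots.txt is a polite request; it only works for robots that choose to obey it.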

Some search engines (not all) can be told to NOT index your site with a <META> tag:

<META NAME="robots" CONTENT="noindex">
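To double-check that the tag made it into a page, you can look for it the same way a robot would. Here is a minimal Python sketch, assuming the robot treats a comma-separated CONTENT value as a list of directives; RobotsMetaParser and may_index are names made up for this example, and a real search engine may interpret the tag differently or ignore it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gathers the directives listed in any <META NAME="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            for item in content.split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def may_index(html_text):
    """Return False if the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noindex" not in parser.directives


private_page = '<HEAD><META NAME="robots" CONTENT="noindex"></HEAD>'
public_page = "<HEAD><TITLE>Fred's library of Bee-Keeping tips</TITLE></HEAD>"
print(may_index(private_page))
print(may_index(public_page))
```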

If you don't want your page to ever be listed in the search engines, don't submit your page to them, and don't tell any of your friends to make links to your page (remember, the search engine crawlers follow all the links they see). That will greatly decrease the chances that anything (or anyone) will blunder across your pages without knowing the exact URL.




4.) References

Search Engine Watch's "How To Use <META> Tags":

http://searchenginewatch.internet.com/webmasters/meta.html

...has a brief tutorial on using <META> for keywords and description, as well as a very good set of links to related tutorials. (Also poke around the rest of the Search Engine Watch site for other useful tips!)


General information on <META> tags of all kinds:

http://vancouver-webpages.com/META/

http://www.webdeveloper.com/categories/html/html_metatags.html

http://www.yahoo.com/Computers_and_Internet/Information_and_Documentation/Data_Formats/HTML/META_Tag/


Official W3C HTML specification for <META>:

http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.4

And the W3C has notes on helping search engines index your site:

http://www.w3.org/TR/REC-html40/appendix/notes.html#recs

(that last one also covers robots.txt.)


Robot Exclusion Standard (for robots.txt):

http://info.webcrawler.com/mak/projects/robots/norobots.html



Page last modified September 5, 2003.
Web site contents & design Copyright © 2009 Software Tool & Die.
