===========================================================================
Today on The World          Vol. 4 #156          Monday, June 15, 1998
===========================================================================

Here's a simple (yet cryptic) addition to your HTML toolkit: entities. You can skip this section if you don't feel like tackling them -- it's filled with warnings along the lines of "these don't always work right" and they don't do anything spectacular anyway. But if you read this section you'll know how to put that umlaut on "fahrvergnugen". Also, there's one timely announcement. (kibo)

---------------------------------------------------------------------------
URL: Big Ride Across America
     http://www.gtebigride.com/

This site is about an American Lung Association fundraising event - Big Ride Across America - a bike ride from Seattle, WA to Washington, DC which will begin tomorrow, June 15th. The ride will end August 1st and the web site will be updated throughout that period so that progress of the participants can be tracked hourly.

The "What Is It?" page says: "In June 1998, 1,000 bike riders from all walks of life--most of whom have never done anything like this before--will come together to push their limits, inspire the nation with their dedication and drive, and help raise over $8 million for a worthy cause. The race will last for 6 1/2 weeks but the friendships, memories and self-knowledge gained will last a lifetime. This is where you can learn everything you need to know about the GTE Big Ride--from the basic facts of the race to sponsorship information and highlights of the trip."

Many of the participants will be people who suffer from some form of lung disease. One of the riders, Mary Pierce, had a double lung transplant a few years ago and has ridden as a competitor in bicycle events worldwide since then. Another rider, Shirley St. Cyr, who is originally from New Hampshire, suffers from genetic emphysema.
Shirley will be riding with her oxygen tank strapped to her bike. These two women, as members of TeamAlpha1, participate in ALA-sponsored cycling events across the country to raise awareness of lung disease and the importance of organ donation, as well as to raise funds.

There is more information about TeamAlpha1 and about genetic emphysema at
     http://www.alphaone.org/teamalpha1/BigRide.htm
and at
     http://www.alphaone.org/

(contributed by Mary E. Sayre)

---------------------------------------------------------------------------
HTML TUTORIAL -- CHAPTER 8 -- ENTITIES

8.1 Entities

Entities are used for inserting special characters (such as accented vowels, non-breaking spaces, or copyright symbols) into HTML files. They begin with "&" and end with ";", as in "&entityname;", and are one of the few things in HTML that are case-sensitive; in other words, "&ouml;" and "&Ouml;" are different (lowercase and capital O with an umlaut.)

Try adding this HTML to your practice page:

This is &auml; test. &copy; 1998. 2+2&lt;5

If everything works you should see an "a" with an umlaut (two dots), a circled "C" copyright symbol, and a "less than" symbol (you wouldn't normally be able to put a "<" on your page because it's a "meaningful character" in HTML.) Entities give you ways of putting in exotic characters, and referring to things like "<" and ">" that you couldn't otherwise use:

If you want your text to look like <B>this</B>, type &lt;B&gt;this&lt;/B&gt;.

Some browsers support wide ranges of named entities (like "&copy;" for copyright), some support hundreds of numbered entities (like "&#169;" for copyright), some support both. If you use exotic entities, it is best to test your HTML file with every version of every browser you can find for Mac OS, Windows, and UNIX, as the support for entities varies widely. Nonetheless, some entities are fairly standard:

     &nbsp;    non-breaking space
     &copy;    copyright (circled C)
     &reg;     registered trademark (circled R)
               (some browsers also do &trade; for a "TM" symbol --
               note: it's not "&tm;")
     &quot;    double quotes (")
     &amp;     ampersand (&)
     &lt;      less than (<)
     &gt;      greater than (>)

&nbsp; is used when you want two words to stick together and not split up at the end of a line when the browser wraps it to fit in the window. For instance,

Hello, my name is J.&nbsp;P.&nbsp;Morgan, and my phone number is (555)&nbsp;123-4567.

Try that and make your browser window several different sizes (narrower and narrower) to see where the line break occurs. (In addition to making names and phone numbers stick together, I often put &nbsp; between the last two words of each paragraph so that I won't see the last word on a line by itself. You can also use it to make big spaces, by stringing &nbsp;'s together.)

Space:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The Final Frontier
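
The last-word trick mentioned above might look something like this on your practice page. This is only a sketch, and the sentence is made up for illustration. Narrow your browser window and, if the browser honors the non-breaking space, "by" and "itself" should move to the next line together.

```html
<P>This paragraph should never leave its final word stranded on a line
by&nbsp;itself.</P>
```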

&copy; and &reg; make symbols. They were specified in an early version of the HTML standard, and are supported by the major browsers. There are hundreds of other symbols (all of which have named and numbered entities) but these do not always display correctly at present. Some of this is due to differences in the browsers, and some of it is due to different kinds of computers having different character sets to start with. More on the problems of entities later.

The last four listed above, &quot;, &amp;, &lt;, and &gt;, are "escapes" that let you put special characters in your HTML without having them actually do something. For instance, if you want to say "I hate <BLINK>." without actually using the tag, you would write "I hate &lt;BLINK&gt;."

In general, you don't need to use &quot; unless you want to put quotes inside quotes (ALT="This picture says &quot;Hi!&quot;") You do need to use &amp; if you want an ampersand with text immediately following it ("I have a stereo tuner&amp for sale" would be misinterpreted unless you said "I have a stereo tuner&amp;amp for sale" -- do you see why?)

These entities do not necessarily work correctly in special places, such as inside <TITLE>, although they're supposed to. I recommend only using them in the text of <P> and its relatives (<H1>, etc.) because not everything processes them when you tuck them in places like <TITLE>. Most annoyingly, my beloved &nbsp; doesn't work right inside <TABLE>! One browser I tried made the table too wide, sticking off the right edge of the screen; another made the cells too wide, overlapping them; a third browser simply showed "& n b s p ;". So I don't recommend putting entities inside tables without extensive testing. (You haven't learned tables yet. Next chapter.)

(If browsers actually followed the standard -- which says that entities are supposed to be translated to the single characters they represent before the page is processed and displayed -- this wouldn't be a problem and entities would work inside <TITLE> or <TABLE>.
But the truth is, most Web browsers have incomplete support for entities.)

By the way, I don't recommend using &trade; to make a "TM" symbol, as it's not recognized by any of the versions of Netscape Navigator I have -- instead you can just use <SUP>TM</SUP> to superscript the letters "TM" the old-fashioned way (<SUP> is supported in all browsers I know of.)

Tip: Be very sure to get in the habit of typing that semicolon at the end of each entity. THE SEMICOLON IS REQUIRED! REPEAT, THE SEMICOLON IS REQUIRED! Although most browsers will show "&nbsp" (without the semicolon) as a non-breaking space, some will show the letters "& n b s p" under some circumstances unless you write it correctly, "&nbsp;".

8.2 More Entities

Regarding those hundreds of other entities I told you about, here are some of the most useful. Again, they don't necessarily show correctly in all browsers, so test thoroughly.

     &cent;      American cent
     &pound;     British pound sterling
     &yen;       Japanese yen
     &para;      Paragraph mark (pilcrow)
     &sect;      Section mark
     &middot;    Vertically-centered dot
     &times;     Multiplication sign (x)
     &divide;    Division sign
     &not;       Logical "not" sign
     &plusmn;    Plus or minus
     &micro;     mu ("micro-")
     &deg;       degrees
     &frac14;    1/4
     &frac12;    1/2
     &frac34;    3/4
     &sup2;      superscript 2 (squared)
     &sup3;      superscript 3 (cubed)
     &iquest;    Spanish inverted "?"
     &iexcl;     Spanish inverted "!"
     &ordm;      Masculine ordinal (superscript "o")
     &ordf;      Feminine ordinal (superscript "a")
     &laquo;     Left guillemet ("<<")
     &raquo;     Right guillemet (">>")
     &ccedil;    c with cedilla
     &Ccedil;    C with cedilla
     &szlig;     German ss ligature (or sz)
     &aelig;     ae ligature
     &AElig;     AE ligature
     &oslash;    slashed o
     &Oslash;    slashed O
     &eth;       lowercase eth
     &ETH;       capital eth
     &thorn;     lowercase thorn
     &THORN;     capital thorn

     &auml; &euml; &iuml; &ouml; &uuml; &yuml;
                 Lowercase with umlaut (dieresis)
     &Auml; &Euml; &Iuml; &Ouml; &Uuml;
                 Capital with umlaut (dieresis)
     &aacute; &eacute; &iacute; &oacute; &uacute; &yacute;
                 Lowercase, acute accent
     &Aacute; &Eacute; &Iacute; &Oacute; &Uacute; &Yacute;
                 Capital, acute accent
     &agrave; &egrave; &igrave; &ograve; &ugrave;
                 Lowercase, grave accent
     &Agrave; &Egrave; &Igrave; &Ograve; &Ugrave;
                 Capital, grave accent
     &atilde; &ntilde; &otilde;
                 Lowercase with tilde (~)
     &Atilde; &Ntilde; &Otilde;
                 Capital with tilde (~)
     &acirc; &ecirc; &icirc; &ocirc; &ucirc;
                 Lowercase with circumflex (^)
     &Acirc; &Ecirc; &Icirc; &Ocirc; &Ucirc;
                 Capital with circumflex (^)

To practice, add the German word "fahrvergnugen" to your Web page with an umlaut over the "u".
Then make sure it's marked as a registered trademark of Volkswagen.

Different computer operating systems or programs have different character sets. Windows uses a character set based on "ISO Latin-1", aka "ISO 8859-1". However, Mac OS uses a different character set, which has more math symbols and fewer accented characters. For instance, Macs have &micro;, while Windows has &yacute; -- if you try to display these characters on the other kind of computer, you'll likely get a different character instead. (And DOS programs and lynx are going to have different sets of characters available, too. Sheesh.) Anyhow, pretty much everyone can see the basic "foreign" characters like &auml; or &eacute;, but some of the other symbols (even &frac12;) are only available on some computers. (Furthermore, older browsers may not recognize all these names.)

The numbered entities are simply the character's code number in the ISO Latin-1 character set (as in the numbers you can find in the Windows Character Map desk accessory.) In other words, you can pick any Windows character by number. (See the URL below for a list.) But because Windows uses ISO Latin-1 and other things use other character sets, you're still going to run into the problem of several of the characters showing up incorrectly on other computers.
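
To see how the names and numbers pair up, you might try a sketch like the one below on your practice page. The pairings are the ISO Latin-1 code numbers as I understand them, and the choice of characters is just for illustration. If your browser supports both forms, each line should show the same character twice; if one of the pair comes out wrong, you've found one of the gaps described above.

```html
<P>&copy; &#169; (copyright)</P>
<P>&reg; &#174; (registered trademark)</P>
<P>&auml; &#228; (a with umlaut)</P>
<P>&eacute; &#233; (e with acute accent)</P>
<P>&frac12; &#189; (one half)</P>
```
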
Descriptions of the numbered entities (&#160; and so on) can be found at
     http://www.w3.org/MarkUp/html-spec/html-spec_13.html

The complete list of entities in the proposed HTML 4.0 standard is at:
     http://www.w3.org/TR/REC-html40/sgml/entities.html

Detailed list of references on character sets, plus charts:
     http://www.bbsinc.com/iso8859.html

My favorite reference on entities:
     http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/entities.html

Another site which shows examples of named and numbered entities:
     http://www.uni-passau.de/~ramsch/iso8859-1.html

I suggest not just typing the accented characters the normal way (with the alt or option key on your keyboard) because you never know how those will come out on random computers (they can't be translated automatically by the receiving browser because usually there is no information about whether they came from a Windows or Mac OS computer.) Entities like &copy; or &nbsp; are more likely to work (because both Windows and Mac OS have copyright symbols, even if they're in different parts of the character set.) In other words, if you've typed "curly quotes" or other characters into your HTML file the hard way, replace them with entities, because the computer at the other end of the Web will not know which character set your computer uses. (We won't go into "<META HTTP-EQUIV>" character set declarations here. That's not something you should monkey around with for quite a while.)

8.3 URL Encoding

Just in case HTML doesn't seem half-baked yet, there's a different way of representing exotic characters which is used in some cases: Ever see a URL with "%7E" or "%20" in it?

     http://world.std.com/%7Ebzs

You will probably never need to use those when you're creating pages, but I'm going to tell you about them so you'll know why they show up in your browser. These percent sign guys only show up in URLs (not in normal HTML, just in URLs.)
The percent sign is followed by a two-digit hexadecimal value, in other words:

     %20 is a space
     %7E is a tilde (~)

Spaces are not good things to have in your filenames, and a few computers in Antarctica or somewhere may not have tilde keys, so Web browsers tend to translate spaces and tildes into %20 and %7E just to make sure that the URL works. Don't worry about this.

This is not to be confused with "Quoted-Printable" MIME encoding, which shows up in some E-mail (not the Web), and uses an equal sign instead of a percent sign.

I won't even mention the other popular methods of encoding funny characters on the Internet, because they're not relevant to making Web pages. Well, you might use KOI-8 or JIS if you write in Russian or Japanese on your Web page, but that's outside the scope of the present tutorial. (kibo)

==========================================================================
[] Send suggestions for tips & URLs to today@world.std.com. We're also collecting links for our Web pages at eyeguy@world.std.com.
[] To contact CUSTOMER SUPPORT, send mail to support@world.std.com or call 617-739-0202.
[] To subscribe to the "Today" mailing list, send a note saying 'subscribe announcements' to majordomo@world.std.com. Subscriptions to this mailing list are open to World customers only.
[] The answer is "fahrvergn&uuml;gen&reg;".
[] Today on The World is (C) Copyright 1998 by Software Tool & Die. Its contents may freely be redistributed as long as credit is given.