HTML Entity Reference for Common Characters

There are lots of references online where you can quickly search and find the necessary HTML code for embedding all sorts of symbols and characters into your web pages.

I find that most of the references I’ve seen are far too exhaustive. So for my own personal use, I put together a chart of the character entity references that I’ve needed the most.

Obviously, what constitutes “common” will vary from developer to developer, but I hope this list covers most of the commonly used symbols and characters.

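| Name | Character | Entity |
| --- | --- | --- |
| Copyright | © | &copy; |
| Registered | ® | &reg; |
| Trademark | ™ | &trade; |
| Curly Open Double Quote | “ | &ldquo; |
| Curly Closed Double Quote | ” | &rdquo; |
| Curly Open Single Quote | ‘ | &lsquo; |
| Curly Closed Single Quote | ’ | &rsquo; |
| Big Bullet/Dot | • | &bull; |
| Small Bullet/Dot | · | &middot; |
| Square Dot | ⋅ | &sdot; |
| En Dash | – | &ndash; |
| Em Dash | — | &mdash; |
| Cents | ¢ | &cent; |
| Pound (Currency) | £ | &pound; |
| Euro (Currency) | € | &euro; |
| Not Equal To | ≠ | &ne; |
| Half (Fraction) | ½ | &frac12; |
| Quarter (Fraction) | ¼ | &frac14; |
| Three-Quarters (Fraction) | ¾ | &frac34; |
| Degrees | ° | &deg; |
| Left Arrow | ← | &larr; |
| Right Arrow | → | &rarr; |
| Up Arrow | ↑ | &uarr; |
| Down Arrow | ↓ | &darr; |
| Lowercase “e”, Grave Accent | è | &egrave; |
| Lowercase “e”, Acute Accent | é | &eacute; |
| Lowercase “c”, Cedilla | ç | &ccedil; |
| Ellipsis | … | &hellip; |
| Triangle Down | | (none) |
| Triangle Up | | (none) |
| Triangle Left | | (none) |
| Triangle Right | | (none) |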
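
If you want to double-check a few rows of the chart without opening a browser, here is a minimal sketch. It assumes Python and its standard-library `html` module, which are my choice for illustration and not something the chart depends on; `CHART` and `decode_entity` are hypothetical names, and the entity names are the standard HTML ones for those rows.

```python
import html

# A few rows of the chart: standard entity name -> expected character.
CHART = {
    "copy": "©",
    "trade": "™",
    "ldquo": "“",
    "mdash": "—",
    "euro": "€",
    "frac12": "½",
    "rarr": "→",
    "hellip": "…",
}


def decode_entity(name: str) -> str:
    """Return the character that the named entity &name; stands for."""
    return html.unescape(f"&{name};")


for name, expected in CHART.items():
    decoded = decode_entity(name)
    status = "ok" if decoded == expected else "MISMATCH"
    print(f"&{name}; -> {decoded} ({status})")
```

The same loop works for any other row; just add the name and the character you expect to see.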

Final Notes

In HTML5, as far as I understand, you could technically just copy and paste the character right into your document and it will validate just fine (and as pointed out in the comments, this is the strongly preferred method). If you’re concerned about how these characters are handled when entered into a database or form, then you might want to check out this article on Smashing Magazine, along with the comments.
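
To see that pasting the character directly and writing the entity come out the same, here is a rough sketch. It assumes Python's standard `html` module for decoding; the page markup, the sample sentence, and the variable names are made up for the example.

```python
import html

# The same sentence written two ways: literal characters vs. entity references.
literal = "<p>Café © 2012 – “quoted” → done…</p>"
escaped = (
    "<p>Caf&eacute; &copy; 2012 &ndash; &ldquo;quoted&rdquo; "
    "&rarr; done&hellip;</p>"
)

# Hypothetical minimal page. The charset declaration tells the browser
# how to read the literal characters.
page = (
    "<!DOCTYPE html>\n"
    '<html lang="en">\n'
    '<head><meta charset="utf-8"><title>Demo</title></head>\n'
    f"<body>{literal}</body>\n"
    "</html>\n"
)

data = page.encode("utf-8")           # the bytes you would save or serve
round_trip_ok = data.decode("utf-8") == page
same_text = html.unescape(escaped) == literal

print("UTF-8 round trip intact:", round_trip_ok)
print("Entity version decodes to the literal version:", same_text)
```

The point of the sketch is only that, once the file is saved and declared as UTF-8, the literal version carries the same text as the entity version.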

Also, for the most part, pretty much any symbol can be found with a quick Google search. But like I said, many of the references are often too comprehensive and contain a lot of extra stuff that I don’t need.

Finally, when deciding what is “common”, keep in mind that I didn’t include any symbol that has its own key on most keyboards (e.g. “%” or “@”). Comment if you feel I’ve left out anything that qualifies as “common”. I know there are some accented characters that I’ve left out, so I’m considering doing a separate post that will have a chart of common foreign language characters. Maybe I’ll do that post mañana.

Update (June 24, 2012): Some really interesting and enlightening comments have been posted about using special characters like these. In short, if you use the appropriate character encoding, you should be able to just embed the character as-is (without the entity or other syntax). There are cases where you need to use an entity reference (like when you want to actually display the entity to the user, as I’m doing in the table), but other than that, you should be fine with just copying the actual character right into your document’s source. Also, I originally had a column for the decimal notation, but a few people have pointed out that it’s not necessary, so I’ve trusted their opinions and removed that column.
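
The one case mentioned above, showing the entity code itself to the reader, comes down to escaping the ampersand. A small sketch, again assuming Python's standard `html` module (`show_entity_code` is a hypothetical helper name):

```python
import html


def show_entity_code(name: str) -> str:
    """Return markup that makes a browser display the text '&name;' itself."""
    return html.escape(f"&{name};")


markup = show_entity_code("copy")
print(markup)                 # what goes into the page source
print(html.unescape(markup))  # what the reader ends up seeing
```

Here `markup` is `&amp;copy;`, and decoding it once gives back the visible text `&copy;` instead of the © character.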


34 Responses

  1. You should add these too:
    » Right Arrow Quote
    « Left Arrow Quote
    › Right Arrow Single Quote
    ‹ Left Arrow Single Quote

    • Yeah, I was on the fence about those ones, because technically I believe they are French quotation marks, no? I guess they do come in handy as pointers, but I thought I remembered someone saying that it’s not good to use those characters as arrows, since they will be parsed as quotation marks. Although I’m not sure how big of a deal that really is.

      I’ll consider adding them, thanks.

      • Then you might also want to consider including „ and ‚ (not a comma), which are used in many languages as opening quotation marks. Their entity references are &bdquo; and &sbquo; [sic!] (That’s how it made it into the spec; it should have been &bsquo; though.)

        There are also characters for eighth fractions… Are they “common”?

        What is this table for, anyway? Operating systems (Windows, macOS, …) provide character tables from which the user can pick the real characters and include them in the source code. No need for escapes.

        • Well, two reasons I put this together:

          1) Most references (including OS character tables) are too comprehensive, sometimes requiring you to fish for common stuff like a simple arrow. Maybe I haven’t found a quicker way to do it yet..? But it seemed more convenient to have the most common ones on a single page.

          2) I assumed that many characters still needed to be entered as entities, depending on how you were using them. But as this discussion in the comments has shown, it’s clear that entering the characters directly is the best method. I guess I’m still somewhat accustomed to escaping everything, for whatever reasons. This has been a good reminder, thanks.

  2. Jan!:

    “In HTML5, as far as I understand, you could technically just copy and paste the character right into your document and it will validate just fine. This is different from XHTML validation, which requires that you use the named or numeric entity reference to validate.”

    This seems incorrect to me. If you use (and specify) UTF-8 (or another Unicode encoding), you can use any character as-is. Entities are too much trouble, in general.

    PS: Could you fix the tabindex for this comment form? When I am in this textarea, pressing Tab does not put me in the name input, even though it is directly below the comment box.

  3. Norell Winburn:

    Since your last word was “mañana”, it might be worth including ñ as well!

  4. Daniel Yearwood:

    One of the biggest characters that should have been on this list, and that I was surprised not to see, is the ampersand (&). It is one of the biggest problem children and should probably be one of the most used on the list. Otherwise, great article!

    &

    • Daniel Yearwood:

      &amp; is what I meant to say, but I can see that it was noted in other comments above.

  5. Micah:

    In case you hadn’t read this (I found it interesting myself): Google recommends not using entity references anymore if you already declare UTF-8. It’d be interesting to know a little more about who made this decision and why, but thinking about it made me kinda agree with Google. Encoding entities has become rather pointless for most current desktop/mobile browsers.

    https://google-styleguide.googlecode.com/svn/trunk/htmlcssguide.xml#Entity_references

  6. The trademark symbol does not exist in the ISO 8859-1 (Latin 1) character set. The decimal 153 code point mentioned above is from the Windows 1252 character set. To present the trademark symbol correctly for all your users, you can either always use the entity ™ or ™, but really what you should do is use UTF-8 and just enter the character natively: ™.

    • That should read “…use the entity &trade; or &#8482;”

    • Well spotted. There’s a paragraph about that in the already mentioned article Using character escapes in markup and CSS: “One point worth special note is that values of numeric character references […] are interpreted as Unicode characters – no matter what encoding you use for your document. It is a common error for people working on content encoded in Windows code page 1252, for example, to try to represent the euro sign using &#x80;.”

      ™ has the code point U+2122, hence its character reference is &#x2122;.

      Not only is the numeric character reference for ™ wrong; the same applies to the quotation marks, the bullet sign, and the dashes – all wrong.

      Haven’t I said that it’s better to use hexadecimal numeric references? It’s easier to remember that the ranges x0 to x1F and x80 to x9F contain control characters than in decimals 0 to 31 and 128 to 159.

      Correct the table, please. And replace decimal with hexadecimal references, or add a column and show both.

      • Sounds good to me. I’ve removed the decimal notation column completely. Thanks for all the info guys, I’m sure the discussion will be useful to many readers.

  7. fjpoblam:

    I may be a special case, but I use &iquest; and &iexcl; often. I live in the great desert southwest (Santa Fe, NM), so I have much use for Hispanic diacriticals in some of my local clients’ websites.

  8. Nice article, thanks for this!

  9. Mike:

    Triangle right and Triangle left are different sizes… Inconsistencies are not cool!

    [It’s the symbol that’s wrong, not the post styling]

    • You’re right, I’ve corrected it.

      It seems as though there are Unicode characters for slightly smaller left/right triangles that don’t match any up/down arrows in size, but I’m not sure.

  10. Jan:

    Useful article!

    For a quick conversion of HTML entities I use http://www.html-entities.org

  11. Check out this awesome typography tool for characters. :)

    http://copypastecharacter.com/

  12. This is great. I too had the same problem, so I put together a “searchable cheatsheet” at http://amp-what.com. It fits/works nicely on a phone so it can always be handy.
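
The code-point discussion in comment 6 is easy to check for yourself. The following is only a sketch, assuming Python and its built-in codecs (`cp1252` for Windows 1252, `latin-1` for ISO 8859-1); the variable names are made up for the example.

```python
TM = "™"

# Numeric character references are built from the Unicode code point.
code_point = ord(TM)
hex_ref = f"&#x{code_point:X};"
dec_ref = f"&#{code_point};"

# 153 is only the byte Windows 1252 uses for this character, not its code point.
cp1252_byte = TM.encode("cp1252")[0]

# ISO 8859-1 (Latin 1) has no trademark symbol at all.
try:
    TM.encode("latin-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# The euro sign shows the same trap: byte 0x80 in Windows 1252, code point U+20AC.
euro_byte = "€".encode("cp1252")[0]
euro_ref = f"&#x{ord('€'):X};"

print(f"™ is U+{code_point:04X}: use {hex_ref} or {dec_ref}")
print(f"Windows 1252 byte for ™: {cp1252_byte}; present in Latin 1: {in_latin1}")
print(f"Windows 1252 byte for €: {euro_byte:#x}; reference: {euro_ref}")
```

So the reference has to be built from the Unicode code point (U+2122 for ™), while 153 and 0x80 are only byte values in Windows 1252.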
