HTML Entity Reference for Common Characters

There are lots of references online where you can quickly search and find the necessary HTML code for embedding all sorts of symbols and characters into your web pages.

I find that most of the references I’ve seen are far too exhaustive. So for my own personal use, I put together a chart of the character entity references that I’ve needed the most.

Obviously, what constitutes “common” varies from developer to developer, but I hope this list covers most of the commonly used symbols and characters.

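| Name | Character | Entity |
| --- | --- | --- |
| Copyright | © | `&copy;` |
| Registered | ® | `&reg;` |
| Trademark | ™ | `&trade;` |
| Curly Open Double Quote | “ | `&ldquo;` |
| Curly Closed Double Quote | ” | `&rdquo;` |
| Curly Open Single Quote | ‘ | `&lsquo;` |
| Curly Closed Single Quote | ’ | `&rsquo;` |
| Big Bullet/Dot | • | `&bull;` |
| Small Bullet/Dot | · | `&middot;` |
| Square Dot | ⋅ | `&sdot;` |
| En Dash | – | `&ndash;` |
| Em Dash | — | `&mdash;` |
| Cents | ¢ | `&cent;` |
| Pound (Currency) | £ | `&pound;` |
| Euro (Currency) | € | `&euro;` |
| Not Equal To | ≠ | `&ne;` |
| Half (Fraction) | ½ | `&frac12;` |
| Quarter (Fraction) | ¼ | `&frac14;` |
| Three-Quarters (Fraction) | ¾ | `&frac34;` |
| Degrees | ° | `&deg;` |
| Left Arrow | ← | `&larr;` |
| Right Arrow | → | `&rarr;` |
| Up Arrow | ↑ | `&uarr;` |
| Down Arrow | ↓ | `&darr;` |
| Lowercase “e”, Grave Accent | è | `&egrave;` |
| Lowercase “e”, Acute Accent | é | `&eacute;` |
| Lowercase “c”, Cedilla | ç | `&ccedil;` |
| Ellipsis | … | `&hellip;` |
| Triangle Down | | (none) |
| Triangle Up | | (none) |
| Triangle Left | | (none) |
| Triangle Right | | (none) |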
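
If you would rather generate a row for a character than look it up, here is a minimal sketch in Python. It is only an illustration, not how the chart was built: `references_for` is a hypothetical helper name, and it relies on the standard library's `html.entities.codepoint2name`, which only knows the HTML 4 entity names. The triangle (U+25BC) is my own example of a character without such a name.

```python
from html.entities import codepoint2name


def references_for(char: str) -> dict:
    """Return the code point, named reference (if any), and hex reference."""
    code_point = ord(char)
    # codepoint2name covers HTML 4 names only; anything else yields None.
    name = codepoint2name.get(code_point)
    return {
        "code_point": f"U+{code_point:04X}",
        "named": f"&{name};" if name else None,
        "hex": f"&#x{code_point:X};",
    }


# U+25BC (a black down-pointing triangle) is an assumed example of a
# character that has no HTML 4 entity name.
for ch in ["©", "™", "≠", "\u25BC"]:
    print(ch, references_for(ch))
```

A character with no named entity can still be written with its hexadecimal numeric reference, which is built directly from the Unicode code point.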

Final Notes

In HTML5, as far as I understand, you could technically just copy and paste the character right into your document and it will validate just fine (and as pointed out in the comments, this is the strongly preferred method). If you’re concerned about how these characters are handled when entered into a database or form, then you might want to check out this article on Smashing Magazine, along with the comments.
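
To make that concrete, here is a small Python sketch, assuming your text is handled as Unicode and the page is saved and served as UTF-8. It shows that a named reference, a numeric reference, and the pasted character all end up as the same text, and that only the HTML syntax characters need escaping. `to_html_text` is a hypothetical helper name; `html.escape` and `html.unescape` are from Python's standard library.

```python
import html

# A named reference and a hexadecimal reference both decode to the
# same character you could have pasted in directly.
assert html.unescape("&copy;") == html.unescape("&#xA9;") == "©"


def to_html_text(text: str) -> str:
    """Escape only the HTML syntax characters (&, <, >, and quotes)."""
    return html.escape(text, quote=True)


original = 'Fish & Chips <£5> "½ price" ©'
escaped = to_html_text(original)
print(escaped)

# Symbols such as £, ½ and © pass through untouched, and decoding the
# escaped text gives back exactly what was typed.
assert html.unescape(escaped) == original
```

The same idea applies to text coming from a form or database: keep the real characters in storage and escape only the syntax characters at the moment you write them into markup.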

Also, pretty much any symbol can be found with a quick Google search. But like I said, many of the references are too comprehensive and contain a lot of extra stuff that I don’t need.

Finally, when deciding what is “common”, keep in mind that I didn’t include any symbol that has its own key on most keyboards (e.g. “%” or “@”). Comment if you feel I’ve left out anything that qualifies as “common”.

34 Responses

  1. You should add these too:
    &raquo; » Right Arrow Quote
    &laquo; « Left Arrow Quote
    &rsaquo; › Right Arrow Single Quote
    &lsaquo; ‹ Left Arrow Single Quote

    • Yeah, I was on the fence about those, because technically I believe they are French quotation marks, no? I guess they do come in handy as pointers, but I thought I remembered someone saying that it’s not good to use those characters as arrows, since they will be parsed as quotation marks. Although I’m not sure how big of a deal that really is.

      I’ll consider adding them, thanks.

      • Then you might also want to consider including „ and ‚ (not a comma), which are used in many languages as opening quotation marks. Their entity references are &bdquo; and &sbquo; [sic!] (That’s how it made it into the spec; should have been &bsquo; though.)

        There are also characters for eighth fractions… Are they “common”?

        What is this table for, anyway? OS (Windows, MacOS, …) provide character tables where the user can pick up from and include the real characters into the source code. No need for escapes.

        • Well, two reasons I put this together:

          1) Most references (including OS character tables) are too comprehensive, sometimes requiring you to fish for common stuff like a simple arrow. Maybe I haven’t found a quicker way to do it yet..? But it seemed more convenient to have the most common ones on a single page.

          2) I assumed that many characters still needed to be entered as entities, depending on how you were using them. But as this discussion in the comments has shown, it’s clear that entering the characters directly is the best method. I guess I’m still somewhat accustomed to escaping everything, for whatever reasons. This has been a good reminder, thanks.

  2. Jan! says:

    “In HTML5, as far as I understand, you could technically just copy and paste the character right into your document and it will validate just fine. This is different from XHTML validation, which requires that you use the named or numeric entity reference to validate.”

    This seems incorrect to me. If you use (and specify) UTF-8 (or another Unicode encoding), you can use any character as-is. Entities are too much trouble, in general.

    PS: Could you fix the tabindex for this comment form? When I am in this textarea, pressing Tab does not put me in the name input, even though it is directly below the comment box.

  3. This is different from XHTML validation, which requires that you use the named or numeric entity reference to validate.

    No. Not. At. All.

    Of course you can use any Unicode character (sans a few control characters) in XHTML (or XML in general) – as long as you use an appropriate character encoding (UTF-8 being best practice).

    You even should do so: “It is almost always preferable to use an encoding that allows you to represent characters in their normal form, rather than using character entity references or NCRs [numeric character references]. Using escapes can make it difficult to read and maintain source code, and can also significantly increase file size.” (Using character escapes in markup and CSS)

    The article lists exceptions: syntax characters, not-easily entered characters, invisible or ambiguous characters.

    when deciding what is “common”, keep in mind that I didn’t include any symbol that has its own key on most keyboards

    There are characters that do have a key on most keyboards, but must be escaped in HTML code: < and &. Also " in double-quoted attribute values and ' in single-quoted attribute values.

    You might want to add those characters to your table. Keep in mind that apos was not defined in HTML 4, so &apos; might not work in older browsers.

    BTW, there’s no such thing as “numeric entity reference”. There’s character entity references (references to character entities, eg. &amp; refers to the entity amp) and numeric character references (no “entity” there; eg. hexadecimal & or decimal & refer to the code point U+0026).

    But don’t use decimal numeric character references! Unicode code points are always given in hexadecimal numbers, eg. U+0026. There’s really no reason for the decimal notation. It only causes trouble and effort to convert decimal to hexadecimal when looking up a character in a chart. (Browsers that don’t support hexadecimal numeric character references are looooong gone.)

    • You’re right, Gunnar. Thank you. For XHTML, I was thinking of the fact that certain characters (like “&”) had to be entered with their entity value in order to validate. I was used to always entering symbols with their entity or numeric values. Been too long since I worked on an XHTML document.

      I didn’t want to get into too many details about character encoding, because it’s really over my head, which is why I referenced that SM article that has some good info and comments.

      • Certain characters (as mentioned < and &, also ' and " in attribute values), yes. But that’s not XHTML-specific, it applies to HTML as well, cf. chapter 5.3.2 and appendix B.2.2 in the HTML 4.01 spec.

        (The HTML5 spec is such a mess that I couldn’t easily find it there.)

        It might be possible not to escape those characters in HTML when they are followed by a (white)space, though.

        HTML: “you should, but need not under certain circumstances”
        XHTML: “you must”

        That’s what makes me prefer XHTML (or polyglot markup in HTML5-speak) any time.

        As for articles on character encodings, the W3C Internationalization site is a good source.

    • Geez, it should have read: hexadecimal & or decimal & refer to the code point U+0026

      Have I failed despite the nice preview?

      Wow, a preview for comments! And a hint above the input field. And a warning to check your comment if there’s HTML in it. Nice! I really miss that in other blogs.

      • Haha… thanks. It does cause a lot of delays in commenting though. But thanks for approving of it, as I have considered removing those things. :)

      • Ha! I’ve typed & a m p ; # x 2 6 ; and & a m p ; # 3 8 ; this time (like the time before, I guess), so it’s not my fault that it hasn’t appeared as it should have.

        Why did it work on & a m p ; a p o s ; but not on numerical references? Because of the <code> tags? Check: & & &

        Let’s try to trick the system: It should have read: hexadecimal &⁠#x26; or decimal &⁠#38; refer to the code point U+0026

        • Hmm… I’m not sure. I’m assuming WordPress is doing something to eat up those characters and convert them to “as-is” characters. I’ll have to do some testing to see what’s going on.

          • The funny (read: bad) thing is that the preview differs from the appearance once the comment gets submitted. I’m pretty sure I had checked that my comments looked OK in the preview. However, it turned out they did not in the real thing.

          • I know. It has flaws, but it’s better than nothing I guess.

            Usually if people post two comments in a row correcting something in the first comment, I’ll just edit the comment to fix it and delete the second comment.

  4. Norell Winburn says:

    Since your last word was “mañana”, it might be worth including ñ as well!

  5. Daniel Yearwood says:

    One of the biggest characters that should have been on this list that I was surprised not to was the ampersand &. It is one of the biggest problem children and should probably be one of the most used on the list. Otherwise, great article!

    &

  6. Micah says:

    In case you hadn’t read this (I found it interesting myself): Google recommends not using entity references anymore if you already declare UTF-8. It’d be interesting to know a little more about who made this decision and why, but thinking about it made me kinda agree with Google. Encoding entities has become rather pointless for most current desktop/mobile browsers.

    https://google-styleguide.googlecode.com/svn/trunk/htmlcssguide.xml#Entity_references

  7. Fred Condo says:

    The trademark symbol does not exist in the ISO 8859-1 (Latin 1) character set. The decimal 153 code point mentioned above is from the Windows 1252 character set. To present the trademark symbol correctly for all your users, you can either always use the entity &trade; or ™ — but really what you should do is use UTF8 and just enter the character natively: ™.

    • Fred Condo says:

      That should read “…use the entity &trade; or ™”

    • Well spotted. There’s a paragraph about that in the already mentioned article Using character escapes in markup and CSS: “One point worth special note is that values of numeric character references […] are interpreted as Unicode characters – no matter what encoding you use for your document. It is a common error for people working on content encoded in Windows code page 1252, for example, to try to represent the euro sign using &⁠#x80;.”

      ™ has the code point U+2122, hence its character reference is &⁠#x2122;.

      It’s not only the numeric character reference for ™ that is wrong; the same applies to the quotation marks, the bullet sign, and the dashes. All wrong.

      Haven’t I said that it’s better to use hexadecimal numeric references? It’s easier to remember that the ranges x0 to x1F and x80 to x9F contain control characters than in decimals 0 to 31 and 128 to 159.

      Correct the table, please. And replace decimal with hexadecimal references, or add a column and show both.

      • Sounds good to me. I’ve removed the decimal notation column completely. Thanks for all the info guys, I’m sure the discussion will be useful to many readers.

  8. fjpoblam says:

    I may be a special case, but I use &iquest; and &iexcl; often. I live in the great desert southwest (Santa Fe, NM), so I have much use for Hispanic diacriticals in some of my local clients’ websites.

  9. synlag says:

    Nice article, thanks for this!

  10. Mike says:

    Triangle right and Triangle left are different sizes… Inconsistencies are not cool!

    [It’s the symbol that’s wrong, not the post styling]

    • You’re right, I’ve corrected it.

      It seems as though there are Unicode characters for slightly smaller left/right triangles that don’t match any up/down triangles in size, but I’m not sure.

  11. Jan says:

    Useful article!

    For a quick conversion of HTML entities I use http://www.html-entities.org

  12. Bryan Hyshka says:

    Check out this awesome typography tool for characters. :)

    http://copypastecharacter.com/

  13. Andy P says:

    This is great. I too had the same problem, so I put together a “searchable cheatsheet” at http://amp-what.com. It fits/works nicely on a phone so it can always be handy.
