A Call For Better Fragment Identifiers

Where would the web be without links? Links are what hold together what we know as the World Wide Web. Without links, the World Wide Web would be more appropriately called the World Wide Set Of Unrelated Pages, or, incidentally, WWSOUP.

While the process of “linking” pages together is wonderfully simple and effective, I think there’s room for improvement.

If you’ve never heard of the term fragment identifier, well, that’s just the official name for the part of a URL that follows the hash symbol (“#”). Some people refer to links with fragment identifiers as “in page links”. So for example, in the following URL, the fragment identifier would be the string “scroll-to-fragid”:

http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#scroll-to-fragid

If you visit the above URL (which points to the part of the WHATWG HTML5 spec that discusses fragment identifiers), the page will automatically jump to the section that’s been identified by the browser as the “scroll-to-fragid” section.
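Scripts can read the fragment identifier too. Here’s a minimal sketch, assuming an environment that provides the standard URL class (current browsers and Node.js do); the variable names are just mine for illustration:

```javascript
// Pull the fragment identifier out of a URL with the standard URL class.
const url = new URL(
  "http://www.whatwg.org/specs/web-apps/current-work/multipage/history.html#scroll-to-fragid"
);

// url.hash keeps the leading "#", so slice it off to get the bare identifier.
const fragmentId = url.hash.slice(1);

console.log(fragmentId); // "scroll-to-fragid"
```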

Fragment identifiers also come in handy for deep linking and preserving state in Ajax-based applications. So these certainly have an important role on the web.

How Are Fragments Identified?

In order for the browser to correctly identify which section of the page the window should scroll to, the fragment needs to be identified within the HTML by means of the id attribute (or, in older markup, a named anchor such as <a name="...">). This means that if a web developer hasn’t done their job, then there will be no way to link to a specific fragment of any particular document.

So, if you were linking to a document that was five screens long that didn’t have any id attributes in the source, and you wanted to link to a specific section three screens down, you would have no way to do this. You’d have to link to it, then place the words “Scroll down to section B” or something ridiculous like that.

The Problem With Fragment Identifiers

The simple problem that I see with fragment identifiers is that their existence and functionality rely completely on the developer rather than the browser. Yes, the browser needs to read and interpret the identifier and identify the matching fragment. But if the developer doesn’t include any id attributes in the HTML of the page, then there will be no identifiable fragments.

Do you see why this is a problem? Whether the developer has coded identifiers into the HTML has nothing to do with whether or not the page actually has fragments. Virtually every web page has fragments. In fact, sectioning content as defined in the HTML5 spec implies as much. Every element on the page that can contain content can theoretically be categorized as a “fragment”.

So why is it up to the developer (or content creator) to define whether or not a specific portion of the content can be linked to? When any page of content is created, there is no way of knowing which sections of the page are worthy of being identified. The developer or content creator may have a general idea of how a page’s content might be divided up, but ultimately it will be the linking resource that should have full control over what portion of the page they want to highlight.

That, after all, is how linking works. A page that’s displayed as a result of a web-based hyperlink is displayed to the end user only because the referrer (i.e. the page linking to it) defined the link that way. This means that, regardless of what the developer has done behind the scenes in the HTML, all HTML fragments on that page should be identifiable by external referrers.

The Solution: Power to the Browser and User

The solution, as I see it, is for the HTML spec to require that browsers have an internal mechanism for identifying fragments that can optionally be overridden by the developer. Just as the browser, by default, makes all links blue and underlines them, and allows these styles to be changed via CSS, likewise the ability to link to specific sections of a page should be built into the browser, and then the developer should have the option to change this.

Here’s a simple example of how this might be implemented. Suppose you have the following HTML page:

<h1>Page Title</h1>
<p>Some introductory text.</p>

<h2>Page Subhead 1</h2>
<p>Some text for subhead 1.</p>

<h2>Page Subhead 2</h2>
<p>Some text for subhead 2.</p>

<h2>Page Subhead 3</h2>
<p>Some text for subhead 3.</p>

<h2>Page Subhead 4</h2>
<p>Some text for subhead 4.</p>

This type of structure is common to almost all blog posts. The post is divided into sections by means of headings, but unless the developer actually hard-codes id attributes onto each heading tag, there is no way to link to any of those unique sections of the page.

To solve this problem, the browser should allow native fragment identifiers that use the HTML elements themselves in a CSS selector-like fashion. So if you wanted to link to “Page Subhead 3” in that HTML page, you could do something like this:

<a href="http://www.example.com/example.html#h2:3">Check this out!</a>

Notice the string h2:3 that appears after the hash symbol. This tells the browser to link to the third <h2> element on the page. This example, of course, is just theoretical, and not meant to imply that this is the way it will be implemented. This is just to illustrate how it could be done without being dependent on developer-added attributes.
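To make the idea a bit more concrete, here’s a rough sketch of how a script (or, in principle, a browser) might resolve such a fragment. Everything here is an assumption for illustration: the function names parsePositionalFragment and resolveFragment are hypothetical, the tag:position syntax is just the theoretical one from my example, and letting a developer-supplied id win over the positional lookup is only one possible reading of “can optionally be overridden by the developer”.

```javascript
// Hypothetical helper: split a fragment such as "h2:3" into a tag name
// and a 1-based position. Returns null for anything else (e.g. a plain id).
function parsePositionalFragment(fragment) {
  const match = /^([a-z][a-z0-9]*):([1-9][0-9]*)$/i.exec(fragment);
  if (!match) return null;
  return { tagName: match[1].toLowerCase(), position: Number(match[2]) };
}

// Hypothetical helper: find the target element in a DOM-like document.
// A matching id (the developer's own choice) is tried first; only then
// is the fragment treated as "the Nth element with this tag name".
function resolveFragment(doc, fragment) {
  const byId = doc.getElementById(fragment);
  if (byId) return byId;

  const parsed = parsePositionalFragment(fragment);
  if (!parsed) return null;

  const candidates = doc.getElementsByTagName(parsed.tagName);
  return candidates[parsed.position - 1] || null;
}
```

In a browser, passing document and the current hash (minus the “#”) to resolveFragment and calling scrollIntoView() on the result would be enough to try this out on the example page above.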

Why Should Fragments Be Identified By Users?

The reason fragments should be identifiable by users is that a user, not the content creator or the developer, will ultimately decide whether or not a portion of content is valuable or notable in some way.

Yes, the content creator should have the ability to decide how a page is generally divided, if they choose to do so. But the end user should not be restricted from linking to content fragments just because a developer couldn’t be bothered to add id attributes to every element on the page. And that’s aside from the fact that it would be a waste of time for a developer to do that or to have to build a CMS that does it automatically.

Blog Comments Get It Right

Linking directly to someone’s blog comment is very useful. Even if a blog doesn’t have an active link for each comment, it’s pretty easy to use developer tools to find the comment’s id and link to it. I’ve done this many times on Smashing Magazine (they don’t have live links on each comment).

If there were no way to link to an individual blog comment, this would be a great hindrance to linking on the web. It would not be enough to link to the “#comments” section and then hope for the best. So CMSs like WordPress do the right thing by dynamically adding a unique identifier to each comment.

As mentioned, this saves the content creator from having to do it themselves, and puts the identifiability (or, the decision on what’s valuable) in the hands of the user or the referring website.

It’s Already in the Works

Being fearful of writing an article like this and having someone smarter poke holes in my proposal, I ran a draft of this piece by Paul Irish and he pointed out that an improvement to fragment identifiers is already in the works, but in very early stages.

A developer named Simon St. Laurent is hosting an “unofficial draft” of a specification called Using CSS Selectors as Fragment Identifiers. The draft is authored by St. Laurent and Eric Meyer and seems to have been in the works for about a year (based on the date on that page). There’s even a jQuery script with a GitHub repo that attempts to implement this new type of fragment identifier. (Thanks to Ahmad for the GitHub link.)

And on a related note, media fragments (i.e. deep linking in audio and video, similar to what you can do on YouTube) have now been introduced and have some browser support (evidently WebKit and Firefox). Check out this part of the spec for the syntax.
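For a sense of what a media fragment looks like, the temporal form is written as t=start,end, so a hypothetical link such as http://www.example.com/video.webm#t=10,20 would point at seconds 10 through 20 of the video. Below is a minimal sketch of reading such a fragment. The function name parseTemporalFragment is my own, and it only handles plain seconds, not the other time formats the spec allows:

```javascript
// Hypothetical helper: read a temporal media fragment such as "t=10,20".
// Only plain seconds are handled; other time formats are ignored here.
function parseTemporalFragment(fragment) {
  const match = /^t=(\d+(?:\.\d+)?)?(?:,(\d+(?:\.\d+)?))?$/.exec(fragment);
  // Reject non-matching input and the empty form "t=".
  if (!match || (match[1] === undefined && match[2] === undefined)) return null;
  return {
    // A missing start means "from the beginning".
    start: match[1] === undefined ? 0 : Number(match[1]),
    // A missing end means "play to the end", represented here as null.
    end: match[2] === undefined ? null : Number(match[2]),
  };
}

console.log(parseTemporalFragment("t=10,20")); // { start: 10, end: 20 }
```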

All credit to Paul Irish for filling me in on these details.

Conclusion

Although implementing better fragment identifiers could be a challenge to support and publicize, for the reasons I’ve explained here, I think it’s a worthwhile addition to the HTML/CSS spec. I’m glad someone is already working on a proposal for this, and I hope this article serves to help make this known so that control of linking to content fragments ends up where it’s supposed to be: in the hands of users.

Scott says:

February 2, 2012 at 7:53 am

I like the idea in theory, however it doesn’t feel very robust – you are relying on the layout of the page in question. Many authors regularly revisit articles and add new sections and paragraphs. Now your #h2:3 link points to a previous section. (This is far less likely to happen with IDs.)

Using full CSS selectors feels even worse as sites can redesign and your link to .main-content > h2:nth-child(2) (which itself is horribly verbose) is now broken.

wheresrhys says:

February 2, 2012 at 10:20 am

Although I do like the idea in principle it’s difficult to see how it could be implemented. Scott above makes a valid point for a start. Also, how would an average user (by which I mean not somebody who knows CSS) create a link? Browsers would have to come up with an easy to use interface for doing so and also, I would guess, some pretty sophisticated heuristics to choose the appropriate selector (e.g. let’s say a user right clicks a point in the document and chooses “link to here”, how does the browser know whether to use e.g. p:3, or p.bodyText:1, and how can it cope with users clicking slightly inaccurately – how can it guess that a user meant to click on an h3 tag if they click 1px outside and hit the page’s wrapper).

Perhaps a subset of html could be linkable via the tagname and title attribute, which also has the advantage of keeping the fragment readable to the lay user, e.g.

http://mysite.com/myarticle#section:It's fine in practice, but does it work in principle?

Which isn’t much better than the existing id only linking, and also depends on developers doing things right.

Patrick Samphire says:

February 2, 2012 at 1:46 pm

I’m inclined, reluctantly, to agree with the first two comments. The content of a page is far too dynamic and subject to change for this to be at all reliable, and I think it would lead to far too many incorrect links as content evolved to be worth implementing.

That’s not to say that the current system is great, but at the very least browsers could offer the option to link to any ids detected in the page (or at least those appearing on certain elements; not sure there would be much call for linking to individual nav items, for example).

Ahmad Alfy says:

February 2, 2012 at 8:49 pm

Just if you wanna watch for updates for the script mentioned above, here is a link to the script on GitHub

https://github.com/vsa-partners/jQuery-Fragment-ID/tree/master/js

Thanks to Jeremy Kahn

Louis Lazaris says:

February 3, 2012 at 4:01 pm

Nice, thanks. I’ll link to that as well, as that will be more useful I think.

Larry Botha says:

February 6, 2012 at 3:32 am

Good article, Louis.

For this very reason, we have the hx’s on our site dynamically create id’s. I love being able to direct people directly to a particular part of a page when I find something interesting, but the problem is that only we, as developers, know how to do this.

I’m glad to hear that user friendly frag id’s are in the works.

itmitică says:

February 6, 2012 at 8:02 am

The only way I can think of this being possible, without imposing any further restrictions on the future development of the document, is by allowing users/browsers to author others’ documents, i.e. adding their own “id”s for the fragments of their choosing.

In documents that don’t belong to them. That doesn’t sound good, does it?

Or, a central warehouse will have to record every bit of fragment pointer ever constructed by browsers/users and re-link when needed/asked. Who would host and manage that, since the task will become comparable in challenge with the task a search engine has, growing bigger and bigger each year.

This also doesn’t sound practical: a search engine for fragment pointing?

Otherwise, imposing restrictions will never ever work as grounds for an obscure goal. They didn’t work well for the assistive part, did they? And you could see the value of it, whereas with the fragment stuff, the benefits are void for many.

Nick Dunn says:

February 14, 2012 at 8:10 am

Shaun Inman had a similar idea last year that led to the creation of CSSFrag:

http://shauninman.com/archive/2011/07/25/cssfrag

Rhys Burnie says:

February 14, 2012 at 8:28 pm

The problem with the #h2:3 idea is that in future page edits more content, possibly including another h2, may be added above the h2 in question. The result would be that any link with #h2:3 published or bookmarked elsewhere would no longer go to the correct target. This is the whole point of using the id attribute: an id is supposed to be unique to the page, and can convey more meaning than something like #h2:3.

Robert (Jamie) Munro says:

September 12, 2012 at 10:55 am

A long time ago, Gervase Markham proposed using a plain text search of the document content for this purpose, and released a Greasemonkey script as an example implementation:
http://www.gerv.net/software/fragment-search/

I think these identifiers are easy to understand and can be less fragile than CSS, and they are also easy to generate with a simple UI. The user just selects a few words of text, and they become the URL, with a count added if the same text occurs multiple times on the page.

dret says:

September 25, 2013 at 1:55 pm

when HTML5 started, the feedback from the HTML5 guys was pretty clear: HTML5 is there to improve web apps (standards-based flash! yay!), and not to improve HTML as a hypermedia format. http://dret.typepad.com/dretblog/2008/05/xhtml-fragment.html was a very early attempt to raise the issue and was shot down promptly. with HTML5 now branching into so many micro-specs (https://github.com/dret/HTML5-overview), maybe there’s a good chance to simply create a “FragIDs in HTML5” spec and see if there’s any community uptake. it would be great to see this getting started, and maybe IETF with its more open process would be a better place than W3C.