Block reference mechanisms

Slouching toward Xanadu

Feb 22, 2022

Writing is linear, serial, and one side-effect of communicating serially is that you have to force ideas into some sort of order.

And so: stories. Authors flatten clouds of connected ideas into linear stories (lossy). Readers unbundle these linear stories, and stitch them into their own nonlinear cloud of connected ideas (lossy).

But each story is just one possible path through the hyperdimensional landscape of reality. What if I want to take a detour?

In an important sense there are no "subjects" at all; there is only all knowledge, since the cross-connections among the myriad topics of this world simply cannot be divided up neatly.
Ted Nelson, 1974, “Computer Lib/Dream Machines”

Ted Nelson calls this “The Framing problem”. In what ways might we deconstruct stories into smaller thought legos that we can piece together into new ideas?

*Ted Nelson,* 1974, *“Computer Lib/Dream Machines”*

Actually, we do this all the time. In science and other academic research it’s called “citation”. It’s just that most of our tools don’t seem to be designed with this in mind.

How to construct, deconstruct, and reconstruct thoughts into new thoughts? This was the original motivation behind hypertext, and there are a handful of software mechanisms that can help us do better than blue links. Here’s a roundup of approaches I’ve seen in the wild.

Break docs down into smaller thought legos

Perhaps the most direct way to make thought legos is to break big documents down into smaller, reusable blocks.

Most written documents are made up of something like a series of smaller “idea chunks”, usually paragraphs. What if we blew these documents apart, so that each “idea chunk” became a thought lego, a block we could address?

What might this look like? Well, if you’re saving data to files, instead of saving a document to a file, you might save every block (paragraph) into it’s own separate file. If you’re using a database, you might save one database entry per block. Each of those blocks is now a discrete unit, with an address you can reference.

What is a document in this world? A list of block addresses.

If I read it right, this is more or less how Roam’s block-based hypertext works. This approach also has a passing resemblance to the the “hypertext trails” that Vannevar Bush envisioned in As We May Think.

A tradeoff: while individual blocks are useful ingredients, they are not always useful as standalone works. Blocks lack context, and may need to be stitched together into larger narratives before they make sense. A folder full of ID’d files that contain paragraphs is likely to be overwhelming, as compared to a single document. Then again, many applications save to custom file formats or opaque databases anyway, so this may not make a difference. But, in general, as a reader, I want documents, as a writer, I want blocks.

Purple numbers place IDs in prose

Purple numbers are unique IDs that are automatically generated and placed within paragraphs, making them uniquely addressable.

Why Purple Numbers? The concept was popularized by a piece of wiki software back in 2001, called “Purple”.

Purple numbers are stable. That is, they stay the same once generated, even if the text around them changes. This gives you a “hook” to link to that won’t disappear.

Obsidian uses Purple Numbers to implement block-based addressing.

Once you hit enter, a link to that block will be generated for you, in the format similar to [[filename#^dcf64c]], where dcf64c is the block ID that was just generated for you.
Obsidian documentation

When you create a link to a paragraph in Obsidian, it generates a Purple Number, and literally sticks it at the end of the paragraph in question.

In HTML, you might implement Purple Numbers through writing IDs into elements.

Purple Numbers are a clever hack because you can work them into many existing kinds of systems. You don’t have to reinvent the document format, or cut it up into many pieces. You just stick a few ID tags in useful places. It’s like dog-earing the page of a book to find your way back.

More: eekim.com/software/purple/purple.html, burningchrome.com/~cdent/mt/archives/000388.html, burningchrome.com/~cdent/mt/archives/000387.html, tbray.org/ongoing/When/200x/2004/05/29/PurpleNumbers

Offset links reference ranges in documents

So far, we’ve looked at methods that break documents down into smaller pieces, but when you link to a block, you’re still fundamentally working within the boundary lines that were drawn by the original author. What if you want to cross the boundaries? What if the boundaries that made sense for their work make no sense for yours? What if you want to reference part of a work that spans less than a block, or spans just parts of two or more blocks?

Ted Nelson’s Project Xanadu uses a clever technique I’ll call offset linking to carve out exactly the pieces of a document that you want to reference.

At the heart of Xanadu is a file called an EDL, or Edit Decision List. An EDL is a list of links, together with a range. The range describes a start position, offset from the beginning of the document, plus a length.

span: http://hyperland.com/xuCambDemo/WelcXu-D1y,start=25,length=567
span: http://xanadu.com/xanadox/MoeJuste/sources/0-Moe.pscr.txt,start=7995,length=274
span: http://hyperland.com/xuCambDemo/WelcXu-D1y,start=592,length=37

Conceptually, offset ranges could be based on text offsets (counting characters), or byte offsets (counting bytes), or something else. In Xanadu, they’re text offsets, IIRC.

EDLs are a kind of standoff markup. Unlike HTML, the markup isn’t in the text, it’s in the EDL, which points to the text documents. The idea is that the client application follows these links, looks up the text in the offset range, carves out that bit of text, and transludes it when it renders the EDL. Neat!

An under-appreciated feature of standoff markup is that it allows you to mark up the same source content content in many different ways. You could have different standoff markup for different devices, or different remixes of the same source documents. Standoff markup also doesn’t need to be a strict tree, it can overlap in interesting ways.

One challenge with offset links is that they are very brittle. If the document you are linking to changes, even a little, the link breaks. You can fall back to linking the whole document, but it’s not ideal.

Here’s a loose thread I’d like to pursue at some point: the brittleness of offset links is due to mutability. The thing I’m linking to might mutate, change out from under me, and I would never know it. But what if we paired offset linking with an immutable hypermedia protocol?

IPFS addresses files by hashing their contents. Hashing is a deterministic way of converting data into an ID. Same data, same ID. To look up a file on IPFS, you might use an ID like this:

Qme7ss3ARVgxv6rXqVPiikMJ8u2NLgmgszg13pYrDKEoiu

Yeah, it’s ugly, but the beautiful thing about hashes is that they never change. The same hash always pulls up exactly the same content on IPFS. If the content changes, the address changes too. This makes content on IPFS immutable. A link made up of a hash and an offset would continue to point to exactly the same thing forever (or as long as that file is alive on the network).

Text fragments select excerpts by search

But there’s another way to mitigate the brittleness of offsets. The Web Platform Incubator Community Group recently introduced Text Fragment links. This is hands down my favorite web feature since the <a> tag.

The basic notion is that you can reference parts of a document by including a snippet of the text you want to reference in the URL. Here’s a text fragment link:

https://en.wikipedia.org/wiki/Vannevar_Bush#:~:text=wholly%20new%20forms%20of%20encyclopedias%20will%20appear%2C%20ready%20made%20with%20a%20mesh%20of%20associative%20trails%20running%20through%20them%2C%20ready%20to%20be%20dropped%20into%20the%20memex%20and%20there%20amplified

Click it, and it will link you to exactly the passage I wanted to share with you.

Like the Offset Links in EDLs, Text Fragment links are a kind of standoff markup. You and I can reference the same document, but point to different parts. Unlike Purple Numbers, text fragment links don’t require the cooperation of the document being pointed to.

But the real brilliance of text fragments is that they’re resilient to change. Offset links are sensitive to global change. With an offset link, a change anywhere in the document above the offset will break the link. By contrast, text fragment links are only sensitive to local change. A text fragment link will only break if the specific text fragment being referenced disappears from the document.

Even better, text fragments work with the web we already have. Here’s me creating a text fragment link in Chrome:

I think text fragments might be the most beautiful hack of all. They transform the web into a slapdash Xanadu by exapting existing infrastructure.

More: web.dev/text-fragments, wicg.github.io/scroll-to-text-fragment

Any others?

These are all the approaches I’ve seen in the wild. Have I missed any? Let me know.