LLMs and information post-scarcity

A handful of guesses

Jan 09, 2023

This podcast was generated end-to-end with AI. Give it a listen. It’s shockingly good. The script to generate it is less than 200 lines of glue code.

kache (yacine) @yacineMTB

👀 I wrote a script that - pulled @_akhaliq's last 7 days of tweets - fished out the arxiv links - downloaded raw paper .tex - parsed out intros & conclusions - automated a podcast dialogue about the papers w/ web automation & GPT - generated a podcast scribepod.substack.com/p/scribepod-1

scribepod.substack.com | Scribepod 1 | Listen now (100 min) | 1.5 hours of dialogue about ML papers
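To give a feel for how little glue this takes, here is a minimal sketch of two of the steps the tweet lists: fishing arXiv links out of tweet text, and pulling the intro and conclusion out of raw .tex. This is not the Scribepod script. The function names, the regexes, and the assumption that tweets contain expanded arxiv.org URLs (rather than shortened links) are all mine, and real papers will have messier LaTeX than this handles.

```python
import re

# Assumption: tweet text contains expanded arxiv.org links, not shortened ones.
ARXIV_ID = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)
SECTION = re.compile(r"\\section\*?\{([^}]*)\}")


def find_arxiv_ids(tweets: list[str]) -> list[str]:
    """Return unique arXiv IDs in the order they first appear."""
    seen: list[str] = []
    for text in tweets:
        for paper_id in ARXIV_ID.findall(text):
            if paper_id not in seen:
                seen.append(paper_id)
    return seen


def split_sections(tex: str) -> dict[str, str]:
    """Map lowercased \\section titles to the text that follows them."""
    tex = tex.split(r"\end{document}")[0]  # ignore anything after the document ends
    matches = list(SECTION.finditer(tex))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tex)
        sections[match.group(1).strip().lower()] = tex[match.end():end].strip()
    return sections


def intro_and_conclusion(tex: str) -> tuple[str, str]:
    """Pick out the sections whose titles look like an intro and a conclusion."""
    sections = split_sections(tex)
    intro = next((body for title, body in sections.items() if title.startswith("intro")), "")
    conclusion = next((body for title, body in sections.items() if title.startswith("conclu")), "")
    return intro, conclusion
```

The remaining steps (downloading sources, prompting the model, generating audio) are network and vendor calls, so I've left them out of the sketch.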

First the internet reduced the marginal cost of distributing content to zero. Now AI is reducing the marginal cost of producing content to zero.

Content has become like clay. LLMs can remix it, summarize it, elaborate on it, hallucinate it, combine it with other content, freely transform it between text, audio, image, and back again. It seems we have achieved a kind of information post-scarcity. A regime of radical overproduction. A content singularity. How will this change things?

I’m still attempting to orient to this new condition, to get a sense of the new landscape. Let’s explore it together and make a few guesses…

New abundance creates new scarcities

Scarcity is not an absolute, but a relative bottleneck generated by a difference in rates. The rates of soil nitrogen to grass, grass to rabbits, rabbits to foxes, for example. So, what becomes scarce?

Attention becomes scarce

…but this is not a new thing. It’s the basic condition of the internet. When the marginal cost to distribute information is zero, you get a lot of information.

What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it. (Herbert A. Simon)

Way back in 1995, Hal Varian was quoting Herbert Simon, too, and making the same points:

Technology for producing and distributing information is useless without some way to locate, filter, organize and summarize it. A new profession of “information managers” will have to combine the skills of computer scientists, librarians, publishers and database experts to help us discover and manage information. These human agents will work with software agents that specialize in manipulating information, offspring of indexing programs such as Archie, Veronica and various World Wide Web crawlers that aid Internet navigators today.
(Hal Varian 1995, “How much will two bits be worth in the digital marketplace”)

Hal Varian, by the way, is the person who designed the Google ad auction. He was only partially right about the importance of human “information managers” (we call them influencers). It turns out that aggregators like Google do the bulk of the heavy lifting. Computers are able to scale in ways that people can’t.

So, the condition of superabundance creates a need for aggregation. LLMs will amplify that abundance. It’s a good bet that this strengthens the strategic importance of aggregation.

Trust becomes scarce

Bots can now beat the Turing test. They can trivially plagiarize your style, your tone, soon your voice. Spam, misinformation, identity theft, and spear phishing are getting a massive upgrade.

So what do we do about this world we are living in where content can be created by machines and ascribed to us?
I think we will need to sign everything to signify its validity. When I say sign, I am thinking cryptographically signed, like you sign a transaction in your web3 wallet.
(Fred Wilson, AVC, 2022, “Sign Everything”)

I think this may break the web’s security model. The web has a fundamentally feudal structure. It conceptualizes security as a castle wall around the server. The server controls the keys.

This isn’t going to cut it. We’re going to have to reimagine security at an individual level, around user-owned keys. Fred’s right. We have to sign everything. We need to start thinking about security the way crypto thinks about security. Not your keys, not your data.

This is a net good, in my opinion. The web’s feudal security model is a huge barrier to user-ownership. On the web, you’re a serf. The castle can pull up the drawbridge any time. By contrast, user-owned keys enable user-owned data. Self-sovereign keys give apps one less chokepoint by which to lock you in.

The wallet paradigm seems like the way forward, and Passkeys are probably how this gets implemented.
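To make “sign everything” concrete, here is a toy sketch of signing a piece of content with a key only the user holds, and verifying it with the matching public key. It uses a Lamport one-time signature because that can be built from the Python standard library alone. That choice is my assumption for illustration: wallets and Passkeys use schemes like ECDSA or Ed25519, and a Lamport key must never sign more than one message. The function names are hypothetical.

```python
import hashlib
import secrets


def _bits(message: bytes) -> list[int]:
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = hashlib.sha256(message).digest()
    return [(byte >> (7 - k)) & 1 for byte in digest for k in range(8)]


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(hashlib.sha256(a).digest(), hashlib.sha256(b).digest()) for a, b in private]
    return private, public


def sign(message: bytes, private) -> list[bytes]:
    """Reveal one secret per digest bit. One-time use only."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(message: bytes, signature: list[bytes], public) -> bool:
    """Check each revealed secret hashes to the public value for that bit."""
    if len(signature) != 256:
        return False
    return all(
        hashlib.sha256(revealed).digest() == public[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, _bits(message)))
    )


if __name__ == "__main__":
    private_key, public_key = generate_keypair()
    post = b"I actually wrote this."
    signature = sign(post, private_key)
    print(verify(post, signature, public_key))                      # genuine post
    print(verify(b"I never wrote this.", signature, public_key))    # forged post
```

The point of the sketch is the shape of the trust relationship: the server never holds the private key, and anyone with the public key can check authorship without asking the server.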

Data lock-in is not a moat

Building a business model on data lock-in seems not only bad, but pointless now? Content is superabundant, so what are you locking in, exactly? There’s always more where that came from.

Are social graphs even a moat? Perhaps more than data, but TikTok proves you don’t need a dense social graph to serve up interesting content. What if you can bootstrap a content ecosystem with AI NPCs?

So again, back to aggregators, but aggregators without lock-in? Perhaps this might result in faster emergence of aggregators and also faster collapse. A hotter innovation loop. New aggregators could rapidly emerge and compete, without having to contend with incumbent network effects.

LLMs are a moat, but for how long?

An LLM vendor like OpenAI isn’t an aggregator, as far as I can tell.

An aggregator leverages a monopoly on demand to commodify supply.
Whereas a traditional industrial monopoly leverages a monopoly on supply to extract $ from demand.

LLM vendors seem more like a traditional industrial monopoly. Like an industrial monopoly, the moat here is capital cost—the cost of gathering the data, and the cost of training the model.

But industrial monopolies rely on the means of production remaining scarce. Is this a safe assumption in software? Is it a durable moat?

François Chollet @fchollet

Crucially, any sufficiently successful scenario has its own returns-defeating mechanism built-in: commoditization. *If* LLMs are capable of generating outsized economic returns, the tech will get commoditized. It will become a feature in a bunch of products, built with OSS.

I suspect there will be a lot of pressure to commoditize LLMs. Why? LLMs are the complement to many consumer products, and smart companies try to commoditize their complements:

Something is still going on which very few people in the open source world really understand: a lot of very large public companies, with responsibilities to maximize shareholder value, are investing a lot of money in supporting open source software, usually by paying large teams of programmers to work on it. And that’s what the principle of complements explains.
Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods. So:
Smart companies try to commoditize their products’ complements.
(Joel Spolsky, 2002, “Strategy Letter V”)

Open source models may also be at a competitive advantage because they enable permissionless innovation. The state of the art in AI was all teddy bears and cowboy astronauts—inoffensive images generated by product managers—until Stable Diffusion was released. One good-enough open source model, and we saw a proliferation of product concepts and demos.

Having to ask permission traps ecosystems in local maxima. You can only do what the gatekeeper is able to imagine or value. Not a great way to find product-market fit. Permissionless models have more variety, in the cybernetic sense.

Noosphere fixes this?

It feels like Subconscious and Noosphere are pretty well aligned with this future. I mean, we’re building a tool for thought around AI agents (“Geists”) who participate in a shared knowledge graph with users who sign everything and are verified by their public key. Feels like we’re slouching toward the general direction this tech is headed.

Next week I want to dig into this a bit. What do tools for thought look like in the age of LLMs?