June 7 2024
How does AI shape the future of the browser?
One view is that we’ll have a browser “copilot” that fills out forms, summarizes page content, and remembers stuff we’ve seen before. Presumably, it would also be general purpose like ChatGPT or Claude. In all likelihood, the assistant would just be ChatGPT or Claude.
The browser is probably going to be a battleground for assistant products. There’s no reason to go to chat.openai.com if there’s an equivalently useful assistant immediately present in your browser window. For this reason, I suspect OpenAI or Anthropic will end up building or buying their own browser product (perhaps the one from NY). As I've written about before, this is Google's battle to lose.
In any case, I find the vision of a built-in browser assistant pretty underwhelming. It’s fine and good, and I’ll use the heck out of it. But I can’t shake the feeling that we have the potential to build something significantly better than this. Something that goes beyond chatbots, semantic search, and browser RPA.
I think we should reimagine the web browser as a tool for generating and transforming web content.
In the last few months, I've seen a few projects that hint at this future.
Max Krieger (@maxkriegers), May 26, 2024: "introducing delve: a ChatGPT interface for going down rabbit holes 👉 delve . a9 . io"
Each of these shows the promise of generated web content. In the last paradigm of the web, the fundamental challenge was finding content that users wanted. In the new paradigm, users don’t need to search for content; they manifest it. Sometimes that generated content is extracted purely from the model; other times, the model remixes and transforms what already exists into a superior form.
What I find most fascinating is the use of generative web content to satisfy complex reasoning tasks. These workflows follow a consistent grammar. Drawing on a particular body of content (a set of documents, search results, or web history) and a task description, an assistant generates rich web content.
Some examples of the sorts of interactions I have in mind:
A key attribute of these workflows is that they’re iterative. The user can perform a single operation (“what are all the backpacks I’ve looked at”) and then iterate toward what they want (“remove all backpacks over $200” and “generate a catalog”). This could all happen in a single request or in sequence.
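To make the backpack example concrete, here is a minimal sketch of that iteration as three chained steps. Everything in it is an assumption for illustration: the `Product` record, the function names, and the sample history are made up, and the hand-written functions stand in for what a model would do when interpreting each natural-language request.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class Product:
    """Hypothetical record for a product page found in browsing history."""
    name: str
    price_usd: float
    url: str


def viewed_backpacks(history: list[Product]) -> list[Product]:
    """Step 1: "what are all the backpacks I've looked at" (naive keyword match)."""
    return [p for p in history if "backpack" in p.name.lower()]


def at_most(items: list[Product], max_price_usd: float) -> list[Product]:
    """Step 2: "remove all backpacks over $200"."""
    return [p for p in items if p.price_usd <= max_price_usd]


def render_catalog(items: list[Product]) -> str:
    """Step 3: "generate a catalog" as a small HTML fragment."""
    rows = "\n".join(
        f'  <li><a href="{escape(p.url)}">{escape(p.name)}</a>: ${p.price_usd:.2f}</li>'
        for p in items
    )
    return f"<ul>\n{rows}\n</ul>"


# Made-up browsing history, for illustration only.
history = [
    Product("Trail Backpack 40L", 180.0, "https://example.com/trail-40"),
    Product("Summit Backpack 65L", 320.0, "https://example.com/summit-65"),
    Product("Canvas Tote", 45.0, "https://example.com/tote"),
]

# The same three steps can run as one request (composed) or one at a time.
catalog = render_catalog(at_most(viewed_backpacks(history), 200.0))
print(catalog)
```

Each step takes the previous step's output as its input, which is what lets the user stop, inspect, and refine at any point rather than specifying everything up front.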
The more we bake AI superpowers into the browser, the more likely it becomes that the browser eats up workflows that have nothing to do with the web. A browser that is good at viewing, transforming, and generating various forms of content (hypertext among others) would be a general-purpose application. We’d use it for traditional web content, but we’d also use it for reading ebooks and PDFs or consuming other media living on our local filesystem. The generative browser is really just an AI workspace.
I wonder if incumbent browsers will be able to pursue this vision. One argument against incumbents is their attachment to existing UX patterns. It’s conceivable that there’s a sort of UX counterpositioning; the incumbents can’t properly embrace these new capabilities without making radical UX changes that would alienate their existing audience. The radical transformation required to turn the browser into an AI workspace may be the kind of bold product bet that incumbents can’t afford to make.