10 Insights into the Web's Structure Problem and How the Block Protocol Offers a Solution

Since the 1990s, the web has been a vast repository of human-readable documents. But beneath the surface lies a fundamental limitation: most content lacks the structure needed for machines to truly understand it. This list explores the decades-old quest for semantic markup, from HTML's humble beginnings to the ambitious vision of the Semantic Web, and introduces the Block Protocol as a promising way forward. Each point sheds light on the challenges and breakthroughs in making the web both human-friendly and machine-readable.

1. The Web Was Designed for Human Eyes, Not Machine Brains

When the web first emerged, its primary goal was to share documents between people. HTML, the language of the web, provides basic structural hints like <h1> for headings or <em> for emphasis. However, these tags describe a document's outline and how it should be presented, not what its content is about. A book title in bold is still just bold text to a computer. This human-centric design means that while we can read and enjoy content, automated agents struggle to extract meaning. The result is a web rich in presentation but poor in semantic depth.

Image: 10 Insights into the Web's Structure Problem and How the Block Protocol Offers a Solution (source: www.joelonsoftware.com)

2. HTML's Limited Structure Leaves Machines in the Dark

Consider a typical mention of a book on a web page: bold text for the title, a few lines for author and publisher. To a human, it's clear. To a naive program, it's just a jumble of words. HTML alone cannot tell a computer that “Goodnight Moon” is a book, that Margaret Wise Brown is the author, or that Harper & Brothers is the publisher. This lack of explicit meaning is the core problem. Without additional context, machines cannot aggregate, compare, or reason about the data – they can only display it.
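
To make the gap concrete, here is a minimal Python sketch of what a naive program can recover from presentation-only markup. The HTML fragment is a made-up example of the kind of book mention described above, and the TextCollector class is a hypothetical helper built on the standard library's html.parser. It is an illustration, not a model of how any particular crawler works.

```python
from html.parser import HTMLParser

# Hypothetical fragment: a book mention marked up only for presentation.
PAGE = """
<p><b>Goodnight Moon</b><br>
Margaret Wise Brown<br>
Harper &amp; Brothers</p>
"""


class TextCollector(HTMLParser):
    """Collects (enclosing tag, text) pairs, which is all the markup offers."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.chunks = []  # (innermost open tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "br":  # <br> is a void element and never closes
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            enclosing = self.stack[-1] if self.stack else None
            self.chunks.append((enclosing, text))


collector = TextCollector()
collector.feed(PAGE)
collector.close()
for tag, text in collector.chunks:
    print(tag, repr(text))
```

The program ends up with three strings and the tags that happened to wrap them. Nothing in that output says which string is a title, which is a person, and which is a company.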

3. The Semantic Web Dream: Machines That Understand

As early as 1999, Tim Berners-Lee envisioned a “Semantic Web” where computers could analyze content, links, and transactions. In his book Weaving the Web, he described a future where intelligent agents handle trade, bureaucracy, and daily life by talking to each other. The idea was to add machine-readable metadata to web pages, turning them into a global database. But this dream required authors to embed structured data alongside their human-readable text – a task that proved harder than expected.

4. Schema.org Provides a Vocabulary, but Implementation Is Homework

To make the Semantic Web practical, schema.org emerged as a collaborative effort to define a shared vocabulary. For a book, schema.org offers properties like name, author, and isbn. Authors can then use formats such as RDFa, Microdata, or JSON-LD to annotate their HTML. In theory, this is straightforward. In practice, it adds extra steps to publishing. After writing a post, few want to research schemas and embed complex JSON-LD blocks. It feels like homework – and many give up before they start.
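
For a sense of what that homework looks like, the following Python sketch builds a JSON-LD description of the Goodnight Moon example and wraps it in the script tag commonly used to embed JSON-LD in a page. The property names (name, author, publisher) are schema.org's; the to_jsonld_script helper is a hypothetical name chosen for this illustration, and the ISBN is left out rather than guessed.

```python
import json

# schema.org "Book" description of the example used in this article.
# The ISBN is deliberately omitted rather than invented.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "Goodnight Moon",
    "author": {"@type": "Person", "name": "Margaret Wise Brown"},
    "publisher": {"@type": "Organization", "name": "Harper & Brothers"},
}


def to_jsonld_script(data: dict) -> str:
    """Wrap a JSON-LD object in a script tag for embedding in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so the JSON cannot prematurely close the script element;
    # "\/" is a valid JSON escape for "/".
    body = body.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_jsonld_script(book))
```

Even for a three-field book, the author has to know the vocabulary, the nesting conventions for people and organizations, and where the result goes in the page.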

5. The Homework Problem: Why Semantic Markup Stalls

The barrier isn't technical skill; it's mental energy. Once your beautiful blog post is live and human-readable, the motivation to add semantic markup evaporates. You might think, “Is any computer actually going to read this?” And without immediate reward, the extra effort seems pointless. This psychological hurdle has kept structured data adoption low, even two decades after the Semantic Web was proposed. The lack of feedback loops – no visible benefit for the author – further discourages participation.

6. The Result: Very Little Structured Data in the Wild

Despite schema.org's availability and support from search engines, the percentage of web pages with proper structured data remained tiny for years. The dream of a universally machine-readable web has not materialized. Most content on the web is still opaque to automated reasoning. This gap limits what AI assistants, voice search, and other intelligent systems can do. They can crawl and index text, but they cannot confidently derive facts like “this page is about a specific book with a known author and publisher.”
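
The difference shows up on the consuming side. The sketch below, assuming pages embed schema.org data as JSON-LD in a script tag, looks for a Book description on a page. JsonLdExtractor and find_book are hypothetical names, and both sample pages are made up for illustration; real consumers would also need to handle RDFa, Microdata, arrays, and @graph containers.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            try:
                self.items.append(json.loads("".join(self.buffer)))
            except json.JSONDecodeError:
                pass  # malformed markup is treated as absent


def find_book(html: str):
    """Return the first schema.org Book object on the page, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    for item in parser.items:
        if isinstance(item, dict) and item.get("@type") == "Book":
            return item
    return None


# Two hypothetical pages with identical visible text.
PLAIN = "<p><b>Goodnight Moon</b> by Margaret Wise Brown</p>"
ANNOTATED = PLAIN + (
    '<script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Book", '
    '"name": "Goodnight Moon", '
    '"author": {"@type": "Person", "name": "Margaret Wise Brown"}}'
    "</script>"
)

print(find_book(PLAIN))
print(find_book(ANNOTATED)["author"]["name"])
```

Both pages look the same to a reader. Only the annotated one lets the program state, without guessing, that the page describes a book and who wrote it.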

7. Why Structured Data Matters for Human Progress

Human progress increasingly depends on information being accessible to both people and machines. Structured data powers everything from personalized recommendations to medical research. When scientific papers, library catalogs, and e-commerce product listings are semantically marked up, algorithms can find patterns, make connections, and synthesize knowledge. Without this foundation, AI remains limited to surface-level pattern matching. A web of structured data would unlock deeper insights and more efficient automation.

8. A New Approach: Making Semantic Markup Effortless

What if adding structured data were as easy as writing a blog post? That's the premise behind the Block Protocol. Instead of retrofitting existing content, the protocol integrates semantic structure into the content creation process itself. Authors build pages from “blocks” – reusable components that inherently know what they represent. A book block, for instance, automatically includes fields for title, author, and ISBN. The semantic markup is baked in, requiring no extra steps or mental overhead.
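
The idea of a component that "knows what it represents" can be sketched in a few lines of Python. This is not the Block Protocol's actual API: BookBlock, to_html, and to_jsonld are hypothetical names invented here to illustrate the principle that one set of typed fields can drive both the human-readable and the machine-readable output.

```python
import json
from dataclasses import dataclass
from html import escape
from typing import Optional


@dataclass
class BookBlock:
    """Hypothetical block: typed fields in, human and machine views out."""

    title: str
    author: str
    isbn: Optional[str] = None  # optional; left unset in the example below

    def to_html(self) -> str:
        """Human-readable rendering of the block."""
        return f"<p><b>{escape(self.title)}</b> by {escape(self.author)}</p>"

    def to_jsonld(self) -> dict:
        """Machine-readable rendering using schema.org property names."""
        data = {
            "@context": "https://schema.org",
            "@type": "Book",
            "name": self.title,
            "author": {"@type": "Person", "name": self.author},
        }
        if self.isbn is not None:
            data["isbn"] = self.isbn
        return data


block = BookBlock(title="Goodnight Moon", author="Margaret Wise Brown")
print(block.to_html())
print(json.dumps(block.to_jsonld(), indent=2))
```

The author only fills in a title and an author. Because the structure lives in the block's fields rather than in a separate annotation step, the markup cannot be forgotten or drift out of sync with the visible text.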

9. The Block Protocol: A Promising Solution

The Block Protocol is an open standard that defines how blocks work and how they communicate with the applications that embed them. It aims to make the web programmable and interoperable. Developers can create, use, and share blocks that carry their own metadata. For users, this means they can embed rich, meaningful content without ever touching JSON-LD. For machines, it means context is embedded from the ground up. The premise is that this approach lowers the barrier to creating structured data, making the Semantic Web dream more attainable.

10. What's Next: Overcoming the Remaining Barriers

While the Block Protocol is promising, adoption requires ecosystem support – from content management systems, hosting platforms, and developers. The protocol must become as familiar as HTML itself. But the direction is clear: we can no longer afford a web that is only half-readable. By embedding structure at the block level, we move from a web of documents to a web of data. The next few years will determine whether this approach finally turns Tim Berners-Lee's 1999 vision into reality.

From HTML's lean beginnings to the ambitious Semantic Web and now the Block Protocol, the journey to a machine-readable web has been long. But the pieces are coming together. With tools that make structured data automatic and seamless, we may finally bridge the gap between human expression and machine comprehension. The progress on the Block Protocol is not just technical – it's a step toward a web that truly serves everyone, from curious readers to intelligent agents.
