The Web was designed for documents; it was not designed for data

This post was featured as a guest blog on the Open Data Institute’s blog.


HTML and related Web technologies were initially developed to allow natural language documents to be published on the Internet, for reading by humans.

The key phrase in the previous sentence is “for reading by humans”.

Humans are very good at reading natural language documents and interpreting their meaning. Computers, on the other hand, while they are good at serving these documents, are very bad at reading and interpreting them. Computers require precise instructions: if you want a computer to do something for you, it is better to give it data rather than documents.

The standards for serving Web documents have been widely adopted, and they power the Web that we know and love today. By contrast, the proposed standards for serving Web data have not been widely adopted: less than 1% of websites use RDF.

This is a problem if you want to make computers do things with the content that is available on the Web. You can use an API if one is available, but relative to the number of websites that exist there are not many APIs: Programmable Web lists only 8,500 in its API directory. Where no API is available, you have to resort to web-scraping. Either way, getting at the data requires that the data-user be able to write computer code.
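To make that last point concrete, here is a minimal sketch of what web-scraping typically involves, written in Python with the requests and BeautifulSoup libraries. The URL and the CSS selector are hypothetical placeholders; the point is simply that extracting even a small table from a web page means someone has to write and maintain code like this.

```python
# A minimal web-scraping sketch (hypothetical URL and selector).
# Requires: pip install requests beautifulsoup4

import requests
from bs4 import BeautifulSoup

# Fetch the HTML document, exactly as a browser would.
response = requests.get("http://example.org/open-data/prices.html")
response.raise_for_status()

# Parse the document and pick out the table rows we care about.
soup = BeautifulSoup(response.text, "html.parser")
rows = soup.select("table.prices tr")

# Turn each row of the human-readable table into structured records.
records = []
for row in rows[1:]:  # skip the header row
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    if len(cells) >= 2:
        records.append({"item": cells[0], "price": cells[1]})

print(records)
```

If the page's layout changes, the selector breaks and the code has to be rewritten; that fragility is exactly the sort of thing that widely adopted Web data standards would remove.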

This matters for those of us interested in open data because data (including open data) is still too difficult to get at.  It should be possible to access Web data without having to write any computer code.  The promised value of open data will only be realized if data becomes easier to access on the Web.

In the future, I hope that Web data standards such as RDF will become widely adopted. I wish I could wave a magic wand and make every website adopt them. That would mean the content of every website was represented as data as well as documents, and we could make our computers operate on that content automatically. But it has been twelve years since Tim Berners-Lee first articulated the vision of a Web of data, and as of today the future he described has still not arrived.

At import.io we are trying to help create the Web of data by making it possible to access Web data without having to write any computer code. Come join us for a hackathon we are running with the ODI over the weekend of March 8th to learn about open data and how to access Web data without writing any code.
