How Open is Open Data?
Even as I was reflecting on whether putting data into the hands of people would actually lead to greater accountability, I got the opportunity to participate in a workshop on open data. One of my professional assignments is to advise a Swiss governmental agency that supports decentralised local governance in Central, South and South East Asia.
The LOGIN network comprises institutions from twelve countries that work on decentralisation. The workshop on open data, organised by LOGIN in the Philippines, brought together several organisations from both the government and non-government sectors that either worked towards opening up data on government services and activities, or used such data extensively.
One of the most useful sessions in the workshop was conducted by Michael Canares. Miko, as he prefers to be called, works with the Open Data Labs, which is part of the World Wide Web Foundation started by Tim Berners-Lee, the inventor of the World Wide Web. Miko works in Indonesia and has extensive experience of the ways in which governments make data available to people. In his session, titled ‘Making Sense of Open Data’, Miko explained the classification adopted by Tim Berners-Lee to determine whether data put out in the public domain is really open (Tim has also explained this crisply on the web).
Tim describes five stages of openness of government data, as follows:
Stage 1 is where data is put out in any format, but under an open licence that allows it to be copied and reproduced. A PDF document, for example, fits the bill quite nicely. However, a PDF does not give readers enough leeway to extract data, compare it with other documents or undertake numerical analysis.
Stage 2 is where data is made available in a structured form that can be manipulated, sliced and diced. For example, a table in Excel format would meet the requirements of Stage 2, whereas a JPEG image of the table converted into a PDF file would not. As a battle-scarred veteran of exhaustive searches on the net for government data, I would be very satisfied with that. But is that open enough?
Tim describes Stage 3, where the data can not only be manipulated, but is also available in a non-proprietary format. Thus, while a licensed copy of Microsoft Office may be needed to read an uploaded Excel file, any open source spreadsheet programme can read a table that has reached Stage 3, such as one published as a CSV file.
Stage 4 is when the data is linkable through URIs, short for ‘Uniform Resource Identifiers’. I do not know the technical details, but a URI gives each item of data its own address on the web, so that others can point to it directly; this enables a greater degree of extraction and analysis than a document that has only reached Stage 3.
Stage 5 is the final stage, where data identified by URIs is linked to other datasets, so that different datasets can be used together. Again, I do not know enough of the technical details of how that is done, but if the government is able to reach that stage, it can be considered to have achieved the pinnacle of openness in its open data.
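Because each stage builds on the ones before it, the scheme can be written down as a cumulative checklist. The Python sketch below is only an illustration under stated assumptions: the `Dataset` fields and the two example datasets are hypothetical, made up to mirror the stages described above, and are not part of Tim's own description or of any official tool.

```python
from dataclasses import dataclass


# Hypothetical description of a published dataset. The field names are
# illustrative assumptions, not part of Tim Berners-Lee's scheme.
@dataclass
class Dataset:
    name: str
    open_licence: bool         # Stage 1: any format, under an open licence
    structured: bool           # Stage 2: a machine-readable table, not a scanned image
    non_proprietary: bool      # Stage 3: e.g. CSV rather than an Excel file
    uses_uris: bool            # Stage 4: items of data identified by URIs
    links_to_other_data: bool  # Stage 5: URIs point into other datasets


def openness_stage(ds: Dataset) -> int:
    """Return the highest stage reached; each stage requires all earlier ones."""
    checks = [
        ds.open_licence,
        ds.structured,
        ds.non_proprietary,
        ds.uses_uris,
        ds.links_to_other_data,
    ]
    stage = 0
    for passed in checks:
        if not passed:
            break  # a failed stage caps the rating, whatever comes after it
        stage += 1
    return stage


# Two hypothetical examples, not assessments of any real website.
scanned_budget = Dataset("Budget speech (scanned PDF)", True, False, False, False, False)
csv_table = Dataset("Ward budget table (CSV)", True, True, True, False, False)

for ds in (scanned_budget, csv_table):
    print(ds.name, "-> Stage", openness_stage(ds))
```

The loop stops at the first stage a dataset fails, so even a well-linked dataset that lacks an open licence would count as Stage 0.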
How do we fare in India? To find out, I checked the website of my favourite dart board, the Bruhat Bangalore Mahanagara Palika (BBMP), the city government of Bangalore. A quick check suggests that, at best, Stage 1 has been reached. Much of the data is in the form of PDF documents and, worse still, PDFs of scanned pictures. Thus the Mayor’s annual budget speech, which contains reams of material on the projects sanctioned, is in the form of JPEG images; the data cannot be extracted except by that time-worn method of printing out the whole thing and retyping it. The same goes for most circulars and notifications. But why single out the BBMP? The website of the Department of Panchayati Raj is similar. One finds loads of circulars and instructions (the department is fond of directing a lot of things from above, a strange affliction for a department that is supposed to promote democratic decentralisation), but one cannot search through them using keywords, because PDFs of images don’t lend themselves to that.
We have a long way to go. What is required is a whole bunch of nitpickers who go through government websites systematically, reviewing the data put out against the following parameters to determine whether it is really open:
- Data must be freely accessible, meaning that it should not be locked or password-protected. It must be easy to reach. Complex or disorganised website architecture detracts a great deal from governments’ intentions to make data open. Ideally, three clicks should get you to where you want to be.
- Data should be understandable. Faded documents that require experts in hieroglyphics to make sense of them, even after conversion to PDF, do not constitute open data.
- Data must be downloadable and should be retrievable without filing a request. It must be extractable and machine readable. It must be shareable, in the sense that there must not be copyright restrictions on sharing.
- Finally, data should be put out by the government proactively, rather than at its whim, whenever it happens to deem sharing necessary.
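A nitpicker working through a website might keep a simple record for each page reviewed. The sketch below, again in Python, turns the parameters above into checks. The `PageReview` fields, the placeholder URL and the sample values are hypothetical, and the three-click limit is the rule of thumb mentioned above rather than any formal standard.

```python
from dataclasses import dataclass


# Hypothetical record a reviewer might fill in for one page of a
# government website; the fields mirror the parameters listed above.
@dataclass
class PageReview:
    url: str
    password_protected: bool
    clicks_from_home: int
    legible: bool
    downloadable_without_request: bool
    machine_readable: bool
    sharing_restricted: bool
    published_proactively: bool


MAX_CLICKS = 3  # the "three clicks" rule of thumb


def problems(review: PageReview) -> list[str]:
    """List every parameter that the reviewed data fails."""
    found = []
    if review.password_protected:
        found.append("not freely accessible (locked or password-protected)")
    if review.clicks_from_home > MAX_CLICKS:
        found.append(f"buried {review.clicks_from_home} clicks deep")
    if not review.legible:
        found.append("not understandable")
    if not review.downloadable_without_request:
        found.append("needs a request to retrieve")
    if not review.machine_readable:
        found.append("not machine readable")
    if review.sharing_restricted:
        found.append("copyright restrictions on sharing")
    if not review.published_proactively:
        found.append("not published proactively")
    return found


review = PageReview(
    url="https://example.org/circulars/123",  # placeholder, not a real page
    password_protected=False,
    clicks_from_home=5,
    legible=True,
    downloadable_without_request=True,
    machine_readable=False,  # e.g. a PDF of a scanned image
    sharing_restricted=False,
    published_proactively=True,
)
for p in problems(review):
    print("-", p)
```

Keeping one such record per page would let reviewers compare websites, or track one website over time, on the same footing.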
What we don’t really want is for the government to analyse the data. That is optional on its part. We, on the outside, will do that.
I hope my old friends in the government read my blogs.