Methodology

This page explains the methodology behind the 2016/2017 Global Open Data Index. If you have any further questions or comments about our methodology, please reach out to us through the Open Data Index forum.

The Global Open Data Index (GODI) is an independent assessment of open government data publication from a civic perspective. GODI enables different open data stakeholders to track government’s progress on open data release. GODI also allows governments to get direct feedback from data users. The Index gives both parties a baseline for discussion and analysis of the open data ecosystem in their country and internationally. We encourage all interested parties to participate in an open dialogue to allow for ownership of the results and to make the Index as relevant as possible.

Research scope

Like any other benchmarking tool, GODI tries to answer a question. In our case, the question is as follows: How do governments around the world publish open data?

From this question, other important questions emerge, such as:

Which governments readily publish open data? Which governments still need to improve open data publication?
What is the most open dataset? What is the least open dataset?
Which aspects of open data are easiest or hardest to implement?

In this year’s edition, we also experimented with measuring aspects of “practical openness” such as data findability. These aspects are also acknowledged by the International Open Data Charter Principles. The information we gained from this assessment is displayed in the results and is available to download. It will also inform internal research, which can be tracked on GitHub.

What does GODI NOT cover?

GODI intentionally limits its inquiry to the publication of national government data. It does not look at other aspects of the common open data assessment framework such as context, use or impact. This narrow focus enables it to provide a standardised, robust, comparable assessment of open data around the world. Although we look only at publication, we do not yet cover data quality, which is a significant barrier to reuse. We hope we will be able to do this in the future.

Research assumptions

This section presents the key assumptions that were taken into consideration while collecting and assessing the data.

Assumption 1: Open data is defined by the Open Definition.

We define open data according to the ‘Open Definition’. The Open Definition is a set of principles that define openness of data and content. It is also simple and easy to operationalise. We note one small deviation from the current v2.1 of the Open Definition. The only part of our methodology that is not aligned with the Open Definition is our assessment of ‘open, machine-readable’ formats. We give a full score to machine-readable formats even if their source code is not open. Instead, formats must be usable with at least one free and open source software tool. The Index thereby gives preference to practical openness over the actual openness of a format.

Assumption 2: The role of government in publishing data.

In the past, there have been questions about the role government should play to ensure the publication of open data. Government services may be privatised, which means the data can be owned and produced by a company and not the state. We assume that for the key data categories we survey, the government has a responsibility to ensure their publication, even if the data is held and managed by a third party.

Assumption 3: The Global Open Data Index is a ‘national’ indicator.

We acknowledge that not all countries have the same political structure. It is possible that not all sub-national governments produce the same data, as they are potentially subject to different laws and procedures. GODI, therefore, assesses not only data publication by national government but data publication at the national level. “National” publication of open data can take three forms:

The data describes national government processes or procedures (government entities operating on the highest administrative level).
The data is collected or produced by national government or a national government agency (on highest administrative level).
The data describes national parameters and public services for the entire national territory but is collected by sub-national actors. For example, we check if budgets are available for the national government of a federal state, or if air quality data exists for all country regions. Only in cases where we see legal and administrative autonomy from a higher government, GODI will look into sub-national territories individually (see assumption 4).

Assumption 4: GODI assesses ‘places’ instead of ‘countries’.

GODI seeks to be a meaningful and actionable indicator for government. Therefore, GODI 2016 ranks ‘Places’ and not ‘Countries’. For years GODI struggled to assess countries with devolved power. In some cases, such as Northern Ireland, sub-national governments mainly operate autonomously from the higher national government and are granted administrative and legislative autonomy. To be a relevant indicator, we experimented with how to better assess data on a sub-national level in a comparable way. As a test case, the Index assesses Northern Ireland separately from Great Britain this year. By separating Northern Ireland, we seek to address those government bodies that are actually responsible for publishing open data, and to open up the debate on how to understand open data on a sub-national level. A short explanation of why we regard Northern Ireland separately can be found here. We would love to hear your feedback in the forum. Furthermore, the British Crown Dependencies (e.g. Isle of Man, Jersey, Guernsey) are regarded individually because they are not part of the UK government and operate largely autonomously. In other cases, we receive submissions for places that are not officially recognised as independent countries (such as Kosovo).

What data does the Index look at?

GODI measures the openness of clearly defined data categories. Any open data that does not fall within these categories is not considered in our assessment. All Index scores refer exclusively to our data categories and should be understood as a proxy for the availability of open government data at large. There are three reasons for this. Firstly, GODI assesses open government data that has proven to be useful for the public. User stories helped us to define the categories that are most useful for the public. Secondly, GODI is a comparative indicator. In the past, we have used broader categories and compared very different datasets, at the expense of comparability. Thirdly, a standardised procedure helps our researchers to reduce bias and personal judgement.

Each data category contains the following information:

A minimum of 3 characteristics: The data characteristics describe the mandatory content of a dataset. Usually, all data characteristics are required to qualify for assessment: if a dataset is missing one of the characteristics, it is considered not to be published. For two categories, water quality and draft legislation, we have lowered the bar by making some characteristics optional. This is because we are trying to better understand what data is out there and to improve definitions for these datasets in the future.
Aggregation level: Some data is available in different levels of aggregation. For example, water quality data can exist for each individual water source, or it can be presented as total annual pollution for regions or the country. In most cases GODI assesses detailed, disaggregated data. Comprehensive data increases the use cases and broadens the insights people can draw from it. The International Open Data Charter also emphasises that the data should be published in its raw, original format as disaggregated data. Being clear about the aggregation level helps to guide our researchers looking for the correct dataset.
Time intervals: Different datasets are updated in different time intervals. Our survey includes the question “This data should be updated every [TIME INTERVAL]. Is it up-to-date?” to assess whether data is up-to-date. Data that is not up-to-date often is less useful.
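
The up-to-date question can be read as a simple date comparison. The following is a minimal sketch of that reading, not the Index's actual procedure: the function name `is_up_to_date` and the idea of expressing the time interval in days are assumptions made for illustration.

```python
from datetime import date, timedelta


def is_up_to_date(last_update, interval_days, today):
    """Hypothetical helper: True if the data was updated within the interval.

    last_update   -- date of the most recent update, or None if unknown
    interval_days -- expected update interval in days (assumed unit)
    today         -- the date on which the survey question is answered
    """
    if last_update is None:
        # No date can be determined, so the data is treated as not up to date.
        return False
    return today - last_update <= timedelta(days=interval_days)


# Example: a quarterly dataset (roughly 92 days) checked on 1 December 2016.
print(is_up_to_date(date(2016, 10, 1), 92, date(2016, 12, 1)))
print(is_up_to_date(date(2016, 1, 1), 92, date(2016, 12, 1)))
```

The first call reports data updated two months earlier as current; the second reports data last updated eleven months earlier as outdated.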

Governments often publish data on multiple websites, and in many files and formats. To make an informed and consistent decision about which data to pick, reviewers followed two approaches:

Choosing one reference dataset: Reviewers find one reference dataset or file that contains all relevant characteristics. They answer the survey using this dataset. This can be a CSV file, a shapefile, or data presented on a website. If reviewers have to choose between two or more similar datasets, they should choose the one that scores highest and document their choice in a comment.
Referencing multiple datasets (if one reference file is not available): Sometimes reviewers cannot find a single reference dataset because the data is split across many files, formats and places. In this case, they answer the survey with reference to several files. It is important that these files together contain all required data characteristics. Example: if one dataset displays votes on bills and is in a machine-readable format, but another one contains bill texts and is not machine-readable, then the data is not considered to be machine-readable.
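
The rule for multiple files can be sketched in a few lines. This is an illustrative reading of the two approaches above, assuming each file is described by a small dictionary; the function name `assess_file_set` and the field names are hypothetical and not part of the Index's tooling.

```python
def assess_file_set(files, required):
    """Check a set of referenced files against the required characteristics.

    files    -- list of dicts with keys "characteristics" (list of str)
                and "machine_readable" (bool); this layout is an assumption
    required -- the characteristics the data category demands
    """
    covered = set()
    for f in files:
        covered |= set(f["characteristics"])
    missing = set(required) - covered
    complete = not missing
    # The set only counts as machine-readable if every referenced file is.
    machine_readable = complete and all(f["machine_readable"] for f in files)
    return {
        "complete": complete,
        "missing": sorted(missing),
        "machine_readable": machine_readable,
    }


# Example from the text: votes are machine-readable, bill texts are not.
files = [
    {"characteristics": ["votes on bill"], "machine_readable": True},
    {"characteristics": ["content of bill"], "machine_readable": False},
]
result = assess_file_set(files, ["votes on bill", "content of bill"])
print(result)
```

In this example the files together cover all required characteristics, but the data is not treated as machine-readable because one of the files is not.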

The list of data categories

Our data categories reflect key data that is relevant for civil society at large. The categories have been developed in partnership with domain experts, including organisations championing open data in their respective fields. In some cases, we base our definition on international data production and reporting standards used by governments around the world. Each year we refine our definitions to reflect learnings from these experts. Table in CSV form here:

Category	What we look at?	Why we look at it?	Characteristics
Budget	National government budget at a high level. This is planned government expenditure for the upcoming year, and not the actual expenditure. To develop this category the Index drew on work from Open Spending.	Open budget data allows for well-informed publics. It shows what money is spent on, how public funds develop over time, and why certain activities are funded. See here for a list of cases in which budget data has been used in the past.	Following data must be online to qualify for assessment: Budget for each national government department, ministry, or agency; Descriptions for budget sections. Level of granularity: Budget separated into sub-department, political program, or expenditure type
Spending	Records of actual (past) national government spending at a detailed transactional level. Data must display ongoing expenditure, including transactions. A database of contracts awarded or similar will not be considered sufficient. Also, a database only showing subsidies will not be sufficient. To develop this category the Index drew on work from Open Spending.	Open spending data shows whether public money is efficiently and effectively used. It helps to understand spending patterns and to reveal corruption, misuse, and waste.	Following data must be online to qualify for assessment: Government office which had the transaction; Date of transaction; Name of vendor; Nominal amount of individual transaction. Level of granularity: Individual record of each transaction
Procurement	All tenders and awards of the national/federal government aggregated by office. It does not look into procurement planning or other procurement phases such as implementation (i.e. actual money transfers, which are part of our spending category). To develop this category the Index drew on work from the Open Contracting Partnership.	Open procurement data may enable fairer competition among companies, help to detect fraud, and deliver better services for governments and citizens. Monitoring tenders helps new groups to participate in tenders and increases government compliance.	Following data must be online to qualify for assessment: Tender phase: Tenders per government office; Tender name; Tender description; Tender status. Award phase: Awards per government office; Award title; Award description; Value of the award; Supplier's name
Election results	This data category looks at results for the latest national electoral contest. Election data provides information about voting outcomes and the voting process. What are electoral majorities and minorities? How many votes are registered, invalid, or spoilt? The Index consulted the National Democratic Institute (NDI) to develop this data category, but did not adopt their latest recommendation, which will be considered for the next edition. For more information, see the NDI’s Open Elections Data Initiative.	To enable the highest level of transparency, the Index assesses polling station-level data. Polling stations are the locations at which voters cast their vote. Having this data allows for independent scrutiny of each stage of the voting and counting process. It also helps electoral stakeholders better target their voter education and mobilization efforts for the next elections.	Following data must be online to qualify for assessment: Results for major national electoral contests (such as general elections); Number of registered votes; Number of invalid votes; Number of spoiled votes (not required if the digital voting system being assessed does not recognize spoiled votes). Level of granularity: Data available at polling station level
Company register	Lists of registered (limited liability) companies. The submissions in this data category do not need to include detailed financial data such as balance sheets. This category draws on the work of OpenCorporates.	Open data from company registers may be used to many ends: enabling customers and businesses to see with whom they deal, or to see where a company has registered offices.	Following data must be online to qualify for assessment: Name of company; Company address; Unique identifier of the company; Register available for entire country (usually assessed through a sample: it is answered with “Yes” if a register indicates companies in different regions)
Land ownership	Maps of lands with parcel layer that displays boundaries. Also a land registry with information on registered parcels of land. The assessment criteria were developed in collaboration with Cadasta Foundation. For more information on land ownership datasets, see Cadasta Foundation's Data Overview.	The Index focuses on assessing open land tenure data (describing the rules and processes of land property). Responsible use may enable tenure security and increase the transparency of land transactions.	The following characteristics must be included in cadastral and registry information submitted: Parcel boundaries; Parcel ID; Property Value (price paid for transaction or tax value); Tenure Type (public, private, customary, etc.)
National maps	A geographical map of the country including national traffic routes, stretches of water, and markings of heights. The map must at least be provided at a scale of 1:250,000 (1 cm = 2.5 km), a scale feasible for most countries. The Index developed this category based on a landmark report of the United Nations Committee of Experts on Global Geospatial Information Management (UNGGIM).	Geographic information is instrumental for many use cases, including journey planning, the mapping of topography, as well as demographic indicators.	Following data must be online to qualify for assessment: Markings of national traffic routes; Markings of relief/heights; Markings of water stretches; National borders; Coordinates. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates
Administrative Boundaries	Data on administrative units or areas defined for the purpose of administration by a (local) government. The development of this category draws on work of the FAO Global Administrative Unit Layers (GAUL) project, as well as the UNGIWG.	Open data about administrative zones has many use cases: Who are the candidates in my region? Which government bodies administer my region? How is wealth distributed across regions? The Index assesses two administrative boundary levels (e.g. federal states = level 1, and municipalities = level 2).	Following data must be online to qualify for assessment: Boundary level 1; Boundary level 2 (not required if country has only one level); Coordinates of administrative zone (latitude, longitude); Name of polygon; Borders of polygon. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates
Locations	A database of postcodes/zipcodes and the corresponding spatial locations in terms of latitude and longitude (or similar coordinates in an openly published coordinate system). The data has to be available for the entire country. The Index drew on work of the Universal Postal Union to develop this category.	Open location data shows the addresses of public and private buildings. While mainly used to route postal services, this data has many use cases: to calculate the number of persons in a city district, to provide homes with services, or for direct mailing and marketing.	Following data must be online to qualify for assessment: Zipcodes; Addresses (required if zip code does not include the address); Coordinates (latitude, longitude); Data available for entire country. Note: To qualify, data must contain geographic projections that make it possible to interpret coordinates
National statistics	Key national statistics on demographic and economic indicators such as Gross Domestic Product (GDP), or unemployment and population statistics. These statistics can be published as aggregates for the entire country.	As Open Data Watch states, "Official statistics provide an indispensable element in the information system of a democratic society, serving the Government, the economy and the public with data about the economic, demographic, social and environmental situation."	Following data must be online to qualify for assessment: Country Population (Required: census data, updated every year, Optional: vital statistics of birth and death); Gross Domestic Product (measured in current or constant prices, updated quarterly, last update must not be more than 3 months ago); National unemployment (absolute numbers, or expressed as percentage of entire population, updated quarterly, last update must not be more than 3 months ago)
Draft legislation	Data about the bills discussed within national parliament as well as votes on bills (not to be confused with passed national law). Data on bills must be available for the current legislation period. This data category draws on work by the National Democratic Institute (NDI) and the Declaration of Parliamentary Openness.	Open data on the law-making process is crucial for parliamentary transparency: What does a bill text say and how does it change over time? Who introduces a bill? Who votes for and against it? Where is a bill discussed next so that the public can participate in debates?	Following data is required. It must be online for the data to qualify for assessment: Content of bill; Author of bill; Status of bill; Available for current legislation period. Following data is assessed optionally (only if available): Votes on bill per member of parliament; Transcripts of debates on bill. Note on optional data: This category is newly added in 2016. Not all data needs to be available online to qualify. The Index team used minimum requirements to explore how much data is currently available online. In future editions, the category may require more data elements.
National law	This data category requires all national laws and statutes to be available online, although it is not a requirement that information on legislative behaviour, e.g. voting records, is available. This data category draws on work by the National Democratic Institute (NDI) and the Declaration of Parliamentary Openness.	Access to open data on a country's legal code (i.e. national law) supports compliance with law, makes it possible to keep track of legal changes, and also enables public deliberation around a law.	Following data must be online to qualify for assessment: Content of the law / status; Date of last amendment; Amendments to the law (if applicable)
Air quality	Data about the daily mean concentration of air pollutants, especially those potentially harmful to human health. Data should be available for all air monitoring stations or zones in a country, including at least 3 major cities. The Index evaluates the openness of key pollutants as defined by the World Health Organisation (WHO).	Air quality is a key factor for human health and environment.	Following data must be online to qualify for assessment: Particulate matter (PM); Sulphur oxides (SOx); Nitrogen oxides (NOx); Carbon monoxide (CO); Ozone (O3); Available per air monitoring station (at least for 3 major cities). Following data is assessed optionally (if available): Volatile organic compounds (VOCs)
Water quality	Water quality data by water source. The data category regards the quality of designated drinking water sources. If data on designated drinking water sources is not available, it refers to environmental water sources (lakes, rivers, groundwater). Data for each water source is desirable. This year, however, the Index also accepted cases where a country only published country-wide aggregated reports. As the review shows, we either find local and granular data or aggregated national reports.	This information is essential for both the delivery of services and the prevention of diseases.	In order to satisfy the minimum requirements for this category, data should be available on the levels of the following chemicals: Fecal coliform; Arsenic; Fluoride levels; Nitrates; Total Dissolved Solids; Data per water source; Available for the entire country

Survey questions and scoring

Each dataset in each place is evaluated using a set of questions that examine the openness of the datasets based on the Open Definition and the Open Data Charter. In 2016, we introduced a new survey. The new scoring follows three ideas:

Each survey question measures a crucial aspect of either the legal, technical, or practical ‘openness’ of data. With this approach, we aim to reduce the potential bias towards single aspects of openness.
Our scoring follows a rationale in which we describe why a question is important for open data. We also explain why some questions are not scored. Further explanations can be found in the table below or here.
The new scoring gives a maximum of 40 points in total to open licenses/public domain status and machine-readable and open file formats. These technical and legal aspects of openness are the core of the Open Definition version 2.1, and we seek to maintain a strong emphasis on them. However, aspects such as timely publication, data availability and accessibility are equally important for accessing and using open data. Questions around data accessibility receive a maximum of 60 points in total.

Question	Description	Score	Rationale
Is the data collected by government (or a third-party related or linked to government)?	Answer “Yes” if the chosen data is collected by the government, or a third party is officially representing the government. This is the case for state-owned-enterprises or contractors delivering public services for government. Answer “No” if one of the following cases apply: i) The data is collected by organisations that do not represent government; ii) The data is collected but not for the relevant government level; iii) The data is not collected at all	Not scored	Data collection by itself is not a characteristic of ‘open’ data. Our knowledge of edge cases or exceptions from the rule (such as legal arrangements of data publication in cases of public-private partnerships) is too limited to develop valid statements about a reasonable scoring.
Is the data available online without the need to register or request access to the data?	Answer “Yes”, if the data is made available by the government on a public website. Answer “No” if the data are NOT available online or are available online only after registering, requesting the data from a civil servant via email, completing a contact form or another similar administrative process.	15 points	Online availability is a requirement for openness: everyone has to have online access to specific data. Furthermore, it is a condition for all following questions and mandatory registration can deter people from using data (focus on user perspective). We put emphasis on the additional requirement that data must also be available without mandatory registration.
Is the data available online at all?	Tell us if the data is available online at all (after registering, after getting authentication).	Not scored	We currently do not aim to reward mandatory registration. Administrative processes may entail requirements that contradict open data, such as agreeing to terms of use. A zero score indicates to governments that their way of online publication is not ideal for all user groups.
Is the data available free of charge?	The data is free if you don’t have to pay for it.	15 points	Data has to be free in order to be accessible to everyone. We cannot expect users to pay for datasets in order to evaluate them for us. Some data (especially when provided in machine-readable file formats) have to be paid for.
Where did you find the data?	Indicate a URL and a description of the URL. Example: If you find data on a financial department website, please fill in: “Website of National Department of Finances”. Sometimes you can find data in a lot of places on the web. To limit your search, tell us the first 5 URLs you can easily find for each source type. Make sure the URLs are from an official government source.	Not scored	This is a subjective assessment. The results may be affected by a submitter's topical expertise or familiarity with government websites.
How much do you agree with the following statement: “It was easy for me to find the data.”	Submitters answer with a Likert scale.	Not scored	This is a subjective assessment. The results may be affected by a submitter's topical expertise or familiarity with government websites. We experiment with the results to develop a better findability assessment.
Is the data downloadable at once?	Answer “Yes”, if you can download all data at once from the URL at which you found them. If downloadable data files are very large, their downloads may also be organised by month or year or broken down into sub-files. Answer “No” if you have to do many manual steps to download the data, or if you can only retrieve very few parts of a large dataset at a time (for instance through a search interface).	15 points	We score whether a dataset can be downloaded at once. This question therefore rewards the technical possibility to retrieve all data from the internet without having to download dozens of small pieces of information, access data only through a search interface, send requests, or deal with captchas or other download limits. Important note: data may be split into smaller sub-sets. This applies, for instance, to long time series or large geospatial data. It is important that these sub-sets are logically linked, and that it is possible to retrieve data automatically from one or several URLs.
Data should be updated every [Time Interval]: Is the data up-to-date?	Please base your answer on the date at which you answer this question. Answer “No” if you cannot determine a date, or if the data are outdated.	15 points	Some of the data we assess, such as short-term weather forecasts, election results or budget data, are most valuable right after their release. Timely provision of these data is crucial. Other data is not as time-sensitive. Our scoring aims to strike a balance between both cases and therefore amounts to 15 points, in order to avoid an over-emphasis of this category.
Is the data openly licensed/in public domain?	This question measures if anyone is legally allowed to use, modify and redistribute data for any purpose. Only then is data considered truly "open" (see Open Definition). Answer “Yes” if the data are openly licensed. The Open Definition provides a list of conformant licenses. Also, consult the terms of use, which often indicate whether data can be freely reused. Answer “Yes” if there is no open license, but a statement that the dataset is in the “public domain”. To count as public domain, the dataset must not be protected by copyright, patents or similar restrictions. If you are not sure whether an open license or public domain notice is compliant with the Open Definition 2.1, seek feedback on the Open Data Index discussion forum. Answer “No” whenever it is not fully evident that the license or terms of use are compliant with the Open Definition.	20 points	Legal usability of data is a core requirement of the Open Definition. It is a prerequisite for unrestricted usability for everyone. Our old scoring was fairly high, emphasizing the legal usability of data. The current scoring is lowered to give us some space to stress other aspects of openness. This question does not lose its significance for openness (it is still scored higher than in the Open Data Barometer).
Is the data in open and machine-readable file formats?	We automatically compare the file formats of the data against a list of file formats that are considered machine-readable and open. A file format is called machine-readable if your computer can process, access, and modify single elements in a data file. The Index considers formats to be “open” if they can be fully processed with at least one free and open-source software tool. Potentially these formats allow more people to use the data because people do not need to buy specific software to open it. The source code of these formats does not have to be open.	20 points	Both features (machine-readable and open format) are key aspects of the Open Definition. Machine-readability is a major enhancement of technical usability. However, if a file is only usable with proprietary software (such as ArcGIS), ‘normal’ users are excluded from using it. Open formats put no copyright, monetary or other restrictions on their use (important for people who cannot afford or do not want to pay for proprietary software).
How much human effort is required to use the data? (1 = little to no effort is required, 3 = extensive effort is required)	The submitters tell us their use case and the steps they took to make the data usable (example: “I have to reformat the data”).	Not scored	The question is a subjective assessment. Furthermore, usability depends on context and the purposes for which a person wants to use the data.
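
The scored questions in the table add up to 100 points. The following is a minimal sketch of how a score could be computed from yes/no answers; the dictionary keys and the function name `score` are hypothetical labels chosen for illustration, while the point values are taken from the table above.

```python
# Point values from the table; the key names are hypothetical labels.
WEIGHTS = {
    "online_without_registration": 15,
    "free_of_charge": 15,
    "downloadable_at_once": 15,
    "up_to_date": 15,
    "openly_licensed": 20,
    "open_machine_readable_format": 20,
}


def score(answers):
    """Sum the points of every scored question answered with "Yes" (True)."""
    return sum(points for q, points in WEIGHTS.items() if answers.get(q, False))


# Example: everything is fulfilled except an open, machine-readable format.
answers = {q: True for q in WEIGHTS}
answers["open_machine_readable_format"] = False
print(score(answers))  # 80
```

Unanswered questions are treated as "No" in this sketch, which is an assumption rather than a documented rule.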

How to read the final results

As explained in the sections above, the Index looks at specific data using specific survey questions. The result is a final score that has to be read carefully. Firstly, it refers exclusively to data with mandatory characteristics. If no dataset matching these characteristics can be found online, the data will not be considered to be available (equalling a score of 0%). Further explanation of this approach can be found in the section “What data does the Index look at?”. Furthermore, the survey questions check different aspects of data access and usability (see table below). This means that behind fairly high scores we often do not find open data, but access-controlled data, or public data in poorly structured or non-machine-readable formats. The score, therefore, does not show a linear increase of openness. Instead, it highlights areas where the government may improve open data publication. An example: We may assess budget data in PDF form which may be in the public domain and available online for free, but in a format that makes it practically unusable. This data is presented as 80% open. The score suggests a fairly high degree of openness, but in fact, the data is not open. Only 100% means that the data is open. The reason for this is that we do not add many filters, such as exclusively considering data that is machine-readable, even though that might give a more realistic image of open data. With this approach, the Index seeks to demonstrate which data is already available and how it can be further improved. It is, therefore, important to carefully read how the data is published.

Depending on what survey items are checked, we find:

Type	Description	Maximum Score
Open data	Open data can be freely used, modified, and shared by anyone for any purpose. Main criteria: an open license; machine-readability; open formats; access (data must be provided as a whole and be downloadable online without charge).	100%
Public data	Data is public if it can be seen by the public online without any restrictions (e.g. access controls). This data is not protected by any means of control (see below). Data must be readily available online. It does not matter whether data can be downloaded. Examples: Data can be openly licensed and downloadable as PDF, but not in a machine-readable format. Sometimes it is possible to download texts and other information in machine-readable formats (e.g. XML). While available as open access, this information is not openly licensed and hence not 100% open data.	Up to 80%
Access-controlled data	Data is access-controlled if a provider regulates who can access data, when, and how. Access control includes: registration/identification/authentication; an active request (often with a note on what the data will be used for); a data sharing agreement (stipulating use cases); ordering/purchasing data. The reasons for controlled access are manifold, including traffic management or maintaining control over how data is used. It is debated whether some forms of registration/authentication reduce the openness of data (especially when registration is automated).	Up to 85% (data can be open, with the limitation that users have to register online for download)
Data gaps	A data gap means that governments do not produce any data on a phenomenon. If the Index states that no data is provided we often see data gaps. They show that some governments still have a long way to go before they become ready to produce data.	Maximum score: 0%
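
The distinction between the four types can be sketched in code. The following is a minimal illustration, not the Index's actual scoring logic: the class, its field names, and the order of the checks are assumptions made for this example, and it returns only a type label rather than a percentage score.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical summary of survey answers for one dataset.

    All field names are illustrative assumptions, not the Index's own schema.
    """
    data_found: bool            # a dataset with the mandatory characteristics was found online
    access_controlled: bool     # registration, request, agreement, or purchase is required
    openly_licensed: bool
    machine_readable: bool
    open_format: bool
    downloadable_at_once: bool  # data is provided as a whole
    free_of_charge: bool


def publication_type(a: Assessment) -> str:
    """Map survey answers to one of the types described in the table above."""
    if not a.data_found:
        # Scored 0%; often, but not always, this points to a data gap.
        return "No data found (possible data gap)"
    if a.access_controlled:
        return "Access-controlled data"
    meets_all_criteria = all([
        a.openly_licensed,
        a.machine_readable,
        a.open_format,
        a.downloadable_at_once,
        a.free_of_charge,
    ])
    return "Open data" if meets_all_criteria else "Public data"


# The budget example from the text: free, online, openly licensed, but PDF only.
budget_pdf = Assessment(
    data_found=True,
    access_controlled=False,
    openly_licensed=True,
    machine_readable=False,
    open_format=True,
    downloadable_at_once=True,
    free_of_charge=True,
)
print(publication_type(budget_pdf))
```

In this sketch the budget PDF is labelled public rather than open, which mirrors the point above: a high score alone does not mean the data is open.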

Our data collection and analysis

The data collection of the Index is done in three phases:

Submission phase using a snowball sampling approach. It includes an interim phase to filter out countries that do not have submissions for all data.
Review phase
Quality assurance of review results

Submission phase

The Index crowdsources its data. To do so, it uses a non-probability sampling technique known as “snowball sampling”. A snowball sample attempts to locate study subjects that are hard to reach. In our case, we work with contributors who are interested in open government data and who can assess the availability and quality of open datasets in their respective locations. We find them not only through referrals, but also by reaching out on social media, through regular communications on our Open Government Data and Open Data Index forums, and by actively networking at conferences and events. This year, as in 2015, we also hired local coordinators, who reached out to their networks and assisted in soliciting new submissions. This means that anyone from any place can participate in the Global Open Data Index as a contributor and make submissions, which are then reviewed. We do not have a quota on the number of places that can participate. Rather, we aim to sample as many places around the world as we can. This year, we considered only places that had submissions to all 15 categories. Places that had partial submissions were omitted. Data findability also has an impact on the quality of the data we collect. Contributors have diverse knowledge and backgrounds in open data and sometimes need help finding the data we are looking for. The following section explains how we tried to deal with this problem.
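
The interim filter described above (keeping only places with submissions to all 15 categories) could look roughly like the following. This is a sketch under assumptions: the input shape, a list of (place, category) pairs, and the function name are hypothetical and do not describe the Index's actual tooling.

```python
from collections import defaultdict

REQUIRED_CATEGORIES = 15  # from the text: places needed submissions to all 15 categories


def complete_places(submissions, required=REQUIRED_CATEGORIES):
    """Return the places that have submissions in at least `required` distinct categories.

    `submissions` is an iterable of (place, category) pairs (an assumed input shape).
    Duplicate submissions for the same place and category are counted once.
    """
    categories_by_place = defaultdict(set)
    for place, category in submissions:
        categories_by_place[place].add(category)
    return sorted(
        place
        for place, categories in categories_by_place.items()
        if len(categories) >= required
    )


# Tiny illustration with a lowered threshold of 3 categories.
example = [
    ("Place A", "budget"),
    ("Place A", "maps"),
    ("Place A", "laws"),
    ("Place A", "budget"),  # duplicate, counted once
    ("Place B", "budget"),
]
print(complete_places(example, required=3))
```

With the lowered threshold, only "Place A" is kept; "Place B" has a partial set of submissions and is omitted.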

Review process

To provide reliable and valid results, each submission must be reviewed. Our reviewers are domain experts. A list of all reviewers can be found on the About page. In the past, the review was country-based. We engaged local reviewers to verify all submissions for a country. This allowed us to overcome language problems and evaluate submissions in the context of a country. This approach, however, led to inconsistencies. Across countries, submitters evaluated datasets with sometimes very different content. This went so far that submitters evaluated the openness of data that was so highly aggregated that it was not usable. Since 2015 we have therefore used a thematic review. Each reviewer is assigned one data category and checks the submissions across all places. A thematic review has further advantages: (1) Reviewers develop a consistent approach to assessing data categories. (2) They develop a sense of where a similar piece of data can usually be found. (3) They can collect information on the formats and quality in which specific data are provided. We use this information to refine our data categories and guidance for future editions. To do so, we document our findings in review diaries.

Review Diaries

In their diaries, the reviewers document all problems they encountered during the review, as well as proposals to improve the Index. What was hard to assess? How can data categories be improved? Review diaries are especially useful to understand how reviewers dealt with edge cases. In what cases did they have to use their personal judgement? In this way we want to ensure the highest degree of transparency possible, so others understand the steps that were taken to verify a submission. Also, as advocates for open science, we wish to enable others to learn from our efforts and to improve their own research. Further information is shared on our insights page. A list of review diaries can be found here.

Quality assurance of review

This year we carried out a quality assurance of the review results. Once the review results were gathered, Open Knowledge International staff members analysed all datasets scoring 100% to verify whether they were correctly assessed and to spot false positives, that is, data wrongly deemed fully open.

We only focussed on the top scoring data for the following reasons:

100% scoring data has an important signalling function to the government, suggesting that data is fully open.
Eliminating mistakes in 100% scoring data presents a realistic picture to the government. It is hard to justify why a dataset, once falsely deemed open, is no longer 100% open in following years.

The quality assurance was accomplished in the following steps:

Check the forum for comments from our community.
Compare with results from last year. Are the same source URLs used? Is something different this year? If so, why?
Go to the source URL and double-check all survey questions.
Look at the reviewer comments: Do reviewers say that the assessed data does not meet all characteristics? Does the submission maybe even have to be rejected?
Check whether an open license clearly refers to the reference dataset, especially where the license was found on a website other than the one where the data is hosted. Also, check whether the license terms comply with the Open Definition.
Look into the 2015 GODI results to see what changed. A frequent case: do our reviewers refer to a different website or data portal?
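
Parts of this checklist lend themselves to simple tooling. The sketch below shows one way to select the entries scoring 100% and flag those whose source URL differs from the previous year, so a person can then double-check them by hand. The data layout, the function name, and the example URLs are assumptions for illustration only; the manual checks (license terms, reviewer comments, forum posts) are not automated here.

```python
def qa_candidates(current, previous):
    """List entries scoring 100% this year, with flags that help manual checking.

    `current` and `previous` map (place, category) to a dict with the keys
    "score" (percentage) and "url" (source URL). This layout is hypothetical.
    """
    candidates = []
    for (place, category), entry in current.items():
        if entry["score"] != 100:
            continue  # the quality assurance looked only at top-scoring data
        last_year = previous.get((place, category))
        candidates.append({
            "place": place,
            "category": category,
            "url": entry["url"],
            "new_this_year": last_year is None,
            "url_changed": last_year is not None and last_year["url"] != entry["url"],
        })
    return candidates


# Illustrative data with placeholder URLs.
results_now = {
    ("Place A", "budget"): {"score": 100, "url": "https://example.org/new-portal/budget"},
    ("Place A", "maps"): {"score": 80, "url": "https://example.org/maps"},
    ("Place B", "budget"): {"score": 100, "url": "https://example.org/b/budget"},
}
results_before = {
    ("Place A", "budget"): {"score": 100, "url": "https://example.org/old-portal/budget"},
}

for candidate in qa_candidates(results_now, results_before):
    print(candidate)
```

A changed URL is not an error in itself; it only marks an entry where the question "Is something different this year? If so, why?" deserves a closer look.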

The quality assurance phase showed us that some reviews contained errors, which were discussed and corrected with the reviewers. We documented the findings of the quality assurance, as well as our learnings from it.

Public dialogue phase

Once our results are published, we invite civil society and government to give us feedback about what they find useful (or not) and to tell us how they think we could strengthen the assessment. This dialogue phase will be open for one month, after which we will publish the revised data by June. In the past, governments and civil society alike have approached us to discuss our results. Reformers and open data decision-makers reference our data and publicly highlight their advancement in the Index, and civil society provides constructive feedback about their country context so we can improve the assessment. This feedback is very useful for our team. But for the Index to be most effective, these points should be discussed in an open dialogue, so civil society and government can talk to one another, learn from one another, and take ownership to improve open data publication. Convening data providers and users is a unique and important quality of the Index. Research by Open Knowledge International and others (here and here) suggests that indicators must be relevant for users, actionable, resonant with the users’ priorities, and credible and robust. We are aware that striking the right balance is challenging: we need to ground the Index in the realities and priorities of governments so they can improve their scores, while at the same time highlighting the data demands of civil society. Through open dialogue, we want to learn whether the Index is useful for both parties, see how open data supply and demand can match, and stimulate more constructive uptake to improve open data publication at country level.