What to collect?
Chokepoint aims to collect information in four key areas:
1. Network metrics
Network metrics provide an indication of network performance and capability. They range from basic quantitative measurements to more advanced qualitative statistics as collected by organizations such as Measurement Lab or RIPE NCC.
2. Legal and jurisdictional information
These are sources that supply information about the legislative structures that apply to the network nationally, supra-nationally and globally. An understanding of the legal framework is crucial to understanding the significance of network metrics as well as incident reports. Apart from information on legislation, Chokepoint will collect jurisdictional information to evaluate how the law is actually interpreted and applied.
3. Incident reports / Journalistic reports
This is the most diverse and generally least structured type of data. It may incorporate reports from NGOs, individuals, companies, as well as governmental organizations. This kind of information cannot always be quantified in a meaningful way, but it is crucial for illustrating the impact of policy and/or interventions on the people who depend on the network for their economic, social and business activity, their safety, and their well-being.
4. Reference data
These are sources that are used to enrich source data and/or to normalize intermediate analytic results. Examples are GeoIP lookups or World Bank data. Reference data helps to facilitate (historical) analyses and to structure the presentation of those analyses.
How to collect?
Right now, Chokepoint is tailored to processing structured, regularly published data. This means that at this stage of development, Chokepoint only publishes information based on data sources that are published periodically, in a structured (i.e. machine-readable) format.
Incidental data that is collected as a one-time effort over a limited period of time, such as the “internet census”, can provide a wealth of information, but is intrinsically limited for the purpose of monitoring developments over time. In addition, such sources frequently cannot guarantee the legality, safety and reliability of the information. For this reason, the integration of these types of data has a lower priority at this time.
Work is ongoing to incorporate data that is published on a regular basis, but is not (yet) machine-readable. Chokepoint is working with several organizations to set up a workflow to facilitate the incorporation of these types of data. As with incidental data, the integration of these sources has a lower priority at this time.
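The prioritization described in this section can be sketched as a simple filter over source descriptions. This is an illustrative sketch only: the Source class, its fields, and the function names are assumptions made for the example and do not describe how Chokepoint actually stores or selects its sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str               # hypothetical label for the data source
    periodic: bool          # published on a recurring schedule
    machine_readable: bool  # structured format that can be parsed


def is_ready(source):
    """True for sources that are both periodic and machine-readable."""
    return source.periodic and source.machine_readable


def split_sources(sources):
    """Separate sources that can be processed now from lower-priority ones."""
    ready = [s for s in sources if is_ready(s)]
    deferred = [s for s in sources if not is_ready(s)]
    return ready, deferred
```

In this sketch, a one-time data set and a regularly published but unstructured report both end up in the deferred list, matching the lower priority given to them above.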
Issues with collecting and publishing data and information
Not all data sources are created equal. Some sources publish sparingly, others in overwhelming bulk. Some are completely unstructured, others are highly structured but require extensive enrichment and analysis to become meaningful. Partly this is a technical issue, and in that respect there are a number of best practices that can be leveraged to address some of the issues.
The big issues, as we see them, are understanding the data and information, and the potential repercussions for individuals who might be exposed by publishing, or even republishing, this data. Merely stating that this is simply a question of “anonymizing” the data would be both flippant and a gross simplification. This platform cannot profess to have sufficiently expert knowledge of every source data set. It is for this reason that anonymization has to be carried out by the source data owner/publisher before the data enters the system.
Description continues here as “Analyze”
An overview: