Ed Parsons recaps SOTM '09 (State of the Map 2009), citing Muki Haklay's updated comparison of OSM (OpenStreetMap) data with UK OS (Ordnance Survey) Meridian 2 data. The original article from August 2008 was widely cited as proof of OSM's feasibility and misleadingly condensed to "the quality of the [OSM] data is comparable and can be fit for many applications".
In terms of spatial accuracy, Muki Haklay has made a specialism of assessing OSM data quality, and his latest results presented at SOTM suggest that, using the UK as an example, OSM data is better than the equivalent business geographics product produced by the OS, and in some cases comparable to OS MasterMap ITN data, a product that costs over £100,000 per year to license.
To make a long story short: OSM has improved since 2008, but it is still far from professionally reliable.
(Beware, this is a long read.)
'You shouldn't bash the good guys'
Less than exalted comments on fancy Web 2.0 phenomena like collaboration, grass-roots action and Creative Commons carry the serious risk of being labeled a counterrevolutionary egghead. Or worse. So let's state it once and clearly: the achievements and dynamics of the OpenStreetMap project are breathtaking, insanely great and probably the best single example of the Internet's usefulness. Mapping the world is a vision worth spreading.
United Maps, though a commercially oriented venture, supports OSM projects, and more is to be announced soon.
Nevertheless, quite a few claims made about the brave new world of mash-ups and web-two-somethings shouldn't be trusted blindly. And: OSM ≠ Cloudmade.
cities ≠ coverage
I re-read Haklay's excellent article with open eyes, several times.
The conclusions the original paper draws are strikingly different from what is touted in press releases. At the end of April, I called Dr. Haklay and cleared up some (of my) misconceptions.
In a nutshell, the analysis shows that OSM data is suitable for cartographic products that display the central areas of cities. No less, not much more.
In terms of coverage, the centres of major UK cities are well mapped, but as you move from the centres towards the edges, coverage quality deteriorates rapidly. In areas where OSM information is complete and fully attributed (about 20% of England), OSM quality is estimated to be good enough to replace "Meridian 2".
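The completeness estimate rests on comparing OSM with Meridian 2 area by area. A minimal sketch of that kind of comparison, assuming a grid of equally sized cells that each carry the total road length from both datasets; the cell values are invented, the grid itself is a hypothetical choice, and the sketch ignores attribution, so it only approximates "complete and fully attributed".

```python
from dataclasses import dataclass


@dataclass
class GridCell:
    """One hypothetical grid cell with total road length in km per dataset."""
    osm_km: float
    reference_km: float  # e.g. Meridian 2 road length in the same cell


def completeness_share(cells: list[GridCell], threshold: float = 1.0) -> float:
    """Fraction of cells (that contain reference roads) in which OSM road
    length reaches `threshold` times the reference road length."""
    relevant = [c for c in cells if c.reference_km > 0]
    if not relevant:
        return 0.0
    complete = [c for c in relevant if c.osm_km >= threshold * c.reference_km]
    return len(complete) / len(relevant)


# Invented example: dense centre cells are complete, outer cells are not.
cells = [
    GridCell(osm_km=12.0, reference_km=10.0),
    GridCell(osm_km=8.0, reference_km=8.0),
    GridCell(osm_km=3.0, reference_km=9.0),
    GridCell(osm_km=0.0, reference_km=5.0),
    GridCell(osm_km=2.0, reference_km=0.0),  # no reference roads: skipped
]
print(completeness_share(cells))  # 0.5
```

Road length per cell is a crude proxy: a cell can match the reference in length and still miss individual streets or their names.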
The subsequent comparison with OSM also helps to clarify why a price tag of about £1,300 for a licence for the Ordnance Survey's "Meridian 2" is reasonable in terms of quality, and it explains why professional-grade data has a professional price tag.
OSM ≠ Wikipedia
Unlike Wikipedia, where the majority of content is created at disparate locations and by "desk research", the OSM community also organises a series of local workshops (called ‘mapping parties’), which aim to create and annotate content for localised geographical areas. Interestingly, even OSM's frequent mapping parties don't contribute much new data at a larger scale.
Again unlike Wikipedia editing, mapping out in the wild is no fun when it's windy, cold and getting dark, as it is most of the time in central Europe. Wikipedia definitely has an advantage here.
incompleteness ≠ feature
As an end user, you will probably accept "imperfect, unfinished work in progress", but white space is not something you'd tolerate on a professional map with a distinct commercial use case.
As Steve Coast is quoted as saying in an interview (GIS Professional, Issue 18, October 2007, pp. 20-23), ‘it’s important to let go of the concept of completeness’. To put this in a nasty perspective: "sure ... if one sacrifices completeness, every bug turns into a feature to be improved later on ..."
Or, as Haklay puts it: "As OSM relies on the decisions of contributors about the areas that they would like to collect, it is interesting to evaluate the level in which deprivation influences data collection." In other words, my words: you cannot tell volunteer OSM mappers where to map which features. Either they do it out of intrinsic motivation or it doesn't get done, resulting in eternal "imperfect, unfinished work in progress".
You cannot direct the crowd
If volunteer contributors decide to leave white spaces open, OSM coverage will likely never be complete, since data gathering is naturally biased towards nicer, more mainstream places. To put it negatively: is OSM then "shunning socially marginal places", that is, are only "cool and sunny places" mapped?
(...) most of the data capture (80%) was carried out by 90 participants and a very large group of users disengaged from the project after minimal contribution.
Extrapolating from the study area, this suggests that only a tiny fraction of the 135,208 registered users (as of July 14th, 2009) actively map and contribute useful data.
The point is that the crowd doesn't work the way professionals would; see p. 5 of the SlideShare presentation "Beyond good enough? Spatial Data Quality and OpenStreetMap data":
- "We know little about the people that collect (the data), their skills, knowledge or patterns of data collection.
- Loose coordination and no top-down quality assurance processes - can't produce good data
- It is not complete and comprehensive - there are white areas."
Nevertheless, it's stunning that ...
"OSM is better than Meridian 2 in terms of positional accuracy and less accurate than MasterMap."
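A common way to quantify the positional accuracy of road centrelines is a buffer test in the spirit of Goodchild and Hunter: place a buffer of a given width around the reference line and measure what share of the tested line falls inside it. The sketch below, in plain Python, assumes projected coordinates in metres and approximates "share of length" by sampling the tested line roughly every metre; the coordinates and the 5 m buffer are hypothetical, and this is not presented as Haklay's exact procedure.

```python
import math

Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to segment ab (projected coordinates)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_line(p: Point, line: list[Point]) -> float:
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def sample_line(line: list[Point], step: float) -> list[Point]:
    """Points at most `step` apart along a polyline, plus its last vertex."""
    points: list[Point] = []
    for a, b in zip(line, line[1:]):
        n = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(n):
            t = i / n
            points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    points.append(line[-1])
    return points


def share_within_buffer(tested: list[Point], reference: list[Point],
                        buffer_m: float, step: float = 1.0) -> float:
    """Approximate share of the tested line lying within `buffer_m` of the reference."""
    samples = sample_line(tested, step)
    inside = sum(1 for p in samples if distance_to_line(p, reference) <= buffer_m)
    return inside / len(samples)


# Hypothetical 100 m road: the OSM trace drifts from 2 m to 8 m off the reference.
reference = [(0.0, 0.0), (100.0, 0.0)]
tested = [(0.0, 2.0), (100.0, 8.0)]
print(share_within_buffer(tested, reference, buffer_m=5.0))  # 0.5
```

Sampling keeps the sketch dependency-free; a GIS library with real buffer and intersection operations would measure the inside length exactly.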
Haklay comes up with an interesting hypothesis that couldn't be better formulated:
"When people buy geodata, they pay for the errors or the notion that the errors are well known and quantified."
I mention this because it is exactly the argument we at United Maps make to both first- and second-tier suppliers and customers: data that is as good as technically and commercially feasible delivers real commercial value if its confidence level is clearly labeled. But if the confidence or error level is unknown (as it is with OSM), you're busted.
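What a "clearly labeled" confidence level could look like in practice: given positional offsets measured at check points against a more accurate source, report the RMSE together with an empirical bound that a stated share of the points does not exceed. A minimal sketch; the offsets are invented, and the nearest-rank quantile is one of several reasonable conventions, not a standard prescribed here.

```python
import math


def accuracy_summary(offsets_m: list[float], confidence: float = 0.95) -> tuple[float, float]:
    """Return (RMSE, empirical bound) for measured positional offsets in metres.

    The bound is the nearest-rank quantile: the smallest measured offset
    that at least `confidence` of the checked points do not exceed.
    """
    if not offsets_m:
        raise ValueError("need at least one measured offset")
    ordered = sorted(offsets_m)
    rmse = math.sqrt(sum(d * d for d in ordered) / len(ordered))
    rank = max(1, math.ceil(confidence * len(ordered)))
    return rmse, ordered[rank - 1]


# Invented check-point offsets (metres) for a hypothetical dataset.
offsets = [1.0, 2.0, 2.0, 3.0, 4.0, 1.0, 2.0, 5.0, 3.0, 2.0]
rmse, bound = accuracy_summary(offsets)
print(f"RMSE {rmse:.1f} m; 95% of {len(offsets)} check points within {bound:.1f} m")
# RMSE 2.8 m; 95% of 10 check points within 5.0 m
```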
Is the active crowd made up of a handful of (paid?) Cloudmade activists?
So are we effectively talking about 99% of users doing nothing and 0.5% of users doing an enormous amount? The contribution patterns also show a spatial distribution: as expected, central areas where mapping parties are regularly organized show a higher number of active users.
To quote Haklay:
"(...) the areas that were covered by very few users (up to 3) are 89.5% of the total area. (...) the fact that most of the area was covered by a single contributor means that very little quality assurance was carried out, if at all."
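The two figures in this quote (share of area touched by at most three users, share touched by a single user) can be computed from a record of who edited what where. A minimal sketch, assuming equally sized grid cells so that a share of cells equals a share of area, and counting only cells that contain any data; the cell ids and user ids are hypothetical.

```python
def contributor_coverage(cells: dict[str, set[str]], max_users: int = 3) -> tuple[float, float]:
    """Return (share of mapped cells edited by at most `max_users` contributors,
    share of mapped cells edited by exactly one contributor).

    Assumes equally sized cells, so a share of cells equals a share of area.
    """
    mapped = [users for users in cells.values() if users]
    if not mapped:
        return 0.0, 0.0
    few = sum(1 for users in mapped if len(users) <= max_users)
    single = sum(1 for users in mapped if len(users) == 1)
    return few / len(mapped), single / len(mapped)


# Invented example: cell id -> ids of users who edited features in that cell.
cells = {
    "A": {"u1"},
    "B": {"u1", "u2"},
    "C": {"u1", "u2", "u3", "u4"},
    "D": set(),  # nothing mapped yet: excluded from the denominator
    "E": {"u5"},
}
few_share, single_share = contributor_coverage(cells)
print(few_share, single_share)  # 0.75 0.5
```

Cells edited by a single user are exactly the ones where no second pair of eyes has checked the data, which is the quality-assurance concern raised in the quote.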
The Pareto principle, which states that "roughly 80% of the effects come from 20% of the causes", thus applies to ‘Volunteered Geographical Information’ (VGI) in a slightly modified flavor.
For OSM, as Muki Haklay clearly dissects, the biggest share of contributions comes from a tiny fraction of users who evidently don't collaborate much.
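Whether contributions follow, or exceed, the 80/20 pattern can be checked directly from per-user contribution counts. A minimal sketch: sort contributors by activity and find the smallest share of them that accounts for 80% of all edits. The counts are invented, and "edits" is a stand-in for whatever contribution measure one prefers; Haklay's figures concern data capture in England, not these numbers.

```python
def share_of_contributors_for(edit_counts: list[int], target: float = 0.8) -> float:
    """Smallest fraction of contributors whose combined edits reach `target`
    (e.g. 0.8 = 80%) of all edits, taking the most active contributors first."""
    counts = sorted(edit_counts, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    running = 0
    for k, count in enumerate(counts, start=1):
        running += count
        if running >= target * total:
            return k / len(counts)
    return 1.0


# Invented edit counts for ten contributors.
edit_counts = [900, 50, 20, 10, 5, 5, 4, 3, 2, 1]
print(share_of_contributors_for(edit_counts))  # 0.1
```

A textbook 80/20 distribution would return 0.2; values well below that indicate the more skewed pattern described for OSM.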
It will be most interesting to see what a comparison of OSM data quality with Navteq or Tele Atlas datasets reveals. Haklay's scrutiny is a welcome, sober voice in a somewhat hyperbolic discussion of user-generated maps.