Ed Parsons recaps SOTM '09 (State of the Map 2009), citing Muki Haklay's updated comparison of OSM (OpenStreetMap) and UK OS (Ordnance Survey) Meridian 2 data. The original article from August 2008 was widely cited as proof of OSM's feasibility and misleadingly condensed to "the quality of the [OSM] data is comparable and can be fit for many applications".
In terms of spatial accuracy, Muki Haklay has made a specialism of assessing OSM data quality, and his latest results presented at SOTM suggest that, using the UK as an example, OSM data is better than the equivalent business geographics product produced by the OS, and in some cases comparable to OS MasterMap ITN data, a product that costs over £100,000 per year to license.
To make a long story short: OSM has improved since 2008, but it is still far from professionally reliable.
(Beware, this is a long read.)
'You shouldn't bash the good guys'
Less than exalted comments on fancy Web 2.0 phenomena like collaboration, grass-roots action and Creative Commons carry the harsh risk of being labeled a counterrevolutionary egghead. Or worse. So let's state it once and clearly: the achievements and dynamics of the OpenStreetMap project are breathtaking, insanely great and probably the best single example of the Internet's usefulness. Mapping the world is a vision worth spreading.
United Maps, though a commercially oriented venture, supports OSM projects, and more is to be announced soon.
Nevertheless, quite a few claims made about the brave new world of mash-ups and webtwosomethings shouldn't be trusted blindly. And: OSM ≠ Cloudmade.
cities ≠ coverage
I re-read Haklay's excellent article with open eyes, several times.
The conclusions the original paper draws are strikingly different from what is touted in press releases. At the end of April, I called Dr. Haklay and cleared up some (of my) misconceptions.
In a nutshell, the result of the analysis is that OSM data is suitable for cartographic products that display central areas of cities. No less, not much more.
In terms of coverage, the centres of major UK cities are well mapped, but as you move from the centres towards the edges, the quality of coverage deteriorates rapidly. In areas where OSM information is complete and fully attributed (an area of about 20% of England), it is estimated that OSM quality is high enough for it to replace "Meridian 2".
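An estimate like "complete in about 20% of England" comes from comparing OSM against a reference dataset area by area. The following is a minimal sketch of one way to do that, assuming road lengths have already been summed per equal-sized grid cell for both datasets. The cell names, lengths and the `threshold` parameter are hypothetical, attribute completeness is ignored, and this is not claimed to be Haklay's exact procedure.

```python
from typing import Dict


def complete_share(osm_km: Dict[str, float], ref_km: Dict[str, float],
                   threshold: float = 1.0) -> float:
    """Fraction of reference cells where OSM road length reaches
    `threshold` times the reference road length (lengths only, no attributes)."""
    cells = [c for c, ref in ref_km.items() if ref > 0]
    if not cells:
        return 0.0
    complete = sum(1 for c in cells if osm_km.get(c, 0.0) >= threshold * ref_km[c])
    return complete / len(cells)


# Hypothetical road lengths (km) per grid cell, from city centre outwards.
reference = {"centre": 40.0, "inner": 30.0, "suburb": 25.0, "edge": 20.0, "rural": 10.0}
osm = {"centre": 42.0, "inner": 29.0, "suburb": 12.0, "edge": 3.0}

print(complete_share(osm, reference))  # prints 0.2
```

Relaxing `threshold` to 0.9 also counts the "inner" cell, so the share rises to 0.4; where the completeness bar is set matters as much as the data.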
The subsequent comparison with OSM also helps to clarify why a price tag of about £1300 for a licence for the Ordnance Survey's "Meridian 2" is reasonable in terms of quality, and explains why professional-grade data carries a professional price tag.
OSM ≠ Wikipedia
Unlike Wikipedia, where the majority of content is created at disparate locations and by "desk research", the OSM community also organises a series of local workshops (called ‘mapping parties’), which aim to create and annotate content for localised geographical areas. Interestingly, even OSM's frequent mapping parties don't bring much new data on a larger scale.
Again unlike Wikipedia: mapping out in the wild is no fun when it's windy, cold and getting dark, as it is most of the time in central Europe. Wikipedia definitely has an advantage here.
incompleteness ≠ feature
As an end user, you will probably accept "imperfect, unfinished work in progress", but white space is nothing you'd tolerate on a professional map with a distinct commercial use case.
Steve Coast is quoted as saying ‘it’s important to let go of the concept of completeness’ in an interview (GIS Professional, Issue 18, October 2007, pp. 20-23). Put into a nasty perspective: "sure ... if one sacrifices completeness, every bug turns into a feature to be improved later on ..."
Or, as Haklay puts it: "As OSM relies on the decisions of contributors about the areas that they would like to collect, it is interesting to evaluate the level in which deprivation influences data collection." Or, in my own words: you cannot tell volunteer OSM mappers where to map which features. Either they do it out of intrinsic motivation or it isn't done, resulting in eternal "imperfect, unfinished work in progress".
You cannot direct the crowd
If volunteer contributors decide to leave white spaces open, OSM coverage will likely never be completed, as there is a natural bias in data gathering towards nicer, more mainstream places. To put it negatively: Is OSM then "shunning socially marginal places" - read as: only "cool and sunny places are mapped"?
(...) most of the data capture (80%) was carried out by 90 participants and a very large group of users disengaged from the project after minimal contribution.
By inference, this means that only a tiny fraction of the 135,208 registered users (as of July 14th, 2009) are actively mapping and contributing useful data.
The point is that the crowd doesn't work the way professionals would; see p. 5 of this SlideShare presentation, "Beyond good enough? Spatial Data Quality and OpenStreetMap data":
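As a back-of-the-envelope check of that inference, 90 out of 135,208 is well under a tenth of a percent. Note that the two figures may not describe the same population or period (the 90 participants come from the quoted analysis, the 135,208 from the July 2009 registration count), so this is an order-of-magnitude illustration only; `active_share` is a hypothetical helper.

```python
def active_share(core_contributors: int, registered_users: int) -> float:
    """Core contributors as a percentage of registered users."""
    return 100.0 * core_contributors / registered_users


share = active_share(90, 135_208)
print(f"{share:.3f}%")  # prints 0.067%
```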
- "We know little about the people that collect (the data), their skills, knowledge or patterns of data collection.
- Loose coordination and no top-down quality assurance processes - can't produce good data
- It is not complete and comprehensive - there are white areas."
Nevertheless it's stunning that ...
"OSM is better than Meridian 2 in terms of positional accuracy and less accurate than MasterMap."
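Positional accuracy comparisons of this kind are often done with a buffer test: how much of the tested line lies within a given distance of the reference line. The sketch below is a simplified, vertex-based variant (it counts vertices inside the buffer rather than measuring overlapping line length), assuming both roads are already in the same projected coordinate system in metres. The coordinates and the 5 m `width` are hypothetical, and this is not presented as Haklay's exact method.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the segment's ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_within_buffer(test: List[Point], reference: List[Point], width: float) -> float:
    """Fraction of `test` vertices lying within `width` of the `reference` polyline.

    Assumes projected coordinates in metres and a reference line of at least two points.
    """
    if not test:
        return 0.0
    segments = list(zip(reference, reference[1:]))
    inside = sum(
        1
        for p in test
        if min(point_segment_distance(p, a, b) for a, b in segments) <= width
    )
    return inside / len(test)


# Hypothetical reference centreline and volunteer GPS trace (metres).
reference_road = [(0.0, 0.0), (100.0, 0.0), (200.0, 50.0)]
volunteer_trace = [(0.0, 2.0), (50.0, -3.0), (100.0, 4.0), (150.0, 33.0), (200.0, 60.0)]

print(share_within_buffer(volunteer_trace, reference_road, width=5.0))  # prints 0.6
```

Widening `width` raises the share; for this toy data a 10 m buffer captures every vertex. The buffer width is effectively the accuracy claim being tested.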
Haklay comes up with an interesting hypothesis that couldn't be better formulated:
"When people buy geodata, they pay for the errors or the notion that the errors are well known and quantified."
I stress this because it is exactly the argument we at United Maps make with both first- and second-tier suppliers and customers: providing data as good as is technically and commercially feasible creates unseen commercial value if the confidence level is clearly labeled. But if confidence or error level is unknown (as it is at OSM), you're busted.
Is the active crowd made up of a handful of (paid?) Cloudmade activists?
So are we effectively talking about 99% of users doing nothing and 0.5% of users doing a great deal? The contribution patterns also have a spatial distribution: as expected, central areas where mapping parties are regularly organized show a higher number of active users.
To quote Haklay:
"(...) the areas that were covered by very few users (up to 3) are 89.5% of the total area. (...) the fact that most of the area was covered by a single contributor means that very little quality assurance was carried out, if at all."
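The 89.5% figure is an area share: of all mapped area, how much was touched by at most three distinct contributors. A minimal sketch of that calculation, assuming the area has been divided into equal-sized grid cells and that each cell carries the set of user IDs who edited features in it (cell names and user IDs are hypothetical):

```python
from typing import Dict, Set


def share_with_few_contributors(cell_users: Dict[str, Set[str]], max_users: int = 3) -> float:
    """Fraction of mapped, equal-area cells edited by at most `max_users` distinct users."""
    mapped = [users for users in cell_users.values() if users]  # skip unmapped cells
    if not mapped:
        return 0.0
    few = sum(1 for users in mapped if len(users) <= max_users)
    return few / len(mapped)


cells = {
    "A": {"u1"},
    "B": {"u1", "u2"},
    "C": {"u1", "u2", "u3", "u4", "u5"},
    "D": {"u7"},
}
print(share_with_few_contributors(cells))  # prints 0.75
```

Setting `max_users=1` gives the single-contributor share (0.5 here), which is the case where, as the quote notes, there is essentially no peer quality assurance.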
So the Pareto principle, which states that "roughly 80% of the effects come from 20% of the causes", applies to ‘Volunteered Geographical Information’ (VGI) in a slightly modified flavor.
For OSM, as Muki Haklay clearly dissects, the biggest share of contributions comes from a tiny fraction of users who evidently don't collaborate much.
It will be most interesting to see what a comparison of OSM data quality with Navteq or Tele Atlas datasets reveals. Haklay's scrutiny is a welcome sober voice in a somewhat hyperbolic discussion of user-generated maps.
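Whether a contribution pattern is more skewed than the classic 80/20 can be checked directly from per-user edit counts: sort users by contribution and find the smallest group that accounts for 80% of the total. A sketch, with hypothetical counts and a hypothetical helper `pareto_core`:

```python
from typing import List, Tuple


def pareto_core(counts: List[int], target: float = 0.8) -> Tuple[int, float]:
    """Smallest number of top contributors covering `target` of all contributions,
    plus that number as a fraction of all users."""
    if not counts:
        return 0, 0.0
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for n, c in enumerate(ordered, start=1):
        running += c
        if running >= target * total:
            return n, n / len(ordered)
    return len(ordered), 1.0


# Hypothetical edit counts: two heavy mappers, a few regulars, a long tail of one-off users.
edits = [500, 300] + [10] * 8 + [1] * 90
core, share = pareto_core(edits)
print(core, share)  # prints: 2 0.02
```

In this toy data 2% of users produce 80% of the edits, a far steeper concentration than the 20% the classic principle suggests.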


You're only comparing on a quantitative level, and I guess you're talking about street networks. OSM is indeed a long way from complete on this score, and as you say, it's the nice middle-class places that get mapped first; in the UK, for example, the New Towns and the post-industrial Pennines are much more sparsely covered.
It's instructive to look at how a typical city is mapped. The mappers start with the glamorous bits: their own street, the city centre, the main roads. Their surveys then range a little wider: they might do their entire suburb. At some point, though, it reaches (to use the hackneyed phrase) the tipping point in the mapper's mind. The mapper thinks "I want to get this finished" - and they do.
The really interesting bit is that it scales. If you look at coverage in the West Midlands (England) last year, there is a definite tipping point at which a group of mappers realised they could get their chosen area finished - although that area (the motorway box of Britain's second city) was pretty colossal. Once that decision is made, progress becomes very rapid.
So the unglamorous places will, and do, get mapped. But it hasn't happened universally yet.
On a qualitative assessment, OSM can be better than any other mainstream data source - whether OS or TA/NT. The really obvious example is cycling data, as showcased at www.opencyclemap.org: you simply won't find another provider with that richness, even the Ordnance Survey. It's no coincidence that Google Maps is, despite great clamour, still unable to provide a "directions by bike" feature.
Our magazine doesn't yet use OSM data for waterway mapping: the current licence makes it impractical. (The proposed new one is better.) But, again, OSM's data is better than TA/NT in this area, and roughly on a par with OS.
The Wikipedia comparison is always enjoyable. Yes, OSM's weakness is that people have to go out and survey. It's also OSM's strength. You don't get half of the aggro (edit wars, user hierarchies, locked articles) that you get with Wikipedia, because OSM enforces contributions from a position of knowledge, and because "neutral point of view" goes with the territory - you map what's on the ground. Requiring user survey also keeps the copyright on the straight and narrow. Wikipedia is full of infringements: OSM remarkably little.
Finally: "Is the active crowd made up of a handful of (paid?) Cloudmade activists?" No. There are, as far as I'm aware, no CloudMade employees in Germany, yet look at the astonishing progress there. In Britain, the CM employees are mostly working on London, yet even there they're part of a much broader community.
CM's role in the States is interesting, I think; the existence of TIGER data makes it a very different proposition. But in Europe, CM's contribution to the data is not especially significant.
It's what happens when a company actively recruits within an existing community.
Yawn.
I am a bit surprised by your interpretation of my analysis. It's always interesting to see how other people read/understand what you say.
For me, all this analysis of OSM fundamentally changed my view of many of the 'common sense' views of geographical information (GI).
Basically, the message is that in many use cases there is no need for completeness, nor has there ever really been such a thing in any database (see Peter Batty's talk at SOTM). What my research is starting to show is that, in the crowdsourcing way, you can have good quality, reasonable and useful coverage, and an effective basis for GIS analysis without paying a lot for the data and with distributed, variable-quality data collection practices. The point of my presentation (and of Aamer's dissertation, which you can download from my blog) is that the potential for accurate-enough, fit-for-purpose datasets, produced in a very different paradigm from the one the GI sector is used to, is now emerging. It will force everyone to rethink what they mean by spatial data quality: is it all just about ISO standards?
A comparable example is geodemographics and its application in business. Its users know that it doesn't reflect their client base accurately, but if bulk mail response increases from 1% to 5%, that still means five times the responses and return on investment. This is how big chunks of the real world work: not trying to have perfect accuracy and quality unless you need it. The GI world is based on many over-engineered specifications, and crowdsourced data is starting to offer a very different way of thinking about GI...
However, it is your right to interpret my results in a different light. I find it very valuable to see that my results can be read in a different way. I also accept that my interpretation of the results might be wrong. Thanks for looking at the paper and commenting!
Cheers
Muki
"But if confidence or error level is unknown (as it is at OSM), you're busted."
The whole point of Muki's work was to quantify the levels of quality available from OSM data. Though you're talking about his report, you appear to be completely disregarding it in that statement.
I did also like this title:
"Is the active crowd made up of a handful of (paid?) Cloudmade activists?"
That deliberately provocative title was followed by a paragraph with no mention of CloudMade. Considering the 130,000+ members of the OSM project, CloudMade would have to be paying over 650 people for your statement to have any relevance; I'm pretty sure they're not paying that many.
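The 650 figure appears to follow from the post's own estimate: if 0.5% of roughly 130,000 members are the very active ones, that group is about 650 people (roughly 676 with the exact July 2009 count of 135,208). A quick check, with `very_active_users` as a hypothetical helper:

```python
def very_active_users(members: int, active_percent: float) -> float:
    """Number of users implied by an 'active' percentage of the membership."""
    return members * active_percent / 100


print(very_active_users(130_000, 0.5))  # prints 650.0
print(very_active_users(135_208, 0.5))
```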
http://maker.geocommons.com/maps/1793
I think this illustrates that "OSM then "shunning socially marginal places" - read as: only "cool and sunny places are mapped"?" is pretty far off target.
We find OSM is hugely popular with the humanitarian relief community because they have coverage in socially marginal places that commercial providers have no monetary incentive to map.
I think Mikel's Humanitarian Open Street Map Team (HOT) is a testament to this:
http://brainoff.com/weblog/2009/05/07/1399
Sure, there are shortcomings to crowdsourced data, and folks are working hard to address them, but a lot of your conclusions seem either hasty or uninformed.
The definition of known error has to be completely rethought with crowdsourcing. The data is perpetually improving and changing. It is not a one-and-done process where you can say the entire dataset has a margin of error of + or - 5 meters. This may make it unsuitable for some purposes, but it is an innovation in the industry, and it is the metadata that needs to catch up, not the other way around. TomTom is doing the same, as is Google, and that will feed back into Tele Atlas, at least on the TomTom side. Everyone is going to have to rethink how we measure accuracy and document metadata.