Hey Andy
On Feb 9, 2015, at 4:24 PM, Andy Mabbett <andy@pigsonthewing.org.uk> wrote:
On 9 February 2015 at 22:59, Aaron Halfaker <ahalfaker@wikimedia.org> wrote:
Our spot checking suggests that 98% of these DOIs resolve. The remaining 2% were extracted correctly, but they appear to be typos.
All on en.Wikipedia?
Correct, we haven’t looked at other projects for this release.
Do DOIs not include check digits?
They don’t. Validation can be done via the CrossRef API or the DOI resolver, though this is not 100% reliable, especially when DOIs include special characters. CrossRef advised treating an HTTP 200 response from the resolver with the noredirect flag (e.g. http://dx.doi.org/{doi}?noredirect=true) as an indication that the DOI is valid and resolves.
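For anyone who wants to try the resolver check, here is a minimal Python sketch, not the code used for this release. The function names are made up for illustration, and percent-encoding the DOI (keeping "/") is an assumption about how to handle special characters; it also assumes a plain GET is acceptable to the resolver.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

# Resolver base URL as given in the message above.
RESOLVER = "http://dx.doi.org/"


def resolver_url(doi: str) -> str:
    """Build the noredirect lookup URL for a DOI (hypothetical helper).

    Assumption: special characters are percent-encoded, while the "/"
    separating prefix and suffix is kept as-is.
    """
    return RESOLVER + quote(doi, safe="/") + "?noredirect=true"


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if the resolver answers 200 for this DOI.

    HTTP error statuses (e.g. 404) are treated as "does not resolve".
    Network failures (urllib.error.URLError) are left to propagate so
    they are not mistaken for a broken DOI.
    """
    try:
        with urllib.request.urlopen(resolver_url(doi), timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```

Calling doi_resolves("10.1000/182") would perform one request against the resolver. As noted above, a 200 is only an indication, so DOIs with unusual characters that fail are worth checking by hand.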
We should test for these in citation templates. Does your data show which templates (if any) the broken DOIs were in?
We haven’t checked whether these errors occur systematically within specific templates, but we know that the code extracted them correctly with no parsing errors. We’ll share the list of broken DOIs so they can be reviewed and fixed.
Dario