[Wikipedia-l] New contribution - would this be useful?
Neil Harris
usenet at tonal.clara.co.uk
Thu Feb 14 01:10:09 UTC 2002
I have just uploaded a file, 1037c-output.txt.
It contains the text from Federal Standard 1037C, after the following
operations have been performed:
* stripping of HTML markup (which has left some nasties, such as 1012 for 10<sup>12</sup>)
* tracing of sources based on tags in the text
* stripping of items that contain material from non-Federal Government sources such as NATO or CCITT
* stripping of bracketed abbreviations from titles
* generation of an introductory phrase where possible
* addition of paragraph breaks in generally sensible places
* addition of a heading of the form {{ title }}
* general wikification (term in bold, etc.)
* classification of articles by adding tags such as
  #REDIRECT [[whatever]]
  #ONELINER for one-line articles under 120 characters
  #NONTRIVIAL for articles over 200 characters
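A few of these steps are simple enough to express directly. The following is a minimal Python sketch, not the script actually used to produce 1037c-output.txt. It shows one way to avoid artifacts like 1012 by keeping <sup> tags while stripping all other markup (assuming the wiki software accepts <sup>), how a trailing bracketed abbreviation might be dropped from a title, and the length-based tagging rules. The function names are hypothetical, as is the MIDLENGTH label: no tag is listed for mid-length items, so anything that is neither a redirect, a one-liner under 120 characters, nor over 200 characters falls into that bucket here.

```python
import re

# Matches any tag except <sup> and </sup> (case-insensitive), so superscripts
# such as 10<sup>12</sup> survive instead of collapsing to "1012".
_NON_SUP_TAG = re.compile(r"<(?!/?sup\b)[^>]*>", re.IGNORECASE)

# A trailing bracketed abbreviation, e.g. " (ATM)" in "asynchronous transfer mode (ATM)".
_TRAILING_ABBREV = re.compile(r"\s*\([A-Z0-9/-]+\)\s*$")

# Tolerates an optional space, as in "# REDIRECT [[whatever]]".
_REDIRECT = re.compile(r"#\s*REDIRECT\b", re.IGNORECASE)


def strip_tags(html: str) -> str:
    """Remove HTML tags but keep <sup>...</sup> (assumed acceptable to the wiki)."""
    return _NON_SUP_TAG.sub("", html)


def strip_abbrev(title: str) -> str:
    """Drop a bracketed abbreviation from the end of a title."""
    return _TRAILING_ABBREV.sub("", title)


def classify(body: str) -> str:
    """Tag an article using the 120- and 200-character thresholds."""
    text = body.strip()
    if _REDIRECT.match(text):
        return "REDIRECT"
    if "\n" not in text and len(text) < 120:
        return "ONELINER"
    if len(text) > 200:
        return "NONTRIVIAL"
    # Hypothetical label: no tag is given for mid-length items.
    return "MIDLENGTH"
```

The redirect check runs first so that a short redirect line is never mistaken for a one-liner. Whether length is measured before or after wikification is not stated, so that detail is an assumption here.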
Approximate breakdown (these figures may be a few versions old):
1241 REDIRECTs
411 ONELINERs
799 mid-length items
2783 NONTRIVIALs
It is now in a form where not much more programming would be needed to
load some or all of it into Wikipedia.
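As an illustration of the kind of programming that might remain, here is a minimal Python sketch that splits the output into (title, body) pairs. It assumes each article starts on a line consisting only of a {{ title }} heading and runs until the next such line; that layout and the function name split_articles are assumptions, not a description of the actual file. Actually submitting the pages to the wiki is not shown.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed article separator: a line that is just "{{ some title }}".
_HEADING = re.compile(r"^\{\{\s*(.+?)\s*\}\}\s*$")


def split_articles(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, body) for each {{ title }}-delimited article."""
    title = None
    body = []
    for line in lines:
        match = _HEADING.match(line)
        if match:
            # A new heading closes the previous article, if any.
            if title is not None:
                yield title, "\n".join(body).strip()
            title, body = match.group(1), []
        elif title is not None:
            body.append(line.rstrip("\n"))
    # Emit the final article once the input runs out.
    if title is not None:
        yield title, "\n".join(body).strip()
```

Because iterating over an open file yields its lines, the generator can be fed the file object directly, e.g. `split_articles(open("1037c-output.txt"))`; any text before the first heading is ignored.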
Would this be useful?
Neil