I have just uploaded a file, 1037c-output.txt.
It contains the text from Federal Standard 1037C, after the following operations have been performed:
* stripping of HTML markup (which has left some nasties like 1012 for 10<sup>12</sup>)
* tracing of sources based on tags in text
* stripping of items that contain stuff from non-Federal Govt. sources like NATO or CCITT...
* stripping of bracketed abbreviations from titles
* generated an introductory phrase where possible
* added paragraph breaks in generally sensible places
* added a heading like this {{ title }}
* general wikification (term in bold, etc.)
* classified articles by adding tags like
  # REDIRECT [[whatever]]
  #ONELINER for one-line articles < 120 chars
  #NONTRIVIAL for articles > 200 chars
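For anyone wanting to reproduce the last step, here is a rough sketch of the length-based tagging in Python. It is not the script that produced the file, only an illustration under some assumptions: the function name `classify` and the `redirect_target` parameter are made up, length is counted in characters of the stripped article text, and anything from 120 to 200 characters is assumed to get no tag (the "mid-length" items).

```python
from typing import Optional

ONELINER_MAX = 120    # articles shorter than this are tagged #ONELINER
NONTRIVIAL_MIN = 200  # articles longer than this are tagged #NONTRIVIAL


def classify(text: str, redirect_target: Optional[str] = None) -> Optional[str]:
    """Return the tag for one article, or None for a mid-length item.

    `redirect_target` is a hypothetical input: how redirects were actually
    detected is not described in the post, so the caller supplies the target.
    """
    if redirect_target is not None:
        return "#REDIRECT [[" + redirect_target + "]]"
    length = len(text.strip())
    if length < ONELINER_MAX:
        return "#ONELINER"
    if length > NONTRIVIAL_MIN:
        return "#NONTRIVIAL"
    return None  # assumed: mid-length items carry no tag
```

With these thresholds an article of exactly 120 or exactly 200 characters counts as mid-length, which is one reading of "< 120" and "> 200".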
Approx breakdown (this might be a few versions old):
* 1241 REDIRECTs
* 411 ONELINERs
* 799 mid-length items
* 2783 NONTRIVIALs
It is now in a form where not too much more programming is needed to shove some or all of it into the Wikipedia.
Would this be useful?
Neil