[Wikipedia-l] New contribution - would this be useful?

Neil Harris usenet at tonal.clara.co.uk
Thu Feb 14 01:10:09 UTC 2002


I have just uploaded a file, 1037c-output.txt

It contains the text from Federal Standard 1037C, after the following
operations have been performed:

* stripping of HTML markup (which has left some nasties, e.g.
10<sup>12</sup> now appears as 1012)
* tracing of sources based on tags in text
* stripping of items that contain material from non-Federal Govt.
sources such as NATO or CCITT
* stripping of bracketed abbreviations from titles
* generation of an introductory phrase where possible
* addition of paragraph breaks in generally sensible places
* addition of a heading like this {{ title }}
* general wikification (term in bold, etc.)
* classification of articles by adding tags like
      # REDIRECT [[whatever]]
      #ONELINER for one-line articles < 120 chars
      #NONTRIVIAL for articles > 200 chars
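
To make the length thresholds concrete, here is a rough sketch of the
classification step. It is not the script that produced the file: the
function name, the constants, and the idea that a redirect target is
already known for an entry are all assumptions made for illustration,
and "one-line" is approximated purely by character count.

```python
from typing import Optional

# Thresholds taken from the tag descriptions above.
ONELINER_MAX = 120    # bodies shorter than this get #ONELINER
NONTRIVIAL_MIN = 200  # bodies longer than this get #NONTRIVIAL


def classify(body: str, redirect_target: Optional[str] = None) -> str:
    """Return the tag line for one article, or "" for a mid-length item.

    `redirect_target` is a hypothetical input: how redirects are
    detected is not described here, so the caller must supply it.
    """
    if redirect_target is not None:
        return f"# REDIRECT [[{redirect_target}]]"
    length = len(body.strip())
    if length < ONELINER_MAX:
        return "#ONELINER"
    if length > NONTRIVIAL_MIN:
        return "#NONTRIVIAL"
    # 120 to 200 characters inclusive: no tag, a "mid-length item".
    return ""
```

Under these assumptions, anything from 120 to 200 characters inclusive
gets no tag at all and lands in the mid-length group.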

Approximate breakdown (these counts might be a few versions old):

1241 REDIRECTs
 411 ONELINERs
 799 mid-length items
2783 NONTRIVIALs

It is now in a form where not too much more programming is needed to
shove some or all of it into the Wikipedia.

Would this be useful?

Neil






