2010/7/26 John Vandenberg <jayvdb@gmail.com>
I think Wiktionary does want to include examples of the words 'in
use', and Wikisource can provide this.
Linking to Wikisource is encouraged on English Wiktionary. e.g.
http://en.wiktionary.org/wiki/demirep
If you create a list of words used in a book, it would be beneficial
to also record how many times each word is used.
Thanks John, yes, it's pretty simple to produce that kind of statistics. The trick is really simple and, in my opinion, anyone could implement it in a Python script much better than mine. It consists of a routine that converts a string into a Python list in which "word characters" and "other text characters" are separated; the routine takes the set of "word characters" as a parameter (or, equivalently, the set of "non-word characters"). For example,
"This could be a piece of raw wikitext splitted by [[python]] routine"
is converted into the list
["This"," ","could"," ","be"," ","a"," ","piece"," ","of"," ","raw"," ","wikitext"," ","splitted"," ","by"," [[","python","]] ","routine"]
where "words" and "not-words" regularly alternate, and a simple "".join() on the list gives back exactly the source string.
By simply selecting the "words" from such a list, you can do anything you like with them.
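
To make the idea concrete, here is a minimal sketch in Python 3. It is not my actual script: the function names split_text and word_counts are made up for this example, and the default word-character class (the regex class \w, which also matches digits and underscores) is only an assumption. Lowercasing the words before counting is likewise just one possible choice.

```python
import re
from collections import Counter

def split_text(text, word_chars=r"\w"):
    """Split text into alternating "word" and "not-word" pieces.

    word_chars is the body of a regex character class; the default
    \\w is an assumption made for this sketch.
    """
    # The capturing group makes re.split keep the matched words
    # in the result, between the separators.
    pattern = "([" + word_chars + "]+)"
    # re.split may produce empty strings at the two ends; drop them.
    return [piece for piece in re.split(pattern, text) if piece != ""]

def word_counts(text, word_chars=r"\w"):
    """Count how many times each word occurs (case-insensitively)."""
    word_re = re.compile("[" + word_chars + "]+")
    # Keep only the pieces made entirely of word characters.
    words = [p for p in split_text(text, word_chars) if word_re.fullmatch(p)]
    return Counter(w.lower() for w in words)

if __name__ == "__main__":
    source = ("This could be a piece of raw wikitext "
              "splitted by [[python]] routine")
    pieces = split_text(source)
    print(pieces)
    # Joining the pieces gives back exactly the source string.
    print("".join(pieces) == source)
    print(word_counts("the cat and the hat").most_common(1))
```

Passing a different word_chars value (for instance r"\w'" to keep apostrophes inside words) changes what counts as a word without touching the rest of the routine.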
Alex