<br><br>
<div class="gmail_quote">2010/7/26 John Vandenberg <span dir="ltr"><<a href="mailto:jayvdb@gmail.com">jayvdb@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">I think Wiktionary does want to include examples of the words 'in<br>use', and Wikisource can provide this.<br>
<br>Linking to Wikisource is encouraged on English Wiktionary. e.g.<br><br><a href="http://en.wiktionary.org/wiki/demirep" target="_blank">http://en.wiktionary.org/wiki/demirep</a><br><br>If you create a list of words used in a book, it would be beneficial<br>
to also record how many times each word is used.</blockquote>
<div> </div>
<div>Thanks John. Yes, statistics of that kind are easy to produce. The trick is really simple and, in my opinion, anyone could implement it in a Python script much better than mine. It is just a routine that converts a string into a Python list in which "word characters" and "other text characters" are separated; the list of "word characters" (or, equivalently, the list of "non-word characters") is passed as a parameter. For example, </div>
<div>"This could be a piece of raw wikitext splitted by [[python]] routine" </div>
<div>is converted into the list </div>
<div>["This"," ","could"," ","be"," ","a"," ","piece"," ","of"," ","raw"," ","wikitext"," ","splitted"," ","by"," [[","python","]] ","routine"] </div>
<div>where "words" and "non-words" strictly alternate, and a simple "".join() on the list gives back <em>exactly</em> the source string. </div>
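<div>A minimal sketch of such a routine, not my actual script: the name split_words and the default word_chars (ASCII letters and digits) are assumptions made for illustration, and a real run on wikitext would probably need accented letters and apostrophes added to the list.</div>

```python
import re
import string


def split_words(text, word_chars=string.ascii_letters + string.digits):
    """Split text into alternating "word" and "non-word" chunks.

    word_chars is the list of characters that count as part of a word
    (an assumed default; pass your own for other languages).
    "".join(split_words(text)) always gives back text unchanged.
    """
    # A capturing group makes re.split keep the matched words in the result.
    pattern = re.compile("([" + re.escape(word_chars) + "]+)")
    # re.split yields empty strings at the edges when the text starts or
    # ends with a word; dropping them does not affect the round trip.
    return [chunk for chunk in pattern.split(text) if chunk]


sample = "This could be a piece of raw wikitext splitted by [[python]] routine"
chunks = split_words(sample)
print(chunks)
print("".join(chunks) == sample)
```

<div>Because runs of word characters are matched greedily, two words never sit next to each other in the list, so the alternation holds without any extra bookkeeping.</div>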
<div> </div>
<div>Once you have selected just the "words" from such a list, you can do anything you like with them.</div>
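<div>For instance, counting how many times each word is used could look roughly like this. It is only a sketch under assumptions: word_counts is a hypothetical helper, a "word" is taken to be any chunk made entirely of regex \w characters, and folding everything to lower case is my own choice.</div>

```python
import re
from collections import Counter

# Assumption: a chunk is a "word" if it consists only of \w characters.
WORD = re.compile(r"\w+")


def word_counts(chunks):
    """Count the word chunks of an alternating word / non-word list."""
    # Lower-casing merges "The" and "the"; drop .lower() to keep case.
    return Counter(c.lower() for c in chunks if WORD.fullmatch(c))


chunks = ["The", " ", "cat", " ", "saw", " ", "the", " [[", "cat", "]]"]
counts = word_counts(chunks)
for word, n in counts.most_common():
    print(word, n)
```

<div>The non-word chunks are simply skipped here, but they stay in the list, so the original text can still be rebuilt after the words have been counted or changed.</div>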
<div> </div>
<div>Alex</div></div>