[Wikisource-l] Goals for Wikisource

Alex Brollo alex.brollo at gmail.com
Mon Jul 26 10:15:29 UTC 2010


2010/7/26 John Vandenberg <jayvdb at gmail.com>

> I think Wiktionary does want to include examples of the words 'in
> use', and Wikisource can provide this.
>
> Linking to Wikisource is encouraged on English Wiktionary. e.g.
>
> http://en.wiktionary.org/wiki/demirep
>
> If you create a list of words used in a book, it would be beneficial
> to also record how many times each word is used.


Thanks John, yes, this kind of statistics is easy to compute. The trick is
really simple and, in my opinion, anyone could implement it with a Python
script much better than mine. It consists of a routine that converts a
string into a Python list in which runs of "word characters" and runs of
"other text characters" are kept separate. The routine simply takes the set
of "word characters" as a parameter (or, equivalently, the set of "non-word
characters"). E.g.,
"This could be a piece of raw wikitext split by [[python]] routine"
is converted into the list
["This"," ","could"," ","be"," ","a"," ","piece"," ","of"," ","raw"," ",
"wikitext"," ","split"," ","by"," [[","python","]] ","routine"]
where "words" and "not-words" strictly alternate, and "".join(list) gives
back *exactly* the source string.
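A minimal Python sketch of such a routine follows. It assumes the set of word characters is passed in as a parameter. The name `split_words` and the ASCII default set are illustrative only, not the original script. It relies on `itertools.groupby`, which groups consecutive characters by whether they are word characters, so the runs alternate automatically:

```python
import string
from itertools import groupby

# Hypothetical default: ASCII letters and digits. Texts with accented
# letters (e.g. Italian) would need a larger set passed in by the caller.
DEFAULT_WORD_CHARS = frozenset(string.ascii_letters + string.digits)


def split_words(text, word_chars=DEFAULT_WORD_CHARS):
    """Split text into alternating runs of word and non-word characters.

    The result always starts with a word run (the empty string "" if the
    text begins with a non-word character), so even indices hold words and
    odd indices hold separators. "".join(result) == text always holds.
    """
    runs = ["".join(group)
            for _, group in groupby(text, key=lambda ch: ch in word_chars)]
    if runs and runs[0][0] not in word_chars:
        runs.insert(0, "")  # keep words on even indices
    return runs


if __name__ == "__main__":
    s = "This could be a piece of raw wikitext split by [[python]] routine"
    tokens = split_words(s)
    print(tokens)
    print(tokens[0::2])  # only the words
    assert "".join(tokens) == s
```

Padding with a leading "" is a design choice: it fixes words on even indices, so `tokens[0::2]` always selects the words regardless of how the text begins.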

By selecting only the "words" from such a list (every other element), you
can do anything you like with them.
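John's suggestion was to record how many times each word is used. One way to do that is to feed the selected words to `collections.Counter`. This standalone sketch uses `re.split` with a capturing group instead of a hand-written routine. Because the separator group is captured, the result still alternates word, separator, word, and so on, and joins back to the original text. Assumptions: a word character is whatever `\w` matches (Unicode letters, digits, underscore), and words are lower-cased before counting. The function name `word_counts` is hypothetical.

```python
import re
from collections import Counter


def word_counts(text):
    """Count word occurrences in text, case-insensitively.

    Assumption: a "word character" is anything matched by the regex class
    \\w; the routine described above lets the caller choose the set instead.
    """
    # The capturing group keeps separators in the result, so words sit on
    # even indices (possibly "" at the start or end) and separators on odd ones.
    tokens = re.split(r"(\W+)", text)
    words = tokens[0::2]
    return Counter(w.lower() for w in words if w)


if __name__ == "__main__":
    counts = word_counts("The cat saw the other cat.")
    print(counts.most_common(2))
```

`Counter.most_common(n)` then gives the most frequent words of a book, which could be used to pick candidate "in use" examples for Wiktionary.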

Alex