You have several options:
1. Use a regex, e.g.:
import codecs
import re
import wikipedia  # pywikipedia module providing getSite() and Page
site = wikipedia.getSite()
f = codecs.open("file.txt", "r", "utf-8")
R = re.compile(r"\{\{(.+?)\}\}")  # or any other regex
for name in R.findall(f.read()):
    page = wikipedia.Page(site, name)
    # do whatever you like with the page
2. Use readlines:
import codecs
import wikipedia  # pywikipedia module providing getSite() and Page
site = wikipedia.getSite()
f = codecs.open("file.txt", "r", "utf-8")
for line in f.readlines():
    line = line.rstrip("\r\n")  # drop the trailing newline
    name = line.split(":")[0]  # or however you like to get the title
    page = wikipedia.Page(site, name)
    # do whatever you like with the page
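To avoid loading the whole file into memory, iterate over the file object
itself (for line in f:) instead of calling f.read() or f.readlines(); that
reads one line at a time. Alternatively, read it once, keep what you need
in other variables or files, and close it (e.g. f.close()).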
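As a rough sketch of the kind of generator you asked about, something like the following reads the "word:definition" file one line at a time and yields pairs, so no mirror file is needed. The name iter_entries is just made up for illustration, and I have not wired it into pagegenerators.GeneratorFactory(); building the Page from each word is left as a comment, on the assumption that you do it the same way as in the snippets above.

```python
import io


def iter_entries(path, encoding="utf-8"):
    """Lazily yield (word, definition) pairs from a 'word:definition' file."""
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:  # the file object yields one line at a time
            line = line.rstrip("\r\n")
            if ":" not in line:
                continue  # skip blank or malformed lines
            # Split on the first colon only, so definitions may contain ':'
            word, definition = line.split(":", 1)
            yield word.strip(), definition.strip()


# Hypothetical use with pywikipedia (assumed, not run here):
# for word, definition in iter_entries("file.txt"):
#     page = wikipedia.Page(site, word)
#     # build the entry text from `definition` and save the page
```

Because it is a generator, only the current line is held in memory, and the file is closed automatically once the loop finishes.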
Best
On Sun, Dec 1, 2013 at 1:26 PM, Mathieu Stumpf <
psychoslave(a)culture-libre.org> wrote:
Hello,
I want to add Esperanto words to fr.wiktionary, using as input a file
where each line has the format "word:the fine definition". So I copied
basic.py and started hacking it to achieve my goal.
Now, it seems that the -file argument expects a file where each line is
formatted as "[[Article name]]". Of course I can just create a second
input file and read both in parallel, feeding the genFactory with one
and using the other to build the Wiktionary entry. But maybe you
could give me a hint on how to write a generator that can feed a
pagegenerators.GeneratorFactory() without creating a "mirror file" and
without loading the whole file into main memory.
Kind regards,
Mathieu
--
Amir