Re: [Wikitech-l] [Pywikipediabot] Using the content of a file as input for articles

1 Dec 2013

You have several options.
1- Use a regex, e.g.:
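import re, codecs
import wikipedia  # the pywikipediabot framework module used below
site = wikipedia.getSite()
f = codecs.open("file.txt", "r", "utf-8")
R = re.compile(r"\{\{(.+?)\}\}")  # or any other regex that captures the title
for name in R.findall(f.read()):  # note: f.read() loads the whole file
    page = wikipedia.Page(site, name)
    # do whatever you like with the page
f.close()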

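2- Use readlines:
import codecs
import wikipedia  # the pywikipediabot framework module used below
site = wikipedia.getSite()
f = codecs.open("file.txt", "r", "utf-8")
for line in f.readlines():  # note: readlines() loads the whole file
    line = line.replace("\n", "").replace("\r", "")  # drop line endings
    name = line.split(":")[0]  # or any other way you like to get the title
    page = wikipedia.Page(site, name)
    # do whatever you like with the page
f.close()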
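
Since each line in your file has the form "word:the fine definition", a small helper can return both parts at once. This is only a sketch: parse_entry is a name I made up, it uses nothing from pywikipediabot, and it assumes the first colon separates the word from the definition (so colons inside the definition are kept).

```python
def parse_entry(line):
    """Split a 'word:definition' line into (word, definition).

    Returns None for blank lines and for lines without a usable word.
    """
    line = line.strip()  # drops the trailing "\n" / "\r\n" as well
    # partition() splits at the first colon only
    word, sep, definition = line.partition(":")
    if not sep or not word.strip():
        return None
    return word.strip(), definition.strip()


print(parse_entry("hundo:dog\n"))
print(parse_entry("horo:hour: 60 minutes\r\n"))
```

The word can then be used as the title in wikipedia.Page(site, word), as in the snippets above, and the definition kept for building the entry text.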

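As for not loading the whole file: f.read() and f.readlines() do load
everything, but iterating over the file object itself (for line in f:)
reads one line at a time. Otherwise you can simply read it, save what you
need to some other variables or files, and close it (e.g. f.close()).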
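
A rough sketch of a generator that avoids both a second file and reading everything at once, in plain Python. The name iter_entries is hypothetical, and the sketch only yields (word, definition) tuples; how to plug it into pagegenerators.GeneratorFactory() is not covered here.

```python
import io


def iter_entries(path, encoding="utf-8"):
    """Yield (word, definition) pairs from path, one line at a time."""
    with io.open(path, "r", encoding=encoding) as f:
        for line in f:  # the file object hands out lines lazily
            # split at the first colon only; the rest is the definition
            word, sep, definition = line.strip().partition(":")
            if sep and word:  # skip blank or malformed lines
                yield word, definition
```

Looping with "for word, definition in iter_entries("file.txt"):" then gives one pair per iteration, and the word can be passed to wikipedia.Page(site, word) as in the snippets above. The file is closed automatically once the loop finishes.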

Best

On Sun, Dec 1, 2013 at 1:26 PM, Mathieu Stumpf <
psychoslave(a)culture-libre.org> wrote:

  Hello,

 I want to add Esperanto words to fr.wiktionary using as input a file
 where each line has the format "word:the fine definition". So I copied
 the basic.py, and started hacking it to achieve my goal.

 Now, it seems like the -file argument expects a file where each line is
 formatted as "[[Article name]]". Of course I can just create a second
 input file, and read both in parallel, so I feed the genFactory with the
 former, and use the second to build the wiktionary entry. But maybe you
 could give me a hint on how I can write a generator that can feed a
 pagegenerators.GeneratorFactory() without creating a "mirror file" and
 without loading the whole file into main memory.

 Kind regards,
 Mathieu

 _______________________________________________
 Wikitech-l mailing list
 Wikitech-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l 

-- 
Amir
