Re: [Wikitech-l] [Pywikipediabot] Using the content of a file as input for articles

1 Dec 2013


You have several options:
1- use regex, e.g.:
import re, codecs
import wikipedia  # the pywikipediabot framework module
site=wikipedia.getSite()
f=codecs.open("file.txt","r","utf-8")
R=re.compile(r"\{\{(.+?)\}\}") #or other types of regex
for name in R.findall(f.read()):
    page=wikipedia.Page(site,name)
    #do whatever you like with the page
f.close()
2- use readlines:
import codecs
import wikipedia  # the pywikipediabot framework module
site=wikipedia.getSite()
f=codecs.open("file.txt","r","utf-8")
for line in f.readlines():
    line=line.replace("\n","").replace("\r","")
    name=line.split(":")[0] #or any other way you like to get the title
    page=wikipedia.Page(site,name)
    #do whatever you like with the page
f.close()
As for not loading the whole file, I don't think it's possible; or you can
simply read it, save what you need to some other variables or files, and
close it (e.g. f.close()).
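That said, here is a hedged sketch of one way it might work: in Python,
iterating over the file object itself (rather than calling read() or
readlines()) fetches one line at a time, so a generator along these lines
should keep memory use low. The function name iter_entries and its sep
parameter are hypothetical, made up for illustration; the "word:definition"
format is the one described in the question.

```python
import io

def iter_entries(path, sep=":"):
    """Yield (title, definition) pairs, reading the file one line at a time."""
    # io.open behaves the same on Python 2 and 3 and decodes as UTF-8.
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:  # iterating the file object does not load it all
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split on the first separator only, so the definition
            # may itself contain the separator character.
            title, _, definition = line.partition(sep)
            yield title.strip(), definition.strip()
```

Each yielded title could then be passed to wikipedia.Page(site, title) as in
the examples above, with the definition used to build the entry text.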
Best
On Sun, Dec 1, 2013 at 1:26 PM, Mathieu Stumpf <
psychoslave@culture-libre.org> wrote:
...
Hello,
I want to add Esperanto words to fr.wiktionary using as input a file
where each line has the format "word:the fine definition". So I copied
basic.py and started hacking it to achieve my goal.
Now, it seems that the -file argument expects a file where each line is
formatted as "[[Article name]]". Of course I can just create a second
input file and read both in parallel, so that I feed the genFactory with
the first and use the second to build the wiktionary entry. But maybe you
could give me a hint on how I can write a generator that can feed a
pagegenerators.GeneratorFactory() without creating a "mirror file" and
without loading the whole file into main memory.
Kind regards,
Mathieu

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- 
Amir
