) to parse the wikitext to
extract the links; the latter is already part of pywikibot, though.
Cheers,
Morten
On 18 January 2016 at 10:45, Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:
Hey,
pywikibot includes a really good module called
xmlreader.py
<https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
Documentation generated from the source code is also available:
<https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>
You can read the source code and write your own script. Some existing scripts
also support xmlreader; see their manual pages on
mediawiki.org.
Best
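
For orientation, here is a minimal sketch of streaming pages out of a dump
file. It does NOT use pywikibot's xmlreader; it is a standard-library
stand-in (xml.etree.ElementTree.iterparse) meant only to show the general
shape of such a script. The names iter_pages, local_name and SAMPLE are
made up for this illustration, and the sample document is a tiny
hand-written fragment, not a real dump.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace: '{uri}page' -> 'page'."""
    return tag.rsplit('}', 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page>, one page at a time.

    If a page carries several revisions, the text of the last one wins.
    """
    for _event, elem in ET.iterparse(source, events=('end',)):
        if local_name(elem.tag) != 'page':
            continue
        title, text = None, ''
        for child in elem.iter():
            name = local_name(child.tag)
            if name == 'title':
                title = child.text
            elif name == 'text':
                text = child.text or ''
        yield title, text
        elem.clear()  # drop the finished page's children to limit memory use


# Hand-written stand-in for a dump file (assumption: export-style layout).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>A</title><revision><text>See [[B]].</text></revision></page>
  <page><title>B</title><revision><text>#REDIRECT [[C]]</text></revision></page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE)))
```

For a real dump you would pass an open binary file instead of the BytesIO
object; a compressed dump can be opened with bz2.open(path, 'rb') first.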
On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <itsawesome.yes(a)gmail.com>
wrote:
hello hello!
About the use of pywikibot:
is it possible to use it to parse the XML dump?
I am interested in extracting links from pages (internal and external,
kept distinct from the ones belonging to a category).
I would also like to handle transitive redirects.
I would like to process the dump without accessing the wiki, or else access
the wiki in batches with proper limits.
Is there maybe something in the package that already takes care of this?
I've seen that
https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
lists a "ghost" extracting_links.py script.
I wanted to ask before re-inventing the wheel, and to check whether pywikibot
is a suitable tool for the purpose.
Thank you,
L.
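
To make the two tasks in this question concrete, here is a rough sketch of
sorting a page's links into internal, category and external, and of
following a chain of redirects to its end. It is plain Python with regular
expressions, not pywikibot code, and the names classify_links,
redirect_target and resolve_redirect are invented for the example. It
assumes the English "Category:" prefix and the "#REDIRECT" keyword, treats
file links as ordinary internal links, and ignores templates and nested
brackets, so take it as a starting point only.

```python
import re

# [[Target]], [[Target|label]] or [[Target#section]]; group 1 is the target.
WIKILINK = re.compile(r'\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]')
# Bare or bracketed external URLs.
EXTERNAL = re.compile(r'https?://[^\s\]<>|]+')
# A redirect page starts with "#REDIRECT [[Target]]".
REDIRECT = re.compile(r'^\s*#REDIRECT\s*:?\s*\[\[([^\[\]|#]+)', re.IGNORECASE)


def classify_links(text):
    """Split the links found in one page's wikitext into three lists."""
    found = {'internal': [], 'category': [],
             'external': EXTERNAL.findall(text)}
    for match in WIKILINK.finditer(text):
        target = match.group(1).strip()
        if target.lower().startswith('category:'):
            found['category'].append(target)
        else:
            # [[:Category:X]] links to the category page without
            # categorising, so it counts as an internal link here.
            found['internal'].append(target.lstrip(':'))
    return found


def redirect_target(text):
    """Return the target title if the page is a redirect, else None."""
    match = REDIRECT.match(text)
    return match.group(1).strip() if match else None


def resolve_redirect(title, redirects, max_hops=10):
    """Follow title through a {source: target} map to its final page.

    Returns None when the chain loops or is longer than max_hops.
    """
    seen = set()
    while title in redirects:
        if title in seen or len(seen) >= max_hops:
            return None
        seen.add(title)
        title = redirects[title]
    return title


links = classify_links(
    'See [[Foo|the foo]] and [[Bar#History]]. '
    'http://example.org/x [[Category:Things]]')
redirects = {'A': 'B', 'B': 'C', 'X': 'Y', 'Y': 'X'}
final = resolve_redirect('A', redirects)
```

The redirect map would be built in a first pass over the dump by calling
redirect_target on each page's text; link targets can then be resolved
against it in a second pass, without touching the live wiki.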
_______________________________________________
pywikibot mailing list
pywikibot(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot