Re: [pywikibot] about parsing the dump

19 Jan 2016

Here's an example using regular expressions and `mwxml` (a new offshoot of
mediawiki-utilities referenced above)
https://tools.wmflabs.org/paws/public/EpochFail/examples/mwxml.py.ipynb

The example extracts image links from English Wikipedia, but I imagine it
would work for you with little modification.
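
For readers without access to the notebook, here is a minimal, hedged sketch of the same idea using only the Python standard library instead of `mwxml`. It is not the notebook's code. The helper names (`iter_pages`, `classify_links`), the two regular expressions, and the tiny inline dump are illustrative assumptions, and the regexes deliberately ignore nested links such as image captions, so treat the output as approximate.

```python
import io
import re
import xml.etree.ElementTree as ET

# [[Target]], [[Target#Section]] or [[Target|label]]; captures the target only.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# [https://example.org label]; captures the URL only.
EXTERNAL_RE = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)")


def local_name(tag):
    """Strip the XML namespace that dump files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) pairs while streaming through the dump."""
    title, text = None, ""
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # release the memory held by the finished page


def classify_links(text):
    """Split the links found in wikitext into internal, category and external."""
    links = {"internal": [], "category": [], "external": []}
    for match in WIKILINK_RE.finditer(text):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            links["category"].append(target)
        else:
            links["internal"].append(target)
    links["external"] = EXTERNAL_RE.findall(text)
    return links


# Made-up one-page dump, only for demonstration.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Turin</title>
    <revision>
      <text>[[Italy]] city on the [[Po (river)|Po]].
[https://example.org Site] [[Category:Cities in Italy]]</text>
    </revision>
  </page>
</mediawiki>"""

for page_title, wikitext in iter_pages(io.BytesIO(SAMPLE_DUMP)):
    print(page_title, classify_links(wikitext))
```

For a real dump, pass an open file object (for example from `bz2.open(path, "rb")`) to `iter_pages` instead of the in-memory sample.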

-Aaron

On Mon, Jan 18, 2016 at 6:23 PM, Luigi Assom <itsawesome.yes(a)gmail.com>
wrote:

  hi, thank you.

 Where can I find documentation, or an example, for extracting links with
 https://github.com/earwig/mwparserfromhell
 or

 https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader…
 ?

 I'd be very grateful if you could point me to an example of link
 extraction and redirect handling.
 Should I use them against the XML dump, or as a bot against api.wikimedia?
 I would like to work offline, but mwparserfromhell seems to work online
 against api.wikipedia.

 Where is the documentation for the scripts on mediawiki.org?

https://www.mediawiki.org/w/index.php?search=xmlparser&title=Special%3A…

 thank you!

 On Mon, Jan 18, 2016 at 8:05 PM, Morten Wang <nettrom(a)gmail.com> wrote:

  An alternative is Aaron Halfaker's mediawiki-utilities (
 https://pypi.python.org/pypi/mediawiki-utilities) together with
 mwparserfromhell (https://github.com/earwig/mwparserfromhell) to parse the
 wikitext and extract the links; the latter is already a part of pywikibot,
 though.

 Cheers,
 Morten

 On 18 January 2016 at 10:45, Amir Ladsgroup <ladsgroup(a)gmail.com> wrote:

  Hey,
 There is a really good module implemented in pywikibot called
 xmlreader.py
 <https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
 Documentation generated from the source code is also available:

<https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>
 You can read the source code and write your own script. Some scripts also
 support xmlreader; read the manual for them on mediawiki.org.

 Best

 On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <itsawesome.yes(a)gmail.com>
 wrote:

  hello hello!
 About the use of pywikibot:
 is it possible to use it to parse the XML dump?

 I am interested in extracting links from pages (internal and external,
 distinguishing those that belong to a category).
 I would also like to handle transitive redirects.
 I would like to process the dump without accessing the wiki, or else access
 the wiki in batches with proper limits.
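
As a rough illustration of the transitive-redirect part of the question, the sketch below follows a chain of redirects to its final target. It assumes a redirect map has already been built from the dump as a plain dictionary of source title to target title; the function name `resolve_redirect`, the `max_hops` limit, and the sample titles are hypothetical and not part of pywikibot.

```python
def resolve_redirect(title, redirects, max_hops=10):
    """Follow redirects from `title` until a non-redirect page is reached.

    `redirects` maps a redirecting title to its target title.
    Returns the final title, or None for a loop or an overlong chain.
    """
    seen = {title}
    for _ in range(max_hops):
        target = redirects.get(title)
        if target is None:
            return title  # not a redirect: this is the final page
        if target in seen:
            return None  # redirect loop
        seen.add(target)
        title = target
    return None  # chain longer than max_hops


# Made-up redirect map, only for demonstration.
redirects = {"Torino": "Turin (city)", "Turin (city)": "Turin", "A": "B", "B": "A"}
print(resolve_redirect("Torino", redirects))
print(resolve_redirect("A", redirects))
```

Returning `None` for loops and overlong chains keeps bad data from silently mapping to an arbitrary page; callers can log those titles and skip them.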

 Is there maybe something in the package that already takes care of this?
 I've seen that https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
 lists a "ghost" "extracting_links.py" script.
 I wanted to ask before reinventing the wheel, and to check whether pywikibot
 is a suitable tool for the purpose.

 Thank you,
 L.
 _______________________________________________
 pywikibot mailing list
 pywikibot(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/pywikibot

