Hi Yoni,

On Mon, Sep 7, 2015 at 6:01 PM, Yoni Lamri <lamri.yoni@gmail.com> wrote:
Hello everybody,
This is my first thread on the list; I'm relatively new to Wikipedia/MediaWiki usage (about 1 month).

Welcome!
 
I followed the documentation (RTFM as usual) and every step seems to end at a wall.

My simple question: how do I correctly install a Wikipedia mirror from dumps in MediaWiki?

My goal:
Create an offline Wikipedia mirror from the FR, EN or PT dumps (one language only).

I did some tests with wowiki, which is small enough for testing purposes, but I ran into problems:
Most of the tools mentioned in the documentation are either 404 or outdated, for example xml2sql and mwdumper.
The importer that ships with MediaWiki takes 20 minutes to import the 1.6 MB wowiki dump, which would mean approximately a decade for the FR one...

I don't think the time taken by the import script provided with MediaWiki is directly proportional to the size of the dump; other factors come into play as well. You could try a larger dump (simplewiki or even metawiki, for example) to see the difference.
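
For reference, a typical import run looks something like this (just a sketch, assuming a standard MediaWiki checkout and a bzip2-compressed pages-articles dump; adjust the file name and paths to your setup):

    # stream the compressed dump straight into the importer
    bzcat frwiki-latest-pages-articles.xml.bz2 | php maintenance/importDump.php

    # rebuild the derived tables afterwards
    php maintenance/rebuildrecentchanges.php

importDump.php goes through the full MediaWiki import pipeline for every revision, which is why it is so much slower than the bulk SQL tools.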
 
I tried the latest MWDumper found on GitHub; it can quickly generate a good SQL file, but it seems that special characters in the page text are not escaped, so \n causes an SQL error...

I am not sure if this is related to https://phabricator.wikimedia.org/T16379
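
In case it helps, the usual way to load mwdumper output is to pipe it straight into MySQL rather than saving an intermediate .sql file, with the client forced to UTF-8 to avoid character-set problems. A rough sketch (the database name and user are placeholders, and pick the --format schema version that matches your MediaWiki version):

    java -jar mwdumper.jar --format=sql:1.5 wowiki-latest-pages-articles.xml.bz2 \
      | mysql -u wikiuser -p --default-character-set=utf8 wikidb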
 
Some columns are missing from the original MediaWiki installer, for example "page_counter" in the "page" table. Are there extensions that need to be installed to import dumps?

The "page_counter" column has already been removed in MediaWiki 1.25. You can obtain it as an extension here: https://www.mediawiki.org/wiki/Extension:HitCounters
 
How can I find/get media files (low-resolution images are enough)?

Media dumps are currently not available due to disk space issues. There is a possibility that these dumps will be produced in the future, and that will be announced on this mailing list.
 
How should I handle interwiki media/links/pages?

You will have to set up your interwiki table using this script: https://git.wikimedia.org/blob/operations%2Fdumps.git/adb99633787b188dc78ac775274081be4f4480a8/tools%2Finterwikis%2Ffixup-interwikis.py

Note that there is a small bug in that script that causes the interwiki file not to be downloaded correctly; you will have to apply this patch to get it working: https://gerrit.wikimedia.org/r/220075
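
If you have not fetched a change from Gerrit before, something like this should work (assuming patchset 1 of that change; check the change page for the exact ref):

    git clone https://gerrit.wikimedia.org/r/operations/dumps
    cd dumps
    git fetch https://gerrit.wikimedia.org/r/operations/dumps refs/changes/75/220075/1
    git cherry-pick FETCH_HEAD

Once the patched script has run, a quick "SELECT iw_prefix, iw_url FROM interwiki LIMIT 10;" via maintenance/sql.php is an easy way to confirm that the interwiki table was populated.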
 

The next step will be the update procedure, incremental or complete...

It seems the French Wikimedia team is stuck on my questions...

I'm not opposed to doing some development (Python, PHP or JEE), but I need an entry point to start from if nobody is working on it...

Thank you for your work. I know you are quite busy with dump generation, but our project is quite serious, lots of people want it, and we need quick answers about its feasibility.

Best regards,
Yoni


Hope this helps.

--
Best regards,
Hydriz Scholz