0) Context
I am currently developing new features for WP-MIRROR (see <https://www.mediawiki.org/wiki/Wp-mirror>).
1) Objective
I would like WP-MIRROR to generate all image thumbs during the mirror build process, so that MediaWiki can render pages quickly using precomputed thumbs.
2) Dump importation
maintenance/importDump.php - computes thumbs during importation, but is too slow.
mwxml2sql - loads databases quickly, but does not compute thumbs.
3) Question
Is there a way to compute all the thumbs after loading databases quickly with mwxml2sql?
Sincerely Yours, Kent
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 9/12/15, wp mirror <wpmirrordev@gmail.com> wrote:
Hi. My understanding is that wp-mirror sets up a MediaWiki instance for rendering the mirror. One solution would be to set up 404-thumb rendering. Instead of pre-rendering the needed thumbs, MediaWiki then renders a thumb on demand whenever a web browser requests one that does not exist yet. There are instructions for how this works at https://www.mediawiki.org/wiki/Manual:Thumb.php . This is probably the best solution to your problem.
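With a 404 handler in place, one way to "pre-warm" the mirror after a fast mwxml2sql load would be to request each thumb path once, so the handler renders and stores it. The Python sketch below only computes such a path. It assumes MediaWiki's hashed upload layout ($wgHashedUploadDirectory enabled) and a raster file whose thumb keeps its extension; the function name and the "/images" default for $wgUploadPath are hypothetical and should be adjusted to the actual install.

```python
import hashlib


def thumb_path(file_name: str, width: int, upload_path: str = "/images") -> str:
    """Return the (unescaped) path of a width-scaled thumbnail.

    Assumes the hashed upload layout ($wgHashedUploadDirectory = true) and a
    raster file whose thumbnail keeps the original extension. SVG, PDF, DjVu
    and video files use different thumbnail names and are not handled here.
    """
    # File names are stored with underscores instead of spaces; the caller is
    # assumed to pass the canonical name (e.g. with its first letter capitalized).
    name = file_name.replace(" ", "_")
    # The two hash directories are the first one and two hex digits of the
    # MD5 of the stored file name.
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"{upload_path}/thumb/{digest[0]}/{digest[:2]}/{name}/{width}px-{name}"
```

For example, `thumb_path("Example.jpg", 220)` yields a path of the form `/images/thumb/x/xy/Example.jpg/220px-Example.jpg`. URL-encode it before requesting it from the mirror's web server (e.g. with `urllib.request.urlopen`); the first request for a missing thumb is what triggers rendering.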
Otherwise, MediaWiki needs to know which thumbs are needed for every page, which involves parsing the pages (e.g. via refreshLinks.php). This is a very slow process. If you already had all the thumbnails generated, you could perhaps just copy over the thumb directory, but I'm not sure where you would get a pre-generated thumb directory.
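To illustrate why knowing the needed thumbs requires parsing, here is a deliberately naive Python sketch that scans raw wikitext for [[File:...]] and [[Image:...]] links carrying an explicit "Npx" width or a "thumb" option. It is not how MediaWiki or refreshLinks.php works; the function name and the 220px default are assumptions. It misses every image pulled in through templates and infoboxes, which is exactly why a real parse is needed.

```python
import re
from typing import List, Tuple

# [[File:Name|opt|opt...]] or [[Image:Name|...]]. The options part stops at
# the first "]", so captions containing nested links are cut short.
FILE_LINK = re.compile(
    r"\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)((?:\|[^\]]*)?)\]\]", re.IGNORECASE
)
WIDTH_OPT = re.compile(r"^(\d+)\s*px$")


def thumbs_needed(wikitext: str, default_width: int = 220) -> List[Tuple[str, int]]:
    """Return (file name, width) pairs for thumbnails requested in raw wikitext.

    Naive sketch: templates, galleries and "upright" scaling are ignored.
    default_width is an assumed site thumbnail size, not a MediaWiki constant.
    """
    needed = []
    for match in FILE_LINK.finditer(wikitext):
        name = match.group(1).strip().replace(" ", "_")
        # Drop the leading empty string produced by the first "|".
        options = [opt.strip() for opt in match.group(2).split("|")[1:]]
        width = None
        for opt in options:
            size = WIDTH_OPT.match(opt)
            if size:
                width = int(size.group(1))
        # A bare "thumb" falls back to the (assumed) default thumbnail width.
        if width is None and ("thumb" in options or "thumbnail" in options):
            width = default_width
        if width is not None:
            needed.append((name, width))
    return needed
```

Links with neither a width nor a thumb option (full-size images) are skipped, since they need no scaled copy.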
-- -bawolff
Have you looked into what mwoffliner does? https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.j... Maybe you can even just extract the images from the ZIM files.
Nemo
As another suggestion, XOWA (http://gnosygnu.github.io/xowa/) can generate a list of thumbs. It takes about 60 hours to parse the English Wikipedia dump and generate a table of 4.78 million rows with the following columns:
* file name
* file extension
* repo (commons or local)
* file width
* file height
* thumbtime (for video)
* page (for djvu / pdf)
More information is available within XOWA at home/wiki/Help:Import/Command-line/Thumbs. I can provide more information online or offline if you're interested.
If you need the actual thumb files, you can download XOWA databases from https://archive.org/details/Xowa_enwiki_latest . They contain about 5 million thumbs stored in SQLite tables. It should be straightforward to write code that pulls the blobs from the databases and saves them to disk.
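Such an export could look roughly like the Python sketch below, which uses the standard sqlite3 module. The table and column names are parameters because XOWA's actual schema is not described in this thread; inspect a downloaded database first (e.g. with `.schema` in the sqlite3 shell) and pass the real names. The function name is hypothetical, and writing every blob into one flat directory is an assumption: name collisions between repos (commons vs local) would need handling in practice.

```python
import sqlite3
from pathlib import Path


def export_blobs(con: sqlite3.Connection, out_dir: str, table: str,
                 name_col: str, blob_col: str) -> int:
    """Write each (name, blob) row of `table` to a file in out_dir.

    Table and column names are caller-supplied assumptions, not XOWA's
    documented schema. Returns the number of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Identifiers cannot be bound as SQL parameters, so they are quoted here;
    # only pass trusted table/column names.
    query = f'SELECT "{name_col}", "{blob_col}" FROM "{table}"'
    count = 0
    for name, data in con.execute(query):
        if data is None:
            continue  # skip rows without stored image bytes
        # Keep only the final path component so rows cannot escape out_dir.
        target = out / Path(name).name
        target.write_bytes(data)
        count += 1
    return count
```

Usage would be along the lines of `export_blobs(sqlite3.connect(path_to_downloaded_db), "thumbs", table, name_col, blob_col)`, where all four names come from inspecting the real database.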
Otherwise, as others have indicated, I know of no MediaWiki way to get this information (via a .sql dump file or via api.php). Since XOWA parses wikitext, it can generate the information easily, though the solution is not an official MediaWiki one.
Hope this helps.
On Sun, Sep 13, 2015 at 1:43 AM, wp mirror <wpmirrordev@gmail.com> wrote:
On Wed, Sep 16, 2015 at 12:51 AM, Federico Leva (Nemo) <nemowiki@gmail.com> wrote:
> Have you looked into what mwoffliner does? https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.j...
+1 for mwoffliner. It should be *very* close to what you are looking for, and avoids the need to parse wikitext itself by fetching the HTML from the Wikimedia REST API.
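For comparison, the REST-API approach can be sketched in Python with only the standard library: fetch a page's rendered HTML and collect the src of every img element. This is not mwoffliner's code. The endpoint follows the Wikimedia REST API's page/html route (check current API documentation before relying on it), the function names are hypothetical, and the User-Agent string is a placeholder to replace with real contact details. Only src is collected; srcset variants for high-density screens are ignored.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed endpoint pattern for rendered page HTML.
REST_HTML = "https://en.wikipedia.org/api/rest_v1/page/html/{title}"


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <img ... /> via handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html: str) -> List[str]:
    parser = ImgCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


def fetch_page_html(title: str) -> str:
    # Titles use underscores; "/" must be percent-encoded in the path segment.
    url = REST_HTML.format(title=quote(title.replace(" ", "_"), safe=""))
    req = Request(url, headers={"User-Agent": "thumb-list-sketch/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

A call such as `image_sources(fetch_page_html("Earth"))` would list the thumb URLs the rendered page actually uses, including those produced by templates, without parsing any wikitext locally.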
> Maybe you can even just extract the images from the ZIM files.
> Nemo