Hello,
I'm Côme (Skylsmoi) and I am working with Kelson (from Kiwix) on mwoffliner. I am trying to mirror locally every JS and CSS resource used by the original articles, for offline use.
I have been looking for every resource I could find, and so far I manage to get:
- the ones from api.php?action=parse&prop=modules|jsconfigvars
- the list in the <script> tag of the article's <head>, loaded by mw.loader.load([...])
- startup.js, jquery.js and mediawiki.j, which I added myself
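To illustrate the first source in that list, here is a minimal Python sketch of building the action=parse request and reading module names out of the JSON reply. The endpoint constant, the function names and the sample reply are assumptions made up for illustration; the field names "modules", "modulescripts" and "modulestyles" are what the parse API returns as far as I know, so please check them against a real response.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia; adjust for other wikis.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parse_url(title: str) -> str:
    """Build the action=parse URL that asks for a page's modules and JS config vars."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "modules|jsconfigvars",
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_modules(response_text: str) -> dict:
    """Split the module names of a parse reply into script and style modules."""
    parsed = json.loads(response_text).get("parse", {})
    scripts = set(parsed.get("modules", [])) | set(parsed.get("modulescripts", []))
    return {
        "scripts": sorted(scripts),
        "styles": sorted(parsed.get("modulestyles", [])),
    }


# Made-up reply with hypothetical module names, only to exercise the code offline.
SAMPLE_RESPONSE = json.dumps({
    "parse": {
        "title": "Lyon",
        "modules": ["example.b", "example.a"],
        "modulescripts": [],
        "modulestyles": ["example.styles"],
        "jsconfigvars": {},
    }
})

if __name__ == "__main__":
    print(build_parse_url("Lyon"))
    print(extract_modules(SAMPLE_RESPONSE))
```

This only covers the modules the parser reports for the page content, not the ones pulled in later by scripts.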
But I have realised that some resources call mw.loader.using(), which downloads more resources (which might in turn download even more) and even executes some JS code in the .using() callbacks.
I can't parse every resource looking for mw.loader.using to fetch the files manually, because some calls are dynamic (mw.loader.using(module_list); // module_list is assigned earlier in the code). The callbacks are also a problem when parsing the files: offline, the files would already be linked into the page rather than downloaded, so the callbacks would not fire.
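The limit of static parsing can be shown with a small Python sketch. It scans JS source text for mw.loader.using calls and separates the ones whose first argument is a literal (recoverable) from the ones that pass a variable (not recoverable without running the code). The regular expression, the function name and the sample source with its "example.*" module names are all assumptions for illustration; a regex like this will miss many real-world forms.

```python
import re

# First argument of mw.loader.using(...): an array literal, a quoted string,
# or a bare identifier. This is a rough heuristic, not a JavaScript parser.
USING_ARG = re.compile(
    r"""mw\.loader\.using\(\s*(\[[^\]]*\]|'[^']*'|"[^"]*"|[A-Za-z_$][\w$.]*)"""
)
STRING_LITERAL = re.compile(r"""['"]([^'"]+)['"]""")


def scan_using_calls(source: str):
    """Return (static_modules, dynamic_arguments) found in the JS source text."""
    static_modules = []
    dynamic_arguments = []
    for match in USING_ARG.finditer(source):
        argument = match.group(1)
        if argument[0] in "['\"":
            # Literal module names can be read straight from the text.
            static_modules.extend(STRING_LITERAL.findall(argument))
        else:
            # A variable: its value is only known when the script runs.
            dynamic_arguments.append(argument)
    return static_modules, dynamic_arguments


# Hypothetical script content, for illustration only.
SAMPLE_JS = """
mw.loader.using('example.a', function () {});
mw.loader.using(['example.b', "example.c"]).then(init);
mw.loader.using(module_list);
"""

if __name__ == "__main__":
    print(scan_using_calls(SAMPLE_JS))
```

The dynamic list is exactly the part that cannot be resolved by reading the files alone.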
I'm looking for any help. The dream solution would be an API endpoint that, given one Wikipedia page, lists every JS and CSS module along with, for every JS module, its dependencies and the JS to execute once it is downloaded.
Thanks in advance
On 05/04/2017 at 09:31, Côme HUGUIES wrote:
[...]
Hello all,
I'm working with Skylsmoi on the project. The global question we are trying to answer is: "how to get the exhaustive list of resources required by a given Wikimedia page". The example Skylsmoi worked on is the page of the city of Lyon - https://en.wikipedia.org/wiki/Lyon
The problem is that the page depends on some modules, which depend on some other modules, which... a recursive dependency workflow driven by dynamic JavaScript code. So it is nearly impossible to get the fully exhaustive dependency list without executing the JavaScript code itself.
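For the part of the dependency chain that is declared statically, the recursion itself is straightforward. Below is a minimal Python sketch of a transitive closure over a dependency map. The map, the module names and the function name are hypothetical; the hard part described above (dependencies that only appear when JavaScript runs) is precisely what such a map cannot contain.

```python
def resolve_all(roots, dependencies):
    """Return every module reachable from the root modules via declared dependencies."""
    seen = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in seen:
            continue  # already handled; also protects against cycles
        seen.add(name)
        stack.extend(dependencies.get(name, []))
    return seen


# Hypothetical dependency map, for illustration only.
EXAMPLE_DEPENDENCIES = {
    "page.module": ["lib.a", "lib.b"],
    "lib.a": ["lib.core"],
    "lib.b": ["lib.core", "lib.a"],
    "lib.core": [],
    "unrelated": ["lib.core"],
}

if __name__ == "__main__":
    print(sorted(resolve_all(["page.module"], EXAMPLE_DEPENDENCIES)))
```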
As you may (or may not) know, Kiwix is a project that aims to mirror Wikimedia-based websites in order to give offline access to them. mwoffliner (the module we are working on) is in charge of downloading everything required. That's why we are trying to get the exhaustive list.
/Kelson/ initially pointed us to the load.php endpoint, which returns some dependencies... unfortunately, it does not return all dependencies (or we are not using it the right way).
Rather than reverse-engineering the entire Wikimedia source code, we thought that asking for ideas / directions here on the mailing list could help (even a response of "it's impossible" will help).
Thanks in advance for any help you can give us.
Have a great day.
Damien
"All" is a lot, and not likely to be what you want. "All" includes skinning, libraries, content scripts, site scripts, user scripts etc., and JavaScript, CSS, user interface messages etc. Each of these behaves differently and might not fit your use case, so identifying what you actually need is important.
Furthermore, JS and CSS are very dependent on the structure of the page, so in order for them to run, you will have to coerce the offline page to be similar enough to the original that they still work. They load stuff conditionally for a wide range of purposes etc.
You can retrieve 'some' of the module names that a specific page's content depends on. These are handy (and are also what JS wikitext previews use to dynamically load modules into the existing context), but far from complete.
For Lyon, there actually are none: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=parse&format=jso...
The Lyon page doesn't really depend on JS or CSS modules, because collapsible content is an en.wp-specific UI hack on top, delivered by the JS module 'site' and the CSS module 'site.styles' https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=site...
These require the jquery and mediawiki modules https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=jque...
which in turn require the startup module (the startup module contains mw.loader, the dependency tree and the configuration): https://en.wikipedia.org/w/load.php?debug=false&lang=en&modules=star...
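As a rough illustration of the load.php URLs above, here is a minimal Python sketch that assembles such a request from a list of module names. The debug, lang and modules parameters are the ones visible in the URLs above; the optional "only" parameter (scripts or styles) is something I believe load.php accepts, and the function name is made up, so treat both as assumptions to verify.

```python
from urllib.parse import urlencode

# Assumed endpoint for English Wikipedia; adjust for other wikis.
LOAD_URL = "https://en.wikipedia.org/w/load.php"


def build_load_url(modules, only=None, lang="en"):
    """Build a load.php URL for several modules, joined with '|'."""
    params = {
        "debug": "false",
        "lang": lang,
        "modules": "|".join(modules),
    }
    if only is not None:
        # Assumption: restricts the reply to "scripts" or "styles".
        params["only"] = only
    # Keep '|' unescaped so the URL looks like the ones quoted above.
    return f"{LOAD_URL}?{urlencode(params, safe='|')}"


if __name__ == "__main__":
    print(build_load_url(["jquery", "mediawiki"]))
    print(build_load_url(["site.styles"], only="styles"))
```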
Remember that load order needs to be ensured, that the above only covers JS and not CSS, and that it doesn't help you much when there are lazy-loaded modules downloaded over HTTP and so on. Unless you download all modules and basically implement your own load.php delivery shim to serve them...
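To make the load-order point concrete, here is a minimal Python sketch that orders modules so that each one comes after the modules it requires. The dependency map simply restates the chain described above (site requires jquery and mediawiki, which require startup) and is a simplification, not the real dependency tree from the startup module. It uses graphlib from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Simplified map: module -> modules it requires, restating the chain above.
DEPENDENCIES = {
    "site": {"jquery", "mediawiki"},
    "jquery": {"startup"},
    "mediawiki": {"startup"},
    "startup": set(),
}


def load_order(dependencies):
    """Return module names ordered so that requirements come first."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so static_order() yields each requirement before its dependents.
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(load_order(DEPENDENCIES))
```

An offline page would then link the scripts in that order; the relative order of jquery and mediawiki is not fixed by this map.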
It's not easy :)
DJ
On Wed, Apr 5, 2017 at 10:54 AM, Damien Accorsi damien.accorsi@free.fr wrote:
[...]
MediaWiki-l mailing list To unsubscribe, go to: https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Thank you, DJ, for your advice.
On 05.04.2017 17:32, Derk-Jan Hartman wrote:
[...]
mediawiki-l@lists.wikimedia.org