On 11.07.22 22:37, Asaf Bartov wrote:
Yes, it sounds like the missing link here is a tool for creating the
list of resources to offline. Z made one particular specification, but
I suppose it could be made a little more general, and potentially even
leverage some existing general-purpose pageset curation tools, such as
PetScan. AFAIK, PetScan currently doesn't support the use-case of "get
me N levels of pages linked from this first page (or from these P first
pages)", but we can imagine (and advocate for) PetScan supporting it at
some point.
Then, a PetScan query ID (which is enough to generate the page-set) can
be an input to the Wikipedia-on-Demand tool, and the problem is solved.
(Well, almost: we'd still need to specify the logic for collecting
related resources -- i.e. none/some/all images included in the pages,
Wikidata items, etc.)
At a high level, this is indeed the kind of challenge we face now. The
lack of such a tool was identified a long time ago at Kiwix. I believe
we are now ready to move forward on this because the underlying
software pieces are ready.
The overall strategy is to extend wp1.openzim.org (API) so that
sophisticated selection modules can be implemented. What these modules
will eventually look like is still wide open, and all ideas are welcome
(please open tickets at https://github.com/openzim/wp1).
Collaborating with PetScan, or relying on it, is an idea which should be
assessed.
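To make the "N levels of pages linked from these first pages" selection a bit more concrete, here is a minimal sketch in Python. It is only an illustration, not how wp1 or PetScan work: the function name `select_pages` and the toy `links` dictionary are hypothetical, and a real module would have to fetch the links from the wiki instead of reading them from memory.

```python
from collections import deque


def select_pages(links, seeds, max_depth):
    """Breadth-first selection of pages within max_depth link hops of the seeds.

    links     -- hypothetical in-memory link graph: page title -> linked titles
    seeds     -- the P first pages to start from
    max_depth -- the N levels of links to follow (0 keeps only the seeds)
    Returns a dict mapping each selected page title to its depth.
    """
    selected = {}
    queue = deque()
    for seed in seeds:
        if seed not in selected:
            selected[seed] = 0
            queue.append(seed)
    while queue:
        page = queue.popleft()
        depth = selected[page]
        if depth == max_depth:
            continue  # do not follow links beyond the requested level
        for target in links.get(page, ()):
            if target not in selected:  # keep the shortest depth only
                selected[target] = depth + 1
                queue.append(target)
    return selected


# Toy data, made up for the example (assumption, not real wiki links).
links = {
    "Malaria": ["Mosquito", "Quinine"],
    "Mosquito": ["Insect"],
    "Quinine": ["Cinchona"],
    "Insect": ["Arthropod"],
}
```

With this toy graph, `select_pages(links, ["Malaria"], 1)` returns `{'Malaria': 0, 'Mosquito': 1, 'Quinine': 1}`. The logic for related resources (images, Wikidata items, etc.) is deliberately left out of the sketch.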
Once a selection is made, our Zimfarm infrastructure is ready to build
the snapshots (ZIM files). We will probably have to build one or more
dedicated frontends to bring these two tools together in a
user-friendly manner.
This is the goal of the Wikipedia-on-Demand project (funded by a WMCH
grant), which we have started to work on:
https://meta.wikimedia.org/wiki/Kiwix/Wikipedia_on_demand
We have another project (related to the war in Ukraine) in the pipeline
which should extend this tool even further.
Kelson
--
Kiwix - Wikipedia Offline & more
* Web: https://kiwix.org/
* Twitter: https://twitter.com/KiwixOffline
* Wiki: https://wiki.kiwix.org/