I am currently trying to install ws-export (https://github.com/wikimedia/ws-export) and I’m having trouble with Composer. Would anyone know anything about this?

> composer install --no-dev

> Your lock file does not contain a compatible set of packages. Please run composer update.

> composer update

> Your requirements could not be resolved to an installable set of packages.

> Problem 1
    - Root composer.json requires PHP extension ext-dom * but it is missing from your system. Install or enable PHP's dom extension.
  Problem 2
    - Root composer.json requires PHP extension ext-intl * but it is missing from your system. Install or enable PHP's intl extension.
  Problem 3
    - Root composer.json requires PHP extension ext-sqlite3 * but it is missing from your system. Install or enable PHP's sqlite3 extension.
  Problem 4
    - Root composer.json requires PHP extension ext-zip * but it is missing from your system. Install or enable PHP's zip extension.
  Problem 5
    - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it is missing from your system. Install or enable PHP's xml extension.
    - Root composer.json requires symfony/framework-bundle 5.4.* -> satisfiable by symfony/framework-bundle[v5.4.0, ..., v5.4.12].

To enable extensions, verify that they are enabled in your .ini files:
    - /etc/php/7.4/cli/php.ini
    - /etc/php/7.4/cli/conf.d/10-opcache.ini
    - /etc/php/7.4/cli/conf.d/10-pdo.ini
    - /etc/php/7.4/cli/conf.d/20-calendar.ini
    - /etc/php/7.4/cli/conf.d/20-ctype.ini
    - /etc/php/7.4/cli/conf.d/20-exif.ini
    - /etc/php/7.4/cli/conf.d/20-ffi.ini
    - /etc/php/7.4/cli/conf.d/20-fileinfo.ini
    - /etc/php/7.4/cli/conf.d/20-ftp.ini
    - /etc/php/7.4/cli/conf.d/20-gettext.ini
    - /etc/php/7.4/cli/conf.d/20-iconv.ini
    - /etc/php/7.4/cli/conf.d/20-json.ini
    - /etc/php/7.4/cli/conf.d/20-phar.ini
    - /etc/php/7.4/cli/conf.d/20-posix.ini
    - /etc/php/7.4/cli/conf.d/20-readline.ini
    - /etc/php/7.4/cli/conf.d/20-shmop.ini
    - /etc/php/7.4/cli/conf.d/20-sockets.ini
    - /etc/php/7.4/cli/conf.d/20-sysvmsg.ini
    - /etc/php/7.4/cli/conf.d/20-sysvsem.ini
    - /etc/php/7.4/cli/conf.d/20-sysvshm.ini
    - /etc/php/7.4/cli/conf.d/20-tokenizer.ini
You can also run `php --ini` in a terminal to see which files are used by PHP in CLI mode.
Alternatively, you can run Composer with `--ignore-platform-req=ext-dom --ignore-platform-req=ext-intl --ignore-platform-req=ext-sqlite3 --ignore-platform-req=ext-zip --ignore-platform-req=ext-xml` to temporarily ignore these required extensions.



Do I need to install these 5 extensions myself? Is that really the solution? Shouldn’t they be installed automatically?
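
For reference, Composer only checks that the PHP extensions are present; it does not install them, so they have to come from the system's package manager. A minimal sketch of how the names could be pulled out of the output above and turned into a package list follows. It assumes a Debian/Ubuntu system (suggested by the /etc/php/7.4/cli paths), where the dom and xml extensions both ship in php7.4-xml; the helper names and the package mapping are illustrative assumptions, not part of ws-export.

```python
import re

# Assumption: Debian/Ubuntu package names for PHP 7.4. On those systems the
# dom and xml extensions are both provided by php7.4-xml. Other distributions
# use different package names.
EXT_TO_PACKAGE = {
    "dom": "php7.4-xml",
    "xml": "php7.4-xml",
    "intl": "php7.4-intl",
    "sqlite3": "php7.4-sqlite3",
    "zip": "php7.4-zip",
}


def missing_extensions(composer_output):
    """Return the ext-* names Composer listed as requirements, in order, without duplicates."""
    found = []
    for name in re.findall(r"ext-([a-z0-9_]+) \*", composer_output):
        if name not in found:
            found.append(name)
    return found


def apt_command(extensions):
    """Build an apt command line for the given extension names (hypothetical helper)."""
    packages = []
    for ext in extensions:
        # The fallback "php7.4-<ext>" is a guess for extensions not in the table.
        package = EXT_TO_PACKAGE.get(ext, "php7.4-" + ext)
        if package not in packages:
            packages.append(package)
    return "sudo apt install " + " ".join(packages)


# Lines copied from the Composer output quoted above.
SAMPLE = """
    - Root composer.json requires PHP extension ext-dom * but it is missing from your system.
    - Root composer.json requires PHP extension ext-intl * but it is missing from your system.
    - Root composer.json requires PHP extension ext-sqlite3 * but it is missing from your system.
    - Root composer.json requires PHP extension ext-zip * but it is missing from your system.
    - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it is missing from your system.
"""

if __name__ == "__main__":
    print(apt_command(missing_extensions(SAMPLE)))
```

After installing the packages, `php -m` lists the loaded extensions, which is a quick way to confirm they are enabled before running `composer install --no-dev` again.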

Thank you,
Julius

On Tue 20. Sep 2022 at 17:41, Julius Hamilton <juliushamilton100@gmail.com> wrote:
Thank you very much.

> Did you look at the wikitext of that page?

I did now, and I see that the displayed text is not actually present in the wikitext / source text. Instead I am seeing lines like this, with ".djvu" and "include":

<pages index="A simplified grammar of the Swedish language.djvu" include=7 />

What is this? Is it a common format for a Wikisource book? 
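
For context, the `<pages>` tag comes from the ProofreadPage extension and transcludes text from pages in the "Page:" namespace, which are named after the index file plus a page number. A minimal sketch of mapping such a tag to the page title it pulls from follows; the function name is hypothetical, and it only handles a single number in `include`, not ranges or lists.

```python
import re


def page_title_from_tag(pages_tag):
    """Map a <pages index="..." include=N /> tag to a "Page:" namespace title.

    Simplified sketch: only a single page number in `include` is handled.
    Returns None if the tag does not match that shape.
    """
    index = re.search(r'index="([^"]+)"', pages_tag)
    include = re.search(r'include=["\']?(\d+)["\']?\s*/?>', pages_tag)
    if index is None or include is None:
        return None
    return "Page:{}/{}".format(index.group(1), include.group(1))


if __name__ == "__main__":
    tag = '<pages index="A simplified grammar of the Swedish language.djvu" include=7 />'
    print(page_title_from_tag(tag))
```

The resulting title can be passed to the API like any other page title, which is one way to reach the text that the book page itself only transcludes.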

> prop=extracts works, but I would say it's a poor fit for many (most?) wikisource pages.

Why? Because it just pulls out sentences from the wikitext? What is different about the functioning of prop=revisions, for example?

> Plaintext as in wikitext or in parsed html converted to plaintext?

Whatever you think is preferable. The point is to have some clean, readable text. If the parsed HTML has awkward formatting issues, I might prefer the wikitext, or vice versa, whichever is easier to work with. Since wikitext is a markup format, it might technically be easier to extract the specific fields I am looking for from it? I don't know.

> You could use something like this to fetch every page 

Thanks. I tried replacing the title with a different, more typical book, and it didn't seem to work.

https://en.wikisource.org/w/api.php?generator=allpages&action=query&prop=revisions&rvprop=content&rvslots=main&gapprefix=Moby-Dick_(1851)_US_edition


I guess it's the same problem: "revisions" also returns wikitext, but Wikisource wikitext pulls in its text from separate files?


So would the "parse" action of the API be the tool of choice?
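
A minimal sketch of what a parse-based request could look like is below, for anyone following along. It builds a URL for `action=parse` with `prop=text`, which returns the rendered HTML with transcluded text included, and then reduces that HTML to plain text with the standard library. The chapter title in the usage line is a hypothetical example, and the helper names are my own.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

API = "https://en.wikisource.org/w/api.php"


def parse_url(title):
    """Build an action=parse URL that asks for the rendered HTML of one page."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": "2",
    }
    return API + "?" + urlencode(params)


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    """Strip tags and collapse runs of whitespace into single spaces."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.chunks).split())


if __name__ == "__main__":
    # Hypothetical subpage title, used only to show the URL shape.
    print(parse_url("Moby-Dick (1851) US edition/Chapter 1"))
    print(html_to_text("<p>Call me <i>Ishmael</i>.</p>"))
```

With `formatversion=2`, the HTML should be found under `parse.text` in the JSON response. Fetching one URL per subpage and joining the results would give the text of the whole book.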


> the WS Export tool can do that


Thanks very much, will give that a shot next.


Thank you,

Julius






On Tue, Sep 20, 2022 at 2:14 AM Sam Wilson <sam@samwilson.id.au> wrote:
 

> How can I get the full plaintext from an entire book on Wikisource with the API?
 
Plaintext as in wikitext or in parsed html converted to plaintext?



If it's the latter, the WS Export tool can do that: https://ws-export.wmcloud.org/?format=txt
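
As a rough sketch, a download URL for a specific book could be assembled like this. Only `format=txt` appears in the link above; the `lang` and `page` parameter names are assumptions based on how the tool's URLs usually look and should be checked against the WS Export documentation.

```python
from urllib.parse import urlencode

WS_EXPORT = "https://ws-export.wmcloud.org/"


def ws_export_url(page, lang="en", fmt="txt"):
    """Build a WS Export URL for one work.

    Assumption: `lang` and `page` are the query parameters the tool expects;
    only `format` is confirmed by the link in this thread.
    """
    return WS_EXPORT + "?" + urlencode({"format": fmt, "lang": lang, "page": page})


if __name__ == "__main__":
    print(ws_export_url("Moby-Dick"))
```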


_______________________________________________
Mediawiki-api mailing list -- mediawiki-api@lists.wikimedia.org
To unsubscribe send an email to mediawiki-api-leave@lists.wikimedia.org