> I need to install these 5 extensions? Is that really
> the solution? Shouldn’t they be automatically installed?
Yes. They are likely already provided alongside PHP but may not be
installed or activated. For example, on Debian and its derivatives they
are packaged as php-dom, php-intl, and so on.
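
If you want to see which of the five extensions are actually missing before installing anything, here is a rough sketch in Python. It assumes `php` is on your PATH and only reads the output of `php -m`; the list of required names is copied from the Composer error quoted below, and the helper name `missing_extensions` is just something I made up for illustration.

```python
import shutil
import subprocess

# Extension names taken from the Composer error output in this thread.
REQUIRED = ["dom", "intl", "sqlite3", "zip", "xml"]


def missing_extensions(php_m_output, required=REQUIRED):
    """Return the required extensions that do not appear in `php -m` output."""
    loaded = {
        line.strip().lower()
        for line in php_m_output.splitlines()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if line.strip() and not line.startswith("[")
    }
    return [ext for ext in required if ext.lower() not in loaded]


if __name__ == "__main__":
    if shutil.which("php") is None:
        print("php not found on PATH")
    else:
        result = subprocess.run(
            ["php", "-m"], capture_output=True, text=True, check=True
        )
        missing = missing_extensions(result.stdout)
        print("missing:", ", ".join(missing) if missing else "none")
```

Whatever it reports as missing is what you would then install through your distribution's packages. On Debian I would expect those to be along the lines of php-xml, php-intl, php-sqlite3 and php-zip, but please check the exact names for your PHP version.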
More conveniently, you could also just use the public WS Export
instance:
https://ws-export.wmcloud.org/
It's not suitable if you want to export tens of thousands of pages, but
for small workloads it should be fine.
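
For a handful of books you could even script the download links. The sketch below only builds the URL and does not fetch anything. The `format=txt` parameter comes from Sam's message further down; the `lang` and `page` parameter names are my assumption about the tool's query string, and the example title is a guess based on the index file name mentioned in this thread, so verify both against the form on the site.

```python
from urllib.parse import urlencode

BASE_URL = "https://ws-export.wmcloud.org/"


def export_url(page, lang="en", fmt="txt"):
    """Build a download URL for one work on the public instance."""
    # `lang` and `page` are assumed parameter names; `format` is from the thread.
    query = urlencode({"lang": lang, "page": page, "format": fmt})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Hypothetical title, derived from the .djvu index name quoted below.
    print(export_url("A simplified grammar of the Swedish language"))
```

Please keep the request rate low if you loop over several titles, since this is a shared instance.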
Thomas
On Tue, 20 Sept 2022 at 18:43, Julius Hamilton
<juliushamilton100(a)gmail.com> wrote:
>
> I am currently trying to install ws-export
> (https://github.com/wikimedia/ws-export) and I’m having trouble with
> “composer”. Would anyone know anything about this?
>
> > composer install --no-dev
>
> > Your lock file does not contain a compatible set of packages. Please run composer update.
>
> > composer update
>
> > Your requirements could not be resolved to an installable set of packages.
>
> > Problem 1
> - Root composer.json requires PHP extension ext-dom * but it is missing from your system. Install or enable PHP's dom extension.
> Problem 2
> - Root composer.json requires PHP extension ext-intl * but it is missing from your system. Install or enable PHP's intl extension.
> Problem 3
> - Root composer.json requires PHP extension ext-sqlite3 * but it is missing from your system. Install or enable PHP's sqlite3 extension.
> Problem 4
> - Root composer.json requires PHP extension ext-zip * but it is missing from your system. Install or enable PHP's zip extension.
> Problem 5
> - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it is missing from your system. Install or enable PHP's xml extension.
> - Root composer.json requires symfony/framework-bundle 5.4.* -> satisfiable by symfony/framework-bundle[v5.4.0, ..., v5.4.12].
>
> To enable extensions, verify that they are enabled in your .ini files:
> - /etc/php/7.4/cli/php.ini
> - /etc/php/7.4/cli/conf.d/10-opcache.ini
> - /etc/php/7.4/cli/conf.d/10-pdo.ini
> - /etc/php/7.4/cli/conf.d/20-calendar.ini
> - /etc/php/7.4/cli/conf.d/20-ctype.ini
> - /etc/php/7.4/cli/conf.d/20-exif.ini
> - /etc/php/7.4/cli/conf.d/20-ffi.ini
> - /etc/php/7.4/cli/conf.d/20-fileinfo.ini
> - /etc/php/7.4/cli/conf.d/20-ftp.ini
> - /etc/php/7.4/cli/conf.d/20-gettext.ini
> - /etc/php/7.4/cli/conf.d/20-iconv.ini
> - /etc/php/7.4/cli/conf.d/20-json.ini
> - /etc/php/7.4/cli/conf.d/20-phar.ini
> - /etc/php/7.4/cli/conf.d/20-posix.ini
> - /etc/php/7.4/cli/conf.d/20-readline.ini
> - /etc/php/7.4/cli/conf.d/20-shmop.ini
> - /etc/php/7.4/cli/conf.d/20-sockets.ini
> - /etc/php/7.4/cli/conf.d/20-sysvmsg.ini
> - /etc/php/7.4/cli/conf.d/20-sysvsem.ini
> - /etc/php/7.4/cli/conf.d/20-sysvshm.ini
> - /etc/php/7.4/cli/conf.d/20-tokenizer.ini
> You can also run `php --ini` in a terminal to see which files are used by PHP in CLI mode.
> Alternatively, you can run Composer with `--ignore-platform-req=ext-dom --ignore-platform-req=ext-intl --ignore-platform-req=ext-sqlite3 --ignore-platform-req=ext-zip --ignore-platform-req=ext-xml` to temporarily ignore these required extensions.
>
>
>
> I need to install these 5 extensions? Is that really the solution?
> Shouldn’t they be automatically installed?
>
> Thank you,
> Julius
>
> On Tue 20. Sep 2022 at 17:41, Julius Hamilton <juliushamilton100(a)gmail.com> wrote:
>>
>> Thank you very much.
>>
>> > Did you look at the wikitext of that page?
>>
>> I did now, I see that the text displayed is not actually present in the wikitext / source text. I am seeing these ".djvu include" lines:
>>
>> <pages index="A simplified grammar of the Swedish language.djvu" include=7 />
>>
>> What is this? Is it a common format for a Wikisource book?
>>
>> > prop=extracts works, but I would say it's a poor fit for many (most?) wikisource pages.
>>
>> Why? Because it just pulls out sentences from the wikitext? What is different about the functioning of prop=revisions, for example?
>>
>> > Plaintext as in wikitext or in parsed html converted to plaintext?
>>
>> Whatever you think is preferable, the point is to have some clean, readable text. If the parsed HTML has any awkward formatting issues, I might prefer the wikitext, or vice versa. Whichever is easier to work with. Technically since wikitext is a markup format it might be easier to pull out from specific fields you are seeking? I don't know.
>>
>> > You could use something like this to fetch every page
>>
>> Thanks. I tried replacing the title with a different, more normal book and it didn't seem to work.
>>
>> https://en.wikisource.org/w/api.php?generator=allpages&action=query&…
>>
>> I guess it's the same problem, "revisions" also pulls out wikitext but Wikisource wikitext pulls in its text from separate files?
>>
>>
>> So would the "parse" action of the API be the tool of choice?
>>
>>
>> > the WS Export tool can do that
>>
>>
>> Thanks very much, will give that a shot next.
>>
>>
>> Thank you,
>>
>> Julius
>>
>> On Tue, Sep 20, 2022 at 2:14 AM Sam Wilson <sam(a)samwilson.id.au> wrote:
>>>
>>>
>>>
>>>
>>>> How can I get the full plaintext from an entire book on Wikisource with the API?
>>>
>>>
>>> Plaintext as in wikitext or in parsed html converted to plaintext?
>>>
>>>
>>>
>>> If it's the latter, the WS Export tool can do that: https://ws-export.wmcloud.org/?format=txt
>>>
>>>
>>> _______________________________________________
>>> Mediawiki-api mailing list -- mediawiki-api(a)lists.wikimedia.org
>>> To unsubscribe send an email to mediawiki-api-leave(a)lists.wikimedia.org