Hey all!
While working on some maintenance scripts for TimedMediaHandler, I've been trying to make it easier to write scripts that use multiple parallel processes to run through a large input set faster.
My proposal is a ForkableMaintenance class, with an underlying QueueingForkController which is a refactoring of the OrderedStreamingForkController used by (at least) some CirrusSearch maintenance scripts.
Patch in progress: https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/451099/
The expected use case is a relatively long loop of reading input data interleaved with running CPU-intensive or DB-intensive jobs, where the individual jobs are independent and order of input is not strongly coupled. (Ordering of _running_ of items on the child processes is not guaranteed, but the order of result processing is guaranteed to be in the same order as input, making output across runs predictable for idempotent processing.)
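The ordering guarantee can be sketched in plain PHP: results may arrive out of input order, but get buffered by sequence number and flushed in input order. This is just an illustration of the idea, not the controller's actual code:

```php
<?php
// Simulated out-of-order arrivals, keyed by input sequence number.
$arrivals = [ 2 => 'c', 0 => 'a', 1 => 'b' ];

$buffer = [];   // results that arrived early, waiting their turn
$next = 0;      // next sequence number we're allowed to emit
$output = [];

foreach ( $arrivals as $seq => $result ) {
    $buffer[$seq] = $result;
    // Flush everything that's now contiguous from $next onward.
    while ( isset( $buffer[$next] ) ) {
        $output[] = $buffer[$next];
        unset( $buffer[$next] );
        $next++;
    }
}
// $output is [ 'a', 'b', 'c' ] -- input order, regardless of arrival order.
```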
A simple ForkableMaintenance script might look like:
class Foo extends ForkableMaintenance {
    // Handle any input on the parent thread, and
    // pass any data as JSON-serializable form into
    // the queue() method, where it gets funneled into
    // a child process.
    public function loop() {
        for ( $i = 0; $i < 1000; $i++ ) {
            $this->queue( $i );
        }
    }

    // On the child process, receives the queued value
    // via JSON encode/decode. Here it's a number.
    public function work( $count ) {
        return str_repeat( '*', $count );
    }

    // On the parent thread, receives the work() return value
    // via JSON encode/decode. Here it's a string.
    public function result( $data ) {
        $this->output( $data . "\n" );
    }
}
Because data is serialized as JSON and sent over a pipe, you can't send live objects like Titles or Pages or Files, but you can send arrays or associative arrays with fairly complex data.
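For example, a job description can be flattened to plain data before queueing (the field names here are made up for illustration, not part of the patch):

```php
<?php
// Instead of queueing a live Title/File object, send plain scalars and
// arrays that survive a JSON round-trip over the pipe.
$job = [
    'pageId'    => 123,
    'titleText' => 'File:Example.webm',
    'options'   => [ 'formats' => [ 'vtt', 'srt' ] ],
];

$encoded = json_encode( $job );            // what goes over the pipe
$decoded = json_decode( $encoded, true );  // what the child receives
// $decoded is identical to $job; the child can re-look-up the Title
// from 'pageId' or 'titleText' on its own side.
```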
There is a little per-job overhead, and multiple threads can cause more contention on the database server, etc., but it scales well on the subtitle format conversion script I'm testing with, which does a little DB loading and some CPU work. On an 8-core/16-thread test server:
+---------+-------------+---------+
| threads | runtime (s) | speedup |
+---------+-------------+---------+
| 0       | 320         | n/a     |
| 1       | 324         | 0.99    |
| 2       | 183         | 1.75    |
| 4       | 105         | 3.05    |
| 8       | 66          | 4.85    |
| 16      | 58          | 5.52    |
+---------+-------------+---------+
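(The speedup column is just the 0-thread baseline divided by the measured runtime; a quick sanity check:)

```php
<?php
$baseline = 320; // seconds for the in-process 0-thread run
$runtimes = [ 1 => 324, 2 => 183, 4 => 105, 8 => 66, 16 => 58 ];

// array_map() with a single array preserves the keys (thread counts).
$speedups = array_map( function ( $t ) use ( $baseline ) {
    return $baseline / $t;
}, $runtimes );
// e.g. $speedups[2] is about 1.75, $speedups[16] about 5.52
```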
I've added a phpunit test case for OrderedStreamingForkController to make sure I don't regress something used by other teams, but noticed a couple problems with using this fork stuff in the test runners.
First, doing pcntl_fork() inside a PHPUnit test case has some potential side effects, since there's a lot of tear-down work done in destructors even if you call exit() after processing completes. As a workaround, I'm having the child procs send a SIGKILL to themselves to terminate immediately without running test-case destructors.
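To illustrate why exit() alone isn't enough: PHP still runs object destructors during its normal shutdown sequence, so a forked child calling exit() would execute the test framework's tear-down anyway. A minimal demo (TearDownDemo is a made-up class; the posix_kill() line in the comment matches the workaround described above and needs the posix/pcntl extensions):

```php
<?php
class TearDownDemo {
    public static $ran = false;
    public function __destruct() {
        // Stands in for PHPUnit tear-down work a forked child would also run.
        self::$ran = true;
    }
}

$d = new TearDownDemo();
unset( $d ); // destructor fires here; exit() likewise fires it at shutdown

// The workaround in a forked child is roughly:
//   posix_kill( posix_getpid(), SIGKILL ); // die before any destructors run
```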
Second, probably related to that, I'm seeing a failure in the code coverage calculations -- it sees some increased coverage on the parent process, at least, but seems to think something is returning a non-zero exit code somewhere, which marks the whole operation as a failure:
https://integration.wikimedia.org/ci/job/mediawiki-phpunit-coverage-patch-do...
Worst case, I can probably exclude these from some automated tests, but that always seems like a bad idea. :D
If anybody else is using, or thinking about using, ForkController and its friends and wants to help polish this up, give a shout!
-- brion
On Sun, Aug 12, 2018 at 4:48 AM, Brion Vibber bvibber@wikimedia.org wrote:
> While working on some maintenance scripts for TimedMediaHandler I've been trying to make it easier to do scripts that use multiple parallel processes to run through a large input set faster.
> My proposal is a ForkableMaintenance class, with an underlying QueueingForkController which is a refactoring of the OrderedStreamingForkController used by (at least) some CirrusSearch maintenance scripts.
For what it's worth, when I saw ForkableMaintenance I thought of forking an open-source project, not Unix fork(). Something like ParallelMaintenance or ParallelizableMaintenance would better suggest the desired meaning for me.
On Sun, Aug 12, 2018, 2:49 AM Aryeh Gregor ayg@aryeh.name wrote:
> For what it's worth, when I saw ForkableMaintenance I thought of forking an open-source project, not Unix fork(). Something like ParallelMaintenance or ParallelizableMaintenance would better suggest the desired meaning for me.
I like it... I'll reserve the Fork terminology for the low level controllers which deal with the processes and change the maintenance wrapper to ParallelMaintenance.
Any preferences on --fork=N vs --threads=N for the cli option? I'm leaning to changing it to --threads.
-- brion
(I've made both changes on the PR.)
-- brion
Hi,
On 08/11/2018 06:48 PM, Brion Vibber wrote:
> Second, probably related to that I'm seeing a failure in the code coverage calculations -- it's seeing some increased coverage on the parent process at least but seems to think it's returning a non-zero exit code somewhere, which marks the whole operation as a failure:
> https://integration.wikimedia.org/ci/job/mediawiki-phpunit-coverage-patch-docker/460/console

I think your test and the tool are working properly, there just is an actual coverage drop:
+-------------------------------------+--------+--------+
| Filename                            | Old %  | New %  |
+-------------------------------------+--------+--------+
| includes/QueueingForkController.php | 0      | 07.92  |
| maintenance/Maintenance.php         | 20.51  | 19.83  |
+-------------------------------------+--------+--------+
If you look at the HTML report[1], all the new lines added to Maintenance.php are not covered by tests, which is decreasing coverage.
The tool currently reports a coverage drop if it drops in any file, which isn't necessarily ideal, see [2].
[1] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-coverage-patch-docker/460/artifact/log/coverage.html
[2] https://phabricator.wikimedia.org/T188687
-- Legoktm
Thanks, looks like I misinterpreted the report output. :)
I think I can add a test case for ParallelMaintenance which should make the warning go away.
-- brion
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi Brion.
I might be a bit late, but I just wonder why did you decide to go with forking instead of Pthreads PHP extension?
AFAIK, MediaWiki should support not only *nix platforms but Windows as well, and `pcntl_fork()` does not work on Windows.
PS: Couldn't find related ticket. Can you post a link to it?
Links: https://secure.php.net/manual/en/book.pthreads.php https://github.com/krakjoe/pthreads
Aleksey, Wikimedia Deutschland
On Tue, Aug 14, 2018, 3:39 AM Aleksey Bekh-Ivanov <aleksey.bekh-ivanov@wikimedia.de> wrote:
> Hi Brion.
> I might be a bit late, but I just wonder why did you decide to go with forking instead of Pthreads PHP extension?
ForkController and OrderedStreamingForkController were already present and using pcntl_fork; also I didn't know PHP has a pthreads extension now. :) Worth looking at...
> AFAIK, Mediawiki should support not only *nix platforms, but Windows as well, and `pcntl_fork()` does not work on Windows.
If no threads are specified it'll run in-process which will still work, but of course you get no additional processes then.
> PS: Couldn't find related ticket. Can you post a link to it?
No ticket currently, though I can open one. :)
-- brion
On Tue, Aug 14, 2018 at 8:34 AM Brion Vibber bvibber@wikimedia.org wrote:
> On Tue, Aug 14, 2018, 3:39 AM Aleksey Bekh-Ivanov <aleksey.bekh-ivanov@wikimedia.de> wrote:
>> Hi Brion.
>> I might be a bit late, but I just wonder why did you decide to go with forking instead of Pthreads PHP extension?
> ForkController and OrderedStreamingForkController were already present and using pcntl_fork; also I didn't know PHP has a pthreads extension now. :) Worth looking at...
>> AFAIK, Mediawiki should support not only *nix platforms, but Windows as well, and `pcntl_fork()` does not work on Windows.
> If no threads are specified it'll run in-process which will still work, but of course you get no additional processes then.
I took a quick look at the PHP pthreads extension; it looks interesting, but looks like it might be trickier to integrate.
Not all of the global environment is available to threads -- only objects of classes that extend a special Threaded type are shared -- so I think threads would either end up without the MediaWiki environment, severely limiting what they can do, or they would have to reinitialize MediaWiki locally.
Also it doesn't seem to be packaged in Debian, so it would be extra work for us to adopt and package it for production use, and it's less likely to be present by default in other peoples' operating environments.
If re-initializing the application state is necessary in the threads anyway without retooling the entire application, it might be easier to use a separate-process model where the maintenance script is re-invoked with wfShellExec() with a special child-process mode, and data routed over stdin/stdout.
In which case it might be more consistent to do that than using pcntl_fork() to begin with... but I'm not sure how to integrate that into phpunit tests. :)
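A rough sketch of that separate-process model, assuming plain proc_open() rather than wfShellExec(), and a stand-in `php -r` child instead of a real re-invoked child-process mode:

```php
<?php
// Hypothetical sketch: feed JSON lines to a child's stdin, read JSON
// results back from its stdout, in input order. In the real design the
// child would be the maintenance script re-invoked in a special mode.
$childCode = '
    while ( ( $line = fgets( STDIN ) ) !== false ) {
        $n = json_decode( $line );
        echo json_encode( str_repeat( "*", $n ) ), "\n";
    }
';
$proc = proc_open(
    PHP_BINARY . ' -r ' . escapeshellarg( $childCode ),
    [ 0 => [ 'pipe', 'r' ], 1 => [ 'pipe', 'w' ] ],
    $pipes
);

foreach ( [ 1, 2, 3 ] as $n ) {
    fwrite( $pipes[0], json_encode( $n ) . "\n" );
}
fclose( $pipes[0] ); // EOF lets the child's read loop finish

$results = [];
while ( ( $line = fgets( $pipes[1] ) ) !== false ) {
    $results[] = json_decode( $line );
}
fclose( $pipes[1] );
proc_close( $proc );
// $results is [ '*', '**', '***' ], matching the input order
```

With real parallelism you'd spawn several such children and multiplex the pipes (e.g. with stream_select()), which is essentially what the fork controllers already do over their socket pairs.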
>> PS: Couldn't find related ticket. Can you post a link to it?
> No ticket currently, though I can open one. :)
Opened as https://phabricator.wikimedia.org/T201970 -- if there's more interest in getting this right I can make this an RFC. (I had an inquiry about thinking about ways to do multiprocess work from web requests too, which might not work with pcntl_fork().)
Additional thoughts welcome on pcntl_fork() vs separate process launch vs something else. :D
-- brion