Hey,
For the new renderer backend for the Collections Extension, we've come up with a tentative architecture that we would like operations' buy-in on. The living document is here [1]. It's worth saying explicitly that whatever setup we use must be able to handle the more than 150k requests per day we serve with the old setup.
Basically we're looking at having:
- 'render servers' running node.js
- job management done in Redis
- content rendered using PhantomJS and/or LaTeX
- rendered files stored locally on the render servers (and streamed through MediaWiki when served; this is how it's done now as well)
- a garbage collector running routinely on the render servers to clean up old, stale content
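To make the Redis job-management piece concrete, here's a minimal sketch of what a render worker on one of those servers might look like. The queue name ('renderJobs'), the job fields, the status key, and the cache directory are all hypothetical, not settled API; it also assumes the stock rasterize.js example script that ships with PhantomJS.

```javascript
// Hypothetical render worker: pops jobs from a Redis list, shells out to
// PhantomJS, and writes the result into the local render cache.
var redis = require('redis');
var execFile = require('child_process').execFile;
var path = require('path');

var client = redis.createClient();
var CACHE_DIR = '/srv/render-cache'; // local storage, streamed out via MediaWiki

function waitForJob() {
    // BLPOP blocks until a job is pushed onto the 'renderJobs' list.
    client.blpop('renderJobs', 0, function (err, reply) {
        if (err) { console.error(err); return setTimeout(waitForJob, 1000); }
        var job = JSON.parse(reply[1]); // e.g. { id: 'abc123', url: '...' }
        var outFile = path.join(CACHE_DIR, job.id + '.pdf');
        // rasterize.js is the example rendering script bundled with PhantomJS.
        execFile('phantomjs', ['rasterize.js', job.url, outFile], function (err) {
            client.set('renderStatus:' + job.id, err ? 'failed' : 'done');
            waitForJob();
        });
    });
}
waitForJob();
```

Under that scheme, the garbage collector would just be a cron job that unlinks files in the cache directory older than some TTL.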
Post comments to the talk page please :)
[1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
As a followup, it's worth talking about puppetization and how we're going to accomplish that.
- Node.js itself should be installable via apt package (we'll have to do a custom package so that we get Node v0.10)
- Node dependencies will all be 'npm install'ed into a node_modules submodule of the main repo for the application, which we can deploy with the rest of the application code.
  - It's worth noting that although this means we'll initially still be pulling our dependencies from a separate source, what is actually in production will be in our git repos. We can also version lock inside our configuration; a sketch of this follows below.
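For instance, version locking could be as simple as pinning exact versions in package.json. This is a hypothetical manifest; the package name and version numbers are purely illustrative:

```json
{
  "name": "collection-render-service",
  "version": "0.1.0",
  "dependencies": {
    "redis": "0.9.0",
    "express": "3.4.4"
  }
}
```

Running 'npm shrinkwrap' on top of that would additionally lock the versions of transitive dependencies.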
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
On Wed, Nov 13, 2013 at 3:02 PM, Matthew Walker mwalker@wikimedia.org wrote:
[...]
On Wed, Nov 13, 2013 at 03:41:33PM -0800, Matthew Walker wrote:
- Node.js itself should be installable via apt package (we'll have to do a custom package so that we get Node v0.10)
I haven't looked at your document yet, but a quick note on that: I have had nodejs 0.10 backported packages ready for about 10 days now.
We typically avoid running multiple versions of the same package across the infrastructure (and our apt repo isn't split like that, thankfully), so I'd like to upgrade the existing users to 0.10. These are parsoid, statsd, etherpad-lite, and perhaps the not-yet-in-production limn; of these, parsoid is the one with the most impact.
As such, we've agreed with Gabriel (who needed node 0.10 for rashomon anyway) to test the new version under the parsoid RTT suite and subsequently on the Parsoid Labs instance, before we go and upgrade production. (The packages have been in parsoid.wmflabs.org's /root/ since then.) I haven't heard back since, but as they don't /need/ a new Node version right now, I guess this is low priority for them (and we, ops, don't care much either).
I think this would happen anyway before the PDF service ever reaches production, but I think we can prioritize it a bit more and make sure it does. Gabriel, what do you think?
Regards,
Faidon
Faidon,
Fantastic! I didn't know we had an internal backport already. :)
Gabriel basically just told me to use v0.10 because that's what he was moving to for parsoid. So... v0.10!
~Matt Walker
Wikimedia Foundation
Fundraising Technology Team
On Wed, Nov 13, 2013 at 4:04 PM, Faidon Liambotis faidon@wikimedia.org wrote:
[...]
Yeah, we've been running 0.10 in development for Parsoid for a while. So no problems expected... other than unpredictable load gremlins or some such. It sounds like gwicke's plan is to ramp up the load gradually to try to head that off.
--scott

On Nov 13, 2013 6:02 PM, "Matthew Walker" mwalker@wikimedia.org wrote:
[...]
Matthew Walker mwalker@wikimedia.org wrote:
[1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
I think requirement number one is that Jimmy the casual MediaWiki user would be able to install his own renderer without replicating WMF infrastructure:
https://www.mediawiki.org/wiki/Talk:PDF_rendering/Architecture#Simple_set_up...
//Saper
Marcin Cieslak wrote:
Matthew Walker mwalker@wikimedia.org wrote:
[1] https://www.mediawiki.org/wiki/PDF_rendering/Architecture
I think requirement number one is that Jimmy the casual MediaWiki user would be able to install his own renderer without replicating WMF infrastructure:
Matthew replied on-wiki, but I'll add that there's a dream within the MediaWiki tech community to be able to simply do "apt-get install mediawiki" or similar on a spun-up virtual machine and have everything quickly and easily set up for you.
There's a contrasting view that MediaWiki should only serve as the platform for Wikimedia wikis (large, high-volume sites) and that it's overkill for any small wiki setup. This view also usually advocates not focusing on third-party support, naturally, which removes Jimmy the casual MediaWiki user from the equation.
Whether either of these ideas (and many more) should guide architectural design decisions is, of course, a matter of debate. :-)
MZMcBride
On 11/13/2013 08:18 PM, MZMcBride wrote:
Matthew replied on-wiki, but I'll add that there's a dream within the MediaWiki tech community to be able to simply do "apt-get install mediawiki" or similar on a spun-up virtual machine and have everything quickly and easily set up for you.
There's a contrasting view that MediaWiki should only serve as the platform for Wikimedia wikis (large, high-volume sites) and that it's overkill for any small wiki setup. This view also usually advocates not focusing on third-party support, naturally, which removes Jimmy the casual MediaWiki user from the equation.
Ha! Having good packaging so that you can just do "apt-get install mediawiki" would actually eliminate some of this dichotomy.
Gabriel
And I'll add that there's another axis: gwicke (and others?) have been arguing for a broader "collection of services" architecture for mw. This would decouple some of the installability issues. Even if PDF rendering (say) were a huge monster, Jimmy MediaWiki might still be able to simply install the core of the system. Slow progress making PDF rendering "more friendly" wouldn't need to hamper all the Jane MediaWikis who don't need that feature.
These issues cross-couple. Making a really super-easy "giant VM blob" that contained an entire complicated MediaWiki setup with all bells and whistles might, as a side effect, make it less pressing to decouple the services and simplify the installation -- so long as the giant blob worked, no one would need to know what darkness lies beneath the hood. (Is that a good or a bad thing?) Conversely, making 'apt-get install mediawiki mediawiki-pdf' Just Work would make it less relevant whether 'mediawiki-pdf' was a separate service or a tightly-coupled mediawiki extension.
In practice, what is needed most is people actually working on making the process friendly, one way or another. (I've done my part by aggressively patching extension READMEs as I come across them to keep them up to date and accurate.)
--scott
On Thu, Nov 14, 2013 at 8:13 AM, C. Scott Ananian cananian@wikimedia.org wrote:
And I'll add that there's another axis: gwicke (and others?) have been arguing for a broader "collection of services" architecture for mw. [...]
Definitely "and others". Apart from decoupling the installability issues, it also breaks the application into separately maintainable applications that can have teams of people working on them separately. The only thing needed to ensure compatibility with other teams is a stable API, and that's what API versioning is for. Having multiple services doesn't complicate things much, unless you're running on a shared host.
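As an illustration of the stable-API point, a standalone service can mount its contract under a version prefix using nothing but Node's core http module. The route and payload here are made up, not any actual service's API:

```javascript
// Hypothetical versioned API: clients (e.g. MediaWiki) code against /v1/;
// a breaking change ships as /v2/ alongside it instead of replacing it.
var http = require('http');

http.createServer(function (req, res) {
    if (req.url.indexOf('/v1/render') === 0) {
        res.writeHead(202, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ status: 'queued' }));
    } else {
        res.writeHead(404, { 'Content-Type': 'text/plain' });
        res.end('Unknown API version or route\n');
    }
}).listen(8080);
```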
- Ryan
Hoi,
The current PDF support is broken. It does not support all the languages we support. I do not see that this is an explicit requirement.
Thanks,
GerardM
On 14 November 2013 00:02, Matthew Walker mwalker@wikimedia.org wrote:
[...]
On Nov 17, 2013 9:17 AM, "Gerard Meijssen" gerard.meijssen@gmail.com wrote:
The current PDF support is broken. It does not support all the languages we support. I do not see that this is an explicit requirement.
That may affect dependencies (fonts/libs/etc.) but otherwise I think it is irrelevant to this thread.
Please comment about this onwiki.
-Jeremy
The objective is to get buy-in and comments on what is proposed.
If it were as simple as adding a few fonts and other dependencies, would the previous iteration of the software have remained broken?
Sadly, it is a wee bit more complicated.
Thanks,
GerardM

On 17 Nov 2013 15:30, "Jeremy Baron" jeremy@tuxmachine.com wrote:
[...]