Hello,
As I understand it, a new renderer for PDF versions of wiki pages is currently under development.
Development has been ongoing since August 2018, according to
https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality
As I also understand, nothing has been deployed on any wiki yet.
As I also understand, the new renderer is based on mwlib.
As I also understand, mwlib does not work with Python 3, according to
https://mwlib.readthedocs.io/en/latest/installation.html
It will /not/ work with python versions >= 3 or < 2.6.
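The quoted constraint can be read as "supported only for 2.6 <= version < 3.0". A minimal sketch of that check follows; the function name `mwlib_supports` is hypothetical and is not part of mwlib.

```python
import sys


def mwlib_supports(major: int, minor: int) -> bool:
    """Return True if the interpreter version falls inside the range the
    mwlib installation notes describe: not >= 3 and not < 2.6.

    Hypothetical helper for illustration only; mwlib does not ship it.
    """
    return (2, 6) <= (major, minor) < (3, 0)


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    verdict = "inside" if mwlib_supports(major, minor) else "outside"
    print("Python %d.%d is %s the documented range" % (major, minor, verdict))
```

Under this reading, only the 2.6 and 2.7 interpreters qualify, which is why the end of Python 2.7 support matters for anything built on mwlib.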
As I also understand, Python 2 will not receive any security updates after 1st January 2020, according to
https://www.python.org/dev/peps/pep-0373/#update
Being the last of the 2.x series, 2.7 will have an extended period of maintenance. Specifically, 2.7 will receive bugfix support until January 1, 2020. After the last release, 2.7 will receive no support.
Concluding from the above, as I understand it, the new renderer will be decommissioned on 1st January 2020,
which I don't understand, as you will certainly understand.
Yours Dirk
Hi Dirk,
I suggest asking this question on the specific talkpage there, instead. (Where I see you're already actively engaged.)
This mailing list is for disseminating /known/ information across the Wikimedia languages/projects, and is not for asking technical questions; i.e., the people who know the answers to your specific questions are not likely to be subscribed here.
Thanks,
On Fri, Mar 15, 2019 at 11:50 AM Dirk Hünniger via Wikitech-ambassadors < wikitech-ambassadors@lists.wikimedia.org> wrote:
[...]
Wikitech-ambassadors mailing list Wikitech-ambassadors@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors
I think the new PDF renderer is going to use Parsoid and some web browser engine for rendering pages, rather than mwlib. But I don't remember where I read this; maybe I read it about some other planned PDF renderer (they seem to come and go), since the page doesn't mention Parsoid (it doesn't mention mwlib either, though).
No, the Parsoid-based version was mw-ocg-latexer, which is already decommissioned. The web-browser-based version is the currently deployed one, which is called electron-render-service. It has been decided to decommission it, and it has already decommissioned itself on some wikis. My question was about the replacement for that, which is currently under development. I just want to make sure we will not have to decommission it again in a few months, particularly since we have not yet deployed it.
On 3/15/19 10:11 PM, Bartosz Dziewoński wrote:
[...]
There are two different PDF renderer tools: the single page PDF renderer ("Download as PDF" link in the sidebar, via the ElectronPdfService extension [1]) and the article collection renderer ("Create a book" link, via the Collection extension [2]).
The single page renderer is today served by a tool called Electron [3]; it's in the process of being replaced by a new tool called Proton [4]. These are both node.js services which manage headless Chromium instances, which means the actual rendering engine will stay the same, so no user-facing changes are expected. The switch is for operational reasons: Electron crashes periodically, and it was written before the Chromium project provided an official library for remote-controlling headless browsers, so it didn't take advantage of that. Proton is currently getting mirrored traffic (i.e. it is deployed in production for testing purposes, and both it and Electron render the PDF files requested by users, but only the one from Electron is returned).
The collection renderer used to be served by a tool called OCG [5], which was decommissioned about a year ago. The Collection extension also functions as a frontend to PediaPress [6], who create print-on-demand books of Wikipedia content. They use mwlib internally (and are the main developers of it). I believe they plan to provide PDF download functionality eventually.
So in short, the WMF is not involved with mwlib development; you should probably contact PediaPress (see [7]) if you have questions about that. The PDF renderer project at the WMF is not related to mwlib and not affected by the Python 2 life cycle.
[1] https://www.mediawiki.org/wiki/Extension:ElectronPdfService
[2] https://www.mediawiki.org/wiki/Extension:Collection
[3] https://www.mediawiki.org/wiki/Electron
[4] https://www.mediawiki.org/wiki/Proton
[5] https://www.mediawiki.org/wiki/Offline_content_generator
[6] https://meta.wikimedia.org/wiki/Book_tool/Help/Books/Frequently_Asked_Questi...
[7] https://pediapress.com/code/
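The mirrored-traffic arrangement described above (both services render every request, only the established one's output is returned) can be sketched roughly as follows. This is an illustration under stated assumptions, not how Proton or Electron are actually wired up: `render_mirrored`, `fake_electron`, `fake_proton` and `shadow_log` are hypothetical names, and the two renderers are stand-in functions rather than real services.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Tuple

# A renderer takes a page title and returns PDF bytes (assumed interface).
Renderer = Callable[[str], bytes]


def render_mirrored(
    title: str,
    primary: Renderer,
    shadow: Renderer,
    executor: Executor,
    shadow_log: List[Tuple[str, bool]],
) -> bytes:
    """Serve the request from `primary` and mirror it to `shadow`.

    The shadow result is never returned to the user; only its success or
    failure is recorded, so a crash in the new service stays invisible.
    """

    def run_shadow() -> None:
        try:
            shadow(title)
            shadow_log.append((title, True))
        except Exception:
            shadow_log.append((title, False))

    executor.submit(run_shadow)  # runs in the background
    return primary(title)        # only this output reaches the user


# Stand-ins for the real services (hypothetical).
def fake_electron(title: str) -> bytes:
    return b"%PDF electron:" + title.encode()


def fake_proton(title: str) -> bytes:
    return b"%PDF proton:" + title.encode()


if __name__ == "__main__":
    log: List[Tuple[str, bool]] = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        pdf = render_mirrored("Main_Page", fake_electron, fake_proton, pool, log)
    print(pdf, log)
```

The point of the design is that the new service sees realistic production load while its failures only show up in the log, never in what users download.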
Hi Tisza,
thanks a lot for your answer. A Chromium-based solution is certainly one of the best you can get. It's cheap in computational resources and updates should be available for a long time. Sorry for creating unnecessary work for you. I just figured out from the following link that the new renderer was based on mwlib and reportlab. But that dates back to April 2018 and was last updated in August 2018, so this information is obviously outdated now.
https://www.mediawiki.org/wiki/Reading/Web/PDF_Functionality
Also, these two pages seem to contain the same outdated information.
https://www.reportlab.com/opensource/
https://www.reportlab.com/casestudies/wikipedia/
Yours Dirk
On 3/17/19 8:14 PM, Tisza Gergő wrote:
[...]
On Sun, Mar 17, 2019 at 12:50 PM Dirk Hünniger dirk.hunniger@googlemail.com wrote:
A Chromium-based solution is certainly one of the best you can get. It's cheap in computational resources and updates should be available for a long time.
I think computationally it's actually more expensive (OCG transformed the wikitext syntax tree into TeX, while Chromium does full HTML layout). That's one of the reasons PDF rendering for collections is not available anymore; Chromium would just crash when trying to render a thousand-page book.
On the other hand, Chromium-rendered PDF will actually look the way it looks in the browser, without any maintenance effort needed (other than occasional tweaks to the print CSS), while with non-browser-based tools every template, extension and wikitext feature that had a visual component required dedicated handling, and common layout concepts like tables or multiple columns were extremely hard to get right.
I just figured out from the following link that the new renderer was based on mwlib and reportlab.
Neither of those is mentioned on the page though.
I think the WMF ran its own mwlib service a long time ago (2013-ish?) but it didn't work well and was replaced by OCG (which eventually proved unmaintainable due to the above issues and lack of resourcing).
On Sun, 17 Mar 2019 at 20:28, Tisza Gergő gtisza@gmail.com wrote:
I think the WMF ran its own mwlib service a long time ago (2013-ish?) but it didn't work well
I could be misremembering but wasn't that the thing that nobody knew how to reproduce the setup of and was one of the last things left in pmtpa?
On Sun, Mar 17, 2019 at 1:40 PM Alex Monk krenair@gmail.com wrote:
I could be misremembering but wasn't that the thing that nobody knew how to reproduce the setup of and was one of the last things left in pmtpa?
Yeah, and also on Ubuntu Hardy (which had been EOL for over a year by then).
And fwiw afaik pediapress hasn't *really* supported mwlib for years now. The commit history at https://github.com/pediapress/mwlib/commits/master is pretty sparse.
--scott, who wrote the ocg renderer
On Sun, Mar 17, 2019, 5:00 PM Tisza Gergő gtisza@gmail.com wrote:
[...]