All,
We've just finished our second sprint on the new PDF renderer. A significant chunk of renderer development time this cycle went to non-Latin script support, as well as puppetization and packaging for deployment. We have a work-in-progress pipeline up and running in labs, which I encourage everyone to go try and break. You can use the following featured articles to see what our current output looks like:
* http://ocg-collection-alpha.wmflabs.org/index.php/Alexis_Bachelot
* http://ocg-collection-alpha.wmflabs.org/index.php/Atlantis:_The_Lost_Empire
Some other articles imported on that test wiki:
* http://ur1.ca/gg0bw
Please note that some of these will fail due to known issues noted below.
You can render any page in the new renderer by clicking the sidebar link "Download as WMF PDF"; if you use "Download as PDF" you'll be using the old renderer (useful for comparison). Additionally, you can create full books via Special:Book -- our renderer is "RDF to Latex (PDF)" and the old renderer is "e-book (PDF)". You can also try out the "RDF to Text (TXT)" renderer, but that's not on the critical path. As of right now we do not have a Bugzilla project entry, so reply to this email or email me directly -- to debug, we'll need one of: the name of the page, the name of the collection, or the collection_id parameter from the URL.
There are some code bits that we know are still missing and that we will have to address in the coming weeks or in another sprint:
* Attribution for images and text. The APIs are done, but we still need to massage that information into the document.
* Message translation -- right now all internal messages are in English, which is not so helpful to non-English speakers.
* Things using the <cite> tag and the Cite extension are not currently supported (meaning you won't get nice references).
* Tables may not render at all, or may break the renderer.
* Caching needs to be greatly improved.
Looking longer term toward on-wiki deployment, my plan right now is to get this into beta labs for general testing and to connect test.wikipedia.org up to our QA hardware for load testing. The major blocker there is acceptance of the Node.js 0.10 and TeX Live 2012 packages into reprap, our internal apt package repository. This is not quite as easy as it sounds: we already use TeX Live 2009 in production for the Math extension, and we must test thoroughly to ensure we do not introduce any regressions when we update to the 2012 package. I'm not sure what the actual dates for those migrations and tests will be, because that greatly depends on when Ops has time. In the meantime, our existing PDF cluster based on mwlib will continue to serve our offline needs. Once our solution is deployed and tested, mwlib (pdf[1-3]) will be retired here at the WMF, and print-on-demand services will be provided directly by PediaPress servers.
For the technically curious: we're approximately following the Parsoid deployment model -- using Trebuchet to push out a source repository (services/ocg-collection) that has the configuration and node dependencies built on tin, along with git submodules containing the actual service code.
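To picture that layout, here is a purely hypothetical sketch assembled only from the description above (names marked <...> are placeholders, not the actual repository contents):

  services/ocg-collection/      deploy repo, pushed out by Trebuchet
    node_modules/               node dependencies, built on tin
    <config files>              service configuration
    <service submodules>/       git submodules holding the actual service code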
It may not look like it on the surface, but we've come a long way, and it wouldn't have been possible without the (probably exasperated) help of Jeff Green, Faidon, and Ori. Also big thanks to Brad and Max for their work, and to Gabriel for some head thunking. C. Scott and I are not quite off the hook yet, as indicated by the list above, but hopefully soon enough we'll be enjoying the cake and cookies from another new product launch. (And yes, even if you're remote: if I promised you cookies as bribes, I'll ship them to you :p)
~Matt Walker
Hoi, I have a few questions:
- do you support other scripts used by languages like Malayalam (ml), Persian (fa), Chinese (zh), or Russian (ru)?
- when you do, do you have examples for these languages?
- are the messages not localised, or are they also not internationalised?
- are support for other scripts and proper internationalisation and localisation blockers for deployment?
Thanks, GerardM
On 18 January 2014 03:42, Matthew Walker mwalker@wikimedia.org wrote:
Gerard,
On Sat, Jan 18, 2014 at 1:38 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
- do you support other scripts used by languages like Malayalam (ml), Persian (fa), Chinese (zh), Russian (ru)?
In the final product, yes. I'm not entirely sure where we are with ml and zh, but I've seen test renders in fa and ru. It is a goal of our project to offer significantly better rendering support for all languages.
- when you do, do you have examples for these languages?
If we don't already, please feel free to add them to my test instance on labs and report back. I know the zh and ru test pages I already have on the wiki do render, but the other language tests seem to fail at this time -- possibly due to the pages that I imported. It would actually be fairly useful to have test pages with just the language content, and no extra templates / wiki features.
- are the messages not localised or are they also not internationalised?
At this time, the internal status messages are neither localized nor internationalized. I plan to add support for that, but it was not an initial focus because of the limited utility of these messages in the UI itself.
- are support for other scripts and proper internationalisation and localisation blockers for deployment?
We have no specific goals for script support other than parity with, or better than, the current mwlib renderer. In this phase, and once we deploy to beta labs, we're going to rely on the community to tell us where we need to improve. It's also likely that both render pipelines will continue to be offered in parallel for some time.
I do not consider localization of status messages in the backend renderer a blocker, because a user does not need to understand those messages in order to continue using the Collection extension or the renderer itself; it would merely fail to report the in-progress status of the render job. The failure and success notifications *are* localized, so the final state is something any user can proceed from.
1. Can this be set up for testing locally? Where is the new software? I'm not sure that I see it in the master version of Collection in Gerrit.
2. Are the wikis with a non-English content language where this can be tested?
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2014/1/18 Matthew Walker mwalker@wikimedia.org
apologies: s/Are the/Are there/
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2014/1/19 Amir E. Aharoni amir.aharoni@mail.huji.ac.il
- Can this be set up for testing locally? Where is the new software? I'm not sure that I see it in the master version of Collection in Gerrit.
- Are the wikis with a non-English content language where this can be tested?
Amir, Gerard: The easiest way to test locally at the moment is to use the standalone 'mw-ocg-bundler' and 'mw-ocg-latexer' node packages. There are good installation instructions in the READMEs; see:
https://npmjs.org/package/mw-ocg-bundler https://npmjs.org/package/mw-ocg-latexer
and let me know if I need to document anything better.
This will let you pull individual articles from an arbitrary wiki, and then typeset them with xelatex.
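For example, a minimal end-to-end run looks something like this (an illustrative sketch -- the exact flags and the article title are examples, so check the READMEs for the current options):

  # fetch the article (wikitext, images, metadata) into a zip bundle
  mw-ocg-bundler -o durian.zip --prefix enwiki "Durian"

  # typeset the bundle to PDF with xelatex
  mw-ocg-latexer -o durian.pdf durian.zip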
There is currently good support for quite a number of languages. My standard test case contains:
http://ar.wikipedia.org/wiki/%D9%84%D9%8A%D9%88%D9%86%D9%8A%D9%84_%D9%85%D9%...
http://ar.wikipedia.org/wiki/%D8%A8%D8%B4%D9%8A%D8%B1_%D8%A7%D9%84%D8%AB%D8%...
http://ar.wikipedia.org/wiki/%D8%AD%D9%85%D8%B2%D8%A9_%D8%A8%D9%86_%D8%B9%D8%...
http://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D8%B7%D9%86%D8%A8%D9%88%D9%84
http://ar.wikipedia.org/wiki/%D8%A7%D9%84%D8%AD%D8%B1%D8%A8_%D8%A7%D9%84%D8%...
http://de.wikipedia.org/wiki/Papier
http://en.wikipedia.org/wiki/Durian
http://es.wikipedia.org/wiki/Latas_de_sopa_Campbell
http://fa.wikipedia.org/wiki/%DA%A9%D8%B9%D8%A8%D9%87_%D8%B2%D8%B1%D8%AA%D8%...
http://fr.wikipedia.org/wiki/Trachylepis_atlantica
http://he.wikipedia.org/wiki/%D7%A1%D7%A4%D7%A8%D7%98%D7%94
http://hi.wikipedia.org/wiki/%E0%A4%B0%E0%A4%BE%E0%A4%AE%E0%A4%BE%E0%A4%AF%E...
http://it.wikipedia.org/wiki/La_vita_%C3%A8_meravigliosa
http://ja.wikipedia.org/wiki/%E7%86%8A%E9%87%8E%E4%B8%89%E5%B1%B1%E6%9C%AC%E...
http://ja.wikipedia.org/wiki/%E9%87%91%E6%98%9F%E3%81%AE%E6%97%A5%E9%9D%A2%E...
http://ko.wikipedia.org/wiki/%EC%A1%B0%ED%99%94%EC%A7%84%EB%8F%99%EC%9E%90
http://ml.wikipedia.org/wiki/%E0%B4%AE%E0%B4%B2%E0%B4%AF%E0%B4%BE%E0%B4%B3%E...
http://pl.wikipedia.org/wiki/Efekt_potwierdzenia
http://pt.wikipedia.org/wiki/Scaphyglottis
http://ru.wikipedia.org/wiki/%D0%91%D0%B8%D1%82%D0%B2%D0%B0_%D0%BF%D1%80%D0%...
http://simple.wikipedia.org/wiki/Taoism
http://vi.wikipedia.org/wiki/V%E1%BB%87_tinh_t%E1%BB%B1_nhi%C3%AAn_c%E1%BB%A...
http://zh.wikipedia.org/wiki/%E7%B4%8D%E7%B2%B9%E5%BE%B7%E5%9C%8B%E6%B5%B7%E...
and a few other English articles. That said, I don't read most of these languages, so I've mostly been trying to ensure that our output matches the HTML displayed by the wiki. It is quite possible I've chosen bad-looking fonts, or that there are other details that could be improved. (For example, the way Vietnamese stacked accents rendered was bad for a while; I've fixed that now.) Comments eagerly requested! --scott
ps. There are a number of minor issues with citations in RTL languages, even in our standard HTML rendering on the wikis; it appears that our citation templates should be more aggressive about adding <bdi> tags or lang attributes to ensure that citations of LTR sources in an RTL article are displayed as nicely as possible. If these fixes are made to the source, the LaTeX output should inherit them.
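For example (an illustrative fragment, not the actual template markup), a citation template could isolate an English source inside Arabic text like this:

  <bdi lang="en" dir="ltr">Smith, J. (2010). A History of Paper.</bdi>

The <bdi> element (or an explicit dir attribute) tells the bidirectional algorithm to lay the LTR citation out left-to-right without reordering the RTL text around it, and the LaTeX backend can then inherit the same hint.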
Hi Matthew,
great work, thank you for sharing. In my company we need an extension like this. A few months ago, I was unable to find a solution that accepts UTF-8 encoded Unicode characters greater than 0x7F inside URLs.
Here I couldn't find an article that was not redirected to another article, so I just created the article [1] and tried whether it is possible to render it as PDF.
Though this is "just" a sprint's result and there are still some bugs, it was mostly possible to render the article. Compared to my experiences so far, that is a really good result.
Is it also possible to set this up behind a firewall?
[1] http://ocg-collection-alpha.wmflabs.org/index.php/Test_German_Umlauts_%C3%A4...
Cheers,
Marco
On 01/18/2014 03:42 AM, Matthew Walker wrote:
Marco,
Is it also possible to set this up behind a firewall?
Yes, with the caveat that your wiki must be running Parsoid. It is also theoretically possible to use Print on Demand services behind a firewall, since we can POST a zip bundle to them -- more likely, however, you'd just disable that functionality, and I'm not sure our new bundle format is entirely compatible with the old bundle format...
If you want to set this up locally, I can help with that if you jump on IRC: #mediawiki-pdfhack on Freenode. I'm mwalker.
~Matt Walker Wikimedia Foundation Fundraising Technology Team
I didn't look at the new renderer carefully, but I guess it's a Parsoid-based one. I hope that the language conversion syntax issue in PDF output can be resolved together with Parsoid in the future; it currently blocks the deployment of PDF output on zhwiki. See https://bugzilla.wikimedia.org/show_bug.cgi?id=34919.
-Liangent
On Fri, Jan 24, 2014 at 2:38 AM, Matthew Walker mwalker@wikimedia.org wrote:
Dear All,
Is it possible at the moment to test the new PDF renderer online for RTL languages? And is it possible to adjust the page layout? I see that the default is a two-column layout.
Thanks,
Kind Regards, Aya Saif El-yazal Mahfouz
On Sat, Jan 25, 2014 at 9:56 AM, Liangent liangent@gmail.com wrote:
Hoi Liangent,
Does this [1] answer your question? It is a page they use for testing.
Thanks, Gerard
[1] http://zh.wikipedia.org/wiki/%E7%B4%8D%E7%B2%B9%E5%BE%B7%E5%9C%8B%E6%B5%B7%E...
On 25 January 2014 08:56, Liangent liangent@gmail.com wrote:
On Sat, Jan 25, 2014 at 6:13 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi Liangent,
Does this [1] answer your question? It is a page they use for testing. Thanks, Gerard
[1] http://zh.wikipedia.org/wiki/%E7%B4%8D%E7%B2%B9%E5%BE%B7%E5%9C%8B%E6%B5%B7%E...
It's mentioned as "a test case" but where's the output (expected and actual) of that article?
-Liangent
On 25 January 2014 08:56, Liangent liangent@gmail.com wrote:
On Jan 25, 2014 9:55 AM, "Liangent" liangent@gmail.com wrote:
It's mentioned as "a test case" but where's the output (expected and actual) of that article?
You may find some answers in the initial mail that started this thread.
-Jeremy
Yes, zhwiki is still an issue because of LanguageConverter. I will be fixing that issue in both Parsoid and the PDF renderer. (As soon as I fix some long-standing bugs in image handling for Parsoid/VE.)
It's a bit tough to test the renderer online at the moment, because you have to import your own non-English content into the test wiki. I recommend trying things out offline if possible. I can also email/post sample articles if you like. RTL languages should be well supported; I spent about a week getting the details of the bidirectional algorithm correct. (And of course we inherit nice ligatures, etc., for Arabic from the XeTeX engine.) --scott
Hello C. Scott,
Could you kindly try importing the following articles from the Arabic Wikipedia and then sending me the resulting PDF files?
https://ar.wikipedia.org/wiki/%D9%83%D8%A3%D8%B3_%D8%A7%D9%84%D8%B9%D8%A7%D9%...
https://ar.wikipedia.org/wiki/%D8%A5%D8%B3%D9%84%D8%A7%D9%85
https://ar.wikipedia.org/wiki/%D9%85%D9%83%D9%84%D8%A7%D8%B1%D9%8A%D9%86_%D9%...
https://ar.wikipedia.org/wiki/ليفي_أشكول
https://ar.wikipedia.org/wiki/%D8%AC%D8%A7%D9%85%D8%B9%D8%A9_%D8%A7%D9%84%D8%...
https://ar.wikipedia.org/wiki/%D8%AC%D9%85%D9%87%D9%88%D8%B1%D9%8A%D8%A9_أيرلندا
I might ask you in the future to add some articles in Farsi and Hebrew to your test cases too. If that would be a burden, feel free to simply send me what you have already got. On my side, I will try to set up the extension in the near future.
Thank you for your efforts,
Kind Regards, Aya Saif El-yazal Mahfouz
On Sat, Jan 25, 2014 at 6:09 PM, C. Scott Ananian cananian@wikimedia.org wrote:
Hi,
On 2014-01-23 19:38, Matthew Walker wrote:
If you want to set this up locally, I can help with that if you jump on IRC: #mediawiki-pdfhack on Freenode. I'm mwalker.
Thank you very much for helping me install the stack. Although the project is at an early stage, it is working quite well. We had been looking for such a solution for months; this is the first one that does the job we need.
Yesterday I showed the output to my employer. He was delighted with the result, so many commendations from our company.
Cheers,
Marco
On Sat, Jan 18, 2014 at 3:42 AM, Matthew Walker mwalker@wikimedia.org wrote:
We've just finished our second sprint on the new PDF renderer.
A Google Code-in student wrote some tests[1][2] for the existing export-to-PDF functionality. I have not had time to review the last few patch sets and merge them into the master branch. Let me know if anybody is interested in writing more tests.
Željko
--
1: https://gerrit.wikimedia.org/r/#/c/98160
2: https://gerrit.wikimedia.org/r/#/c/105179
On 01/17/2014 09:42 PM, Matthew Walker wrote:
Hey there! We just got a #mediawiki question about Collections, so I was wondering what we can tell third-party MediaWiki administrators about the new renderer work. Thanks!