Hi,
I'm grabbing this opportunity to bring up 3 bugs related to mwlib that
deserve a larger discussion and should perhaps be implemented
differently in the new version.
1.
https://bugzilla.wikimedia.org/show_bug.cgi?id=56560 - PDF creation
tool considers IPv6 addresses as users, not anonymous.
I've pushed a patch for this and it was merged; however, the
detection is based on a regex and, as a quick Google search will tell
you, writing a regex that covers all IPv6 cases is far from trivial.
Perhaps the anonymous/logged-in status of each user could be sent from
MW instead.
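To illustrate an alternative to the regex, here is a minimal sketch that
lets Python's standard ipaddress module do the parsing. The function name
is_anonymous is made up for this example and is not part of mwlib; the
sketch also assumes that a contributor name which parses as an IP address
always belongs to an anonymous editor.

```python
import ipaddress


def is_anonymous(username: str) -> bool:
    """Return True if the contributor name parses as an IPv4 or IPv6 address.

    Hypothetical helper, not mwlib code. Assumes that a name which is a
    valid IP address always denotes an anonymous editor.
    """
    try:
        # ip_address() accepts both IPv4 and IPv6, including compressed
        # ("::") and upper-case hexadecimal forms.
        ipaddress.ip_address(username.strip())
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for name in ["2001:db8::1", "2001:DB8:0:0:0:0:0:1", "192.0.2.7", "Strainu"]:
        print(name, is_anonymous(name))
```

This avoids maintaining a hand-written pattern, but it is still guessing
from the name; a flag sent by MW would remain the more reliable source.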
2.
https://bugzilla.wikimedia.org/show_bug.cgi?id=56219 - PDF creation
tool excludes contributors with a "bot" substring in their username
I've also pushed a pull request for this one, but it was rejected
based on the en.wp policy that prevents bot-like usernames for humans.
The problem is more complex though:
a. Should bots be credited for their edits? While most of them do
simple tasks, we have recently seen an increase in bot-created
content. On ro.wp we even have a few lists only edited by robots.
b. If the robots should _not_ be credited, how do we detect them?
Ideally, there should be an automatic way to do so, but according to
http://www.mediawiki.org/wiki/Bots, it only works for recent changes.
Less ideally, only users with "bot" at the end should be removed, in
order to keep users like
https://ro.wikipedia.org/wiki/Utilizator:Vitalie_Ciubotaru (who is
not a robot, but has "bot" in the name) in the contributor list.
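The "less ideal" suffix rule from point b could be sketched roughly as
follows. The names looks_like_bot and credited_contributors are invented
for this example and are not mwlib functions, and the rule is only a
heuristic: a human whose name happens to end in "bot" would still be
dropped.

```python
from typing import Iterable, List


def looks_like_bot(username: str) -> bool:
    """Heuristic: treat a name as a bot only if it ends with "bot".

    Hypothetical helper, not mwlib code. The comparison is
    case-insensitive, so "SomeBot" and "somebot" are both matched.
    """
    return username.strip().lower().endswith("bot")


def credited_contributors(usernames: Iterable[str]) -> List[str]:
    """Return the contributor names that the suffix rule keeps."""
    return [name for name in usernames if not looks_like_bot(name)]


if __name__ == "__main__":
    names = ["Vitalie_Ciubotaru", "ExampleBot", "Strainu"]
    print(credited_contributors(names))
```

With this rule Vitalie_Ciubotaru stays in the list, because "bot" appears
in the middle of the name rather than at the end.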
3.
https://bugzilla.wikimedia.org/show_bug.cgi?id=2994 - Automatically
generated count and list of contributors to an article (authorship
tracking)
This is an old enhancement request, revived by me last month in a
wikimedia-l thread:
http://lists.wikimedia.org/pipermail/wikimedia-l/2013-October/128575.html
. The idea is to decide if and how to credit:
a. vandals
b. reverters
c. contributors whose valid contributions were rephrased or
removed from the article.
d. contributors with valid contributions but invalid names
I hope the people working on this feature will take the time to
consider these issues and come up with solutions for them.
Thanks,
Strainu
2013/11/13 Erik Moeller <erik(a)wikimedia.org>:
Hi folks,
for a long time we've relied on the mwlib libraries by PediaPress to
generate PDFs on Wikimedia sites. These have served us well (we
generate >200K PDFs/day), but they architecturally pre-date a lot of
important developments in MediaWiki, and actually re-implement the
MediaWiki parser (!) in Python. The occasion of moving the entire PDF
service to a new data-center has given us reason to re-think the
architecture and come up with a minimally viable alternative that we
can support long term.
Most likely, we'll end up using Parsoid's HTML5 output, transform it
to add required bits like licensing info and prettify it, and then
render it to PDF via phantomjs, but we're still looking at various
rendering options.
Thanks to Matt Walker, C. Scott Ananian, Max Semenik, Brad Jorsch and
Jeff Green for joining the effort, and thanks to the PediaPress folks
for giving background as needed. Ideally we'd like to continue to
support printed book generation via PediaPress' web service, while
completely replacing the rendering tech stack on the WMF side of
things (still using the Collection extension to manage books). We may
need to deprecate some output formats - more on that as we go.
We've got the collection-alt-renderer project set up on Labs (thanks
Andrew) and can hopefully get a plan to our ops team soon as to how
the new setup could work.
If you want to peek - work channel is #mediawiki-pdfhack on FreeNode.
Live notes here:
http://etherpad.wikimedia.org/p/pdfhack
Stuff will be consolidated here:
https://www.mediawiki.org/wiki/PDF_rendering
Some early experiments with different rendering strategies here:
https://github.com/cscott/pdf-research
Some improvements to Collection extension underway:
https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/extensions…
More soon,
Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l