Wikisource-l May 2014

wikisource-l@lists.wikimedia.org

11 participants
7 discussions

Re: [Wikisource-l] Documenting ProofreadPage API

by Kishan Thobhani

Thank you for all the input. I have documented an abstract version here: http://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentati…

In some places I have put a <Need more input pointer> marker, so if anyone knows more about those points, please fill them in. It would be really great if we could document as much as we can, from more perspectives. The examples and information in this section are currently drawn more or less from http://wikisource.org/w/api.php. You can share your views here too; I’ll take care to cross-reference them in the draft.

I would like everyone to have a look at TBT's thoughts here: https://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#w.r.t_IRC_chat

TBT suggests that we should specify the format of the output. Currently prp supports only wikitext. If anyone would like to expand on this point or comment on TBT's suggestion, that would be great. In a brief conversation, TBT also suggested that we should serialise the output in JSON format. Thank you, TBT, for this.

@gaurav vadia

> You can do a lot with ProofreadPage without any new APIs. For example, I wrote a Perl module to download an entire book from the English Wikisource as WikiText two years ago. At that time, I implemented it for a hypothetical “Index:Entire book.pdf” by:
>
> 1. Using prop=imageinfo to get the number of pages for “File:Entire book.djvu”.
> 2. Using prop=revisions to download the Wikitext for each individual page from “Page:Entire book.djvu/1” to “Page:Entire book.djvu/9999” (if the image had 9,999 pages).
>
> This will work for Wikisources that redirect “File:”, “Index:” and “Page:” into their local namespaces. I ignored the proofread status entirely, since all the pages I needed to download had already been transcribed, but I guess it might be helpful to have an API query that could return the proofread status for every page in an Index page. That’s the only idea I have for now!

Do you think it would be possible to create a small example of the same in terms of the proofread hooks? It could be just API calls, so that we can mention it in the examples section.

@ thomas tanon

> Feel free to start a page describing what the two API hooks do with a simple example as it's done in pages like [0]. It would be a nice basis for other people to share their use cases.

Yes, I have created a section at http://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentati… but it needs massive improvements. I'm still not sure about the parameters. But there is likely scope to improve the API and provide more flexibility if we start here.

Thank you, everyone.
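
The two steps quoted above (prop=imageinfo for the page count, then prop=revisions for each "Page:" title) could be sketched roughly as follows. This is only an illustration, not the Perl module mentioned in the quote: the endpoint URL, the function names, and the batch size of 50 titles per request are assumptions, and the sketch only builds the request URLs without sending them.

```python
from urllib.parse import urlencode

# Assumed endpoint; substitute the api.php of the Wikisource you work with.
API_URL = "https://en.wikisource.org/w/api.php"


def page_count_query(file_title):
    """Step 1: ask for the file's size info, which includes the page count
    for multi-page files such as DjVu."""
    return {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "size",
        "titles": file_title,
    }


def page_titles(file_name, page_count):
    """List the 'Page:' titles from page 1 up to page_count."""
    return [f"Page:{file_name}/{n}" for n in range(1, page_count + 1)]


def batches(items, size=50):
    """Split titles into groups, since one request accepts a limited number of titles."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def wikitext_query(titles):
    """Step 2: ask for the wikitext of the latest revision of each title."""
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(titles),
    }


def to_url(params):
    """Turn a parameter dict into a full request URL."""
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(to_url(page_count_query("File:Entire book.djvu")))
    # The page count (3 here) would come from the response to the first request.
    for group in batches(page_titles("Entire book.djvu", 3)):
        print(to_url(wikitext_query(group)))
```

The resulting URLs can be fetched with any HTTP client; the proofread status is not covered here, as in the quoted approach.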

9 years, 11 months

Zürich Hackathon report (Wikisource perspective)

by David Cuenca

This year at the Hackathon there were two Wikisource volunteers, Tpt and me, although I must say that the number of supporters is growing. I arrived on Friday night and left on Sunday morning, so initially I didn't expect to accomplish much other than catching up with new developments and following up on general standing issues.

One of those issues was the RFC on associated namespaces [1]. It needed more developers to comment on its general terms and on the database schema, and I am glad that it inched forward. It is important to get this solved because it was one of the main blockers of last year's GSoC project for a customized book uploading interface in the Upload Wizard. It also blocks other important work relevant to all projects.

During the conference Max Klein and Daniel Mietchen showed me their WikiProject to import Open Access papers from PubMed Central into Wikisource [2] using an automated tool (still under development). These imported papers can later be cited in Wikipedia articles. I think it is an amazing concept which revives the Wikisource aspect of supporting Wikipedia references with current sources, and it might attract even more positive attention to our project. This fits perfectly with the strategy started last year of synchronizing bibliographic metadata through Wikidata, which of course will be more feasible once arbitrary item access is possible [3].

Daniel also informed me that, regarding PDF import, Peter Murray-Rust has started a project to mine scientific literature. It will be interesting to take a closer look at their contentMine [4] and see if there are points of intersection. He will give a keynote during Wikimania.

Matt Flaschen taught me with great patience how to set up Vagrant [5] and what you can do with it. It is basically a virtual machine with MediaWiki installed and configured, so you have your own instance running in just a few minutes (well, on my little laptop with 2 GB of RAM it took much longer, because to run smoothly it needs about 8 GB of RAM and a few cores). It is really wonderful to have your own development MediaWiki so easily installed and accessible normally from the browser. Then there are the so-called "roles" that install some extensions automatically [6], like "visualeditor" or "proofreadpage".

I also got the opportunity to thank Nemo personally for helping me learn how to use the totally user-unfriendly tool from the Internet Archive to upload images and convert them automatically into OCR'ed djvu files. This is something important for our mission, which I hope this year's GSoC will make easier.

In the afternoon there was the presentation of the new Executive Director, Lila Tretikov [7]. She gave a short talk and spent most of her session answering diverse questions from the audience, the most important for us perhaps being "what about sister projects?" (thanks Cristian Consonni!). Her answer was along the lines of "there are projects more aligned with our movement vision than others, and we might want to support those". We will have to wait and see what actions that statement turns into. I hope wikisourcerors can be thankful to the new ED. For now I can say that she transmits a positive attitude.

For his part, Tpt was working on getting the "other projects side bar" deployed as a beta feature [8] and on the Guided Tours for the Proofread Page extension. Amazing stuff. I really hope that his Wikimania scholarship gets approved!

Cheers,
Micru

PS: The people mentioned have been BCC'ed just for information; no action is required from them.

[1] https://www.mediawiki.org/wiki/Requests_for_comment/Associated_namespaces
[2] https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=47930
[4] https://github.com/petermr/contentMine
[5] https://www.mediawiki.org/wiki/MediaWiki-Vagrant
[6] https://www.mediawiki.org/wiki/MediaWiki-Vagrant/Roles
[7] http://blog.wikimedia.org/2014/05/01/wmf_announces_new_ed_lila_tretikov/
[8] https://www.mediawiki.org/wiki/Wikibase/Beta_Features/Other_projects_sidebar

9 years, 11 months

Documenting ProofreadPage API

by Kishan Thobhani

Hello everyone,

I'm Kishan Thobhani (kishanio), a fairly new contributor to Wikimedia as well as Wikisource. I was directed here by Sumana Harihareswara with a proposed task of documenting the API of the ProofreadPage extension (https://www.mediawiki.org/wiki/Extension:ProofreadPage) and later helping to improve it.

At this point the ProofreadPage (prp) API adds two hooks to the API under the action=query module:

1.) Properties - prop=proofread (gets the proofreading level of Page: pages)
2.) Meta - meta=proofreadinfo (local configuration information)

In this context, I would really appreciate it if someone could share their thoughts and help me compile notes to proceed further. Points could include:

1.) Use cases of the API.
2.) Existing components/projects/bots already using the proofread API features.
3.) Anything else.

We have a dedicated section to document any initial findings/notes. Feel free to edit and append your input there too. https://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentat…

Thank you. Have a great weekend ahead.
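
As a starting point for the examples, the two hooks named above (prop=proofread and meta=proofreadinfo under action=query) could be called with request URLs built like this. It is a minimal sketch: the endpoint, the function names, and the sample page title are assumptions for illustration, and the shape of the responses is deliberately not shown because it is not described in this message.

```python
from urllib.parse import urlencode

# Assumed endpoint; any wiki with the ProofreadPage extension should work similarly.
API_URL = "https://en.wikisource.org/w/api.php"


def proofread_status_url(page_titles):
    """Build a request for the proofreading level of the given 'Page:' pages."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "proofread",
        "titles": "|".join(page_titles),
    }
    return API_URL + "?" + urlencode(params)


def proofread_info_url():
    """Build a request for the wiki's local ProofreadPage configuration."""
    params = {
        "action": "query",
        "format": "json",
        "meta": "proofreadinfo",
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "Page:Entire book.djvu/1" is a hypothetical title.
    print(proofread_status_url(["Page:Entire book.djvu/1"]))
    print(proofread_info_url())
```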

9 years, 11 months

Preparing mass invitation for the Wikisource meetup

by David Cuenca

Hi all,

I have started a sign-up page for the Wikisource Meetup at Wikimania 2014 in London. If you are interested in coming, please add your name so we can show that there is interest. https://wikimania2014.wikimedia.org/wiki/Wikisource_Meetup

In the next few days I would like to send a MassMessage inviting the WsUG members and each language community. The draft, to be modified/corrected/improved, is here: https://meta.wikimedia.org/wiki/Wikisource_Community_User_Group/Wikisource_…

I think we still have time to apply for funds to bring selected members of each Wikisource language community, to promote better coordination, dialogue, and international projects. So if you are interested or you know someone we should bring, please get in touch asap.

Ah, and during the Zurich Hackathon I met Tpt, and he was working on getting the "Other projects sidebar" deployed as a beta feature on all sites. Basically it automatically displays the sitelinks to sister projects. I cannot wait to see this deployed :)

http://lists.wikimedia.org/pipermail/wikidata-l/2014-April/003690.html
https://fr.wikisource.org/wiki/Appel_du_18_juin
https://www.mediawiki.org/wiki/Wikibase/Beta_Features/Other_projects_sidebar

Cheers,
Micru

9 years, 11 months

Thanks for installing jq and pdftk into Tool Labs

by Alex Brollo

As you perhaps know, two interesting tools, jq and pdftk, have been installed on Tool Labs (thanks Tim!). I just tested pdftk and it runs perfectly.

* jq is a JSON data parser; I found it in the documentation of ia (the command-line version of the internetarchive Python module: https://pypi.python.org/pypi/internetarchive);
* pdftk is an old but effective tool for manipulating PDF files; I need it to split and merge pages of big PDF files while uploading them from Opal into opallibriantichi, the new "Wikisource-oriented" collection on the Internet Archive.

If you are interested in ancient books, take a look: most of them are Italian books, but there are also ancient Latin, French, and English books, sometimes with parallel translations.

Alex Brollo
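
For readers who have not used pdftk, the splitting and merging mentioned above might look roughly like the sketch below. It only builds the command lines as argument lists using pdftk's `cat` and `output` operations; the file names, the chunk size, and the helper names are hypothetical and are not taken from the actual upload workflow.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (first, last) page ranges covering 1..total_pages."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def split_command(source, first, last, output):
    """Command that extracts pages first..last of source into output."""
    return ["pdftk", source, "cat", f"{first}-{last}", "output", output]


def merge_command(parts, output):
    """Command that concatenates the given PDF files, in order, into output."""
    return ["pdftk", *parts, "cat", "output", output]


if __name__ == "__main__":
    # Hypothetical 250-page book, split into pieces of at most 100 pages.
    parts = []
    for index, (first, last) in enumerate(chunk_ranges(250, 100), start=1):
        part_name = f"book_part{index}.pdf"
        parts.append(part_name)
        print(" ".join(split_command("book.pdf", first, last, part_name)))
    print(" ".join(merge_command(parts, "book_merged.pdf")))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine where pdftk is installed.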

9 years, 11 months

IA opallibriantichi new collection

by Alex Brollo

Happy to let you know that there's a new IA collection from an Italian library (Opal Libri Antichi), and that the uploads have been done by a wikisourcian, with all possible care about metadata, so that Tpt's IA-Commons uploader can be used easily. Most items have been uploaded as high-resolution TIFF zips, so that a higher-resolution djvu can be obtained rather simply when needed. So far the items are mainly ancient Italian books, but there are also many French and Latin books. Within days or hours the goal of the first 1000 uploads will be met.

Here is the link to the collection's list: https://archive.org/search.php?query=collection%3Aopallibriantichi&sort=-pu…

Alex

9 years, 11 months

Wikisource core feature is broken for more than 50 hours

by Luiz Augusto

So sorry for the cross-posting and for this shout for help, which some may read as forum shopping, but this is really annoying. https://bugzilla.wikimedia.org/show_bug.cgi?id=64622

In short, on all Wikisource wikis we are unable to start working on new pages from digitized books (or on newly overwritten uploads) without poking the server N times in order to generate a single resized image. Please fix it ASAP. Please see also #c18 on the mentioned bug.

9 years, 11 months
