Thank you for all the inputs.
I have documented an abstract version here,
http://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentati…
In some places I have put a <Need more input pointer> marker, so if anyone knows more about those points, please fill them in. It would be really great if we could document as much as possible, from as many perspectives as possible. The examples and information in this section are currently more or less taken from http://wikisource.org/w/api.php
You can share your views here too. I’ll take care to cross-reference them in the draft.
I would like everyone to have a look at TBT's thoughts here,
https://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#w.r.t_IRC_chat
TBT suggests that we should specify the format of the output. Currently prp supports only wikitext. If anyone would like to expand on this point or comment on TBT's suggestion, that would be great. In a brief conversation, TBT also suggested that we should serialise the output as JSON. Thank you, TBT, for this.
@gaurav vadia
> You can do a lot with ProofreadPage without any new APIs. For example, I wrote a Perl module to download an entire book from the English Wikisource as WikiText two years ago. At that time, I implemented it for a hypothetical “Index:Entire book.pdf” by:
>
> 1. Using prop=imageinfo to get the number of pages for “File:Entire book.djvu”.
> 2. Using prop=revisions to download the Wikitext for each individual page from “Page:Entire book.djvu/1” to “Page:Entire book.djvu/9999” (if the image had 9,999 pages).
>
> This will work for Wikisources that redirect “File:”, “Index:” and “Page:” into their local namespaces. I ignored the proofread status entirely, since all the pages I needed to download had already been transcribed, but I guess it might be helpful to have an API query that could return the proofread status for every page in an Index page. That’s the only idea I have for now!
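The two steps quoted above can be sketched as plain API requests. This is a minimal sketch and not the Perl module itself: it only builds the query parameters, without sending anything. The endpoint URL, the `iiprop=size` and `rvprop=content` options, the batch size of 50 titles per request and all helper names are assumptions for illustration, not something stated in the message.

```python
from urllib.parse import urlencode

# Assumed endpoint; any Wikisource api.php URL could be substituted.
API_URL = "https://en.wikisource.org/w/api.php"


def page_count_query(file_title):
    """Parameters for step 1: ask prop=imageinfo about the scanned file."""
    return {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "size",  # assumption: size info carries the page count
        "titles": file_title,
    }


def page_text_queries(file_name, page_count, batch_size=50):
    """Parameters for step 2: prop=revisions for Page:<file>/1 .. /<page_count>.

    Titles are grouped into batches so that each request stays small.
    """
    for start in range(1, page_count + 1, batch_size):
        stop = min(start + batch_size, page_count + 1)
        titles = "|".join(f"Page:{file_name}/{n}" for n in range(start, stop))
        yield {
            "action": "query",
            "format": "json",
            "prop": "revisions",
            "rvprop": "content",
            "titles": titles,
        }


def to_url(params):
    """Turn a parameter dict into a full request URL."""
    return API_URL + "?" + urlencode(params)
```

Each dict can be passed through `to_url` and fetched with any HTTP client; the page count read from the first response would then be fed to `page_text_queries`.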
Do you think it would be possible to create a small example of the same workflow using the proofread hooks? It could be just the API calls, so that we can include it in the examples section.
@thomas tanon
> Feel free to start a page describing what the two API hooks do with a simple example as it's done in pages like [0]. It would be a nice basis for other people to share their use cases.
Yes, I have created a section at http://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentati…
but it needs massive improvements. I'm still not sure about the parameters, but there is likely scope to improve the API and provide more flexibility if we start here.
Thank you everyone.
This year in the Hackathon we were two Wikisource volunteers, Tpt and me,
although I must say that the number of supporters is growing. I went there
on Friday night, and left on Sunday morning, so initially I didn't expect
to accomplish much other than to catch up with new developments and
follow up on general standing issues.
One of those issues was the RFC on associated namespaces [1]; it needed
more developers to comment on its general terms and on the database schema,
and I am glad that it inched forward. It is important to get this solved
because it was one of the main blockers of the GSoC last year for a
customized book uploading interface in the Upload Wizard. It also blocks
other important stuff relevant for all projects.
During the conference Max Klein and Daniel Mietchen showed me their
WikiProject to import Open Access papers from PubMed Central into
Wikisource [2] using an automated tool (still under development). These
imported papers later on can be cited in Wikipedia articles. I think it is
an amazing concept which revives the Wikisource aspect of supporting
Wikipedia references with current sources, and that might attract even more
positive attention to our project. This fits perfectly with the strategy
started last year of synchronizing bibliographic metadata through Wikidata,
which of course will be more feasible once arbitrary item access is
possible [3]. Daniel has also informed me that, regarding PDF import, Peter
Murray-Rust has started a project to mine scientific literature. It will be
interesting to take a closer look into their contentMine [4] and see if
there are points of intersection. He will give a keynote during Wikimania.
Matt Flaschen taught me with great patience how to set up Vagrant [5] and
what you can do with it. It is basically a virtual machine with MediaWiki
installed and configured, so you have your own instance running in just a
few minutes (well, on my little laptop with 2 GB of RAM it took much longer,
because it needs about 8 GB of RAM and a few cores to run smoothly). It is
really wonderful to have your own development MediaWiki so easily installed
and normally accessible from the browser. Then there are the so-called
"roles" that install some extensions automatically [6], like "visualeditor"
or "proofreadpage".
I also got the opportunity to thank Nemo personally for helping me to learn
how to use the totally user-unfriendly tool from the Internet Archive to
upload images and convert them automatically into OCR'ed djvu files.
Something important for our mission, which I hope the GSoC of this year
will make easier.
In the afternoon there was the presentation of the new Executive Director,
Lila Tretikov [7]. She gave a short talk and spent most of her session
answering diverse questions from the audience, the most important for us
perhaps being "what about sister projects?" (thanks Cristian Consonni!).
Her answer was along the lines of "there are projects more aligned with our
movement vision than others, and we might want to support those". We will
have to wait and see what actions that statement translates into. I
hope wikisourcerors can be thankful to the new ED. For now I can say that
she transmits a positive attitude.
For his part, Tpt was working on getting the "other projects sidebar"
deployed as a beta feature [8] and on the Guided Tours for the Proofread Page
extension. Amazing stuff. I really hope that his Wikimania scholarship gets
approved!
Cheers,
Micru
PS: mentioned people have been BCC'ed just for information, no action
required from them
[1] https://www.mediawiki.org/wiki/Requests_for_comment/Associated_namespaces
[2] https://en.wikisource.org/wiki/Wikisource:WikiProject_Open_Access
[3] https://bugzilla.wikimedia.org/show_bug.cgi?id=47930
[4] https://github.com/petermr/contentMine
[5] https://www.mediawiki.org/wiki/MediaWiki-Vagrant
[6] https://www.mediawiki.org/wiki/MediaWiki-Vagrant/Roles
[7] http://blog.wikimedia.org/2014/05/01/wmf_announces_new_ed_lila_tretikov/
[8] https://www.mediawiki.org/wiki/Wikibase/Beta_Features/Other_projects_sidebar
Hello everyone,
I'm Kishan Thobhani (kishanio), a fairly new contributor to Wikimedia as well as Wikisource.
I was directed here by Sumana Harihareswara with a proposed task: documenting the API of the ProofreadPage extension (https://www.mediawiki.org/wiki/Extension:ProofreadPage) and later helping to improve it.
At this point the ProofreadPage (prp) extension adds two hooks to the API under the action=query module:
1.) Properties - prop=proofread (returns the proofreading level of Page: pages)
2.) Meta - meta=proofreadinfo (local configuration information)
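As a rough illustration of the two hooks listed above, both can be requested through action=query. The sketch below only builds the request URLs and sends nothing. The endpoint, the example page titles and the function names are assumptions for illustration, and no hook-specific parameters are passed.

```python
from urllib.parse import urlencode

# Assumed endpoint; replace with the api.php URL of the wiki being queried.
API_URL = "https://en.wikisource.org/w/api.php"


def proofread_level_url(page_titles):
    """URL asking prop=proofread for the proofreading level of Page: pages."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "proofread",
        "titles": "|".join(page_titles),
    }
    return API_URL + "?" + urlencode(params)


def proofread_info_url():
    """URL asking meta=proofreadinfo for the wiki's local configuration."""
    params = {
        "action": "query",
        "format": "json",
        "meta": "proofreadinfo",
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical page titles, used only to show the shape of the request.
    print(proofread_level_url(["Page:Entire book.djvu/1", "Page:Entire book.djvu/2"]))
    print(proofread_info_url())
```

Opening either URL in a browser is enough to inspect what the hook returns on a given wiki.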
In that context, I would really appreciate it if someone could share their thoughts and help me compile notes to proceed further.
Points could include:
1.) Use-case of API.
2.) Existing components/projects/bots already using proofread API features.
3.) Anything else.
We have a dedicated section to document any initial findings/notes. Feel free to edit it and add your input there too.
https://www.mediawiki.org/wiki/Extension_talk:Proofread_Page#API_Documentat…
Thank you. Have a great weekend ahead.
As you perhaps know, two interesting tools, jq and pdftk, have been
installed on Tool Labs (thanks Tim!). I just tested pdftk and it runs
perfectly.
* jq is a JSON data parser; I found it in the documentation of ia (the
command-line version of the internetarchive Python module:
https://pypi.python.org/pypi/internetarchive);
* pdftk is an old but effective tool for manipulating PDF files; I need it to
split and merge pages of big PDF files while uploading them from Opal into
opallibriantichi, the new "Wikisource-oriented" collection on the Internet Archive.
If you are interested in ancient books, take a look: most of them are
Italian books, but there are also Latin, French and English ancient books,
sometimes with a parallel translation.
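The pdftk split-and-merge step mentioned above could be scripted roughly as follows. This is only a sketch: the file names, the chunk size and the helper names are made up for illustration. The commands are built as argument lists, so they can be inspected before being run, and they rely on pdftk's `cat` operation with a page range and an `output` file.

```python
from pathlib import Path


def page_ranges(total_pages, chunk_size):
    """Return inclusive (first, last) page ranges covering 1..total_pages."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def split_commands(pdf_path, total_pages, chunk_size):
    """Build one pdftk command per chunk of a big PDF file."""
    stem = Path(pdf_path).stem
    commands = []
    for index, (first, last) in enumerate(page_ranges(total_pages, chunk_size), start=1):
        part_name = f"{stem}_part{index:03d}.pdf"  # hypothetical naming scheme
        commands.append(
            ["pdftk", pdf_path, "cat", f"{first}-{last}", "output", part_name]
        )
    return commands


def merge_command(part_paths, output_path):
    """Build the pdftk command that concatenates several PDF files."""
    return ["pdftk", *part_paths, "cat", "output", output_path]
```

Each list can be handed to `subprocess.run(command, check=True)` on a machine where pdftk is installed.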
Alex brollo
Happy to let you know that there's a new IA collection from an Italian
library (Opal Libri Antichi), and that the uploads have been done by a
Wikisourcian, with every possible care for the metadata, so that Tpt's
IA-Commons uploader can be used easily.
Most items have been uploaded as high-resolution TIFF zips, so that a
better resolution djvu can be rather simply obtained when needed.
Items are so far mainly ancient Italian books, but there are also many
French and Latin books.
Within days or hours the goal of the first 1000 uploads will be met.
Here is the link to the collection's list:
https://archive.org/search.php?query=collection%3Aopallibriantichi&sort=-pu…
Alex
So sorry for the cross-posting and for this shout for help, which some may
read as forum shopping, but this is really annoying.
https://bugzilla.wikimedia.org/show_bug.cgi?id=64622
In short, we on all Wikisource wikis are unable to start working on new
pages from digitized books (or for newly overwritten uploads) without
poking the server N times in order to generate a single resized image.
Please fix it ASAP. Please see also #c18 on the mentioned bug.