Hello all,
We've just published the November 2016 Indic Wikisource statistics. Since the Google OCR script was deployed on all our Indic Wikisources, they have been growing rapidly.
Here are a few statistics, with the top three wikis in each category:
By number of articles:
1. Sanskrit Wikisource (15445 pages), 0.05% backed by scans
2. Telugu Wikisource (11707 pages), 24.3% backed by scans
3. Kannada Wikisource (7864 pages), 0.99% backed by scans
By number of validated pages:
1. Telugu Wikisource (18142 pages)
2. Tamil Wikisource (5167 pages)
3. Gujarati Wikisource (3729 pages)
By number of proofread pages:
1. Telugu Wikisource (20213 pages)
2. Malayalam Wikisource (8065 pages)
3. Tamil Wikisource (7737 pages)
By percentage of pages backed by scans:
1. Bengali Wikisource (25.90%)
2. Telugu Wikisource (24.30%)
3. Gujarati Wikisource (17.51%)
I want to point out in particular that there has been no visible improvement on Marathi and Assamese Wikisource.
Sanskrit and Kannada Wikisource need to start backing their existing proofread texts with scanned pages.
The full Indic Wikisource statistics are here: https://wikisource.org/wiki/Wikisource:Indic_Wikisource_Stats
Regards,
Jayanta Nath
Indic Wikisource Community
Thanks, Jayanta, it is very important that you keep track of this progress. Have you talked with Sam Wilson about this?
There could be many ways in which the WMF can help you analyze this important moment for the Indic community, and it's also very important for them (and their donors) to understand how they have an impact.
Google OCR is a "simple thing", but we ("Western" Wikisources) learned very late that OCR was not available for many Indic languages. I have shown the Telugu Wikisource stats (the peak in the chart) to many people at the WMF, and it's crucial that many others inside the WMF become aware of them. The Indic Wikisource community can show that there are very "cheap" things the WMF can do to help its communities thrive. The Indic Wikisource community thus has a big responsibility ;-)
Aubrey
On Wed, Nov 2, 2016 at 7:02 PM, Jayanta Nath jayantanth@gmail.com wrote:
Wikisource-l mailing list Wikisource-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikisource-l
Yes, I agree! :-) There're so many smallish things that I reckon can go a long way towards making Wikisources bigger and better.
And it keeps surprising me how many people within the Wikimedia movement aren't familiar with how Wikisource works — and are amazed when they're shown! :-) It really does seem that we're not very good at advertising ourselves. (Well, one doesn't like to blow one's own trumpet, does one?)
Talking of stats, what is French Wikisource doing that's so successful at getting things proofread and validated? https://tools.wmflabs.org/phetools/graphs/Wikisource_-_proofread_pages_per_d... https://tools.wmflabs.org/phetools/statistics.php?diff=30
—sam
I guess the "100 livres en 100 jours" challenge (100 books in 100 days; https://fr.wikisource.org/wiki/Wikisource:Accueil/100wikijours) helps somewhat. The goal is to process a whole new book every day; no work done in advance is allowed, and missing the goal on a single day resets the counter.
Thanks, Mathieu. What really strikes me is that such a challenge is doable on fr.wikisource: on many others it would be complete madness ;-) Polish Wikisource is also doing great.
What interests me is understanding how they are building their community of active and super-active proofreaders: are they doing something that other Wikisources aren't?
Aubrey
2016-11-03 10:12 GMT+01:00 Andrea Zanni zanni.andrea84@gmail.com:
Thanks, Mathieu. What really strikes me is that such a challenge is doable on fr.wikisource: on many others it would be complete madness ;-) Polish Wikisource is also doing great.
What interests me is understanding how they are building their community of active and super-active proofreaders: are they doing something that other Wikisources aren't?
I'm not sure there is a link, but when you mention fr.ws and pl.ws, a correlation immediately comes to mind: these two are among the few Wikisources that use the proofreading system exclusively (or nearly so: 92% and 96% of their mainspace pages are backed by scans; see http://tools.wmflabs.org/phetools/statistics.php).
Regards, ~nicolas
On pl.ws we have a policy that if a text *can* be processed using ProofreadPage (given the legal aspects and scan availability), then it *has to* be processed using this extension.
Ankry
I sometimes contribute to fr.source, even though my French is very poor. I really appreciate fr.source's proofreading tools: they show a deep interest in any trick that makes editing faster, safer, and more comfortable. This "evidence of care" is very rewarding for any contributor.
Alex
On 3 November 2016 at 19:50, Alex Brollo alex.brollo@gmail.com wrote:
I sometimes contribute to fr.source, even though my French is very poor. I really appreciate fr.source's proofreading tools: they show a deep interest in any trick that makes editing faster, safer, and more comfortable. This "evidence of care" is very rewarding for any contributor.
Alex
It would be great if a common page were created on Meta or mul.ws to document all the best practices, gadgets, tools, and scripts used by every language community, especially the big ones. That would help the smaller communities grow and attract more editors.
+1 to Bodhisattwa.
A best-practices page would be very much appreciated!
Anika
Bodhisattwa Mandal, 03/11/2016 15:33:
It would be great if a common page were created on Meta or mul.ws to document all the best practices, gadgets, tools, and scripts used by every language community, especially the big ones. That would help the smaller communities grow and attract more editors.
Recurring aspects are listed at https://wikisource.org/wiki/WS:COORD
Nemo
I don't think I'll be betraying any secret if I tell you how we do it on Polish Wikisource.
At the moment our community is based on a few active users, and we try to actively support newcomers. We noticed that many new users do not like classic on-wiki communication (through talk pages or the Scriptorium), so we also support them through other channels (email, IRC). They are always welcome to ask.
I think no two of our users came to the project in the same way. Our users are often active in many fields. We have a Facebook page, and we sometimes mention our project on various fan sites (e.g. a few years ago a short note about plwikisource on an e-book fan site brought in over 100 new users; a few of them are still active). We also appreciate the occasional efforts of our colleagues from Polish Wikipedia and Wikimedia PL (interviews, blog articles, workshops).
We notice that our community comes from different social groups than the Wikipedia community (Wikipedia requires some level of creativity; Wikisource favors other skills). And since Polish orthography and grammar have changed significantly since the 19th century, and even since the 1920s and 1930s, we do not look for new users among teenagers (we do not want to disrupt their freshly acquired spelling skills); we look for users among retirees instead :)
I also think that progress in OCR tools (thanks to Wieralee), a short technical guide for newcomers (also thanks to Wieralee), and a lot of automation (thanks to Zdzislaw) have made plwikisource more approachable for new users, even those with no earlier wiki experience. (When I first became active on plws in 2010, almost all books were retyped manually.)
We have also noticed that announcing various short-term goals (e.g. reaching 90% ProofreadPage-based pages in the main namespace, 300,000 pages in the Page namespace, or 150,000 proofread pages, or preparing the full set of Sienkiewicz's texts for the 100th anniversary of his death) makes our community more active.
Ankry
Hi,
2016-11-03 8:36 GMT+01:00 mathieu stumpf guntz <psychoslave@culture-libre.org>:
I guess the "100 livres en 100 jours" (100 books in 100 days) challenge helps somewhat. The goal is to process a whole new book every day; no work done in advance is allowed, and missing the goal on a single day resets the counter.
Yes, the 100 books in 100 days challenge helps a bit, but the growth comes mainly from Zoé, who is correcting every volume of the "Revue des Deux Mondes", and from the partnership with the Bibliothèque et Archives nationales du Québec (Quebec National Archives and Library), which exists thanks to Ernest's leadership. See https://fr.wikisource.org/wiki/Wikisource:BAnQ and http://www.banq.qc.ca/activites/wiki/wiki-source.html
Regards,
Yann