Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing was apparent.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
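To make the request concrete, the kind of script I'm after could be sketched roughly like this in Python (purely illustrative: the regex rules below cover only a few wikitext constructs, and real wikitext really needs a proper parser):

```python
import re

def wikitext_to_latex(text):
    """Very rough wikitext -> LaTeX conversion (illustrative only)."""
    # Headings: handle === T === before == T == so subsections match first
    text = re.sub(r"^=== *(.*?) *===$", r"\\subsection{\1}", text, flags=re.M)
    text = re.sub(r"^== *(.*?) *==$", r"\\section{\1}", text, flags=re.M)
    # Bold '''x''' must be handled before italics ''x''
    text = re.sub(r"'''(.*?)'''", r"\\textbf{\1}", text)
    text = re.sub(r"''(.*?)''", r"\\textit{\1}", text)
    # Bullet list items
    text = re.sub(r"^\* *(.*)$", r"\\item \1", text, flags=re.M)
    return text

print(wikitext_to_latex("== Intro ==\nThis is '''bold'''.\n* first item"))
```

A real solution would also need to fetch the pages (e.g. via Special:Export) and handle links, tables, and templates, which is where an existing tool would save a lot of work.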
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp.
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
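For illustration, the core of such a dump scan could be sketched in Python (a made-up sketch: in practice you would match the full set of placeholder SVG names and read the revisions from the pages-meta-history XML dump):

```python
# Scan a page's revisions in chronological order and flag the edits
# where a placeholder image reference vanishes.
PLACEHOLDER = "Replace this image"  # in practice: the full set of SVG names

def placeholder_replacements(revisions):
    """revisions: list of (rev_id, wikitext) in chronological order.
    Returns the rev_ids of edits that removed a placeholder image."""
    hits = []
    for (_, prev_text), (cur_id, cur_text) in zip(revisions, revisions[1:]):
        if PLACEHOLDER in prev_text and PLACEHOLDER not in cur_text:
            hits.append(cur_id)
    return hits

revs = [
    (1, "[[Image:Replace this image male.svg]] stub text"),
    (2, "[[Image:Some_photo.jpg]] stub text"),  # placeholder replaced here
]
print(placeholder_replacements(revs))  # -> [2]
```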
(If the placeholders do work, then it'd also be useful convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
>
> Message: 8
> Date: Fri, 12 Oct 2007 17:59:22 +0200
> From: GerardM <gerard.meijssen(a)gmail.com>
> Subject: Re: [Wikitech-l] Primary account for single user login
>
> Hoi,
> This issue has been decided. Seniority is not fair either; there are
> hundreds if not thousands of users that have done no or only a few edits and
> I would not consider it fair if a person with, say, over 10,000 edits should
> have to defer to these typically inactive users.
1. Yes, it's not fair, but this is a truth about the Wikimedia projects that
one has to accept. Imagine if all Wikimedia sites had had single user login
since they were first established: the one who registered first would own that
username on all Wikimedia sites.
2. A person with fewer edits is not necessarily less active than one with
more edits. And according to
http://en.wikipedia.org/wiki/Wikipedia:Edit_count,
``Edit counts do not necessarily reflect the value of a user's contributions
to the Wikipedia project.''
Some users may have a lower edit count:
* because they deliberately edit, preview, edit, and preview an article, over
and over, before submitting the considered version to the Wikimedia sites;
* because they edit an article in offline storage, over and over, before
submitting only the final version.
Meanwhile, some users have a higher edit count:
* because they often submit many changes without previewing them first, and
then have to correct the careless edits, over and over;
* because they often submit many minor changes, over and over, rather than
accumulating them into fewer edits;
* because they do many bot-like routines by hand, rather than letting an
actual bot do those tasks;
* because they take part in many edit wars;
* because they take part in many arguments on many talk pages.
What if the users with lower edit counts try to increase their counts to take
back the status of primary account? What if they decide to change their
editing habits to increase their edit count:
* by submitting many edits without a careful preview,
* by splitting accumulated changes into many minor edits and submitting them
separately,
* by stopping their bots and doing those routines by hand,
* by joining edit wars?
3. Given point 2 above, I think a better measure of activeness is the time
between the first and the last edit of that username. The formula would look
like this:
activeness = last edit time - first edit time
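As a sketch (assuming ISO-style timestamps, as in the dumps), this measure is trivial to compute:

```python
from datetime import datetime

def activeness(timestamps):
    """Span between a user's first and last edit (ISO-style timestamps)."""
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    return times[-1] - times[0]  # last edit time - first edit time

print(activeness(["2005-03-01T12:00:00", "2007-10-12T18:30:00"]))
```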
>
> A choice has been made and, as always, there will be people who perceive an
> injustice. There were many discussions and a choice was made. It is not
> good to revisit things continuously, it is good to finish things so that
> there is no point to it any more.
>
> Thanks,
> GerardM
>
> On 10/12/07, Anon Sricharoenchai <anon.hui(a)gmail.com> wrote:
> >
> > According to the conflict resolution process, that the account with
> > most edits is selected as a primary account for that username, this
> > may sound reasonable for the username that is owned by the same person
> > on all wikimedia sites.
> >
> > But the problem comes when the same username on those wikimedia
> > sites is owned by different persons and the accounts are actively in
> > use. The active account that registered first (seniority rule) should
> > rather be considered the primary account, since I think the person
> > who registered first should own that username on the unified
> > wikimedia sites.
> >
> > Imagine what would have happened if the wikimedia sites had been
> > unified ever since they were first established long ago (so that
> > their accounts had never been separated): the person who registered
> > first would own that username on all of the wikimedia sites. Anyone
> > who came after would be unable to use the registered username, and
> > would have to choose an alternate username. This logic should also
> > apply to the current wikimedia sites after they have been unified.
> >
I have not seen a comprehensive overview of MediaWiki localisation discussed on the lists I am posting this message to, so I thought I might give it a try. All statistics are based on MediaWiki 1.12 alpha, SVN version r29106.
==Introduction==
*Localisation or L10n - the process of adapting the software to be as familiar as possible to a specific locale (in scope)
*Internationalisation or i18n - the process of ensuring that an application is capable of adapting to local requirements (out of scope)
MediaWiki has a user interface (UI) definition for 319 languages. Of those languages at least 17 language codes are duplicates and/or serve a purpose for usability[1]. Reporting on them, however, is not relevant. So MediaWiki in its current state supports 302 languages. To be able to generate statistics on localisation, a MessagesXx.php file should be present in languages/messages. There currently are 262 such files, of which 16 are redirects from the duplicates/usability group[2]. So MediaWiki has an active in-product localisation for 236 languages. 66 languages have an interface, but simply fall back to English.
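For illustration, these counts can be reproduced mechanically from a checkout; a sketch (the directory is languages/messages as described above; the duplicate list here is abbreviated, with the full set in [2]):

```python
import os
import re

# Abbreviated for illustration; the full duplicate/usability list is in [2].
DUPLICATE_CODES = {"crh", "iu", "kk", "kk-cn", "kk-kz", "kk-tr", "ku"}

def active_localisations(messages_dir):
    """Count MessagesXx.php files, minus duplicate/usability redirects."""
    count = 0
    for name in os.listdir(messages_dir):
        m = re.match(r"Messages([A-Za-z_]+)\.php$", name)
        if m and m.group(1).replace("_", "-").lower() not in DUPLICATE_CODES:
            count += 1
    return count
```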
The MediaWiki core product recognises several collections of localisable content (three of which are defined in messageTypes.inc):
* 'normal' messages that can be localised (1726)
* optional messages that can be localised, which usually only happens for languages not using a Latin script (161)
* ignored messages that should not be localised (100)
* namespace names and namespace aliases (17)
* skin names (7)
* magic words (120)
* special page names (76)
* other (directionality, date formats, separators, book store lists, link trail, and others)
Localisation of MediaWiki revolves around all of the above. Reporting is done on the normal messages only.
MediaWiki is more than just the core product. On http://www.mediawiki.org/wiki/Category:All_extensions some 750 extensions have some kind of documentation. This analysis is scoped to the code currently present in svn.wikimedia.org/svnroot/mediawiki/trunk. The source code repository contains, give or take, 230 extensions. Of those 230 extensions, about 140 contain messages that can be visible in the UI in some use case (debugging excluded). Of those 140, about 10 extensions have an exotic implementation for localisation or no localisation support at all (just English text in the code), and about 10 appear to be outdated. I have seen about 5 different 'standard' implementations of i18n in extensions. Since MediaWiki 1.11 there is wfLoadExtensionMessages; not that many extensions use it yet for message handling. If you can help add standard i18n support to more extensions (an overview can be found at http://translatewiki.net/wiki/User:Siebrand/tobeadded) or help in standardising L10n for extensions, please do not hesitate.
==MediaWiki localisation in practice==
Localisation of MediaWiki is currently done in the following ways I am aware of:
* in local wikis: Sysops on local wikis shape and translate messages to fit their needs. This is being done on wikis that are part of Wikimedia, Wikia, Wikitravel, corporate wikis, etc. This type of localisation has the fewest benefits for the core product and extensions, because it happens completely out of the scope of svn committers. I have heard Wikia supports languages that are not supported in the svn version; I would like some help identifying and contacting these communities to try to get their localisations into the core product. Together with SPQRobin, I am trying to get what has been localised on local Wikipedias into the core product, and to recruit the users who worked on that localisation to a more centralised way of localising (see Betawiki).
* through bugzilla/svn: A user of MediaWiki submits patches for core messages and/or extensions. These users are mostly part of a wiki community that is part of Wikimedia. The patches are usually taken care of by committers (raymond, rotemliss, and sometimes others). Some users maintain a language directly in SVN. At the moment, 10-15 languages are maintained this way: Danish, German, Persian, Hebrew, Indonesian, Kazakh (3 scripts), Chinese (3 variants), and some more, less frequently.
* through Betawiki: Betawiki was founded in mid 2005 by Niklas Laxström. In the years since, Betawiki has grown into a MediaWiki localisation community of over 200 users that has contributed to the localisation of 120 languages each month over the past few months. Users who are only familiar with MediaWiki as a tool can localise almost every aspect of MediaWiki (except for the group 'other' mentioned earlier) in a wiki interface. The work of the translators is regularly committed to svn by nikerabbit and myself. Betawiki also offers a .po export that enables users to use more advanced translation tools for their translations; this option was added recently and no translations in this format have been submitted yet. Betawiki also supports translation of 122 extensions, aiming to support everything that can be supported.
==MediaWiki localisation statistics==
MediaWiki localisation statistics have been around since June 2005 at http://www.mediawiki.org/wiki/Localisation_statistics[3]. Traditionally, reports have focused on the complete set of core messages. Recently a small study was done of message usage, which resulted in a set of almost 500 'most often used messages in MediaWiki', based on usage of messages on the Wikimedia cluster (http://translatewiki.net/wiki/Most_often_used_messages_in_MediaWiki).
Until recently there were no statistics available on the localisation of extensions. Through groupStatistics.php in the Translate extension, these statistics can now be created. Besides 'most often used MediaWiki messages' and 'MediaWiki messages', reports now cover 'all extension messages supported by the Translate extension' (short: extension messages). Additionally, a meta extension group of 34 extensions used in the projects of Wikimedia has been created (short: Wikimedia messages). A regularly updated table of these statistics can be found at http://translatewiki.net/wiki/Translating:Group_statistics.
Some (arbitrary) milestones have been set for the four above-mentioned collections of messages. For the usability of MediaWiki in a particular language, the group 'core-mostused' is the most important: a language must reach that milestone for MediaWiki to have minimal support for it. Reaching the milestones for the first two groups is something the Wikimedia language committee is considering as a requirement for new Wikimedia wikis:
* core-mostused (496 messages): 98%
* wikimedia extensions (354 messages): 90%
* core (1726 messages): 90%
* extensions (1785 messages): 65%
Currently the following numbers of languages have passed the above milestones:
* core-mostused: 47 (15,5% of supported languages)
* wikimedia extensions: 10 (3,3% of supported languages)
* core: 49 (16,2% of supported languages)
* extensions: 7 (2,3% of supported languages)
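A sketch of the underlying check (group sizes and thresholds copied from the milestones above; this is not how groupStatistics.php actually computes its reports):

```python
# Group sizes and pass thresholds, copied from the milestones above.
MILESTONES = {
    "core-mostused": (496, 98),
    "wikimedia extensions": (354, 90),
    "core": (1726, 90),
    "extensions": (1785, 65),
}

def passes(group, translated):
    """Has a language passed the milestone for this message group?"""
    total, threshold = MILESTONES[group]
    return 100.0 * translated / total >= threshold

print(passes("core-mostused", 490))  # 490/496 is about 98.8% -> True
```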
==Conclusion==
So... Are we doing well on localisation, or do we suck? My personal opinion is that we are somewhere in between. Observing that there are some 250 Wikipedias that all use the Wikimedia Commons media repository, and that only 47 languages have a minimal localisation, we could do better. With Single User Login around the corner (isn't it?), we must do better. On the other hand, new language projects within Wikimedia all have excellent localisation of the core product. These languages include Asturian, Bikol Central, Lower Sorbian, Extremaduran, and Galician. But where is Hindi, for example, with currently only 7% of core messages translated?
With the Wikimedia Foundation aiming to put MediaWiki to good use in developing countries and products like NGO-in-a-box that include MediaWiki, the potential of MediaWiki as a tool in creating and preserving knowledge in the languages of the world is huge. We have to tap into that potential and *you* (yes, I am glad you read this far and are now reading my appeal) can help. If you know people that are proficient in a language and like contributing to localisation, please point them in the right direction. If you know of organisations that can help localising MediaWiki: please approach them and ask them to help.
We now have all the tools to successfully localise MediaWiki into any of the 7000 or so languages classified in ISO 639-3. We only need one person per language to make it happen. Reaching the first two milestones (core-mostused and wikimedia extensions) takes about 16 hours of work. Using Betawiki or the .po export, little to no technical knowledge is required.
This was the pitch. How about we aim to at least double the numbers by the end of 2008 to:
* core-mostused: 120
* wikimedia extensions: 50
* core: 90
* extensions: 20
I would like to wish everyone involved in any aspect of MediaWiki a wonderful 2008.
Cheers!
Siebrand Mazeland
[1] als,crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[2] crh,iu,kk,kk-cn,kk-kz,kk-tr,ku,sr,sr-jc,sr-jl,zh,zh-cn,zh-sg,zh-hk,zh-min-nan,zh-yue
[3] older locations are http://www.mediawiki.org/wiki/Localisation_statistics/stats and
http://meta.wikimedia.org/wiki/Localization_statistics
Why has Special:Wantedpages not been updated on Wikimedia sites since 3
September? If it is too expensive to generate on large wikis (especially
enwiki), could it be re-enabled for smaller wikis?
Thanks.
Dear Wikitech list members,
This is my first post here; I was redirected by Alfio, who said you
might have some answers regarding my research.
Here's my original question
(http://it.wikipedia.org/wiki/Discussioni_utente:Alfio#Long_Tail_of_Wikipedia_Usage)
and Alfio's answer
(http://en.wikipedia.org/wiki/User_talk:Junjulien) right below:
Dear Alfio,
I am part of an organization that tries, amongst other things, to promote
the use of wikipedias in native languages. I believe you take an active part
in compiling these statistics:
http://en.wikipedia.org/wiki/Wikipedia:Multilingual_statistics, and I hope
you might point me in the right direction for my research. I am interested
in establishing a matrix that would give the number of users for each of the
"below 100,000 articles" wikipedias (from #16 onward in this list:
http://meta.wikimedia.org/wiki/List_of_Wikipedias), against the countries
where the visitors' traffic originates, as well as against where the
editors are editing from. Obviously it would be great to have time as a 3rd
dimension to follow trends...
Where should I start? Whom should I ask?
Please contact me on my talk page
http://en.wikipedia.org/wiki/User_talk:Junjulien
Thanks a lot for your time,
Jun Julien Matsushita
Project Coordinator, Internews Europe
-----------------------
Hello,
sorry for the late answer (holidays...). It is true that I compile part of
the Multilingual statistics, but my contribution is limited to taking the
current copy of http://meta.wikimedia.org/wiki/List_of_Wikipedias and
feeding it to a script which generates the table. The list of wikipedias
itself, as far as I know, is bot-generated, but I have only the foggiest
idea of how (wikipedia's ways can be strange at times... :-)
Your project would need a great deal of data about editors and readers, and
data about the readers is probably unavailable as it would require
collecting server logs, and Wikimedia servers do not have the capability of
recording visitor logs at our current load. I remember seeing on wikitech-l
that someone is recording decimated data, e.g. one in 10 or 100 visitors,
but deleting personal info like the originating IP, which would defeat
geolocation.
As for the editors, the IP addresses of logged-in users are, again, not
collected. For anonymous editors, however, the IP is recorded in the
history, and you could download a full history dump from
http://download.wikimedia.org/ and see what you can recover. In short, I
don't really know how to help you. Try writing to wikitech-l (see
http://lists.wikimedia.org/mailman/listinfo/wikitech-l) and see if someone
has the data you need.
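To illustrate the anonymous-editor route: a sketch that streams a history dump and pulls out contributor IPs (element names follow the export schema; the sample document is made up):

```python
import xml.etree.ElementTree as ET
from io import BytesIO

def anonymous_ips(xml_stream):
    """Yield IPs of anonymous editors from a pages-meta-history dump."""
    for _, elem in ET.iterparse(xml_stream):
        if elem.tag.rsplit("}", 1)[-1] == "ip":  # ignore the XML namespace
            yield elem.text
        elem.clear()  # keep memory bounded on big dumps

sample = b"""<mediawiki>
  <page><revision><contributor><ip>192.0.2.7</ip></contributor></revision></page>
  <page><revision><contributor><username>Alice</username></contributor></revision></page>
</mediawiki>"""
print(list(anonymous_ips(BytesIO(sample))))  # -> ['192.0.2.7']
```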
Cheers,
Alfio
-------------------------
Has anyone a clue as to where to direct my efforts?
Thanks a lot for your time,
Jun Julien Matsushita
Radio Connect Project Coordinator
Internews Europe
14, cité Griset - 75011 Paris
France - www.internews.eu
skype: junjulien
Happy Holidays everyone.
I'm going through http://www.mediawiki.org/wiki/Special:Version -- it says
that the version of MediaWiki currently deployed is r28966; however, that
commit was made only yesterday by siebrand -- is that correct? Also, is the
extension list fully up to date, i.e. if I download and install everything on
that page and run mwdumper, will I have an exact mirror of Wikipedia?
Thanks,
Yousef
Dear enwiki sysadmins/developers,
I would like to start Extrapedia, a catch-all wiki for stuff that
can't go on Wikipedia. At first, it would only accept high-quality
articles about subjects that enwiki considers non-notable.[1] I would
eventually like to write some code to make it easier to move pages
from Wikipedia to Extrapedia. I figure that writing-code bit will be
easier for me if you admins host Extrapedia on either the Foundation
servers or on some personal servers of your own.
So, would you sysadmins be willing to kindly host it for now?
Kind regards,
Jason Spiro
[1] Later on, I might expand it to include bad-quality articles and/or
articles deleted from enwiki for other reasons. Either way, only NPOV
material would be allowed.
--
Jason Spiro: corporate trainer, web developer, IT consultant.
I support Linux, UNIX, Windows, and more.
Contact me to discuss your needs and get a free estimate.
+1 (613) 668-6096 / Email: info(a)jspiro.com / MSN: jasonspiro(a)hotmail.com
[Apologies for cross posting, please take discussions to semediawiki-user]
Hi all,
we are delighted to announce that the first stable version of Semantic
MediaWiki, SMW 1.0, has been completed in 2007 and is now available for
download [1].
The online documentation will be updated accordingly within the next weeks.
== New features at a glance ==
* Simplified semantic annotations: just one kind of annotation ("Property").
* Significant speedups for page rendering and loading
* Prettier and easier to understand interfaces
* Alternative inline query syntax {{#ask:...}}, fully compatible with
MediaWiki templates, template parameters, and other extensions
* Semantic RSS feeds: subscribe to your favourite query results
* More expressive semantic querying: class and property hierarchies, equality
* Pattern matching, disjunctions, and inequality in query conditions
* Fewer and simpler datatypes, more tolerant parsing
* Better media support: better treatment of links to Image: and Media:
* Better internationalisation, new languages: Dutch, Chinese
* Experimental n-ary properties for list-like property values
* Ontology import re-enabled (simple annotation import)
* Support for upcoming MediaWiki 1.12
* Many many bugfixes and improvements
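For instance, the new inline query syntax reads like this (an illustrative query; the Category and Property names are made up):

```
{{#ask: [[Category:City]] [[Population::+]]
 | ?Population
 | sort=Population
 | order=descending
 | limit=10
}}
```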
The complete list of changes is found at [2].
== How to install/upgrade ==
SMW 1.0 requires MediaWiki version 1.11 or greater (though 1.10 still works
for the most part). Existing installations can easily be upgraded, and
existing syntax will mostly continue to work as expected. Details on upgrading
and installation are found in the INSTALL instructions [3].
Any problems can be discussed on our user mailing list "semediawiki-user".
== Acknowledgements ==
This release represents a major step in SMW development, and would not have
been possible without a number of contributors, translators, and, of course,
users, who have greatly aided the development of SMW [4]. In addition,
specific development tasks have been supported by the European Union within
projects SEKT and NeOn, and by Vulcan Inc. and ontoprise GmbH within the Halo
project. Thanks!
Finally, given that none of us accepts any donations, I would like to point
out that the development of SMW hinges upon the established communities and
experiences of the open content and free software movement. SMW would not
have been possible without the continued activities from organisations such
as the Wikimedia Foundation (obviously), but also the Free Software
Foundation or Creative Commons (providing us developers with some legal
safety). If you want to support SMW, please support those or similar
organisations.
So have fun with the new release, and all the best for the new year!
Markus
[1] http://sourceforge.net/projects/semediawiki/
[2]
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/SemanticMediaWi…
[3]
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/SemanticMediaWi…
[4]
http://svn.wikimedia.org/svnroot/mediawiki/trunk/extensions/SemanticMediaWi…
--
Markus Krötzsch
Institut AIFB, Universität Karlsruhe (TH), 76128 Karlsruhe
phone +49 (0)721 608 7362 fax +49 (0)721 608 5998
mak(a)aifb.uni-karlsruhe.de www http://korrekt.org