Please forgive me if I'm formatting this post incorrectly; this is my first message to this group.
I am involved with the Wikipedia 1.0 team on en, both the Version 0.5 project and the contact with WikiProjects. It might help to let folks know what we're up to, since much of the validation work you mention goes hand in hand with our mission at Wikipedia 1.0.

1. We are putting together an "alpha test" version of the most important articles of Wikipedia (with vetting for quality), with a planned release in the autumn of 2006. For this version, each article is simply nominated by one person, then reviewed by another from a "review team". Anyone can sign up for this team, though in practice only a few of those who sign up do much reviewing.

2. We hope to go on to produce further, expanded versions after V0.5, but these will almost certainly include review by several independent reviewers, not merely one.

3. Oleg Alexandrov has worked miracles with a bot that uses categories, and it now generates lists daily with the title "XXXX articles by quality", summarised here:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Index_of_s...
Anything from our "core topics" list or our Version 0.5 list is tagged on the article talk page, and the bot picks this up every night. Delirium may be interested to note that the bot stores a link to the version it found on the day the assessment was done. We will compare this with the current version, allowing us to check easily for a quality decline when we go to press.

4. This bot was mainly designed to help WikiProjects provide us with information on their articles. This is proving a great success, with new projects being added to the bot's list every couple of days. We recently began a "second round" of contacting projects, and this will bear fruit over the summer and autumn. The Military History project, for example, now has over 4000 articles assessed:
http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team/Military_h...
It is my hope that once these assessment schemes become established right across Wikipedia, we will end up with a large body of assessments BY SUBJECT EXPERTS. People who know a subject well are much more likely to know, "This biography should also discuss X's work on Y."

5. The contact we are building with projects will help greatly if we institute a system of expert peer review (no. 2 of Delirium's list) for selected articles - we already have people we know in most subject areas.
Regarding Delirium's comments in detail: I don't like option 1, for the same reasons as others. I think option 2 is possible - indeed, many groups like Chemistry and Military History are already well down that road, and within a year I expect us to have most areas of en:Wikipedia covered. Giving the work, responsibility and tools to the people who know and care about the particular articles is a very powerful way to do this, and extremely scalable. As for option 3, as Sj points out, this is partly what projects like Version 0.5 are doing.
Overall, I think it is crucial to distinguish between VALIDATION and ASSESSMENT. Validation is often used rather loosely here, but my previous career in the pharmaceutical industry forces me to take validation to mean, "How do we know this article is completely accurate?" It goes much deeper than assessment (as done at V0.5), which is merely a 10-15 minute scan of the article: does it seem complete, are the sources cited, is it well written, etc.? My favourite example is an article I wrote on gold(III) chloride, listed as a "Good Article". How do you KNOW that the magnetic susceptibility is minus 0.000112 cc/mol, i.e., can you validate this article? I would like to see Wikipedia move towards having validated versions of articles available - articles that have been rigorously checked by subject experts. I'd like to see the standard version of each article remain fully available, but for every validated article I'd like a tab at the top, labelled "validated", that would let any user see a non-editable, validated version of the article. It might be necessary to create a new namespace on Wikipedia to do this. This approach in effect combines options 1 & 2. There is a proposal sitting on Wikipedia that suggests much of this - I don't agree with all of it, but it's a very good start:
http://en.wikipedia.org/wiki/User:TidyCat/Achieving_validation_on_Wikipedia
I am helping to organise a discussion on this very topic at Wikimania in August; I hope some of this group will be there. I think this is a nettle we have to grasp if Wikipedia is to move forward and receive the respect it deserves. The only way to achieve this, IMHO, is to get a group of people around a physical table (not a virtual one!) who can come to a workable consensus view, and to have people at that table who can also say, "I can write the code" and "I will authorise the changes." Please be there!
Martin A. Walker (User:Walkerma)
Hi all!
How do I implement localised sort criteria?
The pms alphabet is as follows:
A B C D E Ë F G H I J L M N N- O Ò P Q R S T U V Z
So I need the accented letters to come in right places in lists... thank God there is NO word starting with N-, so we can simply forget about it.
Yet we do use À È É Ì Ó Ù, which are NOT separate letters, only accented vowels. So these should simply be ignored. It would also be nice if the search could find "Pàgina" from the incorrect query "pagina" - that is, if it could ignore accents (apart from Ò and Ë, which are proper letters).
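For illustration, here is the kind of ordering I mean, as a rough Python sketch (just an example, not anything MediaWiki provides):

    # Sort by the pms alphabet given above. "N-" is a digraph; accented
    # vowels that are NOT letters of the alphabet fold to their base letter.
    PMS_ALPHABET = ["A", "B", "C", "D", "E", "Ë", "F", "G", "H", "I", "J",
                    "L", "M", "N", "N-", "O", "Ò", "P", "Q", "R", "S", "T",
                    "U", "V", "Z"]
    RANK = {letter: i for i, letter in enumerate(PMS_ALPHABET)}
    FOLD = {"À": "A", "È": "E", "É": "E", "Ì": "I", "Ó": "O", "Ù": "U"}

    def pms_sort_key(word):
        word = word.upper()
        key, i = [], 0
        while i < len(word):
            if word[i:i+2] == "N-":            # longest match first
                key.append(RANK["N-"])
                i += 2
            else:
                ch = FOLD.get(word[i], word[i])
                key.append(RANK.get(ch, len(RANK)))  # unknowns sort last
                i += 1
        return key

    print(sorted(["òca", "oca", "ëva", "eva", "pàgina"], key=pms_sort_key))
    # -> ['eva', 'ëva', 'oca', 'òca', 'pàgina']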
Is this somehow possible?
Thanks Bèrto
Berto wrote:
How do I implement localised sort criteria?
A B C D E Ë F G H I J L M N N- O Ò P Q R S T U V Z
So I need the accented letters to come in right places in lists... thank God there is NO word starting with N-, so we can simply forget about it.
For sorting of pages within category listings, you can manually specify a 'sort key' which will be used for sorting instead of the page title as listed.
Since e.g. 'N-ame' will sort before 'Name', you can fake the sort key using ~, which sorts after the letters:

On the page [[Name]]:  [[Category:Sorted pages|Name]]
On the page [[N-ame]]: [[Category:Sorted pages|N~ame]]
(However this will not show the separate letter as a heading in the list.)
For now the actual sorting is done by unicode code point, which is far from ideal, but that's the situation at present.
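A quick Python illustration of why the tilde trick works - '-' has a lower code point than the letters, '~' a higher one:

    print([hex(ord(c)) for c in "-N~a"])  # ['0x2d', '0x4e', '0x7e', '0x61']
    print(sorted(["N-ame", "Name"]))      # ['N-ame', 'Name']  ('-' < 'a')
    print(sorted(["N~ame", "Name"]))      # ['Name', 'N~ame']  ('~' > 'a')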
Yet we do use À È É Ì Ó Ù, which are NOT separate letters, only accented vowels. So these should simply be ignored. It would also be nice if the search could find "Pàgina" from the incorrect query "pagina" - that is, if it could ignore accents (apart from Ò and Ë, which are proper letters).
I'm working on fixes to the search; this should improve in the coming weeks.
-- brion vibber (brion @ pobox.com)
I agree with Berto on this... there should definitely be some kind of wiki-defined sort order, so that accented characters can sort inline with non-accented ones in languages where the accents are not separate letters. I'm wondering, though: does anyone have a proposed solution other than the sort keys, one that would be more of an actual solution than a workaround? Perhaps something like a system message which defines a wiki's sort order and which letters the wiki will treat as equivalent? And something so that searches for, say, "pago" could turn up results like "págo" and "pàgo" (since the accents would be ignored)? And how far along is a solution?
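Roughly, I imagine something like this per wiki (the format is entirely invented, shown as a Python literal):

    # hypothetical per-wiki configuration, e.g. kept in a system message
    SORT_CONFIG = {
        "alphabet": "A B C D E Ë F G H I J L M N N- O Ò P Q R S T U V Z",
        # characters the wiki treats as equivalent for sorting and search
        "equivalent": {"À": "A", "È": "E", "É": "E",
                       "Ì": "I", "Ó": "O", "Ù": "U"},
    }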
I'm a question guy tonight...
James
James R. Johnson wrote:
there should definitely be some kind of wiki-defined sort order so that accented characters can sort inline with non-accented ones [...] And how far along is a solution?
IIRC this is Mediazilla Bug 164 - http://bugzilla.wikimedia.org/show_bug.cgi?id=164
Hi!
Perhaps something like a system message which defines a wiki's sort order and which letters the wiki will treat as equivalent? And something so that searches for, say, "pago" could turn up results like "págo" and "pàgo" (since the accents would be ignored)?
Yes, this would be the most practical way to localise sort orders.
As for accents, so far all we can do is manually input tons of redirects to ensure the search gives a proper result... maybe a bot could do it for us? Such redirects should somehow be marked, though, so that we can trace them and kill them all once the bug is fixed.
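Something like this, maybe (a rough Python sketch; the marker template name is made up):

    import unicodedata

    # Ò and Ë are real letters in pms, so the bot must NOT fold them.
    PROPER_LETTERS = {"Ò", "Ë", "ò", "ë"}

    def fold_accents(title):
        out = []
        for ch in title:
            if ch in PROPER_LETTERS:
                out.append(ch)
            else:
                # NFD splits 'à' into 'a' + combining accent; drop the accent
                nfd = unicodedata.normalize("NFD", ch)
                out.append("".join(c for c in nfd
                                   if unicodedata.category(c) != "Mn"))
        return "".join(out)

    title = "Pàgina"
    folded = fold_accents(title)            # -> "Pagina"
    if folded != title:
        # {{accent-redirect}} is a hypothetical marker template - any
        # agreed name would do, so the redirects can be found and deleted
        # once the bug is fixed
        print(f"#REDIRECT [[{title}]] {{{{accent-redirect}}}}")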
Bèrto
Hi!
On 23/06/06, Berto <albertoserra@ukr.net> wrote:
Perhaps something like a system message which defines a wiki's sort order and which letters the wiki will treat as equivalent? And something so that searches for, say, "pago" could turn up results like "págo" and "pàgo" (since the accents would be ignored)?
Yes, this would be the most practical way to localise sort orders.
I think you are mixing up sorting and searching. Anyway, I don't think this is a good solution for sorting; IMHO it would be 1) a hack that would not be enough for many languages (e.g. in Czech, alphabetical sorting is a two-stage process, as you can see at http://cs.wikipedia.org/wiki/Abecedn%C3%AD_%C5%99azen%C3%AD#P.C5.99.C3.ADkla...) and 2) maybe hard to implement without excessive performance implications.
A correct solution is to use the functions already provided by the system (and the database); MySQL already has some support for collation that could be used - see the discussion at [[bugzilla:164]].
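For instance, in Python one can already lean on the system's collation tables (a sketch, assuming a Czech locale is installed on the machine):

    import locale

    # Let the system's locale data provide the multi-level Czech rules,
    # e.g. 'ch' is a single letter that sorts after 'h'.
    locale.setlocale(locale.LC_COLLATE, "cs_CZ.UTF-8")

    words = ["chata", "cukr", "hrad", "čaj"]
    print(sorted(words, key=locale.strxfrm))
    # expected: ['cukr', 'čaj', 'hrad', 'chata']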
As per accents, so far all we can do is manually input tons of redirects to ensure the search will give a proper result... maybe a bot could do it for us? Such redirects should somehow be marked, though, so that we can trace them and kill them all once the bug is fixed.
I have been thinking about implementing a diacritical filter for the search for a long time. It is quite easy, and I have it practically done (for Lucene.NET), only not tested. But there is the question of whether the Wikimedia servers are going to switch to another search engine, as has been discussed recently. (Lemmatization would be fine too, but I know I do not have the required knowledge for that.)
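The idea, sketched in Python rather than Lucene.NET (a toy index, not the real analyzer API):

    import unicodedata

    def fold(token):
        # accent-fold so that 'pago', 'págo' and 'pàgo' index identically
        nfd = unicodedata.normalize("NFD", token.lower())
        return "".join(c for c in nfd if unicodedata.category(c) != "Mn")

    index = {}
    for doc_id, text in enumerate(["Págo minore", "Pàgina principal"]):
        for token in text.split():
            index.setdefault(fold(token), set()).add(doc_id)

    # the same filter is applied to the query at search time:
    print(index.get(fold("pago")))    # -> {0}

(Per-language exceptions such as Ò and Ë in pms would of course need the same kind of exception list as for sorting.)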
--[[cs:User:Mormegil | Petr Kadlec]]
I have made some new toys that might be of interest:
Instead of waiting for the next database dump and downloading the whole thing, you can now get an XML dump of all articles in a category and its subcategories:
http://tools.wikimedia.de/~magnus/cat2xml.php?category=History_Version_0.5_a...
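Fetching it from a script should be a plain HTTP GET (a sketch; 'category' is the parameter visible in the URL above, and the value here is just a placeholder):

    from urllib.request import urlopen
    from urllib.parse import urlencode

    query = urlencode({"category": "Some_category"})  # placeholder value
    url = "http://tools.wikimedia.de/~magnus/cat2xml.php?" + query
    xml = urlopen(url).read().decode("utf-8")
    print(xml[:200])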
Sadly, the toolserver is somewhat broken for use on en, so it doesn't even know "Category:Wikipedia Version 0.5", but that is someone else's problem (and has been for ages, I might add). So until this is fixed, you can't get the complete article texts for 0.5 in one swift motion, but then! ;-)
I have also found a web server (Server2Go) which runs directly from CD (or the main directory of any drive, including USB sticks). I have managed to integrate MediaWiki into it. While a MySQL variant is available, I went for a new MediaWiki database type supporting plain-text files (read-only at the moment). Back to the roots, I guess ;-)
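In spirit, the plain-text backend is no more complicated than this (a toy sketch of the idea; the real code and file layout differ):

    from pathlib import Path

    ARTICLES = Path("articles")   # one plain-text file per page

    def fetch_page(title):
        # read-only: the wiki serves pages straight from the files
        path = ARTICLES / (title.replace(" ", "_") + ".txt")
        return path.read_text(encoding="utf-8") if path.exists() else None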
Anyway, I have created what I hope is a standalone, zero-install "History Wikipedia 0.5" bundle, currently consisting of a pitiful 19 articles. This was done by converting the above-mentioned custom XML dump into plain-text articles using a script from my wiki2xml toy.
Categories don't work, and I was too lazy to set up interlanguage links. No images either.
Still interested? Try http://tools.wikimedia.de/~magnus/history05.zip (23MB)
Best to burn the whole thing to a CD, as it /has/ to be in the root directory. I *did* mention that it's currently Windows-only, right?
Magnus
Magnus Manske wrote:
[...] Still interested? Try http://tools.wikimedia.de/~magnus/history05.zip (23MB)
How big is it unpacked? Would it be small enough to fit on a cheap USB drive?
Alphax (Wikipedia email) wrote:
How big is it unpacked? Would it be small enough to fit on a cheap USB drive?
78 MB for the software, plus the (currently uncompressed) text. I'm sure I could cut the size down if the need arises, but for now I'm just happy it works :-)
Magnus