Hi folks,
On <http://meta.wikimedia.org/wiki/Stop_word_list>, it's claimed that
for full-text searching, English Wikipedia uses the MySQL 4.0.20 stop
word list with a few modifications. I gather that this is no longer the
case; when did it stop?
Thanks,
Reid
Hi,
I haven't seen a complete dump of the English Wikipedia for a while, and I'm a
little worried:
I am a PhD student from New Zealand, and I'm completely dependent on this data -
particularly the SQL tables (page, pagelinks, etc.). With the last two dumps
failing, and others being removed, there isn't a single complete dump of the
English Wikipedia data left available for download - and there hasn't been a new
dump since the beginning of April.
Has something happened?
Cheers,
Dave
Hi folks,
In our ongoing research here at UMN, we've discovered some reverts that
introduce apparent character set problems; what seems to happen is that
some Unicode characters are replaced by a character I don't recognize
followed by a hexadecimal number. For example:
http://en.wikipedia.org/w/index.php?title=Dog&diff=58851026&oldid=58821211
What I see is that a sequence of five characters I don't have glyphs for,
which show up as five boxes containing the numbers "010337 01033F
01033D 010333 010343", is replaced with the sequence
"?df37?df3f?df3d?df33?df43", where ? is not a question mark but a
black diamond with a white question mark in it (presumably U+FFFD, the
Unicode replacement character, rather than a zero byte).
Do any of you have pointers to information about what is going on?
We are trying to devise a workaround that would make revisions like
this compare as identical.
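For reference, here is a rough sketch of the kind of normalisation we have in
mind before comparing two revision texts. Assumptions: the corruption always
takes the form of a U+FFFD replacement character followed by the four hex
digits of the lost low surrogate, and the function and variable names below
are purely illustrative.

function normalizeAstralDamage( $text ) {
	# Collapse supplementary-plane characters (4-byte UTF-8 sequences)
	# to a placeholder.
	$text = preg_replace( '/[\xF0-\xF4][\x80-\xBF]{3}/', '?', $text );
	# Collapse the mangled form: U+FFFD (EF BF BD in UTF-8) followed by
	# four hex digits, as seen in the diff above.
	$text = preg_replace( '/\xEF\xBF\xBD[0-9a-fA-F]{4}/', '?', $text );
	return $text;
}

# Revisions that differ only by this corruption then compare as equal:
$same = ( normalizeAstralDamage( $oldText ) === normalizeAstralDamage( $newText ) );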
Many thanks,
Reid
vyznev(a)svn.wikimedia.org wrote:
> Revision: 22159
> Author: vyznev
> Date: 2007-05-13 23:07:06 -0700 (Sun, 13 May 2007)
>
> Log Message:
> -----------
> All the MediaWiki: pages linked to from Special:Allmessages have at least a default value, there's no point in showing any of them as redlinks.
It is now impossible to see whether a page exists in the MediaWiki namespace
whose localized revision is identical to the default.
That makes it quite hard to recognize which messages have to be
deleted from the MediaWiki namespace after the messages have been committed
to the MessagesXx.php file.
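For what it's worth, the check that now has to be done by hand looks roughly
like this (a sketch only; the message key is just an example, and
Language::getMessage() is used to fetch the file default rather than the
possibly customised message):

global $wgContLang;
$key = 'aboutsite';  # example message key
$title = Title::makeTitle( NS_MEDIAWIKI, $wgContLang->ucfirst( $key ) );
if ( $title->exists() ) {
	$revision = Revision::newFromTitle( $title );
	# Redundant if the wiki page just repeats the MessagesXx.php default.
	$isRedundant = $revision
		&& $revision->getText() === $wgContLang->getMessage( $key );
}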
Raymond.
Is this possible?
I want that, when a category is created, a default article structure is
automatically created within it.
An example, because my English is very poor:
I create a category "Exception", and it automatically creates the articles
Description_Exception, News_Exception, Source_Exception, etc.
Is that very complicated?
Thanks!
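One way something like this might be done is with a small extension hooked
into page saves. A rough, untested sketch only: the stub titles come from the
example above, the function name is made up, and nothing like this ships with
MediaWiki by default.

$wgHooks['ArticleSaveComplete'][] = 'wfCreateCategoryStubs';

function wfCreateCategoryStubs( &$article, &$user, $text, $summary ) {
	$title = $article->getTitle();
	# Only act when a page in the Category namespace has been saved.
	if ( $title->getNamespace() != NS_CATEGORY ) {
		return true;
	}
	$name = $title->getText();  # e.g. "Exception"
	foreach ( array( 'Description', 'News', 'Source' ) as $prefix ) {
		$stub = Title::newFromText( "{$prefix}_{$name}" );
		if ( $stub && !$stub->exists() ) {
			$stubArticle = new Article( $stub );
			$stubArticle->doEdit( "[[Category:{$name}]]",
				"Auto-created for new category {$name}" );
		}
	}
	return true;
}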
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r22359).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 495 of 513 tests (96.49%)... 18 tests failed!
Thanks for the reactions.
This is getting a bit editorial for this mailing list, and perhaps the
editorial part should move to the project pages. Technically, what we
have is a decent script which eats a list of archived versions of
articles and puts out a cleaned static tree, obeying manually alterable
delete instructions. It is very easy to restore content or to run this on
another list of articles if you have one.
But anyway, Matthew makes a fair point; I should have thought through
exactly what our process was. Please bear in mind this is a motley crew of
volunteers, not professional editors. The process was: chuck
everything into a funnel, get a volunteer to read it and throw the
irrelevant stuff out; then sort by school topic, then go and get other
articles to fill holes in the curriculum. However, very US-centric content
and fringe content got thrown out (see the list at
http://en.wikipedia.org/w/index.php?title=Wikipedia:Wikipedia_CD_Selection&…
for discarded articles), including things like baseball players. I
hadn't really noticed many FAs going from the 800 articles I did
personally, but overall no doubt this included FA/GA stuff, plus a tonne
of Pokemon characters. In our defence, the current collection of FAs and GAs
is very skewed.
On the other questions:
1) "How many good articles does Wikipedia have?" I concede I could be
completely wrong. There are a vast number of key school topics (such as
classic novels) where the content is hugely disappointing, and we kept
hitting poor-quality articles when trying to fill holes. We also had a
quick go at comparing with EB articles and were saddened. But Walkerma
thinks 50,000 good articles could be found, and he could be right; it
could be far more.
2) Censorship: let's not get this out of proportion. There were a small
number of articles where we thought the content might cause issues. We could
have left all of these articles out with no sweat; no one would have noticed.
There are plenty of places a 15-year-old can go for things not in this
collection. There is plenty of content which an 8-year-old won't
understand. We have taken out a small amount of content to allow the
appeal to widen downwards in schools. You go and get your list of archived
articles, chosen your way, and we will knock off a static copy for your
choice, with no section deletes: no problem.
3) I am happy to be guided on citations, but part of the problem is that
the formatting is so variable in Wikipedia itself that we were struggling
with it. WP chooses to nofollow citations, so I guess we all agree this
part of the content is unreliable? Anyway, it's done in so many different
ways that we thought it needed to come out.
BozMo
============
Matthew Brown wrote:
> On 5/22/07, Andrew Cates <andrew(a)catesfamily.org.uk> wrote:
>> It contains all Good & Featured content (except adult content).
>
> Not true, unless 'adult content' means not only content deemed
> unsuitable for children but also content deemed uninteresting or
> excluded by some other mechanism. Since I could only be bothered to go
> through the FA process once, I of course looked to see if "my" FA was
> included, and it wasn't.
>
> Which is no problem, it's on a nerdy topic of little general interest,
> but this seemed to diverge from what you said, so I thought I'd bring
> it up before other people ;)
>
> -Matt
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
aaron(a)svn.wikimedia.org wrote:
> + print( "Flagging bot account edits...\n" );
> +
> + # Fill in the rc_bot field
> + $sql = "SELECT DISTINCT rc_user FROM $recentchanges " .
> + "LEFT JOIN $usergroups ON rc_user=ug_user " .
> + "WHERE ug_group='bot'";
This is fragile, as there's no guarantee that the "bot" group is the
only one that has bot privileges, or indeed that it exists at all.
Instead, you should look up which group(s), if any, have the 'bot'
permission in $wgGroupPermissions, then search for those groups.
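A rough sketch of what that lookup might look like ($dbw, $recentchanges and
$usergroups are assumed to be the same variables used in the patch; this is a
suggestion, not the committed code):

global $wgGroupPermissions;
# Collect every group that grants the 'bot' permission instead of
# hard-coding the 'bot' group name.
$botGroups = array();
foreach ( $wgGroupPermissions as $group => $permissions ) {
	if ( !empty( $permissions['bot'] ) ) {
		$botGroups[] = $group;
	}
}
if ( count( $botGroups ) ) {
	$quoted = array();
	foreach ( $botGroups as $group ) {
		$quoted[] = $dbw->addQuotes( $group );
	}
	$sql = "SELECT DISTINCT rc_user FROM $recentchanges " .
		"LEFT JOIN $usergroups ON rc_user=ug_user " .
		"WHERE ug_group IN (" . implode( ',', $quoted ) . ")";
}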
- -- brion vibber (brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGUwf2wRnhpk1wk44RAghvAJ9rHGFxPpDuJ52svK8ykrw3OkAsMwCeJcxg
jI74ZZYmW0C7JF+bsqc3t9o=
=8fBO
-----END PGP SIGNATURE-----
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.11alpha (r22317).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 495 of 513 tests (96.49%)... 18 tests failed!
As far as I can tell, importDump does not mark imported pages as
coming from a bot, even when the user is marked as a bot in the database. Is
that correct? Is there a way to indicate a bot revision in the XML,
or do I need to do this in the DB afterward?
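If it has to happen in the database afterward, I imagine something along these
lines would do it (a sketch only, untested; it assumes the imported edits end
up in recentchanges, e.g. after running rebuildrecentchanges.php, and it
hard-codes the 'bot' group with the same caveat raised earlier in this thread):

$dbw = wfGetDB( DB_MASTER );
$rc = $dbw->tableName( 'recentchanges' );
$ug = $dbw->tableName( 'user_groups' );
# Flag every recentchanges row whose author is in the 'bot' group.
$dbw->query(
	"UPDATE $rc, $ug SET rc_bot = 1 " .
	"WHERE rc_user = ug_user AND ug_group = 'bot'"
);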
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054