Hello folks,
lately we've again had quite a bit going on...
While Brion was implementing that anon-blocking stuff (yay, more
blocking - faster performance!),
we were targeting other performance issues as well.
Tim rewrote the IP block code (cutting 50ms or so ;-) and made
lots of other nice improvements,
and now we've implemented Mark's idea of running diskless squids (well,
they have disks, but no
cache on them).
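For the curious, a memory-only Squid setup might look roughly like the sketch below. The directive names are real Squid 2.x ones, but the sizes are made up and the 'null' store assumes Squid was built with it enabled (--enable-storeio including null):

```
# Keep the whole object cache in RAM; never write objects to disk.
cache_mem 3000 MB                    # in-memory object cache (size is made up)
maximum_object_size_in_memory 100 KB # don't let huge objects crowd out RAM
cache_dir null /tmp                  # 'null' store: disk cache disabled
```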
Lots of our new servers have joined the object cache, running (hehe,
again) Tugela instead of
memcached. It will be interesting to watch how it grows. Sadly, after
a week no expiration (memory->disk)
of objects has happened yet, so we can't measure anything.
Standalone BerkeleyDB
might be a bit faster than memcached, though no benchmarks on
identical hardware have been
conducted.
~22G of data is cached in the object cache now - parser objects, image
metadata, diffs,
sessions, user objects, 'you have new messages' bits and language
objects. So far we haven't
noticed any of the glitches that forced us to remove Tugela from
service before (some cosmetic
patches were applied). Anyway, we have more RAM that didn't cost
millions, so we use it.
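As an illustration of how such a cache gets used (the key names below are hypothetical, not the real ones), all of those object types go through the same memcached-style interface with namespaced keys and per-object expiry:

```python
import time

class ObjectCache:
    """Tiny memcached-style interface: namespaced keys, per-object expiry."""

    def __init__(self):
        self.store = {}

    def set(self, key, value, ttl):
        self.store[key] = (value, time.time() + ttl)  # remember expiry time

    def get(self, key):
        item = self.store.get(key)
        if item is None or item[1] < time.time():
            return None                               # missing or expired
        return item[0]

cache = ObjectCache()
# Hypothetical keys, roughly in the spirit of the object types listed above:
cache.set("newtalk:somewiki:SomeUser", True, 3600)        # 'new messages' bit
cache.set("diff:somewiki:1234:1240", "<rendered diff>", 86400)
```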
Anyway, today, with squids running from memory only, we managed to
achieve 0.09s
average response times for logged-in users, at least those who go
directly to Florida.
Before that, Squid efficiency was really distorted by somewhat
blocking async I/O (if it
really existed there), poor sibling relations and a memory leak.
We still have that memory leak and are somewhat lost with it.. Squid
'accounts' for 1G of
memory but uses >2G, and it keeps growing until restarted. We need to
solve that, but nobody has
ever really run valgrind at such loads (eh, today the squid servers
were serving
>700 requests per second each), and I'm not sure anyone has touched
valgrind properly
at all ;-) We'll soon have a bunch of servers suitable for squid
duty, but still, using them
more efficiently would help. We will always lack resources somewhere
:-)
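If anyone wants to try, a hypothetical starting point would be running a single squid instance in the foreground under the leak checker - with the obvious caveat that valgrind's slowdown at this kind of load is exactly the untested part:

```
# Sketch only: run one squid in no-daemon mode under valgrind's leak checker.
valgrind --leak-check=full --num-callers=20 \
    /usr/sbin/squid -N -d 1 -f /etc/squid/squid.conf
```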
Guidelines could help, or we could simply publish our sources, a
bit of configuration
and load documentation. *shrug*
Another troubling part is sibling relations - right now each proxy
marks the others as siblings
and proxy-only, i.e. it shouldn't save their contents into its own
cache. Eventually they stop talking
to each other at all and hit the backend, each with its own separate
cache.
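Concretely, each frontend carries cache_peer lines along these lines (hostnames are made up; 3128/3130 are the usual HTTP and ICP ports):

```
# Each proxy lists the others as siblings; 'proxy-only' means an object
# fetched from a sibling is passed through without being stored locally.
cache_peer sq1.example.org sibling 3128 3130 proxy-only
cache_peer sq2.example.org sibling 3128 3130 proxy-only
```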
I'm not sure whether that's related to equal object expiration times
or something else entirely.
If anyone has experience with squids in such setups - lots of
objects,
lots of servers, efficiency actually managed - it would sure be nice
to hear about it.
It is still strange that it blocks quite a bit on some housekeeping
I/O operations.
BTW, it took us a while today to detect serious packet loss at one of
our upstream providers.
It only slightly affects client network performance, but it quite
badly stalls communication
between our distributed clusters. Looking for such problems becomes a
bit of a witch hunt :)
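For the record, the kind of checks that eventually exposed it are nothing fancy (hostname is made up):

```
# Per-hop loss percentages in one report:
mtr --report --report-cycles 100 remote.cluster.example.org
# A few hundred pings make small loss visible in the summary line too:
ping -c 500 -q remote.cluster.example.org
```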
So much for today's experiences and joys ;-)
Cheers,
Domas