Hi,
For the backups (such as at
http://download.wikimedia.org/wikipedia/en/ ), can I please make a
suggestion?
Could the intermediate backup files please be placed in a separate
directory (e.g.
http://download.wikimedia.org/wikipedia/en/in-progress/ ), and only be
moved to the real directory once they are complete and have finished
without errors (such as running out of disk space)? That would avoid
confusion about what is and is not a complete and valid dump.
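In file-system terms this is just the usual write-then-rename pattern:
build the file somewhere private, and rename it into the public
directory only on success. Here is a minimal sketch in Python, with
hypothetical directory names, of what I have in mind:

    import os

    IN_PROGRESS = "/data/dumps/in-progress"  # hypothetical staging dir
    PUBLIC = "/data/dumps/public"            # hypothetical public dir

    def publish_dump(filename):
        """Move a finished dump into the public directory.

        The dump is written entirely under IN_PROGRESS first; only
        after the dump job finishes without errors is it renamed into
        PUBLIC, so downloaders never see a half-written file.
        """
        src = os.path.join(IN_PROGRESS, filename)
        dst = os.path.join(PUBLIC, filename)
        # rename() is atomic when both paths are on the same filesystem
        os.rename(src, dst)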
The reason I ask is that the file listing at this URL is currently
very confusing - for example, here are the compressed "full.xml" dump
files at this location, with their dates and file sizes:
=========================================================
20050713_pages_full.xml.gz 2005-Jul-16 16:25:28 29.9G
20050924_pages_full.xml.bz2 2005-Oct-01 03:52:03 11.3G
20051002_pages_full.xml.bz2 2005-Oct-03 19:35:09 1.7G
=========================================================
I have to presume that "20051002_pages_full.xml.bz2" is an in-progress
dump, because it is one tenth the size of the previous week's file,
despite using the same compression.
Then, once you start second-guessing whether something is a complete
dump or not, it opens a can of worms: you then have to ask whether the
20050924_pages_full.xml.bz2 dump is complete, given that the previous
dump is nearly 3 times the size. Or maybe that's just the difference
between gzip and bzip2... who can say for sure?
All the best,
Nick.
Dear all,
I want to export page histories in XML format. I found the page
http://en.wikipedia.org/wiki/Special:Export for exporting a page's
history in XML, but unfortunately the result contains only the current
revision. I unchecked the checkbox, but it still gives me only the
current revision. Is there any special parameter that I have to pass in
the URL?
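For reference, here is the kind of request I am making (a sketch in
Python; the history parameter is what I understand the export form to
use, and I realize the wiki's configuration, e.g. $wgExportAllowHistory
and related limits, may cap or disable history export):

    import urllib.parse
    import urllib.request

    # Hypothetical sketch: ask Special:Export for every revision of
    # one page. Whether history is really included depends on the
    # wiki's configuration ($wgExportAllowHistory and related limits).
    params = urllib.parse.urlencode({
        "pages": "Main Page",
        "history": "1",      # all revisions, not just the current one
        "action": "submit",
    })
    url = ("http://en.wikipedia.org/w/index.php?title=Special:Export&"
           + params)
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8")[:500])  # start of export XML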
Thanks in advance,
Nikos Korfiatis
Royal Institute of Technology (KTH), Sweden
Hi! I use MiKTeX under Windows XP and want to use the LatexDoc
extension. The DVI/PDF buttons are shown, but when I press the buttons
I get an error message:
LaTeX error
Command: pdflatex -interaction=batchmode -quiet \input
"C:\htdocs\wiki/upload/latexdoc/ltd_2fe00d8bf090e354f95afd999a12711e.tex"
2>&1
Output: entering extended mode
---
Content of texput.log
This is pdfeTeX, Version 3.141592-1.21a-2.2 (MiKTeX 2.4) (preloaded
format=latex 2005.10.1) 2 OCT 2005 13:36
entering extended mode
**\input C:\htdocs\wiki/upload/latexdoc/ltd_2fe00d8bf090e35
4f95afd999a12711e.tex
! Emergency stop
*** (job aborted, file error in nonstop mode)
Here is how much of TeX's memory you used:
5 strings out of 95502
125 string characters out of 1189316
44801 words of memory out of 1048577
3216 multiletter control sequences out of 60000
3640 words of font info for 14 fonts, out of 1000000 for 2000
14 hyphenation exceptions out of 4999
4i,0n,3p,1b,8s stack positions out of 5000i,500n,10000p,200000b,32768s
PDF statistics:
0 PDF objects out of 300000
0 named destinations out of 300000
1 words of extra memory for PDF output out of 65536
No pages of output.
What should I do?
Greetings, Flacus
--
[[Benutzer:Flacus]][[Benutzer:FlaBot]]
http://www.flacus.de/wikipedia/Interwiki-Link-Checker/
The request for a server is simply a request for a central repository of
Wikimedia scripts that are used in the maintenance of any of the
Wikimedia projects, such as Wikipedia. As it currently stands,
contributors run their own copies of their scripts from somewhere in the
world. Should such a person ever disappear from the project, it is
likely that the useful source code he or she wrote disappears forever.
It is immensely difficult to organize this as a SourceForge project,
especially when it is many different pieces of code running for
different purposes. Additionally, there is a real concern about not
releasing some of the scripts to the general public, for fear that they
may be used for vandalism. However, the code is still useful to someone
else who is willing to put in the time and effort to keep it up to date,
especially with the ever-changing MediaWiki software. Furthermore, there
is no easy collaborative way for contributors to update source code.
For example, I have several scripts that each serve a specific purpose.
If I wished for someone else to take on the task of improving one of my
scripts, I would have to ask them personally, and they would have to
send their changes back to me. There is no easy way to merge changes, as
there would be with a CVS or SVN service. We have many programmers who
contribute to Wikipedia, writing in various languages to perform
particular maintenance tasks. Their skills may not be up to par with
what is needed to become a developer of the MediaWiki software itself,
but they, as do I, want to use their programming skills in some manner
to help the Wikimedia projects. It would be immensely useful for a
single Wikimedia server to serve as a place where trusted Wikimedia
individuals can do collaborative programming in support of maintaining
the various Wikimedia projects.
One of the great things about source code is that it is universal: it
can be read by anyone who understands the programming language. For
example, if I wrote a simple bot script that reset the Sandbox on the
English Wikipedia, and someone from the Chinese Wikipedia wanted to
adapt it for the Chinese Wikipedia, then he or she, as long as they were
a long-standing, trusted Chinese Wikipedia user, could take my code and
adapt it for their own purposes.
Case in point: English Wikipedia user Kevin Rector wrote KevinBot, which
is written in C++ (I think), to do some bot work. But his code is now
out of date, because changes to the MediaWiki code have caused his bot
to break. Someone asked me to update his bot and sent me the executable,
but that doesn't help, since I can't make any changes to the source code
to fix it.
Part of the frustration of writing bot code is that the MediaWiki code
changes to the point that bots break. And bot programmers struggle to
maintain their own scripts alone; they don't have a group of people
assisting them in the effort. If bot scripts could be updated by a group
of people, just like the MediaWiki software itself, it would be less
frustrating and the bot code would be updated faster.
As it stands, bots are scripts and/or programs that run on individual
computers and servers maintained by Wikipedians. If anything happens to
those people, the small piece of knowledge embodied in their scripts
disappears. And if, for any reason, their computer or server breaks,
their bot is inactive for an unknown period of time. There is no
fallback. Furthermore, their hardware may not match what a university or
this organization can provide; the power, speed, and stable connectivity
are something individuals cannot achieve on their own.
Granted, if someone leaves, someone else can rewrite the work they did.
But the point is that they are reinventing the wheel just to get things
working again.
I don't know where else to ask this, so I figured I would email it to
this list.
----
Jason Y. Lee
AKA AllyUnion (English Wikipedia Administrator)
Timwi wrote:
> Where do we edit this? How do we contribute more translations?
You can edit it by ripping the file off there, editing it, and giving
it to a developer to re-upload to the server :) If any developer wants
to make it more wiki, they can go right ahead. I'd just rather see
them prevent the error messages to begin with :)
As for the translations, I guess you could submit more at:
http://meta.wikimedia.org/wiki/Multilingual_error_messages/Translations
However, at this stage all languages are compiled into the one file, and
the file is approaching 20 KB (despite my strenuous efforts to make the
code as small as possible). So any other languages that get added won't
(in my opinion) be able to have as much text as the current ones. Maybe
we should drop the donation paragraph?
~Mark Ryan
On 30/09/05, Mark Williamson <node.ue at gmail.com> wrote:
> Yeah... would it be possible to have a client-side script to determine
> the proper error message based on the requested subdomain?
>
> Mark
That was one option I'd considered (and would have implemented, had my
mid-semester break not run out). It shouldn't be too difficult, anyway.
If someone wants to spend some time writing some JavaScript for this:
inserting %U will insert the whole URL, from which you can likely get
the subdomain. Otherwise, I'll get around to it in the next couple of
weeks and submit it with the few changes that have been requested by
people.
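The extraction itself is trivial; here is the idea sketched in Python
for concreteness (the real thing would of course be a few lines of
client-side JavaScript, and the host-splitting below is an assumption
about our *.wikipedia.org URL layout):

    from urllib.parse import urlparse

    def language_subdomain(url):
        """Return the language code of a *.wikipedia.org-style URL.

        "http://es.wikipedia.org/wiki/Portada" -> "es"; returns None
        if the host doesn't look like a language subdomain.
        """
        host = urlparse(url).hostname or ""
        parts = host.split(".")
        return parts[0] if len(parts) >= 3 else None

    # The error page would substitute %U here:
    print(language_subdomain("http://es.wikipedia.org/wiki/Portada"))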
I tested this error message on IE6, Firefox 1.0.6, Netscape 8, Opera
8.02, Lynx, Konqueror, ELinks and NCSA Mosaic (just to see if it
worked in ultra-old browsers). The JavaScript works in its entirety in
Konqueror, IE and Firefox/Netscape. Opera does most of it fine, except
that it doesn't colour the links according to which language is
currently selected, because Opera doesn't currently support
document.styleSheets, which is what I used. I suppose the page could be
rejigged to remedy this, but it seems like too much effort for what is
essentially a cosmetic thing. Surprisingly, the page displays almost
perfectly in Internet Explorer (only the strikethrough is faded), which
may offend some free software purists out there (hey, I'm sending this
email to the technical list).
~Mark Ryan
Pakaran suggested on IRC the use of 7zip's LZMA compression for data
dumps, claiming really big improvements in compression over gzip. I did
some test runs with the September 17 dump of es.wikipedia.org and can
confirm it does make a big difference:
10,995,508,118 pages_full.xml 1.00x uncompressed XML
2,320,992,228 pages_full.xml.gz 4.74x gzipped output from mwdumper
775,765,248 pages_full.xml.bz2 14.17x "bzip2"
155,983,464 pages_full.xml.7z 70.49x "7za a -si"
(gzip -9 makes a negligible difference versus the default compression
level; bzip2 -9 seems to make no difference.)
The 7za program is a fair bit slower than gzip, but at 10-15 times
better compression I suspect many people would find the download savings
worth a little extra trouble.
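If you want to reproduce a comparison like this yourself, here is a
rough sketch in Python; its gzip, bz2, and lzma modules cover the three
algorithms (lzma being the algorithm behind 7-Zip), though the exact
ratios will differ from the 7za command line:

    import bz2, gzip, lzma, os

    SRC = "pages_full.xml"  # hypothetical local sample; use a small
                            # excerpt, not the full multi-gigabyte
                            # dump, since this reads it all into memory

    with open(SRC, "rb") as f:
        data = f.read()

    # Compress the same data with each algorithm and compare ratios.
    for suffix, compress in ((".gz", gzip.compress),
                             (".bz2", bz2.compress),
                             (".xz", lzma.compress)):
        out = SRC + suffix
        with open(out, "wb") as f:
            f.write(compress(data))
        print("%-24s %6.2fx" % (out, len(data) / os.path.getsize(out)))

On the consuming side, something like "7za e -so pages_full.xml.7z"
should stream the decompressed XML back out to stdout.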
While it's not any official or de-facto standard that we know of, the
code is open source (LGPL, CPL) and a basic command-line archiver is
available for most Unix-like platforms as well as Windows so it should
be free to use (in the absence of surprise patents):
http://www.7-zip.org/sdk.html
I'm probably going to try to work LZMA compression into the dump process
to supplement the gzipped files; and/or we could switch from gzip back
to bzip2, which provides a still respectable improvement in compression
and is a bit more standard.
(We'd switched from bzip2 to gzip at some point in the SQL dump saga; I
think this was when we had started using gzip internally on 'old' text
entries and the extra time spent on bzip2 was wasted trying to
recompress the raw gzip data in the dumps.)
-- brion vibber (brion @ pobox.com)
Hi
Sorry if this is not the right place for this.
I'm sort of a black sheep here, running a static HTML clone of
Wikipedia. Right now I'm trying to improve the program to include
{{msg}} templates. Half a year ago this was the URL to get the content:
http://no.wikipedia.org/w/wiki.phtml?title=Template:Akershus&action=raw&cty….
Is there a new way or URL to get the content?
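In case it helps anyone answering: here is what I am trying now,
assuming the wiki.phtml entry point has simply been renamed to
index.php and action=raw still returns the unparsed wikitext:

    import urllib.request

    # wiki.phtml became index.php in newer MediaWiki versions;
    # action=raw should still return the raw wikitext of the page.
    url = ("http://no.wikipedia.org/w/index.php"
           "?title=Template:Akershus&action=raw")
    with urllib.request.urlopen(url) as resp:
        wikitext = resp.read().decode("utf-8")
    print(wikitext)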
Regards
Stefan Vesterlund