Hi,
For the backups (such as at
http://download.wikimedia.org/wikipedia/en/ ), can I please make a
suggestion?
Could the intermediate backup files please be placed in a separate
directory (e.g.
http://download.wikimedia.org/wikipedia/en/in-progress/ ), and only be
moved to the real directory once they are complete and have finished
without errors (such as running out of disk space)? That would avoid
confusion about what is and is not a complete and valid dump.
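In file-system terms this is just the usual write-then-rename pattern:
build the file somewhere private, and rename it into the public
directory only on success. Here is a minimal sketch in Python, with
hypothetical directory names, of what I have in mind:

    import os

    IN_PROGRESS = "/data/dumps/in-progress"  # hypothetical staging dir
    PUBLIC = "/data/dumps/public"            # hypothetical public dir

    def publish_dump(filename):
        """Move a finished dump into the public directory.

        The dump is written entirely under IN_PROGRESS first; only
        after the dump job finishes without errors is it renamed into
        PUBLIC, so downloaders never see a half-written file.
        """
        src = os.path.join(IN_PROGRESS, filename)
        dst = os.path.join(PUBLIC, filename)
        # rename() is atomic when both paths are on the same filesystem
        os.rename(src, dst)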
The reason I ask is that the file listing at this URL is currently
very confusing - for example, here are the compressed "full.xml" dump
files at this location, with their dates and file sizes:
=========================================================
20050713_pages_full.xml.gz 2005-Jul-16 16:25:28 29.9G
20050924_pages_full.xml.bz2 2005-Oct-01 03:52:03 11.3G
20051002_pages_full.xml.bz2 2005-Oct-03 19:35:09 1.7G
=========================================================
I have to presume that "20051002_pages_full.xml.bz2" is an in-progress
dump, because it is one tenth the size of the previous week's file,
despite using the same compression.
Then, once you start second-guessing whether something is a complete
dump or not, it opens a can of worms: you then have to ask whether the
20050924_pages_full.xml.bz2 dump is complete, given that the previous
dump is nearly 3 times the size. Or maybe that's just the difference
between gzip and bzip2... who can say for sure?
All the best,
Nick.
Dear all,
I want to export page histories in XML format. I found the page
http://en.wikipedia.org/wiki/Special:Export for exporting a page's
history in XML, but unfortunately the result contains only the current
revision. I unchecked the checkbox, but it still gives me only the
current revision. Is there any special parameter that I have to pass in
the URL?
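For reference, here is the kind of request I am making (a sketch in
Python; the history parameter is what I understand the export form to
use, and I realize the wiki's configuration, e.g. $wgExportAllowHistory
and related limits, may cap or disable history export):

    import urllib.parse
    import urllib.request

    # Hypothetical sketch: ask Special:Export for every revision of
    # one page. Whether history is really included depends on the
    # wiki's configuration ($wgExportAllowHistory and related limits).
    params = urllib.parse.urlencode({
        "pages": "Main Page",
        "history": "1",      # all revisions, not just the current one
        "action": "submit",
    })
    url = ("http://en.wikipedia.org/w/index.php?title=Special:Export&"
           + params)
    with urllib.request.urlopen(url) as resp:
        print(resp.read().decode("utf-8")[:500])  # start of export XML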
Thanks in advance,
Nikos Korfiatis
Royal Institute of Technology (KTH), Sweden
Hi! I use MiKTeX under Windows XP and want to use the LatexDoc
extension. The DVI/PDF buttons are shown, but when I press the buttons
I get an error message:
LaTeX error
Command: pdflatex -interaction=batchmode -quiet \input
"C:\htdocs\wiki/upload/latexdoc/ltd_2fe00d8bf090e354f95afd999a12711e.tex"
2>&1
Output: entering extended mode
---
Content of texput.log
This is pdfeTeX, Version 3.141592-1.21a-2.2 (MiKTeX 2.4) (preloaded
format=latex 2005.10.1) 2 OCT 2005 13:36
entering extended mode
**\input C:\htdocs\wiki/upload/latexdoc/ltd_2fe00d8bf090e35
4f95afd999a12711e.tex
! Emergency stop
*** (job aborted, file error in nonstop mode)
Here is how much of TeX's memory you used:
5 strings out of 95502
125 string characters out of 1189316
44801 words of memory out of 1048577
3216 multiletter control sequences out of 60000
3640 words of font info for 14 fonts, out of 1000000 for 2000
14 hyphenation exceptions out of 4999
4i,0n,3p,1b,8s stack positions out of 5000i,500n,10000p,200000b,32768s
PDF statistics:
0 PDF objects out of 300000
0 named destinations out of 300000
1 words of extra memory for PDF output out of 65536
No pages of output.
What should I do?
Greetings, Flacus
--
[[Benutzer:Flacus]][[Benutzer:FlaBot]]
http://www.flacus.de/wikipedia/Interwiki-Link-Checker/
The request for a server is simply a request for a central repository of
Wikimedia scripts that are used in the maintenance of any of the
Wikimedia projects, such as Wikipedia. As it currently stands,
contributors run their own copies of their scripts from somewhere in the
world. Should such a person ever disappear from the project, it is
likely that the useful source code he or she wrote disappears forever.
It is immensely difficult to organize this as a SourceForge project,
especially when it is many different pieces of code running for
different purposes. Additionally, there is a real concern about not
releasing some of the scripts to the general public, for fear that they
may be used for vandalism. However, the code is still useful to someone
else who is willing to put in the time and effort to keep it up to date,
especially with the ever-changing MediaWiki software. Furthermore, there
is no easy collaborative way for contributors to update source code.
For example, I have several scripts that each serve a specific purpose.
If I wished for someone else to take on the task of improving one of my
scripts, I would have to ask them personally, and they would have to
send their changes back to me. There is no easy way to merge changes, as
there would be with a CVS or SVN service. We have many programmers who
contribute to Wikipedia, writing in various languages to perform
particular maintenance tasks. Their skills may not be up to par with
what is needed to become a developer of the MediaWiki software itself,
but they, as do I, want to use their programming skills in some manner
to help the Wikimedia projects. It would be immensely useful for a
single Wikimedia server to serve as a place where trusted Wikimedia
individuals can do collaborative programming in support of maintaining
the various Wikimedia projects.
One of the great things about source code is that it is universal: it
can be read by anyone who understands the programming language. For
example, if I wrote a simple bot script that reset the Sandbox on the
English Wikipedia, and someone from the Chinese Wikipedia wanted to
adapt it for the Chinese Wikipedia, then he or she, as long as they were
a long-standing, trusted Chinese Wikipedia user, could take my code and
adapt it for their own purposes.
Case in point: English Wikipedia user Kevin Rector wrote KevinBot, which
is written in C++ (I think), to do some bot work. But his code is now
out of date, because changes to the MediaWiki code have caused his bot
to break. Someone asked me to update his bot and sent me the executable,
but that doesn't help, since I can't make any changes to the source code
to fix it.
Part of the frustration of writing bot code is that the MediaWiki code
changes to the point that bots break. And bot programmers struggle to
maintain their own scripts alone; they don't have a group of people
assisting them in the effort. If bot scripts could be updated by a group
of people, just like the MediaWiki software itself, it would be less
frustrating and the bot code would be updated faster.
As it stands, bots are scripts and/or programs that run on individual
computers and servers maintained by Wikipedians. If anything happens to
those people, the small piece of knowledge embodied in their scripts
disappears. And if, for any reason, their computer or server breaks,
their bot is inactive for an unknown period of time. There is no
fallback. Furthermore, their hardware may not match what a university or
this organization can provide; the power, speed, and stable connectivity
are something individuals cannot achieve on their own.
Granted, if someone leaves, someone else can rewrite the work they did.
But the point is that they are reinventing the wheel just to get things
working again.
I don't know where else to ask this, so I figured I would email it to
this list.
----
Jason Y. Lee
AKA AllyUnion (English Wikipedia Administrator)
Timwi wrote:
> Where do we edit this? How do we contribute more translations?
You can edit it by ripping the file off there, editing it, and giving
it to a developer to re-upload to the server :) If any developer wants
to make it more wiki, they can go right ahead. I'd just rather see
them prevent the error messages to begin with :)
As for the translations, I guess you could submit more at:
http://meta.wikimedia.org/wiki/Multilingual_error_messages/Translations
However, at this stage all languages are compiled into the one file, and
the file is approaching 20 KB (despite my strenuous efforts to make the
code as small as possible). So any other languages that get added won't
(in my opinion) be able to have as much text as the current ones. Maybe
we should drop the donation paragraph?
~Mark Ryan
On 30/09/05, Mark Williamson <node.ue at gmail.com> wrote:
> Yeah... would it be possible to have a client-side script to determine
> the proper error message based on the requested subdomain?
>
> Mark
That was one option I'd considered (and would have implemented, had my
mid-semester break not run out). It shouldn't be too difficult, anyway.
If someone wants to spend some time writing some JavaScript for this:
inserting %U will insert the whole URL, from which you can likely get
the subdomain. Otherwise, I'll get around to it in the next couple of
weeks and submit it with the few changes that have been requested by
people.
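The extraction itself is trivial; here is the idea sketched in Python
for concreteness (the real thing would of course be a few lines of
client-side JavaScript, and the host-splitting below is an assumption
about our *.wikipedia.org URL layout):

    from urllib.parse import urlparse

    def language_subdomain(url):
        """Return the language code of a *.wikipedia.org-style URL.

        "http://es.wikipedia.org/wiki/Portada" -> "es"; returns None
        if the host doesn't look like a language subdomain.
        """
        host = urlparse(url).hostname or ""
        parts = host.split(".")
        return parts[0] if len(parts) >= 3 else None

    # The error page would substitute %U here:
    print(language_subdomain("http://es.wikipedia.org/wiki/Portada"))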
I tested this error message on IE6, Firefox 1.0.6, Netscape 8, Opera
8.02, Lynx, Konqueror, ELinks and NCSA Mosaic (just to see if it
worked in ultra-old browsers). The JavaScript works in its entirety in
Konqueror, IE and Firefox/Netscape. Opera does most of it fine, except
that it doesn't colour the links according to which language is
currently selected, because Opera doesn't currently support
document.styleSheets, which is what I used. I suppose the page could be
rejigged to remedy this, but it seems like too much effort for what is
essentially a cosmetic thing. Surprisingly, the page displays almost
perfectly in Internet Explorer (only the strikethrough is faded), which
may offend some free software purists out there (hey, I'm sending this
email to the technical list).
~Mark Ryan
Pakaran suggested on IRC the use of 7zip's LZMA compression for data
dumps, claiming really big improvements in compression over gzip. I did
some test runs with the September 17 dump of es.wikipedia.org and can
confirm it does make a big difference:
10,995,508,118 pages_full.xml 1.00x uncompressed XML
2,320,992,228 pages_full.xml.gz 4.74x gzipped output from mwdumper
775,765,248 pages_full.xml.bz2 14.17x "bzip2"
155,983,464 pages_full.xml.7z 70.49x "7za a -si"
(gzip -9 makes a negligible difference versus the default compression
level; bzip2 -9 seems to make no difference.)
The 7za program is a fair bit slower than gzip, but at 10-15 times
better compression I suspect many people would find the download savings
worth a little extra trouble.
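If you want to reproduce a comparison like this yourself, here is a
rough sketch in Python; its gzip, bz2, and lzma modules cover the three
algorithms (lzma being the algorithm behind 7-Zip), though the exact
ratios will differ from the 7za command line:

    import bz2, gzip, lzma, os

    SRC = "pages_full.xml"  # hypothetical local sample; use a small
                            # excerpt, not the full multi-gigabyte
                            # dump, since this reads it all into memory

    with open(SRC, "rb") as f:
        data = f.read()

    # Compress the same data with each algorithm and compare ratios.
    for suffix, compress in ((".gz", gzip.compress),
                             (".bz2", bz2.compress),
                             (".xz", lzma.compress)):
        out = SRC + suffix
        with open(out, "wb") as f:
            f.write(compress(data))
        print("%-24s %6.2fx" % (out, len(data) / os.path.getsize(out)))

On the consuming side, something like "7za e -so pages_full.xml.7z"
should stream the decompressed XML back out to stdout.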
While it's not any official or de-facto standard that we know of, the
code is open source (LGPL, CPL) and a basic command-line archiver is
available for most Unix-like platforms as well as Windows so it should
be free to use (in the absence of surprise patents):
http://www.7-zip.org/sdk.html
I'm probably going to try to work LZMA compression into the dump process
to supplement the gzipped files; and/or we could switch from gzip back
to bzip2, which provides a still respectable improvement in compression
and is a bit more standard.
(We'd switched from bzip2 to gzip at some point in the SQL dump saga; I
think this was when we had started using gzip internally on 'old' text
entries and the extra time spent on bzip2 was wasted trying to
recompress the raw gzip data in the dumps.)
-- brion vibber (brion @ pobox.com)
Hi
Sorry if this is not the right place for this.
I'm sort of a black sheep here, running a static HTML clone of
Wikipedia. Right now I'm trying to improve the program to include
{{msg}} templates. Half a year ago this was the URL to get the content:
http://no.wikipedia.org/w/wiki.phtml?title=Template:Akershus&action=raw&cty….
Is there a new way or URL to get the content?
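In case it helps anyone answering: here is what I am trying now,
assuming the wiki.phtml entry point has simply been renamed to
index.php and action=raw still returns the unparsed wikitext:

    import urllib.request

    # wiki.phtml became index.php in newer MediaWiki versions;
    # action=raw should still return the raw wikitext of the page.
    url = ("http://no.wikipedia.org/w/index.php"
           "?title=Template:Akershus&action=raw")
    with urllib.request.urlopen(url) as resp:
        wikitext = resp.read().decode("utf-8")
    print(wikitext)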
Regards
Stefan Vesterlund