Greetings Wikitechies,
I'm working on a research project on Wikipedia, and I'd like to create or obtain
a historical snapshot of Wikipedia on or about a given date. I'm familiar
enough with MediaWiki that I could hack together my own script to recreate the
contents of the "cur" table from the corresponding history (e.g., by calling
getRevisionText() a couple of 10^5 times). However, since I'd hate to reinvent
the wheel, I'd appreciate it if you could let me know whether this has been done
before, whether there are archives of old snapshots, or whether there's an easier
way to approach it technically.
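
For illustration, a minimal sketch of the kind of reconstruction described
above, assuming the MediaWiki 1.5 page/revision/text schema, MySQL 4.1
subqueries, and the classic mysql_* client functions; the connection details
are placeholders and the column names may need adjusting:

  <?php
  // Sketch: rebuild the text of every main-namespace page as it stood
  // at a cutoff date. Not tested against a live installation.

  $cutoff = '20050101000000';   // MediaWiki timestamps are yyyymmddhhmmss
  $db = mysql_connect('localhost', 'wikiuser', 'secret');
  mysql_select_db('wikidb', $db);

  // For each page, pick the newest revision at or before the cutoff.
  $sql = "SELECT p.page_title, t.old_text
            FROM page p
            JOIN revision r ON r.rev_page = p.page_id
            JOIN text t     ON t.old_id   = r.rev_text_id
           WHERE p.page_namespace = 0
             AND r.rev_timestamp <= '$cutoff'
             AND r.rev_timestamp = (SELECT MAX(r2.rev_timestamp)
                                      FROM revision r2
                                     WHERE r2.rev_page = p.page_id
                                       AND r2.rev_timestamp <= '$cutoff')";

  $res = mysql_query($sql, $db);
  @mkdir('snapshot');
  while ($row = mysql_fetch_assoc($res)) {
      // old_text may be gzipped or stored externally depending on old_flags;
      // Revision::getRevisionText() handles that in a real installation.
      // Here the text is just dumped to files named after the page title.
      $file = 'snapshot/' . strtr($row['page_title'], '/', '_') . '.txt';
      file_put_contents($file, $row['old_text']);
  }
  ?>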
Thank you!
Miran B
I remember, a while ago, someone was bold and enabled edit patrolling for
Wikipedia. It was disabled shortly afterwards.
I think the main problem with this was that anyone could mark edits as
patrolled. I have something akin to a feature request, which basically
reenvisions how the patrol feature should work.
The basic idea is that the software will provide hooks for people to
run patrols, but won't make it mandatory for everybody.
A patrol is a group of editors who have mutual trust in each other's
judgment. Whenever one person on that patrol marks an edit as checked,
everyone else knows about it. Preferably, each patrol should specialize
(e.g., checking edits that are +/-500 bytes and anonymous), but some overlap
is good. Eventually, there should be patrols covering all areas of changes.
To create a patrol, you'd have to meet certain requirements (like being
an admin). You register the patrol on Wikipedia, mark down which areas
of the recent changes you will check, and then recruit people. Only the
patrol leader can recruit people, but it's recommended that they be
democratic about it.
Anyone can "tune" into a patrol, that is, they can see what the
patrollers have marked certain changes, but they can't mark a message
under the name of the patrol until they've been recruited.
Recent changes, for anons and the totally unrecruited, will stay the way
it is, but you will have the option to enable patrol markings. We can
color-code them, or a user can assign different trust levels to the
patrols and then use the aggregated data on certain edits to judge
their validity (this is a big jump forward in time).
The Wikimedia servers will have new Patrol chat rooms, which will be
like the LiveRC but for Patrol information. Eventually, programs that
can crunch patrol data and LiveRC data will be developed to assist
patrolling.
The actual process of marking edits patrolled or not will use the
existing infrastructure. We will not be adding lots of new columns to
the database; one column with a certain number of bits (say, two per
patrol) will work, as in the sketch after this list.
* 00 - unpatrolled
* 01 - dubious
* 10 - something here.
* 11 - patrolled
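
A minimal sketch of how two bits per patrol could be packed into a single
integer field; the field and patrol numbering here are made up for
illustration:

  <?php
  // Sketch: pack a two-bit status per patrol into one integer field.
  // Status values follow the list above: 0 = unpatrolled, 1 = dubious,
  // 3 = patrolled (2 is left open, as in the proposal).

  function set_patrol_status($field, $patrolIndex, $status) {
      $shift = $patrolIndex * 2;
      $field &= ~(0x3 << $shift);          // clear the two bits for this patrol
      $field |= ($status & 0x3) << $shift; // write the new status
      return $field;
  }

  function get_patrol_status($field, $patrolIndex) {
      return ($field >> ($patrolIndex * 2)) & 0x3;
  }

  // Example: patrol #3 marks an edit as patrolled (status 3).
  $rc_patrol_bits = 0;
  $rc_patrol_bits = set_patrol_status($rc_patrol_bits, 3, 3);
  echo get_patrol_status($rc_patrol_bits, 3) . "\n"; // prints 3
  ?>

Note that a 32-bit column caps out at 16 patrols, which is presumably where
the separate table below would come in.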
If this isn't scalable, then we can make a new table just for patrols
along the lines of
COLUMN PRIMARY_KEY rc_patrol_id
COLUMN PRIMARY_KEY edit_id
COLUMN status
What do you think?
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
Hi there,
One short question: is there a way to export / migrate content from MediaWiki
that can be used for importing into another wiki (e.g. SnipSnap)?
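
For what it's worth, a minimal sketch of pulling individual pages out of
MediaWiki as XML via Special:Export; the wiki URL and page titles below are
placeholders, and converting the XML into SnipSnap's import format would
still be a separate step:

  <?php
  // Sketch: fetch pages as XML through Special:Export.
  // The target wiki URL and page titles are placeholders.

  $wiki  = 'http://www.example.org/wiki/index.php';
  $pages = array('Main_Page', 'Help:Contents');

  foreach ($pages as $title) {
      $xml = file_get_contents($wiki . '?title=Special:Export/' . urlencode($title));
      file_put_contents($title . '.xml', $xml);
  }
  ?>

If a server-side whole-site dump is an option, MediaWiki 1.5 also ships a
maintenance/dumpBackup.php script that should produce the same XML format
in bulk.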
Thanks and greetings from Vienna,
Gerold.
How does one determine which namespace an article is in under the new dump
scheme? The XML looks something like this:
<page>
<title>MediaWiki:Categories</title>
<id>1085</id>
<restrictions>sysop</restrictions>
<revision>
<id>12375</id>
<timestamp>2004-10-04T06:34:35Z</timestamp>
<contributor><username>Transalpin</username><id>41</id></contributor>
<text xml:space="preserve">Kategorie</text>
</revision>
</page>
Are we supposed to load the *ns0 file to know which articles are in ns0? What
about the other namespaces?
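
As an aside, the dump format shown above doesn't carry a namespace number on
the <page> element, so one way is to infer it from the title prefix. A
minimal sketch, assuming the canonical English prefixes and listing only a
few namespaces; a localized wiki uses its own prefix names:

  <?php
  // Sketch: guess the namespace number from a page title's prefix.

  function title_namespace($title) {
      $prefixes = array(
          'Talk'      => 1,
          'User'      => 2,
          'User talk' => 3,
          'Wikipedia' => 4,
          'Image'     => 6,
          'MediaWiki' => 8,
          'Template'  => 10,
          'Category'  => 14,
      );
      $pos = strpos($title, ':');
      if ($pos !== false) {
          $prefix = substr($title, 0, $pos);
          if (isset($prefixes[$prefix])) {
              return $prefixes[$prefix];
          }
      }
      return 0; // no recognized prefix: main namespace
  }

  echo title_namespace('MediaWiki:Categories') . "\n"; // 8
  echo title_namespace('Menschen') . "\n";             // 0
  ?>

If the dump includes a <siteinfo> block, the local prefix names and numbers
can be read from there instead of being hard-coded.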
David Campeau
Hi all,
I'm not yet sure about the real purpose of the Wikidata project.
Currently I feel that a (or the?) main part is data entry.
I welcome a reasonable and simplified approach. I guess a form-based entry
is the most obvious solution. But I'm not sure about the appropriate wiki
syntax for the 'expert' mode, where everything is open to extended
definitions. I feel the syntax should be closer to XML than to HTML
commands, since XHTML2 is becoming more and more stable. I guess there's
not much point in implementing a new syntax without upcoming standards
in mind.
However, I expect a rather different result from Wikidata: I'd like to
access not just one fixed set of data; I'd expect an overview of
multiple records, presented as a table.
I want to filter many records and limit the search to a subset,
and I want to sort those records by one or more sort fields.
There are many more conclusions to be drawn from this expectation, such as
- search and replace operations on multiple records
- normalization of entries
- language conversion
- display variations (e.g. abbreviated, columns with markers
instead of field contents, html-table or preformatted text,
reordering of columns, subset of columns, transformation, ...)
- maybe one could even implement spreadsheet-like field entry options
- custom sort orders
- search ranges (e.g. depending on field types)
- ...
But the main request here is: wikidata should be able to merge info from
multiple sources into one table overview.
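
As a concrete illustration of the table-overview idea, a rough sketch that
filters a set of records to a range and sorts them; the record structure and
field names are invented purely for illustration:

  <?php
  // Sketch: filter and sort a set of hypothetical Wikidata-style records.
  // The records and field names (title, lat, lon) are made up.

  $records = array(
      array('title' => 'Wien',    'lat' => 48.21, 'lon' => 16.37),
      array('title' => 'Graz',    'lat' => 47.07, 'lon' => 15.44),
      array('title' => 'Hamburg', 'lat' => 53.55, 'lon' => 10.00),
  );

  // Limit the view to a coordinate range (roughly Austria).
  $subset = array();
  foreach ($records as $r) {
      if ($r['lat'] >= 46 && $r['lat'] <= 49 && $r['lon'] >= 9 && $r['lon'] <= 17) {
          $subset[] = $r;
      }
  }

  // Sort the subset by one field.
  function by_title($a, $b) { return strcmp($a['title'], $b['title']); }
  usort($subset, 'by_title');

  // Present the result as a simple table overview.
  foreach ($subset as $r) {
      printf("%-10s %6.2f %6.2f\n", $r['title'], $r['lat'], $r['lon']);
  }
  ?>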
I know about several projects which work this way on Wikipedia. One of
them filters all Wikipedia geo coordinates and puts them into one
extracted format. I don't know what happens after that: could I see this
extracted info in Wikipedia/Wikidata itself? Could I limit my view of the
full set of data (several thousand records), e.g. to a limited range (e.g.
an area within a certain range)? Could I fix errors in this subset? And
would those modifications be propagated back to the original sources?
I guess it COULD be done, in theory. I don't know yet whether this is
one of the goals of Wikidata.
Thanks,
Martin
Has anyone tried sharing DBs on 1.5b yet?
I installed separate wikis in French, German and English and linked
them to the same DB on a test server yesterday, but it only worked for
a few minutes; I think cookies and/or caching caught up with me
and made it 100% German, even for new machines visiting the pages for
the first time.
Thoughts, suggestions?
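
A minimal sketch of one way to keep several language wikis in one database
without colliding, assuming each wiki gets its own table set via
$wgDBprefix; the database name, prefixes and site name below are
placeholders:

  <?php
  # Sketch: LocalSettings.php fragment for the German wiki, assuming all
  # three wikis share one database but use distinct table prefixes.

  $wgSitename     = 'Testwiki (de)';
  $wgDBserver     = 'localhost';
  $wgDBname       = 'sharedwiki';
  $wgDBprefix     = 'de_';   # use 'fr_' and 'en_' in the other LocalSettings.php
  $wgLanguageCode = 'de';
  ?>

If the three wikis are instead meant to share a single set of tables, the
caches and session cookies would also need to be kept apart per wiki, which
may be where the 100% German behaviour came from.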
Paul
I am trying to run tests from the MediaWiki scripts.
On my Fedora Core 3 box, PHP is installed as CGI;
there is no CLI PHP binary to download.
Now I am tired and am asking for help, if anyone can do this for me.
Thanks,
mm
Hi,
I'll be in California from Monday to Friday. I'll be giving a so-called
"TechTalk" at Google's HQ in Mountain View. TechTalks are regular
presentations by outside speakers. I approached Google in February with
the goal of giving them an idea of the current state of the Wikimedia
projects, and this is what we finally agreed to. From the abstract:
- - - - -
Wikipedia currently contains 2 million articles in 100 languages. With
more than 600,000 articles, the English edition exceeds all previously
existing encyclopedias in size. The Wikipedia website is ranked by
Alexa.com as one of the 100 largest sites worldwide, and the Wikimedia
servers respond to up to 1,300 requests per second.
This massive growth raises several questions:
* Given the open editing that is characteristic of wiki technology, what
methods are there to guarantee the validity of the article a reader is
looking at?
* What does it take to successfully apply the wiki principles to other
problem areas the Wikimedia Foundation is tackling, such as the creation
of a media repository or a news site?
* In what other ways could wikis be extended in their reach? Can the
entire web incorporate wiki-like mechanisms?
This presentation will describe the current state of thinking and
research on these questions within the Wikimedia community and introduce
specific technical solutions, such as the article validation system that
is key to Wikimedia's plans for a print edition.
- - - - -
My primary goal is to get Google's engineers interested in Wikimedia's
software needs. The presentation is unrelated to any discussion about
Google hosting, and will likely not go into hardware/hosting related
issues. It's more a "Future Talk" similar to what I'm going to do at
Wikimania.
The talk will be on Wednesday. If there is information or there are ideas
you want to relay that might be of interest to them, please send them to
me by email ASAP, to this address. Sorry for the short notice.
Best,
Erik
Hello,
Today I got the following (reproducible) PHP warning:
Warning: rename(/mnt/wikipedia/htdocs/commons/upload/thumb/b/b3/250px-429px-Vitruvian.jpg,/mnt/wikipedia/htdocs/commons/upload/thumb/1/10/429px-Vitruvian.jpg/250px-429px-Vitruvian.jpg): Not a directory in /usr/local/apache/common-local/php-1.5/includes/Image.php on line 956
The site in question was:
http://de.wikipedia.org/wiki/Menschen
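
For what it's worth, the destination path treats 429px-Vitruvian.jpg as a
directory, so the "Not a directory" error suggests a plain file already sits
at .../thumb/1/10/429px-Vitruvian.jpg. A minimal sketch of a defensive check
before such a rename; the paths are copied from the warning and the handling
itself is only illustrative:

  <?php
  // Sketch: make sure the per-image thumbnail directory really is a
  // directory before moving a freshly rendered thumbnail into place.

  $tmp  = '/tmp/250px-429px-Vitruvian.jpg';
  $dest = '/mnt/wikipedia/htdocs/commons/upload/thumb/1/10/429px-Vitruvian.jpg/250px-429px-Vitruvian.jpg';

  $destDir = dirname($dest);
  if (file_exists($destDir) && !is_dir($destDir)) {
      // An old-style thumbnail file occupies the spot where the per-image
      // directory should be; move it aside first.
      rename($destDir, $destDir . '.old');
  }
  if (!is_dir($destDir)) {
      mkdir($destDir, 0777);
  }
  rename($tmp, $dest);
  ?>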
Kind regards
--
Andreas 'ads' Scherbaum
Failure is not an option. It comes bundled with your Microsoft product.
(Ferenc Mantfeld)
I have translated LanguageVi.php for the Vietnamese version of MediaWiki 1.5.
It's not 100% Vietnamese yet, since the section on "exif" is too long, and
many people do not use it.
Also, the MagicWords are left untouched, possibly for compatibility if this is
used with a previous database, such as that of http://vi.wikipedia.org/
Please include this (see attachment) in a future release of MediaWiki 1.5,
once you have verified it. I myself have been testing it quite a bit.
I hope to get it installed on http://vi.wikipedia.org/ as well.
Many thanks,
Trung.