How can I add additional fields (which I have already added to the cur table) to MediaWiki?
My aim is to attach defined additional data to an article and to make use of
MediaWiki's editing and versioning system. These additional fields should
appear within the article.
Thanks Heinz
Could somebody add HTML and subpage support on the internal wiki? I just posted a
huge HTML doc there that needs to be subdivided. Right now it is a mess.
Thanks in advance. :)
-- mav
Hi, I need to create a giveaway CD on which I would like to include some
articles that are translated into various languages. I have to give a
presentation on wiki projects during a translators' conference.
I tried to download some pages with HTTrack, but it just does not
work. Is there a way to download specific pages and have them updated
every now and then? I suppose that from now on it will happen more often
that I have to provide some single example pages.
I know I could do this with IE using the "offline browsing" utility,
but I use Firefox, and I am not very keen on doing things with IE.
Does anyone have an idea how I could achieve this?
Thanks!
Sabine
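One minimal way this might be done, sketched below: a small PHP script that fetches a fixed list of rendered pages over HTTP and writes them to disk, so it can simply be re-run whenever the copies need refreshing. This is untested; the wiki URL and article titles are placeholders, and it assumes PHP with allow_url_fopen enabled.

<?php
# Sketch only: fetch a fixed list of rendered article pages and save them
# locally. Re-running the script refreshes the saved copies.
$wiki   = 'http://en.wikipedia.org/wiki/';   # placeholder wiki URL
$titles = array( 'Translation', 'Wiki' );    # placeholder page titles
foreach ( $titles as $title ) {
    $html = file_get_contents( $wiki . urlencode( $title ) );
    if ( $html === false ) {
        echo "Failed to fetch $title\n";
        continue;
    }
    $fp = fopen( $title . '.html', 'w' );    # fopen/fwrite also works on PHP 4
    fwrite( $fp, $html );
    fclose( $fp );
    echo "Saved $title\n";
}
?>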
I just wanted to announce that my PHP-based wiki-to-XML converter
* now supports the whole syntax,
* is now in the "php" directory of the CVS module "wiki2xml", and
* can be tested at http://magnusmanske.de/wiki2xml/w2x.php
You can enter either raw wikitext or a list of article titles.
Templates can be automatically resolved (which is necessary for some
pages, as otherwise the wiki syntax is invalid and rendered as plain
text). Article and template texts are fetched from the given MediaWiki site.
Please report any bugs you find. I will now start trying (again) to
write a converter to the OpenDocument format. Any help would be appreciated.
Magnus
Hello,
I have a little question about automatically creating a new wiki page from a
self-made form. I have reverse engineered the database design. It
works well with my own inserts into the wikidb, but I want to use MediaWiki
functions to do this. At the moment I am using the latest stable version
(1.5.2).
Could someone give me a hint on how to do this? It is very hard for me to study
the wiki code without any good developer documentation. I found nothing
on that issue at the Meta wiki.
I think the solution is near:
Line 376 in EditPage.php:
# If article is new, insert it.
$aid = $this->mTitle->getArticleID( GAID_FOR_UPDATE );
....
I hope someone can help, otherwise I will need more sleepless nights :D
Greetings,
Florian Keller
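A rough, untested sketch of what a small command-line script using the wiki classes (rather than raw SQL) might look like. The use of Article::insertNewArticle()/updateArticle() is an assumption about the 1.5.x API, and the title and text are placeholders.

<?php
# Sketch only: create or update a page through the MediaWiki classes.
# Place in maintenance/ and run from the command line.
require_once( 'commandLine.inc' );   # bootstraps MediaWiki for CLI scripts

$titleText = 'My generated page';                     # placeholder
$text      = "Text coming from the external form.";   # placeholder

$title = Title::newFromText( $titleText );
if ( is_null( $title ) ) {
    die( "Invalid title\n" );
}

$article = new Article( $title );
if ( $title->getArticleID() == 0 ) {
    # assumed 1.5 signature: ( text, summary, is minor, watch this )
    $article->insertNewArticle( $text, 'Created from form', false, false );
} else {
    $article->updateArticle( $text, 'Updated from form', false, false );
}
?>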
Hi all,
Can somebody help me with the following:
In Wikipedia, when we view the page of an article, it displays the
content of the page and not the metadata such as author(s), page
timestamps, etc. Part of this metadata is visible in the history
section, where it displays all the page revisions. I want to change the
rendering of the page so that we have a new tab (say "metadata") where
all the metadata information regarding that page is displayed.
I would like to know the procedure for how this can be done. I am using
MediaWiki 1.5 and have already tried hacking through the code. I have a
vague idea, but it is not very clear and concrete. Your advice, instructions,
or help will be highly appreciated.
Thanking you,
C
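One possible starting point in 1.5, as a rough, untested sketch: the tabs are built in includes/SkinTemplate.php, and a new entry could be added there. The array keys below are assumptions based on how the existing tabs appear to be built, and the new action would still need to be handled (the action dispatch lives in includes/Wiki.php in 1.5) to actually render the revision data.

# Sketch only: add a "metadata" tab in includes/SkinTemplate.php, inside
# buildContentActionUrls(), next to the existing edit/history tabs.
$content_actions['metadata'] = array(
    'class' => false,
    'text'  => 'metadata',                                      # tab label
    'href'  => $this->mTitle->getLocalUrl( 'action=metadata' )  # new custom action
);
# A handler for action=metadata would then query the revision data
# (author, timestamps, etc.) for the page and print it.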
In many modern wikis (such as MediaWiki), every version of a page
through its life is made accessible. There are many options for how to
store this information, the simplest being to store a complete copy
of every version. Another popular option is grouping old versions
into one record and gzipping the stack together: block compression.
Versions can be gzipped by themselves, but this isn't a huge win for
most articles. Duplicate versions can be eliminated, diffs can be
computed... there are many options.
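For concreteness, block compression might look roughly like this toy sketch (the grouping size and storage format are arbitrary choices here):

# Toy sketch: pack a group of consecutive revisions into one gzipped blob
# instead of storing each full text separately.
function pack_block( $revision_texts ) {              # e.g. 5 old versions of a page
    return gzcompress( serialize( $revision_texts ), 9 );
}
function unpack_block( $blob ) {
    return unserialize( gzuncompress( $blob ) );      # back to an array of full texts
}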
Every one of these options has a time/space tradeoff. Different
options are better for different access patterns. For Wikipedia we
mostly use whole versions, with a little bit of block compression
thrown in every once in a while for good measure. This makes deletion
fairly easy, although the requirement to truly support deletion is not
that clear. It is believed that this has reasonable performance
characteristics, but I will explain at the end that it might not be
ideal even from a performance perspective. It is obviously not ideal
from a space perspective.
I'd like to propose a new framework for storage of archive versions. I
will then back up this concept with some measurements.
When a page is moved into the archive (or some time before), we compute
its cryptographic hash. If the archive already has that hash, we are
done. This eliminates the bit-identical duplicates that come out of
reverting.
If the hash is new, we compute a binary delta between the new version
and the previous version of the page. The previous version is obtained
from the version_cache (discussed below). If it is the first version, we
diff against the empty string. We store an archive row:
new_page_hash, old_page_hash, delta_blob.
We then insert the new page into the version_cache (keyed on the hash). If
the page is large (100K?), it is gzipped before inserting, if the storage
backend doesn't do this for us (some DBs, like PostgreSQL, do). It
is flagged to mark that it is 'second from top' in its record. We
update the entry for the previous version to indicate that it is no
longer second from the top.
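In pseudo-PHP, the write path described above might look roughly like this. It is only a sketch: archive_has(), archive_insert(), current_hash(), version_cache_get()/put() and version_cache_set_flag() are hypothetical helpers, and the binary delta uses the PECL xdiff extension.

# Sketch of the write path: dedupe by hash, delta against the previous
# version, store the delta row, and cache the new full text.
function archive_version( $page_id, $new_text ) {
    $new_hash = sha1( $new_text );
    if ( archive_has( $new_hash ) ) {
        return $new_hash;                     # bit-identical duplicate (e.g. a revert)
    }
    $old_hash = current_hash( $page_id );     # hash of the previous version, if any
    # 'second from top' entries are never purged, so this lookup should hit.
    $old_text = ( $old_hash === null ) ? '' : version_cache_get( $old_hash );
    $delta = xdiff_string_bdiff( $old_text, $new_text );
    archive_insert( $new_hash, $old_hash, $delta );   # row: new_hash, old_hash, delta_blob
    version_cache_put( $new_hash, $new_text, true );  # flag: now 'second from top'
    if ( $old_hash !== null ) {
        version_cache_set_flag( $old_hash, false );   # previous one no longer is
    }
    return $new_hash;
}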
When someone requests a non-current version, we check the
version_cache. If the version is there, we return it. We might keep a
hit counter or a last-used date; if we do, we update that. If it is
not in the cache, we find its delta and check whether the version it was
diffed against is in the cache. We walk back through deltas until we find
the most recent version in the cache. We then apply the diffs forward to
generate all versions up to the desired one. All generated versions are
inserted into the version cache, although the hit count/last-used on the
intermediate versions should not be set as high as on the desired version,
if we maintain those.
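And the corresponding read path, again as a sketch: version_cache_get()/put() and archive_get_delta() are hypothetical helpers, and xdiff_string_bpatch() is the PECL xdiff counterpart to the delta above.

# Sketch of the read path: walk back through deltas until a cached full text
# is found, then re-apply them forward, caching the intermediate versions.
function fetch_version( $hash ) {
    $chain = array();                     # deltas to re-apply, newest first
    $h = $hash;
    while ( ( $text = version_cache_get( $h ) ) === null ) {
        list( $old_hash, $delta ) = archive_get_delta( $h );
        $chain[] = array( $h, $delta );
        if ( $old_hash === null ) {       # first version: base is the empty string
            $text = '';
            break;
        }
        $h = $old_hash;
    }
    foreach ( array_reverse( $chain ) as $step ) {    # oldest delta first
        list( $h, $delta ) = $step;
        $text = xdiff_string_bpatch( $text, $delta );
        version_cache_put( $h, $text );
    }
    return $text;
}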
Periodically, or based on fixed storage pressure, objects in the
version cache are purged. 'Second from the top' objects are never
purged; objects used less often and less recently are dropped first.
However, we try to maintain a cached version for every 50 (a tunable?)
revisions of a page or so.
So that's basically the idea. Now I want to show you some data that
makes this proposal compelling.
Let us consider the article "Anarchism" on enwiki. It's reasonably
large (average 57K over its entire life, 66K currently) but not huge,
and it has had a reasonably large number of revisions (4370 in the 10/29
dump)... it's also near the top of the dump, which makes it quick to
extract. :) It's not subject to an abnormal amount of edit warring or
vandalism... pretty typical of what we probably want articles to be.
The concatenation of all versions of the article is 242MB in size.
This is how much it would take to store ideally without any
compression. In reality the storage size for this article would be
much larger due to non-ideal packing.
If we gzip -9 the concatenation, it is reduced to 80MB. This
represents the savings we could get with completely ideal block
compression. If we group the article into blocks of 5 revisions, we
find the storage to require 85MB, which is a little more realistic.
If we store based on the content hash, we eliminate 673 versions.
Without gzipping, the size is now 206MB. Gzip -9ing each version alone
reduces it to 69MB. (I didn't measure block compression for this one.)
If we compute a diff -du for each version with a new hash against the
previous version, we find the concatenation of the diffs is 15MB.
Gzipping the diffs one at a time gives us a 5.2MB file, while gzipping
in blocks of 5 gives us 3MB.
If we use bsdiff (http://www.daemonology.net/bsdiff/ -- fast and
efficient, if you ignore the anti-social license) rather than diff -du,
and work on the non-duplicate files, we get a total output of 1.4MB.
Plus, bsdiff is much faster than diff -du, and much faster to apply
than patch.
If we use xdelta (1.1.3 tested), the total is 1.5MB. Xdelta is also
much faster than diff/patch. If we disable gzip in xdelta and block
compress in groups of 5, the total is 968K. We can get that down to
500K in blocks of 100 with lzma, and to 378K if we lzma all the deltas
together. It takes my system about 4 seconds to apply all 3697 deltas
and recover the most recent version starting from the empty string, and
I suspect most of that time is spent in fork(), since I'm starting xdelta
once per diff. If our cache tends to keep one full version around for
every 50, that will only add a couple of megs of storage.
So frankly, given the size of the current dumps and these results, I
believe it is likely that we could get the entire
working set of Wikipedia article text into RAM on the sort of hardware
we could easily afford (a couple of 1U AMD64 boxes running some
imaginary non-sucky version of memcached). Based on these numbers, the
growth we've seen, and the cost curves for DRAM, I think we could
continue to keep our working set in RAM for the foreseeable future.
The computational costs of my proposed model can be substantially
reduced by smart cache management, and are in any case infinitesimal
compared to the reduction in I/O cost from getting roughly 160:1
compression (242MB of raw text down to about 1.5MB of deltas) and
keeping the entire working set in RAM.
As the wiki grows the gains will only be greater, and there will be
less and less interest in most old page versions.
Thoughts?
Hello,
I tried to load the Wikipedia 20051105 data dump into my MySQL database with an
installation of MediaWiki 1.5 and MySQL 4.0.26.
I used the command
56:~/Desktop mgreiner$ nohup java -server -jar mwdumper.jar --format=sql:1.5
20051105_pages_full.xml.bz2 | mysql wikidb &
and received the following error:
56:~/Desktop mgreiner$ ERROR 1064 at line 82: You have an error in your SQL
syntax. Check the manual that corresponds to your MySQL server version for the
right syntax to use near ''\'\'\'Anarchism\'\'\' is a generic term describing
various rev
I suspect that it has something to do with the version of MySQL. What version is
recommended for installing Wikipedia?
I hope that somebody can help me.
Martina Greiner