Wikitech-l July 2003

wikitech-l@lists.wikimedia.org

71 participants
153 discussions

New Installation
by Fred Bauder 05 Jul '03

05 Jul '03

Would some of you please look at the new installation at http://www.internet-encyclopedia.org and see if you have any ideas on how to fix the bugs in it? It is using NewCodeBase, not Phase 3. Note that on the main page the top portion of the page is dead; that is, the links in the first few inches don't work. Also, the popdown menu (when you have it in a stylesheet that does work) always sends you back to the main page rather than doing what you selected. Thanks, Fred Bauder

4 5

article on heise.de
by Kurt Jansson 05 Jul '03

05 Jul '03

Okay, there's an article on heise.de about the German Wikipedia (20,000 articles). And our speed is going doooowwwwwn. Please do what you can to speed things up. Please! Kurt

1 0

Tarquin's interlanguage link feature & dynamic dates
by Tim Starling 04 Jul '03

04 Jul '03

The dynamic date code now seems to be working satisfactorily. Thanks to Patrick and Wapcaplet for sorting me out on punctuation use. Additionally, I've implemented Tarquin's "quick hack" to make linking many translations together easier. Specifically, self links are ignored, so you can just copy the same block of text to every language. What's the standard procedure from here? How long are new features usually left on the test server before updating the real one? Note that the dynamic dates feature has already been discussed, and even announced at the village pump, so the only reason to wait before uploading is to check for bugs. Oh, I almost forgot. In fact I did forget to put it in the CVS summary. I fixed that <pre>\0</pre> bug I reported at the village pump. In the replacement string of preg_replace, backslashes need to be escaped, not just dollar signs. -- Tim Starling.
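The backslash pitfall mentioned above is not specific to PHP. As an analogy only (this is not the CVS fix itself), here is a minimal Python sketch of the same class of bug with `re.sub`, where a backslash in the replacement string also starts an escape. The helper names `literal_replacement` and `wrap_pre` and the `@@PRE@@` marker are hypothetical, chosen for illustration.

```python
import re


def literal_replacement(text: str) -> str:
    """Double every backslash so re.sub inserts `text` verbatim.

    In a re.sub replacement string a backslash begins an escape sequence,
    so a literal backslash has to be written as two.
    """
    return text.replace("\\", "\\\\")


def wrap_pre(source: str, saved_block: str) -> str:
    # Hypothetical placeholder step: put a saved <pre> block back into the
    # page text at the spot where a marker was left earlier.
    return re.sub(r"@@PRE@@", literal_replacement(saved_block), source)
```

In Python only the backslash needs this treatment; in PHP's preg_replace, as the post says, dollar signs need it as well. Passing a function as the replacement (for example `lambda m: saved_block`) avoids the escaping question altogether.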

2 1

"Edit this section", TOC implemented
by erik_moeller＠gmx.de 04 Jul '03

04 Jul '03

I've just committed a bunch of new stuff to CVS:

1) "Edit this section". If you enable the user preference "Show links for editing individual sections", you get little "edit" links under each article section. These can be used to fetch just the text of that section, and edit it. No more scrolling through 30 K articles to edit a typo. No more wading through long discussion threads to add a short comment (provided they are organized using headlines).

2) Automatic table of contents. If the option "Show table of contents for articles with more than 3 headings" is enabled, a small TOC is added on top of the page with navigation links to the individual sections.

From this it follows logically that we now have

3) Anchors for each article section (named after the section title, e.g. "External links" becomes "External_links"). So you can link to these from elsewhere.

Regards,
Erik
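A minimal sketch of how the anchor naming and the "more than 3 headings" rule could fit together. It is written in Python rather than the project's PHP, and it assumes headings are written as `== Title ==`; the names `HEADING_RE`, `anchor_for` and `build_toc` are hypothetical and not taken from the committed code.

```python
import re

# Assumed heading markup: "== Title ==", with more "=" for deeper levels.
HEADING_RE = re.compile(r"^(={2,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def anchor_for(title: str) -> str:
    """Turn a section title into an anchor name: spaces become underscores."""
    return title.strip().replace(" ", "_")


def build_toc(wikitext: str, min_headings: int = 4) -> list[tuple[int, str, str]]:
    """Return (level, title, anchor) entries, or [] for short articles.

    The preference reads "more than 3 headings", hence the default of 4.
    """
    entries = [
        (len(m.group(1)), m.group(2), anchor_for(m.group(2)))
        for m in HEADING_RE.finditer(wikitext)
    ]
    return entries if len(entries) >= min_headings else []
```

The same `anchor_for` result would serve both as the target of the TOC links and as the fragment to use when linking to a section from elsewhere.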

13 32

WikimediaOne login (was Re: Development tasks)
by Daniel Mayer 04 Jul '03

04 Jul '03

Thomas Corell wrote:
> There is the problem that some users want and have different usernames in
> different wikis. The only way to handle this is keeping all those 'local'
> users and first of all allow additional single sign-on on the 'Master user
> DB' which should handle not only single sign-on, but even all those local
> user accounts.

Couldn't we use the wikimedia.org domain name for handling user logins? That way a user only has to sign into their WikimediaOne login and will then be logged in to every Wikimedia project and subproject.

But before that happens (and periodically after) we may want to flush out several thousand en.wikipedia user accounts that 1) have been idle for more than, say, 3 months and 2) have fewer than 10 total edits since the account was created. That would free up many user names so that actual contributors can use them.

-- Daniel Mayer (aka mav)
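The two flush-out criteria can be stated precisely in a few lines. This is only a sketch in Python: the `Account` record and the `reclaimable` function are hypothetical, and "3 months" is approximated as 90 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    name: str
    last_active: datetime
    edit_count: int


def reclaimable(accounts, now, max_idle=timedelta(days=90), min_edits=10):
    """Names of accounts that are BOTH idle too long AND barely used."""
    return [
        a.name
        for a in accounts
        if now - a.last_active > max_idle and a.edit_count < min_edits
    ]
```

Because both conditions must hold, a long-idle account with hundreds of edits keeps its name, and so does a brand-new account with only one edit.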

13 23

RE: [Wikitech-l] Problems with Table of Contents and possible solution
by Dreyer, Jason 03 Jul '03

03 Jul '03

David Friedland wrote:
>
> I propose that whether and where a TOC appears should be specified
> by each article using a wiki tag indicating where the TOC should appear.
> This would allow articles with many short sections to not have a TOC,
> long articles with just a few sections to have a TOC, and allow article
> authors/editors to specify the best location for the TOC without relying
> upon the wiki software to presume the very top of an article is the best
> place.

Some examples of an implementation like this can be found at IAwiki:
http://www.iawiki.net/IAwikiIA
http://www.iawiki.net/TextFormattingRules

-Jason Dreyer

1 0

DE-Wikipedia slow
by Thomas Luft 03 Jul '03

03 Jul '03

Hi,

at the moment (17:00 CEST) the German Wikipedia is very slow, and so is fr. The English Wikipedia is running fine. Could someone have a look at this?

Thanks
Thomas aka Urbanus

--
Thomas Luft, Burgholzweg 97, D-72070 Tuebingen, GERMANY
Email: tluft(a)web.de Phone: +49 7071 408908 Fax: 408909

2 1

memcached
by Lightning 03 Jul '03

03 Jul '03

I noticed yesterday that Danga Interactive has released PHP bindings for memcached, which is, well, really cool. memcached is an in-memory caching system for any type of data. It acts as a middle man between the application and the DB, and transparently caches objects. It's currently being used at LiveJournal, and if you are an LJ user you might have noticed the big speedup it caused.

It's pretty cool how it works, and I think that Wikipedia could really benefit from it, since DB load seems to be our main bottleneck. I can see memcached storing pretty much all the cur versions, which would cut down DB use on reads a whole lot. I just don't know if there is enough memory to run this; maybe if a 3rd server is ever acquired?

Wikipedia is growing fast, and so are the traffic and the number of edits per day. This is great for the project, but it's apparently killing the servers. There are a lot of hacks right now to reduce load (non-dynamic caching, miser mode). But I believe that this application could cut out a lot of the server load without reducing any functionality.

Any thoughts, anyone?

Lightning

http://www.danga.com/memcached/
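To make the "middle man" idea concrete, here is a minimal read-through cache sketch in Python. A plain dict stands in for the memcached server, so none of this is the real memcached client API; the `ArticleCache` class and its method names are hypothetical.

```python
class ArticleCache:
    """Read-through cache in front of a slower article store."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # callable: title -> text
        self._cache = {}                   # stand-in for a memcached client
        self.db_reads = 0                  # counts how often the DB is hit

    def get(self, title):
        # Serve from memory when possible; fall back to the DB once.
        if title in self._cache:
            return self._cache[title]
        self.db_reads += 1
        text = self._load_from_db(title)
        self._cache[title] = text
        return text

    def save(self, title, text, write_to_db):
        # Write to the DB first, then refresh the cached copy so that
        # readers never see a stale current version.
        write_to_db(title, text)
        self._cache[title] = text
```

With this shape, repeated reads of a popular article cost one database query in total, and only edits touch the database again.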

1 0

Database design
by Timwi 03 Jul '03

03 Jul '03

Now that my own proposal to re-write the entire code in Perl again was met with great resistance, I propose to at least create an entirely new database structure, and then adapt the current code to it.

I have studied the current database structure and see the following rather severe problems with it:

* BLOBs that store article text are combined in the same table as meta-data (e.g. date, username of a change, change summary, minor flag, etc.). This is bad because variable-length fields like BLOBs negatively affect the performance of reading the table. Pages like the watchlist should not have to bother with variable-length data such as article text and would run a lot faster if they could get their data entirely from fixed-length rows.

* Currently, all user properties and preferences, as well as article properties, are columns in a table. Although this is not a problem in terms of DB-reading performance, introducing a new user preference or other enhancement involves adding a new column, and adding a new column becomes extremely database-intensive as the database grows. LiveJournal uses a very clever system that will easily remedy this. One table, 'userproplist', stores the possible user properties (userprops), and another table, 'userprop', stores what user has what userprop with what value. This way all that is needed for adding a new userprop is adding a single row to the 'userproplist' table. The same would analogously apply to articleprops. Once we have that, the user table will hopefully remain very small (in terms of number of columns), so looking up a username (to name just an example) would be ridiculously efficient.

* BLOBs that fulfill the same function (article text) are scattered across two tables (cur and old). This is bad because it means variable-length text has to be moved across tables on every edit. Very slow. Better to give every version/revision of an article (i.e. each item of article text) a permanent ID and use those IDs in the 'cur' and 'old' tables instead. Then have one large table, 'articletext' perhaps, mapping the IDs to their actual BLOBs. This eliminates the need to ever delete a BLOB (except perhaps when actually deleting an article with all its history, which is rare enough). Additionally, there isn't really a need for separate 'cur' and 'old' tables, especially when memcached can take care of the most recent versions.

* You are using a 'recentchanges' table which, I presume, also gets updated with every edit. This, I assume is the idea behind it, allows the 'Recent Changes' page to quickly grab the most recent changes without having to find them elsewhere in the DB. Contrary to intuition, this is a bad idea. It is always a better idea to optimise for fewer DB writes even if it means a few more DB reads, because writes are so much slower. (I am so sure of this because LiveJournal has had exactly this experience with their "Friends Page": grabbing entries from all the friends' journals all over the place in the DB is faster than updating a "hint table" with every newly created entry.)

In addition to these existing problems, there are of course things that were planned but that the database cannot currently handle. While we're changing the DB, we could also add the following functionality to it:

* Store translated website text, so translators don't have to dig through PHP code and submit a file to the mailing list.

* A global table for bidirectional inter-wiki links. People should not have to add the same link to so many articles. In fact, taking this a step further, people should not even have to enter text like '[[fr:démocratie]]' into the article text when it's not part of the article text. There should be drop-downs underneath the big textbox listing languages, and little text boxes next to them for the target article name.

Are you all still convinced that adapting the current code to all these radical changes is easier than rewriting it all from scratch? :-)

Anyway. I'm tired. I'm going to bed. Good night.

Timwi
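The property-table and text-table ideas above can be sketched as a small schema. This is an illustration only, using Python's built-in sqlite3 as a stand-in for the production database: the table names 'userproplist', 'userprop' and 'articletext' come from the proposal, while the column names, the merged 'revision' table and the helper functions are assumptions. SQLite does not distinguish fixed-length from variable-length rows, so the sketch shows the separation of concerns, not the performance effect.

```python
import sqlite3

SCHEMA = """
CREATE TABLE userproplist (
    prop_id INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL
);
CREATE TABLE userprop (
    user_id INTEGER NOT NULL,
    prop_id INTEGER NOT NULL REFERENCES userproplist(prop_id),
    value   TEXT NOT NULL,
    PRIMARY KEY (user_id, prop_id)
);
-- Article text lives on its own, keyed by a permanent ID.
CREATE TABLE articletext (
    text_id INTEGER PRIMARY KEY,
    body    BLOB NOT NULL
);
-- Metadata rows only point at the text (assumed merge of 'cur' and 'old').
CREATE TABLE revision (
    rev_id   INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    username TEXT NOT NULL,
    summary  TEXT NOT NULL,
    minor    INTEGER NOT NULL,
    text_id  INTEGER NOT NULL REFERENCES articletext(text_id)
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_userprop_kind(conn, name):
    """Introduce a new preference: one inserted row, no ALTER TABLE."""
    cur = conn.execute("INSERT INTO userproplist (name) VALUES (?)", (name,))
    return cur.lastrowid


def set_userprop(conn, user_id, prop_id, value):
    conn.execute(
        "INSERT OR REPLACE INTO userprop (user_id, prop_id, value) "
        "VALUES (?, ?, ?)",
        (user_id, prop_id, value),
    )


def get_userprop(conn, user_id, name):
    row = conn.execute(
        "SELECT up.value FROM userprop up "
        "JOIN userproplist pl ON pl.prop_id = up.prop_id "
        "WHERE up.user_id = ? AND pl.name = ?",
        (user_id, name),
    ).fetchone()
    return row[0] if row else None
```

A page such as the watchlist would then query only 'revision', and the BLOB table is touched only when article text is actually displayed or saved.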

16 39

Re: [Wikitech-l] Re: Dynamic dates
by Tim Starling 02 Jul '03

02 Jul '03

>Actually, I looked at the code to make some changes to my private language
>file and didn't understand it. I know somehow what the code does but not
>how, because I didn't want to spend that much time decoding all those
>regexps that make the code pretty cryptic and hard to maintain - with the
>added bonus of not having /any/ documentation.

In case anyone's wondering, yes, the code is in CVS now. But it's not finished: there's still at least one more update to Language.php to come. Sorry about that, Jens, but you'd probably be better off commenting the whole section out for now. As for not having any documentation, I can fix that in my next commit.

>Why not rearrange the code into two segments: Segment one would parse the
>linked date like present in the article into three numerical values: day,
>month, year. The second segment then would only have to concat those values
>(maybe with a converted month name) into the final form. This form could be
>given in the usual "$1 $2 $3 $4" syntax (with e.g. $4 being the month
>name). That also would take some load off the server that now has to
>interpret lots of regexp patterns, which isn't too cheap because these
>patterns normally first get converted into some form of a finite automaton
>that is eventually given the input string for processing. (At least the
>code by Tatu Ylonen proceeds in this way.)

No, I think your way would be slower, not faster. There's a "pre-analysis" flag you're meant to use if you run the same regular expression many times, but according to the PHP manual it only provides a speedup if the regular expression starts with something other than a fixed character. So it sounds like there's no compilation. Besides, if regular expressions were slow, well, the rest of the parser would be in trouble.

By splitting the replacing from the parsing, you're just turning one job into two. The same tasks are still required; there's just some extra administrative overhead. Plus, since PHP is an interpreted language, I'd really rather have the O(N) code written in C.

As for your way being simpler, well, I would contest that too. You code it your way, and we'll put them side by side. My way really is pretty simple if you understand Perl regular expressions.

-- Tim Starling.
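For readers who have not seen the approach being defended, here is a minimal sketch of the "one job" style: a single regular-expression substitution that both recognises a linked date and rewrites it, with no separate parse step. It is in Python rather than PHP, and it assumes dates are linked as `[[July 4]]` or `[[4 July]]`; the pattern and function names are hypothetical and are not the code in Language.php.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Assumed link forms: [[July 4]] and [[4 July]].
MONTH_DAY = re.compile(r"\[\[(%s) (\d{1,2})\]\]" % MONTHS)
DAY_MONTH = re.compile(r"\[\[(\d{1,2}) (%s)\]\]" % MONTHS)


def to_day_month(text: str) -> str:
    """Rewrite [[July 4]] as [[4 July]] in one substitution."""
    return MONTH_DAY.sub(r"[[\2 \1]]", text)


def to_month_day(text: str) -> str:
    """Rewrite [[4 July]] as [[July 4]] in one substitution."""
    return DAY_MONTH.sub(r"[[\2 \1]]", text)
```

The scanning loop runs inside the regex engine, which is compiled C in both PHP and Python; that is the point of the O(N) remark above. The two-segment alternative would first extract day and month as values and then format them in interpreted code.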

1 0
