Wikitech-l April 2003

wikitech-l@lists.wikimedia.org

47 participants
72 discussions

by Thomas Corell

In the German Wikipedia, a list of the qualifiers used in article titles was discussed, and most of the participants think it would be an interesting feature. I know I have expressed it badly, so here is an example: Cell (biology) is a homonym (cell) with a qualifier (biology). A list would be very helpful for getting a proper overview of those qualifiers and for modifying or eliminating wrong ones. The result of the discussion was that a unique list of the qualifiers from titles (table cur) and bl_to (table brokenlinks) would make sense.

Unfortunately I can only give you a proper PostgreSQL select statement (table cur only), but possibly someone can easily transfer this to MySQL:

======== WARNING: THIS IS NOT A VALID STATEMENT FOR WIKIPEDIA ==========
SELECT DISTINCT substring(cur_title FROM '.+\\((.+)\\)') AS p FROM cur;
========================= I WARNED YOU =================================
-- "\\(" ==> ( (needed for quoting)
-- (.+) ==> the () - construct is used to select the part substring will return.

For the titles "foo", "foo (bar)", "foo2 (bar)" and "bar (foo)" the result will be "bar" and "foo".

This should only show the as-is state of these qualifiers! There is no intention for any automated process to enforce them, because no Wikipedian should get an error like "Qualifier not allowed" or so. This page is only for administrative and informational purposes!

Of course additional features, like showing the matched pages and others, would be nice, but the discussion must go on further there, IMHO. If there are more questions, ask, and I will try to answer them.

Smurf
--
------------------------- Anthill inside! ---------------------------
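For readers without a PostgreSQL instance at hand, the same extraction can be sketched in Python. This is only an illustration of the idea in the message, not code from the Wikipedia software: the function name `qualifiers` is made up here, and the titles are passed in as a plain list instead of being read from the cur table.

```python
import re

# Mirrors the pattern in the SELECT above: some text, then a
# parenthesised part whose contents are captured as the qualifier.
QUALIFIER_RE = re.compile(r".+\((.+)\)")


def qualifiers(titles):
    """Return the distinct qualifiers found in the titles, sorted."""
    found = set()
    for title in titles:
        match = QUALIFIER_RE.search(title)
        if match:  # titles without a qualifier are simply skipped
            found.add(match.group(1))
    return sorted(found)


if __name__ == "__main__":
    print(qualifiers(["foo", "foo (bar)", "foo2 (bar)", "bar (foo)"]))
```

Using a set gives the DISTINCT behaviour; sorting is only there to make the output stable.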

20 years, 11 months

Patch for LanguageDe.php

by Thomas Corell

Some typos, as usual.

Smurf
--
------------------------- Anthill inside! ---------------------------

69c69
< "et" => "Esti",
---
> "et" => "Eesti",
402c402
< "prevn" => "letzte $1",
---
> "prevn" => "vorherige $1",

20 years, 11 months

Developer limits

by Anthere

Hum, I am a bit embarrassed here. But well... We have a bottleneck problem, I fear.

Utilisateur:Alvaro asked to be a sysop on the French wiki quite a while ago, and nobody spoke against it (indeed, several spoke for him). Then, since this is asked on our pump, he was at first forgotten for a while and had to kindly ask again. So I put a message asking for him to be made sysop in the database on your page, Brion, on the English Wikipedia, as well as on the metapedia, if I remember correctly. He still is not a sysop.

It is clear that it is a pain for you developers to do this type of chore; Ed mentioned it several times. So we wait, and ask again, and again.

But then, isn't there a way for us to make people sysops on the international wikipedias ourselves? Could there be something like a query which would allow a French sysop to make a French user (just an example, of course) a sysop automatically? Maybe a list of users could be displayed, and a sysop could click on the name of a user to make him a sysop? Of course, it would then be absolutely required that a log is clearly visible to everyone, to avoid abuse. Is something like this possible or not?

Meanwhile, could someone please make Alvaro a sysop?

Also, as Aoineko mentioned previously, the misspelling page is quite broken, and it has never worked well since we moved to phase III (31st of October). It was already reported. I know you are overbooked. Is there something we can do (ok, not me, but there are some developers among us) to make that work? Do the other international wikipedias using accented letters have the problem as well?

Anthere

20 years, 12 months

Caching

by Geoffrey Thomas

A suggestion for caching: the home page, [[Main Page]], never should include links to nonexistent pages. Could that be cached (in standard stylesheet), e.g., as /index.html? I would think that page is accessed very frequently and that caching it would save some database work for other pages.

When the page is edited (only by a sysop), he/she would render the page to HTML, possibly by explicitly accessing /wiki/Main_Page, and save it as /index.html. Would this help some server strain?

Could we also cache other pages known to link only to existing pages, such as this week in dates ([[April 30]], etc.), [[Current events]], and [[Recent deaths]]? Or are these pages updated too often to be useful? Could we also cache the protected pages?

-[[User:Geoffrey]]

20 years, 12 months

Possible performance issue?

by Nick Reinking

I notice that /usr (/dev/sda2) is at 96%. ext2 has some pretty bad problems with fragmentation once it gets above a certain percentage. This can cause some pretty bad performance problems. Once it has fragmented, it is difficult to get it back to a contiguous state. There are defrag programs, but they are fairly scary. The only other way to get it back to normal is to back everything up, mkfs, and restore it. Perhaps somebody can remove a bunch of the packages that are installed that we don't use?

--
Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN

20 years, 12 months

[Sales@Computers4SURE.com: We've received your order #C030450733, Jimmy. Thank you.]

by Jimmy Wales

Motherboard for the server we're going to use as Wikipedia front end. I haven't heard from Jason yet as to when he can make the trip to install it, but possibly Friday or Monday.

----- Forwarded message from Sales(a)Computers4SURE.com -----

From: <Sales(a)Computers4SURE.com>
Date: Tue, 29 Apr 2003 11:29:37 -0400
To: <jwales(a)bomis.com>
Subject: We've received your order #C030450733, Jimmy. Thank you.

Thank you for shopping with Computers4SURE.com. We would like to confirm that your order has been received.

On 29-Apr-2003, you ordered:

Item                          Quantity  Price
----------------------------------------------------------
SERVERWORKS LE-T DUAL PGA370         1  $485.95

-----------------SNIP-----------------------

20 years, 12 months

Re: server names

by Daniel Mayer

How about Yin and Yang? Or chaos and opportunity? -- mav

20 years, 12 months

RE: [Wikitech-l] Ad-hoc changes

by Mark Christensen

The only issue I see would be if database threads crowd out apache on the old server, and all the foreign wikis go down while the English wiki stays up. This could create new tensions with the foreign wikis.

-----Original Message-----
From: Lee Daniel Crocker [mailto:lee@piclab.com]
Sent: Tuesday, April 29, 2003 5:13 PM
To: wikitech-l(a)wikipedia.org
Subject: Re: [Wikitech-l] Ad-hoc changes

> (Brion Vibber <vibber(a)aludra.usc.edu>):
> [Configuration info]

Here's a thought: when we get the new server up, let's install the English wiki cleanly from the distribution, make sure it's all happy, then do the switch, _leaving the foreign wikis on the old server_ for a while, then do the same thing for each of the foreign ones in turn, updating them to the newest software, and making everything in sync, before we shut down apache on the old server. Any problems with that?

--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past, are placed irrevocably in the public domain, and may be used or modified for any purpose, without permission, attribution, or notification."--LDC

20 years, 12 months

Chat about Wikipedia performance?

by David A. Wheeler

Hi - clearly, it'd be great if Wikipedia had better performance.

I looked at some of the "Database benchmarks" postings, but I don't see any analysis of what's causing the ACTUAL bottlenecks on the real system (with many users & full database). Has someone done that analysis?

I suspect you guys have considered far more options, but as a newcomer who's just read the source code documentation, maybe some of these ideas will be helpful:

1. Perhaps for simple reads of the current article (cur), you could completely skip using MySQL and use the filesystem instead. Simple encyclopedia articles could be simply stored in the filesystem, one article per file. To avoid the huge directory problem (which many filesystems don't handle well, though Reiser does), you could use the terminfo trick: create subdirectories for the first, second, and maybe even the third characters. E.g., "Europe" is in "wiki/E/u/r/Europe.text". The existence of a file can be used as the link test. This may or may not be faster than MySQL, but it's probably faster: the OS developers have been optimizing file access for a very long time, and instead of having userspace<->kernel<->userspace interaction, it's userspace<->kernel interaction. You also completely avoid locking and other joyless issues.

2. The generation of HTML from the Wiki format could be cached, as has been discussed. It could also be sped up, e.g., by rewriting it in flex. I suspect it'd be easy to rewrite the translation of Wiki to HTML in flex and produce something quite fast. My "html2wikipedia" is written in flex - it's really fast and didn't take long to write. The real problem is, I suspect that isn't the bottleneck.

3. You could start sending out text ASAP, instead of batching it. Many browsers start displaying text as it's available, so to users it might _feel_ faster. Also, holding text in-memory may create memory pressure that forces more useful stuff out of memory.

Anyway, I don't know if these ideas are all that helpful, but I hope they are.
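The directory layout from idea 1 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not anything from the Wikipedia code base: the helper names `article_path`, `save_article` and `article_exists` are hypothetical, and titles containing "/" or other characters that are awkward in file names are not handled.

```python
from pathlib import Path


def article_path(root, title, depth=3):
    """Map a title to root/<1st char>/<2nd char>/<3rd char>/<title>.text."""
    # Assumption: the title is safe to use as a file name (no "/" etc.).
    prefix_dirs = list(title[:depth])
    return Path(root, *prefix_dirs, f"{title}.text")


def save_article(root, title, text):
    """Write the current text of an article, creating directories as needed."""
    path = article_path(root, title)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def article_exists(root, title):
    """The link test from the message: a link is live if the file exists."""
    return article_path(root, title).is_file()


if __name__ == "__main__":
    print(article_path("wiki", "Europe").as_posix())
```

Titles shorter than three characters simply get fewer directory levels, so "Pi" would land in "wiki/P/i/Pi.text".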

20 years, 12 months

Re: [Wikitech-l] Chat about Wikipedia performance?

by Tim Starling

>From: Lee Daniel Crocker <lee(a)piclab.com>
>One thing that would be nice is if the HTTP connection could be
>dropped immediately after sending and before those database updates.
>That's easy to do with threads in Java Servlets, but I haven't
>found any way to do it with Apache/PHP. :P

No, I looked into exactly this problem in connection with my own little project (improved Special:Movepage). PHP and threads don't mix. As far as I could see, the PHP subprocess has to exit (taking all threads with it) before Apache will drop the connection. Like Brion said, you'd have to set up another process, and use PHP's poorly documented IPC functions.

As for what improvement it would achieve: it wouldn't reduce database load per view, it would just allow users to hit more pages sooner.

I think caching HTML is the way to go, in the short term. If people don't want to code something complicated, you could ignore user preferences for now and only cache pages for "anonymous" users. The cached version could leave little notes in the HTML like

<strong>Isaac Newton</strong> was a <<WIKILINK[[physics|physicist]]>> born in...

and maybe

<<USERIP>> (<a href ="http://www.wikipedia.org/wiki/User_talk:<<USERIP>>" class='internal' title="User talk:<<USERIP>>">Talk</a>

Then a cache processing script would look up the link table and replace the links with real HTML. I imagine looking up the link table is much, much faster than looking up cur_text. Plus the cached text would be stored on the web server, thereby distributing disk load more evenly.

As for invalidation, the easiest, and possibly ugliest, way I can think of is implementing it in wfQuery() *cringe*. That's a very simple function with very diverse uses, but every single update query passes through that point. Just use a hash table (always in RAM) to store the article name of every cache entry, and remove the rows when they're invalidated.

There'd also have to be a check for an altered user talk page. This could be handled with another of my <<TAGS>>.

This idea is likely to be met with apathy. I'd like to code it myself, but I don't have Linux on my PC, or a broadband connection, or much free hard drive space, or... time. So there you have it: my two cents, backed up by hot air.

-- Tim Starling.
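The placeholder scheme described in this message could look roughly like the following Python sketch. It is an illustration only: the link table is modelled as an in-memory set of existing titles, the "/wiki/" URL prefix and the "new" CSS class for missing pages are assumptions, and `expand_cached` is a hypothetical name rather than a function from the actual code.

```python
import re
from html import escape
from urllib.parse import quote

# Matches <<WIKILINK[[target]]>> and <<WIKILINK[[target|label]]>>.
WIKILINK_RE = re.compile(r"<<WIKILINK\[\[([^\]|]+)(?:\|([^\]]+))?\]\]>>")


def expand_cached(cached_html, existing_titles, user_ip):
    """Fill in the per-request parts of a cached page."""

    def render_link(match):
        target = match.group(1)
        label = match.group(2) or target
        # Stand-in for the link table lookup: is the target an existing page?
        css_class = "internal" if target in existing_titles else "new"
        href = "/wiki/" + quote(target.replace(" ", "_"))
        return f'<a href="{href}" class="{css_class}">{escape(label)}</a>'

    html = WIKILINK_RE.sub(render_link, cached_html)
    return html.replace("<<USERIP>>", escape(user_ip))


if __name__ == "__main__":
    cached = ("<strong>Isaac Newton</strong> was a "
              "<<WIKILINK[[physics|physicist]]>> born in...")
    print(expand_cached(cached, {"physics"}, "127.0.0.1"))
```

Only the cheap existence check and the visitor's address are resolved per request; the expensive wiki-to-HTML rendering stays in the cached text.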

20 years, 12 months
