Wikitech-l May 2003

wikitech-l@lists.wikimedia.org

59 participants
147 discussions

by Thomas Corell

In the German Wikipedia, a list of the qualifiers used in titles was discussed, and most of the participants think it would be an interesting feature. I know I have expressed that badly, so here is an example: Cell (biology) is a homonym (cell) with a qualifier (biology). To get a proper overview of those qualifiers and to modify or eliminate wrong ones, a list would be very helpful. The result of the discussion was that a unique list of the qualifiers from titles (table cur) and bl_to (brokenlinks) would make sense.

Unfortunately I can only give you a proper PostgreSQL select statement (table cur only), but possibly someone can easily port it to MySQL:

======== WARNING: THIS IS NOT A VALID STATEMENT FOR WIKIPEDIA ==========
SELECT DISTINCT substring(cur_title FROM '.+\\((.+)\\)') AS p FROM cur;
========================= I WARNED YOU =================================
-- "\\(" ==> ( (needed for quoting)
-- (.+) ==> the () construct is used to select the part substring will return.

For the titles "foo", "foo (bar)", "foo2 (bar)" and "bar (foo)" the result will be "bar" and "foo".

This should only show the as-is state of these qualifiers! There is no intention for any automated process to enforce them, because no Wikipedian should get an error like "Qualifier not allowed" or so. This page is only for administrative and informational purposes!

Of course additional features, like showing the matched pages and others, would be nice, but there the discussion must go on further, IMHO. If there are more questions, ask, I will try to answer them.

Smurf
--
------------------------- Anthill inside! ---------------------------
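
For readers without a PostgreSQL instance at hand, the same extraction can be sketched in Python. This is only an illustration of the idea in the message above, not part of the original proposal: the names QUALIFIER_RE and extract_qualifiers are hypothetical, the titles are the four examples from the message, and the trailing "$" anchor is an added assumption so that only a parenthesised suffix at the end of a title counts as a qualifier.

```python
import re

# Hypothetical pattern: "base title (qualifier)". The group captures the text
# inside the last pair of parentheses. The "$" anchor is an assumption that is
# not present in the SQL statement from the message.
QUALIFIER_RE = re.compile(r".+\((.+)\)$")


def extract_qualifiers(titles):
    """Return the sorted, distinct qualifiers found in the given titles."""
    found = set()
    for title in titles:
        match = QUALIFIER_RE.match(title)
        if match:
            found.add(match.group(1))
    return sorted(found)


titles = ["foo", "foo (bar)", "foo2 (bar)", "bar (foo)"]
print(extract_qualifiers(titles))  # ['bar', 'foo']
```

Collecting into a set plays the role of SELECT DISTINCT; titles without a qualifier, such as "foo", are simply skipped.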

21 years

Experimental page caching in cvs, online

by Brion Vibber

I've finally gone ahead and hacked up that preliminary page caching I've been talking about; see new changes to Article.php & co. As an emergency measure I've put it up on larousse/www.wikipedia.org with only minimal testing. So far it's working great -- system load is way down, response time seems good.

Presently it operates only on regular page views by users who are not logged in. I've tweaked the header in the corner so it no longer shows the IP address, so every anon's page will appear the same. (If someone's added to their talk page, this is detected and the cache is disabled, so the 'You have new messages' link will show and take them to the talk page.)

For pages that are determined cacheable, we check a cache directory for a file: if it exists and is not obsoleted by the 'last touched' timestamp already established for dealing with browser caching, we just load it up and pass it straight through. If there's no file or it's obsolete, we install an output buffer handler, and at the end we catch the whole page output and save it to the file.

Caveats:

- Invalidation of cached pages is controlled by the same mechanism that invalidates browser caches, and will be subject to some of the same bugs there. Problem areas may include undeletion, the talk/article page links, and anywhere where the link tables are broken. Some redirects may be funny, but hopefully not. :)
- I'm pretty sure I excluded all the non-cacheable page view variants. I might have missed something, in which case bad pages could crop into the cache space.
- There's a site-wide cache invalidation date settable in localsettings. I haven't actually tested it :) and there should probably be a sysopable or developerable clear-all-caches special page. This also needs to be worked in to affect the browser cache as well.
- It should also be possible to explicitly clear the cache of a page and force it to regenerate in case it's screwed up. Perhaps a little button or something.
- Some pages, like the main page, should be invalidated periodically or else never cached, because they contain special variables (time, article count) which may change.
- This only affects non-logged-in users so far. But that makes up the greater part of our traffic, so that's okay for now. It makes the server faster for the rest of us. :)
- The cache directory is divided up like the upload directory is; so there should be 4096 separate dirs. Should be plenty for keeping ext3 from going mad and killing us all for a while yet.

Other notes:

- Hypothetically we could fall back on cached pages if unable to contact the database.
- Cache files are not deleted on invalidation; they're just assumed obsolete, and replaced when needed.
- There's a fun new bug where logouts (or perhaps timeouts) leave a session in a funny state where the interface works as not-logged-in, but edits are saved with the formerly used user name (but still with 0 as the user id, so contribs doesn't work).

See our now much happier servers:

    [brion@larousse w]$ uptime
     15:26:42 up 14:57, 3 users, load average: 7.36, 9.50, 7.76
    [brion@larousse w]$ free
                 total       used       free     shared    buffers     cached
    Mem:       1030952     996824      34128          0     176848     554108
    -/+ buffers/cache:     265868     765084
    Swap:      1020088      72416     947672
    [brion@larousse w]$ vmstat 1 15
       procs                     memory    swap        io    system        cpu
     r  b  w   swpd   free   buff   cache  si  so   bi   bo   in   cs  us  sy  id
     1  0  0  72416  34104 176848  554248   0  10    8   88  393  210  73   3  24
     0  0  0  72428  34020 176848  554300   0 172    0  172  761  394  60   6  34
     0  0  0  72428  34636 176848  554312   0   0    0  616  380  187  30   1  69
     0  0  0  72428  34644 176848  554356   0 156    0  156  496  338  43   4  53
     0  0  0  72416  34480 176852  554416   0   0   28    0  512  308  64   2  34
     1  0  0  72420  34472 176852  554456   0 200    0  200  479  274  48   2  50
     2  0  2  72420  34472 176852  554464   0   0    0  208  527  341  82   6  12
     3  0  0  72392  34712 176852  554512   0 204    0  536  552  420  55   4  41
     1  0  0  72392  34720 176852  554548   0   0    0    0  436  287  42   0  58
     1  0  0  72372  34732 176852  554564   0 208    0  208  536  315  32   8  60
     2  0  0  72368  34568 176852  554612   0   0   20    0  521  320  53   3  45
     1  0  0  72380  33980 176852  554620   0 224    0  840  491  318  29   6  65
     1  0  0  72384  33716 176856  554688   0 108   12  108  487  302  55   1  44
     2  0  0  72384  33856 176856  554712   0   0    0    0  410  237  38   0  62
     0  0  0  72384  33836 176856  554724   0 192    0  192  299  163  14   2  84

    [brion@pliny brion]$ uptime
      3:28pm up 4 days, 5:28, 1 user, load average: 2.46, 3.49, 3.13
    [brion@pliny brion]$ free
                 total       used       free     shared    buffers     cached
    Mem:       2068912    1973376      95536          0      35360    1155584
    -/+ buffers/cache:     782432    1286480
    Swap:      2047992     436568    1611424
    [brion@pliny brion]$ vmstat 1 15
       procs                     memory    swap        io    system        cpu
     r  b  w   swpd   free   buff   cache  si  so   bi   bo   in   cs  us  sy  id
     0  0  0 436568  95020  35392 1155900  12   8   41    7   18   43  47   5  47
     3  2  0 436568  90744  35444 1156460   4   0  572  728  477  730  36   5  59
     2  4  1 436568  89072  35508 1157508   0   0 1084 1012  754  866  38   7  55
     4  0  0 436568  88900  35516 1157696   0   0  180   60  515  656  37   3  60
     5  0  0 436568  84020  35540 1157808   0   0  104  444  477  815  51   6  43
     2  0  0 436568  80556  35544 1157816   0   0    0  204  401  596  47   3  50
     1  0  0 436568  79796  35556 1157860   0   0   56  124  394  378  19   2  78
     2  0  0 436568  79756  35560 1157900   0   0   32  180  513  655  23   7  70
     0  1  0 436568  79740  35604 1157924   0   0   24 1165  377  357  34  10  56
     1  0  0 436568  76008  35612 1157996   0   0   72   16  383  366   7   2  90
     3  0  0 436568  67984  35616 1158148   0   0  152    0  380  391  14   3  83
     0  1  0 436568  64404  35632 1158560   0   0  416  132  401  504  26   4  70
     2  1  0 436568  67392  35692 1159556   0   0  992  856  538  639  13   6  82
     1  5  3 436568  66912  35704 1159808   0   0  240 1782  573  737  59   6  35
     0  1  1 436568  66880  35732 1159928   0   0  116 2194  707  732  43   9  48

Pliny's got room to expand, and Larousse's end still has optimization that can be done. So things are looking good!

-- brion vibber (brion @ pobox.com)
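
The lookup described in this message (check for a cache file, compare it against the 'last touched' timestamp, otherwise render and save) can be sketched roughly as follows. This is a minimal illustration in Python, not the code in Article.php: the function names cache_path and fetch_page, the render callback, and the MD5-based directory layout (one hex digit, then two hex digits, giving 16 * 256 = 4096 directories) are all assumptions made for the example.

```python
import hashlib
import os
import tempfile
import time


def cache_path(cache_dir, title):
    """Map a title to a file in one of 16 * 256 = 4096 subdirectories.

    Assumed layout: <first hex digit>/<first two hex digits>/<md5>.html
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest[0], digest[:2], digest + ".html")


def fetch_page(cache_dir, title, last_touched, render):
    """Return (html, was_cache_hit) for an anonymous page view.

    last_touched is a Unix timestamp; a cache file older than it is treated
    as obsolete and overwritten rather than deleted.
    """
    path = cache_path(cache_dir, title)
    if os.path.exists(path) and os.path.getmtime(path) >= last_touched:
        with open(path, encoding="utf-8") as handle:
            return handle.read(), True
    html = render(title)  # stands in for capturing the buffered page output
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)
    return html, False


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    page = lambda title: "<h1>" + title + "</h1>"
    print(fetch_page(demo_dir, "Main Page", 0, page)[1])  # False: first view renders
    print(fetch_page(demo_dir, "Main Page", 0, page)[1])  # True: served from the file
    # A 'last touched' time in the future makes the existing file obsolete.
    print(fetch_page(demo_dir, "Main Page", time.time() + 3600, page)[1])  # False
```

Because obsolete files are overwritten in place, no separate cleanup step is needed on invalidation, which matches the note above that cache files are never deleted.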

21 years

Re: Chat about Wikipedia performance?

by David A. Wheeler

I've looked at Brion Vibber's "ps auxwww" output (thanks!!). Although the MySQL daemons take up the lion's share of memory, they don't take much of the %CPU, even in aggregate. Instead, the CPU seems to be taken up by the Apache daemons (/usr/local/apache/bin/httpd). It doesn't appear that one daemon takes up all the time; it appears spread out to some extent (a little bit by each, though there IS a lot of variance). Presumably this is due to each one executing the PHP scripts.

Clearly speeding execution of the PHP scripts would help. One way is to reduce the work they have to do (e.g., caching the HTML). Another is coding the hotspot (e.g., as a loaded C module). But doing it right requires identifying what the hotspot is in the PHP scripts. Is there a way to enable performance monitoring in PHP, like gprof in C, to figure out where the hotspots in the PHP scripts are? Failing that, I guess you could insert monitoring points in various places (painful, painful).

Of course, this doesn't mean that moving wikitext from MySQL to the filesystem, or using the filesystem as an HTML cache, is a bad idea. I don't know how transmitting data from MySQL to the scripts is accounted for; the transit time between script and MySQL may be hidden in the script performance measures.
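
As a rough idea of what hand-inserted monitoring points could look like, here is a small Python sketch (the site's scripts are PHP, so this only illustrates the technique). The names monitor, report, totals and counts are hypothetical. Wrapping the database call in its own labelled section is one way to keep the script-to-MySQL transit time from being hidden inside the overall script time.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds and call counts per labelled section.
totals = defaultdict(float)
counts = defaultdict(int)


@contextmanager
def monitor(label):
    """Time the enclosed block and add the result to the label's totals."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] += time.perf_counter() - start
        counts[label] += 1


def report():
    """Return (label, seconds, calls) rows, slowest section first."""
    rows = [(label, totals[label], counts[label]) for label in totals]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    with monitor("database"):
        time.sleep(0.05)  # placeholder for a query round trip
    with monitor("render"):
        sum(range(1000))  # placeholder for wikitext-to-HTML work
    for label, seconds, calls in report():
        print(label, round(seconds, 3), calls)
```

This measures wall-clock time only, so a section that mostly waits on the database will look expensive even though it uses little CPU; comparing the two labels gives a first hint of where the time goes.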

21 years

Mailing lists on ibiblio?

by erik_moeller＠gmx.de

We don't want the servers on ibiblio, but how about the mailing lists? Putting all your eggs in one casket .. basket .. is never a good idea, no? Regards, Erik

21 years

Hmm, does this work?

by Brion Vibber

Fun with mail servers...

21 years

Testing

by Brion Vibber

sigh.......

21 years

testing

by Brion Vibber

whee

21 years

WikiPedia use in other application...

by Forrest Aldrich

(newbie here, so excuse any ignorance of the wiki-isms) I want to set up a full documentation base, using Wiki<something>, to supplement my work as a sysadmin. I wonder if someone has done this before, covering everything from compile-time flags to system notes, upgrade instructions, etc. I see Wiki as a potentially good application/tool for this. Of course, my question is specific to the code used (phase3) in Wikipedia. Thanks, Forrest

21 years

space beneath headings

by tarquin

This issue has raised its head again on the manual of style page. And it still bugs me that we're mixing up presentation with semantics.

Recap: the current situation is this:

    == heading ==
    Text: there will be little space between this and the heading

    == heading ==

    Text following a blank line. There will be a gap between this and the heading

So a quick question:

1. Under what circumstances do we *definitely* need space between a heading and the following content?
2. Under what circumstances do we *definitely NOT*?

The only one I can remember is that in tables (countries, elements, etc), we want *no* space. Any others? Because if not, we can resolve this problem with CSS:

    h1 , h2 etc { usual space }
    table h1 , table h2 etc { no space }

21 years

id vs name attributes and javascript

by Brion Vibber

While setting things up for the automated testing system, Lee made some changes to the HTML code, adding id attributes & such so the tests could find things more easily. I'm not sure exactly how the code should be behaving, but there does seem to be a difference, as reported at http://www.wikipedia.org/wiki/Wikipedia%3ANew_server_madness :

On edit pages, we've changed from:

    <form .... name='editform'>

to

    <form id="editform" ....>

However the latter form seems to break this JavaScript fragment in the page's onLoad handler:

    document.editform.wpTextbox1.focus()

Mozilla gives the error "document.editform has no properties". IE 5.5 gives "'document.editform.wpTextbox1' is null or not an object".

-- brion vibber (brion @ pobox.com)

21 years
