Neil, many thanks for the bots--I've been running them too, over my
cable modem and on my server, so we're getting plenty of hits. We are
getting good timing data in the logfiles, but at this point I'm most
interested in testing robustness. I'd really like it if you could
hack up a script that did random page edits as well--I tried to hack
your python a bit to do that, parsing out the text in the <textarea>
and sending it back, but I'm just not up on python enough to figure
out why it wasn't behaving the way I expected.
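For concreteness, something along these lines is what I was trying to
get working -- a minimal sketch that fetches the edit form, pulls the
article text out of the <textarea>, appends a marker, and POSTs it
back. The URL scheme and form field names ("action=edit", "text" and so
on) are guesses that need checking against the real edit form, and one
likely gotcha is that the textarea content probably comes back with &,
< and > HTML-escaped and needs decoding before it is resubmitted:

    # Sketch of a random-edit bot.  URL and field names are placeholders.
    import random
    import re
    import urllib.parse
    import urllib.request

    BASE = "http://beta.wikipedia.com/wiki.phtml"   # placeholder URL

    def random_edit(title):
        # 1. Fetch the edit form for the article.
        url = "%s?title=%s&action=edit" % (BASE, title)
        page = urllib.request.urlopen(url).read().decode("latin-1")

        # 2. Pull the current text out of the <textarea>.
        m = re.search(r"<textarea[^>]*>(.*?)</textarea>", page, re.DOTALL)
        if m is None:
            raise ValueError("no textarea on edit page for %s" % title)
        text = m.group(1)
        # A real version should also decode &amp;, &lt;, &gt; here,
        # otherwise the text gets more garbled with every round trip.

        # 3. Append a marker and send the form back as a POST.
        text += "\nstressbot edit %d" % random.randint(0, 999999)
        form = urllib.parse.urlencode({"title": title, "action": "submit",
                                       "text": text}).encode("ascii")
        urllib.request.urlopen(BASE, form).read()

    random_edit("Sandbox")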
I'd also like to have a version of your bot that sends the text you
get back through a validator. I'll look for one that would be
relatively simple to plug in.
BTW, you can get server-side timing data from a comment at the end of
each returned page: that will tell you how much time the server
actually took between the start of the script and serving the page,
so it eliminates the delays caused by the client script and the
actual transfer. Might be interesting to compare the two.
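For example, the fetch routine in the bots could record both numbers in
one pass -- something like the sketch below, which assumes the
server-side figure shows up as a number of seconds inside the last HTML
comment on the page (the exact comment format needs checking):

    # Time a page fetch on the client and also pull the server-side
    # timing figure out of the trailing HTML comment, if present.
    import re
    import time
    import urllib.request

    def timed_fetch(url):
        start = time.time()
        page = urllib.request.urlopen(url).read().decode("latin-1")
        client_secs = time.time() - start

        comments = re.findall(r"<!--(.*?)-->", page, re.DOTALL)
        server_secs = None
        if comments:
            m = re.search(r"([0-9]+\.[0-9]+)", comments[-1])
            if m:
                server_secs = float(m.group(1))
        return client_secs, server_secs

Logging both per request would make the client-vs-server comparison easy.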
I'm testing the postit script with and without non-ASCII chars, to see
if the apparent speed difference is real.
I'm not running any other load generators. Each test is an average over
roughly 50 pages. I'll alternate between the with and without states.
With: 2.39 secs average
Without: 2.91 secs average
With: 2.84 secs average
Without: 2.98 secs average
With: 2.64
Without: 3.18
Well, there does seem to be a difference, but it still might be noise.
All with: 7.87 / 3 = 2.62 seconds average
All without: 9.07 / 3 = 3.02 seconds average
Difference: 1.20 / 3 = 0.40 seconds
I then did two long runs:
With: 2.74 (averaged over 151 pages)
Without: 3.21 (averaged over 153 pages)
Difference: 0.47 seconds
Another two long runs:
With: 2.78 (averaged over 151 pages)
Without: 3.21 (averaged over 151 pages)
It looks like the difference is probably real; my guess is that
non-ASCII text is more likely to contain things that make the regexp
engine look ahead and backtrack while checking whether something is a
link.
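If we want to check that theory, a toy micro-benchmark along these
lines (Python rather than the wiki's actual PHP code, and with a
deliberately simplified link pattern) would at least show whether
merely scanning non-ASCII text for links is slower, or whether the cost
is elsewhere:

    # Illustration only: time a simplified link-matching pass over ASCII
    # text and over text containing non-ASCII characters.
    import re
    import timeit

    link_re = re.compile(r"\[\[([^\]|]*)(\|[^\]]*)?\]\]")

    ascii_text = "plain words around a [[Sample Link]] and filler " * 500
    nonascii_text = "wörter ünd sätze um einen [[Beispiel Link]] " * 500

    def scan(text):
        return len(link_re.findall(text))

    for label, text in [("ascii", ascii_text), ("non-ascii", nonascii_text)]:
        secs = timeit.timeit(lambda t=text: scan(t), number=200)
        print("%-10s %.3f s" % (label, secs))

If the two timings come out about the same, the extra cost is more
likely in the link-handling code than in the raw scan.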
This is _not_ a big performance issue at the moment, as the Wikipedia
load is read-mostly. However, it's worth keeping in mind as a place to
optimize in the future.
By then, we'll probably be running a three-tier architecture, with the
DB running on a separate machine from the PHP scripts, and so even then
this may be a low-priority issue.
Neil
I've noticed that beta.wikipedia.com has not got a robots.txt file.
Yes, I know that most recent robots will read the metadata in the page,
but I'm willing to bet that some of the dimmer or older ones don't.
Should we have one?
Also, there's no favicon.ico. I enclose a file Walone2.ico which should
work if renamed to favicon.ico and placed at
http://beta.wikipedia.com/favicon.ico
Neil
Here is another data point regarding the performance of beta.wikipedia.com
I'm now running
7 stressbots (page readers)
2 postits (page writers)
concurrently accessing beta.wikipedia.com via a 512K DSL connection.
I'm getting the following statistics:
average page read time 2.9 seconds
average page write time 4.9 seconds
corresponding to an
average page read rate of 7/2.9 = 2.4 pages/sec
average page write rate of 2 / 4.9 = 0.41 pages/sec
making a total sustained transaction rate of around 2.8 hits/sec, or
around 240,000 hits/day or over 7 million hits a month.
However, my inbound traffic is around 61 kbytes/sec while doing this;
since 512 kbit/s is only about 64 kbytes/sec, my DSL link is currently
the bottleneck, not the server.
Dropping the concurrency to
3 stressbots
1 postit
gives:
average page read time 1.9 seconds
average page write time 3.1 seconds
corresponding to an
average page read rate of 3/1.9 = 1.57 pages/sec
average page write rate of 1 / 3.1 = 0.32 pages/sec
total transaction rate: 1.9 hits/sec
for an inbound traffic rate of about 26 kbytes/s, where my DSL link is
no longer the bottleneck, but the system is under less load.
To really stress test the server, we will need several clients to run at
once on several different links. I'm going to stop the test now.
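For anyone who wants to run a client from another link, the core of a
reader bot is small. A sketch like the following (base URL, page titles
and counts are placeholders) is enough to reproduce the average page
time and pages/sec figures above:

    # Minimal concurrent page reader in the spirit of the stressbots:
    # N threads fetch random pages and we report average read time and
    # the implied sustained read rate.
    import random
    import threading
    import time
    import urllib.request

    BASE = "http://beta.wikipedia.com/wiki.phtml?title="   # placeholder
    PAGES = ["Main_Page", "Sandbox", "Physics"]             # placeholders
    THREADS = 7
    FETCHES_PER_THREAD = 50

    times = []
    lock = threading.Lock()

    def reader():
        for _ in range(FETCHES_PER_THREAD):
            t0 = time.time()
            urllib.request.urlopen(BASE + random.choice(PAGES)).read()
            with lock:
                times.append(time.time() - t0)

    workers = [threading.Thread(target=reader) for _ in range(THREADS)]
    start = time.time()
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    elapsed = time.time() - start

    print("average page read time: %.2f s" % (sum(times) / len(times)))
    print("sustained read rate:    %.2f pages/sec" % (len(times) / elapsed))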
It would be useful for testing if we could have a page that gave current
Linux operating system stats, perhaps as a sysop-only page?
Neil
What can we do to speed up the process? Some people at the German
Wikipedia are getting frustrated (also by the server problems). I hope
people don't leave the project because the server is unreachable or slow
much too often, but I don't know. Hopefully the problems will be over
after the move to the new server.
We have translated the most important bug reports at
http://test-de.wikipedia.com/wiki/wikipedia:Bug_report
(German version at
http://test-de.wikipedia.com/wiki/wikipedia:Beobachtete_Fehler )
I hope we have found most of them.
Maybe my last mail was overlooked, but I'd still like to have sysop
status at test-de.wikipedia.com, so that I'm able to play around with
article renaming and all those other nice features we have now.
Username: Kurt Jansson
Maybe someone could also give sysop status to Ben-Zin, who is very
active in testing the new software.
Once I'm a sysop, can I give these rights to other Wikipedians?
Thanks!
Kurt
Jan writes:
> Ultimately the best solution would be to have a table wanted(title,
> #pages) with an index on #pages (and a unique index on title), then
> MySQL wouldn't need to sort at all. I don't know off the top of my head
> if there are any other queries that depend on 'brokenlinks', but I
> don't believe so, and if not I would recommend replacing
> it.
I believe we need the full brokenlinks information in order to
initialize the links table once a new article is written, so that
"What links here" will immediately work for new articles.
Nevertheless, I think we should have a wanted table as above in
addition. Space is really no issue, but time is, and Most Wanted is
likely to be one of our more commonly called slow functions.
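To make the proposal concrete, here is a hedged sketch of the table and
the Most Wanted query as I read Jan's suggestion; the column names,
sizes and connection details are assumptions, not an agreed schema:

    # Sketch of the proposed "wanted" table: one row per nonexistent
    # title, plus a count of the pages that link to it ("#pages" above).
    import MySQLdb   # the MySQL-python / mysqlclient module

    conn = MySQLdb.connect(host="localhost", user="wikiuser",
                           passwd="secret", db="wikidb")
    cur = conn.cursor()

    cur.execute("""
        CREATE TABLE IF NOT EXISTS wanted (
            title VARCHAR(255) NOT NULL,
            refs  INT UNSIGNED NOT NULL,
            UNIQUE KEY (title),
            KEY (refs)
        )
    """)

    # With the index on refs, Most Wanted becomes an index walk from the
    # high end instead of a full sort over brokenlinks.
    cur.execute("SELECT title, refs FROM wanted ORDER BY refs DESC LIMIT 50")
    for title, refs in cur.fetchall():
        print(title, refs)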
Axel
> by blindly executing TeX when someone edits a page, we are assuming
> that they haven't included any malicious code in their TeX source.
TeX has two dangerous commands: shell escapes and writing to an
arbitrary file. Both can be globally disabled (and are disabled by
default in most TeX distributions). However, it is fairly easy to write
TeX which eats memory like crazy (TeX allows recursion :-), so we
would have to somehow restrict the resources available to the TeX
process. But we are of course right now already wide open to all sorts
of denial-of-service attacks.
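As a sketch of what "restrict the resources" could look like in practice
(the flag names and limit values are illustrative and depend on the TeX
installation), the wrapper that runs TeX could keep shell escape
disabled and put hard CPU and memory caps on the child process:

    # Run latex with shell escape off and hard resource limits, so a
    # runaway (e.g. deeply recursive) input is killed instead of eating
    # the server's memory.
    import resource
    import subprocess

    def limit_resources():
        # Runs in the child just before latex is exec'd.
        resource.setrlimit(resource.RLIMIT_CPU, (10, 10))        # 10 s CPU
        mem = 64 * 1024 * 1024
        resource.setrlimit(resource.RLIMIT_AS, (mem, mem))       # 64 MB

    def render(texfile, workdir):
        subprocess.call(
            ["latex", "-no-shell-escape", "-interaction=batchmode", texfile],
            cwd=workdir,
            preexec_fn=limit_resources,
            timeout=30,   # wall-clock safety net on top of the CPU limit
        )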
Axel