Would some of you please look at the new installation at:
http://www.internet-encyclopedia.org
and see if you have any ideas about how to fix the bugs in it. It is
using NewCodeBase, not Phase 3.
Note that on the main page the top portion of the page is dead, that is,
the links in the first few inches don't work. Also, the popdown menu (when
you have it in a stylesheet that does work) always sends you back to the
main page rather than doing what you selected.
Thanks, Fred Bauder
Okay, there's an article on heise.de about the German Wikipedia (20,000
articles). And our speed is going doooowwwwwn. Please do what you can to
speed things up. Please!
Kurt
The dynamic date code now seems to be working satisfactorily. Thanks to
Patrick and Wapcaplet for sorting me out on punctuation use.
Additionally, I've implemented Tarquin's "quick hack" to make linking many
translations together easier. Specifically, self links are ignored, so you
can just copy the same block of text to every language.
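Roughly, the implementation amounts to something like this (a sketch of the
idea, not the actual diff; makeInterlanguageLink is a made-up helper):

    // Skip interlanguage links whose language code matches the wiki
    // being rendered, so the same block of links works on every language.
    function renderInterlanguageLink( $langCode, $title, $currentLang ) {
        if ( strtolower( $langCode ) === strtolower( $currentLang ) ) {
            return ''; // self link: ignored
        }
        return makeInterlanguageLink( $langCode, $title ); // hypothetical helper
    }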
What's the standard procedure from here? How long are new features usually
left on the test server before updating the real one? Note that the dynamic
dates feature has already been discussed, and even announced at the village
pump, so the only reason to wait before uploading is to check for bugs.
Oh, I almost forgot. In fact I did forget to put it in the CVS summary. I
fixed that <pre>\0</pre> bug I reported at the village pump. In the
replacement string of preg_replace, backslashes need to be escaped, not just
dollar signs.
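To illustrate (a minimal sketch, not the actual patch):

    // When the replacement text comes from article content, a literal
    // sequence like \0 is otherwise interpreted as a backreference to
    // the whole match. Escape backslashes first, then dollar signs:
    $escaped = str_replace( array( '\\', '$' ),
                            array( '\\\\', '\\$' ), $replacement );
    $html = preg_replace( $pattern, $escaped, $html );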
-- Tim Starling.
I've just committed a bunch of new stuff to CVS:
1) "Edit this section". If you enable the user preference "Show links for
editing individual sections", you get little "edit" links under each
article section. These can be used to fetch just the text of that section,
and edit it. No more scrolling through 30 KB articles to edit a typo. No
more wading through long discussion threads to add a short comment
(provided they are organized using headlines).
2) Automatic table of contents. If the option "Show table of contents for
articles with more than 3 headings" is enabled, a small TOC is added at
the top of the page with navigation links to the individual sections. From
this it follows logically that we now have
3) Anchors for each article section (named after the section title, e.g.
"External links" becomes "External_links"). So you can link to these from
elsewhere.
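Very roughly, the mechanics behind 1) and 3) look like this (illustrative
sketches, not the committed code):

    // 1) Pull out section N for editing: split the wikitext on heading lines.
    function getSection( $text, $n ) {
        $pieces = preg_split( '/^(?==.+=$)/m', $text );
        return isset( $pieces[$n] ) ? $pieces[$n] : '';
    }

    // 3) Anchor naming: spaces become underscores, so the heading
    // "External links" yields the anchor "External_links".
    function sectionAnchor( $heading ) {
        return str_replace( ' ', '_', trim( $heading ) );
    }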
Regards,
Erik
Thomas Corell wrote:
> There is the problem that some users want and have different usernames in
> different wikis. The only way to handle this is to keep all those 'local'
> users and, first of all, to allow additional single sign-on via a 'master
> user DB', which should handle not only single sign-on but also all those
> local user accounts.
Couldn't we use the wikimedia.org domain name for handling user logins? That
way a user only has to sign into their WikimediaOne login and will then be
logged in to every Wikimedia project and subproject.
But before that happens (and periodically after), we may want to flush out
several thousand en.wikipedia user accounts that have 1) been idle for more
than, say, 3 months and 2) made fewer than 10 total edits since the account
was created.
That would free up many user names so that actual contributors can use them.
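Roughly the shape of the query (a sketch only; the table and column names
here are assumptions, and it counts only 'old' rows for brevity):

    // Candidate accounts: idle for over 3 months, fewer than 10 edits ever.
    $sql = "SELECT user_id, user_name FROM user
             WHERE user_touched < DATE_SUB(NOW(), INTERVAL 3 MONTH)
               AND (SELECT COUNT(*) FROM old WHERE old_user = user_id) < 10";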
-- Daniel Mayer (aka mav)
David Friedland wrote:
>
> I propose that whether and where a TOC appears should be specified
> by each article, using a wiki tag indicating where the TOC should appear.
> This would allow articles with many short sections to have no TOC, allow
> long articles with just a few sections to have a TOC, and allow article
> authors/editors to specify the best location for the TOC without relying
> on the wiki software to presume that the very top of an article is the
> best place.
Some examples of an implementation like this can be found at IAwiki:
http://www.iawiki.net/IAwiki
http://www.iawiki.net/TextFormattingRules
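One way such a tag could work in the parser (a sketch; the '<wiki:toc/>'
tag name is made up and buildToc is a hypothetical helper):

    if ( strpos( $text, '<wiki:toc/>' ) !== false ) {
        // the author chose the TOC position explicitly
        $text = str_replace( '<wiki:toc/>', buildToc( $headings ), $text );
    } elseif ( count( $headings ) > 3 ) {
        // fall back to the current behaviour: TOC at the very top
        $text = buildToc( $headings ) . $text;
    }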
-Jason Dreyer
Hi,
at the moment (17:00 CEST) the German Wikipedia is very slow, and fr is,
too. The English Wikipedia is running fine. Could someone have a look at this?
Thanks
Thomas aka Urbanus
--
Thomas Luft, Burgholzweg 97, D-72070 Tuebingen, GERMANY
Email: tluft(a)web.de Phone: +49 7071 408908 Fax: 408909
I noticed yesterday that Danga Interactive has released PHP bindings for
memcached, which is, well, really cool. memcached is an in-memory caching
system for any type of data. It acts as a middleman between the
application and the DB, and transparently caches objects. It's currently
being used on LiveJournal, and if you are an LJ user you might have noticed
the big speedup it caused. It's pretty cool how it works, and I think that
Wikipedia could really benefit from it, since DB load seems to be our main
bottleneck. I can see memcached storing pretty much all the 'cur' versions,
which would cut down DB use on reads a whole lot. I just don't know if
there is enough memory to run this; maybe if a third server is ever
acquired?
Wikipedia is growing fast, and so are the traffic and the number of edits
per day. This is great for the project, but it's apparently killing the
servers. There are a lot of hacks right now to reduce load (non-dynamic
caching, miser mode), but I believe that this application could cut out a
lot of the server load without reducing any functionality. Any thoughts,
anyone?
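The basic pattern would be something like this (a sketch; the
MemCachedClient constructor shown and the selectCurText helper are my
assumptions, not a real API):

    $cache = new MemCachedClient( array( '10.0.0.1:11211' ) );

    function getCurrentText( $cache, $db, $title ) {
        $text = $cache->get( "cur:$title" );      // try memory first
        if ( $text === false ) {
            $text = $db->selectCurText( $title ); // miss: hit the DB
            $cache->set( "cur:$title", $text );   // populate for next time
        }
        return $text;
    }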
Lightning
http://www.danga.com/memcached/
Now that my own proposal to rewrite the entire code in Perl has once again
been met with great resistance, I propose to at least create an entirely new
database structure, and then adapt the current code to it. I have
studied the current database structure and see the following rather
severe problems with it:
* BLOBs that store article text are combined in the same table as
meta-data (e.g. date, username of a change, change summary, minor flag,
etc.). This is bad because variable-length fields like BLOBs negatively
affect the performance of reading the table. Pages like the watchlist
should not have to bother with variable-length data such as article text
and would run a lot faster if they could get their data entirely from
fixed-length rows.
* Currently, all user properties and preferences, as well as article
properties, are columns in a table. Although this is not a problem in
terms of DB-reading performance, introducing a new user preference or
other enhancement involves adding a new column, and adding a new column
becomes extremely database-intensive as the database grows. LiveJournal
uses a very clever system that would easily remedy this. One table,
'userproplist', stores the possible user properties (userprops), and
another table, 'userprop', stores what user has what userprop with what
value. This way all that is needed for adding a new userprop is adding a
single row to the 'userproplist' table. The same would analogously apply
to articleprops. Once we have that, the user table will hopefully remain
very small (in terms of number of columns), so looking up a username (to
name just an example) would be ridiculously efficient.
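In sketch form (table, column, and $db names are illustrative only):

    // userproplist(upl_id, upl_name)          -- one row per known property
    // userprop(up_user, up_propid, up_value)  -- one row per (user, property)

    // Adding a whole new preference is a single row:
    $db->query( "INSERT INTO userproplist (upl_name) VALUES ('skin')" );

    // Reading one user's value stays cheap:
    $db->query( "SELECT up_value FROM userprop, userproplist
                  WHERE up_propid = upl_id
                    AND upl_name = 'skin'
                    AND up_user = $userId" );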
* BLOBs that fulfill the same function (article text) are scattered
across two tables (cur and old). This is bad because it means
variable-length text has to be moved across tables on every edit. Very
slow. Better to give every version/revision of an article (i.e. each
item of article text) a permanent ID and use those IDs in the 'cur' and
'old' tables instead. Then have one large table, 'articletext' perhaps,
mapping the IDs to their actual BLOBs. This eliminates the need to ever
delete a BLOB (except perhaps when actually deleting an article with all
its history, which is rare enough). Additionally, there isn't really a
need for separate 'cur' and 'old' tables, especially when MemCacheD can
take care of most recent versions.
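Schematically, something like (again, names illustrative):

    // articletext(text_id, text_blob)  -- append-only; BLOBs never move
    // cur(..., cur_text_id)            -- fixed-length row + pointer
    // old(..., old_text_id)            -- ditto

    // An edit inserts one new BLOB and updates one integer:
    $db->query( "INSERT INTO articletext (text_blob) VALUES ('...')" );
    $db->query( "UPDATE cur SET cur_text_id = LAST_INSERT_ID()
                  WHERE cur_id = $pageId" );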
* You are using a 'recentchanges' table which, I presume, also gets
updated with every edit. The idea behind it, I assume, is to allow
the 'Recent Changes' page to quickly grab the most recent changes
without having to find them elsewhere in the DB. Contrary to intuition,
this is a bad idea. It is always better to optimise for fewer DB
writes even if it means a few more DB reads, because writes are so much
slower. (I am so sure of this because LiveJournal has had exactly this
experience with their "Friends Page": grabbing entries from all the
friends' journals all over the place in the DB is faster than updating a
"hint table" with every newly created entry.)
In addition to these existing problems, there are of course things the
database cannot currently handle but which were planned. While we're
changing the DB, we could also add the following functionality:
* Store translated website text, so translators don't have to dig
through PHP code and submit a file to the mailing list.
* A global table for bidirectional inter-wiki links. People should not
have to add the same link to so many articles. In fact, taking this a
step further, people should not even have to enter text like
'[[fr:démocratie]]' into the article text when it's not part of the
article text. There should be drop-downs underneath the big textbox
listing languages, and little text boxes next to them for the target
article name.
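In the same schematic style (names illustrative):

    // sitetext(st_lang, st_key, st_value)  -- translatable UI messages
    // interlang(il_lang_a, il_title_a, il_lang_b, il_title_b)

    // One row links both directions, entered once via the drop-downs:
    $db->query( "INSERT INTO interlang VALUES ('en', 'Democracy',
                                               'fr', 'Démocratie')" );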
Are you all still convinced that adapting the current code to all these
radical changes is easier than rewriting it all from scratch? :-)
Anyway. I'm tired.
I'm going to bed.
Good night.
Timwi
>Actually, I looked at the code to make some changes to my private language
>file and didn't understand it. I know roughly what the code does, but not
>how, because I didn't want to spend that much time decoding all those
>regexps that make the code pretty cryptic and hard to maintain - with the
>added bonus of not having /any/ documentation.
In case anyone's wondering: yes, the code is in CVS now. But it's not
finished; there's still at least one more update to Language.php to come.
Sorry about that, Jens, but you'd probably be better off commenting the
whole section out for now. As for not having any documentation, I can fix
that in my next commit.
>Why not rearrange the code into two segments: segment one would parse the
>linked date as present in the article into three numerical values: day,
>month, year. The second segment would then only have to concatenate those
>values (maybe with a converted month name) into the final form. This form
>could be given in the usual "$1 $2 $3 $4" syntax (with e.g. $4 being the
>month name). That would also take some load off the server, which now has
>to interpret lots of regexp patterns; that isn't too cheap, because these
>patterns normally first get converted into some form of finite automaton
>that is eventually given the input string for processing. (At least the
>code by Tatu Ylonen proceeds in this way.)
No, I think your way would be slower, not faster. There's a "pre-analysis"
flag you're meant to use if you run the same regular expression many times,
but according to the PHP manual it only provides a speedup if the regular
expression starts with something other than a fixed character. So it sounds
like there's no compilation. Besides, if regular expressions were slow,
well, the rest of the parser would be in trouble.
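For reference, that flag is (as far as I can tell) the S pattern modifier:

    // PCRE 'study' pre-analysis; per the manual it helps only when the
    // pattern has no single fixed starting character:
    $out = preg_replace( '/(foo|bar|baz)/S', 'qux', $text );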
By splitting the replacing from the parsing, you're just turning one job
into two. The same tasks are still required, there's just some extra
administrative overhead. Plus, since PHP is an interpreted language, I'd
really rather have the O(N) code written in C.
As for your way being simpler, well, I would contest that too. You code it
your way, and we'll put them side by side. My way really is pretty simple if
you understand Perl regular expressions.
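For the curious, the one-pass shape is roughly this (an illustrative toy,
not the actual Language.php code):

    $months = 'January|February|March|April|May|June|'
            . 'July|August|September|October|November|December';
    // Match a linked date like [[January 15]] and re-emit it as a piped
    // link displaying the preferred order, all in a single pass:
    $text = preg_replace( "/\\[\\[($months) (\\d{1,2})\\]\\]/",
                          '[[$1 $2|$2 $1]]', $text );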
-- Tim Starling.