Hi,
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for Phase-IV Wikipedia software, including a database schema.
It really doesn't seem to be very difficult to re-program the current software in Perl, this time taking all the known problems into account and designing everything right from the start (incl. single login, multi-language watchlists, a better translation system, and a skin system separated from the code, etc.). I've also given a lot of thought, and made a number of decisions, with respect to database design and performance.
Should I continue with this?
Greetings, Timwi
Timwi wrote:
Hi,
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for Phase-IV Wikipedia software, including a database schema.
It really doesn't seem to be very difficult to re-program the current software in Perl, this time taking all the known problems into account and designing everything right from the start (incl. single login, multi-language watchlists, a better translation system, and a skin system separated from the code, etc.). I've also given a lot of thought, and made a number of decisions, with respect to database design and performance.
Should I continue with this?
I wouldn't advise it. While making a simple wiki is simple enough, once it starts getting bigger and more complicated there is a lot more to be done. Plus, I think there was a reason behind programming in PHP vs. Perl, I'm not sure... but a rewrite in a different language sounds a bit risky. Then add to that that you designed a new db schema and that all data will have to be converted from one to the other, and you've got yourself another issue.

I think this is one of those times where "If it's not broken, don't fix it" applies. I currently see no problems with the wiki software as it is. I mean, sure, we have a couple of issues here and there, but I don't think that means we need to rewrite all the code. What I think would be more important is a clean-up of the code, and hopefully making the current code use a templating system. After that we could slowly move towards universal logins through a simple system I proposed earlier which would allow for a simple transition. After that is done, we could start working on unifying watchlists, etc. But at the moment I think it's more important to keep the current software running without interruptions, and to optimize it so it can deal with the high loads it has to handle.
I would however be very interested in seeing your DB schema and some info about your ideas for the new software before completely shooting down the idea. Just please don't post Perl source full of perlisms like $/, $_, $?, @var, %var, etc. That crap is obfuscated and hurts my eyes... I absolutely hate it. Hell, if it were up to me, everything would be written in something in between C++ and PHP, and there would be no such thing as arrays, only vectors.
Lightning
I think everybody was under the impression you were going to become a PHP guy. The question you really pose is "Is the next WP incarnation going to be written in Perl?" As a non-coder -- just gauging the pulse over the last several months about the differences between the two, and the general sense of things -- I'd say "heck no."
-S-
Lightning wrote:
Timwi wrote:
Should I continue with this?
I wouldn't advise it.
Heh, knew it! ;-)
Plus, I think there was a reason behind programming in PHP vs. Perl, I'm not sure...
Read the section "Why are we using PHP instead of perl?" on http://www.wikipedia.org/wiki/Wikipedia:PHP_script_FAQ
Then add to that that you designed a new db schema and that all data will have to be converted from one to the other
Of course I have thought about this. My plan would be to convert articles as they are requested by readers. That way the database would get converted gradually on demand, thus minimising possible delay and avoiding downtime.
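For illustration, every article lookup would then work roughly like this (all the helper names here are made up):

    sub get_article {
        my ($title) = @_;

        # Try the new schema first.
        my $article = fetch_phase4_article($title);
        return $article if defined $article;

        # Not converted yet: fetch from the Phase-3 tables, convert,
        # store in the new schema, and remove the old copy.
        my $old = fetch_phase3_article($title)
            or return undef;    # exists in neither schema
        $article = convert_to_phase4($old);
        store_phase4_article($article);
        delete_phase3_article($title);
        return $article;
    }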
"If it's not broken, don't fix it" I currently see no problems with the wiki software as it is.
Well, please allow me to say something in reply to this, even if it may potentially be somewhat offensive. If it is, I am very sorry; I do not mean to offend you. But the reason for this is that you are American and (as far as I can tell, sorry if I'm wrong) only use the English Wikipedia. People who use several different language Wikipedias see many more disadvantages and shortcomings in the current design. Add to that the fact that you are very much used to Wikipedia as it is now; it is often difficult to forgo something familiar.
Additionally, the current database schema has performance problems that you will not be able to overcome without converting the database. Of course, you could convert it bit by bit, but then again, you could equally well start with an entirely new database schema...
After that we could slowly move towards universal logins through a simple system I proposed earlier which would allow for a simple transition.
Unfortunately, any change that would introduce a universal-login system into the current system would be a hack solution in and of itself, simply because the current software was not designed to handle this.
I would however be very interested in seeing your DB schema
http://meta.wikipedia.org/wiki/Experimental_new_database_schema
Any comments, of course, are welcome.
and some info about your ideas for the new software
Well, I've had quite a lot of ideas, I probably wouldn't know where to start. Of course, I have followed this mailing list and have picked up ideas here and there. I support the idea of a more-or-less independent layout system that will allow users to skin Wikipedia to their needs without the software needing to hard-code any HTML (with the possible exception of the contents of pages in the Special: namespace).
Other (not-so-important) improvements over the current system include: translatable names of Special pages (not just the namespaces); all non-canonical URLs redirecting to the canonical URLs, so every resource has one and only one place to live; easier URLs (/Article_name/edit instead of /w/wiki.phtml?title=Article_name&action=edit); and being able to use Wikipedia's web interface in one language while viewing articles in another language ...
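To give you an idea of the easier-URL part: under mod_perl the dispatch is only a few lines. This is just a sketch (the package name is invented):

    package Wiki::Dispatch;
    use strict;
    use Apache::Constants qw(OK NOT_FOUND);

    sub handler {
        my $r   = shift;
        my $uri = $r->uri;             # e.g. "/Article_name/edit"

        # "/Title" is a view, "/Title/edit" an edit, and so on.
        my ($title, $action) = $uri =~ m!^/([^/]+)(?:/(\w+))?$!
            or return NOT_FOUND;
        $action ||= 'view';

        # ... look up $title and run the code for $action ...
        return OK;
    }

    1;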
Just please don't post Perl source full of perlisms like $/, $_, $?, @var, %var, etc. That crap is obfuscated and hurts my eyes...
I haven't used $? yet :)))
But I have to admit to being a fan of regular expressions. I currently use a few of them to extract information from URLs (like the language code), but the two Special Pages I have created so far (Log In and Translate) do not use them at all, and I don't think most of the others will (things like Watchlist, Recent Changes, Orphans, etc. will just be SQL statements and then an HTML table).
Greetings, Timwi
On Fri, Jul 25, 2003 at 01:42:29AM +0200, Timwi wrote:
Lightning wrote:
Plus, I think there was a reason behind programming in PHP vs. Perl, I'm not sure...
Read the section "Why are we using PHP instead of perl?" on http://www.wikipedia.org/wiki/Wikipedia:PHP_script_FAQ
When selecting the language to use for Phase IV, we should think about the features it has to provide.
I think we would benefit from a framework that provides easy-to-use shared information -- information that any running instance can immediately access: things like localization strings, Recent Changes, or other frequently requested but dynamic data.
Currently, localization strings are initialized for every request and thrown away afterwards. They could be a "global resource" that's initialized only once.
I know that Java servlets provide this (initialize a hash during the init of the servlet and store a reference to it in the servlet), and that PHP doesn't. I'm not very good at mod_perl; does it allow e.g. a design of Recent Changes as a big array? Instead of writing a dedicated Recent Changes table it would be enough to write edits line by line to a journal and reload some thousand lines of journal on a server restart.
This would save us a lot of database queries (for RC), or allow dynamic changing of the localization strings while still keeping performance high (currently, changes have to go through CVS before being applied to the production server -- a very slow process).
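To illustrate: under mod_perl, a package variable like the one below survives between requests within one Apache child, so the strings would be loaded once per child instead of once per request (load_strings_from_db is a made-up helper):

    package Wiki::Messages;
    use strict;

    # Lives for the lifetime of the Apache child process,
    # not just for one request.
    our %strings;

    sub message {
        my ($lang, $key) = @_;
        %strings = load_strings_from_db() unless %strings;
        return $strings{$lang}{$key};
    }

    1;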
Then add to that that you designed a new db schema and that all data will have to be converted from one to the other
Of course I have thought about this. My plan would be to convert articles as they are requested by readers. That way the database would get converted gradually on demand, thus minimising possible delay and avoiding downtime.
This will be very hard to maintain if you want to have the special pages re-enabled. You will have to do lots of joins and subselects -- nothing MySQL is very fast at.
"If it's not broken, don't fix it" I currently see no problems with the wiki software as it is.
Well, please allow me to say something in reply to this, even if it may potentially be somewhat offensive. If it is, I am very sorry; I do not mean to offend you. But the reason for this is that you are American and (as far as I can tell, sorry if I'm wrong) only use the English Wikipedia. People who use several different language Wikipedias see many more disadvantages and shortcomings in the current design. Add to that the fact that you are very much used to Wikipedia as it is now; it is often difficult to forgo something familiar.
Additionally, the current database schema has performance problems that you will not be able to overcome without converting the database. Of course, you could convert it bit by bit, but then again, you could equally well start with an entirely new database schema...
I don't think the database schema is the source of our problems; I think it's the RDBMS. The slowdowns I have observed (sudden slowdowns for a minute or two, with the server fine afterwards) always happen around the saving of a "List of ..." kind of article. This looks like a locking issue that other RDBMSes -- ones not optimized for "read mostly" use -- would not show.
and some info about your ideas for the new software
Well, I've had quite a lot of ideas, I probably wouldn't know where to start. Of course, I have followed this mailing list and have picked up ideas here and there. I support the idea of a more-or-less independent layout system that will allow users to skin Wikipedia to their needs without the software needing to hard-code any HTML (with the possible exception of the contents of pages in the Special: namespace).
Other (not-so-important) improvements over the current system include: translatable names of Special pages (not just the namespaces); all non-canonical URLs redirecting to the canonical URLs, so every resource has one and only one place to live; easier URLs (/Article_name/edit instead of /w/wiki.phtml?title=Article_name&action=edit); and being able to use Wikipedia's web interface in one language while viewing articles in another language ...
Just please don't post Perl source full of perlisms like $/, $_, $?, @var, %var, etc. That crap is obfuscated and hurts my eyes...
I haven't used $? yet :)))
I suddenly remember why I don't like Perl ...
JeluF
Jens Frank wrote:
On Fri, Jul 25, 2003 at 01:42:29AM +0200, Timwi wrote:
Lightning wrote:
Plus, I think there was a reason behind programming in PHP vs. Perl, I'm not sure...
Read the section "Why are we using PHP instead of perl?" on http://www.wikipedia.org/wiki/Wikipedia:PHP_script_FAQ
When selecting the language to use for Phase IV, we should think about the features it has to provide.
Well, the particular feature you have mentioned isn't specific to a programming language. Yes, mod_perl allows you to store something in a global hash and keep the data between requests, but obviously this will only work if you have only a single webserver (which I hope we're not going for).
This is a problem MemCacheD can solve - in fact, that's exactly what MemCacheD is for. And APIs for MemCacheD are, as has been said, available for both Perl and PHP (though the Perl one is the only one currently in use at a frequently-visited production website, livejournal.com.)
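For illustration, the Perl API is used roughly like this (the server addresses and the DB fallback are placeholders):

    use Cache::Memcached;

    # One cache shared by all webservers, unlike a per-process hash.
    my $memd = Cache::Memcached->new({
        servers => [ '10.0.0.1:11211', '10.0.0.2:11211' ],
    });

    sub localized_string {
        my ($lang, $key) = @_;
        my $value = $memd->get("msg:$lang:$key");
        return $value if defined $value;

        $value = load_string_from_db($lang, $key);    # placeholder
        $memd->set("msg:$lang:$key", $value, 3600);   # expire in 1h
        return $value;
    }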
does it allow e.g. a design of Recent Changes as a big array?
Although the theoretical answer to this is 'yes', this is optimising at the wrong spot. I've said this before. The Recent Changes page is not a performance problem as long as it only reads (provided of course it is optimised enough). What really blocks the database is a write. With the current software and database schema, article text is inserted into 'old' and then also modified in 'cur' every time you edit a page -- which is doubly bad because 'cur' is the table you read from when you do the most basic thing: display an article. It also updates the search index, which is accessed by the search function, probably the second-most-used function.
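In code, every single edit currently boils down to something like this (I'm writing the table and column names from memory, so take them with a grain of salt):

    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:mysql:wikidb', 'wiki', 'secret',
                           { RaiseError => 1 });
    my ($title, $text, $now) = @ARGV;   # stand-ins for the edit-form data

    # 1. Copy the previous revision into 'old' ...
    $dbh->do(q{INSERT INTO old (old_title, old_text, old_timestamp)
               SELECT cur_title, cur_text, cur_timestamp
               FROM cur WHERE cur_title = ?}, undef, $title);

    # 2. ... then overwrite the row in 'cur' -- the very table every
    #    page view reads from ...
    $dbh->do(q{UPDATE cur SET cur_text = ?, cur_timestamp = ?
               WHERE cur_title = ?}, undef, $text, $now, $title);

    # 3. ... plus the update to the search-index table that the
    #    search function reads.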
Instead of writing a dedicated Recent Changes table it would be enough to write edits line by line to a journal and reload some thousand lines of journal on a server restart.
I'm not exactly sure what you mean here by a 'journal' if not a dedicated database table?
Then add to that that you designed a new db schema and that all data will have to be converted from one to the other
Of course I have thought about this. My plan would be to convert articles as they are requested by readers. That way the database would get converted gradually on demand, thus minimising possible delay and avoiding downtime.
This will be very hard to maintain if you want to have the special pages re-enabled. You will have to do lots of joins and subselects -- nothing MySQL is very fast at.
Well no, I wasn't really planning to have the database-intensive Special Pages (Orphans, Long articles, etc.) work with the old database. Recent Changes especially doesn't need to, because virtually everyone only uses the n most recent changes and no longer cares about those from before the switch. Instead, I was planning to provide a Special Page listing all the articles that are still in the Phase-3 DB; after a few weeks they'll all have been converted. It matters much less that "Long articles" and "Orphans" don't work reliably in this short interim period than it would if editing or even viewing articles weren't possible.
I don't think the database schema is the source of our problems; I think it's the RDBMS.
I have outlined the problems I see with the current DB schema in a posting to this list dated 12.07.2003 03:58, Subject "Database design": http://mail.wikipedia.org/pipermail/wikitech-l/2003-July/004829.html The problem I mentioned above is only one of them.
I haven't used $? yet :)))
I suddenly remember why I don't like Perl ...
Don't get me wrong, I'm not disagreeing that Perl /is/ sometimes quite obfuscated - myself, I also hate the single-character variables ($/, $|, $`, $[, etc.). But apart from $_, they are all very rarely used.
Please don't start a religious war about this ;-)
Greetings, Timwi
Timwi wrote:
Jens Frank wrote:
On Fri, Jul 25, 2003 at 01:42:29AM +0200, Timwi wrote:
Lightning wrote:
Plus, I think there was a reason behind programming in PHP vs. Perl, I'm not sure...
Read the section "Why are we using PHP instead of perl?" on http://www.wikipedia.org/wiki/Wikipedia:PHP_script_FAQ
When selecting the language to use for Phase IV, we should think about the features it has to provide.
Well, the particular feature you have mentioned isn't specific to a programming language. Yes, mod_perl allows you to store something in a global hash and keep the data between requests, but obviously this will only work if you have only a single webserver (which I hope we're not going for).
This is a problem MemCacheD can solve - in fact, that's exactly what MemCacheD is for. And APIs for MemCacheD are, as has been said, available for both Perl and PHP (though the Perl one is the only one currently in use at a frequently-visited production website, livejournal.com.)
-- not to mention that slashdot.org has started using memcached too... that's a big name... although I don't think you can compare them to the huge size of LJ
does it allow e.g. a design of Recent Changes as a big array?
Although the theoretical answer to this is 'yes', this is optimising at the wrong spot. I've said this before. The Recent Changes page is not a performance problem as long as it only reads (provided of course it is optimised enough). What really blocks the database is a write. With the current software and database schema, article text is inserted into 'old' and then also modified in 'cur' every time you edit a page -- which is doubly bad because 'cur' is the table you read from when you do the most basic thing: display an article. It also updates the search index, which is accessed by the search function, probably the second-most-used function.
Instead of writing a dedicated Recent Changes table it would be enough to write edits line by line to a journal and reload some thousand lines of journal on a server restart.
I'm not exactly sure what you mean here by a 'journal' if not a dedicated database table?
Then add to that that you designed a new db schema and that all data will have to be converted from one to the other
Of course I have thought about this. My plan would be to convert articles as they are requested by readers. That way the database would get converted gradually on demand, thus minimising possible delay and avoiding downtime.
This will be very hard to maintain if you want to have the special pages re-enabled. You will have to do lots of joins and subselects -- nothing MySQL is very fast at.
Well no, I wasn't really planning to have the database-intensive Special Pages (Orphans, Long articles, etc.) work with the old database. Recent Changes especially doesn't need to, because virtually everyone only uses the n most recent changes and no longer cares about those from before the switch. Instead, I was planning to provide a Special Page listing all the articles that are still in the Phase-3 DB; after a few weeks they'll all have been converted. It matters much less that "Long articles" and "Orphans" don't work reliably in this short interim period than it would if editing or even viewing articles weren't possible.
I don't think the database schema is the source of our problems; I think it's the RDBMS.
I have outlined the problems I see with the current DB schema in a posting to this list dated 12.07.2003 03:58, Subject "Database design": http://mail.wikipedia.org/pipermail/wikitech-l/2003-July/004829.html The problem I mentioned above is only one of them.
I haven't used $? yet :)))
I suddenly remember why I don't like Perl ...
Don't get me wrong, I'm not disagreeing that Perl /is/ sometimes quite obfuscated - myself, I also hate the single-character variables ($/, $|, $`, $[, etc.). But apart from $_, they are all very rarely used.
Well hey, you know... it's your time, so if you want to do this, be my guest. It seems like a cool project. I mean I love PHP, I just do, and I just don't like Perl because I never did understand it too well... I'm more of a C or C++ guy myself. But hey, if you try to write in a non-obfuscated style and comment your code where something is not too obvious, I'll probably keep track of your code, look at it, and see if I can come up with suggestions or even find bugs. But as I said, I have a hard time reading Perl... I like those languages where things are written exactly like you would expect them to be. Oh, and I won't even look at regexps... I just can't get them... my brain segfaults and core-dumps when I see them.
Lightning
Lightning wrote:
I mean I love PHP, I just do, and I just don't like Perl because I never did understand it too well...
Interestingly enough, I can't really say I don't like PHP. I just sort of feel that Perl already does the job, so I can't get myself to learn PHP. :)
But as I said, I have a hard time reading Perl...
Please trust me -- I know *exactly* how you feel! I was in the same position not too long ago. Heh, I even remember seeing this statement:

    $a =~ s/$/\/\//;

For obvious reasons, this is discouraged even among Perl programmers; it's called 'leaning toothpick syndrome'. An equivalent statement would be:

    $a =~ s!$!//!;

but it's actually the same as any of:

    $a .= "//";
    $a = "$a//";
    $a = $a . "//";

which are of course more readable by orders of magnitude.
Oh, and I won't even look at regexps...
It is often possible to write Perl code which includes a regexp in such a way that the reader of the code doesn't have to read the regexp or understand how the regexp really works, but can nevertheless see what it does or what it's for:
    if ($url !~ m!^(\w+)://((?:\w+\.)*\w+)(?::(\d+))?(/.*)$!) {
        return "Invalid URL format.";
    } else {
        my ($protocol, $host, $port, $path) = ($1, $2, $3, $4);
        # ...
        # ... continue processing ...
        # ...
    }
It should be pretty obvious without looking at the regexp that its purpose is to check URL syntax and to extract the protocol, hostname, port number and path from it.
(I've only just grabbed this regexp out of the air, so please don't hit me if it's completely wrong ...)
Timwi
On Fri, Jul 25, 2003 at 12:16:56AM +0200, Timwi wrote:
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for Phase-IV Wikipedia software, including a database schema.
It really doesn't seem to be very difficult to re-program the current software in Perl, this time taking all the known problems into account and designing everything right from the start (incl. single login, multi-language watchlists, a better translation system, and a skin system separated from the code, etc.). I've also given a lot of thought, and made a number of decisions, with respect to database design and performance.
Should I continue with this?
While I'm much more a Perl guy than a PHP guy, I strongly advise against it. With very few exceptions, it can be said that rewriting software is NEVER a good idea. Forget your ego for a moment and code your ideas as patches to the existing codebase -- this way everyone's going to benefit most.
Timwi wrote:
Hi,
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for Phase-IV Wikipedia software, including a database schema.
It really doesn't seem to be very difficult to re-program the current software in Perl, this time taking all the known problems into account and designing everything right from the start (incl. single login, multi-language watchlists, a better translation system, and a skin system separated from the code, etc.). I've also given a lot of thought, and made a number of decisions, with respect to database design and performance.
Should I continue with this?
Greetings, Timwi
Hi Tim, I'm glad to see another person interested in working on "new" code. I am not very familiar with the Phase III code (still working on understanding it), but I think more in terms of "build a new front-end from scratch", using PHP (as I'd rather take the most backward-compatible way to do it), and then start figuring out how to connect to old, tested code as imported classes (and learn what sort of modifications they will need; currently I only know there are plenty of them).
I really think the single username, single codebase is a good direction (single database may be dangerous in case it fails and the whole site breaks down). Anyway, I suggest you leave the back-end logic to the people who already work on it, since you'll probably end up dealing with the same database, somewhat similar queries, and the same optimization challenges. As for the wiki syntax parser, I wouldn't want to rewrite that! (I hate parsers!)
For a new interface, I suggest you look at my carefully thought-out example at http://www22.brinkster.com/rotemdan/phase4-demo-v1-1.htm
I have made even more improvements, including fixed support for some browsers and support for zooming (it uses em instead of px, except for positioning the right column, which is positioned relative to the image), and I have organized and documented the CSS.
Some thoughts: * No need for "Smarty" or similar SGML templates, since strict XHTML/CSS does a nice job of separating content from formatting; see the link above.
It will be very easy to use inside scripts; for example, to create a box with some options inside, just do something like this:
<div class="titlebar"> <span id="lefttitlebar">This is the box title</span><span id="righttitlebar"><a href="">[ X ]</a></span> </div>
<div class="toolbox"> Some stuff in the <a href="">box</a> <br /> hello! </div>
And it will be automatically positioned and formatted by the CSS file... Long live CSS!
* New language files may be needed for the new interface (though an ugly form of backward compatibility may be possible to implement). We should also think about using a format that's really easy for "non-programmers" to edit and that doesn't depend on the script used (XML or plain text comes to mind; I haven't given much thought to this yet).
Another thing about language files: they shouldn't contain whole pages! And not even a bit of HTML formatting! I saw this ugliness when I worked on translating the enormous "Upload image" string. These pages should be edited using the wiki, even though they contain markup.
Also: switching the interface language using the preferences (dynamically), regardless of the language of articles read/edited (this can really unify the different language wikis). Another cool thing: using the new CMS, for example, the same article in several different languages can be put on the same page, side by side -- an excellent aid in translating! (Flamebait warning: that will also require the whole site to move to UTF-8 or a similar encoding, so I don't know if that's possible with all languages...)
Rotem
Hi Rotem,
Rotem Dan wrote:
(single database may be dangerous in case it fails and the whole site breaks down).
Well, I'm neither a webserver expert nor a database expert, but I know how LiveJournal does it (and I've mentioned it here in the past). They use a number of independent webservers and put a "load balancer" in front of them. This load balancer receives the requests from the clients and distributes them evenly over the webservers. This way, if one webserver crashes, the site will only be marginally slower until that server is restarted. LiveJournal also has redundant load balancers in case one of *those* crashes.
As for the database, they use MySQL replication. That means there are several databases actually containing the same data; one of them is the "master" (which is where data is written), and writes are then broadcast out to the other DBs, the "slaves". This way, DB reads can be evenly distributed among several servers. This, in turn, means you don't need to keep buying bigger and better hardware, only more of it (and the old, crappy hardware can still contribute to handling some of the load).
They (LiveJournal) have even written an independent Perl module (DBI::Role) which handles all of this (distributing DB reads across several slaves, and even weighting them according to their performance).
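In Perl terms, the effect is roughly this (hostnames and credentials are placeholders; DBI::Role adds the weighting and failure handling on top):

    use strict;
    use DBI;

    my $master = DBI->connect('dbi:mysql:wiki;host=db-master',
                              'wiki', 'secret', { RaiseError => 1 });
    my @slaves = map {
        DBI->connect("dbi:mysql:wiki;host=$_",
                     'wiki', 'secret', { RaiseError => 1 })
    } qw(db-slave1 db-slave2);

    sub reader { $slaves[ rand @slaves ] }  # reads spread over the slaves
    sub writer { $master }                  # all writes go to the master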
Now, LiveJournal actually also uses clustering, i.e. they distribute the users' journals across several clusters (each of which has its own master and several slaves). I'm as yet undecided whether it's a good idea to do the same with the Wikipedia languages. Doing so would make multi-language watchlists difficult. However, not doing so may make the architecture not scalable enough. I'm not knowledgeable enough to judge this.
As for the wiki syntax parser, I wouldn't want to rewrite that! (I hate parsers!)
This is actually the part I'm looking forward to! (Since, as mentioned before, I love regexps ;-) )
For a new interface, I suggest you to look at my much thought-given example on http://www22.brinkster.com/rotemdan/phase4-demo-v1-1.htm
Well, no. I wasn't planning to design or integrate a new interface. I was going to create a skin system that will allow people to create skins liberally without needing to code. Then you could create your skin yourself :-)
Long live CSS!
Well, looking at what many sites have achieved with CSS (including but not limited to LiveJournal's Xcolibur scheme, which isn't yet the default), it does seem quite impressive. However, there are a few (probably minor) reasons I hate CSS, including but not limited to the inability to simply place something at the bottom of the browser window regardless of the main text flow, or to center a table. Also, making a site layout browser-independent seems to be even more difficult with CSS than with outdated HTML 2 (in my experience, anyway).
- New language files
Na-ah! No language files in /my/ implementation ;-) My translation system has everything in the DB. Translators can change the translatable texts using a web interface. (I specifically decided against applying the Wiki philosophy to this because things like the "Edit this page" link text apply to /every/ page on the /entire/ site, so it would be too easy to upset a whole bunch of people by changing it to something offensive.)
Another thing about language files: they shouldn't contain whole pages!
Yes, I thought the same -- and have planned to make longer page contents (like the explanation on "Upload file" you mentioned) wiki-like, so they are actually the contents of the page titled "Special:Upload" (or whatever it would be in other languages).
and not even a bit of HTML formatting!
Well, actually, I'd really rather keep /some/ simple HTML formatting in place. Sometimes you want to bold a single word in a sentence, but bolding looks really ugly in Chinese and Japanese, so /they/ prefer /not/ to do it. So I keep the simplest of formatting in, so that translators have at least /some/ control over it. (I am, at least, planning to use HTML in translatable strings /much/ less than LiveJournal does now; they have a /lot/ of HTML in their strings, and sometimes even BML, which is their own mark-up language that nobody knows.)
Also: switching the interface language using the preferences (dynamically), regardless of the language of articles read/edited.
Yep, planned. Also a URL parameter to force the interface to a particular language.
warning: that will also require the whole site to move to UTF-8 or a similar encoding, so I don't know if that's possible with all languages...
I am not aware of any language which has characters that are not included in Unicode, and I'm pretty certain none of the Wikipedia languages do. Of course, my implementation would simply use UTF-8 all the way through, thus avoiding all annoying encoding problems.
(Well, not all of them; I've already handled the case where someone requests a URL containing invalid UTF-8: I assume Latin-1 in that case, convert, and redirect. But, of course, articles and all interface stuff would be entirely UTF-8.)
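The check itself is short. With the Encode module that comes with Perl 5.8, it is something like:

    use Encode ();

    # Decode a byte string as UTF-8 if it is well-formed UTF-8;
    # otherwise fall back to treating it as Latin-1.
    sub bytes_to_text {
        my ($bytes) = @_;
        my $copy = $bytes;    # decode() may modify its input in place
        my $text = eval {
            Encode::decode('UTF-8', $copy, Encode::FB_CROAK)
        };
        return defined $text ? $text
                             : Encode::decode('ISO-8859-1', $bytes);
    }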
Greetings, Timwi
Timwi wrote:
As for the wiki syntax parser, I wouldn't want to rewrite that! (I hate parsers!)
This is actually the part I'm looking forward to! (Since, as mentioned before, I love regexps ;-) )
If you're using Perl... http://wiki.beyondunreal.com/wiki/Wookee
A wiki parser in Perl, free for use. It produces correct HTML (i.e. all <p> elements are correctly closed!)
- No need for "Smarty" or similar SGML templates, since strict XHTML/CSS does a nice job of separating content from formatting; see the link above.
I think you are missing the point of a templating system and how it works... it has nothing to do with SGML... it's just a way to separate PHP code from HTML code, a simple information hand-off interface.
Lightning
Timwi-
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Great. mod_perl rules. Are you using DBI? In that case, it should be relatively easy to switch to PostgreSQL.
I encourage you to move forward with this, and to commit all code to a separate CVS module. The current code design is very messy, and there's a lot that can be done to improve it. But do not underestimate the complexity of the current software -- you will be spending months on this to get as far as we have. If you don't want to do this, you will end up very disappointed and frustrated because we won't adopt an incomplete solution. Any new version should at least have the features listed here:
http://wikipedia.sourceforge.net/features.html
That's a *lot*. In addition, if you really want to provide advantages over the current codebase, please *document* your code. The current codebase is horribly documented, and I still haven't bothered to figure out through line-by-line reading what every function does. Perldoc is a great way to do this. The current documentation of your database schema is completely insufficient, though. Each table needs its own comment header explaining what it does, how and why.
Regarding interlanguage links, note that the problem is quite complex: http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2025
Here is some other stuff that would be important for a redesign:
* have some kind of built-in profiling
* test each query on a large dataset before including it
* have some better way to handle edit conflicts, for example, CVS-style merging
* have a better way to handle discussions, e.g. "Post a comment", "Reply to this", but still do it using wholly editable wiki-pages
* have a category system built right in, perhaps using a meta namespace
* have better image handling with auto-rescaling: http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2024
* have a template system that can be used by the wiki user: http://marc.theaimsgroup.com/?l=wikitech-l&m=105557077500936&w=2 -- this could perhaps be combined with the general template system.
Regards,
Erik
Erik Moeller wrote:
the past few days I've been experimenting a bit with Apache, mod_perl, and MySQL, creating an entire website of my own. I've never done that before, and I think I've learnt a lot from this.
Great. mod_perl rules. Are you using DBI? In that case, it should be relatively easy to switch to PostgreSQL.
Does PostgreSQL support replication?
I encourage you to move forward with this, and to commit all code to a separate CVS module.
Well, to be entirely honest, I'd rather not use the SourceForge CVS server. It's cumbersome and full of problems. It doesn't work on Linux for me, and on Windows I have to keep entering my password for every single request (not just commits).
The current code design is very messy
So *that's* why I didn't understand a bit of it! ;-)
If you don't want to do this, you will end up very disappointed and frustrated because we won't adopt an incomplete solution. Any new version should at least have the features listed here: http://wikipedia.sourceforge.net/features.html
Whee. You have a list of requirements. That is a Very Good Thing Indeed. I guess I'll copy this to my User page and then tick things off as I finish them.
That's a *lot*. In addition, if you really want to provide advantages over the current codebase, please *document* your code.
Yep. If there's one thing I hate, it's missing documentation. However, I like doing it later: once the thing is finished and I know it works and does not need any more significant changes, I can document it; otherwise the documentation will have been for nothing.
I still haven't bothered to figure out through line-by-line reading what every function does.
I have, and I couldn't make much of it :-) I didn't have that many problems with LJ's code, even though it's equally badly documented. (And no, I don't think it's solely because of PHP.)
Perldoc is a great way to do this.
OK. I've thought about using some sort of documentation system like that. I've never done that before though. I'll have a look.
The current documentation of your database schema is completely insufficient, though. Each table needs its own comment header explaining what it does, how and why.
Yes, of course I understand this. I didn't do this yet for the reasons mentioned above. Even now I have already changed some tables again. I'll document them when I feel I'm happy with them as they are.
Here is some other stuff that would be important for a redesign:
- have some kind of built-in profiling
What exactly does this mean?
- test each query on a large dataset before including it
Yep
- have some better way to handle edit conflicts, for example, CVS-style merging
- have a better way to handle discussions, e.g. "Post a comment", "Reply
to this", but still do it using wholly editable wiki-pages
- have a category system built right in, perhaps using a meta namespace
Already planned (though that latter one would be near the bottom of the list)
- have better image handling with auto-rescaling:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2024
This seems to assume that images would have to be stored in a file system. I think it's better to store them in a database, possibly one separated from the rest.
- have a template system that can be used by the wiki user:
http://marc.theaimsgroup.com/?l=wikitech-l&m=105557077500936&w=2 -- this could perhaps be combined with the general template system.
I have indeed planned such a template system, but I'm not sure what you mean by the "general template system". Is that referring to the site skin system? I'd really rather keep those two separate; the concepts are not closely enough related.
Greetings and thanks, Timwi
Timwi-
Does PostgreSQL support replication?
There are several simple replication systems; DBMirror and RServ are part of postgresql-contrib. For setting up RServ, see
http://techdocs.postgresql.org/techdocs/settinguprserv.php
I haven't tried any of them. There are many good reasons to use PostgreSQL, though, and many good reasons to use MySQL, so it makes sense to use a database interface layer in case we ever want to switch.
Well, to be entirely honest, I'd rather not use the SourceForge CVS server. It's cumbersome and full of problems.
I never had any problems with it. Once it's set up properly, it seems to work reasonably well. You could also set up a project at BerliOS or Savannah:
http://developer.berlios.de/ http://savannah.gnu.org/
You definitely should use a CVS server if you expect any kind of constructive contributions.
Here is some other stuff that would be important for a redesign:
- have some kind of built-in profiling
What exactly does this mean?
Keep a log of how much time is spent within each function so the parsing/page-generation process can be optimized accordingly. There are probably ready-made CPAN modules for this.
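Even a hand-rolled wrapper like this would already tell us a lot (Time::HiRes is on CPAN; render_page is just an example name):

    use strict;
    use Time::HiRes qw(gettimeofday tv_interval);

    my %elapsed;    # wallclock seconds spent per named section

    sub profile {
        my ($name, $code) = @_;
        my $t0     = [gettimeofday];
        my @result = $code->();
        $elapsed{$name} += tv_interval($t0);
        return wantarray ? @result : $result[0];
    }

    # Usage, e.g.:
    #   my $html = profile('render_page', sub { render_page($title) });
    # ... then log %elapsed at the end of each request.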
- have some better way to handle edit conflicts, for example, CVS-style merging
- have a better way to handle discussions, e.g. "Post a comment", "Reply
to this", but still do it using wholly editable wiki-pages
- have a category system built right in, perhaps using a meta namespace
Already planned (though that latter one would be near the bottom of the list)
Categories will likely go live soon (if Magnus finally finishes the damn thing), so a compatible scheme needs to be in place if we want to switch over.
- have better image handling with auto-rescaling:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2024
This seems to assume that images would have to be stored in a file system. I think it's better to store them in a database, possibly one separated from the rest.
I never understood this approach. It only seems to be associated with increased risks (reduced performance, increased risk of data corruption) and to have no benefits. What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?
- have a template system that can be used by the wiki user:
http://marc.theaimsgroup.com/?l=wikitech-l&m=105557077500936&w=2 -- this could perhaps be combined with the general template system.
I have indeed planned such a template system, but I'm not sure what you mean by the "general template system". Is that referring to the site skin system? I'd really rather keep those two separate; the concepts are not closely enough related.
I do think there are similarities between the two. Any good template system, for site skins OR for internal rendering, will have to support things like loops, conditional blocks etc. This stuff needs to be parsed, so you might as well use a single library for doing so.
You are using proper object-oriented design, I hope?
Regards,
Erik
Erik Moeller wrote:
Timwi-
You definitely should use a CVS server if you expect any kind of constructive contributions.
Yes, I know :/ Thanks for your hints, but I think I'll stick to SourceForge just to go with the majority.
- have some kind of built-in profiling
What exactly does this mean?
Keep a log of how much time is spent within each function
Oh, *that* kind of profiling. I have done that with C++ applications under Windows, but I never thought it'd be necessary for webserver applications... Clearly, at least our *current* problem - and also the current problem of LiveJournal (sorry I keep mentioning LiveJournal, but it's the first and only other major website I've made major contributions to) - is database performance, not CPU usage.
Categories will likely go live soon (if Magnus finally finishes the damn thing), so a compatible scheme needs to be in place if we want to switch over.
Yes, point taken, but I don't expect the switch to happen soon :-) In fact, I don't really /expect/ the switch to happen at all, I regard it all as an experiment. Even if Wikipedia won't ever use my software, it was still a lot of fun and very educational to write it.
This seems to assume that images would have to be stored in a file system. I think it's better to store them in a database, possibly one separated from the rest.
I never understood this approach. It only seems to be associated with increased risks (reduced performance, increased risk of data corruption) and to have no benefits. What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?
I suppose you're thinking there's an increased risk of data corruption because the database is all one file, and you're thinking if one bit of the file gets corrupted, it's all gone. As far as I have been taught, however, this is not actually the case. Any argument in this direction applies equally to file systems. If a range of bytes in the middle gets corrupted, it would likely corrupt only one image/file. If the directory structure got corrupted, the file system would also lose everything.
The "reduced performance" argument is something I don't know anything about; it is possible that this is a good argument, but I'm not convinced.
Now to your question, "What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?" - The disadvantage is that it is more difficult to maintain a consistent state, i.e. a database without "dead links" and a file system free of orphans. Does this make sense? You can easily copy part or all of the database to another database or a backup medium or something, whereas with a filesystem you'd have to be careful not to copy database entries and then forget to also copy the relevant files.
- have a template system that can be used by the wiki user:
http://marc.theaimsgroup.com/?l=wikitech-l&m=105557077500936&w=2 -- this could perhaps be combined with the general template system.
I have indeed planned such a template system, but I'm not sure what you mean by the "general template system". Is that referring to the site skin system? I'd really rather keep those two separate; the concepts are not closely enough related.
I do think there are similarities between the two. Any good template system, for site skins OR for internal rendering, will have to support things like loops, conditional blocks etc. This stuff needs to be parsed, so you might as well use a single library for doing so.
You have convinced me about this. I'll try to combine the two.
You are using proper object-oriented design, I hope?
Hm. I haven't even thought of this. Thanks for pointing it out soon enough. :) I'll try to make things as object-oriented as possible. That will add to the educational nature of my experience, because - although I've done extensive object-oriented programming before - I've never done that in Perl (though I've read code that uses it). Should be fun :)
Greetings and thanks, Timwi
Timwi-
Oh, *that* kind of profiling. I have done that with C++ applications under Windows, but I never thought it'd be necessary for webserver applications... Clearly, at least our *current* problem - and also the current problem of LiveJournal (sorry I keep mentioning LiveJournal, but it's the first and only other major website I've made major contributions to) - is database performance, not CPU usage.
Well, that's not necessarily true. CPU usage on Larousse (the webserver for En:) has been very high, and our page parser is very ugly and slow. Clearly we need optimizations on both fronts.
Please do take a look at the Wookee parser module that Tarquin pointed to. It uses a syntax very similar to Wikipedia's. http://wiki.beyondunreal.com/wiki/Wookee
Yes, point taken, but I don't expect the switch to happen soon :-) In fact, I don't really /expect/ the switch to happen at all, I regard it all as an experiment. Even if Wikipedia won't ever use my software, it was still a lot of fun and very educational to write it.
Sure. And if it's a decent wiki, you can add it to http://c2.com/cgi/wiki?WikiEngines and hope that people use it for other purposes.
I think it's better to store them in a database, possibly one separated from the rest.
I never understood this approach. It only seems to be associated with increased risks (reduced performance, increased risk of data corruption) and to have no benefits. What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?
I suppose you're thinking there's an increased risk of data corruption because the database is all one file, and you're thinking if one bit of the file gets corrupted, it's all gone.
Not necessarily. But if the database file is corrupted, the database may no longer process the table correctly, in which case the binary data would have to be hand-extracted to rescue it. If you see the database as another layer on top of the file system, and argue that the same risks apply to the DB as to the filesystem, then you have doubled your risks by adding a DB layer. You would triple them by adding a database within the database, and so on. (We actually have done this with the Wikipedia table structure, where user properties are a kind of CSV table within the database. Really ugly.)
The "reduced performance" argument is something I don't know anything about; it is possible that this is a good argument, but I'm not convinced.
Well, then test it. Throw some 10 megabyte files into the database and compare the reading performance with multiple threads to direct Apache server access.
Now to your question, "What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?" - The disadvantage is that it is more difficult to maintain a consistent state, i.e. a database without "dead links" and a file system free of orphans.
Sure, but given that you only have to deal with create, move, and delete, the implementation complexity is minimal, and the associated risks should be low. In addition, I *like* being able to just zip the entire image directory instead of having to extract the files from a MySQL table.
Hm. I haven't even thought of this. Thanks for pointing it out soon enough. :) I'll try to make things as object-oriented as possible. That will add to the educational nature of my experience, because - although I've done extensive object-oriented programming before - I've never done that in Perl (though I've read code that uses it). Should be fun :)
Perl-OOP is a bit ugly, but reasonably powerful. Check out perldoc perltoot for details. You'll want to look at "tie" especially, as this allows you to do some cool stuff with properties.
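For reference, the basic blessed-hash idiom from perltoot looks like this:

    package Wiki::Article;
    use strict;

    sub new {
        my ($class, %args) = @_;
        my $self = { title => $args{title}, text => $args{text} };
        return bless $self, $class;
    }

    sub title { $_[0]->{title} }
    sub text  { $_[0]->{text} }

    package main;
    my $article = Wiki::Article->new(title => 'Foo', text => '...');
    print $article->title, "\n";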
Regards,
Erik
Erik Moeller wrote:
for En:) has been very high, and our page parser is very ugly and slow. Clearly we need optimizations on both fronts.
Please do take a look at the Wookee parser module that Tarquin pointed to. It uses a syntax very similar to Wikipedia's. http://wiki.beyondunreal.com/wiki/Wookee
It was written to replace UseModWiki's built-in parser, so I think the syntax is identical. (We've improved stuff like nested lists... a minor difference.)
Because it's OO, it's fairly easy to add new syntax rules.
Erik Moeller wrote:
Timwi-
Please do take a look at the Wookee parser module that Tarquin pointed to. It uses a syntax very similar to Wikipedia's. http://wiki.beyondunreal.com/wiki/Wookee
Aw! You're taking all the fun out!
How about I write my own first, and if you don't like it, we can still change it to Wookee?
Yes, point taken, but I don't expect the switch to happen soon :-) In fact, I don't really /expect/ the switch to happen at all, I regard it all as an experiment. Even if Wikipedia won't ever use my software, it was still a lot of fun and very educational to write it.
Sure. And if it's a decent wiki, you can add it to http://c2.com/cgi/wiki?WikiEngines and hope that people use it for other purposes.
Whoa. I didn't know there were so many Wiki engines already. Does none of them meet your needs better than Phase-3? (I guess not, because otherwise you probably would have switched, no?)
I think it's better to store them in a database.
I never understood this approach.
I suppose you're thinking there's an increased risk of data corruption because the database is all one file, and you're thinking if one bit of the file gets corrupted, it's all gone.
Not necessarily. But if the database file is corrupted, the database may no longer process the table correctly, in which case the binary data would have to be hand-extracted to rescue it.
But if the directory structure gets corrupted, the file system may no longer process the directory correctly, in which case the binary data would have to be hand-extracted to rescue it.
If you see the database as another layer on top of the file system, and argue that the same risks apply to the DB as to the filesystem, then you have doubled your risks by adding a DB layer.
So far I was only talking about the effects of a failure, not its causes. It is not a given that the possible causes or their probability are doubled by adding a DB layer.
The most likely cause is a hardware failure. Quite obviously, the probability of this happening does not increase with the addition of a DB layer, nor does the amount of damage it can do.
Another possible cause is a bug in the software. Of course, the extra DB layer adds the risk of there being a bug in the RDBMS, but since the whole database is just one file, it reduces the risk of a bug in the OS's file system somewhat.
The "reduced performance" argument is something I don't know anything about; it is possible that this is a good argument, but I'm not convinced.
Well, then test it. Throw some 10 megabyte files into the database and compare the reading performance with multiple threads to direct Apache server access.
I've thought about something. In order to change one image into another, in the DB you just use one REPLACE query. With a DB/filesystem hybrid as you are suggesting, you would have to 1) write the new version of the image to a new file, 2) change the filename in the database, and 3) delete the old file.
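To illustrate, with invented table and column names, and assuming a connected $dbh:

    # Pure-database version: one statement.
    $dbh->do('REPLACE INTO image (img_name, img_data) VALUES (?, ?)',
             undef, $name, $data);

    # DB/filesystem hybrid: three steps.
    my $newpath = "$dir/$name." . time;   # 1) write a fresh file
    open my $fh, '>', $newpath or die $!;
    binmode $fh;
    print $fh $data;
    close $fh or die $!;

    my ($oldpath) = $dbh->selectrow_array(
        'SELECT img_path FROM image WHERE img_name = ?', undef, $name);
    $dbh->do('UPDATE image SET img_path = ? WHERE img_name = ?',
             undef, $newpath, $name);     # 2) repoint the DB row
    unlink $oldpath;                      # 3) delete the old file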
But yeah, I'll test this somehow (I'm not a real expert at performance testing either...)
Now to your question, "What exactly is the disadvantage of just storing a pointer to the local filesystem in the DB?" - The disadvantage is that it is more difficult to maintain a consistent state, i.e. a database without "dead links" and a file system free of orphans.
Sure, but given that you only have to deal with create, move, and delete, the implementation complexity is minimal, and the associated risks should be low.
That list is not quite complete -- I have listed other uses that you seem to have ignored. Taking a backup (of a consistent state) would be difficult. Splitting the database into several (clustering) would be difficult. It would make replication difficult. Of course, these are all administrative issues that the user never comes into contact with ...
In addition, I *like* being able to just zip the entire image directory instead of having to extract the files from a MySQL table.
That is, of course, one possible advantage, but I'm not quite convinced that this feature is needed all that widely or often.
Check out perldoc perltoot for details.
Thank you.
Greetings, Timwi
Timwi wrote:
Oh, *that* kind of profiling. I have done that with C++ applications under Windows, but I never thought it'd be necessary for webserver applications... Clearly, at least our *current* problem - and also the [...] is database performance, not CPU usage.
For ten years now, I've been working with performance issues in systems that use databases, and I've never found it to be as easy as you make it appear.
Believing that "the problem is database performance" all too often leads people to blindly buy new hardware or change database software (from Oracle to Sybase or from MySQL to Postgres), as if printing more money would solve any "shortage of money" sort of problem. Instead, I think you have to look at *how* you *spend* the resources that you have got, and consider if you could spend them otherwise. If, perhaps, you are wasting and could start to save. As any economist can tell you, the first step is monitoring the current flow, to get an overview of what is going on. This is what the profiling is about. It's not about CPU performance or database performance. It is about locating the hole in the bucket.
The resource that we've got to work with is the two to five seconds of patience that each user brings to www.wikipedia.org. Before that time runs out, they should have their content delivered. If we have a hole in our bucket, where all time runs out, we better plug that hole. If we make a single call to a database that takes seven seconds to respond, we better fix that call (or the database). But if our code for every HTTP request makes seventy calls to a database that always responds in .1 seconds (if you think this example is unrealistic, you haven't been in the telecom industry), the entire HTTP transaction also takes a total of seven seconds, and we better fix our code to make fewer calls rather than put the blame on the poor database.
It's about wallclock time, not CPU time. And profiling is the way to go, if we want to do better than guess and pray. So far, the Wikipedia developers have mostly been guessing and praying.
I think the worst performance I ever saw from Wikipedia was in May 2002. You can dig up the thread from the archives of this list, e.g. http://mail.wikipedia.org/pipermail/wikitech-l/2002-May/000344.html
Right now, Wikipedia is flying and everybody is happy.
Lars Aronsson wrote:
Timwi wrote:
Oh, *that* kind of profiling. I have done that with C++ applications under Windows, but I never thought it'd be necessary for webserver applications... Clearly, at least our *current* problem - and also the [...] is database performance, not CPU usage.
For ten years now, I've been working with performance issues in systems that use databases, and I've never found it to be as easy as you make it appear.
Believing that "the problem is database performance" all too often leads people to blindly buy new hardware or change database software (from Oracle to Sybase or from MySQL to Postgres), as if printing more money would solve any "shortage of money" sort of problem. Instead, I think you have to look at *how* you *spend* the resources that you have got, and consider if you could spend them otherwise. If, perhaps, you are wasting and could start to save. As any economist can tell you, the first step is monitoring the current flow, to get an overview of what is going on. This is what the profiling is about. It's not about CPU performance or database performance. It is about locating the hole in the bucket.
The resource that we've got to work with is the two to five seconds of patience that each user brings to www.wikipedia.org. Before that time runs out, they should have their content delivered. If we have a hole in our bucket, where all time runs out, we better plug that hole. If we make a single call to a database that takes seven seconds to respond, we better fix that call (or the database). But if our code for every HTTP request makes seventy calls to a database that always responds in .1 seconds (if you think this example is unrealistic, you haven't been in the telecom industry), the entire HTTP transaction also takes a total of seven seconds, and we better fix our code to make fewer calls rather than put the blame on the poor database.
It's about wallclock time, not CPU time. And profiling is the way to go, if we want to do better than guess and pray. So far, the Wikipedia developers have mostly been guessing and praying.
I think the worst performance I ever saw from Wikipedia was in May 2002. You can dig up the thread from the archives of this list, e.g. http://mail.wikipedia.org/pipermail/wikitech-l/2002-May/000344.html
Right now, Wikipedia is flying and everybody is happy.
Very true. I've been looking at the code, trying to find the DB calls where I could optimize something, and trying to find bottlenecks, but it's pretty hard. The current code is *really* confusing. Variables are often passed as globals rather than parameters, which makes it horrible to look through the code and find what's going on. However, if anyone knows of any specific place where you get the feeling that something is being done in a suboptimal way, name the action and I'll try to follow through and dig up what's going on.
I sent a proposal on a way to bring Special:Longpages and Special:Shortpages back, but nobody replied. The code is very basic, and it was actually partially included in my proposal, but I received no reasons why it would be a good idea or a bad idea. I'd love to hear *something* about it from someone who knows databases, and whether my proposal is feasible or just unnecessary clunk.
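For what it's worth, the core of such a special page could be as small as this (a sketch only; I am guessing at the old "cur" table's column names, so treat them as assumptions):

# Hypothetical query behind Special:Longpages; column names assumed.
$sql = "SELECT cur_title, LENGTH(cur_text) AS len
        FROM cur
        WHERE cur_namespace = 0
        ORDER BY len DESC
        LIMIT 50";
$res = mysql_query($sql, $db);

Flip the ORDER BY to ASC for Special:Shortpages.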
Another fun thing would be to log every query made to the test server for, say, 500 page views or so, along with microtimestamps taken at the beginning and end of the file. We could then properly analyze that log and try to find the bottlenecks and problematic queries. That would be a great way to start optimization: by *finding* the problem areas. Another nice stat to have would be a ratio of views to edits, and of total views to watchlist views. That would help us find out where a change would make the most difference.
Lightning
Lightning wrote:
Very true. I've been looking at the code, trying to find the DB calls where I could optimize something, and trying to find bottlenecks, but it's pretty hard. The current code is *really* confusing. Variables are [...]
Isn't code always confusing? :-) However, if you are looking *for* bottlenecks, it means you are looking *at* the bottle. Try looking at the fluid instead. Nobody really cares about the bottle anyway.
Another fun thing would be to log every query made to the test server for, say, 500 page views or so, along with microtimestamps taken at the beginning and end of the file. We could then properly analyze that log and try to find the bottlenecks and problematic queries. That [...]
You're on the right track here. However, the first thing we do when we try to analyze that log file is to filter out all entries that have a short roundtrip or response time, since they are not a problem. So it is rather unnecessary to log anything that has a short roundtrip. We only need to log the slow responses.
start = time();
... /* do all the calls */
elapsed = time() - start;
if (elapsed > OUR_THRESHOLD)
    Log("whooaaa, doing all the calls was very slow!!!");

Let's set the logging threshold so that only, say, 1% of all calls get logged. The overhead of looking at the system clock at the beginning and end of every call is negligible. The overhead of writing a line to the log for 1% of all calls is also negligible. There is no reason not to use this sort of logging in the live system.
If "start" is a global variable, you can write a subroutine to make the whole program more readable:
void TimeLog(string text)
{
    time_t elapsed = time() - start;
    if (elapsed > OUR_THRESHOLD)
        Log(text + " took " + elapsed + " seconds");
}

Then the program will look something like this:

start = time();
if (this_call == "GET") {
    ... /* do the get */
    TimeLog("get");
} else if (this_call == "SAVE") {
    ... /* do the save */
    TimeLog("save");
} else if (this_call == "STATISTICS") {
    ... /* do the statistics */
    TimeLog("statistics");
}

The overhead of calling this subroutine for every transaction is also negligible. It is a matter of microseconds at most, and any "performance problem" is a matter of several milliseconds.
Lars Aronsson wrote:
I agree with what you say. I just think that we should also log the actual queries made for the slow requests, to see which queries need optimizing.
Oh, and in any case I think we'd be using microtime(), not time().
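Putting the two ideas together, a request-level logger could look roughly like this. This is only a sketch, not the actual wiki code: wfMicrotimeFloat, wfQuery, $wgQueryLog and the log path are invented names, and it uses the usual PHP 4 idiom for turning microtime() into a float.

# microtime() in PHP 4 returns "usec sec" as a string.
function wfMicrotimeFloat() {
    list($usec, $sec) = explode(" ", microtime());
    return (float)$usec + (float)$sec;
}

$wgRequestStart = wfMicrotimeFloat(); # set at the top of wiki.phtml
$wgQueryLog     = array();            # every query this request made

# Hypothetical wrapper around the existing query call.
function wfQuery($sql, $db) {
    global $wgQueryLog;
    $t0  = wfMicrotimeFloat();
    $res = mysql_query($sql, $db);
    $wgQueryLog[] = sprintf("%.4fs %s", wfMicrotimeFloat() - $t0, $sql);
    return $res;
}

# Called once just before the script exits: only slow requests are
# logged, and for those we also dump every query they made.
function wfLogIfSlow($action, $threshold = 2.0) {
    global $wgRequestStart, $wgQueryLog;
    $elapsed = wfMicrotimeFloat() - $wgRequestStart;
    if ($elapsed > $threshold) {
        error_log(sprintf("%s took %.3fs:\n  %s\n", $action, $elapsed,
            implode("\n  ", $wgQueryLog)), 3, "/var/log/wiki-slow.log");
    }
}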
One more thing: I know you said code is *always* confusing, but I am quite fluent in PHP, and in C to a certain extent, and I can understand the code... it's just that I find the design of it horrible. I would never dream of using a global variable myself if I didn't have to... I just think that the overall design is sub-optimal. I see why people want to re-write it. It's just obfuscated, and I think the architecture could use quite some improvement; however, I wouldn't know where to start, since changing one thing would break so much. But I think the code should be clarified if we really expect to attract outside developers; the current code is just too hard for people to casually look over.
Lightning
I would like to ask the developers a question: I have a problem on my browser with indivisible spaces.
Please, this is an important question, the sort of issue that might lead to an edit war.
In French typography, indivisible (non-breaking) spaces are put before certain characters, such as ":", ";", "!" and "?".
Recently, a new editor (a French language teacher) arrived on the French wiki. He insists on using these unbreakable spaces. As far as I know, he is the only one.
However, for some reason I can't explain, my browser (a brand new one, latest version, Unicode-friendly, friendly to pages over 32 kb, etc.) appears to be "breaking" these spaces, though I can't figure out how it is doing so. When I look at the history, I see nothing I can explain.
Still, Vincent insists that I do break his text. To fix it, he has to either revert my changes or fix them himself, an issue about which he is strongly complaining, requesting that I fix the stuff myself. The fact is, I can't see what I am doing wrong, and I can't imagine editing his text by hand each time I edit an article or talk page to avoid removing the non-breakable spaces (if only I could figure out how not to do so!)
Let me show you an example :
Here is my edit (I was removing a div in a talk page; use of div is becoming more and more common, as is use of anchors to support 40 or 50 kb pages):
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82635&oldid=82624
And here is Vincent's next edit, restoring manually all the non-breakable spaces behind me:
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82637&oldid=82635
The fact is that while I see him restoring the spaces, I cannot see where in *my* edit I broke them. I only removed the div line. I have also just looked from my browser at work (a very regular PC), and I can't see anything special either.
Guys, this is a tough problem, because it means either Vincent has to correct behind me each time, complain loudly, and request that I fix the mess myself, or I can't edit the pages Vincent edits.
This is doubly important as Vincent and I strongly disagree on other matters as well, so I fear this particular point might disrupt our peaceful little world.
Is there anyone who would have a *clue* about this specific problem? Is it solvable or not?
Are there other wikis using non-breakable spaces ? Did they meet problems as well ?
Anthere-
Recently, a new editor (a french language teacher) got on the french wiki. He insists on using these unbreakable spaces.
I don't think that's a good idea. It only makes pages harder to edit because of the ugly HTML entities.
Here is my edit (I was removing a div in a talk page, use of div is becoming more and more common, as well as use of anchors to support 40 or 50 kb pages)
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82635&oldid=82624
And here is Vincent next edit, restoring manually all the non-breakable spaces behind me
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82637&oldid=82635
I don't see how anything was "broken" by your browser, and I strongly doubt that Netscape would have any such bug. Vincent inserted the &nbsp;s for the first time in the above edit after yours, as the page source code for the earlier revision shows. Any changes in display he noticed are likely the result of how his browser handles word/character wrapping, which may change when more words/characters are added. It could also be that he pasted the text from some word processor and inserted hard spaces in that application, which were lost during the copying and pasting. In any case, it is quite obvious that your edits did not break anything, because the &nbsp;s Vincent added were still there after your further edits to the same page.
Regards,
Erik
--- Erik Moeller erik_moeller@gmx.de wrote:
Anthere-
Recently, a new editor (a french language teacher) got on the french wiki. He insists on using these unbreakable spaces.
I don't think that's a good idea. It only makes pages harder to edit because of the ugly HTML entities.
I know.
This is also why I removed the div tag after he used it for the second time on the same day (the first being on *my* page).
And that is why I don't like the anchors he is putting everywhere. Since most of his articles are very long (but very good and interesting as well), he puts anchors in the text so that links from other articles may lead to the right paragraph in the very meaty one.
All this is good, but I dare not (and he wishes me not to) edit articles such as the one on Greek religion, to avoid messing with the anchors.
This is very unwiki imho,
but when I protest, he says I am not the boss and this is only *my* opinion, and that I am forcing my desire for simplicity upon others, to the detriment of quality articles, which may be long, with Sanskrit or Greek or whatever characters, and anchors, and non-breakable spaces, and div spaces :-((((
http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&action=edit
This one is a nightmare to me. I have no idea what other people see here, but as for myself, it is scary.
And since other editors either don't say anything or push the use of anchors, complicated HTML codes, fancy small bars, div tags and such, who am I to say anything all alone???
This desolates me, but if the consensus is to make editing difficult, I can only try to salvage my little area of editing.
Here is my edit (I was removing a div in a talk page, use of div is becoming more and more common, as well as use of anchors to support 40 or 50 kb pages)
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82635&oldid=82624
And here is Vincent next edit, restoring manually all the non-breakable spaces behind me
http://fr.wikipedia.org/w/wiki.phtml?title=Discuter:Ph%E9nix&diff=82637&oldid=82635
I don't see how anything was "broken" by your browser and I strongly doubt that Netscape would have any such bug. Vincent inserted the &nbsp;s for the first time in the above edit after yours, as the page source code for the earlier revision shows. Any changes in display he noticed are likely the result of how his browser handles word/character wrapping, which may change when more words/characters are added. It could also be that he pasted the text from some word processor and inserted hard spaces in that application, which were lost during the copy and pasting.
Ah, yes, he absolutely does so. Word processor.
I know because he lectured me on the fact that I did not know the proper usage of non-breakable spaces (which I do know, but do not respect in the least ;-)). He gave me a link to a page explaining how to make non-breakable spaces in my editor. He also said any decent editor supports writing properly good French, and that I should not edit anything, even on talk pages, without having first run the grammar/typo checker, because my editing without accents, and with spelling mistakes, on talk pages was not respectful of readers. 'Cause we are not a chat, but an encyclopedia.
We've got to be serious ! This is no joke work !
:-(((((( !!!!!
But there is *no* grammar/typo checker on a wiki edit page :-((
All I can conclude from this comment is that he is indeed writing everything he uploads to the wiki in a Word (or equivalent) editor first. Is that usual??? It happens that I prepare long articles in Word beforehand, or that I save text in SimpleText when there are lags, to avoid losing it, but everything else I write is edited straight in the wiki.
The only other person whose text I have apparently messed up a couple of times on our pump is Aoineko.
Do you remember, Aoi? Do you sometimes write your text in an editor rather than in wiki edit windows?
is quite obvious that your edits did not break anything because the &nbsp;s Vincent added were still there after your further edits to the same page.
Well, I will try to explain that to him. But he will probably answer that I am the only one doing so with his non-breakable editor spaces.
Thanks Erik
Well, you've got to bring it to a head. Pick something of his and edit it. And fight it out. Otherwise you're letting him establish a custom and practice that in the long run is unacceptable. Basically he is what we used to call "taking position" on you. More or less in the manner of Larry Sanger.
Let him worry about the anchors...just bull ahead...
That you are in essence poking what seems to be a skunk, albeit a pretty French one, will tell you the sacrifices you must bear...
The alternative is to permit him ownership of the article, an option not permitted by current Wikipedia policy (although I don't happen to be in agreement with it).
Fred Bauder
http://www.internet-encyclopedia.org
--- Fred Bauder fredbaud@ctelco.net wrote:
yup; quite true
Update: I posted Erik's answer on our pump.
Vincent's comment was that I was intellectually dishonest, as the message was clearly from me, pretending to be someone called Erik. In short, he said the answer was a forgery!
I also decided not to be *too* bold, and before reworking the article, I matter-of-factly proposed a solution for dividing and renaming the article.
Vincent's only answer was "I said I would take care of it myself".
That is all. I think that won't work :-)
Anthere
Le Fri, 25 Jul 2003 11:45:05 -0700 (PDT), Anthere anthere6@yahoo.com wrote:
Update: I posted Erik's answer on our pump. Vincent's comment was that I was intellectually dishonest, as the message was clearly from me, pretending to be someone called Erik. In short, he said the answer was a forgery!
It's a shame! I never wrote that; just look at the history. The text is not even well written (it lacks a capital, does not use the right quote marks, etc.). Frankly, this is disgusting and, there, you are making a forgery. I cannot believe you think *I* wrote this.
I also decided not to be *too* bold, and before reworking the article, I matter-of-factly proposed a solution for dividing and renaming the article. Vincent's only answer was "I said I would take care of it myself"
Yes, because, as I've already told you, it's rather boring to do. I do know where the links are.
--- Vincent Ramos siva-nataraja+spam@alussinan.org wrote:
Le Fri, 25 Jul 2003 11:45:05 -0700 (PDT), Anthere anthere6@yahoo.com wrote:
Update: I posted Erik's answer on our pump. Vincent's comment was that I was intellectually dishonest, as the message was clearly from me, pretending to be someone called Erik. In short, he said the answer was a forgery!
It's a shame! I never wrote that; just look at the history. The text is not even well written (it lacks a capital, does not use the right quote marks, etc.). Frankly, this is disgusting and, there, you are making a forgery. I cannot believe you think *I* wrote this.
which is precisely why I apologized for not having recognised a troll at
http://mail.wikipedia.org/pipermail/wikitech-l/2003-July/005164.html
and on your talk page.
I agree that a lack of capitals and wrong quote marks are definitely signs I should have noticed. You are the one who expresses himself the best on the fr wiki.
Now, you may wish to go on fueling the troll with screaming and bad words; it will just give him what he wanted to achieve by doing so.
Which would not solve our problem in any way :-)
I also decided not to be *too* bold, and before reworking the article, I matter-of-factly proposed a solution for dividing and renaming the article. Vincent's only answer was "I said I would take care of it myself".
Yes, because, as I've already told you, it's rather boring to do. I do know where the links are.
Vincent. This is *precisely* why anchors are wrong. When it comes to the point where only the main editor (or should I say the only editor?) is able to edit, rename, rework, add to or remove from an article, because of a complex linking system, there is a problem. That is exactly what I mean by simplicity of editing. And no, it is not only *my* agenda to support this. That is what I have been trying to explain to you. You say you want other editors to edit your text, but at the same time you say you would prefer they did not, because the link system is tough and only you know where the links are. Well, yes: an article should not have only one author, and everyone should be able to rework it without fear of messing up the whole place.
Please, if you do not understand that, say so here, so that perhaps somebody else can explain it better than I do.
Le Fri, 25 Jul 2003 09:47:08 -0600, Fred Bauder fredbaud@ctelco.net wrote:
Well, you've got to bring it to a head. Pick something of his and edit it.
Thanks.
And fight it out. Otherwise you're letting him establish a custom and practice that in the long run is unacceptable. Basically he is what we used to call "taking position" on you. More or less in the manner of Larry Sanger.
You really should not try to judge people without knowing what they have to say.
Let him worry about the anchors...just bull ahead... That you are in essence poking what seems to be a skunk, albeit a pretty French one, will tell you the sacrifices you must bear...
Bravo. That's the kind of incisive judgement I find rather misplaced: when one knows only what one person tells about another (did you ever think Anthere could be wrong?) and takes it for granted without even verifying the facts.
The alternative is to permit him ownership of the article, an option not permitted by current Wikipedia policy (although I don't happen to be in agreement with it).
I've *never* asked for this. And anyone can modify the pages I wrote.
Good to know it was all a false alarm...
Fred Bauder
http://www.internet-encyclopedia.org
Me too :-)
I made some of the changes I talked about,
consisting for the most part of dividing the 48 kb page into two new pages.
Vincent has restored everything, declared an edit war, and asked for the page to be protected.
Ownership you said ?
:-)
--- Fred Bauder fredbaud@ctelco.net wrote:
Good to know it was all a false alarm...
Vincent,
It is established policy to divide any article over 30 kb, since some browsers can't edit pages over that length. This is a settled issue.
Fred Bauder
http://www.internet-encyclopedia.org
--- Fred Bauder fredbaud@ctelco.net wrote:
Vincent,
It is established policy to divide any article over 30 kb, since some browsers can't edit pages over that length. This is a settled issue.
Fred Bauder
http://www.internet-encyclopedia.org
I feel like choking Fred....
After several reversions (I should mention that I made a couple of light, non-controversial edits to the resulting articles: headings, better titles, links and such), I would not really appreciate my changes being removed just because we disagree on the division of the page.
The last comment put in the comment box by Vincent while reverting was:
----- his comment
Dernier avertissement. Ensuite, je serais désolé de le faire, mais apposerai mon copyright sur cette page (les documents m'attribuant la paternité de son contenu étant à côté de moi).
----which means
Last warning. Next time, I will be sorry to do it, but I will put my copyright on this page (the documents attributing the paternity of its content to me being right next to me).
Anthere-
----- his comment
Dernier avertissement. Ensuite, je serais désolé de le faire, mais apposerai mon copyright sur cette page (les documents m'attribuant la paternité de son contenu étant à côté de moi).
----which means
Last warning. Next time, I will be sorry to do it, but I will put my copyright on this page (the documents attributing the paternity of its content to me being right next to me).
He can put his copyright on the page; you can remove the copyright AND change the text, since all submitted text is FDL-licensed and a copyright notice to that effect is already at the bottom of each page.
Regards,
Erik
Yes, one of my grave defects is love of stirring up trouble.
Vincent,
I hope you can see, apart from the fun we are having, that anyone who logs on to the French Wikipedia, not just Anthere, can edit your article. I sympathise with your view that you should be able to maintain the integrity of your article, and in most publishing situations you could (like fun), but where you are, you can't, due to the GNU FDL (which, by publishing on Wikipedia, you have implicitly agreed to). I hope this clarifies the situation.
Fred Bauder
http://www.internet-encyclopedia.org
Le Fri, 25 Jul 2003 08:22:02 -0700 (PDT), Anthere anthere6@yahoo.com a écrit:
And why I don't like the anchors he is putting everywhere. Since most of his articles are very long (but very good and interesting as well), he puts some anchors in the text, so links from other articles may lead to the good paragraph in the very meaty one.
All this is good, but I dare not (and he wishes me not to) edit article such as the one on greek religion to avoid messing with the anchors.
I've just told you how you could do this; rather simple but daunting.
This is very unwiki imho, but when I protest, he says I am not the boss and this is only *my* opinion, and that I am forcing my desire for simplicity upon others, to the detriment of quality articles, which may be long, with Sanskrit or Greek or whatever characters, and anchors, and non-breakable spaces, and div spaces :-((((
What kind of behavior is this? Will you then start crying?
http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&action=edit This one is a nightmare to me. I have no idea what other people see here, but as for myself, it is scary
Of course, it's full of Chinese characters; as the French Wikipedia can't handle raw Unicode data, I have to use decimal entities instead.
This desolates me, but if the consensus is to make editing difficult, I can only try to salvage my little area of editing.
Ah, yes, he absolutely does so. Word processor.
Laugh. I do not even own it; I use BabelPad, a raw Unicode text editor (http://uk.geocities.com/BabelStone1357). But you still believe I'm wrong; please look at the raw source of your modifications with a hexadecimal editor.
I know because he lectured me on the fact that I did not know the proper usage of non-breakable spaces (which I do know, but do not respect in the least ;-)). He gave me a link to a page explaining how to make non-breakable spaces in my editor. He also said any decent editor supports writing properly good French, and that I should not edit anything, even on talk pages, without having first run the grammar/typo checker, because my editing without accents, and with spelling mistakes, on talk pages was not respectful of readers. 'Cause we are not a chat, but an encyclopedia. We've got to be serious ! This is no joke work ! :-(((((( !!!!!
Frankly, I find this really childish.
All I can conclude from this comment is that he is indeed writing everything he uploads to the wiki in a Word (or equivalent) editor first. Is that usual???
Nope.
is quite obvious that your edits did not break anything because the &nbsp;s Vincent added were still there after your further edits to the same page.
Of course: Anthere only modifies U+00A0, and not html entities.
Well, I will try to explain that to him. But he will probably answer that I am the only one doing so with his non-breakable editor spaces.
No.
--- Vincent Ramos siva-nataraja+spam@alussinan.org wrote:
Le Fri, 25 Jul 2003 08:22:02 -0700 (PDT), Anthere anthere6@yahoo.com a écrit:
All this is good, but I dare not (and he wishes me not to) edit article such as the one on greek religion to avoid messing with the anchors.
I've just told you how you could do this; rather simple but daunting.
No problem. We'll get rid of the anchors. That will make the moves easier.
This is very unwiki imho, but when I protest, he says I am not the boss and this is only *my* opinion, and that I am forcing my desire for simplicity upon others, to the detriment of quality articles, which may be long, with Sanskrit or Greek or whatever characters, and anchors, and non-breakable spaces, and div spaces :-((((
What kind of behavior is this? Will you then start crying?
I'll tell you :-)
Ah, yes, he absolutely does so. Word processor.
Laugh. I do not even own it; I use BabelPad, a raw Unicode text editor (http://uk.geocities.com/BabelStone1357). But you still believe I'm wrong; please look at the raw source of your modifications with a hexadecimal editor.
That is not the point. What is important? The fact that I am responsible, or the fact that you are?
Neither. What is a problem is that *you* complain you have to edit all the time after me. This is what matters. Not anything else.
is quite obvious that your edits did not break anything because the &nbsp;s Vincent added were still there after your further edits to the same page.
Of course: Anthere only modifies U+00A0, and not html entities.
Well, I will try to explain that to him. But he will probably answer that I am the only one doing so with his non-breakable editor spaces.
No.
Ah? You mean you have similar problems with others than me? That is interesting!
On Fri, Jul 25, 2003 at 09:30:18PM +0200, Vincent Ramos wrote:
Le Fri, 25 Jul 2003 08:22:02 -0700 (PDT), Anthere anthere6@yahoo.com a écrit:
This is very unwiki imho, but when I protest, he says I am not the boss and this is only *my* opinion, and that I am forcing my desire for simplicity upon others, to the detriment of quality articles, which may be long, with Sanskrit or Greek or whatever characters, and anchors, and non-breakable spaces, and div spaces :-((((
What kind of behavior is this? Will you then start crying?
Vincent, I doubt that your behaviour is acceptable by any means.
Regards,
JeLuF
Vincent-
Of course: Anthere only modifies U+00A0, and not html entities.
Then don't use U+00A0 -- Anthere uses a reasonably modern browser, and if this bug happens in two configurations (Opera + Netscape 7), she will not be the only one to break these spaces. Use &nbsp; or don't use hard spaces at all.
Regards,
Erik
Erik Moeller wrote:
Anthere-
Recently, a new editor (a french language teacher) got on the french wiki. He insists on using these unbreakable spaces.
I don't think that's a good idea. It only makes pages harder to edit because of the ugly HTML entities.
I agree. It's ugly in the source.
Why can't a browser see that a text is in French and then treat a space between numbers or after punctuation as unbreakable? Has the W3C thought of this yet?
--- Erik Moeller erik_moeller@gmx.de wrote:
I don't see how anything was "broken" by your browser and I strongly doubt that Netscape would have any such bug. Vincent inserted the &nbsp;s for the first time in the above edit after yours, as the page source code for the earlier revision shows. Any changes in display he noticed are likely the result of how his browser handles word/character wrapping, which may change when more words/characters are added. It could also be that he pasted the text from some word processor and inserted hard spaces in that application, which were lost during the copy and pasting. In any case, it is quite obvious that your edits did not break anything because the &nbsp;s Vincent added were still there after your further edits to the same page.
Thank Shai for trusting me :-)
Vincent replied that the example chosen was not good.
I tried to translate his comment below.
:Évidemment, l'exemple choisi ne fonctionne pas puisque presque tout le texte apparaît comme modifié. On ne voit donc pas le détail des opérations. Maintenant, regardons avec [http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&diff=71928&oldid=71914 un exemple pertinent] : que voit-on ? Outre les changements volontaires, tous les mots encadrés par <« »> ou précédant <:> et <;> sont marqués en rouge. Il est pourtant improbable qu'Anthere ait remplacé tous ces termes et les signes de ponctuation par la même chose, car c'est bien ce que l'on semble constater (par exemple, ''ligne 1'' : « Note : » est remplacé par « Note : », etc.). Or, si l'on prend la peine de regarder dans le détail ce qui a été modifié, il suffit d'utiliser un éditeur hexadécimal et de regarder la source, on voit qu'il s'agit des... espaces insécables, situées justement devant <:> et <;>, après <«> et avant <»> (entre autres).
:Je prends toujours la ''ligne 1'', à partir de « Note » jusqu'au deux-points ; que dit l'éditeur ? :* version antérieure : 4E 6F 74 65 A0, ce qui correspond à ''N'', ''o'', ''t'', ''e'', ''[ ] (espace insécable)'' ; :* version corrigée : 4E 6F 74 65 20 ; ce qui correspond à ''N'', ''o'', ''t'', ''e'', ''[ ] (espace normale)''.
:On peut s'amuser à vérifier cela avec n'importe quel autre mot en rouge qui semble correspondre dans la version corrigée exactement à ce qui est présent dans la version ancienne. À chaque fois, le caractère U+00A0 devient U+0020. Pour éviter cela, j'ai pris le parti de corriger le caractère brut U+00A0 en entité html <nowiki>&nbsp;</nowiki> ce qui, je l'accorde, rend le code disgracieux.
----------- Naturally, the chosen example does not work since most of the text appears modified. One can't see the details of the modifications. Now, let's consider
[http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&diff=71928&... a great example] : what do we see ?
Apart from the voluntary changes, all the words surrounded by <« »> or preceding <:> and <;> are in red. However, it is unlikely that Anthere replaced all these words (note: I did not) and all the punctuation signs by the same thing, though that is what it appears to be.
For example, ''line 1'': « Note : » is replaced by « Note : », etc. If we take the care to look in detail at what has been modified (it is sufficient to use a hexadecimal editor and to look at the source), one can see that it is precisely the... non-breakable spaces, just before <:> and <;>, after <«> and before <»> (among others).
Also on ''line 1'', from « Note » to the colon, what does the editor say?
:* anterior version: 4E 6F 74 65 A0, which corresponds to ''N'', ''o'', ''t'', ''e'', ''[ ] (non-breakable space)''; :* corrected version: 4E 6F 74 65 20, which corresponds to ''N'', ''o'', ''t'', ''e'', ''[ ] (normal space)''.
One can have fun checking this with any other red word which appears in the corrected version to correspond exactly to what is present in the old version. Each time, the character U+00A0 is transformed into U+0020. To avoid this, I decided to correct the raw character U+00A0 into the HTML entity <nowiki>&nbsp;</nowiki>, which is, admittedly, ugly.
--------
Anthere : so ? :-(
On Fri, Jul 25, 2003 at 12:15:35PM -0700, Anthere wrote:
--- Erik Moeller erik_moeller@gmx.de wrote:
Vincent replied that the example chosen was not good.
I tried to translate his comment below.
:* anterior version: 4E 6F 74 65 A0, which corresponds to ''N'', ''o'', ''t'', ''e'', ''[ ] (non-breakable space)''; :* corrected version: 4E 6F 74 65 20, which corresponds to ''N'', ''o'', ''t'', ''e'', ''[ ] (normal space)''.
Anthere : so ? :-(
OK, looks like he is right. Anthere, which browser are you currently using, so that we can try to find out what is making the change?
Despite this, I think the wiki software for the French Wikipedia should be changed to insert &nbsp;'s around colons, brackets, etc., to automatically render "good French" at reasonable editing effort. Having those blanks unbreakable is something the computer should have to care about, not the editor.
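Something along these lines might do it on the rendering side (a rough sketch; the character list is incomplete, the function name is made up, and a real version would have to skip <pre> and <nowiki> sections):

# Insert non-breaking spaces where French typography wants them:
# before double punctuation and inside guillemets (ISO-8859-1
# 0xAB = opening, 0xBB = closing guillemet).
function wfFixFrenchSpaces($text) {
    $text = preg_replace('/ ([:;!?\xBB])/', '&nbsp;$1', $text);
    $text = preg_replace('/(\xAB) /', '$1&nbsp;', $text);
    return $text;
}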
In which cases are those non-breakables usually used?
Regards,
JeLuF
Jens Frank wrote:
Despite this, I think the wiki software for the French Wikipedia should be changed to insert &nbsp;'s around colons, brackets, etc., to automatically render "good French" at reasonable editing effort. Having those blanks unbreakable is something the computer should have to care about, not the editor.
It *should* be the web browser that does this!
On Fri, Jul 25, 2003 at 08:55:23PM +0100, tarquin wrote:
Jens Frank wrote:
Despite this, I think the wiki software for the French Wikipedia should be changed to insert &nbsp;'s around colons, brackets, etc., to automatically render "good French" at reasonable editing effort. Having those blanks unbreakable is something the computer should have to care about, not the editor.
It *should* be the web browser that does this!
Agreed, but I think it might be easier to update the Wikipedia software than all the available browsers.
JeLuF
--- Jens Frank JeLuF@gmx.de wrote:
On Fri, Jul 25, 2003 at 08:55:23PM +0100, tarquin wrote:
Jens Frank wrote:
Despite this, I think the wiki software for the French Wikipedia should be changed to insert &nbsp;'s around colons, brackets, etc., to automatically render "good French" at reasonable editing effort. Having those blanks unbreakable is something the computer should have to care about, not the editor.
It *should* be the web browser that does this!
Agreed, but I think it might be easier to update wikipedia software than all available browsers.
JeLuF
Errr...a point came to my mind...
I switched my browser about a month ago.
Mine was an old Opera. I had trouble with long pages, and messed up meta quite often, as it did not support Unicode. But I had no problem with it on the French wiki until a good month ago, when some Nordic characters appeared in a myth article. I then discovered I was messing up some characters.
To suit Vincent's needs, I really looked for another browser which could work on my system (Mac OS 9). And I finally found Netscape 7.
The thé chinois article given as an example by Vincent is an article I edited with Opera 5. And it was after discussions with Vincent that I concluded my only three options were either to buy a new computer, or to quit Wikipedia, or to find a Unicode-friendly browser. Which I did.
The example I put up this afternoon was done recently, under my new browser.
So... the same problem is occurring under Opera 5 and Netscape 7. Vincent claims it is my browser. So these are my browserS. Or my system. Or my platform. Or what?
I'm not quite sure why you initiated this whole discussion in this thread of mine, because it doesn't really seem to have anything to do with my experimental design. Not that I mind really, but it will cause confusing threading in mail/news clients and the mailing list archive.
Anthere wrote:
So...the same problem is occuring under Opera 5 and Netscape 7. Vincent claims it is my browser. So these are my browserS. Or my system. Or my plateform. Or what ?
I haven't tried Opera 5, but seeing as it's very old, it may well have this bug.
I have, however, tried Mozilla 1.4, which is closely related to Netscape 7 (though I'm under Windows). It does have this bug. I've reported it to BugZilla: http://bugzilla.mozilla.org/show_bug.cgi?id=213924
So yes, it is very well possible that you are seeing the same bug in two independent browsers.
It appears to work fine for me in Opera 7.11 under Windows.
Greetings, Timwi
Le Fri, 25 Jul 2003 06:25:44 -0700 (PDT), Anthere anthere6@yahoo.com wrote
Recently, a new editor (a french language teacher) got on the french wiki. He insists on using these unbreakable spaces. As far as I know, he is the only one.
You should see what others say; for instance Didup, Panoramix (I quote: "Je suis l'un des intégristes de la typographie ici (je mets des vrais guillemets, des espaces insécables, etc.)", that is, "I am one of the typography hardliners here (I use real guillemets, non-breaking spaces, etc.)"), Mokona, etc.
However, for some reasons I can't explain, my browser (a brand new one, last version, unicode friendly, page over 32 kb friendly, etc...) appears to be "breaking" these spaces. Though I can't figure how it is doing so. When I look at the history, I see nothing I can explain.
Still, Vincent insists that I do break his text. To fix it, he has to either revert my changes, or fix them himself, an issue upon which he is strongly complaining, requesting that I fix the stuff myself. Fact is, I can't see what I do wrong, and I can't imagine editing his comments by hand each time I edit an article or talk page, to avoid removing the non-breakable spaces (if only I could figure how not to do so!)
Well, let me give you a real example: http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&diff=71928&oldid=71914 If you had looked at the source with a hexadecimal editor, you could have seen (first line, from "Note" to ":"):
* my version: 4E 6F 74 65 A0, that is <N>, <o>, <t>, <e> and <[ ]> (non-breakable space); * Anthere's version: 4E 6F 74 65 20; that is <N>, <o>, <t>, <e>, <[ ]> (space).
Thus, Anthere made U+00A0 (NBSpace), a normal raw ISO-8859-1 character, become U+0020, the normal space.
Anyone can verify this fact.
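In code terms, the check amounts to counting the 0xA0 bytes; a trivial PHP sketch (the function name is made up):

# Count raw ISO-8859-1 non-breaking spaces (0xA0) in a revision's text.
function wfCountHardSpaces($text) {
    return substr_count($text, "\xA0");
}
# If the count drops between two revisions while the &nbsp; entities
# stay put, raw A0 characters were turned into plain 20 spaces.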
Vincent Ramos
--- Vincent Ramos siva-nataraja+spam@alussinan.org wrote:
Le Fri, 25 Jul 2003 06:25:44 -0700 (PDT), Anthere anthere6@yahoo.com wrote
Recently, a new editor (a french language teacher) got on the french wiki. He insists on using these unbreakable spaces. As far as I know, he is the only one.
You should see what others say; for instance Didup, Panoramix (I quote: "Je suis l'un des intégristes de la typographie ici (je mets des vrais guillemets, des espaces insécables, etc.)"), Mokona, etc.
Key word: intégriste :-) (you said it)
However, for some reasons I can't explain, my browser (a brand new one, last version, unicode friendly, page over 32 kb friendly, etc...) appears to be "breaking" these spaces. Though I can't figure how it is doing so. When I look at the history, I see nothing I can explain.
Still, Vincent insists that I do break his text. To fix it, he has to either revert my changes, or fix them himself, an issue upon which he is strongly complaining, requesting that I fix the stuff myself. Fact is, I can't see what I do wrong, and I can't imagine editing his comments by hand each time I edit an article or talk page, to avoid removing the non-breakable spaces (if only I could figure how not to do so!)
Well, let me give you a real example:
http://fr.wikipedia.org/w/wiki.phtml?title=Th%E9_chinois&diff=71928&oldid=71914
If you look at the source with a hexadecimal editor, you can see (first line, from "Note" to ":"):
- my version: 4E 6F 74 65 A0, that is <N>, <o>, <t>, <e> and <[ ]> (non-breaking space);
- Anthere's version: 4E 6F 74 65 20, that is <N>, <o>, <t>, <e>, <[ ]> (space).
Thus, Anthere made U+00A0 (NBSpace), a normal raw ISO-8859-1 character, become U+0020, the normal space.
Anyone can verify this fact.
Vincent Ramos
Yes. Anyone. But then what do you suggest?
To a certain point, I do not care whether you or I are responsible for this. Clearly, our editing ways are incompatible. Still, we are in the same place, so we need to find a solution.
Five solutions:
1) A technical explanation and solution is found, and you or I can apply it to solve the technical issue.
2) You clearly see that I can't do anything to avoid this, so each time you edit a page, and I edit it after you, you go after me without complaining and correct each and every non-breaking space (admittedly, that makes a lot).
3) Each time you write an article, you put your name at the top of the page, and we officially declare that I am forbidden to edit this page. Potentially an issue in terms of wiki principles.
4) You or I agree to quit Wikipedia.
5) We do not accept non-breaking spaces on the French Wikipedia.
So? Does anyone have an idea for option 1?
Cheers
Anthere
For the record, some of the comments I attributed to Vincent (intellectual dishonesty) were from a troll. I apologize, then. I would have appreciated it if he had indicated earlier that these comments were not his, however :-(
"Timwi" skribis:
Erik Moeller wrote:
I encourage you to move forward with this, and to commit all code to a separate CVS module.
Well, to be entirely honest, I'd rather not use the SourceForge CVS server. It's cumbersome and full of problems. It doesn't work in Linux for me, and in Windows I have to keep entering my password for every single request (not just the commits).
I use Cygwin's CVS over an SSH connection: you can create a private key on your own computer and upload the public key to SourceForge (there is an edit box for this purpose).
After that, you don't need to type in the password any more...
Paul
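For reference, a sketch of the setup Paul describes, driving the standard OpenSSH tools from Perl (ssh-keygen and the CVS_RSH convention are real; uploading the public key to SourceForge remains a manual step):

    use strict;
    use warnings;

    $ENV{CVS_RSH} = 'ssh';   # make cvs tunnel over ssh instead of pserver
    unless (-e "$ENV{HOME}/.ssh/id_dsa.pub") {
        system('ssh-keygen', '-t', 'dsa') == 0
            or die "ssh-keygen failed: $?";
    }
    # then paste ~/.ssh/id_dsa.pub into the SourceForge account form;
    # after that, cvs update/commit will no longer prompt for a password.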
Erik Moeller wrote:
Timwi-
The current documentation of your database schema is completely insufficient, though. Each table needs its own comment header explaining what it does, how and why.
Something like this? http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
Timwi
On 26 July 2003 01:15, Timwi timwi@gmx.net wrote:
http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
Interesting.
A quick thought: why not have two-way inter-language links? I can't quite see why we have only one-way links, and on one (Dutch?) or more versions of the Wikipedia, people are suggesting running a 'bot to insert them.
Also, re. using BLOBs to store images rather than a file system: I would imagine that distributing and clustering a set of databases would be more versatile and easier to accomplish (certainly, I've seen it done in the accessor program's code) than a file system sharing system, which (I imagine) would require work external to the program to set up, and so be messier...
-- James D. Forrester mailto:jon@eh.org | mailto:csvla@dcs.warwick.ac.uk mailto:jamesdforrester@hotmail.com | mailto:james@jdforrester.org
James D. Forrester wrote:
A quick thought: why not have two-way inter-language links? I can't quite see why we have only one-way links, and on one (Dutch?) or more versions of the Wikipedia, people are suggesting running a 'bot to insert them.
It's been discussed... the answer is unfortunately no, for many reasons, including the fact that there is not always a 1:1 relationship between articles. I will not go into this further because it's been discussed a zillion times; check the archives if you want more info.
Also, re. using BLOBs to store images rather than a file system: I would imagine that distributing and clustering a set of databases would be more versatile and easier to accomplish (certainly, I've seen it done in the accessor program's code) than a file system sharing system, which (I imagine) would require work external to the program to set up, and so be messier...
I am from the school where they teach you that if you have DB problems, you don't go using the DB as a filesystem... The way I would do it is upload every image with an image-id and a rev-id, like 3-1.png or 99-123.jpg. The database table for images contains the ids along with length, filesize, and format. You just place a normal HTTP request for the image.

BUT here's the thing: instead of just running Apache, we run an Apache for the PHP pages and some super-lightweight server that just serves out the images. It can be on the same or another server... call it images.en.wikipedia.org... or imagenes.es.wikipedia.org. Get my point? That way we take some load off Apache, which is much less efficient at just serving out these single files because it's so much larger, has a bunch of modules compiled into it, and has to check things like rewrite rules etc. This is just a simple, easy way to save some memory, and since files are identified by a combination of id and revision id, there is never any worry of accidentally overwriting something.
Thoughts or concerns? Yes, I know it doesn't scale THAT well... but at the moment we have 2 servers for something like 20 wikis. And unless money starts falling magically out of the sky, it might stay that way for a while. I just don't see load balancers and clustering in the near future... unless, once the Wikimedia Foundation exists, someone donates a whole lot of money to it...
Lightning
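A minimal sketch of the naming scheme Lightning describes (the helper name and host pattern are assumptions, not existing code):

    use strict;
    use warnings;

    # Build the URL for an image identified by image-id and rev-id,
    # served from a separate lightweight per-language image host.
    sub image_url {
        my ($image_id, $rev_id, $format, $lang) = @_;
        return "http://images.$lang.wikipedia.org/$image_id-$rev_id.$format";
    }

    print image_url(3, 1, 'png', 'en'), "\n";     # http://images.en.wikipedia.org/3-1.png
    print image_url(99, 123, 'jpg', 'es'), "\n";  # http://images.es.wikipedia.org/99-123.jpg

Since the id/revision pair is unique, a file is written once and never overwritten, which also makes these URLs trivially cacheable.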
I think what James Forrester was saying about the two-way language links is that in order to put a link from a specific page to another specific page, there must be a 1:1 relationship established. Why not make it so that upon that transaction, the target language page gets a simple reverse tag added to the top of that page?
Two birds, one stone, I think James was thinking...
-S-
Some quick pseudocode (warning: from a non-coder) on the idea of adding a reverse tag. The main problems are finding the new tag and any potential edit conflict (minor). A runnable sketch follows the pseudocode.
{
  langatag   = "[[(langa):*.*]]"
  reversetag = "[[(thislanguageprefix):article name]]"

  upon save page:
    // new langatag?
    compare currentarticle() (limit: top paragraph) with previousarticle() (limit: top paragraph)
    currentarticle tagcount  = ()
    previousarticle tagcount = ()
    // if same: save page and terminate.
    // if different:
    if less than previous: save page and terminate.
    if more than previous: findnewtag()

    // compare article tags
    for tags in currentarticle: make an array()
    for tags in previousarticle: make an array()

    // compare tags in arrays: for tags that are in both articles, remove from array

    // for each remaining tag():
    access target article
    cue to top of the page
    add reversetag()  // [[currentlanguage:currentarticle]]
    save article

  // if an article is changed by a bot during a user edit, add a bot message.
}
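For what it's worth, a rough Perl rendering of the idea above; fetch_article and save_article are hypothetical wiki helpers, not existing functions:

    use strict;
    use warnings;

    # Collect interlanguage tags of the form [[xx:Title]] from wikitext.
    sub language_tags {
        my ($text) = @_;
        return { map { $_ => 1 } $text =~ /\[\[([a-z]{2,3}:[^\]]+)\]\]/g };
    }

    # Compare the old and new revision; for every newly added tag,
    # prepend the reverse tag to the target article.
    sub add_reverse_tags {
        my ($old_text, $new_text, $this_lang, $this_title) = @_;
        my $old     = language_tags($old_text);
        my $new     = language_tags($new_text);
        my $reverse = "[[$this_lang:$this_title]]";
        for my $tag (grep { !$old->{$_} } keys %$new) {
            my ($lang, $title) = split /:/, $tag, 2;
            my $target = fetch_article($lang, $title);   # hypothetical helper
            next if index($target, $reverse) >= 0;       # already links back
            save_article($lang, $title, "$reverse\n$target",
                         'bot: add reverse interlanguage link');  # hypothetical helper
        }
    }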
Well, IMHO you *should* continue if you accept that it might be "just for the fun of it" and never actually used. I learned PHP/MySQL while writing code for Nupedia, which died soon after my code went online (no cause and effect here;-)
*If* we decide to work on a Phase IV (one of my favourite movies BTW), I think we should go for C++. Two main reasons:
- Probably the fastest way to work with arrays of char (= plain text), which is most of the internal working of a wiki software
- Real OOP, in contrast to the PHP-class jokes. If we design a database class in C++, for example, the actual queries could be entirely encapsulated from the rest of the program. That would make switching database versions or even databases (MySQL/Postgres/???) much easier.
Encapsulated classes will also make it easy to write stand-alone programs for offline reading, as I have tried. A proof-of-principle viewer is available in C++.
The main reason *not* to use C++ is that I don't know how to turn such a program into an apache module. Help on that would be appreciated.
Magnus
On Fri, Jul 25, 2003 at 10:50:53PM +0200, Magnus Manske wrote:
Well, IMHO you *should* continue if you accept that it might be "just for the fun of it" and never actually used. I learned PHP/MySQL while writing code for Nupedia, which died soon after my code went online (no cause and effect here;-)
*If* we decide to work on a Phase IV (one of my favourite movies BTW), I think we should go for C++. Two main reasons:
- Probably the fastest way to work with arrays of char (=plain text),
which is most of the internal working of a wiki software
- Real OOP, in contrast to the PHP-class jokes. If we design a database class in C++, for example, the actual queries could be entirely encapsulated from the rest of the program. That would make switching database versions or even databases (MySQL/Postgres/???) much easier.
Encapsulated classes will also make it easy to write stand-alone programs for offline reading, as I have tried. A proof-of-principle viewer is available in C++.
The main reason *not* to use C++ is that I don't know how to turn such a program into an apache module. Help on that would be appreciated.
Have you lost all self-preservation instinct already? C/C++ is absolutely the most insecure option that exists. And if you want to use arrays of chars, you may as well turn off the root password; at this point it doesn't really make any difference.
If speed were crucial, we could use OCaml, Common Lisp, compiled Java, or one of a few other compiled and GC-ed languages with a clear separation of data and code.
Not to mention that using arrays of chars with hand-written parsers is almost always slower than naive Perl regular expressions, unless the parsed language is really trivial, and it's never comparable in speed to lex+yacc.
Tomasz Wegrzanowski wrote:
Have you lost all self-preservation instinct already? C/C++ is absolutely the most insecure option that exists. And if you want to use arrays of chars, you may as well turn off the root password; at this point it doesn't really make any difference.
If speed were crucial, we could use OCaml, Common Lisp, compiled Java, or one of a few other compiled and GC-ed languages with a clear separation of data and code.
Not to mention that using arrays of chars with hand-written parsers is almost always slower than naive Perl regular expressions, unless the parsed language is really trivial, and it's never comparable in speed to lex+yacc.
While I love C and C++, I must agree. I just really like PHP, and with PHP 5 coming out, we have much to look forward to in the world of OOP. That, and there is so much functionality being added to PHP all the time. I mean, it's not nearly as much as Perl, but it's catching up...
Lightning
Magnus Manske wrote:
*If* we decide to work on a Phase IV (one of my favourite movies BTW), I think we should go for C++. Two main reasons:
- Probably the fastest way to work with arrays of char (=plain text),
which is most of the internal working of a wiki software
Although you're probably right that C++ is very fast at handling character arrays, I'm not sure this minuscule speed benefit justifies using a language that isn't UTF-8 aware. Perl 5.8 onwards understands UTF-8 and can work with it internally, i.e. in its own C code, so it shouldn't be significantly slower.
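A minimal illustration of that internal UTF-8 support (a sketch, nothing Wikipedia-specific):

    use strict;
    use warnings;
    use utf8;                        # this source file itself is UTF-8
    binmode STDOUT, ':encoding(UTF-8)';
    my $title = "Thé chinois";
    print length($title), "\n";      # prints 11 characters, not the 12 bytes of the UTF-8 encoding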
- Real OOP, in contrast to the PHP-class jokes.
That would be one reason I would greatly agree with you.
The main reason *not* to use C++ is that I don't know how to turn such a program into an apache module. Help on that would be appreciated.
I don't know that either. The only two ways of using a compiled binary with Apache that I know of would be either to set it as a handler for a filename extension, or to call it via CGI. Both methods are unacceptable because they create a new process for each HTTP request.
I've never heard of other web software written in C++ (apart from the *actual* webserver, Apache, of course). Not that I mind Wikipedia being an oddball in that respect, but if nobody did this before, maybe that's a sign it might not be a good idea.
Maybe we should write a whole new webserver (just joking).
Timwi
I've uploaded the following file to demonstrate... something... or just to show off, I don't know...
http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
Well, basically, there are some questions and options left, and I would appreciate any feedback and discussion. For example, how would we handle language-specific user properties (read the paragraph underneath the 'user properties' heading)? etc.
Thanks, Timwi
Timwi wrote:
I've uploaded the following file to demonstrate... something... or just to show off, I don't know...
http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
You forgot one parameter: Response time. As far as I know, phase III doesn't even have a design goal for response time, only a naive hope that it will be "as fast as possible", which in reality for the last week seems to be an average of 1.5 seconds, with 8% of requests taking more than 5 seconds. It has been far worse at times.
I think that Phase IV should have a design goal for response times, and that your specification should outline a way to monitor them and keep them within the goal. I suggest a goal of a 1.0-second average, with no more than 1% of requests taking more than 5 seconds.
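Such monitoring could be very small; a sketch, assuming a plain log with one response time in seconds per line (a hypothetical format):

    use strict;
    use warnings;

    my ($sum, $n, $slow) = (0, 0, 0);
    while (my $t = <>) {
        chomp $t;
        next unless $t =~ /^\d+(?:\.\d+)?$/;   # skip anything that isn't a number
        $sum += $t;
        $n++;
        $slow++ if $t > 5;
    }
    printf "average %.2fs; %.1f%% of requests over 5s\n",
           $sum / $n, 100 * $slow / $n if $n;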
In my experience, it is slow response times that make users leave a website never to return again. Lack of features or ugly design isn't half as bad.
A way to measure churn (the rate at which users abandon the site) would also be nice.
Lars-
Timwi wrote:
I've uploaded the following file to demonstrate... something... or just to show off, I don't know...
http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
You forgot one parameter: Response time. As far as I know, phase III doesn't even have a design goal for response time, only a naive hope that it will be "as fast as possible", which in reality for the last week seems to be an average of 1.5 seconds, with 8% of requests taking more than 5 seconds. It has been far worse at times.
I agree quite strongly. One reason behind susning.nu's impressive growth has been the excellent response time of your site. Viewing it almost makes me wish I spoke Swedish, just because it's so pleasant to browse. In fact, Wikipedia initially was very fast, too, thanks to the well written UseMod engine. Too bad it's only file based.
Note that Phase III *can* be very fast, too. It is just constantly bogged down by unoptimized queries and slow parsing.
Regards,
Erik
Erik Moeller wrote:
I agree quite strongly. One reason behind susning.nu's impressive growth has been the excellent response time of your site. Viewing it almost makes me wish I spoke Swedish, just because it's so pleasant to browse. In fact,
In November and December 2002, many users jumped from susning.nu to sv.wikipedia because of the licensing rules, leading to sv.wikipedia's rise from 265 to 2650 articles in a month (and 11600 today). That jump would have been quite unthinkable before May 2002, when Wikipedia was so painfully slow.
This shows that GNU FDL licensing rules can be important to attract (certain, very active) users, but website performance is even more important. (The cynical example of Wernher von Braun comes to mind: having worked for the Nazis is less of a problem if you are a leading rocket scientist.)
Timwi-
I've uploaded the following file to demonstrate... something... or just to show off, I don't know... http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html
Quite impressive. Let's see how long it takes you to actually implement :-). Some notes:
1) You'll need some capability for banning signed in users. Having bans expire after a while would be nice, too.
2) New message notification needs to work for all users, not just signed in ones. In Phase III we use a table that stores IP addresses of users who have new messages.
3) Currently talk pages and normal pages are "bundled" when watching: You watch one, you also watch the other. You remove one, you remove the other. This behavior should not change.
4) While you're at it, you may want to think about a better access rights system than simple sysop/non-sysop. ACLs would probably work best.
As for the userprops, having a langid column seems to make the most sense.
Regards,
Erik
Erik Moeller wrote:
- You'll need some capability for banning signed in users.
What's the point? They'd just log out.
Having bans expire after a while would be nice, too.
OK, how about this:
CREATE TABLE bans (
  userid    int unsigned NULL,
  iprange   char(13)     NULL,
  bannerid  int unsigned NOT NULL,
  reason    varchar(255) NOT NULL,
  timestamp datetime     NOT NULL,
  expires   datetime     NULL,
  UNIQUE KEY (userid, iprange)  -- not a PRIMARY KEY: primary key columns cannot be NULL
);
The idea here is that in each row, either userid or iprange is NULL (but not both). Banning a single IP is possible because anonymous edits generate pseudo-accounts and thus a userid.
As you can see, I have also added 'expires'.
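Checking a ban at request time would then be a single query; a sketch using DBI against the table above (connection parameters are placeholders):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=wikidb', 'wikiuser', 'secret',
                           { RaiseError => 1 });

    # True if the user has an active (non-expired) ban.
    sub is_banned {
        my ($userid) = @_;
        my ($count) = $dbh->selectrow_array(
            'SELECT COUNT(*) FROM bans
              WHERE userid = ? AND (expires IS NULL OR expires > NOW())',
            undef, $userid);
        return $count > 0;
    }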
- New message notification needs to work for all users, not just signed
in ones. In Phase III we use a table that stores IP addresses of users who have new messages.
Yes, and in my model, IP addresses that have ever edited something have a userid.
- Currently talk pages and normal pages are "bundled" when watching: You
watch one, you also watch the other. You remove one, you remove the other. This behavior should not change.
There's nothing wrong with adding (and removing) two rows each time you watch/unwatch something.
- While you're at it, you may want to think about a better access rights
system than simple sysop/non-sysop. ACLs would probably work best.
Hm. I'm not very familiar with the concept of ACLs (Access Control Lists, I assume). What in particular can they do that userprops cannot?
As for the userprops, having a langid column seems to make the most sense.
OK.
Thanks, Timwi
Timwi-
Erik Moeller wrote:
- You'll need some capability for banning signed in users.
What's the point? They'd just log out.
For one thing, we ban offensive usernames. For another, we don't give sysops the ability to view IP addresses of signed in users, so they can't ban their IP because they don't know it.
The idea here is that in each row, either userid or iprange is NULL (but not both). Banning a single IP is possible because anonymous edits generate pseudo-accounts and thus a userid.
Works for me.
- While you're at it, you may want to think about a better access rights
system than simple sysop/non-sysop. ACLs would probably work best.
Hm. I'm not very familiar with the concept of ACLs (Access Control Lists, I assume). What in particular can they do that userprops cannot?
Well, the idea is that you can set permissions on a per page basis, and do something like
* cannot edit
  Hans can edit
  Heinz can edit
* can edit
  Hans cannot edit
  Heinz cannot edit
* can edit
  anon cannot edit
This might be useful for, e.g., allowing certain users to edit the Main Page without giving them sysop privileges. It would also allow use of your engine in other CMS contexts. Might be doable via the articleprops.
Regards,
Erik
Erik Moeller wrote:
Timwi-
What's the point? They'd just log out.
For one thing, we ban offensive usernames.
In this case, the question arises: Do we want the username to continue existing in a banned state? This would leave the offensive username in edit histories. Might we not actually want to delete the account (marking all its edits as from a 'Deleted User') and then make it impossible to re-register it?
For another, we don't give sysops the ability to view IP addresses of signed in users, so they can't ban their IP because they don't know it.
I didn't know that. I guess that makes it possible to keep creating new accounts to overcome any ban.
So, shouldn't there perhaps be a feature that would let sysops ban the IP of a signed-in user without actually displaying that IP?
- While you're at it, you may want to think about a better access rights
system than simple sysop/non-sysop. ACLs would probably work best.
Hm. I'm not very familiar with the concept of ACLs (Access Control Lists, I assume). What in particular can they do that userprops cannot?
Well, the idea is that you can set permissions on a per page basis, and do something like
That sounds like fun. It calls for a new table that describes relations between users and articles:
CREATE TABLE relations (
  userid    int unsigned NOT NULL,
  articleid int unsigned NOT NULL,
  type      int unsigned NOT NULL
);
Then for the 'type' column, we would define readable constants within the source code. So, for example, type 1 means the given user can edit that article; 2 means he cannot. The default is, of course, given by an articleprop ("protected").
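In Perl, those readable constants could look like this (the names are made up for illustration):

    use constant {
        REL_CAN_EDIT    => 1,   # the user may edit the article
        REL_CANNOT_EDIT => 2,   # the user is explicitly denied
    };

    # e.g. INSERT INTO relations (userid, articleid, type)
    #      VALUES (42, 17, REL_CAN_EDIT)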
But then again, maybe it would be better design to actually think about the different actions you can perform on an article (edit, move, delete, etc.) and define permissions (incl. default permissions) based on that. I'll think about that more, but right now it's too late and I'm going to bed ;-)
Good night, Timwi
Timwi wrote:
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for a Phase-IV Wikipedia software, including a database schema.
Acting against all advice is the hallmark of genius and madness. :-)
Should I continue with this?
Sure, but be aware that it's a really big deal to ask people to totally switch. We have a fair amount of built-up investment in what we're using, and while we should *of course* switch to something better, there's a lot to consider.
Do you have a public url for us to view the fruits of your labor?
Jimmy Wales wrote:
Timwi wrote:
Now, for some reason, against all of your advice, I have started to program a Wiki, and by now it's already become a suitable basis for a Phase-IV Wikipedia software, including a database schema.
Acting against all advice is the hallmark of genius and madness. :-)
That's... an interesting way to look at it :)
Should I continue with this?
Sure, but be aware that it's a really big deal to ask people to totally switch. We have a fair amount of built-up investment in what we're using, and while we should *of course* switch to something better, there's a lot to consider.
As I said before, I consider this an experiment, and it is mainly for myself to learn how to make my own website (which I've never done before). If it's not going to be used, that's okay; it'll still have been fun to make.
Do you have a public url for us to view the fruits of your labor?
Well, the only "fruit" I have so far is http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html, which I have mentioned earlier. (Notice that some of this may already be out of date again.) I don't have a server to install my actual software on, and besides, it can't do anything yet anyway (unfortunately I've been distracted quite a bit the past few days).
Greetings, Timwi
Timwi wrote:
Well, the only "fruit" I have so far is http://www.lionking.org/~timwi/t/wikipedia/wikipedia-iv.html, which I have mentioned earlier. (Notice that some of this may already be out of date again.) I don't have a server to install my actual software on, and besides, it can't do anything yet anyway (unfortunately I've been distracted quite a bit the past few days).
Did you notice my msg about how to bring Special:Short and Special:Long pages back? It's a simple hack, but I kinda think it's the best way to do this kind of thing. Data like this should be cached to take work off the DB.
Lightning
Lightning wrote:
Did you notice my msg about how to bring Special:Short and Special:Long
I have seen all three of them, and I would have replied to one of them if I had an answer to give you.
With my suggested database structure, this would be easier because you wouldn't need to add a new column to a table that already contains a universe of data.
Timwi