I played a bit with database.
Here is something that looks like Wikipedia database schema but compiles in Posgres (and no, Wikipedia script won't be able to use that yet):
CREATE TABLE user_table ( user_id serial NOT NULL PRIMARY KEY, user_name varchar(255) NOT NULL default '', user_rights varchar(255) NOT NULL default '', user_password varchar(255) NOT NULL default '', user_newpassword varchar(255) NOT NULL default '', user_email varchar(255) NOT NULL default '', user_options varchar(65535) NOT NULL default '', user_newtalk smallint NOT NULL default '1' );
CREATE TABLE cur ( cur_id serial NOT NULL PRIMARY KEY, cur_namespace smallint NOT NULL default '0', cur_title varchar(255) NOT NULL default '', cur_text text NOT NULL default '', cur_comment varchar(255) NOT NULL default '', cur_user int NOT NULL default '0', cur_user_text varchar(255) NOT NULL default '', cur_timestamp char(14) NOT NULL default '', cur_restrictions varchar(255) NOT NULL default '', cur_counter bigint NOT NULL default '0', cur_ind_title varchar(255) NOT NULL default '', cur_ind_text text NOT NULL default '', cur_is_redirect boolean NOT NULL default 'f', cur_minor_edit boolean NOT NULL default 'f', cur_is_new boolean NOT NULL default 'f' );
CREATE TABLE old_table ( old_id serial NOT NULL PRIMARY KEY, old_namespace smallint NOT NULL default '0', old_title varchar(255) NOT NULL default '', old_text text NOT NULL default '', old_comment varchar(255) NOT NULL default '', old_user int NOT NULL default '0', old_user_text varchar(255) NOT NULL, old_timestamp char(14) NOT NULL default '', old_minor_edit boolean NOT NULL default 'f', old_flags varchar(255) NOT NULL default '' );
CREATE TABLE archive ( ar_namespace smallint NOT NULL default '0', ar_title varchar(255) NOT NULL default '', ar_text text NOT NULL default '', ar_comment varchar(255) NOT NULL default '', ar_user int NOT NULL default '0', ar_user_text varchar(255) NOT NULL, ar_timestamp char(14) NOT NULL default '', ar_minor_edit boolean NOT NULL default 'f', ar_flags varchar(255) NOT NULL default '' );
CREATE TABLE links ( l_from varchar(255) NOT NULL default '', l_to int NOT NULL default '0' );
CREATE TABLE brokenlinks ( bl_from int NOT NULL default '0', bl_to varchar(255) NOT NULL default '' );
CREATE TABLE imagelinks ( il_from varchar(255) NOT NULL default '', il_to varchar(255) NOT NULL default '' );
CREATE TABLE site_stats ( ss_row_id int NOT NULL PRIMARY KEY, ss_total_views bigint default '0', ss_total_edits bigint default '0', ss_good_articles bigint default '0' );
CREATE TABLE ipblocks ( ipb_address varchar(40) NOT NULL default '', ipb_user int NOT NULL default '0', ipb_by int NOT NULL default '0', ipb_reason varchar(255) NOT NULL default '', ipb_timestamp char(14) NOT NULL default '' );
CREATE TABLE image ( img_name varchar(255) NOT NULL default '', img_size int NOT NULL default '0', img_description varchar(255) NOT NULL default '', img_user int NOT NULL default '0', img_user_text varchar(255) NOT NULL default '', img_timestamp char(14) NOT NULL default '' );
CREATE TABLE oldimage ( oi_name varchar(255) NOT NULL default '', oi_archive_name varchar(255) NOT NULL default '', oi_size int NOT NULL default 0, oi_description varchar(255) NOT NULL default '', oi_user int NOT NULL default '0', oi_user_text varchar(255) NOT NULL default '', oi_timestamp char(14) NOT NULL default '' );
CREATE TABLE random ( ra_current smallint NOT NULL default 0, ra_title varchar(255) NOT NULL default '' );
CREATE TABLE recentchanges ( rc_timestamp varchar(14) NOT NULL default '', rc_cur_time varchar(14) NOT NULL default '', rc_user int NOT NULL default '0', rc_user_text varchar(255) NOT NULL default '', rc_namespace smallint NOT NULL default '0', rc_title varchar(255) NOT NULL default '', rc_comment varchar(255) NOT NULL default '', rc_minor boolean NOT NULL default 'f', rc_bot boolean NOT NULL default 'f', rc_new boolean NOT NULL default 'f', rc_cur_id int NOT NULL default '0', rc_this_oldid int NOT NULL default '0', rc_last_oldid int NOT NULL default '0' );
CREATE TABLE watchlist ( wl_user int NOT NULL, wl_namespace smallint NOT NULL default '0', wl_title varchar(255) NOT NULL default '' );
CREATE TABLE math ( math_inputhash char(16) NOT NULL PRIMARY KEY, math_outputhash char(16) NOT NULL, math_conservative bool NOT NULL, math_html text NOT NULL );
Of course, this isn't "real" code, and many changes will be necessary to make it any useful. Anyway: * user and old are reserved keywords in Postgres * why are we using tinyblob(1) instead of bool for boolean data ? * why are we creating so many indexes ? are all of them necessary ? * shouldn't we use database-provided timestamps instead of varchar(14) ? * rights and restrictions don't seem to be represented very efficiently. using integer flag for that seems like a better idea to me * it might be good idea to consider making our own searching system instead of relying on something provided by database
On mar, 2003-01-07 at 14:40, Tomasz Wegrzanowski wrote:
I played a bit with database.
Here is something that looks like Wikipedia database schema but compiles in Posgres (and no, Wikipedia script won't be able to use that yet):
:)))) Yay!!!!!! I *like* it when suggestions come with actual attempts at preliminary code.
Of course, this isn't "real" code, and many changes will be necessary to make it any useful. Anyway:
- user and old are reserved keywords in Postgres
Grrr! Well, 'user' can become 'users' easily enough. ;)
I'm also considering whether 'cur' and 'old' should be split up differently, into 'pageinfo' (title, restrictions, counter, redirect) and 'revisions' (text, comment, date, user, minor). This would perhaps be less radical than Jonathan's suggestion of a raw title<->numeric id table, but should simplify operations that work on both current and old revisions. (May need a shortcut for grabbing the current revision, perhaps a key from the pageinfo.)
- why are we using tinyblob(1) instead of bool for boolean data ?
Good question.
- why are we creating so many indexes ? are all of them necessary ?
On cur and old, we need indexes on title (to look up individual pages), timestamp (for sorting), and user (for contribs). Fulltext indexes for searching on the special index fields. Namespace presumably helps along with title?
On link tables, we're keying to indexed fields, and have a *lot* of rows which we may search from either end (whatlinkshere, orphan checks, wantedpages, etc etc).
Indexes on ipblocks table are probably useless, as it's very small.
User table has index on user name; helps with sorting the big user list. (What about lookup of options on every page view by logged in user? Is this number or name?)
Size, name, and timestamp fields in the image table may be used for sorting the huge and growing list o' files.
- shouldn't we use database-provided timestamps instead of varchar(14) ?
As long as we know how to manipulate them programmatically and format them for the selected language and time zone...
- rights and restrictions don't seem to be represented very efficiently. using integer flag for that seems like a better idea to me
Maybe. It's not really a bottleneck, though, and legibility isn't something to be hated.
- it might be good idea to consider making our own searching system instead of relying on something provided by database
Hell, why rely on a database? Why not write our own storage system? And filesystem... and virtual memory... and process manager... :)
We shouldn't have to be a wheel factory; if a good, fast, flexible search mechanism is available in our database we should use it.
-- brion vibber (brion @ pobox.com)
On Tue, Jan 07, 2003 at 03:38:09PM -0800, Brion Vibber wrote:
On mar, 2003-01-07 at 14:40, Tomasz Wegrzanowski wrote:
- it might be good idea to consider making our own searching system instead of relying on something provided by database
Hell, why rely on a database? Why not write our own storage system? And filesystem... and virtual memory... and process manager... :)
We shouldn't have to be a wheel factory; if a good, fast, flexible search mechanism is available in our database we should use it.
Searching doesn't seem like something "general", it's almost always specific to given usage.
Examples of magic that would be nice to have (nothing urgent): 1. On Polish Wikipedia, if someone tries a query and no hits are generated, and it didn't contain any diactrics, try to search undiactricized version of whole database. Some people for some reason write ASCII Polish while on Internet. 2. Dictionary searches would also need more magic. See: http://wiktionary.org/wiki/Wiktionary:Alternative_vision_of_multilingual_dic...
Tomasz Wegrzanowski wrote:
On Tue, Jan 07, 2003 at 03:38:09PM -0800, Brion Vibber wrote:
On mar, 2003-01-07 at 14:40, Tomasz Wegrzanowski wrote:
- it might be good idea to consider making our own searching system instead of relying on something provided by database
Hell, why rely on a database? Why not write our own storage system? And filesystem... and virtual memory... and process manager... :)
We shouldn't have to be a wheel factory; if a good, fast, flexible search mechanism is available in our database we should use it.
Searching doesn't seem like something "general", it's almost always specific to given usage.
Examples of magic that would be nice to have (nothing urgent):
- On Polish Wikipedia, if someone tries a query and no hits are generated, and it didn't contain any diactrics, try to search undiactricized version of whole database. Some people for some reason write ASCII Polish while on Internet.
- Dictionary searches would also need more magic. See: http://wiktionary.org/wiki/Wiktionary:Alternative_vision_of_multilingual_dic...
Wikitech-l mailing list Wikitech-l@wikipedia.org http://www.wikipedia.org/mailman/listinfo/wikitech-l
Hi there,
I'm VERY new here. Just read Alternative_vision_of_multilingual_dictionary and I think these are 'wise words'. Think I agree with most things said there. It is indeed a very fundamental problem. :-) The more I think about it, the less I want to say about it. I am just wondering whether we can learn from Kohonen's Self-Organising-algorithm. Can we use neural networks (fuzzy logic, etc) at all in Wiki software? ? (In general, neural networks don't perform well on 'symbolic' data, I know that but maybe,.. with a lot of preprocessing,... gathering statistical data, link-counts, and such,.... maybe then, the software could be made as 'self-organising' as humans are? I've played around a bit with Self-Organising-Maps and pattern-recognition in 'raw' data. But I'm not too familiar with linguistics,... or the 'symbolic-approach'.
As Wikis evolve, we'll need more and more sub-structuring, and more and more 'protection'. (Like in Kohonen's SOM, where 'neighbourhood-space' decreases during self-organisation, more and more (groups of!) people will start using their OWN Wiki-site. Because Wiki-SOFTWARE does not reorganise itself, humans will have to do it
Someone here on this list familiar with Kohonen-SOM-stuff?
This is a 'tricky' discussion here, on this list.... very exciting yes, on the one hand, but I also think it is a bit of an 'endless discussion', it's a very fundamental problem.... :-)
Just keep adding more and more features to the software will not solve the problem. Better think about removing 'options for the user' than adding new ones. Very curious how Wiki will develop in the future.
Does anyone know if there are C or C++, CGI or standalone-server versions of WikiSource?
Thanks, Pieter Suurmond
On mar, 2003-01-07 at 19:28, Pieter Suurmond wrote: [a bunch of stuff beyond my ken] [...]
Does anyone know if there are C or C++, CGI or standalone-server versions of WikiSource?
There are quite a few different implementations of the wiki concept in various languages and with various backends, including some standalone servers. See the big list at: http://c2.com/cgi/wiki?WikiEngines
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
On mar, 2003-01-07 at 19:28, Pieter Suurmond wrote: [a bunch of stuff beyond my ken] [...]
Does anyone know if there are C or C++, CGI or standalone-server versions of WikiSource?
There are quite a few different implementations of the wiki concept in various languages and with various backends, including some standalone servers. See the big list at: http://c2.com/cgi/wiki?WikiEngines
-- brion vibber (brion @ pobox.com)
Wikitech-l mailing list Wikitech-l@wikipedia.org http://www.wikipedia.org/mailman/listinfo/wikitech-l
Thanks Brion,
I've carefully inspected http://c2.com/cgi/wiki and I've read about a C version (which seems to be lost) but I cannot find it anywhere.
Another thing I want to mention: On Silicon Graphics (IRIX6.5) and Netscape 4.7x, pages on XX.wikipedia.org look scrambled,... something goes wrong with layers or so: http://nl.wikipedia.org/wiki/Kort_Overzicht is quite UNreadable.
Well, thanks anyway. I like Wikipedia, Pieter Suurmond
On mer, 2003-01-08 at 03:13, Pieter Suurmond wrote:
Another thing I want to mention: On Silicon Graphics (IRIX6.5) and Netscape 4.7x, pages on XX.wikipedia.org look scrambled,... something goes wrong with layers or so:
Netscape 4.x is notoriously bad at following standard HTML and CSS. If you log in you can disable the quickbar in your preferences, and you'll have an easier time of it.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org