1. move everything to Postgres
2. move everything to a common database, with tables foo_cur, foo_old etc., where foo are language names
3. make a single user table (needs some tweaking to allow slightly different preferences), a single logging system, a single recent changes, and all other nice things we can do with that
4. move everything to UTF-8, so we don't have to use %escapes in the English Wikipedia
5. create table interwiki (source_lang, target_lang, source_title, target_title)
6. convert all cur_text by removing interwiki links from page tops, and add appropriate entries to interwiki.
7. compute the transitive symmetric closure of foo_interwiki, and display that as interwiki links. If there are many articles of the same language in it, display it like: "English (Astronomy), English (Astrophysics), German". In practice it shouldn't be such a big problem.
8. when editing, add interwiki links on top of the page or in a separate box. Add a Javascript button to add all links from the transitive symmetric closure of interwiki
9. add some magic functionality that allows changing many interwiki links at a time. It is needed as the transitive symmetric closure often contains many copies of the same link. "delete all interwiki links to this page" and "change all interwiki links from X to Y" would probably be enough.
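The closure step in the list can be read as grouping pages into connected components: two pages end up in the same group if any chain of interwiki links, followed in either direction, joins them. The following is only a sketch under that reading, not the proposed implementation. The function names `interwiki_groups` and `display_links`, the `lang_names` mapping, and the choice to sort by language name are assumptions made for illustration; only the four-column link shape comes from the proposal.

```python
from collections import defaultdict

def interwiki_groups(links):
    """Group pages connected by interwiki links, ignoring link direction.

    links: iterable of (source_lang, target_lang, source_title, target_title),
    the column order of the proposed interwiki table.
    """
    parent = {}

    def find(page):
        # Union-find lookup with path halving.
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for source_lang, target_lang, source_title, target_title in links:
        a = find((source_lang, source_title))
        b = find((target_lang, target_title))
        if a != b:
            parent[a] = b  # merge the two groups

    groups = defaultdict(set)
    for page in list(parent):
        groups[find(page)].add(page)
    return list(groups.values())

def display_links(page, links, lang_names):
    """Render the interwiki line for one (lang, title) page."""
    for group in interwiki_groups(links):
        if page in group:
            others = group - {page}
            break
    else:
        return ""
    # Sorting by language name, then title, is an assumption of this sketch.
    ordered = sorted(others, key=lambda p: (lang_names.get(p[0], p[0]), p[1]))
    per_lang = defaultdict(int)
    for lang, _title in ordered:
        per_lang[lang] += 1
    parts = []
    for lang, title in ordered:
        name = lang_names.get(lang, lang)
        # Only spell out the title when a language appears more than once.
        parts.append(f"{name} ({title})" if per_lang[lang] > 1 else name)
    return ", ".join(parts)
```

Recomputing every group on each page view would be wasteful; the sketch only shows what the closure contains, not where or when the result would be stored.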
Computing the symmetric transitive closure will require a bit of magic. I think we should keep the results in the database and update them only when something changes. Editing may be a bit slower, as long as viewing is not any worse than now.
"Tomasz Wegrzanowski" wrote:
- move everything to Postgres
- move everything to common database, with tables foo_cur, foo_old etc., where foo are language names
So a new table for each new language. Why? Wouldn't a language field in one article table be enough?
Pauxlo
On Tue, Jan 07, 2003 at 11:40:10PM +0100, Paul Ebermann wrote:
So a new table for each new language. Why? Wouldn't a language field in one article table be enough?
It should be faster that way, because locking issues on English tables won't cause any problems for us.
And too many English articles won't cause selects to be any slower for other languages that way.
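To make the two layouts under discussion concrete, here is a small sketch of both, using Python's sqlite3 module purely so that it runs standalone. SQLite does not reproduce the locking behaviour being argued about, so this shows only the shape of the schemas and queries, not their performance. The column names `cur_lang` and `cur_title`, the combined table name `cur`, and all function names are assumptions; only the `foo_cur` naming and `cur_text` appear in the thread.

```python
import sqlite3

def check_lang(lang):
    # Table names cannot be bound as SQL parameters, so validate the code first.
    if not lang.isalpha():
        raise ValueError(f"unexpected language code: {lang!r}")
    return lang

def per_language_layout(conn, langs):
    """One table per language: en_cur, pl_cur, ..."""
    for lang in langs:
        conn.execute(
            f"CREATE TABLE {check_lang(lang)}_cur "
            "(cur_title TEXT PRIMARY KEY, cur_text TEXT)"
        )

def combined_layout(conn):
    """One shared table with a language column."""
    conn.execute(
        "CREATE TABLE cur ("
        " cur_lang TEXT NOT NULL,"
        " cur_title TEXT NOT NULL,"
        " cur_text TEXT,"
        " PRIMARY KEY (cur_lang, cur_title))"
    )

def fetch_per_language(conn, lang, title):
    row = conn.execute(
        f"SELECT cur_text FROM {check_lang(lang)}_cur WHERE cur_title = ?",
        (title,),
    ).fetchone()
    return row[0] if row else None

def fetch_combined(conn, lang, title):
    row = conn.execute(
        "SELECT cur_text FROM cur WHERE cur_lang = ? AND cur_title = ?",
        (lang, title),
    ).fetchone()
    return row[0] if row else None
```

In the combined layout the language is ordinary data, so adding a language needs no schema change; in the per-language layout the language is part of the table name and has to be spliced into the SQL text.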
On Tue, 2003-01-07 at 15:06, Tomasz Wegrzanowski wrote:
On Tue, Jan 07, 2003 at 11:40:10PM +0100, Paul Ebermann wrote:
So a new table for each new language. Why? Wouldn't a language field in one article table be enough?
It should be faster that way, because locking issues on English tables won't cause any problems for us.
And too many English articles won't cause selects to be any slower for other languages that way.
I think a better approach is to make the system work more smoothly so that neither the other languages _nor_ the English section have to suffer from the English section's large number of articles.
Separate tables in the same database won't be any better than separate databases on the same server, which presently causes problems because the English tables will get locked up and additional requests get backed up until the server hits its open connection limit -- then other languages can't connect either, and you just have to wait it out.
If we can iron out the locks, I believe a combined table system will be both more flexible and easier to program against.
-- brion vibber (brion @ pobox.com)
On Tue, Jan 07, 2003 at 03:58:35PM -0800, Brion Vibber wrote:
On Tue, 2003-01-07 at 15:06, Tomasz Wegrzanowski wrote:
On Tue, Jan 07, 2003 at 11:40:10PM +0100, Paul Ebermann wrote:
So a new table for each new language. Why? Wouldn't a language field in one article table be enough?
It should be faster that way, because locking issues on English tables won't cause any problems for us.
And too many English articles won't cause selects to be any slower for other languages that way.
I think a better approach is to make the system work more smoothly so that neither the other languages _nor_ the English section have to suffer from the English section's large number of articles.
Separate tables in the same database won't be any better than separate databases on the same server, which presently causes problems because the English tables will get locked up and additional requests get backed up until the server hits its open connection limit -- then other languages can't connect either, and you just have to wait it out.
Separate tables in a single database are completely equivalent to separate tables in separate databases (databases aren't real, tables are real). I simply don't see any point in changing that.
Some "fair" system would be nice, with "other" languages having higher priorities than English ...
If we can iron out the locks, I believe a combined table system will be both more flexible and easier to program against.
That would be nice, but we should rather try to make the minimal number of changes necessary for a given goal.
Having only 'interwiki' and 'users' in common is enough; we can merge the rest later if it doesn't result in any performance problems.
On Tue, 2003-01-07 at 16:08, Tomasz Wegrzanowski wrote:
Separate tables in a single database are completely equivalent to separate tables in separate databases (databases aren't real, tables are real). I simply don't see any point in changing that.
That was my point -- your suggestion (separate tables, one db) will be no faster than the present situation (separate tables, separate dbs), and little faster than Paul's suggestion (combined tables, one db), as logjams on the English wiki will continue to affect both the many anglophone users and the users on other languages in all three cases.
Having only 'interwiki' and 'users' in common is enough; we can merge the rest later if it doesn't result in any performance problems.
True enough.
-- brion vibber (brion @ pobox.com)
On Tue, Jan 07, 2003 at 04:36:32PM -0800, Brion Vibber wrote:
On Tue, 2003-01-07 at 16:08, Tomasz Wegrzanowski wrote:
Separate tables in a single database are completely equivalent to separate tables in separate databases (databases aren't real, tables are real). I simply don't see any point in changing that.
That was my point -- your suggestion (separate tables, one db) will be no faster than the present situation (separate tables, separate dbs), and little faster than Paul's suggestion (combined tables, one db), as logjams on the English wiki will continue to affect both the many anglophone users and the users on other languages in all three cases.
It wasn't about performance but about common interwiki and users.
Having a script open two databases would require two connections, wouldn't it? Now that would be slower.
On Tue, 2003-01-07 at 19:07, Tomasz Wegrzanowski wrote:
- move everything to Postgres
- move everything to common database, with tables foo_cur, foo_old etc., where foo are language names
- make single user table (needs some tweaking to allow slightly different preferences), single logging system, single recent changes, and all other nice things we can do with that
- move everything to UTF-8, so we don't have to use %escapes in English Wikipedia
- create table interwiki (source_lang, target_lang, source_title, target_title)
- convert all cur_text by removing interwiki links from page tops, and add appropriate entries to interwiki.
- compute the transitive symmetric closure of foo_interwiki, and display that as interwiki links. If there are many articles of the same language in it, display it like: "English (Astronomy), English (Astrophysics), German". In practice it shouldn't be such a big problem.
- when editing, add interwiki links on top of the page or in a separate box. Add a Javascript button to add all links from the transitive symmetric closure of interwiki
- add some magic functionality that allows changing many interwiki links at a time. It is needed as the transitive symmetric closure often contains many copies of the same link. "delete all interwiki links to this page" and "change all interwiki links from X to Y" would probably be enough.
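The interwiki table and the two bulk operations named in the last item might look roughly like the sketch below. It uses Python's sqlite3 module instead of Postgres so that it can run on its own; the four column names come from the proposal, while the helper names `make_db`, `delete_links_to` and `retarget_links` are made up for illustration, and no uniqueness constraint is declared because the proposal does not mention one.

```python
import sqlite3

def make_db():
    """Create an in-memory database holding the proposed interwiki table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interwiki ("
        " source_lang TEXT NOT NULL,"
        " target_lang TEXT NOT NULL,"
        " source_title TEXT NOT NULL,"
        " target_title TEXT NOT NULL)"
    )
    return conn

def delete_links_to(conn, lang, title):
    """'delete all interwiki links to this page'; returns rows removed."""
    cur = conn.execute(
        "DELETE FROM interwiki WHERE target_lang = ? AND target_title = ?",
        (lang, title),
    )
    return cur.rowcount

def retarget_links(conn, lang, old_title, new_title):
    """'change all interwiki links from X to Y' within one target language;
    returns rows changed."""
    cur = conn.execute(
        "UPDATE interwiki SET target_title = ?"
        " WHERE target_lang = ? AND target_title = ?",
        (new_title, lang, old_title),
    )
    return cur.rowcount
```

Both helpers touch only links pointing at a page. Links leaving that page would need the mirror-image statements on source_lang and source_title.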
Have you read my proposal, Tomasz? You seem to be saying the same thing, only that you want to do other things at the same time (move to Postgres, have more shared tables etc.).
Regards,
Erik
wikitech-l@lists.wikimedia.org