Hello to all!
I have posted here [1] a proposal about adding two fields to the interwiki table (API URL and DB name).
One of the goals is to simplify interwiki transclusion, but those fields might be useful for other interwiki applications as well.
Could you please read it and let me know about your remarks and suggestions, on this list and/or on the talk page?
Thanks in advance
[1] http://www.mediawiki.org/wiki/User:Peter17/Reasonably_efficient_interwiki_tr...
-- Peter Potrowl http://www.mediawiki.org/wiki/User:Peter17
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I like lists better, so I'll reply here :)
In general, I like your approach. I've been suggesting adding column(s) like these for a while now. Looking over your ideas, I have a few suggestions that I think might simplify it a bit:
1) iw_trans - I don't think this needs to become more than a boolean like it is. If we allow transwiki inclusion, we'll have to use a DB or API connection. Since a DB connection will always be preferable to an HTTP request to the API, it would be safe to use the existence of a db name as an indicator to use it, else fall back to the API.
2) iw_dbname / iw_api - You could probably combine these into one column. It could store a value like "dbname=abc;api=http://foo.com/etc" which would be loaded and split when the Interwiki object is constructed.
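Chad's two suggestions can be sketched together in Python. This is a hedged illustration, not MediaWiki code. InterwikiTarget, parse_combined and transclusion_method are hypothetical names, and the strings "db"/"api" stand in for whatever connection objects a real implementation would return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterwikiTarget:
    """Hypothetical view of one interwiki row (not MediaWiki's Interwiki class)."""
    prefix: str
    trans: bool                   # the existing boolean iw_trans flag
    dbname: Optional[str] = None  # database name, if direct DB access is possible
    api: Optional[str] = None     # API URL of the remote wiki


def parse_combined(prefix: str, trans: bool, combined: str) -> InterwikiTarget:
    """Split a combined "dbname=...;api=..." column value into separate fields."""
    fields = {}
    for part in combined.split(";"):
        if "=" in part:
            # Split only on the first "=", since URLs may contain "=" themselves.
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return InterwikiTarget(prefix, trans, fields.get("dbname"), fields.get("api"))


def transclusion_method(iw: InterwikiTarget) -> Optional[str]:
    """Prefer a direct DB connection, fall back to the API, None if disabled."""
    if not iw.trans:
        return None
    if iw.dbname:
        return "db"
    if iw.api:
        return "api"
    return None


iw = parse_combined("foo", True, "dbname=abc;api=http://foo.com/etc")
print(iw.dbname, transclusion_method(iw))  # abc db
```

Splitting on ";" would silently break if an API URL ever contained a semicolon. That is one practical point in favor of keeping the values in separate fields.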
I'm more interested in the methodologies used in the Interwiki class in general as opposed to the specific use case of IW transclusion. I think our interwiki data requests can be standardized using some of the groundwork you're laying here, and that's what I'd really like to see in the long run :)
-Chad
2010/6/13 Chad innocentkiller@gmail.com:
> - iw_trans - I don't think this needs to become more than a boolean like it is. If we allow transwiki inclusion, we'll have to use a DB or API connection. Since a DB connection will always be preferable to an HTTP request to the API, it would be safe to use the existence of a db name as an indicator to use it, else fall back to the API.
I agree with this: direct DB access is always better.
> - iw_dbname / iw_api - You could probably combine these into one column. It could store a value like "dbname=abc;api=http://foo.com/etc" which would be loaded and split when the Interwiki object is constructed.
I think this is ugly. There are plenty of other cases in which we just use separate fields, as we're supposed to. Although I don't foresee any problems with cramming this stuff into one field, I would prefer putting them in separate fields.
Roan Kattouw (Catrope)
Roan Kattouw wrote:
> 2010/6/13 Chad innocentkiller@gmail.com:
>> - iw_dbname / iw_api - You could probably combine these into one column. It could store a value like "dbname=abc;api=http://foo.com/etc" which would be loaded and split when the Interwiki object is constructed.
> I think this is ugly. There are plenty of other cases in which we just use separate fields, as we're supposed to. Although I don't foresee any problems with cramming this stuff into one field, I would prefer putting them in separate fields.
Do we need *both* values? It could simply contain either http://foo.com/etc (API) or mysql://localhost:3306/abc (DB name).
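Platonides's single-field variant relies on the URL scheme to tell the two cases apart. A minimal Python sketch, assuming only the two example formats from his message; RemoteSource and parse_source are hypothetical names:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class RemoteSource(NamedTuple):
    kind: str              # "api" or "db"
    host: Optional[str]
    port: Optional[int]
    dbname: Optional[str]  # only set for database URLs


def parse_source(value: str) -> RemoteSource:
    """Interpret a single field holding either an API URL or a mysql:// URL."""
    parts = urlsplit(value)
    if parts.scheme in ("http", "https"):
        return RemoteSource("api", parts.hostname, parts.port, None)
    if parts.scheme == "mysql":
        # The path ("/abc") carries the database name.
        return RemoteSource("db", parts.hostname, parts.port, parts.path.lstrip("/") or None)
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")


print(parse_source("mysql://localhost:3306/abc"))
# RemoteSource(kind='db', host='localhost', port=3306, dbname='abc')
```

The same value also yields a host and port, so the database server does not have to be the local one.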
2010/6/13 Platonides Platonides@gmail.com:
> Do we need *both* values? It could simply contain either http://foo.com/etc (API) or mysql://localhost:3306/abc (DB name).
I don't like using one field for two different things like that, but besides that, it'd be nice to have the API URL around for other purposes as well, as Chad hinted at. So I think we should definitely keep the two separate.
Roan Kattouw (Catrope)
Roan Kattouw wrote:
> 2010/6/13 Platonides Platonides@gmail.com:
>> Do we need *both* values? It could simply contain either http://foo.com/etc (API) or mysql://localhost:3306/abc (DB name).
> I don't like using one field for two different things like that, but besides that, it'd be nice to have the API URL around for other purposes as well, as Chad hinted at. So I think we should definitely keep the two separate.
Note that you will still need all those pieces:
* The database name and wiki prefix.
* The server and port are desirable, since the remote database may not always be on the same database server as the local wiki (when it is, we could reuse the same slave list).
* The server type (e.g. mysql). It's probably uncommon to have several wikis on different kinds of database servers communicating directly, but someone will make use of it, and it'll be easy to add.
2010/6/14 Platonides Platonides@gmail.com:
> Note that you will still need all those pieces:
> * The database name and wiki prefix.
> * The server and port are desirable, since the remote database may not always be on the same database server as the local wiki (when it is, we could reuse the same slave list).
> * The server type (e.g. mysql). It's probably uncommon to have several wikis on different kinds of database servers communicating directly, but someone will make use of it, and it'll be easy to add.
On a $wgConf setup like WMF's, which is what we're aiming at, all this data is stored in $wgConf and the wiki ID can be fed into some LoadBalancer function (don't remember its name offhand) to get a DB connection. So you just need the wiki ID on these setups.
Roan Kattouw (Catrope)
2010/6/14 Roan Kattouw wrote:
> On a $wgConf setup like WMF's, which is what we're aiming at, all this data is stored in $wgConf and the wiki ID can be fed into some LoadBalancer function (don't remember its name offhand) to get a DB connection. So you just need the wiki ID on these setups.
I disagree. Most setups don't use $wgConf. We shouldn't be adding a database column that only works with $wgConf.
On Mon, Jun 14, 2010 at 10:54 AM, Platonides Platonides@gmail.com wrote:
> 2010/6/14 Roan Kattouw wrote:
>> On a $wgConf setup like WMF's, which is what we're aiming at, all this data is stored in $wgConf and the wiki ID can be fed into some LoadBalancer function (don't remember its name offhand) to get a DB connection. So you just need the wiki ID on these setups.
> I disagree. Most setups don't use $wgConf. We shouldn't be adding a database column that only works with $wgConf.
Not to mention that $wgConf sucks, and the entire model should be scrapped...
-Chad
2010/6/14 Platonides Platonides@gmail.com:
> I disagree. Most setups don't use $wgConf. We shouldn't be adding a database column that only works with $wgConf.
Turns out I was mistaken: this is not driven by $wgConf, but by LBFactory_Multi, which can be configured through $wgLBFactoryConf [1]. Normally, configuring that would only really make sense in a $wgConf environment, but it doesn't depend on it AFAIK, and it allows access to shared DBs and the like. $wgLBFactoryConf is way easier to set up than $wgConf, BTW.
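Roan's point is that, with a load-balancer configuration in place, the wiki ID is the only thing an interwiki row needs to carry. The Python sketch below models that lookup under stated assumptions: the dictionaries and function names are hypothetical and do not reproduce the actual $wgLBFactoryConf keys.

```python
import random
from typing import Dict, Optional

# Hypothetical configuration: wiki ID -> DB section -> hosts with read loads.
# Loosely modelled on the idea behind LBFactory-style setups; names are illustrative.
SECTIONS_BY_WIKI: Dict[str, str] = {"enwiki": "s1", "dewiktionary": "s3"}
SECTION_HOSTS: Dict[str, Dict[str, int]] = {
    "DEFAULT": {"db-default": 1},
    "s1": {"db1": 0, "db2": 100, "db3": 100},  # load 0: not used for reads here
    "s3": {"db7": 0, "db8": 50},
}


def hosts_for_wiki(wiki_id: str) -> Dict[str, int]:
    """The wiki ID alone is enough to find the right group of servers."""
    return SECTION_HOSTS[SECTIONS_BY_WIKI.get(wiki_id, "DEFAULT")]


def pick_read_host(wiki_id: str, rng: Optional[random.Random] = None) -> str:
    """Pick a host for reads, weighted by load; use the first host if none has load."""
    rng = rng or random.Random()
    hosts = hosts_for_wiki(wiki_id)
    candidates = [h for h, load in hosts.items() if load > 0]
    if not candidates:
        return next(iter(hosts))
    return rng.choices(candidates, weights=[hosts[h] for h in candidates], k=1)[0]
```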
Roan Kattouw (Catrope)
On Sat, Jun 12, 2010 at 10:34 AM, Peter17 peter017@gmail.com wrote:
> I have posted here [1] a proposal about adding two fields to the interwiki table (API URL and DB name).
> One of the goals is to simplify interwiki transclusion, but those fields might be useful for other interwiki applications as well.
It seems to me that if we want to modify the core to support interwiki integration, there are any number of core tables that could benefit from DB name fields. E.g., user_newtalk could have a DB name field, so that users could be informed which wiki(s) they have new messages on. Do we ultimately want to implement such capabilities in the core, though, or through extensions? Presumably some tables will be harder to share than others, so in the harder cases it will make more sense to have global tables, kinda like what CentralAuth sets up, unless we want to do a major revamping of the code.
Hi!
I didn't really jump in here, as we simply don't use the interwiki table on WMF sites, so the topic was of no interest to me. :)
> It seems to me that if we want to modify the core to support interwiki integration, there are any number of core tables that could benefit from DB name fields.
I personally don't like "interwiki integration", as pretty much everything has to go through one of these methods:
1. Pulling from all wikis
2. Pushing to all wikis
3. Having a central backend
All of these have their own nightmares, and separation has quite often kept us from madness. CentralAuth has added its own share of inefficiencies that nobody has worked on yet. Sharing data between multiple systems is usually not an easy problem, and it needs more attention than a one-time feature deployment.
> E.g., user_newtalk could have a DB name field, so that users could be informed which wiki(s) they have new messages on.
Nooooeeesss (you're suggesting option 2 here... :)
> in the harder cases it will make more sense to have global tables, kinda like what CentralAuth sets up, unless we want to do a major revamping of the code.
I have no idea what major revamping you have in mind when it comes to data sharing.
Do note that we don't have any data consistency framework for cross-database publishing, so you will always end up with inconsistencies that are not guarded by transactions. For each feature, that means building conflict/consistency management. :)
Domas
On Wed, Jun 16, 2010 at 4:33 AM, Domas Mituzas midom.lists@gmail.com wrote:
> I personally don't like "interwiki integration", as pretty much everything has to go through one of these methods:
> 1. Pulling from all wikis
> 2. Pushing to all wikis
> 3. Having a central backend
> All of these have their own nightmares, and separation has quite often kept us from madness. CentralAuth has added its own share of inefficiencies that nobody has worked on yet. Sharing data between multiple systems is usually not an easy problem, and it needs more attention than a one-time feature deployment.
I'm hoping that CentralAuth can either be made easier for non-WMF wiki owners to install/configure/use, or that another, less WMF-specific and easier-to-work-with extension (or core functionality!) can be developed to fulfill the functions it performs, and more. Do you have any thoughts on which of those three options is preferable in which situations? I'm presently working on Special:InterwikiWatchlist and Special:InterwikiRecentChanges pages that use shared global integration_page, integration_recentchanges, integration_watchlist, etc. tables. These are just like the existing page, recentchanges, watchlist, etc. tables, but they also have a global primary key and a field for a database identifier. Each wiki is responsible for pushing data to those global tables via hook functions whenever pages are created, edited, watched, deleted, etc.
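The push model described above can be sketched with an in-memory SQLite database standing in for the shared global table. The table layout, column names and hook function are hypothetical; in MediaWiki the handler would be attached to a real hook, which this sketch does not attempt to name. Timestamps use 14-digit strings in the style MediaWiki stores.

```python
import sqlite3


def create_global_table(conn: sqlite3.Connection) -> None:
    # Global primary key plus a database identifier; column names are hypothetical.
    conn.execute(
        """CREATE TABLE integration_recentchanges (
               irc_id        INTEGER PRIMARY KEY AUTOINCREMENT,
               irc_db        TEXT NOT NULL,
               irc_title     TEXT NOT NULL,
               irc_timestamp TEXT NOT NULL
           )"""
    )


def on_page_saved(conn: sqlite3.Connection, wiki_id: str, title: str, timestamp: str) -> None:
    """Hypothetical hook handler a wiki would run after a local edit (push model)."""
    with conn:  # commits this insert on its own; the local edit is a separate transaction
        conn.execute(
            "INSERT INTO integration_recentchanges (irc_db, irc_title, irc_timestamp)"
            " VALUES (?, ?, ?)",
            (wiki_id, title, timestamp),
        )


def interwiki_recent_changes(conn: sqlite3.Connection, limit: int = 10):
    """One query over the shared table replaces per-wiki lookups."""
    return conn.execute(
        "SELECT irc_db, irc_title, irc_timestamp FROM integration_recentchanges"
        " ORDER BY irc_timestamp DESC, irc_id DESC LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_global_table(conn)
on_page_saved(conn, "enwiki", "Foo", "20100616120000")
on_page_saved(conn, "dewiki", "Bar", "20100616120500")
print(interwiki_recent_changes(conn))
# [('dewiki', 'Bar', '20100616120500'), ('enwiki', 'Foo', '20100616120000')]
```

Because the global insert commits separately from the local edit, a failure between the two leaves the tables out of step, which is the kind of unguarded inconsistency Domas warns about.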
> I have no idea what major revamping you have in mind when it comes to data sharing.
> Do note that we don't have any data consistency framework for cross-database publishing, so you will always end up with inconsistencies that are not guarded by transactions. For each feature, that means building conflict/consistency management. :)
There seem to be three major paths: (1) let each wiki query all the others when it needs interwiki data such as recent changes, (2) share existing tables, or (3) share new global tables. I think option #1 leads to constantly needing foreach loops to go through all the wikis, or else doing massive JOINs. I'm not sure what the pros and cons of option #1 are as far as efficiency is concerned. Option #2 works OK for stuff like the user table, but people have told me that it's hopeless trying to share tables like the page table. My original idea had been to add a global primary key and a database identifier to the page table and then share it, but when the idea was roundly panned, I ended up going with option #3 instead, sharing a new global integration_page table that looks basically like what the shared page table would have looked like. The only difference is that it's changed through hook functions rather than through functions in Article.php.
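For comparison, option #1 (pulling from every wiki) can avoid massive JOINs by querying each wiki separately and merging the already-sorted results. A hedged Python sketch, with in-memory lists standing in for per-wiki queries; the function and data are hypothetical:

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

Change = Tuple[str, str, str]  # (timestamp, wiki ID, title)


def pull_recent_changes(per_wiki: Dict[str, List[Tuple[str, str]]], limit: int) -> List[Change]:
    """Option #1: ask every wiki for its newest changes, then merge the results.

    per_wiki maps a wiki ID to that wiki's (timestamp, title) pairs, newest
    first; each list stands in for one "ORDER BY ... DESC LIMIT n" query.
    """
    streams = [
        [(ts, wiki, title) for ts, title in changes[:limit]]
        for wiki, changes in per_wiki.items()
    ]
    # Every stream is sorted newest first, so a k-way merge keeps that order.
    merged = heapq.merge(*streams, key=lambda change: change[0], reverse=True)
    return list(islice(merged, limit))


wikis = {
    "enwiki": [("20100616120500", "A"), ("20100616110000", "B")],
    "dewiki": [("20100616120000", "C")],
}
print(pull_recent_changes(wikis, 2))
# [('20100616120500', 'enwiki', 'A'), ('20100616120000', 'dewiki', 'C')]
```

The cost grows with the number of wikis: every request touches every wiki and fetches up to `limit` rows from each.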
-Tisane
On Wed, Jun 16, 2010 at 12:07 AM, Tisane tisane2718@gmail.com wrote:
> It seems to me that if we want to modify the core to support interwiki integration, there are any number of core tables that could benefit from DB name fields. E.g., user_newtalk could have a DB name field, so that users could be informed which wiki(s) they have new messages on.
Why? Their home wiki is already stored in CentralAuth, as are all the wikis they're a member of. A DB name column in user_newtalk would be pretty useless.
> Do we ultimately want to implement such capabilities in the core, though, or through extensions? Presumably some tables will be harder to share than others, so in the harder cases it will make more sense to have global tables, kinda like what CentralAuth sets up, unless we want to do a major revamping of the code.
Absolutely it should be in core. Right now, whenever an extension (or core) author wants to do something with an interwiki site, they usually reinvent the wheel. Having a centralized (CORE!) methodology for obtaining a remote DB connection or API request for interwikis would be a huge step in the right direction.
As Domas points out, WMF doesn't use the interwiki table. Data like API URLs and DB connection info should be added to the IW cache as well, so that setups using it can make use of this data too.
On Wed, Jun 16, 2010 at 4:33 AM, Domas Mituzas midom.lists@gmail.com wrote:
> Do note that we don't have any data consistency framework for cross-database publishing, so you will always end up with inconsistencies that are not guarded by transactions. For each feature, that means building conflict/consistency management. :)
You're right. But centralizing this sort of thing makes long-term planning for it easier. And by putting it in core you get more eyes on it and, hopefully, more people caring :)
-Chad
> You're right. But centralizing this sort of thing makes long-term planning for it easier. And by putting it in core you get more eyes on it and, hopefully, more people caring :)
Well, it doesn't matter where these things are, in core or externally: in both cases, people ignore the issues filed about problems :)
(e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=23339 )
Domas
On Thu, Jun 17, 2010 at 7:48 AM, Domas Mituzas midom.lists@gmail.com wrote:
>> You're right. But centralizing this sort of thing makes long-term planning for it easier. And by putting it in core you get more eyes on it and, hopefully, more people caring :)
> Well, it doesn't matter where these things are, in core or externally: in both cases, people ignore the issues filed about problems :)
> (e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=23339 )
Don't worry, I just drew attention to that bug by voting for it, so I'm sure help is on the way.
If only voting actually did summon help :p
-Chad
On Thu, Jun 17, 2010 at 3:07 PM, Chad innocentkiller@gmail.com wrote:
> If only voting actually did summon help :p
Just cast the "Summon Developer" spell and direct the dev to attack a particular bug to the best of his ability. http://paizo.com/pathfinderRPG/prd/spells/summonMonster.html
-Tisane
On Wed, Jun 16, 2010 at 6:41 AM, Chad innocentkiller@gmail.com wrote:
> Why? Their home wiki is already stored in CentralAuth, as are all the wikis they're a member of. A DB name column in user_newtalk would be pretty useless.
Not all wiki farms will want to use CentralAuth; the difficulty in working with it will likely make most wiki owners want to share the user table instead. In fact, Brion recommended as much for new wiki farms that don't have existing sets of users that need to be merged. If there is interest in revamping CentralAuth to make it easier for the common man to use, then maybe it will be unnecessary to develop a separate integration system for wikis that share the user table.
> Absolutely it should be in core. Right now, whenever an extension (or core) author wants to do something with an interwiki site, they usually reinvent the wheel. Having a centralized (CORE!) methodology for obtaining a remote DB connection or API request for interwikis would be a huge step in the right direction.
The thing about the core is that you have to get consensus to do anything major (or else it might get reverted), whereas with extensions, you have more freedom. WMF and other wikis can always choose not to install an extension; but if a feature they don't want to use is proposed for core, people might view it as adding unnecessary complexity and potential for unforeseen bugs, without any benefit to offset the hassle. On the other hand, standardization can happen without features being put into the core; for instance, whose wiki doesn't use the ParserFunctions extension?
Maybe it's not so bad if people reinvent the wheel for a while; it's better than letting a feature go unimplemented because people couldn't agree on a standard or on whether they even wanted a certain feature. Eventually, if a feature gets popular enough, the frameworks can be merged. Also, working on an extension gives the dev the opportunity to change his mind halfway through and switch to a different implementation method without wreaking a lot of havoc or wasting the code review that had to be invested in making sure the original implementation was OK (with extensions, code review just gets deferred until someone suggests it be implemented on WMF). I guess another possibility is creating a branch of the core that has interwiki integration capabilities, and then merging it in when it's done.
> You're right. But centralizing this sort of thing makes long-term planning for it easier. And by putting it in core you get more eyes on it and, hopefully, more people caring :)
People care about the extensions that run on WMF sites, right? For purposes of drawing eyes to it, that's almost as good as putting it into the core. Don't get me wrong, I think it should go into the core, as long as it's a good implementation.
-Tisane
I'll try to sum up the discussion and reply to some arguments:
2010/6/13 Chad innocentkiller@gmail.com:
> - iw_trans - I don't think this needs to become more than a boolean like it is. If we allow transwiki inclusion, we'll have to use a DB or API connection. Since a DB connection will always be preferable to an HTTP request to the API, it would be safe to use the existence of a db name as an indicator to use it, else fall back to the API.
I can keep iw_trans as a boolean, but I thought it would be a good idea to use it as a "selector":
* 0 -> no interwiki transclusion
* 1 -> transclusion via the API
* 2 -> transclusion via direct DB access through LBFactory
This makes it possible to add new kinds of transclusion later (e.g. 3 -> transclusion by DB file access in the case of an SQLite DB, etc.).
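Peter's selector can be pictured as a small enumeration. In this hypothetical Python sketch, the enum and function names are invented and the returned strings stand in for real backends. Treating unknown values as "disabled" is a choice made here, not part of the proposal.

```python
from enum import IntEnum


class IwTrans(IntEnum):
    """iw_trans values as proposed; 3 (e.g. SQLite file access) is only a possible extension."""
    NONE = 0  # no interwiki transclusion
    API = 1   # transclusion via the API
    DB = 2    # direct DB access through LBFactory


def transclusion_backend(iw_trans: int) -> str:
    """Map a stored iw_trans value to a (placeholder) backend name."""
    try:
        mode = IwTrans(iw_trans)
    except ValueError:
        # A value this code does not know yet: treat it as disabled.
        return "disabled"
    return {IwTrans.NONE: "disabled", IwTrans.API: "api", IwTrans.DB: "db"}[mode]


print([transclusion_backend(v) for v in (0, 1, 2, 3)])
# ['disabled', 'api', 'db', 'disabled']
```

The selector and Chad's boolean-plus-DB-name approach differ mainly in where the choice lives: here it is stored explicitly per row, whereas Chad's version infers it from which fields are filled in.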
2010/6/13 Chad innocentkiller@gmail.com:
> - iw_dbname / iw_api - You could probably combine these into one column. It could store a value like "dbname=abc;api=http://foo.com/etc" which would be loaded and split when the Interwiki object is constructed.
I agree that it is possible, but I don't see the advantages of doing so...
2010/6/14 Platonides Platonides@gmail.com:
> Do we need *both* values? It could simply contain either http://foo.com/etc (API) or mysql://localhost:3306/abc (DB name).
I don't need both values for the function I'm writing, but as Chad said, he has been suggesting adding this kind of field for a while, so I suppose both fields can be useful for different purposes.
By the way, this is related to https://bugzilla.wikimedia.org/show_bug.cgi?id=20646
2010/6/16 Domas Mituzas midom.lists@gmail.com:
> I didn't really jump in here, as we simply don't use the interwiki table on WMF sites, so the topic was of no interest to me. :)
If we want to enable interwiki transclusion on WMF wikis using the code I'm writing, we'll need to use the interwiki table on those wikis... And we'll need to start a discussion about the interwiki prefixes to use.
-- Peter Potrowl http://www.mediawiki.org/wiki/User:Peter17
2010/6/19 Peter17 peter017@gmail.com:
> If we want to enable interwiki transclusion on WMF wikis using the code I'm writing, we'll need to use the interwiki table on those wikis...
Not necessarily: we could simply add these fields to the interwiki cache as well.
> And we'll need to start a discussion about the interwiki prefixes to use.
Yeah, there's potentially a bit of an issue with chained interwiki prefixes like [[de:wikt:]], although that could hopefully be resolved by grabbing the API URL for de: and asking that API what the API URL for wikt: is, by looking up wikt: in the interwiki table in de:'s database, or by somehow looking up wikt: in de:'s interwiki cache (can this be done?).
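The second option, following the chain through each wiki's own interwiki table, can be sketched as a hop-by-hop lookup. The maps below are hypothetical in-memory stand-ins for each wiki's interwiki table, and the page title "Haus" is an invented example:

```python
from typing import Dict, Tuple

# Hypothetical per-wiki interwiki maps: wiki ID -> {prefix: target wiki ID}.
INTERWIKI: Dict[str, Dict[str, str]] = {
    "enwiki": {"de": "dewiki", "wikt": "enwiktionary"},
    "dewiki": {"en": "enwiki", "wikt": "dewiktionary"},
}


def resolve_chain(start_wiki: str, link: str) -> Tuple[str, str]:
    """Resolve e.g. "de:wikt:Haus" starting from start_wiki, one prefix at a time.

    Each prefix is looked up in the interwiki map of the wiki reached so far,
    so "wikt" is interpreted the way de: defines it, not the way enwiki does.
    Returns (final wiki ID, remaining title).
    """
    wiki, rest = start_wiki, link
    while ":" in rest:
        prefix, tail = rest.split(":", 1)
        target = INTERWIKI.get(wiki, {}).get(prefix.lower())
        if target is None:
            break  # not an interwiki prefix on this wiki: keep it as part of the title
        wiki, rest = target, tail
    return wiki, rest


print(resolve_chain("enwiki", "de:wikt:Haus"))  # ('dewiktionary', 'Haus')
```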
Roan Kattouw (Catrope)