Hello to all!
I'm a French student and I am participating in Google Summer of Code this year, working on MediaWiki!
My mentor is Roan Kattouw (Catrope) and my subject is "Reasonably efficient interwiki transclusion". You can see my application page here: [1].
I have already discussed this with my mentor, and together we have prepared a draft about my project: [2]. It sums up the current situation and includes some proposals.
It is now open for comments, so could you please read it and let me know your remarks and suggestions, on this list and/or on the talk page?
Thanks in advance
[1] http://www.mediawiki.org/wiki/User:Peter17/GSoc_2010 [2] http://www.mediawiki.org/wiki/User:Peter17/Reasonably_efficient_interwiki_tr...
-- Peter Potrowl http://www.mediawiki.org/wiki/User:Peter17
2010/5/24 Peter17 peter017@gmail.com
My mentor is Roan Kattouw (Catrope) and my subject is "Reasonably efficient interwiki transclusion". You can see my application page here: [1].
Thanks Peter! I'll follow your project with great interest. Nevertheless, a suggestion: take Labeled Section Transclusion into account from the beginning! It's a most interesting extension, with lots of possible uses, but - unluckily and wrongly - it's only seen as a "Wikisource tool" :-( . Obviously you know that a template, Iwpage, which works on Wikisource, has recently been doing a limited form of interwiki transclusion.
Alex
On Mon, May 24, 2010 at 17:44, Peter17 peter017@gmail.com wrote:
My mentor is Roan Kattouw (Catrope) and my subject is "Reasonably efficient interwiki transclusion". You can see my application page here: [1].
The title of the subject is a bit confusing. "Interwiki", for better or worse, refers to interlanguage links.
Consider changing it to "cross-wiki" or something.
-- אָמִיר אֱלִישָׁע אַהֲרוֹנִי Amir Elisha Aharoni
"We're living in pieces, I want to live in peace." - T. Moore
On Mon, May 24, 2010 at 11:18 AM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
The title of the subject is a bit confusing. "Interwiki", for better or worse, refers to interlanguage links.
Consider changing it to "cross-wiki" or something.
No it doesn't. Interwiki links don't have to be interlanguage links. Interlanguage links are a subset of interwiki links... those that happen to also be language codes.
-Chad
On Mon, May 24, 2010 at 18:48, Chad innocentkiller@gmail.com wrote:
On Mon, May 24, 2010 at 11:18 AM, Amir E. Aharoni amir.aharoni@mail.huji.ac.il wrote:
The title of the subject is a bit confusing. "Interwiki", for better or worse, refers to interlanguage links.
No it doesn't. Interwiki links don't have to be interlanguage links. Interlanguage links are a subset of interwiki links... those that happen to also be language codes.
You are right, but that's why I wrote "for better or worse": I'd gladly call them "interlanguage", but very frequently people say "interwiki" and mean "interlanguage". Consider the name of http://meta.wikimedia.org/wiki/Pywikipediabot/interwiki.py , for example.
http://www.mediawiki.org/wiki/User:Peter17/Reasonably_efficient_interwiki_tr...
It seems it doesn't work so well. It was inadvertently broken for wikitext transclusions when the interwiki prefix points to the nice URL. See the 'wgEnableScaryTranscluding and Templates/Images?' thread on mediawiki-l.
Hi,
On 05/24/2010 04:44 PM, Peter17 wrote:
It is now open for comments, so could you please read it and let me know your remarks and suggestions, on this list and/or on the talk page?
First of all, let me tell you that I'm really excited about this project. It may very well revolutionize the way we organize templates on Wikimedia and on other wiki farms.
Some notes:
1. You propose a shared database. If I interpret this correctly, it only works inside a set of wikis on the same server farm and doesn't include external wikis. For example, English Wikipedia could transclude templates from Meta Wiki, but not from Wikia. In contrast, $wgForeignFileRepos works for external wikis (which is much better).
2. Parsing the wikitext at the home wiki makes it more difficult to use site magic words, e.g. {{CONTENTLANGUAGE}}. You'd have to pass each and every one as a template parameter (e.g. {{homewiki::templatename|lang={{CONTENTLANGUAGE}}}}).
Kind regards,
--Church of emacs
On Mon, May 24, 2010 at 7:42 PM, church.of.emacs.ml church.of.emacs.ml@googlemail.com wrote:
Some notes:
- You propose a shared database. If I interpret this correctly, it only works inside a set of wikis on the same server farm and doesn't include external wikis. For example, English Wikipedia could transclude templates from Meta Wiki, but not from Wikia. In contrast, $wgForeignFileRepos works for external wikis (which is much better).
If it's done right, you should be able to put various backends on it just like the FileRepo code. Bug 20646 is a good start towards something like this, I think. Being able to store API URLs or database connection info inside an iw_meta field would be awesome for this (and has lots of other applications as well).
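To illustrate (just a sketch, not working code: bug 20646 isn't implemented, and iw_api / iw_meta are hypothetical column names):

# Hypothetical: look up the interwiki prefix and decide how to reach that wiki.
$dbr = wfGetDB( DB_SLAVE );
$row = $dbr->selectRow( 'interwiki', '*', array( 'iw_prefix' => $prefix ), __METHOD__ );
if ( $row && $row->iw_meta ) {
    # Same server farm: iw_meta could hold serialized DB connection info,
    # so template text can be read straight from that wiki's database.
    $source = array( 'type' => 'db', 'params' => unserialize( $row->iw_meta ) );
} elseif ( $row && $row->iw_api ) {
    # External wiki: fall back to HTTP requests against its api.php.
    $source = array( 'type' => 'api', 'url' => $row->iw_api );
}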
-Chad
On 5/24/2010 6:42 PM, church.of.emacs.ml wrote:
- You propose a shared database. If I interpret this correctly, it only [...]
I would suggest not going the shared database route unless the code can be fixed so that shared databases actually work with all of the DB backends.
On Mon, May 24, 2010 at 8:27 PM, Q overlordq@gmail.com wrote:
I would suggest not going the shared database route unless the code can be fixed so that shared databases actually work with all of the DB backends.
I don't see why it shouldn't be easy to get it working with all DB backends. But in any case, for Wikimedia use, a shared database backend is pretty much a must. Having the application servers make HTTP requests to each other to retrieve templates rather than accessing the database directly is just silly, and is going to perform badly. Ideally the code should generalize to work with external wikis too, so that third parties can benefit from our templates as they do from our images. Maybe someday, a copy-pasted Wikipedia article will actually work . . . I can dream.
2010/5/25 Platonides Platonides@gmail.com:
It seems it doesn't work so well. It was inadvertently broken for wikitext transclusions when the interwiki prefix points to the nice URL. See the 'wgEnableScaryTranscluding and Templates/Images?' thread on mediawiki-l.
Well, in my tests, images are included fine because I enabled $wgUseInstantCommons. As I wrote, "the parameters are totally ignored": they are indeed not substituted.
2010/5/25 Chad innocentkiller@gmail.com:
If it's done right, you should be able to put various backends on it just like the FileRepo code. Bug 20646 is a good start to something like this I think. Being able to store API urls or database connection info inside a iw_meta field would be awesome for this (and has lots of other applications as well).
-Chad
Yes. The shared database would be only for invalidating the cache when a template is edited. In my 3rd (preferred) solution, the templates are still fetched through the API. External wikis can transclude them and cache them for an arbitrary time, as ForeignAPIRepo does.
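Something like this is the kind of check I have in mind (a rough sketch only; the transclusion_tracking table and fetchViaApi() are invented names):

# The shared database is only consulted to see whether the cached copy is stale.
$dbr = wfGetDB( DB_SLAVE );   # in reality, a connection to the shared database
$touched = $dbr->selectField(
    'transclusion_tracking',  # hypothetical table, updated by the home wiki on edit
    'tt_touched',
    array( 'tt_wiki' => $prefix, 'tt_title' => $templateName ),
    __METHOD__
);
if ( $cachedTimestamp !== false && $cachedTimestamp >= $touched ) {
    return $cachedText;       # the cached expansion is still fresh
}
# Otherwise refetch the template through the home wiki's API and re-cache it.
return fetchViaApi( $prefix, $templateName );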
2010/5/25 church.of.emacs.ml church.of.emacs.ml@googlemail.com:
Parsing the wikitext at the home wiki makes it more difficult to use site magic words, e.g. {{CONTENTLANGUAGE}}. You'd have to pass each and every one as a template parameter (e.g. {{homewiki::templatename|lang={{CONTENTLANGUAGE}}}}).
OK, I will keep this in mind. Parsing the template on the home wiki seems necessary because it can use other templates hosted on that wiki to render correctly... I think it is the most logical way to do it, isn't it?
2010/5/25 Aryeh Gregor Simetrical+wikilist@gmail.com:
I don't see why it shouldn't be easy to get it working with all DB backends. But in any case, for Wikimedia use, a shared database backend is pretty much a must. Having the application servers make HTTP requests to each other to retrieve templates rather than accessing the database directly is just silly, and is going to perform badly. Ideally the code should generalize to work with external wikis too, so that third parties can benefit from our templates as they do from our images. Maybe someday, a copy-pasted Wikipedia article will actually work . . . I can dream.
Mmmh... sorry, I'm not really sure I understand... My suggestion is to use a shared database that would store the remote calls, not the content of the pages... In my mind, fetching the distant pages would be done through the API, not by accessing the distant database directly. External wikis will soon be able to access our images very easily with $wgUseInstantCommons, but that is still not direct database access...
Thanks for your remarks.
About the question from Alex about transcluding sections: is it possible to request only a section through the API? I searched for this but didn't find anything :(
-- Peter Potrowl http://www.mediawiki.org/wiki/User:Peter17
About the question from Alex about transcluding sections: is it possible to request only a section through the API? I searched for this but didn't find anything :(
-- Peter Potrowl
Ask ThomasV; #lst is something he cares about particularly and knows to the deepest level! I guess he has run into the same troubles as you.
Alex
On Tue, May 25, 2010 at 7:41 AM, Peter17 peter017@gmail.com wrote:
Mmmh... sorry, I'm not really sure I understand... My suggestion is to use a shared database that would store the remote calls, not the content of the pages... In my mind, fetching the distant pages would be done through the API, not by accessing the distant database directly. External wikis will soon be able to access our images very easily with $wgUseInstantCommons, but that is still not direct database access...
That's not scalable on Wikimedia sites. Making external HTTP requests to other wikis' APIs just isn't fast enough; you must use the database for remote wiki information in the WMF. I suggest taking a deeper look at how FileRepo does things. The abstract class FileRepo handles the high-level stuff while LocalRepo, ForeignDBRepo and ForeignAPIRepo handle the specific implementations for things like getting thumbnails or metadata.
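Roughly the same shape could work here; a sketch only (the TemplateRepo class names are invented, nothing of this exists in core):

abstract class TemplateRepo {
    /** Return the raw wikitext of a template, or false if it cannot be fetched. */
    abstract public function fetchTemplateText( $title );
}

# A ForeignDBTemplateRepo sibling would read the foreign wiki's tables directly
# via wfGetDB( DB_SLAVE, array(), $wikiID ); this is only the API-based variant.
class ForeignAPITemplateRepo extends TemplateRepo {
    private $apiBase;   # e.g. 'http://www.mediawiki.org/w/api.php'

    public function __construct( $apiBase ) {
        $this->apiBase = $apiBase;
    }

    public function fetchTemplateText( $title ) {
        $url = $this->apiBase . '?action=query&prop=revisions&rvprop=content&format=json&titles='
            . urlencode( $title );
        $json = Http::get( $url );
        if ( $json === false ) {
            return false;
        }
        $data = json_decode( $json, true );
        $page = reset( $data['query']['pages'] );
        return isset( $page['revisions'][0]['*'] ) ? $page['revisions'][0]['*'] : false;
    }
}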
-Chad
On 05/25/2010 01:41 PM, Peter17 wrote:
2010/5/25 church.of.emacs.ml church.of.emacs.ml@googlemail.com:
Parsing the wikitext at the home wiki makes it more difficult to use site magic words, e.g. {{CONTENTLANGUAGE}}. You'd have to pass each and every one as a template parameter (e.g. {{homewiki::templatename|lang={{CONTENTLANGUAGE}}}}).
OK, I will keep this in mind. Parsing the template on the home wiki seems necessary because it can use other templates hosted on that wiki to render correctly... I think it is the most logical way to do it, isn't it?
Yes. When I think about this a bit more, it makes sense to parse on the home wiki, because otherwise (a) you couldn't include other remote templates or (b) you would need one API call per included template. Both not feasible. However, you'd have to worry that each distant wiki uses only a fair amount of the home wiki server's resources. E.g. set a limit of inclusions (that limit would have to be on the home-wiki-server-side) and disallow infinite loops (they're always fun).
What do you propose for linking? If a template on the home wiki links to [[Foobar]], should that be an interwiki link to [[homewiki:Foobar]], or a local link in the distant wiki? In any case, there should be a way of differentiating home-wiki and distant-wiki references (links, inclusions).
On 05/25/2010 02:25 PM, Chad wrote:
That's not scalable on Wikimedia sites. Making external HTTP requests to other wikis' APIs just isn't fast enough; you must use the database for remote wiki information in the WMF. I suggest taking a deeper look at how FileRepo does things. The abstract class FileRepo handles the high-level stuff while LocalRepo, ForeignDBRepo and ForeignAPIRepo handle the specific implementations for things like getting thumbnails or metadata.
Do I understand this correctly... you can either access a foreign repository via the API (if you're on another server) or directly via the database (if you're on the same wiki farm)? Very cool stuff.
Regards, Church of emacs
Aryeh Gregor wrote:
On Mon, May 24, 2010 at 8:27 PM, Q overlordq@gmail.com wrote:
I would suggest not going the shared database route unless the code can be fixed so that shared databases actually work with all of the DB backends.
I don't see why it shouldn't be easy to get it working with all DB backends. But in any case, for Wikimedia use, a shared database backend is pretty much a must. Having the application servers make HTTP requests to each other to retrieve templates rather than accessing the database directly is just silly, and is going to perform badly.
He can internally call the api from the other wiki via FauxRequest.
Ideally the code should generalize to work with external wikis too, so that third parties can benefit from our templates as they do from our images. Maybe someday, a copy-pasted Wikipedia article will actually work . . . I can dream.
I'm afraid that it will produce the opposite. A third party downloads an XML dump for offline use but it doesn't work because it needs a dozen templates from Meta (in the worst case, templates from a dozen other wikis).
church.of.emacs.ml wrote:
However, you'd have to worry that each distant wiki uses only a fair amount of the home wiki server's resources. E.g. set a limit of inclusions (that limit would have to be on the home-wiki-server-side) and disallow infinite loops (they're always fun).
Infinite loops could only happen if both wikis can fetch from each other. A simple solution would be to pass along with the query who originally requested it. If the home wiki then calls a different wiki, it would blame the one that asked for it (or maybe build up a wiki + template path).
What do you propose for linking? If a template on the home wiki links to [[Foobar]], should that be an interwiki link to [[homewiki:Foobar]], or a local link in the distant wiki? In any case, there should be a way of differentiating home-wiki and distant-wiki references (links, inclusions).
The link itself could be generated partly with local data and partly with remote data, e.g. a remote template containing "[[Flag of {{{city}}}]]" called with a city parameter.
On 25 May 2010 15:30, Platonides Platonides@gmail.com wrote:
church.of.emacs.ml wrote:
However, you'd have to worry that each distant wiki uses only a fair amount of the home wiki server's resources. E.g. set a limit of inclusions (that limit would have to be on the home-wiki-server-side) and disallow infinite loops (they're always fun).
Infinite loops could only happen if both wikis can fetch from each other. A simple solution would be to pass along with the query who originally requested it. If the home wiki then calls a different wiki, it would blame the one that asked for it (or maybe build up a wiki + template path).
Or the request could have something like a depth counter, to stop requests that need more than N iterations. So if you get a request with depth > 20, you can ignore that request. This doesn't stop an evil wiki from passing a false depth level, but the idea of interwiki is a network built on top of the web of wikis you trust, so you would not add an evil wiki.
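In the API module serving the transclusion request that could look like this (just a sketch; the iwdepth parameter is made up):

global $wgRequest;
# The calling wiki forwards its own depth, incremented by one per hop.
$depth = $wgRequest->getInt( 'iwdepth', 0 );
if ( $depth > 20 ) {
    # Refuse to recurse any further: the chain of transclusions is too deep.
    $this->dieUsage( 'Interwiki transclusion depth limit exceeded', 'toodeep' );
}
# When this wiki in turn fetches a template from yet another wiki,
# it would pass iwdepth = $depth + 1 along with that request.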
(I'm going to use "local wiki" here for what Peter is calling "distant wiki", and "foreign wiki" for what he's calling "home wiki". This seems to better match the terminology we use for Commons.)
On Tue, May 25, 2010 at 7:41 AM, Peter17 peter017@gmail.com wrote:
Yes. The shared database would be only for invalidating the cache when a template is edited. In my 3rd (preferred) solution, the templates are still fetched through the API. External wikis can transclude them and cache them for an arbitrary time, as ForeignAPIRepo does.
OK, I will keep this in mind. Parsing the template on the home wiki seems necessary because it can use other templates hosted on that wiki to render correctly... I think it is the most logical way to do it, isn't it?
I think parsing the template on the local wiki is better, because it gives you more flexibility. For instance, it can use local {{SITENAME}} and so forth. {{CONTENTLANG}} would be especially useful, if we're assuming that templates will be transcluded to many languages.
This doesn't mean that it has to use the local wiki's templates. There would be two ways to approach this:
1) Just don't use the local wiki's templates. Any template calls from the foreign wiki's template should go to the foreign wiki, not the local wiki. If this is being done over the API, then as an optimization, you could have the foreign wiki send back all templates that will be required, not just the actual template requested.
2) Use the local wiki's templates, and assume that the template on the foreign wiki is designed to be used remotely and will only call local templates when it's really desired. This gives even more flexibility if the foreign template is designed for this use, but it makes it harder to use templates that aren't designed for foreign use.
At first glance, it seems to me that (1) is the best -- do all parsing on the local wiki, but use templates from the foreign wiki. This will cause errors if the local wiki doesn't have necessary extensions installed, like ParserFunctions, but it gives more flexibility overall.
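To make the optimization in (1) concrete, the batched answer might look something like this (purely illustrative; no such batching exists today):

# Hypothetical response: the foreign wiki returns the requested template
# together with every template it transcludes, in one round-trip.
$response = array(
    'templates' => array(
        'Template:Infobox'     => '... wikitext ...',
        'Template:Infobox/row' => '... wikitext ...',
        'Template:Navbar'      => '... wikitext ...',
    ),
    'touched' => '2010-05-25T12:00:00Z',  # lets the local wiki cache and invalidate it
);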
Another issue here is performance. Parsing is one of the most expensive operations MediaWiki does. Nobody's going to care much if foreign sites request a bunch of templates that can be served out of Squid, but if there are lots of foreign sites that are requesting giant infoboxes and those have to be parsed by Wikimedia servers, Domas is going to come along with an axe pretty soon and everyone's sites will break. Better to head that off at the pass.
Mmmh... sorry, I'm not really sure I understand... My suggestion is to use a shared database that would store the remote calls, not the content of the pages... In my mind, fetching the distant pages would be done through the API, not by accessing the distant database directly. External wikis will soon be able to access our images very easily with $wgUseInstantCommons, but that is still not direct database access...
What you're proposing is that Wikimedia servers do this on a cache miss:
1) An application server sends an HTTP request to a Squid with If-Modified-Since.
2) The Squid checks its cache, finds it's a miss, and passes the request to another Squid.
3) The other Squid checks its cache, finds it's a miss, and passes the request to a second application server.
4) The second application server loads up the MediaWiki API and sends a request to a database server.
5) The database server returns the result to the second application server.
6) The second application server returns the results to the Squids, which cache it and return it to the first application server.
7) The first application server caches the result in the database.
What I'm proposing is that they do this:
1) An application server sends a database query to a database server (maybe even using an already-open connection).
2) The database server returns the result.
Having Wikimedia servers send HTTP requests to each other instead of just doing database queries does not sound like a great idea to me. You're hitting several extra servers for no reason, including extra requests to an application server. On top of that, you're caching stuff in the database which is already *in* the database! FileRepo does this the Right Way, and you should definitely look at how that works. It uses polymorphism to use the database if possible, else the API.
However, someone like Tim Starling should be consulted for a definitive performance assessment; don't rely on my word alone.
On Tue, May 25, 2010 at 9:11 AM, church.of.emacs.ml church.of.emacs.ml@googlemail.com wrote:
Yes. When I think about this a bit more, it makes sense to parse on the home wiki, because otherwise (a) you couldn't include other remote templates or (b) you would need one API call per included template. Both not feasible.
Just have it return all needed templates at once if you want to minimize round-trips.
However, you'd have to worry that each distant wiki uses only a fair amount of the home wiki server's resources. E.g. set a limit of inclusions (that limit would have to be on the home-wiki-server-side) and disallow infinite loops (they're always fun).
This is probably not enough. I really doubt Wikimedia is going to let a sizable fraction of its CPU time go to foreign template use. Serving images or plain old wikitext from Squid cache is very cheap, so that's not a big deal, but large-scale parsing will be too much, I suspect. (But again, ask Tim about this.)
Do I understand this correctly... you can either access a foreign repository via the API (if you're on another server) or directly via the database (if you're on the same wiki farm)? Very cool stuff.
Yes, that's how FileRepo works.
On Tue, May 25, 2010 at 9:22 AM, Platonides Platonides@gmail.com wrote:
He can internally call the api from the other wiki via FauxRequest.
How will that interact with different configuration settings? I thought FauxRequest only handles requests to the current wiki.
I'm afraid that it will produce the opposite. A third party downloads an XML dump for offline use but it doesn't work because it needs a dozen templates from Meta (in the worst case, templates from a dozen other wikis).
My point is that ideally, you'd be able to copy-paste enwiki pages and then get the templates to work by configuring them to be fetched from enwiki. Even more ideally, you might want to fetch the enwiki templates as of the point in time your page was downloaded, in case the templates changed syntax (and also to allow indefinite caching).
But I guess that's much better handled by just using a proper export, and having the templates included in that, so never mind.
On Tue, May 25, 2010 at 9:30 AM, Platonides Platonides@gmail.com wrote:
Infinite loops could only happen if both wikis can fetch from each other. A simple solution would be to pass along with the query who originally requested it. If the home wiki then calls a different wiki, it would blame the one that asked for it (or maybe build up a wiki + template path).
An even simpler solution would be to only set up one wiki to allow this kind of foreign template request, the way Commons is set up now. But that might be limiting.
2010/5/25 Aryeh Gregor Simetrical+wikilist@gmail.com:
Having Wikimedia servers send HTTP requests to each other instead of just doing database queries does not sound like a great idea to me. You're hitting several extra servers for no reason, including extra requests to an application server. On top of that, you're caching stuff in the database which is already *in* the database! FileRepo does this the Right Way, and you should definitely look at how that works. It uses polymorphism to use the database if possible, else the API.
However, someone like Tim Starling should be consulted for a definitive performance assessment; don't rely on my word alone.
This is true if, indeed, all parsing is done on the distant wiki. However, if parsing is done on the home wiki, you're not simply requesting data that's ready-baked in the DB, so API calls make sense. I'm also not convinced this would be a huge performance problem because it'd only be done on parse (thanks to parser cache), but like you I trust Tim's verdict more than mine. Contrary to what Platonides suggested, you cannot use FauxRequest to do cross-wiki API requests.
To the point of whether parsing on the distant wiki makes more sense: I guess there are points to be made both ways. I originally subscribed to the idea of parsing on the home wiki so that expanding the same template with the same arguments would always result in the same (preprocessed) wikitext, but I do see how parsing on the local wiki would help for stuff like {{SITENAME}} and {{CONTENTLANG}}.
Roan Kattouw (Catrope)
On Tue, May 25, 2010 at 2:58 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
This is true if, indeed, all parsing is done on the distant wiki. However, if parsing is done on the home wiki, you're not simply requesting data that's ready-baked in the DB, so API calls make sense.
That's true -- if parsing is done on the foreign wiki, then you'd have to do API calls or something, not read from the DB. Another reason to avoid that. :)
I'm also not convinced this would be a huge performance problem because it'd only be done on parse (thanks to parser cache), but like you I trust Tim's verdict more than mine.
Templates will often miss the parser cache, because different invocations will use different parameters. Even *with* the parser cache, parsing is *still* one of the most expensive operations Wikimedia does, so I'm not so sanguine.
2010/5/25 Aryeh Gregor Simetrical+wikilist@gmail.com:
Templates will often miss the parser cache, because different invocations will use different parameters. Even *with* the parser cache, parsing is *still* one of the most expensive operations Wikimedia does, so I'm not so sanguine.
I wasn't talking about the templates themselves hitting the parser cache, but about the pages that use them. Of course the number of pages using interwiki transclusion over time plus the edit rate of those pages could grow to become a problem.
Also note that you wouldn't technically be parsing, just preprocessing on the home wiki, which is certain to be less expensive (how much less I don't know), and that you'd be doing this on some wiki anyway, so only the overhead involved in HTTP, the API and initializing the parser is relevant; the actual cost of the operation is not, because you're doing it someplace either way (of course this only applies intra-WMF, not to external clients).
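(For what it's worth, that preprocessing step is essentially what action=expandtemplates already exposes; a rough sketch of the remote call, with all the caching and interwiki plumbing left out, and Template:Foo just an example name:)

# Ask the home wiki to preprocess a template invocation and hand back the
# expanded wikitext, which the local wiki then parses itself.
$url = 'http://www.mediawiki.org/w/api.php?action=expandtemplates&format=json&text='
    . urlencode( '{{Template:Foo|bar=baz}}' );
$data = json_decode( Http::get( $url ), true );
$expanded = $data['expandtemplates']['*'];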
Roan Kattouw (Catrope)
On Tue, May 25, 2010 at 8:58 PM, Roan Kattouw roan.kattouw@gmail.comwrote:
To the point of whether parsing on the distant wiki makes more sense: I guess there are points to be made both ways. I originally subscribed to the idea of parsing on the home wiki so that expanding the same template with the same arguments would always result in the same (preprocessed) wikitext, but I do see how parsing on the local wiki would help for stuff like {{SITENAME}} and {{CONTENTLANG}}.
Why not mix it? Take other templates etc. from the source wiki and set magic stuff like time / contentlang to target wiki values.
Marco
On Tue, May 25, 2010 at 3:48 PM, Roan Kattouw roan.kattouw@gmail.com wrote:
Also note that you wouldn't technically be parsing, just preprocessing on the home wiki, which is certain to be less expensive (how much less I don't know)
This is a good point.
and that you'd be doing this on some wiki anyway, so only the overhead involved in HTTP, the API and initializing the parser is relevant; the actual cost of the operation is not, because you're doing it someplace either way (of course this only applies intra-WMF, not to external clients).
External clients are what I'm worried about. It's a nonissue for intra-Wikimedia use, but if external clients start using a lot of CPU by using Wikimedia servers for parsing, I expect them to get shut down, and no one wants that.
On Tue, May 25, 2010 at 4:09 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
Why not mix it? Take other templates etc. from the source wiki and set magic stuff like time / contentlang to target wiki values.
That's what I suggested, basically. Use templates from the foreign wiki, but do the actual parsing locally, so you get local values for magic words and so on.
Aryeh Gregor wrote:
OK, I will keep this in mind. Parsing the template on the home wiki seems necessary because it can use other templates hosted on that wiki to render correctly... I think it is the most logical way to do it, isn't it?
I think parsing the template on the local wiki is better, because it gives you more flexibility. For instance, it can use local {{SITENAME}} and so forth. {{CONTENTLANG}} would be especially useful, if we're assuming that templates will be transcluded to many languages.
There are imho fewer variables set by the caller wiki, which could be passed with the query.
This doesn't mean that it has to use the local wiki's templates. There would be two ways to approach this:
- Just don't use the local wiki's templates. Any template calls from
the foreign wiki's template should go to the foreign wiki, not the local wiki. If this is being done over the API, then as an optimization, you could have the foreign wiki send back all templates that will be required, not just the actual template requested.
- Use the local wiki's templates, and assume that the template on the
foreign wiki is designed to be used remotely and will only call local templates when it's really desired. This gives even more flexibility if the foreign template is designed for this use, but it makes it harder to use templates that aren't designed for foreign use.
At first glance, it seems to me that (1) is the best -- do all parsing on the local wiki, but use templates from the foreign wiki. This will cause errors if the local wiki doesn't have necessary extensions installed, like ParserFunctions, but it gives more flexibility overall.
Using (1), you could still allow calling the local template by using {{msg:xyz}}.
Another issue here is performance. Parsing is one of the most expensive operations MediaWiki does. Nobody's going to care much if foreign sites request a bunch of templates that can be served out of Squid, but if there are lots of foreign sites that are requesting giant infoboxes and those have to be parsed by Wikimedia servers, Domas is going to come along with an axe pretty soon and everyone's sites will break. Better to head that off at the pass.
Probably time to revive the native preprocessor project. We may want to have both ways implemented, with one falling back on the other.
What you're proposing is that Wikimedia servers do this on a cache miss:
1) An application server sends an HTTP request to a Squid with If-Modified-Since. [...]
7) The first application server caches the result in the database.
For an intra-Wikimedia query, they could directly ask an Apache; they could even send the query to localhost. Using the API seems like completely the right approach for remote users; it can later be refined to add more backends.
Anyway, I don't think the API request would be cacheable by the Squids, so it would be passed directly to an application server.
On Tue, May 25, 2010 at 9:22 AM, Platonides Platonides@gmail.com wrote:
He can internally call the api from the other wiki via FauxRequest.
How will that interact with different configuration settings? I thought FauxRequest only handles requests to the current wiki.
Mmh, right. And we rely too much on globals to have two MediaWiki instances running in the same PHP environment :(
I'm afraid that it will produce the opposite. A third party downloads an XML dump for offline use but it doesn't work because it needs a dozen templates from Meta (in the worst case, templates from a dozen other wikis).
My point is that ideally, you'd be able to copy-paste enwiki pages and then get the templates to work by configuring them to be fetched from enwiki. Even more ideally, you might want to fetch the enwiki templates as of the point in time your page was downloaded, in case the templates changed syntax (and also to allow indefinite caching).
They would need to prepend the interwiki prefix to all template invocations. That sounds like a case for an option to import templates from the foreign wiki the first time they are used, keeping them automatically updated as long as they are not modified locally (skipping the need for the interwiki prefix on the template, but then it would conflict with local templates).
But I guess that's much better handled by just using a proper export, and having the templates included in that, so never mind.
Yes. Perhaps they could have a Special:ImportFromRemote to do one-click imports.
On Tue, May 25, 2010 at 9:30 AM, Platonides wrote:
Infinite loops could only happen if both wikis can fetch from each other. A simple solution would be to pass along with the query who originally requested it. If the home wiki then calls a different wiki, it would blame the one that asked for it (or maybe build up a wiki + template path).
An even simpler solution would be to only set up one wiki to allow this kind of foreign template request, the way Commons is set up now. But that might be limiting.
That's how I'd deploy it. But the code should be robust enough to handle the infinite loops that Peter presents.
On Tue, May 25, 2010 at 5:50 PM, Platonides Platonides@gmail.com wrote:
But I guess that's much better handled by just using a proper export, and having the templates included in that, so never mind.
Yes. Perhaps they could have a Special:ImportFromRemote to do one-click imports.
And this is different from interwiki imports via the normal Special:Import how?
-Chad
On Tue, May 25, 2010 at 5:50 PM, Platonides Platonides@gmail.com wrote:
There are imho fewer variables set by the caller wiki, which could be passed with the query.
I don't get what you're saying here.
For an intra-Wikimedia query, they could directly ask an Apache; they could even send the query to localhost. Using the API seems like completely the right approach for remote users; it can later be refined to add more backends.
Anyway, I don't think the API request would be cacheable by the Squids, so it would be passed directly to an application server.
That's even worse. At least if it's cacheable, you have a *chance* of not hitting an Apache or the DB.
That's how I'd deploy it. But the code should be robust enough to handle the infinite loops that Peter presents.
I don't object to that, but I don't think it's essential.
On 2010-05-25 23:41, Peter17 wrote:
2010/5/25 Platonides Platonides@gmail.com:
It seems it doesn't work so well. It was inadvertently broken for wikitext transclusions when the interwiki prefix points to the nice URL. See the 'wgEnableScaryTranscluding and Templates/Images?' thread on mediawiki-l.
Well, in my tests, images are included fine because I enabled $wgUseInstantCommons. As I wrote, "the parameters are totally ignored": they are indeed not substituted.
I found it a little surprising that $wgUploadPath needed to be an absolute path for this to work. I had imagined that as part of the transclusion the img URLs would have been transformed into the necessary remote wiki URL.
2010/5/26 Jim Tittsler jt@onnz.net:
I found it a little surprising that $wgUploadPath needed to be an absolute path for this to work. I had imagined that as part of the transclusion the img URLs would have been transformed into the necessary remote wiki URL.
I didn't set $wgUploadPath, just $wgUseInstantCommons = true; the image URLs are actually transformed into remote URLs:
I work on my own local wiki, whose address is http://localhost/mediawiki/, and transcluding {{mediawikiwiki::User:Peter17}}, which contains [[File:Exquisite-network.png]], produces: <a href="http://www.mediawiki.org/wiki/File:Exquisite-network.png" class="image"><img alt="Exquisite-network.png" src="http://upload.wikimedia.org/wikipedia/commons/e/e1/Exquisite-network.png" width="128" height="128" /></a>, so it actually points to the MediaWiki.org image description page and the Commons image.
@Peter: here is a recent thread on the mediawiki-api list about the API and sections: http://lists.wikimedia.org/pipermail/mediawiki-api/2010-May/subject.html
There is no mention of the labelled sections used by the #lst extension... :-( ... but remember ThomasV's name as a reference.
Alex
Peter17 wrote:
I didn't set $wgUploadPath, just $wgUseInstantCommons = true; the image URLs are actually transformed into remote URLs:
I work on my own local wiki, whose address is http://localhost/mediawiki/, and transcluding {{mediawikiwiki::User:Peter17}}, which contains [[File:Exquisite-network.png]], produces: <a href="http://www.mediawiki.org/wiki/File:Exquisite-network.png" class="image"><img alt="Exquisite-network.png" src="http://upload.wikimedia.org/wikipedia/commons/e/e1/Exquisite-network.png" width="128" height="128" /></a>, so it actually points to the MediaWiki.org image description page and the Commons image.
I think his point is that the URLs will be wrong unless $wgUploadPath is a full URL (it is set as a full URL on WMF wikis).
I have updated my proposal with a fourth version [1]
I am still waiting for comments from Tim Starling. I have contacted him on IRC for this.
[1] http://www.mediawiki.org/wiki/User:Peter17/Reasonably_efficient_interwiki_tr...)
-- Peter Potrowl http://www.mediawiki.org/wiki/User:Peter17
* Roan Kattouw roan.kattouw@gmail.com [Tue, 25 May 2010 20:58:54 +0200]:
2010/5/25 Aryeh Gregor Simetrical+wikilist@gmail.com:
Having Wikimedia servers send HTTP requests to each other instead of just doing database queries does not sound like a great idea to me. You're hitting several extra servers for no reason, including extra requests to an application server. On top of that, you're caching stuff in the database which is already *in* the database! FileRepo does this the Right Way, and you should definitely look at how that works. It uses polymorphism to use the database if possible, else the API.
However, someone like Tim Starling should be consulted for a definitive performance assessment; don't rely on my word alone.
This is true if, indeed, all parsing is done on the distant wiki. However, if parsing is done on the home wiki, you're not simply requesting data that's ready-baked in the DB, so API calls make sense. I'm also not convinced this would be a huge performance problem because it'd only be done on parse (thanks to parser cache), but like you I trust Tim's verdict more than mine. Contrary to what Platonides suggested, you cannot use FauxRequest to do cross-wiki API requests.
To the point of whether parsing on the distant wiki makes more sense: I guess there are points to be made both ways. I originally subscribed to the idea of parsing on the home wiki so that expanding the same template with the same arguments would always result in the same (preprocessed) wikitext, but I do see how parsing on the local wiki would help for stuff like {{SITENAME}} and {{CONTENTLANG}}.
Having something like FarmRequest or FarmApi classes would be a great thing for wiki farms (I run a small one). It would probably also help to unify the remote vs. local farm code. Though, a Farm should probably become an object containing wiki configurations. Currently, farms are a bit "hackish". Dmitriy
Dmitriy Sintsov wrote:
Having something like FarmRequest or FarmApi classes would be a great thing for wiki farms (I run a small one). It would probably also help to unify the remote vs. local farm code. Though, a Farm should probably become an object containing wiki configurations. Currently, farms are a bit "hackish". Dmitriy
^_^ "hackish" isn't that bad in some sense. I'm currently experimenting with some farm code that works completely outside of MediaWiki rather than as a extension sitting inside of it. Using a sandbox it can get access to the MediaWiki install and extract info from it in a secure way which couldn't be extracted as easily from the api. The system works more like a MediaWiki virtual machine than a MediaWiki installation turned WikiFarm. The result is a farm free of mapping issues which can give MediaWiki hostees much more control over the installation then they could on a normal WikiFarm, including the ability for different wiki on the wiki farm to run completely different versions of MediaWiki and upgrade independently, and have control over their own list of installed extensions. ;) In fact this works using complete raw unmodified MediaWiki source code. I have a few "source" directories with MediaWiki source, they don't have any changes to them, and then end up being run in the VM thinking they are a complete installation modified with all the stuff they need to run. ^_^ Tricking MediaWiki into thinking it's a single installation sitting on it's own from the outside is definitely "hackish". In any case, Farm{Request,Api} is a nice and interesting idea.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
^_^ "hackish" isn't that bad in some sense. I'm currently experimenting with some farm code that works completely outside of MediaWiki rather than as a extension sitting inside of it. Using a sandbox it can get access to the MediaWiki install and extract info from it in a secure way which couldn't be extracted as easily from the api. The system works more like a MediaWiki virtual machine than a MediaWiki installation turned WikiFarm. The result is a farm free of mapping issues which can give MediaWiki hostees much more control over the installation then they could on a normal WikiFarm, including the ability for different wiki on the wiki farm to run completely different versions of MediaWiki and upgrade independently, and have control over their own list of installed extensions. ;) In fact this works using complete raw unmodified MediaWiki source code. I have a few "source" directories with MediaWiki source, they don't have any changes to them, and then end up being run in the VM thinking they are a complete installation modified with all the stuff they need to run. ^_^ Tricking MediaWiki into thinking it's a single installation sitting on it's own from the outside is definitely "hackish". In any case, Farm{Request,Api} is a nice and interesting idea.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
namespaces?
"Dmitriy Sintsov" questpc@rambler.ru wrote in message news:830714463.1275562997.168145444.10411@mcgi21.rambler.ru...
Having something like FarmRequest or FarmApi classes would be a great thing for wiki farms (I run a small one). It would probably also help to unify the remote vs. local farm code. Though, a Farm should probably become an object containing wiki configurations. Currently, farms are a bit "hackish". Dmitriy
One way to achieve this would be to develop the MediaWiki class to actually be what it originally promised: an object representing a wiki, of which there can in principle be more than one instantiated at any one time. Configuration options could determine how the MediaWiki object accesses data, and consequently what sub-entities it is able to produce.
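As a very rough sketch of the shape this could take (nothing like this exists yet; the method names are invented):

# Hypothetical: one MediaWiki object per wiki, each carrying its own
# configuration instead of relying on global state.
class MediaWiki {
    private $config;   # per-wiki settings, e.g. array( 'wgDBname' => 'enwiki', ... )
    private $out;      # this wiki's OutputPage, instead of the global $wgOut

    public function __construct( array $config ) {
        $this->config = $config;
        $this->out = new OutputPage();
    }

    public function getSetting( $name ) {
        return $this->config[$name];
    }

    public function getOut() {
        return $this->out;
    }
}

# Two wikis living side by side in one process:
$enwiki = new MediaWiki( array( 'wgDBname' => 'enwiki' ) );
$meta   = new MediaWiki( array( 'wgDBname' => 'metawiki' ) );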
--HM
Platonides wrote:
namespaces?
For the sandboxing? No, I wanted to use runkit but had issues installing it, so I ended up messing with PHP's horrid proc_open to sandbox it in another process that acts as the VM in case my system needs to extract info from the wiki (not for virtualizing the actual wiki; that is done in-process in a different, less wasteful way). I do have 5.3, but I'm not sure how I'd use PHP namespaces for that, especially without modifying MediaWiki. The only source modification I want to make at all is locally backporting any patch I commit to trunk to fix the issues with using special wiki configurations.
* Happy-melon happy-melon@live.com [Fri, 4 Jun 2010 00:33:30 +0100]:
One way to achieve this would be to develop the MediaWiki class to actually be what it originally promised: an object representing a wiki, of which there can in principle be more than one instantiated at any one time. Configuration options could determine how the MediaWiki object accesses data, and consequently what sub-entities it is able to produce.
The current MediaWiki class has some shortcomings. For example, when I tried to set up URL rendering my own way, without using mod_rewrite, I "cloned" and "refactored" index.php. The problem was with the following call:
# warning: although instances of OutputPage and others are passed,
# they are sometimes used as "fixed" wg* globals in other classes
# so you cannot pass a non-global here, or use the different names
# of passed instances
$MW->initialize( $wgTitle, $wgArticle, $wgOut, $wgUser, $wgRequest );
First, I made an instance of OutputPage with a variable name different from the default $wgOut, and the same for $wgArticle. The engine didn't work as expected: it was still looking for the default names here and there, so I was forced to use the default $wgOut and $wgArticle names. But then there is no real encapsulation, and there is no point in passing these as method parameters.
I'd imagine that an "emulated" request or API call through the local farm could be done really fast, while a real remote interwiki call would be done in the usual way (via the API). Dmitriy
"Dmitriy Sintsov" questpc@rambler.ru wrote in message news:1006208056.1275619880.71836632.61224@mcgi66.rambler.ru...
The current MediaWiki class has some shortcomings. [...] The engine didn't work as expected: it was still looking for the default names here and there, so I was forced to use the default $wgOut and $wgArticle names. But then there is no real encapsulation, and there is no point in passing these as method parameters.
Indeed; it does need a lot of work; doing it properly would probably deprecate all the state globals ($wg(Title|Parser|Article|Out|Request) etc); replacing them with member variables of the MediaWiki class. How other classes would access those variables is an interesting question; I could see an Article::getWiki()->getOut() chain, but that won't work for static functions. It would be a major overhaul, but would probably kill several birds with one stone.
--HM
* Happy-melon happy-melon@live.com [Fri, 4 Jun 2010 10:03:14 +0100]:
"Dmitriy Sintsov" questpc@rambler.ru wrote in message news:1006208056.1275619880.71836632.61224@mcgi66.rambler.ru...
- Happy-melon happy-melon@live.com [Fri, 4 Jun 2010 00:33:30
+0100]:
One way to achieve this would be to develop the MediaWiki class to actually be what it originally promised: an object representing a wiki, of
which
there can in principle be more than one instantiated at any one
time.
Configuration options could determine how the MediaWiki object
accesses
data, and consequently what sub-entities it is able to produce.
Current MediaWiki class has some shortcomings. For example, when
I've
tried to setup rendering urls in my very own way and not using mod_rewrite, I've "cloned" and "refactored" index.php. The problem
was
with the following call:
# warning: although instances of OutputPage and others are passed, # they are sometimes used as "fixed" wg* globals in other classes # so you cannot pass a non-global here, or use the different names # of passed instances $MW->initialize( $wgTitle, $wgArticle, $wgOut, $wgUser, $wgRequest
);
First, I've made an instance of OutputPage with variable name
different
from default $wgOut. And $wgArticle, too. The engine didn't work as expected, it still was looking for the default names here and there.
I
was forced to use default wgOut and wgArticle names. But, then,
there
is
no real incapsulation and there is no point to pass these as method parameters..
I'd imagine that "emulated" request or api through the local farm
can
be
done really fast, while real remote interwiki call would be done in usual way (api). Dmitriy
Indeed; it does need a lot of work; doing it properly would probably deprecate all the state globals ($wg(Title|Parser|Article|Out|Request) etc); replacing them with member variables of the MediaWiki class. How
other
classes would access those variables is an interesting question; I
could
see an Article::getWiki()->getOut() chain, but that won't work for static functions. It would be a major overhaul, but would probably kill several birds with one stone.
Hundreds of extensions would break :-( Compatibility is a huge burden. A cruder but simpler approach would be to save these globals in some context data structure and introduce a Farm->switch() method that saves and replaces all the globals. Much less of the core would have to be changed then. That is a bit more unreliable and risky, but the code is fragile anyway (from my experience, one typo can sometimes cause dreaded errors). Dmitriy
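P.S. Roughly what I mean by Farm->switch(), as a sketch (the Farm class and these methods are invented names, nothing of this exists):

class Farm {
    private $configs;          # wiki ID => array of global overrides ( 'wgDBname', 'wgSitename', ... )
    private $saved = array();  # globals of the wiki we switched away from

    public function __construct( array $configs ) {
        $this->configs = $configs;
    }

    # Swap the listed globals for those of another wiki in the farm.
    public function switchTo( $wikiID ) {
        foreach ( $this->configs[$wikiID] as $name => $value ) {
            $this->saved[$name] = isset( $GLOBALS[$name] ) ? $GLOBALS[$name] : null;
            $GLOBALS[$name] = $value;
        }
    }

    # Restore the globals saved by the last switchTo() call.
    public function restore() {
        foreach ( $this->saved as $name => $value ) {
            $GLOBALS[$name] = $value;
        }
        $this->saved = array();
    }
}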
--------------------------------------------------
From: "Dmitriy Sintsov" questpc@rambler.ru
Sent: Friday, June 04, 2010 11:01 AM
To: "Happy-melon" happy-melon@live.com; "Wikimedia developers" wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] Reasonably efficient interwiki transclusion
Hundreds of extensions would break :-( Compatibility is a huge burden. A cruder but simpler approach would be to save these globals in some context data structure and introduce a Farm->switch() method that saves and replaces all the globals. Much less of the core would have to be changed then. That is a bit more unreliable and risky, but the code is fragile anyway (from my experience, one typo can sometimes cause dreaded errors). Dmitriy
MW 2.0? :-D
You wouldn't need to remove the globals, at least immediately; you'd retain them as aliases for the relevant variables of the 'main' wiki; assuming that it makes sense to define one primary wiki, which it usually does.
--HM
Daniel Friesen wrote:
I wanted to use runkit but had issues installing it, so I ended up messing with PHP's horrid proc_open to sandbox it in another process that acts as the VM in case my system needs to extract info from the wiki (not for virtualizing the actual wiki; that is done in-process in a different, less wasteful way).
I was able to get runkit running and run MediaWiki inside it. Sara Golemon hasn't cared about it for years, but the patches are all at http://pecl.php.net/bugs. It should be quite easy to make it work on 5.2 (which was the latest version at the time).