Hi all!
What is the best way to control the load order of JS (or of modules in general)? We (WMDE) are developing WikiPraise, a gadget that can directly show who contributed which part of a given article revision, as shown to the user.
WikiPraise works by taking the wikitext annotated with authorship info (a "blame map", currently provided by the CollaborativeTrust project), rendering it to HTML, noting the offsets of the markers in the generated HTML (ignoring markup and whitespace), and storing them. When showing a page, the gadget fetches this authorship info and applies it to the HTML DOM of the content of the page.
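In case it helps to picture it, the client-side part boils down to something like the following (illustrative sketch only, not the actual gadget code; the blame map shape shown here is made up):

// Sketch: assume blameMap is a sorted list of { start, end, user } entries,
// with offsets counting visible characters only (markup and whitespace
// ignored, as described above).
function forEachTextNode( root, callback ) {
    var walker = document.createTreeWalker( root, NodeFilter.SHOW_TEXT, null, false );
    var node, offset = 0;
    while ( ( node = walker.nextNode() ) ) {
        // Count only non-whitespace characters, so the running offset
        // matches the offsets stored in the blame map.
        var len = node.nodeValue.replace( /\s/g, '' ).length;
        callback( node, offset, offset + len );
        offset += len;
    }
}

forEachTextNode( document.getElementById( 'mw-content-text' ), function ( node, from, to ) {
    // Find the blame entries overlapping [from, to) and wrap the matching
    // parts of this text node in annotated <span> elements.
} );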
Thomas Schmidt (User:NetAction), who is developing WikiPraise for us, ran into a problem with other user scripts manipulating the DOM. Since we modify the DOM based on html offsets, any prior modification of the DOM will break the output. Thus, WikiPraise must run before any other JS that may modify the DOM.
So, what would be the best way to achieve that? Perhaps we could write an extension that does nothing but cause the WikiPraise JS code to be loaded early enough?
We could of course do the entire wikitext-to-HTML transform on every page view, instead of trying to annotate the DOM based on a pre-generated, offset-based blame map. But that would a) also have to happen before any other user script runs and b) be prohibitively slow (try the WikiTrust plugin for Firefox to see what I mean).
Note that we plan to have this enabled by default, even for anons. The idea is to provide an easy way for fully compliant re-use that cites all authors of a given section of an article, and also to let readers get a better idea of who wrote what and what can be trusted.
Anyway: rendering the content based on the annotated wikitext on every page view would likely melt the servers, so it's not an option. We need to be able to apply some sort of pre-generated blame map to the DOM.
Or, of course, we could hook into the parser and do all this inside MediaWiki. We'd need to call out to fetch or generate the blame map when parsing. If we had a high-performance blame implementation that could be tightly integrated with MediaWiki, this would work. I would prefer that, and we might work towards it, but it would be nice to have a low-impact implementation first. And the current approach works pretty well, except for other user scripts interfering.
To try WikiPraise, put this into your common.js:
$.holdReady(true); mediaWiki.loader.load("https://toolserver.org/~netaction/wikitrust.js");
Note that this will disable all gadgets and custom scripts: $.holdReady(true) is a hack to prevent other user scripts from running. That sucks of course.
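For context, the overall flow looks roughly like this (sketch only; fetchBlameMap() and annotateDom() are placeholders, not the actual functions in wikitrust.js):

// In common.js: keep jQuery from firing ready handlers, then load the gadget.
$.holdReady( true );
mediaWiki.loader.load( "https://toolserver.org/~netaction/wikitrust.js" );

// Inside the gadget, roughly: work on the still-unmodified DOM, then release.
fetchBlameMap().done( function ( blameMap ) {
    annotateDom( blameMap );  // placeholder: apply the stored offsets to the clean DOM
    $.holdReady( false );     // queued ready handlers (gadgets, user scripts) run now
} );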
Any ideas?
-- daniel
On Wed, Jan 11, 2012 at 12:19 PM, Daniel Kinzler daniel@brightbyte.de wrote:
> $.holdReady(true); mediaWiki.loader.load("https://toolserver.org/~netaction/wikitrust.js");
> Note that this will disable all gadgets and custom scripts: $.holdReady(true) is a hack to prevent other user scripts from running. That sucks of course.
I think holdReady is probably the most reliable way to prevent scripts from messing with your DOM, although I defer to Krinkle for a more authoritative answer. Writing an extension allows you to control where WikiPraise's <script> tag will be put, but I'm not sure how much that'll help you.
Do you need the clean DOM just for reading, or for writing as well? If you only need a clean DOM for reading and can write even to a dirty DOM, there are some alternatives:
* hold the ready "lock" only for cloning the DOM, then release it and do your processing while other scripts run (see the sketch below)
* use AJAX to fetch the HTML source of the page, and work with that
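For the first alternative, I'm imagining something like this (untested sketch; it listens for DOMContentLoaded directly, because jQuery's own ready handlers are being held):

$.holdReady( true );
document.addEventListener( 'DOMContentLoaded', function () {
    var content = document.getElementById( 'mw-content-text' );
    var pristine = content.cloneNode( true ); // snapshot before other gadgets touch the DOM
    $.holdReady( false );                     // release; other ready handlers run as usual

    // ... later, compute the offset-based annotations against "pristine" and
    // write them back into the live DOM, which by then may have been modified.
}, false );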
It would be really nice if your blame engine didn't rely on character offsets in the HTML, but used something more robust. As you said, the preferred implementation would be something that's close to the parser and puts extra annotations (like <span> tags) in the parser-generated HTML (the InlineEditor extension does this).

Server-side blaming doesn't have to be expensive as long as you use an incremental blame-as-you-go implementation where you store a blame map for each revision, and after each edit you use the edit diff and the previous revision's blame map to generate the new revision's blame map. This should be a fairly cheap edit-time operation. You would need to generate blame maps for all old revisions, though; that could be done offline on a few high-performance boxes before the feature is even enabled.
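As a toy illustration of the incremental idea (it ignores moved paragraphs and reverts, and the data shapes are invented): treat the blame map as an array with one attribution per token, and the diff as a list of keep/delete/insert hunks against the previous revision's token stream.

function nextBlameMap( prevBlameMap, diff, newEditor ) {
    var result = [], i = 0; // i indexes into prevBlameMap
    diff.forEach( function ( hunk ) {
        if ( hunk.op === 'keep' ) {
            // Unchanged tokens keep their previous attribution.
            result = result.concat( prevBlameMap.slice( i, i + hunk.length ) );
            i += hunk.length;
        } else if ( hunk.op === 'delete' ) {
            // Deleted tokens simply drop out of the map.
            i += hunk.length;
        } else { // 'insert'
            // New tokens are attributed to the editor of the new revision.
            for ( var k = 0; k < hunk.length; k++ ) {
                result.push( newEditor );
            }
        }
    } );
    return result;
}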
Roan
Hi Roan&Daniel!
> Do you need the clean DOM just for reading, or for writing as well?
Read and write. WikiTrust does it very fast before the other scripts run. Then it releases the ready lock for all other scripts and adds its user interface.
> use AJAX to fetch the HTML source of the page, and work with that
I tried that. It doubled the traffic and broke most of the other scripts: Toggle TableOfContents, Maps, and so on.
> It would be really nice if your blame engine didn't rely on character offsets in the HTML, but used something more robust.
I have not seen any errors because of this so far. As long as we know what MediaWiki does, we can adapt to that.
> As you said, the preferred implementation would be something that's close to the parser and puts extra annotations (like <span> tags) in the parser-generated HTML
You are talking about up to several megabytes per page.
> Server-side blaming doesn't have to be expensive as long as you use an incremental blame-as-you-go implementation where you store a blame map for each revision, and after each edit you use the edit diff and the previous revision's blame map to generate the new revision's blame map.
This is what CollaborativeTrust already does. Unfortunately, it does not do it well.
Thomas
On 11.01.2012 14:12, Thomas Schmidt wrote:
Roan wrote:
> I think holdReady is probably the most reliable way to prevent scripts from messing with your DOM, although I defer to Krinkle for a more authoritative answer.
Yes, unless they all use that :)
My impression was that it breaks quite a few user scripts. But maybe I'm mistaken? I haven't tested the latest version extensively.
If holdReady works well enough with other scripts, it's a good enough hack for now, I think.
>> It would be really nice if your blame engine didn't rely on character offsets in the HTML, but used something more robust.
> I have not seen any errors because of this so far. As long as we know what MediaWiki does, we can adapt to that.
I would love to use something more robust, yes, but it also has to be compact and fast. I can't think of anything offhand. Maybe
>> As you said, the preferred implementation would be something that's close to the parser and puts extra annotations (like <span> tags) in the parser-generated HTML
> You are talking about up to several megabytes per page.
That's not the main problem. The main problem is that we are doing this on the outside. So, we would have a pre-calculated DOM with our annotations, and the DOM present on the page without our annotations, but possibly modified by scripts, and would have to merge the two. That only makes things worse.
>> Server-side blaming doesn't have to be expensive as long as you use an incremental blame-as-you-go implementation where you store a blame map for each revision, and after each edit you use the edit diff and the previous revision's blame map to generate the new revision's blame map.
This would be my preferred solution, but it's way beyond the scope of the current project. The idea was to have a standalone script that works well enough to show that this kind of thing is indeed useful.
If we have the foundation's support for developing full blame/praise support in MediaWiki, I even know who would be not only delighted but also qualified to write it (not for free, though). But with Wikidata coming up, I doubt WMDE would manage the project. Though I personally would love to see this happening ASAP.
In fact, I hope that some experience with the gadget-based solution will convince the foundation that yes, we want that, we need that.
Hm, actually... fellowships aren't supposed to be for development stuff, are they?
> This is what CollaborativeTrust already does. Unfortunately, it does not do it well.
Well, the blame map they generate is better than any I have seen so far; they deal nicely with moved paragraphs, reverts, etc. But the "trust" part is massive overhead, and it's OCaml. OTOH, integrating this into MediaWiki directly as an extension was their original approach, and some code already exists.
Note that storing the blame maps for all revisions needs quite a bit of space. And yes, we need all revisions.
cheers daniel
2012/1/11 Daniel Kinzler daniel@brightbyte.de:
> The main problem is that we are doing this on the outside.
That would be a really great thing. When the user clicks a word, we have to find out where they clicked. The data goes back to the server. The server finds out which scripts were used and generates the same HTML that is currently in the browser. Yes, it has to do something like executing the scripts. Then the server knows the revision of the clicked position.
The server sends rules back to the browser for how to highlight the text. I don't think it is possible with less traffic and less memory consumption in the browser. But all the small DOM manipulations for inserting just the needed SPAN elements will be slower than the existing user script, which manipulates the whole page in one innerHTML call.
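By "one innerHTML call" I mean roughly this (simplified; buildAnnotatedHtml() is a placeholder):

var content = document.getElementById( 'mw-content-text' );
// Build the fully annotated markup off-DOM as one big string ...
var annotated = buildAnnotatedHtml( content.innerHTML, blameMap ); // placeholder helper
// ... and write it back in a single assignment, instead of inserting
// every <span> with a separate DOM operation.
content.innerHTML = annotated;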
Yes, we could do all this on the client side with the help of the WikiTrust Sure sequences too. It would be a hard job to imitate all scripts in the hidden HTML, but it is possible. But: is there any advantage over the current user script?
Thomas
----- Original Message -----
From: "Thomas Schmidt" schmidt@netaction.de
>> As you said, the preferred implementation would be something that's close to the parser and puts extra annotations (like <span> tags) in the parser-generated HTML
> You are talking about up to several megabytes per page.
It was my snap reaction as well that putting span markers in the HTML at blame edges wasn't gonna scale very well... and could be pathological.
Cheers, -- jra