Thanks Marcin for the response.
I've provided comments and questions inline, where I have them.
On Nov 1, 2013, at 6:51 PM, Marcin Cieslak saper@saper.info wrote:
Shawn Jones sjone@cs.odu.edu wrote:
- The Memento protocol has a resource called a TimeMap [1] that takes an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
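For readers unfamiliar with the format, here is a rough sketch of how a TimeMap body in application/link-format might be assembled. It is written in Python purely for illustration (the extension itself is PHP); the function name build_timemap, the URI layout, and the (uri, datetime) input shape are assumptions and not the extension's actual code.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def build_timemap(original_uri, timemap_uri, mementos):
    """Return an application/link-format body listing the given mementos.

    `mementos` is an iterable of (uri, datetime) pairs; the datetimes must
    be timezone-aware UTC values. All names here are hypothetical.
    """
    entries = [
        f'<{original_uri}>;rel="original"',
        f'<{timemap_uri}>;rel="self";type="application/link-format"',
    ]
    # List mementos oldest first so a client can walk the history in order.
    for uri, when in sorted(mementos, key=lambda m: m[1]):
        # Memento datetimes use the HTTP-date form, expressed in GMT.
        stamp = format_datetime(when, usegmt=True)
        entries.append(f'<{uri}>;rel="memento";datetime="{stamp}"')
    return ",\n".join(entries)
```

Each entry is one link value, so a client only needs to split on the separators and read the rel and datetime attributes.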
It just occurred to me that if TimeMap were a microformat, this information could be embedded into ?title=Article_Name&action=history itself.
Even then, if we need an additional MIME type for that, maybe we could vary the action=history response based on the desired MIME type (text/html or application/link-format).
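A minimal sketch of that idea, in Python for illustration only: choose the representation from the request's Accept header. The function name pick_history_format is hypothetical, and the parsing is deliberately simplified (it ignores q-values and wildcards), so treat it as an outline rather than a proper content-negotiation implementation.

```python
def pick_history_format(accept_header):
    """Pick a MIME type for the action=history response.

    Simplified on purpose: q-values and wildcards are ignored, and
    text/html is the fallback for anything unrecognised.
    """
    offered = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/link-format" in offered:
        return "application/link-format"
    return "text/html"
```

A server doing this would presumably also need to send "Vary: Accept" so that caches keep the two representations apart.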
It would be excellent to have it available as a Microformat. We had not considered it.
The way the Memento framework operates, these TimeMaps are directly accessible resources (e.g. GET http://example/TimeMap) and no additional processing is performed to extract them.
I'm glad you brought up action=history. One of the ideas we had discussed was actually varying action=history with an additional set of arguments to produce the TimeMap.
We were concerned with what best fit into MediaWiki's future plans/goals/philosophy.
- In order to create the correct headers for use with the Memento protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations where it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
We have wfExpandUrl (yes, there are currently some bugs wrt empty $wgServer... https://bugzilla.wikimedia.org/show_bug.cgi?id=54950).
Actually, looking at our code, we already use wfExpandUrl in most places, and can likely use it on the few remaining lines that access $wgServer directly. My longer response to Brian Wolff now seems unnecessary.
Now that I'm looking at the docs, they state "Assumes $wgServer is correct."
If the local installation munges $wgServer in some way, and we're not using it directly, then I guess it's their responsibility to deal with the fallout?
Can I count it good if I just move our remaining lines to use wfExpandUrl?
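To make concrete what I mean by expanding a URL against the server setting, here is a small Python analogy. It is not how wfExpandUrl is implemented; expand_url and its default_scheme parameter are hypothetical names, and the protocol-relative handling is an assumption about the kind of case such a helper has to cover.

```python
from urllib.parse import urljoin


def expand_url(server, path, default_scheme="http"):
    """Turn a site-relative path into an absolute URI.

    `server` plays the role of $wgServer. A protocol-relative value such
    as "//www.example.com" is given `default_scheme` first.
    """
    if server.startswith("//"):
        server = f"{default_scheme}:{server}"
    # urljoin resolves `path` against the server's scheme and host.
    return urljoin(server, path)
```

If the server value is empty or wrong, the result is only as good as the input, which is the same caveat the docs give.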
- Is there a way to get previous revisions of embedded content, like images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
I'm not in a position to give you a full answer, but what I would do is set up a MediaWiki with $wgUseInstantCommons = true and see how I can make ForeignAPIRepo fetch older revisions from Wikimedia via the API. Then we can have a look at other media storage backends, including those used by the WMF installation.
I'll look into this.
- Some sites don't wish to have their past Talk/Discussion pages accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via a configurable option. By default it excludes nothing. What namespaces should be excluded by default?
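As a rough picture of the exclusion option, sketched in Python rather than the extension's PHP: the check amounts to a set-membership test on the page's namespace ID. The constants below use MediaWiki's standard namespace numbers; the function name is_memento_enabled and the excluded_namespaces parameter are hypothetical.

```python
# Standard MediaWiki namespace IDs.
NS_MAIN, NS_TALK, NS_TEMPLATE, NS_CATEGORY = 0, 1, 10, 14


def is_memento_enabled(namespace_id, excluded_namespaces=frozenset()):
    """Return True if pages in this namespace should get Memento headers.

    The default excludes nothing, matching the current behaviour.
    """
    return namespace_id not in excluded_namespaces
```

A site that wants to keep Talk history out of Memento would pass something like {NS_TALK} as the excluded set.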
There might be interesting issues about deleted content: some people feel very strongly about making it unavailable to others (partly due to some legal issues), while some people set up wikis dedicated to providing content deleted from Wikipedia. Are you sure history should not be redacted at times? :-)
Not sure why somebody would not like archiving Talk pages like this, but I think this feature could be enabled per-namespace like many others in MediaWiki. Archiving media and files will certainly be different, and you will run into interesting issues with versioning Categories and Templates. Extension:FlaggedRevs has some method to track what kind of ancillary content has been modified (FRInclusionManager.php and FRInclusionCache.php might be things to look at).
I'll look into this.
And a question back to you:
How are you going to handle versioning of stuff like MediaWiki:Common.js, MediaWiki:Common.css independently of the proper content itself? Some changes might affect presentation of the content meaningfully, for example see how https://en.wikipedia.org/wiki/Template:Nts works.
We had not considered Common.js and Common.css yet. Our first goal was to get previous page content loaded, then move on to include previous templates. Now we're looking at images.
I see that MediaWiki:Common.css and MediaWiki:Common.js DO have revision histories, which, in theory, means that we can somehow serve up old content. Any ideas on how to access them?
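One avenue we might try, untested and assuming these pages are served by the standard revisions API like any other page: ask for the newest revision at or before a given datetime. The Python sketch below only builds the request URL; the api.php location on example.com and the function name revision_query_url are placeholders.

```python
from urllib.parse import urlencode


def revision_query_url(api_url, title, timestamp):
    """Build a query for the latest revision of `title` at or before
    `timestamp` (ISO 8601, e.g. "2013-11-01T18:51:00Z").
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvstart": timestamp,   # start listing from this moment...
        "rvdir": "older",       # ...and walk back in time
        "rvlimit": 1,           # only the closest earlier revision
        "rvprop": "ids|timestamp",
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"
```

The revision ID that comes back could then be used to fetch the old stylesheet or script content.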
Thanks for pointing this out!
If you don't know already, PediaPress developed a generator of static documents out of wiki content (http://code.pediapress.com/, see Extension:Collection) and they had to deal with lots of similar issues in their renderer, mwlib. The renderer accesses the wiki as a client and fetches all ancillary content as needed.
We'll have to look at PediaPress.
I appreciate the input,
--Shawn