Hi,
I'm currently working on the Memento Extension for Mediawiki, as announced earlier today by Herbert Van de Sompel.
The goal of this extension is to work with the Memento framework, which attempts to display web pages as they appeared at a given date and time in the past.
Our goal is for this to be a collaborative effort focusing on solving issues and providing functionality in "the Wikimedia Way" as much as possible.
Without further ado, I have the following technical questions (I apologize in advance for the fire hose):
1. The Memento protocol has a resource called a TimeMap [1] that takes an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
2. We currently make several database calls using the select method of the Database object. After some research, we realized that MediaWiki provides some functions that do what we need without making these database calls directly. One of these needs is to acquire the oldid and timestamp of the first revision of a page, which can be done using the Title->getFirstRevision()->getId() and Title->getFirstRevision()->getTimestamp() methods. Is there a way to get the latest ID and latest timestamp? I see I can do Title->getLatestRevID() to get the latest revision ID; what is the best way to get the latest timestamp?
3. In order to create the correct headers for use with the Memento protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
4. We use exceptions to indicate when showErrorPage should be run; should the hooks that catch these exceptions and then run showErrorPage also return false?
5. Is there a way to get previous revisions of embedded content, like images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
6. Are there any additional coding standards we should be following besides those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - Mediawiki" pages?
7. We have two styles for serving pages back to the user:
* 302-style[2], which uses a 302 redirect to tell the user's browser to go fetch the old revision of the page (e.g. http://www.example.com/index.php?title=Article&oldid=12345)
* 200-style[3], which actually modifies the page content in place so that it resembles the old revision of the page
Which of these styles is preferable as a default?
8. Some sites don't wish to have their past Talk/Discussion pages accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via configurable option. By default it excludes nothing. What namespaces should be excluded by default?
Thanks in advance for any advice, assistance, further discussion, and criticism on these and other topics.
Shawn M. Jones Graduate Research Assistant Department of Computer Science Old Dominion University
[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6 [2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1 [3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
Hi, I responded inline.
On 11/1/13, Shawn Jones sjone@cs.odu.edu wrote:
Hi,
I'm currently working on the Memento Extension for Mediawiki, as announced earlier today by Herbert Van de Sompel.
The goal of this extension is to work with the Memento framework, which attempts to display web pages as they appeared at a given date and time in the past.
Our goal is for this to be a collaborative effort focusing on solving issues and providing functionality in "the Wikimedia Way" as much as possible.
Without further ado, I have the following technical questions (I apologize in advance for the fire hose):
- The Memento protocol has a resource called a TimeMap [1] that takes an
article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
Special Page vs Action is usually considered equally ok for this sort of thing. However, creating an API module would probably be the preferred method to return such machine-readable data about a page.
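Whichever entry point wins, the body itself is cheap to produce: a TimeMap is just a comma-separated list of typed links. A minimal sketch in plain PHP (the URIs, rel values, and array shape are illustrative, not the extension's actual code; real TimeMaps also carry "timegate"/"timemap" links and from/until attributes):

```php
<?php
// Sketch: serialize revision data into an application/link-format
// TimeMap body. Each memento entry carries its URI and an RFC 1123
// datetime, per the Memento I-D.
function timeMapBody( $originalUri, array $mementos ) {
	$links = array( "<$originalUri>; rel=\"original latest-version\"" );
	foreach ( $mementos as $m ) {
		// each $m: array( 'uri' => string, 'datetime' => RFC 1123 date )
		$links[] = "<{$m['uri']}>; rel=\"memento\"; datetime=\"{$m['datetime']}\"";
	}
	return implode( ",\n", $links );
}
```

Serving it is then just a matter of setting Content-Type: application/link-format and echoing the string.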
- We currently make several database calls using the select method of
the Database Object. After some research, we realized that Mediawiki provides some functions that do what we need without making these database calls directly. One of these needs is to acquire the oldid and timestamp of the first revision of a page, which can be done using Title->getFirstRevision()->getId() and Title->getFirstRevision()->getTimestamp() methods. Is there a way to get the latest ID and latest timestamp? I see I can do Title->getLatestRevID() to get the latest revision ID; what is the best way to get the latest timestamp?
Use existing wrapper functions around DB calls where you can, but if you need to, it's OK to query the db directly.
For the last part, probably something along the lines of WikiPage::factory( $titleObj )->getRevision()->getTimestamp()
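Putting both halves together, the lookups sketched out (MediaWiki ~1.21-era classes; this only runs inside a wiki request context, so read it as a sketch rather than standalone code):

```php
// First revision: id and timestamp straight off Title
$title = Title::newFromText( 'Article_Name' );
$firstRev = $title->getFirstRevision();
$firstId = $firstRev->getId();
$firstTimestamp = $firstRev->getTimestamp();

// Latest revision: go through WikiPage, as suggested above
$latestRev = WikiPage::factory( $title )->getRevision();
$latestId = $latestRev->getId();
$latestTimestamp = $latestRev->getTimestamp();
```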
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
$wgServer is always filled out (Setup.php sets it if the user doesn't). However you probably shouldn't be using it directly. Which method is most appropriate depends on what sort of urls you want, but generally the Title class has methods like getFullURL for this sort of thing.
- We use exceptions to indicate when showErrorPage should be run; should
the hooks that catch these exceptions and then run showErrorPage also return false?
I haven't looked at your code, so not sure about the context - but: in general a hook returns true to denote no further processing should take place. Displaying an error message sounds like a good criterion to return true. That said, things may depend on the hook and what precisely you're doing.
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
FlaggedRevisions manages to set old versions of images, so it's possible. I think you might want to do something with the BeforeParserFetchFileAndTitle hook as well. For the time parameter, make sure the function you're using has the $time parameter marked as pass by reference. Also note: the time parameter is the timestamp at which the image version was created; it does not mean "get whatever image would be relevant at the time specified" (I believe).
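A sketch of what wiring that up might look like (the BeforeParserFetchFileAndTitle signature is from the MW 1.21-era hook docs; the hard-coded timestamp and handler name are illustrative):

```php
public static function onBeforeParserFetchFileAndTitle(
	$parser, Title $nt, array &$options, &$descQuery
) {
	// Pin embedded files to the version uploaded at this exact
	// MW timestamp -- per the note above, this names a specific
	// file version rather than "whatever was current then".
	$options['time'] = '20130720011113';
	return true;
}
```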
- Are there any additional coding standards we should be following besides
those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - Mediawiki" pages?
Those are the important ones. As a rule of thumb, try to make your code look like it fits in with the rest of MediaWiki.
- We have two styles for serving pages back to the user:
- 302-style[2], which uses a 302 redirect to tell the user's browser to go fetch the old revision of the page (e.g. http://www.example.com/index.php?title=Article&oldid=12345)
- 200-style[3], which actually modifies the page content in place so that it resembles the old revision of the page
Which of these styles is preferable as a default?
First reaction would be that the 302 is better, as it more clearly indicates you're viewing an old page, and people could copy and paste the url in order to see the exact same version. It also seems better to have different urls for different objects (caching and all). [That's just a first reaction, I haven't thought about it deeply]
- Some sites don't wish to have their past Talk/Discussion pages
accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via configurable option. By default it excludes nothing. What namespaces should be excluded by default?
That's going to be a political issue that varies by project, probably. As a first approximation maybe default only to things in $wgContentNamespaces.
Thanks in advance for any advice, assistance, further discussion, and criticism on these and other topics.
Shawn M. Jones Graduate Research Assistant Department of Computer Science Old Dominion University
[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6 [2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1 [3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2 _______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Good luck developing your extension. Last of all, I don't want to sound negative, but please keep in mind that if your goal is deployment on Wikipedia, that is not just a technical issue but also a political one, and a goal that is rather hard to accomplish...
Cheers, Brian
On Fri, Nov 1, 2013 at 3:43 PM, Brian Wolff bawolff@gmail.com wrote:
Special Page vs Action is usually considered equally ok for this sort of thing. However creating an api module would probably be the preferred method to return such machine readable data about a page.
I disagree, but maybe that's because it's been a long-term goal of mine to kill action urls entirely.
-Chad
Thank you all very much for your timely responses.
I'll be reviewing them today and will probably have more questions as time goes on.
You've given us a lot to consider and discuss.
--Shawn
Thanks Brian, this is all good stuff.
To avoid text overload, I, too, have responded inline where I have more comments and questions.
- We currently make several database calls using the select method of
the Database Object. After some research, we realized that Mediawiki provides some functions that do what we need without making these database calls directly. One of these needs is to acquire the oldid and timestamp of the first revision of a page, which can be done using Title->getFirstRevision()->getId() and Title->getFirstRevision()->getTimestamp() methods. Is there a way to get the latest ID and latest timestamp? I see I can do Title->getLatestRevID() to get the latest revision ID; what is the best way to get the latest timestamp?
Use existing wrapper functions around DB calls where you can, but if you need to, it's OK to query the db directly.
For the last part, probably something along the lines of WikiPage::factory( $titleObj )->getRevision()->getTimestamp()
That enormous sound you heard was my palm hitting my forehead. Thanks for pointing that one out for me.
We'll be replacing our getFirstMemento and getLastMemento functions soon now that we have Mediawiki-esque solutions for them.
There are other instances in which we access the database:
* <= given Timestamp (this is what gets the old revision of the page)
* Time Map data (fetch the id and timestamp of the last 500 revisions)
I doubt there is something built into Mediawiki that already provides that capability. If there is, please advise. :)
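For reference, the "<= given timestamp" selection is essentially the following (a self-contained sketch with made-up data; the real code would issue a single descending-order, limit-1 query against the revision table):

```php
<?php
// Sketch: pick the newest revision at or before a target timestamp.
// $revisions is ordered oldest-first; 'ts' is a 14-digit MediaWiki
// timestamp, so plain string comparison orders correctly.
function selectMemento( array $revisions, $targetTs ) {
	$best = null;
	foreach ( $revisions as $rev ) {
		if ( strcmp( $rev['ts'], $targetTs ) <= 0 ) {
			$best = $rev; // later entries are newer; keep the last match
		} else {
			break;
		}
	}
	return $best; // null => the page did not exist yet at $targetTs
}
```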
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
$wgServer is always filled out (Setup.php sets it if the user doesn't). However you probably shouldn't be using it directly. Which method is most appropriate depends on what sort of urls you want, but generally the Title class has methods like getFullURL for this sort of thing.
That makes me feel a little bit better about our dependencies.
Since our rewrite, we only use $wgServer (via abstraction) in two places now, and they both involve the TimeMap SpecialPage.
We actually have 3 different types of TimeMaps in the Memento Mediawiki Extension:
1. full (starter) - shows the latest 500 revisions
2. pivot descending - shows the last 500 (or less) revisions prior to a given timestamp pivot
3. pivot ascending - shows the next 500 (or less) revisions after a given timestamp pivot
The pivot ascending and pivot descending TimeMaps are what use the $wgServer URI.
They take the form of http://example.com/index.php/Special:TimeMap/20130720011113/1/Article for ascending and http://example.com/index.php/Special:TimeMap/20130720011113/-1/Page for descending.
The $wgServer variable is used (as $this->mwbaseurl) to construct the URIs like so:
$timeMapPage['uri'] = $this->mwbaseurl . '/' . SpecialPage::getTitleFor('TimeMap') . '/' . $pivotTimestamp . '/-1/' . $title;
A similar statement exists for a pivot ascending TimeMap elsewhere in the code.
I've been trying to find a way to eliminate the use of $wgServer altogether, but need to construct these URIs for headers, TimeMap entries, etc.
Is there a better way?
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
FlaggedRevisions manages to set old versions of images, so it's possible. I think you might want to do something with the BeforeParserFetchFileAndTitle hook as well. For the time parameter, make sure the function you're using has the $time parameter marked as pass by reference. Also note: the time parameter is the timestamp at which the image version was created; it does not mean "get whatever image would be relevant at the time specified" (I believe).
I'll have to experiment with that and get back.
Thanks again,
--Shawn
On 2013-11-02 11:53 AM, Shawn Jones wrote:
$timeMapPage['uri'] = SpecialPage::getTitleFor( 'TimeMap', $pivotTimestamp . '/-1/' . $title )->get{???}URL();
{???} will be Full, Local, or Canonical depending on where you're outputting it.
* href="" -> Local
* Somewhere used on other domains -> Full (does output protocol-relative)
* Print and email -> Canonical
* HTTP Headers -> Local + wfExpandUrl( , PROTO_CURRENT ); unless you use OutputPage::redirect, in which case you can simply use Local, as url expansion is taken care of for you.
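Concretely, the HTTP-header case might look like this (MediaWiki sketch; $out is the OutputPage, the other names are from upthread):

```php
// Build the pivot-descending TimeMap URI and emit it as a Link header.
$timeMapTitle = SpecialPage::getTitleFor(
	'TimeMap', $pivotTimestamp . '/-1/' . $title
);
$uri = wfExpandUrl( $timeMapTitle->getLocalURL(), PROTO_CURRENT );
$out->getRequest()->response()->header(
	"Link: <$uri>; rel=\"timemap\"; type=\"application/link-format\""
);
```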
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
This worked beautifully.
Thanks Daniel Friesen,
--Shawn
On Fri, Nov 1, 2013 at 6:43 PM, Brian Wolff bawolff@gmail.com wrote:
I haven't looked at your code, so not sure about the context - but: In general a hook returns true to denote no further processing should take place.
If we're talking about wfRunHooks hooks, the usual case is that they return *false* to indicate no further processing, and true means to *continue* processing.
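So for the showErrorPage question upthread, a handler shaped like this (the hook choice, helper, and exception class are all illustrative, not the extension's actual code) would return false to stop processing:

```php
public static function onArticleViewHeader( &$article, &$outputDone, &$pcache ) {
	try {
		self::negotiateMemento( $article ); // hypothetical helper
	} catch ( MementoException $e ) { // hypothetical exception class
		$article->getContext()->getOutput()->showErrorPage(
			'internalerror', $e->getMessageKey() // hypothetical accessor
		);
		return false; // false = stop further processing of this hook
	}
	return true; // true = continue processing as normal
}
```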
-- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
D'oh. You are of course correct. Sorry for the mistake.
As an aside, perhaps we should introduce constants for this. It's easy to mix up the two values.
-bawolff
Thanks Brian,
Defaulting to only allow $wgContentNamespaces, or more specifically, MWNamespace::getContentNamespaces(), worked great.
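For anyone following along, that default boils down to a guard like this in our hooks (a sketch; MWNamespace::getContentNamespaces() is the real method, the surrounding names are illustrative):

```php
// Skip Memento processing outside content namespaces by default.
if ( !in_array( $title->getNamespace(), MWNamespace::getContentNamespaces() ) ) {
	return true; // let MediaWiki serve the page normally
}
```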
--Shawn
Shawn Jones sjone@cs.odu.edu wrote:
- The Memento protocol has a resource called a TimeMap [1]
that takes an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
It just occurred to me that if TimeMap were a microformat, this information could be embedded into ?title=Article_Name&action=history itself.
Even then, if we need an additional MIME type for that, maybe we could vary the action=history response based on the desired MIME type (text/html or application/link-format).
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
We have wfExpandUrl (yes, there are some bugs currently wrt empty $wgServer now... https://bugzilla.wikimedia.org/show_bug.cgi?id=54950).
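As a hedged sketch of that approach (the article name and parameters are placeholders), an absolute memento URI can be assembled from wfScript(), wfAppendQuery(), and wfExpandUrl() without reading $wgServer directly:

```php
<?php
// Sketch: building an absolute URI for a Memento header using MediaWiki's
// global URL helpers rather than $wgServer. wfScript(), wfAppendQuery()
// and wfExpandUrl() are real global functions; the values are illustrative.

// Relative URL such as "/w/index.php?title=Article&oldid=12345"
$relative = wfAppendQuery(
	wfScript( 'index' ),
	array( 'title' => 'Article', 'oldid' => 12345 )
);

// Expand against the wiki's configured server; PROTO_CURRENT keeps the
// protocol of the current request (http vs. https).
$absolute = wfExpandUrl( $relative, PROTO_CURRENT );
```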
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
I'm not in a position to give you a full answer, but what I would do is try to set up a MediaWiki with $wgInstantCommons = true and see how I could make ForeignAPIRepo fetch older revisions from Wikimedia via the API. Then we could have a look at other media storage backends, including those used by the WMF installation.
- We have two styles for serving pages back to the user:
- 302-style[2], which uses a 302 redirect to tell the user's browser to go fetch the old revision of the page (e.g. http://www.example.com/index.php?title=Article&oldid=12345)
- 200-style[3], which actually modifies the page content in place so that it resembles the old revision of the page
Which of these styles is preferable as a default?
I guess that 302 is better; it sounds like a much better idea to me due to caching.
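A minimal sketch of what the 302-style response might look like in extension code. The function itself is hypothetical; OutputPage::redirect() and WebResponse::header() are the relevant MediaWiki calls, and the header values shown are illustrative rather than a complete Memento implementation:

```php
<?php
// Hedged sketch of the 302-style (Pattern 1.1) response: redirect the
// client to the oldid URL of the selected memento and attach Memento
// headers along the way. All names besides the MediaWiki API calls are
// assumptions for illustration.
function redirectToMemento( OutputPage $out, Title $title, $oldid, $mementoDatetime ) {
	$response = $out->getRequest()->response();

	// Memento-Datetime identifies the archival datetime of the revision
	// (an RFC 1123 date string in a real implementation).
	$response->header( 'Memento-Datetime: ' . $mementoDatetime );

	// Advertise the relation back to the original resource.
	$response->header(
		'Link: <' . $title->getFullURL() . '>; rel="original timegate"' );

	// A 302 keeps caches from conflating the TimeGate with the memento itself.
	$out->redirect( $title->getFullURL( array( 'oldid' => $oldid ) ), '302' );
}
```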
- Some sites don't wish to have their past Talk/Discussion pages
accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via configurable option. By default it excludes nothing. What namespaces should be excluded by default?
There might be interesting issues around deleted content; some people feel very strongly about making it unavailable to others (partly due to legal issues), while other people set up wikis dedicated to providing content deleted from Wikipedia. Are you sure history should not be redacted at times? :-)
Not sure why somebody would not like archiving Talk pages like this, but I think this feature could be enabled per-namespace like many others in MediaWiki. Archiving media and files will certainly be different, and you will run into interesting issues with versioning Categories and Templates. Extension:FlaggedRevs has a method to track what kind of ancillary content has been modified (FRInclusionManager.php and FRInclusionCache.php might be things to look at).
And a question back to you:
How are you going to handle versioning of stuff like MediaWiki:Common.js, MediaWiki:Common.css independently of the proper content itself? Some changes might affect presentation of the content meaningfully, for example see how https://en.wikipedia.org/wiki/Template:Nts works.
If you don't know already, PediaPress developed generator of static documents out of wiki content (http://code.pediapress.com/, see Extension:Collection) and they had to deal with lots of similar issues in their renderer, mwlib. The renderer accesses the wiki as a client and fetches all ancillary content as needed.
//Saper
On 11/1/13, Marcin Cieslak saper@saper.info wrote:
I'm not in a position to give you a full answer, but what I would do is try to set up a MediaWiki with $wgInstantCommons = true and see how I could make ForeignAPIRepo fetch older revisions from Wikimedia via the API. Then we could have a look at other media storage backends, including those used by the WMF installation.
For what it's worth, ForeignAPIRepo is currently marked as a repo that does not support "old" versions of files. However, in terms of the interface, it's probably pretty straightforward to change that: all one needs to do is implement some methods to get old versions of files. I assume the original reason is that it would mean a lot of extra API requests for something most people don't care about. (If we did this, I imagine we'd want it disabled by default and configurable as a repo option, as the average InstantCommons user doesn't need it, and it would slow things down.)
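To make the idea concrete, this is roughly the kind of imageinfo query such a repo would have to issue to learn about old versions of a file; the endpoint, file title, and timestamp are placeholders, and the naive fetch has no error handling:

```php
<?php
// Sketch: querying the MediaWiki API for a file's version history.
// prop=imageinfo with iilimit > 1 returns prior versions; iistart selects
// the point in time to enumerate from. Values below are placeholders.
$query = array(
	'action'  => 'query',
	'format'  => 'json',
	'titles'  => 'File:Example.jpg',
	'prop'    => 'imageinfo',
	'iiprop'  => 'url|timestamp',
	'iilimit' => 50,
	'iistart' => '2012-01-01T00:00:00Z',
);
$url = 'https://commons.wikimedia.org/w/api.php?' . http_build_query( $query );

// Naive fetch for illustration; a real repo would use MediaWiki's HTTP
// machinery and cache the result.
$json = json_decode( file_get_contents( $url ), true );
```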
---bawolff
Thanks Marcin for the response.
I've provided comments and questions inline, where I have them.
On Nov 1, 2013, at 6:51 PM, Marcin Cieslak saper@saper.info wrote:
Shawn Jones sjone@cs.odu.edu wrote:
- The Memento protocol has a resource called a TimeMap [1]
that takes an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
It just occurred to me that if TimeMap were a microformat, this information could be embedded into ?title=Article_Name&action=history itself.
Even then, if we need an additional MIME type, maybe we could vary the action=history response based on the desired MIME type (text/html or application/link-format).
It would be excellent to have it available as a Microformat. We had not considered it.
The way the Memento framework operates, these TimeMaps are directly accessible resources (e.g. GET http://example/TimeMap) and no additional processing is performed to extract them.
I'm glad you brought up the action=history. One of the ideas we had discussed was actually varying action=history with an additional set of arguments to produce the TimeMap.
We were concerned with what best fit into MediaWiki's future plans/goals/philosophy.
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
We have wfExpandUrl (yes, there are some bugs currently wrt empty $wgServer now... https://bugzilla.wikimedia.org/show_bug.cgi?id=54950).
Actually, looking at our code, we already use wfExpandUrl, and can likely use it on the few lines left that access $wgServer directly. My longer response to Brian Wolff now seems unnecessary.
Now that I'm looking at the docs, it states "Assumes $wgServer is correct."
If the local installation munges $wgServer in some way, and we're not using it directly, then I guess it's their responsibility to deal with the fallout?
Is it good enough if I just move our remaining lines over to wfExpandUrl?
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
I'm not in a position to give you a full answer, but what I would do is try to set up a MediaWiki with $wgInstantCommons = true and see how I could make ForeignAPIRepo fetch older revisions from Wikimedia via the API. Then we could have a look at other media storage backends, including those used by the WMF installation.
I'll look into this.
- Some sites don't wish to have their past Talk/Discussion pages
accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via configurable option. By default it excludes nothing. What namespaces should be excluded by default?
There might be interesting issues around deleted content; some people feel very strongly about making it unavailable to others (partly due to legal issues), while other people set up wikis dedicated to providing content deleted from Wikipedia. Are you sure history should not be redacted at times? :-)
Not sure why somebody would not like archiving Talk pages like this, but I think this feature could be enabled per-namespace like many others in MediaWiki. Archiving media and files will certainly be different, and you will run into interesting issues with versioning Categories and Templates. Extension:FlaggedRevs has a method to track what kind of ancillary content has been modified (FRInclusionManager.php and FRInclusionCache.php might be things to look at).
I'll look into this.
And a question back to you:
How are you going to handle versioning of stuff like MediaWiki:Common.js, MediaWiki:Common.css independently of the proper content itself? Some changes might affect presentation of the content meaningfully, for example see how https://en.wikipedia.org/wiki/Template:Nts works.
We had not considered Common.js and Common.css yet. Our first goal was to get previous page content loaded, then move on to include previous templates. Now we're looking at images.
I see that MediaWiki:Common.css and MediaWiki:Common.js DO have revision histories, which, in theory, means we can somehow serve up old content. Any ideas on how to access them?
Thanks for pointing this out!
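One possible avenue, sketched under the assumption of a ~1.22-era MediaWiki: Revision::loadFromTimestamp() and Revision::getText() existed in that era (getText() was later deprecated in favour of content objects), and the timestamp below is a placeholder.

```php
<?php
// Hedged sketch: fetching the content of MediaWiki:Common.css as it
// existed at a given timestamp, via the Revision class.
$title = Title::newFromText( 'MediaWiki:Common.css' );

// Load the revision that was current at the given MediaWiki timestamp.
$rev = Revision::loadFromTimestamp(
	wfGetDB( DB_SLAVE ), $title, '20120101000000' );

if ( $rev ) {
	$oldCss = $rev->getText(); // the CSS as of the requested time
}
```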
If you don't know already, PediaPress developed generator of static documents out of wiki content (http://code.pediapress.com/, see Extension:Collection) and they had to deal with lots of similar issues in their renderer, mwlib. The renderer accesses the wiki as a client and fetches all ancillary content as needed.
We'll have to look at PediaPress.
I appreciate the input,
--Shawn
Hi,
No responses to your specific questions, but just to mention that I worked some years ago on an extension [1] aimed at retrieving as exact a display as possible of the page at a given past datetime, because the current implementation of oldid only gives "past wikitext with current context (templates, images, etc.)".
I mainly implemented the retrieval of old versions of templates, but a lot of other smaller improvements could be made (MediaWiki messages, styles/JS, images, etc.). With this approach, some details are irretrievably lost (e.g. the number of articles at a given datetime, some tricky delete-and-move actions, etc.), and additional information would have to be recorded to reproduce past versions more exactly.
[1] https://www.mediawiki.org/wiki/Extension:BackwardsTimeTravel
~ Seb35
On Fri, 01 Nov 2013 20:50:06 +0100, Shawn Jones sjone@cs.odu.edu wrote:
Hi,
I'm currently working on the Memento Extension for Mediawiki, as announced earlier today by Herbert Van de Sompel.
The goal of this extension is to work with the Memento framework, which attempts to display web pages as they appeared at a given date and time in the past.
Our goal is for this to be a collaborative effort focusing on solving issues and providing functionality in "the Wikimedia Way" as much as possible.
Without further ado, I have the following technical questions (I apologize in advance for the fire hose):
- The Memento protocol has a resource called a TimeMap [1] that takes
an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
- We currently make several database calls using the select method
of the Database Object. After some research, we realized that Mediawiki provides some functions that do what we need without making these database calls directly. One of these needs is to acquire the oldid and timestamp of the first revision of a page, which can be done using Title->getFirstRevision()->getId() and Title->getFirstRevision()->getTimestamp() methods. Is there a way to get the latest ID and latest timestamp? I see I can do Title->getLatestRevID() to get the latest revision ID; what is the best way to get the latest timestamp?
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
- We use exceptions to indicate when showErrorPage should be run;
should the hooks that catch these exceptions and then run showErrorPage also return false?
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
- Are there any additional coding standards we should be following
besides those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - Mediawiki" pages?
- We have two styles for serving pages back to the user:
- 302-style[2], which uses a 302 redirect to tell the user's browser to go fetch the old revision of the page (e.g. http://www.example.com/index.php?title=Article&oldid=12345)
- 200-style[3], which actually modifies the page content in place so that it resembles the old revision of the page
Which of these styles is preferable as a default?
- Some sites don't wish to have their past Talk/Discussion pages
accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via configurable option. By default it excludes nothing. What namespaces should be excluded by default?
Thanks in advance for any advice, assistance, further discussion, and criticism on these and other topics.
Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University
[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6
[2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1
[3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
Seb35,
I came across your extension a month ago. Ours is different in that it is also implementing the Memento protocol as used by the Internet Archive, Archive-It, and others.
I do, however, appreciate your insight in trying to solve many of the same problems. I, too, was trying to address the retrieval of old versions of templates, which is what brought me to your extension. Your use of BeforeParserFetchTemplateAndtitle inspired parts of our template solution.
We're currently trying to figure out how to handle images.
What did you mean by MediaWiki messages? Are you referring to the Messages API as part of I18N?
Thanks again,
--Shawn
On Sat, 02 Nov 2013 21:15:01 +0100, Shawn Jones sjone@cs.odu.edu wrote:
Seb35,
I came across your extension a month ago. Ours is different in that it is also implementing the Memento protocol as used by the Internet Archive, Archive-It, and others.
I do however, appreciate your insight in trying to solve many of the same problems. I, too, was trying to address the retrieval of old versions of templates, which brought me to your extension. Your use of BeforeParserFetchTemplateAndtitle inspired parts of our Template solution.
We're currently trying to figure out how to handle images.
What did you mean by MediaWiki messages? Are you referring to the Messages API as part of I18N?
In my attempt to recreate a more exact display of a past version, I thought about retrieving old versions of the interface messages (including the stylesheets MediaWiki:Common.css/js and others); I also thought about reproducing past modifications of LocalSettings.php and past MediaWiki versions in order to recreate previous bugs, but this would be quite difficult, probably not very interesting from a user's point of view, and probably not secure.
Seb35
Hi Shawn,
Thanks for starting this discussion!
Other than the suggestions that've been provided, how are you looking for the WMF to help you with this extension? Our engineers are very limited on time, so it might be helpful to hear from you about how you'd like us to help.
Thanks, Dan
Hi Dan,
Thank you very much for your offer of assistance from the WMF. We have several issues that need to be addressed.
1. Completely eliminating the use of Mediawiki's global variables.
In our extension, we have eliminated the use of all of Mediawiki's global variables except $wgScriptPath. We use it to construct the URIs for the Memento headers with the wfAppendQuery and wfExpandUrl functions. Is there a better way to get the full URI for the Mediawiki installation (including the 'index.php' part of the path) without resorting to this variable so we can reconstruct the URIs of past articles?
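One candidate answer, as a sketch: Title::getFullURL() (and Title::getCanonicalURL()) already consult the configured server and script path internally, which may remove the need to read $wgScriptPath at all. The title and oldid below are placeholders:

```php
<?php
// Sketch: letting Title build the absolute URL rather than assembling it
// from $wgScriptPath by hand.
$title = Title::newFromText( 'Article' );

// Absolute URI of a specific old revision, e.g.
// http://www.example.com/index.php?title=Article&oldid=12345
$mementoUri = $title->getFullURL( array( 'oldid' => 12345 ) );

// Canonical form: always absolute, independent of the request protocol.
$canonical = $title->getCanonicalURL();
```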
2. Test installations
We were hoping one of your test Wikipedia instances might be made available so that the community could experiment with our extension further.
3. How best to handle performance testing
We are planning to conduct performance testing, either at Los Alamos, Old Dominion University, or one of the test Wikipedia instances, and wanted your input on what credible experiments we should set up to demonstrate the performance impact of our extension on a MediaWiki installation.
Our plan was to use the following test groups:
1. no Memento MediaWiki extension installed - access to current and old revision (memento) pages
2. no Memento MediaWiki extension installed - using a screen-scraping script to simulate the use of the history pages associated with each article in a way that attempts to achieve the goals of Memento, but only via MediaWiki's native UI
3. no Memento MediaWiki extension installed - use of MediaWiki's existing XML API to achieve the same goals as Memento
4. our Memento MediaWiki extension with only the mandatory headers - access to current and old revision (memento) pages
5. our Memento MediaWiki extension with only the mandatory headers - with the focus on performing time negotiation and acquiring the correct revision
6. our Memento MediaWiki extension with all headers - access to current and old revision (memento) pages
7. our Memento MediaWiki extension with all headers - again focusing on time negotiation
During each of these test runs, we would use a utility like vmstat, iostat, and/or collectl to measure load on the system, including memory/disk access, and compare the results across multiple runs.
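As a sketch of what a single measurement run might look like (the URL, request mix, and file names here are placeholders, not our actual test harness):

```shell
#!/bin/sh
# Sketch: sample system load with vmstat while a batch of requests runs.
RUN_ID=${1:-run1}

# Sample CPU/memory/swap activity once per second in the background.
vmstat 1 > "vmstat-${RUN_ID}.log" 2>/dev/null &
VMSTAT_PID=$!

# Placeholder request loop; the real runs would replay a fixed mix of
# current-page, old-revision, and TimeMap requests against the wiki.
for oldid in $(seq 1 5); do
	curl -s -o /dev/null "http://localhost/index.php?title=Test&oldid=${oldid}" || true
done

kill "$VMSTAT_PID" 2>/dev/null
echo "wrote vmstat-${RUN_ID}.log"
```

Each run would produce one log per utility, which we would then compare across the seven test groups.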
Also, are there pre-existing tools for testing Mediawiki that we should be using, and is there anything we are missing in our methodology?
4. Architectural feedback to ensure that we've followed Mediawiki's best practices
Our extension is more object-oriented than its first incarnation, utilizing a mediator pattern, a strategy pattern, template methods, and factory methods to achieve its goals. I can generate a simplified inheritance diagram to show the relationships, but I was wondering if we should trim down the levels of inheritance for performance reasons.
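To illustrate the flavor of the design (the class and method names below are invented for this sketch and are not our actual code), the 302-style and 200-style behaviors hang off a common strategy interface:

```php
// Hypothetical shape of the strategy split; names are illustrative only.
interface MementoRenderStrategy {
	/** Serve the negotiated revision to the client. */
	public function render( OutputPage $out, $oldid );
}

class Redirect302Strategy implements MementoRenderStrategy {
	public function render( OutputPage $out, $oldid ) {
		// Tell the browser to fetch the old revision itself.
		$url = $out->getTitle()->getFullURL( array( 'oldid' => $oldid ) );
		$out->redirect( $url, '302' );
	}
}

class InPlace200Strategy implements MementoRenderStrategy {
	public function render( OutputPage $out, $oldid ) {
		// Replace the page body with the old revision's content in place.
		// (Details elided; this is where the 200-style rewriting happens.)
	}
}
```

The main architectural question is whether this level of indirection is acceptable in an extension, or whether flatter hook-based code is preferred for performance and reviewability.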
5. Advice on how best to market this extension
We can advertise the extension on the wikitech-l and mediawiki-l lists, and do have a Mediawiki Extension page, but were wondering if there were conferences, web sites, etc. that could be used to help get the word out that our extension is available for use, review, input, and further extension. Any advice would be most helpful.
Thanks in advance,
Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University
________________________________________
From: wikitech-l-bounces@lists.wikimedia.org [wikitech-l-bounces@lists.wikimedia.org] on behalf of Dan Garry [dgarry@wikimedia.org]
Sent: Monday, November 11, 2013 5:47 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Memento Extension for MediaWiki: Advice on Further Development
Hi Shawn,
Thanks for starting this discussion!
Other than the suggestions that've been provided, how are you looking for the WMF to help you with this extension? Our engineers are very limited on time, so it might be helpful to hear from you about how you'd like us to help.
Thanks, Dan
On 1 November 2013 19:50, Shawn Jones sjone@cs.odu.edu wrote:
Hi,
I'm currently working on the Memento Extension for Mediawiki, as announced earlier today by Herbert Van de Sompel.
The goal of this extension is to work with the Memento framework, which attempts to display web pages as they appeared at a given date and time in the past.
Our goal is for this to be a collaborative effort focusing on solving issues and providing functionality in "the Wikimedia Way" as much as possible.
Without further ado, I have the following technical questions (I apologize in advance for the fire hose):
- The Memento protocol has a resource called a TimeMap [1] that takes an article name and returns text formatted as application/link-format. This text contains a machine-readable list of all of the prior revisions (mementos) of this page. It is currently implemented as a SpecialPage which can be accessed like http://www.example.com/index.php/Special:TimeMap/Article_Name. Is this the best method, or is it more preferable for us to extend the Action class and add a new action to $wgActions in order to return a TimeMap from the regular page like http://www.example.com/index.php?title=Article_Name&action=gettimemap without using the SpecialPage? Is there another preferred way of solving this problem?
- We currently make several database calls using the select method of the Database Object. After some research, we realized that Mediawiki provides some functions that do what we need without making these database calls directly. One of these needs is to acquire the oldid and timestamp of the first revision of a page, which can be done using the Title->getFirstRevision()->getId() and Title->getFirstRevision()->getTimestamp() methods. Is there a way to get the latest ID and latest timestamp? I see I can do Title->getLatestRevID() to get the latest revision ID; what is the best way to get the latest timestamp?
- In order to create the correct headers for use with the Memento
protocol, we have to generate URIs. To accomplish this, we use the $wgServer global variable (through a layer of abstraction); how do we correctly handle situations if it isn't set by the installation? Is there an alternative? Is there a better way to construct URIs?
- We use exceptions to indicate when showErrorPage should be run; should
the hooks that catch these exceptions and then run showErrorPage also return false?
- Is there a way to get previous revisions of embedded content, like
images? I tried using the ImageBeforeProduceHTML hook, but found that setting the $time parameter didn't return a previous revision of an image. Am I doing something wrong? Is there a better way?
- Are there any additional coding standards we should be following
besides those on the "Manual:Coding_conventions" and "Manual:Coding Conventions - Mediawiki" pages?
- We have two styles for serving pages back to the user:
  * 302-style[2], which uses a 302 redirect to tell the user's browser to go fetch the old revision of the page (e.g. http://www.example.com/index.php?title=Article&oldid=12345)
  * 200-style[3], which actually modifies the page content in place so that it resembles the old revision of the page
  Which of these styles is preferable as a default?
- Some sites don't wish to have their past Talk/Discussion pages accessible via Memento. We have the ability to exclude namespaces (Talk, Template, Category, etc.) via a configurable option. By default it excludes nothing. What namespaces should be excluded by default?
Thanks in advance for any advice, assistance, further discussion, and criticism on these and other topics.
Shawn M. Jones
Graduate Research Assistant
Department of Computer Science
Old Dominion University
[1] http://www.mementoweb.org/guide/rfc/ID/#Pattern6
[2] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.1
[3] http://www.mementoweb.org/guide/rfc/ID/#Pattern1.2
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Dan Garry
Associate Product Manager for Platform
Wikimedia Foundation