Hi,
I would like to flag a large number of wiki pages based on whether their HTML passes a certain test, so that failing pages can be easily listed and counted. The flags should adapt when pages are created or modified. (The specific use case is collecting file pages which do not have machine-readable author and license information embedded.)
I have been thinking of adding such pages to a maintenance category from a parser hook (the test logic is already part of the imageinfo/extmetadata API and would be easy to reuse). Is that a good way to do this? If so, what's the best way to achieve it? Is it OK to just add categories as needed via $parser->getOutput()->addCategory(), or can that mess up internal state such as the categorylinks table?
Alternatively, when the Cite extension encounters an error, it simply parses a message and appends it to the end of the text on ParserBeforeTidy, and the message contains wikitext that includes a category. That seems like a clever way to stay flexible: the category name can be changed, or extra text added as a call to action, without any code change. Is that approach safe/cheap?
thanks Gergő
Reply from Brian Wolff (bawolff@gmail.com) to Gergo Tisza's (gtisza@wikimedia.org) message of 9/14/14:
There are two ways this is usually done: either the page_props table or tracking categories. Provided the hook you use runs before LinksUpdate (which is true of any hook in the parser), you should be fine adding such things.
To add a page property, you would do something like $parser->getOutput()->setProperty( 'prop name', 'optionally some extra arbitrary data' );
Pages can then be found via Special:PagesWithProp or a direct database query.
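For listing the flagged pages from outside the wiki, the data behind Special:PagesWithProp is also exposed through the API's list=pageswithprop module. A minimal sketch that only builds the request URL; the endpoint and the property name are placeholders, and fetching the response and following its continuation value are left out:

```php
<?php
/**
 * Build an API request listing pages that carry a given page property
 * (the same data Special:PagesWithProp shows).
 * The endpoint and property name passed in below are hypothetical.
 */
function buildPagesWithPropUrl( string $apiUrl, string $propName, string $continue = '' ): string {
	$params = [
		'action' => 'query',
		'list' => 'pageswithprop',
		'pwppropname' => $propName,
		'pwplimit' => 'max',
		'format' => 'json',
	];
	if ( $continue !== '' ) {
		// pwpcontinue value taken from the previous response, to fetch the next batch.
		$params['pwpcontinue'] = $continue;
	}
	// RFC 3986 encoding: spaces become %20 rather than '+'.
	return $apiUrl . '?' . http_build_query( $params, '', '&', PHP_QUERY_RFC3986 );
}

echo buildPagesWithPropUrl( 'https://example.org/w/api.php', 'hypothetical_missing_metadata' ), "\n";
```

The count is then just the number of results accumulated across all batches.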
To add a tracking category: $parser->addTrackingCategory( 'tracking cat name' ); (the argument is the key of the message that holds the category name).
You also have to define a message for the tracking category name and a description message, and add the name message key to $wgTrackingCategories. See the code docs for Parser::addTrackingCategory and $wgTrackingCategories.
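Roughly, the registration described above might look like this in 2014-era extension setup code. The message key is hypothetical, and the '-desc' suffix for the description message is, as far as I know, the convention Special:TrackingCategories looks up:

```php
<?php
// Hypothetical message key. The extension's i18n file must define both
//   'hypothetical-missing-metadata-category'      (the category name) and
//   'hypothetical-missing-metadata-category-desc' (description on Special:TrackingCategories).
$wgTrackingCategories[] = 'hypothetical-missing-metadata-category';

// If the name message defaults to '-', no category is added until someone edits
// MediaWiki:Hypothetical-missing-metadata-category to give it a real name.

// Inside the parser hook handler, once the test has failed:
// $parser->addTrackingCategory( 'hypothetical-missing-metadata-category' );
```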
Generally, page props are used for obscure things that a user is unlikely to care about, or for cases where you need special cache invalidation behaviour on change (there's special support for that with $wgPagePropLinkInvalidations), whereas tracking categories are for properties the user is interested in. It's also possible to make the tracking category off by default, until users turn it "on" by editing a MediaWiki-namespace page, by making the category name default to '-'.
For the use case you describe, I think a tracking category is better suited.
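With a tracking category, listing and counting the failing pages also works through the API: prop=categoryinfo reports the category's size, and list=categorymembers lists its members. A rough sketch that only builds the request URLs; the endpoint and category title are hypothetical, and on a real wiki the title is whatever the tracking category's name message expands to:

```php
<?php
/**
 * Count members of a category: prop=categoryinfo returns size, pages, files and subcats.
 */
function buildCategoryCountUrl( string $apiUrl, string $categoryTitle ): string {
	return $apiUrl . '?' . http_build_query( [
		'action' => 'query',
		'prop' => 'categoryinfo',
		'titles' => $categoryTitle,
		'format' => 'json',
	], '', '&', PHP_QUERY_RFC3986 );
}

/**
 * List members of a category in batches; cmlimit=max asks for the largest batch allowed.
 */
function buildCategoryMembersUrl( string $apiUrl, string $categoryTitle ): string {
	return $apiUrl . '?' . http_build_query( [
		'action' => 'query',
		'list' => 'categorymembers',
		'cmtitle' => $categoryTitle,
		'cmlimit' => 'max',
		'format' => 'json',
	], '', '&', PHP_QUERY_RFC3986 );
}
```

Since file pages are counted separately under "files" in the categoryinfo result, that figure is the one to watch for this use case.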
--bawolff
Thanks for the info!
I have been unable to figure out the right place to interact with the parser, though. As far as I can see, there are no hooks between calling the parser and calling linksupdate, and the hooks which are internal to the parser have no knowledge of what they are parsing: the main wikitext or some random interface message. That's fine for extensions which use tracking categories to trace when something is broken, but I am trying to find out when something is missing; the logic would be triggered on every interface message or section preview or whatever.
Short of adding a new hook (to the end of Content::getParserOutput maybe), I don't see how this could be done.
On Mon, Sep 15, 2014 at 3:03 AM, Brian Wolff bawolff@gmail.com wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l