I've added a few lightweight abstraction methods to replace some of the
direct namespace comparisons we make.
When you want to check whether a title is in a given namespace, instead of
writing this:
$title->getNamespace() == NS_USER
you can now write this (and, unless you have compatibility concerns with
older releases, PLEASE DO use it):
$title->inNamespace( NS_USER );
When you need to test whether a page is part of a subject/talk pair, e.g.
either User or User_talk, instead of something verbose like:
$title->getNamespace() == NS_USER || $title->getNamespace() == NS_USER_TALK
Please use:
$title->hasSubjectNamespace( NS_USER );
hasSubjectNamespace returns true if the subject namespace of the title's
namespace matches the subject namespace of the namespace you pass in.
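A quick sketch of the intended semantics (the page title here is just made
up for illustration):

$title = Title::newFromText( 'User talk:Example' );
$title->inNamespace( NS_USER ); // false; the title is in NS_USER_TALK
$title->hasSubjectNamespace( NS_USER ); // true; NS_USER_TALK's subject namespace is NS_USER
$title->hasSubjectNamespace( NS_USER_TALK ); // also true; same subject namespace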
If you're writing verbose code that tests whether a title is in any of a
number of namespaces by using in_array, you can use inNamespaces (note the
's'):
$title->inNamespaces( NS_USER, NS_TEMPLATE );
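That is, it's a shorter replacement for the usual in_array pattern:

// the verbose version inNamespaces replaces
in_array( $title->getNamespace(), array( NS_USER, NS_TEMPLATE ) );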
To be honest, I don't have any good example use cases on hand for where you
would use that, but I didn't want the lack of that functionality and the
simplicity of in_array to be a valid rationale for not making use of these
abstract interfaces to namespace info.
Likewise there are two MWNamespace methods to match: MWNamespace::equals
and MWNamespace::subjectEquals.
And I DO encourage people making $ns == NS_???? comparisons to use
MWNamespace::equals( $ns, NS_???? ) instead, even though right now
MWNamespace::equals is technically just `return $ns1 == $ns2;`.
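Roughly, the intended usage is:

// instead of:
$ns == NS_USER
// please write:
MWNamespace::equals( $ns, NS_USER );
// and for a subject/talk-blind check on a bare namespace index:
MWNamespace::subjectEquals( $ns, NS_USER ); // true for both NS_USER and NS_USER_TALK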
This is a little relevant to the "MediaWiki should use a reservation
system for namespaces" bug:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31063
The idea is essentially to drop our practice of passing around integers and
instead start passing around keys like "USER", "SMW_PROPERTY", etc.
MediaWiki would have a namespace registration system where, when given a
new key, it would reserve a new namespace number for that key.
Instead of extensions being forced to declare what integers they are going
to use, and to coordinate with other extension developers so that
extensions don't conflict, an extension could simply make a call to
MediaWiki declaring a string-based key like "SMW_PROPERTY", which should
not be confusable with another extension's key; MediaWiki would then
register an integer in the database for that namespace and reserve it for
use with that key. This also has the benefit that if you install an
extension and then uninstall it, you shouldn't lose the contents of its
namespace, and when you re-install it, it'll start working again without
issues like conflicts with titles created in NS_MAIN that match the prefix
used. Theoretically, changing the content language of your wiki from, say,
'fr' to 'it' could then be made to work in such a way that MediaWiki won't
break existing links; instead, the old i18n'ed namespace names would end up
as aliases.
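To make that concrete, here is a purely hypothetical sketch of what the
extension-facing call might look like; registerNamespace does not exist,
it's just the shape of the idea:

// Hypothetical API; nothing like this exists yet
$ns = MWNamespace::registerNamespace( 'SMW_PROPERTY' );
// First call: MediaWiki reserves a fresh namespace number in the database.
// Later calls (even after an uninstall/reinstall): the same reserved number comes back.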
The idea also fits in with another bug asking for a namespace manager UI.
If we switch to a namespace registration system, it will be much easier to
create an administrative UI for this.
Having these abstract interfaces for namespace comparison around means that
if, in the future, we do in fact start passing around things like "USER"
instead of integers, there should be no issue of bugs cropping up when code
happens to come together such that, for some unfortunate reason, you end up
with "USER" from one source and `2` from another. Making use of
Title::inNamespace and MWNamespace::equals will ensure that "USER" and `2`
are considered equivalent, unlike what would happen if you'd used ==
directly.
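For example (purely illustrative; this is NOT how MWNamespace::equals is
implemented today), a future version could normalize both representations
before comparing:

public static function equals( $ns1, $ns2 ) {
    // hypothetical: map string keys like "USER" to their registered numbers
    return self::normalizeNamespace( $ns1 ) === self::normalizeNamespace( $ns2 );
}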
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Hello all,
is there a query language for wiki syntax?
(NOTE: I really do not mean the Wikipedia API here.)
I am looking for an easy way to scrape data from Wiki pages.
In this way, we could apply a crowd-sourcing approach to knowledge
extraction from Wikis.
There must be thousands of data scraping approaches. But is there one
amongst them that has developed a "wiki scraper language" ?
Maybe with some sort of fuzziness involved, if the pages are too messy.
I have not yet worked with the XML transformation of the wiki markup:
action=expandtemplates
  generatexml - Generate XML parse tree
Is it any good for issuing XPath queries?
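For reference, this is the kind of thing I would like to be able to write
(an untested sketch; the <parsetree> wrapper element and the
<template>/<title> node names are my assumptions about the output format):

<?php
// Ask the API to expand templates and return the XML parse tree
$api = 'http://en.wikipedia.org/w/api.php';
$query = http_build_query( array(
    'action' => 'expandtemplates',
    'text' => '{{Foo|bar=baz}}',
    'generatexml' => 1,
    'format' => 'xml',
) );
$response = file_get_contents( "$api?$query" );

// The parse tree is embedded as escaped text inside the API response
$outer = new DOMDocument();
$outer->loadXML( $response );
$treeXml = $outer->getElementsByTagName( 'parsetree' )->item( 0 )->textContent;

// Load the tree itself and query it with XPath
$tree = new DOMDocument();
$tree->loadXML( $treeXml );
$xpath = new DOMXPath( $tree );
foreach ( $xpath->query( '//template/title' ) as $node ) {
    echo trim( $node->textContent ), "\n"; // names of templates used in the text
}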
Thank you very much,
Sebastian
--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Projects: http://nlp2rdf.org , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Hi,
Yesterday in #mediawiki we discussed some toolserver features and their
future availability on Labs. I know that Wikimedia Labs is still not being
used for application hosting (Wikipedia bots etc.), so I think it would be
good to expand this discussion to this mailing list as well.
I started a list of "wanted features", as Sumanah asked me to, here:
http://www.mediawiki.org/wiki/WMF_Projects/Wikimedia_Labs/Toolserver_featur…
Please add other requested stuff there if you think something is missing
(I am pretty sure there are many things I forgot to mention).
I have absolutely no clue how this part of Labs is going to be designed,
but I think that having several instances shared between bot operators (in
a similar fashion to how the current toolserver is designed), with /home
storage shared between servers, would make sense. Having a separate
instance for each bot operator would IMHO eat too many system resources
(bots need just a few MB of RAM, while an OS needs hundreds); maybe some
more complicated bots could have their own dedicated instance if they need
system customizations.
Concerning features, I told Ryan yesterday that most users would probably
appreciate at least the ability to forward system mail to their own e-mail
boxes (at the moment most accounts on the toolserver have their mail
forwarded to the account owner's e-mail). However, Ryan told me that there
are some security implications, so I don't know if this will be possible in
the future.
Another thing I forgot to mention is that the toolserver application
servers let users access their own www folder directly, so that bots can
produce output which can be reached from outside (example:
http://toolserver.org/~petrb/logs). This could also be a problem because
the virtual servers don't have public IPs, so maybe it would be good to
have a public www server connected to the same storage as the /home folders
of the application instances (probably accessible directly from
wmflabs.org).
Any ideas? Thanks!
Hi guys,
I encountered an issue with my newly installed MediaWiki 1.17.0. I first
created a page called "Hello" in the main namespace. Then I went to the
"Hello" page and tried to create a subpage of "Hello" by using [[/Test/]]
in "Hello". But it shows "/Test/" in the "Hello" page, and it is actually
not a subpage "Test" of "Hello", but a subpage "/Test/" of the Main Page.
That means I can only create a subpage of the Main Page by using
[[/example/]].
Do you know why this issue happens? Is it a bug, or did I get some settings
wrong?
Thanks a lot for your help!
Yi
hi,
I want to programmatically extract lists from list pages on Wikipedia. That
is to say, if there is a page that mostly consists of a list (list of
episodes, list of presidents, etc.), I want to be able to extract the list
from the page, with article names/links. Has anyone already done this? Can
anyone suggest a good strategy?
FredZ
If you haven't tried it yet, please give the release candidate a try:
http://download.wikimedia.org/mediawiki/1.18/mediawiki-1.18.0rc1.tar.gz
If you've tried it out and found a problem, please let us know.
But, if it works for you, please let us know that, too.
For example, I upgraded a MediaWiki site that I maintain from 1.15 to
1.18. Except for some trouble with the customized skin, things went
smoothly.
I added my report to
https://www.mediawiki.org/wiki/MediaWiki_roadmap/1.18/Installation_reports
but it is looking pretty lonely right now. Please add your own
experience.
Thanks,
Mark.
update.php now prints lines that are twice as long as they used to be:
https://bugzilla.wikimedia.org/show_bug.cgi?id=32508 . In the past, lines
that did nothing printed only half the message. Now the user gets
overloaded with long messages about items even when nothing in the database
was changed.
Why the two layers of <a>? Let's see if the page even passes validation:
$ validate http://en.wikipedia.org/wiki/Oaxtepec
*** Errors validating Oaxtepec: ***
Error at line 2, character 33: there is no attribute "class"
Error at line 152, character 10: end tag for "ul" which is not finished
Error at line 177, character 10: end tag for "ul" which is not finished
Why is there a </a> halfway through the first pair of coordinates?
Coordinates: 18°54′N 98°58′W / 18.9°N 98.967°W / 18.9; -98.967
Why does it look fine in some browsers, but get caught red-handed in
emacs-w3m? Could it be that Firefox and Chromium are fooled into thinking
that the outer <a id=...>, which lasts through the whole six coordinates,
should render as a clickable link... which it apparently does, even with
stylesheets off. Only emacs-w3m renders it right, revealing the badly
written HTML!
Here's the code,
<p><span style="font-size: small;"><a id="coordinates"><a href="/wiki/Geographic_coordinate_system" title="Geographic coordinate system">Coordinates</a>: <span class="plainlinks nourlexpansion"><a rel="nofollow" class="external text" href="http://toolserver.org/~geohack/geohack.php?pagename=Oaxtepec&params=18_…"><span class="geo-default"><span class="geo-dms" title="Maps, aerial photos, and other data for this location"><span class="latitude">18°54′N</a> <span class="longitude">98°58′W</span></span></span><span class="geo-multi-punct"> / </span><span class="geo-nondefault"><span class="geo-dec" title="Maps, aerial photos, and other data for this location">18.9°N 98.967°W</span><span style="display:none"> / <span class="geo">18.9; -98.967</span></span></span></a></span></span></span></p>