Hi,
I think the problem is that langlinks did/can/could use &#xxxx;
https://no.wikipedia.org/w/index.php?title=Wikipedia:Om&diff=9814748&am…
However, those &'s don't appear to be in the current API langlinks
results for the old revision.
https://no.wikipedia.org/w/api.php?action=query&prop=langlinks&revi…
But I wouldn't be surprised if the MW 1.18 API literally emitted the
langlinks unparsed,
just as the MW API currently emits #redirect targets unparsed.
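
To illustrate what decoding such a langlink title would involve, here is a minimal sketch. It uses Python's standard html.unescape as a stand-in, not pywikibot's own html2unicode, and decode_title is a hypothetical helper name chosen for this example.

```python
import html


def decode_title(title: str) -> str:
    """Decode HTML character references in a title in a single pass.

    Hypothetical helper for illustration; it relies on the standard
    library rather than on pywikibot's html2unicode.
    """
    return html.unescape(title)


# Decimal and hexadecimal references to the same code point (U+041F)
# decode to the same character; plain text passes through unchanged.
decimal_form = decode_title("&#1055;")
hex_form = decode_title("&#x41F;")
plain_form = decode_title("Om")
```

If the API emitted titles unparsed, a client would need a step like this before comparing them with titles that arrive already decoded.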
On Fri, Mar 13, 2015 at 7:42 AM, Fabian Neundorf
<CommodoreFabianus(a)gmx.de> wrote:
I've been working on html2unicode for the last few days
and I stumbled upon
the fact that a & also works as a normal ampersand, so that
&amp; for example gets converted into &. Now the commit which
introduced it into core (fc61025 [1]) is not really descriptive so I
searched in compat's code and found the corresponding commit f97dfb0
[2].
There it links to the discussion on @xqt's talk page [3] which doesn't
really explain what is happening there. The API never returns HTML
entities unless it's the content of a page. I've been testing [4] such
a link and [[&]] does work but not [[&amp;]]. Also the entity
gets properly encoded, but [[&nbsp;]] also only once.
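
The difference between decoding once and decoding repeatedly can be sketched as follows. This is an illustration using the standard library's html.unescape, not the html2unicode code under discussion; decode_once and decode_fully are hypothetical names, and max_passes is an assumed safety limit.

```python
import html


def decode_once(text: str) -> str:
    """Resolve one level of HTML entities."""
    return html.unescape(text)


def decode_fully(text: str, max_passes: int = 5) -> str:
    """Keep decoding until the text stops changing (or the limit is hit).

    This is the behaviour that turns doubly escaped text into the final
    character, which may or may not be what the caller wants.
    """
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


# "&amp;nbsp;" decodes to the literal text "&nbsp;" after one pass,
# and to a non-breaking space (U+00A0) only after a second pass.
single = decode_once("&amp;nbsp;")
double = decode_fully("&amp;nbsp;")
```

A single pass keeps escaped text such as "&amp;nbsp;" readable as "&nbsp;", while repeated passes collapse it to the character itself.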
My question is why this is necessary, especially in core, which
only makes API requests and shouldn't suffer from such a problem, so it
could probably be changed. The only reason I can see is if something is
decoding text improperly and converts into &nbsp;, which
shouldn't be our concern.
Fabian
[1]:
https://github.com/wikimedia/pywikibot-core/commit/fc6102527e4c556cd77aa877…
[2]:
https://git.wikimedia.org/blobdiff/pywikibot%2Fcompat.git/f97dfb0d1ca49751c…
[3]:
https://de.wikipedia.org/w/index.php?title=Benutzer_Diskussion%3AXqt&ac…
[4]:
https://en.wikipedia.org/wiki/User:XZise/linktest
_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
--
John Vandenberg