Dear all,
I have run into a problem when calling the Wiki API from C# with "POST".
This is the result I get:
<?xml version="1.0" ?>
<api>
  <query>
    <allpages>
      <p pageid="290" ns="0" title="A" />
      <p pageid="13547196" ns="0" title="A"" />
      <p pageid="9068190" ns="0" title="A"H" />
      <p pageid="9068184" ns="0" title="A"h" />
      <p pageid="9192091" ns="0" title="A$" />
      <p pageid="27551355" ns="0" title="A$$hole: How I Got Rich & Happy by Not Giving a Damn About Anyone & How You Can, Too" />
      <p pageid="27551358" ns="0" title="A$$hole: How I Got Rich and Happy by Not Giving a Shit About You" />
      <p pageid="3566260" ns="0" title="A&A" />
      <p pageid="11298846" ns="0" title="A&AEE" />
      <p pageid="24081644" ns="0" title="A&AS" />
      <p pageid="28397693" ns="0" title="A&A (disambiguation)" />
      <p pageid="20546645" ns="0" title="A&A Building" />
My code:
using System;
using System.IO;
using System.Net;
using System.Text;

// postData (not shown in this message) holds the URL-encoded request parameters.
String WikiURL = "http://en.wikipedia.org/w/api.php";
Encoding myEncoding = Encoding.GetEncoding("UTF-8");
Uri url = new Uri(WikiURL);
HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
http.Method = "POST";
http.ContentType = "application/x-www-form-urlencoded";
http.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; MS-RTC LM 8)";
http.Accept = "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
http.Headers.Add("Accept-Language", "en-us");
http.Headers.Add("Accept-Charset", "utf-8");

// Write the request body; the using block closes the request stream.
byte[] bytePostData = Encoding.UTF8.GetBytes(postData);
http.ContentLength = bytePostData.Length;
using (Stream postStream = http.GetRequestStream())
{
    postStream.Write(bytePostData, 0, bytePostData.Length);
}

// Read the whole response as UTF-8; the using blocks dispose the reader and the response.
string result;
using (HttpWebResponse response = (HttpWebResponse)http.GetResponse())
using (StreamReader stream = new StreamReader(response.GetResponseStream(), myEncoding))
{
    result = stream.ReadToEnd();
}
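The message does not show how postData is built. Because the request declares application/x-www-form-urlencoded, every parameter name and value in the body has to be percent-encoded; otherwise a character such as & inside a value (a title like A&A, for instance) would be read as a parameter separator. The following is a minimal sketch under stated assumptions: BuildFormBody is a hypothetical helper, and while action, list, apfrom and format are real MediaWiki API parameters, the values below are illustrative only and are not the ones used in the original request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: joins name=value pairs into an
// application/x-www-form-urlencoded body, percent-encoding every name and
// value so that '&', '=', '$' or '"' inside a value cannot split it.
static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> parameters) =>
    string.Join("&", parameters.Select(p =>
        Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

// Illustrative values only: the original postData is not shown in the message.
var postData = BuildFormBody(new[]
{
    new KeyValuePair<string, string>("action", "query"),
    new KeyValuePair<string, string>("list", "allpages"),
    new KeyValuePair<string, string>("apfrom", "A&A"),
    new KeyValuePair<string, string>("format", "xml"),
});

Console.WriteLine(postData); // action=query&list=allpages&apfrom=A%26A&format=xml
```

Uri.EscapeDataString encodes a space as %20 rather than +. Form decoders, including PHP's, accept both forms.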
It's wrong: first, the pageids are not right; second, these titles are not
available on the wiki. Both the encoding and the decoding use "utf-8".
I have tried many approaches, but I still can't solve it. Please help me;
I really need this.
But when I access the wiki in IE 8.0 or Firefox with "GET", the result is right. I
can't figure out what is wrong.
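The pageid and title values can be checked in code by parsing the result string with LINQ to XML. This is a sketch under stated assumptions. The sample is a hand-written excerpt that reuses three entries from the listing above. It is written with the entity escaping that XML requires inside double-quoted attribute values (&quot; and &amp;), because the listing above shows the titles already decoded. ReadAllPages is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hand-written sample for illustration; in XML, '"' and '&' inside a
// double-quoted attribute value must be written as &quot; and &amp;.
const string sampleXml =
    "<?xml version=\"1.0\"?>" +
    "<api><query><allpages>" +
    "<p pageid=\"290\" ns=\"0\" title=\"A\" />" +
    "<p pageid=\"13547196\" ns=\"0\" title=\"A&quot;\" />" +
    "<p pageid=\"3566260\" ns=\"0\" title=\"A&amp;A\" />" +
    "</allpages></query></api>";

// Hypothetical helper: reads the pageid and title attributes of every
// <p> element under <allpages>; the parser decodes the entities.
static (int PageId, string Title)[] ReadAllPages(string xml) =>
    XDocument.Parse(xml)
        .Descendants("allpages")
        .Elements("p")
        .Select(p => ((int)p.Attribute("pageid"), (string)p.Attribute("title")))
        .ToArray();

// Prints one tab-separated line per page: 290 A, 13547196 A", 3566260 A&A.
foreach (var (pageId, title) in ReadAllPages(sampleXml))
    Console.WriteLine($"{pageId}\t{title}");
```

The Title values are the decoded strings A" and A&A, not the escaped forms in the raw XML.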
-----Original Message-----
From: mediawiki-api-bounces(a)lists.wikimedia.org
[mailto:mediawiki-api-bounces@lists.wikimedia.org] On Behalf Of
mediawiki-api-request(a)lists.wikimedia.org
Sent: Tuesday, January 25, 2011 10:35 AM
To: mediawiki-api(a)lists.wikimedia.org
Subject: Mediawiki-api Digest, Vol 43, Issue 5
Today's Topics:
1. Re: Retrieving images instead of math markup (Alex Brollo)
2. What is the Full URL of the images returned by a wikipedia query... (A O)
3. Re: What is the Full URL of the images returned by a wikipedia query... (Betacommand)
4. Re: What is the Full URL of the images returned by a wikipedia query... (Brad Jorsch)
5. header intact (Zhihua Wu)
----------------------------------------------------------------------
Message: 1
Date: Thu, 20 Jan 2011 17:31:37 +0100
From: Alex Brollo <alex.brollo(a)gmail.com>
Subject: Re: [Mediawiki-api] Retrieving images instead of math markup
To: "MediaWiki API announcements & discussion"
<mediawiki-api(a)lists.wikimedia.org>
2011/1/20 Gabriel Sandor <gabi.t.sandor(a)gmail.com>
> So eventually I tried this and indeed it works as expected. I have one
> more question though.
> I've seen that most of the complex math formulas are converted into
> .png images with some long names - for instance, the
> <math>\iiiint\limits_F \, dx\,dy\,dz\,dt</math> formula (triple
> integral) is converted into an image with the name
> 49005f50f3ba2dfade3a265ebe363ee9.png. I'd like to know: is this file
> name unique for each formula? And is it persisted on the wiki's
> server indefinitely, just like other images in articles? To be more
> clear, is the triple integral formula always going to be associated with
> this 49005f50f3ba2dfade3a265ebe363ee9.png file?
> I'm trying to implement a cache mechanism in my app that also tries to
> deal with images generated from math markup (besides the usual images in
> articles), which is why I'm curious. I'd like to know if I can
> safely associate a math markup string with a file name so that there's
> no need to retrieve the image from the server again when I encounter that
> formula.
> On Mon, Dec 13, 2010 at 1:23 PM, Roan Kattouw <roan.kattouw(a)gmail.com> wrote:
> > 2010/12/13 Gabriel Sandor <gabi.t.sandor(a)gmail.com>:
> > > Is it possible to retrieve (preferably via the MediaWiki API) an
> > > image representing a mathematical formula given in the <math> tags
> > > that are frequently encountered in Wikipedia articles?
> > There's no direct way to do this, although I guess it could be
> > implemented. A workaround would be to do something like
> > http://en.wikipedia.org/w/api.php?action=parse&text=<math>\gamma=\frac{1}{\sqrt{1-v^2}}</math>&format=yamlfm
> > , which will give you the HTML generated for this <math> tag, which
> > could be an image (like in this case), or HTML if the formula is
> > sufficiently simple (try a^2+b^2=c^2 for instance).
> > Roan Kattouw (Catrope)
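Roan's workaround can also be issued from code. This is a minimal sketch with some assumptions. BuildParseUrl is a hypothetical helper name. The endpoint and the action, text and format parameters are taken from the quoted URL. format=yamlfm is copied from the 2010 message and may no longer be accepted by the API. The <math> markup is percent-encoded with Uri.EscapeDataString, because characters such as \, {, } and ^ are not safe to put raw into a query string.

```csharp
using System;

// Hypothetical helper (not part of any MediaWiki client library): builds the
// action=parse request from Roan's message for a single TeX formula.
// The <math>...</math> wrapper is percent-encoded so that characters such as
// '\', '{', '}' and '^' are safe inside the query string.
static string BuildParseUrl(string tex) =>
    "http://en.wikipedia.org/w/api.php?action=parse&text=" +
    Uri.EscapeDataString("<math>" + tex + "</math>") +
    "&format=yamlfm"; // copied from the 2010 message; may no longer be supported

// Verbatim string literal so the backslashes in the TeX are kept as-is.
Console.WriteLine(BuildParseUrl(@"\gamma=\frac{1}{\sqrt{1-v^2}}"));
```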
Yes, the name is unique. I discovered by reverse engineering that it is simply
"the MD5 hash of the normalized TeX code". It is a clever trick: I presume
that only the name is computed and, if the PNG already exists, it is simply
loaded, with no need to render the formula again. The "normalized TeX code",
I guess, is the text you can see in the "alt" attribute of the image tag when
browsing the HTML source.
There are free online MD5 calculators on the web; try hashing the alt text.
None of this is documented; I discovered it by myself, so I may be
completely wrong. :-)
Alex
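Alex's reverse-engineered naming rule can be sketched as a function from TeX to file name. As he says, the rule is undocumented and may be wrong. The sketch rests on several assumptions:

* The input is already the normalized TeX, for instance copied from the image's alt attribute.
* The input is hashed as UTF-8.
* The file name is the lowercase hex digest plus ".png".

MediaWiki's own TeX normalization is not reproduced, so the result is not guaranteed to match the names on the server. MathImageFileName is a hypothetical name.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper sketching the rule described above: lowercase hex MD5
// of the (already normalized) TeX, hashed as UTF-8 (an assumption), plus ".png".
// It does not perform MediaWiki's TeX normalization itself.
static string MathImageFileName(string normalizedTex)
{
    using var md5 = MD5.Create();
    byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(normalizedTex));
    var hex = new StringBuilder(hash.Length * 2);
    foreach (byte b in hash)
        hex.Append(b.ToString("x2")); // two lowercase hex digits per byte
    return hex.ToString() + ".png";
}

Console.WriteLine(MathImageFileName("abc")); // 900150983cd24fb0d6963f7d28e17f72.png
```

The "abc" input is the RFC 1321 MD5 test vector. It only illustrates the hex formatting and is not TeX.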