Dear all, I have encountered a problem when I invoke the Wiki API from C# via "POST". The result I get is:
<?xml version="1.0" ?>
<api>
  <query>
    <allpages>
      <p pageid="290" ns="0" title="A" />
      <p pageid="13547196" ns="0" title="A"" />
      <p pageid="9068190" ns="0" title="A"H" />
      <p pageid="9068184" ns="0" title="A"h" />
      <p pageid="9192091" ns="0" title="A$" />
      <p pageid="27551355" ns="0" title="A$$hole: How I Got Rich & Happy by Not Giving a Damn About Anyone & How You Can, Too" />
      <p pageid="27551358" ns="0" title="A$$hole: How I Got Rich and Happy by Not Giving a Shit About You" />
      <p pageid="3566260" ns="0" title="A&A" />
      <p pageid="11298846" ns="0" title="A&AEE" />
      <p pageid="24081644" ns="0" title="A&AS" />
      <p pageid="28397693" ns="0" title="A&A (disambiguation)" />
      <p pageid="20546645" ns="0" title="A&A Building" />
My code:

    String WikiURL = "http://en.wikipedia.org/w/api.php";
    Encoding myEncoding = Encoding.GetEncoding("UTF-8");

    Uri url = new Uri(WikiURL);
    HttpWebRequest http = (HttpWebRequest)HttpWebRequest.Create(url);
    http.Method = "POST";
    http.ContentType = "application/x-www-form-urlencoded";
    http.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; MS-RTC LM 8)";
    http.Accept = "image/jpeg, application/x-ms-application, image/gif, application/xaml+xml, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
    http.Headers.Add("Accept-Language", "en-us");
    http.Headers.Add("Accept-Charset", "utf-8");

    byte[] bytePostDdata = Encoding.UTF8.GetBytes(postData);
    http.ContentLength = bytePostDdata.Length;
    using (Stream postStream = http.GetRequestStream())
    {
        postStream.Write(bytePostDdata, 0, bytePostDdata.Length);
        postStream.Close();
    }

    HttpWebResponse response = http.GetResponse() as HttpWebResponse;
    StreamReader stream = new StreamReader(response.GetResponseStream(), myEncoding);
    string result = stream.ReadToEnd();

    response.Close();
    stream.Close();
It's wrong: first, the pageid is not right; second, the title is not available in the wiki. The encoding and decoding both use "utf-8". I have tried many methods, but I still can't solve it. Please help me. But when I access the wiki in IE 8.0 or Firefox by "GET", the result is right. I can't figure out what is wrong.

-----Original Message-----
From: mediawiki-api-bounces@lists.wikimedia.org [mailto:mediawiki-api-bounces@lists.wikimedia.org] On Behalf Of mediawiki-api-request@lists.wikimedia.org
Sent: Tuesday, January 25, 2011 10:35 AM
To: mediawiki-api@lists.wikimedia.org
Subject: Mediawiki-api Digest, Vol 43, Issue 5
Today's Topics:
   1. Re: Retrieving images instead of math markup (Alex Brollo)
   2. What is the Full URL of the images returned by a wikipedia query... (A O)
   3. Re: What is the Full URL of the images returned by a wikipedia query... (Betacommand)
   4. Re: What is the Full URL of the images returned by a wikipedia query... (Brad Jorsch)
   5. header intact (Zhihua Wu)
----------------------------------------------------------------------
Message: 1
Date: Thu, 20 Jan 2011 17:31:37 +0100
From: Alex Brollo <alex.brollo@gmail.com>
Subject: Re: [Mediawiki-api] Retrieving images instead of math markup
To: "MediaWiki API announcements & discussion" <mediawiki-api@lists.wikimedia.org>
Message-ID: <AANLkTik0xoW59-TGtYYkew7XAxEgNvz67FxnpRA2octR@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"
2011/1/20 Gabriel Sandor gabi.t.sandor@gmail.com
So eventually I tried this and indeed it works as expected. I have one more question though.
I've seen that most of the complex math formulas are converted into .png images with long names. For instance, the <math>\iiiint\limits_F , dx,dy,dz,dt</math> formula (quadruple integral) is converted into an image named 49005f50f3ba2dfade3a265ebe363ee9.png. I'd like to know: is this file name unique for each formula? And is it kept on the wiki's server indefinitely, just like other images in articles? To be clearer, is this integral formula always going to be associated with this 49005f50f3ba2dfade3a265ebe363ee9.png file? I'm trying to implement a cache mechanism in my app that also deals with images generated from math markup (besides the usual images in articles), which is why I'm curious. I'd like to know if I can safely associate a math markup string with a file name, so that there's no need to retrieve the image from the server again when I encounter that formula.
On Mon, Dec 13, 2010 at 1:23 PM, Roan Kattouw <roan.kattouw@gmail.com> wrote:
2010/12/13 Gabriel Sandor gabi.t.sandor@gmail.com:
Is it possible to retrieve (preferably via the MediaWiki API) an image representing a mathematical formula given in the <math> tags that are frequently encountered in Wikipedia articles?
There's no direct way to do this, although I guess it could be implemented. A workaround would be to do something like http://en.wikipedia.org/w/api.php?action=parse&text=<math>\gamma=\frac{1}{\sqrt{1-v^2}}</math>&format=yamlfm , which will give you the HTML generated for this <math> tag. That could be an image (like in this case), or HTML if the formula is sufficiently simple (try a^2+b^2=c^2 for instance).
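The workaround above passes wikitext in the text parameter, so characters such as <, \ and ^ need percent-encoding before they go into a URL. The following is a minimal sketch of building such a URL in C#. The class and method names are made up for illustration, and format=xml is an assumption (the message above used yamlfm).

```csharp
using System;

static class MathParseUrl
{
    // Endpoint taken from the message above.
    const string ApiEndpoint = "http://en.wikipedia.org/w/api.php";

    // Builds an action=parse URL for a TeX formula wrapped in <math> tags.
    // format=xml is an assumption; the message above used format=yamlfm.
    public static string Build(string tex)
    {
        string wikitext = "<math>" + tex + "</math>";
        return ApiEndpoint
            + "?action=parse"
            + "&text=" + Uri.EscapeDataString(wikitext) // percent-encode the wikitext
            + "&format=xml";
    }

    static void Main()
    {
        // The verbatim string keeps the TeX backslashes as they are.
        Console.WriteLine(Build(@"\gamma=\frac{1}{\sqrt{1-v^2}}"));
    }
}
```

The returned string can be passed to any HTTP client; the HTML for the formula would then have to be extracted from the response.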
Roan Kattouw (Catrope)
Yes, the name is unique. I discovered by reverse engineering that it is merely "the MD5 hash of the normalized TeX code". It is a clever trick that, I guess, points directly to the PNG image without any need to render it again: I presume that only the name is calculated and, if the PNG already exists, it is served as is. The "normalized TeX code", I guess, is the text that you can see in the HTML source, in the "alt" attribute of the image tag.
There are free online MD5 calculators on the web; try hashing the alt text.
None of this is documented; I worked it out by myself, so I may be completely wrong. :-)
Alex
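Alex's guess can be checked locally instead of with an online tool. The sketch below computes the hexadecimal MD5 digest of a string in C# and appends ".png". It follows his undocumented hypothesis only: the exact normalization of the TeX code is not known from this thread, the UTF-8 encoding is an assumption, and the class name, method names and sample input are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class MathImageName
{
    // Returns the lowercase hexadecimal MD5 digest of the UTF-8 bytes of text.
    public static string Md5Hex(string text)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(text));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                sb.Append(b.ToString("x2")); // two lowercase hex digits per byte
            }
            return sb.ToString();
        }
    }

    // Hypothesis from the message above: file name = MD5(normalized TeX) + ".png".
    public static string GuessFileName(string normalizedTex)
    {
        return Md5Hex(normalizedTex) + ".png";
    }

    static void Main()
    {
        // Hypothetical input: paste the "alt" text of a rendered formula here
        // and compare the result with the image's actual file name.
        Console.WriteLine(GuessFileName(@"\gamma=\frac{1}{\sqrt{1-v^2}}"));
    }
}
```

If the computed name does not match the real one, the "alt" text is probably not the exact string that gets hashed, which would be consistent with Alex's own caveat.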
2011/1/25 Zhihua Wu wuzhh01@gmail.com:
> Dear all, I have encountered a problem when I invoke the Wiki API from C# via "POST". The result I get is:
What was the URL you used? Which POST parameters did you use?
Roan Kattouw (Catrope)
Zhihua Wu wrote:
> Dear all, I have encountered a problem when I invoke the Wiki API from C# via "POST". The result I get is:
> <?xml version="1.0" ?>
> <api>
> <query>
> <allpages>
> <p pageid="290" ns="0" title="A" />
> <p pageid="13547196" ns="0" title="A"" />
> <p pageid="9068190" ns="0" title="A"H" />
> <p pageid="9068184" ns="0" title="A"h" />
> <p pageid="9192091" ns="0" title="A$" />
> <p pageid="27551355" ns="0" title="A$$hole: How I Got Rich & Happy by Not Giving a Damn About Anyone & How You Can, Too" />
> <p pageid="27551358" ns="0" title="A$$hole: How I Got Rich and Happy by Not Giving a Shit About You" />
> <p pageid="3566260" ns="0" title="A&A" />
> <p pageid="11298846" ns="0" title="A&AEE" />
> <p pageid="24081644" ns="0" title="A&AS" />
> <p pageid="28397693" ns="0" title="A&A (disambiguation)" />
> <p pageid="20546645" ns="0" title="A&A Building" />
This looks like the result of http://en.wikipedia.org/w/api.php?action=query&list=allpages&apprefi... It looks a bit odd, though, as the XML you posted has its entities unescaped.
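One way to sidestep the entity question is to run the response through an XML parser rather than reading the raw text. The sketch below assumes the raw API response is well-formed XML, with & and " inside attribute values written as &amp;amp; and &amp;quot; (the pasted output above does not show this). The class name and the shortened sample document are made up for illustration; the two page entries reuse ids and titles from the listing above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class AllPagesParser
{
    // Extracts the title attribute of every <p> under /api/query/allpages.
    // The XML parser decodes entities such as &amp; and &quot; on its own.
    public static List<string> Titles(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        List<string> titles = new List<string>();
        foreach (XmlNode p in doc.SelectNodes("/api/query/allpages/p"))
        {
            titles.Add(p.Attributes["title"].Value);
        }
        return titles;
    }

    static void Main()
    {
        // Shortened, hypothetical sample in properly escaped form.
        string sample =
            "<?xml version=\"1.0\"?><api><query><allpages>"
            + "<p pageid=\"9068190\" ns=\"0\" title=\"A&quot;H\" />"
            + "<p pageid=\"20546645\" ns=\"0\" title=\"A&amp;A Building\" />"
            + "</allpages></query></api>";

        foreach (string title in Titles(sample))
        {
            Console.WriteLine(title);
        }
    }
}
```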
> http.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.3; MS-RTC LM 8)";
Don't try to disguise yourself as a browser. It is considered bad manners and can lead to being blocked. Use a specific User-Agent that allows people to contact you if your bot breaks something.
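As a rough illustration of that advice, the sketch below sets a descriptive User-Agent on an HttpWebRequest, the same class used in the original code. The tool name, version and contact address are placeholders, and the "name/version (contact)" layout is just one common convention, not something this thread prescribes.

```csharp
using System;
using System.Net;

static class DescriptiveUserAgent
{
    // Builds a "name/version (contact)" string. All three values are
    // placeholders to be replaced with real ones.
    public static string Build(string toolName, string version, string contact)
    {
        return toolName + "/" + version + " (" + contact + ")";
    }

    // Creates a request carrying the given User-Agent; nothing is sent yet.
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        HttpWebRequest http = (HttpWebRequest)WebRequest.Create(url);
        http.UserAgent = userAgent;
        return http;
    }

    static void Main()
    {
        string userAgent = Build("AllPagesLister", "0.1", "someone@example.org");
        HttpWebRequest http = CreateRequest("http://en.wikipedia.org/w/api.php", userAgent);
        Console.WriteLine(http.UserAgent);
    }
}
```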
> It's wrong: first, the pageid is not right;
Why isn't it right?
> second, the title is not available in the wiki. The encoding and decoding both use "utf-8".
Have you percent-encoded the special characters? Just concatenating "http://en.wikipedia.org/wiki/" + "A"H" won't work.
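A minimal sketch of the percent-encoding step, assuming the goal is to turn a title from the API result into an article URL. The class and method names are hypothetical. Replacing spaces with underscores follows the usual form of Wikipedia article URLs. Uri.EscapeDataString also escapes characters such as "/" and ":", which may be more than strictly necessary.

```csharp
using System;

static class PageUrl
{
    // Base URL taken from the message above.
    const string ArticleBase = "http://en.wikipedia.org/wiki/";

    // Percent-encodes a page title before appending it to the base URL.
    public static string ForTitle(string title)
    {
        string withUnderscores = title.Replace(' ', '_');
        return ArticleBase + Uri.EscapeDataString(withUnderscores);
    }

    static void Main()
    {
        // Titles taken from the API result quoted above.
        Console.WriteLine(ForTitle("A\"H"));
        Console.WriteLine(ForTitle("A&A Building"));
    }
}
```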
> I have tried many methods, but I still can't solve it. Please help me. But when I access the wiki in IE 8.0 or Firefox by "GET", the result is right. I can't figure out what is wrong.
You don't even state what you want...