Hi!
When I query the Estonian Wikipedia's web API for an article's first sentence, I sometimes get an empty response. Actually, it returns a horizontal rule and that's it.
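For example: https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategories&exsentences=1&redirects=&format=jsonfm&cllimit=10&exlimit=1&indexpageids=&maxlag=10&titles=j%C3%A4rv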
gives only a horizontal rule as the extract: "extract": "<hr />",
Can anyone say what is happening here? Is the article's source organized incorrectly, or is it a problem in the API's sentence parser?
Best regards Kristian Kankainen
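A client can at least detect this case before using the extract. The following is a minimal sketch, assuming the extract arrives as an HTML string; the helper name `extract_has_text` and the sample strings are illustrative and not part of the API.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_has_text(extract_html):
    """Return True if the extract contains any non-whitespace text.

    Hypothetical helper: an extract such as "<hr />" consists of markup
    only, so it is reported as having no text.
    """
    collector = _TextCollector()
    collector.feed(extract_html)
    collector.close()  # flush any buffered text
    return "".join(collector.chunks).strip() != ""


# The extract reported in this thread contains markup but no text.
print(extract_has_text("<hr />"))
# A made-up extract with real text, for comparison.
print(extract_has_text("<p>Some text.</p>"))
```

When the check fails, the client can retry with different extract settings instead of showing a bare horizontal rule.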
The sentence-handling algorithm appears to suck at HTML handling. I've filed a bug for it: https://bugzilla.wikimedia.org/show_bug.cgi?id=71671
As a workaround, try plaintext extracts: https://et.wikipedia.org/w/api.php?action=query&prop=extracts%7Ccategori... or switch from requesting a number of sentences to a number of characters.
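The two workarounds can be sketched as query URLs. This is a minimal sketch, assuming the TextExtracts parameters `explaintext` (plain-text extracts) and `exchars` (limit by characters instead of sentences). The helper name `build_extract_query`, the 200-character limit, and `format=json` are illustrative choices; the original request also asked for categories, which is left out here for brevity.

```python
from urllib.parse import urlencode

API_URL = "https://et.wikipedia.org/w/api.php"


def build_extract_query(title, plaintext=False, sentences=None, chars=None):
    """Build an extracts query URL (hypothetical helper, not part of any library)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",   # the original request used jsonfm, the HTML debug format
        "redirects": "",
        "exlimit": 1,
        "titles": title,
    }
    if plaintext:
        # Workaround 1: request plain text instead of HTML.
        params["explaintext"] = ""
    if sentences is not None:
        params["exsentences"] = sentences
    if chars is not None:
        # Workaround 2: limit by characters rather than sentences.
        params["exchars"] = chars
    return API_URL + "?" + urlencode(params)


plaintext_url = build_extract_query("järv", plaintext=True, sentences=1)
chars_url = build_extract_query("järv", chars=200)
print(plaintext_url)
print(chars_url)
```

Either URL can be fetched with any HTTP client. Whether the plain-text variant avoids the problem for a given article still has to be checked against the live API.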
On Sun, Oct 5, 2014 at 8:34 AM, Kristian Kankainen kristian@eki.ee wrote:
[...]
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
Thank you. I found the Git repository for the code. Could you tell me more precisely where the sentence algorithm is (or where it should be but isn't)? I might take a look at it some day next week. The name of the relevant function and file, or something similar, would help as pointers.
Kristian
On 05.10.2014 at 19:31, Max Semenik wrote:
[...]
The code is in http://git.wikimedia.org/blob/mediawiki%2Fextensions%2FTextExtracts.git/mast... in the function getFirstSentences.
On Sun, Oct 5, 2014 at 11:16 AM, Kristian Kankainen kristian@eki.ee wrote:
[...]