https://blog.wikimedia.org/2012/01/16/wikipedias-community-calls-for-anti-s…
Editing pages on English Wikipedia via the web service API will be
disabled for 24 hours beginning at 05:00 UTC on Wednesday, January 18,
as part of the anti-SOPA/PIPA blackout.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
I am having trouble figuring out how to give XMLStarlet the right XPath to
query nodes in a Wikipedia XML document. This is an XPath problem, really,
not an XMLStarlet problem: I don't understand how to formulate the XPath
portion of the xmlstarlet call. Help!
> curl "http://en.wikipedia.org/w/api.php?action=opensearch&search=Bullwinkle&names…" -o bullwinkle.xml
returns:
> <?xml version="1.0"?>
> <SearchSuggestion version="2.0" xmlns="http://opensearch.org/searchsuggest2">
> <Query xml:space="preserve">Bullwinkle</Query>
> <Section>
> <Item>
> <Text xml:space="preserve">Bullwinkle</Text>
> <Description xml:space="preserve">Bullwinkle may refer to:</Description>
> <Url xml:space="preserve">http://en.wikipedia.org/wiki/Bullwinkle</Url>
> </Item>
> <Item>
> <Text xml:space="preserve">Bullwinkle J. Moose</Text>
> <Description xml:space="preserve">Bullwinkle J. </Description>
> <Url xml:space="preserve">http://en.wikipedia.org/wiki/Bullwinkle_J._Moose</Url>
> ...
I try:
> xmlstarlet sel -N x=http://opensearch.org/searchsuggest2 -t -v
> "count(/SearchSuggestion/Section/@Item)" bullwinkle.xml
which I expect to count the items, but it doesn't.
What I am trying to do is extract the text and URL values and put them
into a CSV file. How to do this is explained at
http://xmlstar.sourceforge.net/doc/UG/ch04s01.html (about two-thirds of the
way down), but you have to know how to formulate the XPath for the source
XML document, which I don't!
Any help would be much appreciated.
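
One likely cause, offered as a sketch rather than a definitive answer: the
document declares a default namespace, so each step of the path needs the
x: prefix that -N binds, and @Item selects an attribute named Item rather
than the child element. Under that reading the expression would be
count(/x:SearchSuggestion/x:Section/x:Item). The same namespace-aware lookup
can be tried with Python's standard library. The names items_to_rows and
rows_to_csv and the trimmed SAMPLE document are illustrative assumptions,
not part of the original post.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Prefix "x" is arbitrary; only the namespace URI has to match the document.
NS = {"x": "http://opensearch.org/searchsuggest2"}

# Trimmed copy of the response quoted above (assumed to be well-formed).
SAMPLE = """<?xml version="1.0"?>
<SearchSuggestion version="2.0" xmlns="http://opensearch.org/searchsuggest2">
  <Query xml:space="preserve">Bullwinkle</Query>
  <Section>
    <Item>
      <Text xml:space="preserve">Bullwinkle</Text>
      <Url xml:space="preserve">http://en.wikipedia.org/wiki/Bullwinkle</Url>
    </Item>
    <Item>
      <Text xml:space="preserve">Bullwinkle J. Moose</Text>
      <Url xml:space="preserve">http://en.wikipedia.org/wiki/Bullwinkle_J._Moose</Url>
    </Item>
  </Section>
</SearchSuggestion>
"""


def items_to_rows(xml_text):
    """Return a (text, url) tuple for every Item element in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    # Every element lives in the default namespace, so every step is prefixed.
    for item in root.findall("x:Section/x:Item", NS):
        text = item.findtext("x:Text", default="", namespaces=NS)
        url = item.findtext("x:Url", default="", namespaces=NS)
        rows.append((text, url))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["text", "url"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = items_to_rows(SAMPLE)
    print(len(rows), "items")
    print(rows_to_csv(rows), end="")
```

Dropping the prefix from any step makes findall return an empty list, which
matches the symptom of a count that never finds the items.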
-----------------------------------------------------
Hello
http://en.wiktionary.org/w/api.php?format=json&action=query&titles=murky&rv…
yields the JSON content for the word 'murky'. With
content = json_returned_content
the value of content is:
{"query":{"pages":{"54377":{"pageid":54377,"ns":0,"title":"murky","revisions":[{"*":"==English==\n\n===Etymology===\nCognate
to or directly from {{etyl|non}} {{term|myrkr}}. Compare Russian,
Serbian [[\u043c\u0440\u0430\u043a]].\n\n===Pronunciation===\n*
{{audio|en-us-murky.ogg|Audio (US)}}\n\n*
{{rhymes|\u025c\u02d0(r)ki}}\n\n===Adjective===\n{{en-adj|murkier|murkiest}}\n\n#
Hard to see through, as a fog or mist.\n# [[gloomy|Gloomy]], [[dark]],
[[dim]].\n# [[obscure|Obscure]], [[indistinct]], [[cloudy]].\n#
Dishonest, [[shady]].\n\n====Synonyms====\n* [[dark]]\n\n====Related
terms====\n* [[murk]]\n* [[murkily]]\n*
[[murkiness]]\n\n====Translations====\n{{trans-top|hard to see
through}}\n* Dutch: [[troebel]], [[troebele]]\n* Finnish:
{{t+|fi|samea}}\n* French: {{t+|fr|sombre}}, {{t+|fr|trouble}}\n*
German: {{t+|de|d\u00fcster}}, {{t+|de|tr\u00fcb}}\n{{trans-mid}}\n*
Romanian: {{t-|ro|tulbure}}\n* Russian:
{{t|ru|\u043c\u0443\u0442\u043d\u044b\u0439|tr=m\u00fatnyj}}\n*
[[Scots]]: {{t\u00f8|sco|mirk|xs=Scots}}\n{{trans-bottom}}\n{{trans-see|gloomy}}\n{{trans-see|obscure}}\n{{trans-top|dishonest,
shady}}\n* Russian:
{{t+|ru|\u0442\u0451\u043c\u043d\u044b\u0439|tr=t'\u00f3mnyj}},
{{t+|ru|\u0433\u0440\u044f\u0437\u043d\u044b\u0439|tr=gr'\u00e1znyj}}\n{{trans-mid}}\n{{trans-bottom}}\n{{checktrans-top}}\n*
{{ttbc|da}}: {{t-|da|m\u00f8rk}}, {{t-|da|dunkel}},
{{t-|da|dyster}}\n* {{ttbc|he}}: [[\u05e2\u05db\u05d5\u05e8,
\u05de\u05d8\u05d5\u05e9\u05d8\u05e9]]\n{{trans-mid}}\n* {{ttbc|is}}:
[[myrkr]]\n{{trans-bottom}}\n\n====External links====\n* {{R:Webster
1913}}\n* {{R:Century
1911}}\n\n[[et:murky]]\n[[io:murky]]\n[[kn:murky]]\n[[lt:murky]]\n[[hu:murky]]\n[[mg:murky]]\n[[ml:murky]]\n[[my:murky]]\n[[pl:murky]]\n[[ru:murky]]\n[[fi:murky]]\n[[sv:murky]]\n[[ta:murky]]\n[[te:murky]]\n[[vi:murky]]\n[[zh:murky]]"}]}}}}
content['query']['pages']['54377']['revisions'][0]['*'] yields the meaning
and other related content.
I am interested in retrieving only the meaning of the word. How can I do that?
In this scenario I find the API unusable, since the pageid, which is
required to access the content, is generated dynamically.
Yes, I can use XML and find the content inside the <rev></rev> tag, but how
would someone fetch the synonyms alone, or the part of speech alone?
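
A minimal sketch of one way around both points, assuming the response has
the shape shown above: iterate over the values of the pages dictionary
instead of indexing by page ID, then split the wikitext on its ==heading==
lines. The function names and SAMPLE_CONTENT (a shortened copy of the
response) are hypothetical, and the parsing is plain string handling, not
an API feature. In this layout the part of speech is simply the heading
(here "Adjective") under which the "#" definition lines appear.

```python
import re

# Shortened, hypothetical copy of the response shown above.
SAMPLE_CONTENT = {
    "query": {"pages": {"54377": {
        "pageid": 54377,
        "title": "murky",
        "revisions": [{"*": (
            "==English==\n\n===Adjective===\n{{en-adj|murkier|murkiest}}\n\n"
            "# Hard to see through, as a fog or mist.\n"
            "# Dishonest, [[shady]].\n\n"
            "====Synonyms====\n* [[dark]]\n\n"
            "====Related terms====\n* [[murk]]\n"
        )}],
    }}}
}


def first_page_wikitext(content):
    """Return the wikitext of the first page without knowing its page ID."""
    pages = content["query"]["pages"]
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def section_body(wikitext, heading):
    """Return the text between a heading line and the next heading line."""
    start = re.search(
        r"^(=+)[ \t]*" + re.escape(heading) + r"[ \t]*\1[ \t]*$",
        wikitext,
        re.MULTILINE,
    )
    if start is None:
        return None
    rest = wikitext[start.end():]
    # Stop at the next heading of any level, so subsections are excluded.
    nxt = re.search(r"^=+[^=\n].*=+[ \t]*$", rest, re.MULTILINE)
    return rest if nxt is None else rest[:nxt.start()]


def list_items(body, marker):
    """Return the lines of body that start with marker ("#" or "*")."""
    prefix = marker + " "
    return [
        line[len(prefix):].strip()
        for line in body.splitlines()
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    text = first_page_wikitext(SAMPLE_CONTENT)
    print(list_items(section_body(text, "Adjective"), "#"))
    print(list_items(section_body(text, "Synonyms"), "*"))
```

The returned items still contain wiki markup such as [[shady]]; stripping
that is a separate step not covered here.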
--
*
Thanks & Regards
"Talk is cheap, show me the code" -- Linus Torvalds
kracekumar
www.kracekumar.com
*