Trung Dinh wrote:
Hi all, I have an issue while trying to parse data fetched from the Wikipedia API. This is the piece of code that I am using:
api_url = 'http://en.wikipedia.org/w/api.php'
api_params = 'action=query&list=recentchanges&rclimit=5000&rctype=edit&rcnamespace=0&rcdir=newer&format=json&rcstart=20160504022715'
f = urllib2.Request(api_url, api_params)
print ('requesting ' + api_url + '?' + api_params)
source = urllib2.urlopen(f, None, 300).read()
source = json.loads(source)
json.loads(source) raised the following exception: "Expecting , delimiter: line 1 column 817105 (char 817104)"
I tried source.encode('utf-8') and some other encodings, but none of them helped. Is there a workaround for this issue? Thanks :)
Hi.
Weird, I can't reproduce this error. I had to import the "json" and "urllib2" modules, but after doing so, executing the code you provided here worked fine for me: https://phabricator.wikimedia.org/P3009.
You probably want to use 'https://en.wikipedia.org/w/api.php' as your end-point (HTTPS, not HTTP).
As far as I know, JSON is always encoded as UTF-8, so you shouldn't need to encode or decode the data explicitly.
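For what it's worth, here is a rough sketch of the same request against the HTTPS end-point, with the response bytes decoded as UTF-8 once before parsing. It assumes Python 3 (urllib.request rather than urllib2). The names build_params and fetch_recent_changes and the User-Agent value are placeholders of mine, not anything from the original code, and I have not run the request against the live API.

```python
import json
import urllib.parse
import urllib.request

API_URL = 'https://en.wikipedia.org/w/api.php'  # HTTPS end-point


def build_params():
    """Return the query parameters from the original message as a dict."""
    return {
        'action': 'query',
        'list': 'recentchanges',
        'rclimit': 5000,
        'rctype': 'edit',
        'rcnamespace': 0,
        'rcdir': 'newer',
        'format': 'json',
        'rcstart': '20160504022715',
    }


def fetch_recent_changes(params, timeout=300):
    """POST the parameters to api.php and return the parsed JSON response."""
    # In Python 3 the request body has to be bytes, not str.
    data = urllib.parse.urlencode(params).encode('ascii')
    # Placeholder User-Agent (an assumption); replace it with something
    # that identifies your own script.
    headers = {'User-Agent': 'example-recentchanges-script/0.1'}
    request = urllib.request.Request(API_URL, data, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
    # Decode the bytes as UTF-8 once, then parse; no further encoding needed.
    return json.loads(raw.decode('utf-8'))
```

Calling fetch_recent_changes(build_params()) should then give you a dict, provided the server returns valid JSON.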
The error you're getting generally means that the JSON was malformed for some reason. It seems unlikely that MediaWiki's api.php is outputting invalid JSON, but I suppose it's possible.
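If the error comes back, one way to narrow it down is to look at the raw text around the character offset that the parser reports, for instance to see whether the response was cut off or contains something unexpected at that point. The helper below is only a sketch under a couple of assumptions: it needs Python 3.5 or later (json.JSONDecodeError and its pos attribute do not exist in Python 2), and parse_json_with_context is a name I made up for illustration.

```python
import json


def parse_json_with_context(text, window=40):
    """Parse text as JSON; on failure, report the text near the error offset."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character index where the parser gave up.
        start = max(err.pos - window, 0)
        end = err.pos + window
        snippet = text[start:end]
        raise ValueError(
            '%s at char %d; nearby text: %r' % (err.msg, err.pos, snippet)
        ) from err
```

You would pass it the decoded response body in place of the plain json.loads(source) call; the resulting message shows up to 40 characters on either side of the failing position.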
Since you're coding in Python, you may be interested in a framework such as https://github.com/alexz-enwp/wikitools.
MZMcBride