[Erik Moeller dropped from CC list; no response to any emails over past week on these issues.]
On Sat, 2009-12-12 at 18:39 -0800, Fabrice Florin wrote:
Hi guys,
I have added our web engineer Subbu Sastry to this thread, as he would know whether or not it's feasible for us to give you this data.
Hi Subbu!
Wikinews has a few techies who hack things together for the site. Bawolff specialises in Javascript with unusual exception errors containing creative spelling mistakes. I'm going to suggest one of his widgets for NewsTrust (later), I think you and Fabrice will both quite like it.
We've also Jon (ShakataGaNai) who helps with a number of other coding things. Some less-noticeable people also help with other bits and pieces.
Personally I've little experience with developing server-side things like a web API; but, does 20+ years as a systems analyst help? Mostly working on Billing Systems and Enterprise Resource Planning, I did do some XML-spewing designs for that. I'd probably have no problems with looking at your database structures and identifying the information I think most useful to expose in an API. Happy to drop Fabrice and some of the other CC'd people from that discussion. If there's a need for a non-disclosure agreement to get at such data it won't be the first time I've signed one.
We don't yet have a full API, though our widgets function a bit like an API.
One new widget we have been considering is a rating widget which a third-party could put on their site, to show the NT story rating for a particular story on that site. It might also be possible to show the source rating we have for that source, if known. We hadn't planned on doing this right away, but it's in the queue of things we would consider doing, if requested by one of our partners.
I'd say the advantage of an API is the scripts for your widgets should be greatly simplified, and you can freely license examples of them.
The drawback is you'll want to do various logs and analyses on API usage so you can block any particularly abusive sites; just like web spiders that don't respect robots.txt end up blocked everywhere except SEO linkbuilder sites.
Your request seems a bit different, if I understand it correctly: you would want to display the ratings for sources cited by your articles, is that right? If that's the case, it may be sufficient for us to get just the URL. We would then look up that URL in our DB, and if we have it on file, that would allow us to provide the story rating and number of reviews. We may also be able to provide the source rating and number of reviews for the source associated with that story at the same time. Lastly, it may be possible to provide the source rating and reviews for the source typically associated with that domain name, though this is a risky proposition, because often a story featured on a site is not really from that site. So it would be best to ask for source ratings by specifying a source name, but you would need to request the exact name we use for that source, which could be prone to human error.
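To make the lookup described above concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the `lookup` function, the `Rating` and `LookupResult` types, and the in-memory `STORIES` and `SOURCES` tables (with made-up sample values) stand in for whatever NewsTrust's database actually holds. It deliberately does not guess a source from the domain name, since that is the risky case mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rating:
    score: float   # average rating (made-up scale)
    reviews: int   # number of reviews behind the score


@dataclass
class LookupResult:
    story: Optional[Rating]
    source: Optional[Rating]


# Hypothetical stand-ins for the database; all values are invented samples.
STORIES = {
    "http://news.example.com/articleurl": (Rating(4.1, 12), "Example News"),
}
SOURCES = {
    "Example News": Rating(3.8, 240),
}


def lookup(url=None, source_name=None):
    """Look up a story by URL and/or a source by its exact name."""
    story = None
    if url is not None and url in STORIES:
        story, story_source = STORIES[url]
        # Only fall back to the story's own source if none was requested.
        if source_name is None:
            source_name = story_source
    # Source names must match exactly; there is no guessing from the domain.
    source = SOURCES.get(source_name) if source_name is not None else None
    return LookupResult(story=story, source=source)
```

A URL that is not on file simply comes back with both fields empty, so the caller can show nothing rather than a guessed rating.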
Yes. At the moment a significant percentage of Wikinews articles are what we call "synthesis" articles. They contain no original research, but are constructed using multiple independent sources, which must be listed at the foot of the article.
Within the Wikicode this looks as follows:
*{{source|url=http://news.example.com/articleurl |title=Name of story, as given by publisher |author=the article's author(s), if specified by the publisher |pub=The name of the publisher. This *should* be as listed on Wikipedia |date=Monthname daynumber, year - as specified by the publisher}}
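As a rough illustration of how those parameters could be pulled out of the Wikicode, here is a minimal Python sketch. The function name `parse_source_templates` is made up for this example, and it assumes simple templates only: it will not cope with nested templates or with a pipe character inside a value.

```python
import re

# Matches {{source|...}} up to the first closing braces (no nesting support).
SOURCE_RE = re.compile(r"\{\{\s*source\s*\|(.*?)\}\}", re.DOTALL | re.IGNORECASE)


def parse_source_templates(wikitext):
    """Return one dict of named parameters per {{source|...}} template."""
    sources = []
    for match in SOURCE_RE.finditer(wikitext):
        params = {}
        for part in match.group(1).split("|"):
            # Split on the first "=" only, so URLs with query strings survive.
            key, sep, value = part.partition("=")
            if sep:
                params[key.strip().lower()] = value.strip()
        sources.append(params)
    return sources
```

For the example line above this would yield one dict with the keys `url`, `title`, `author`, `pub` and `date`; the `url` and `pub` values are the ones a ratings request would most likely need.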
This was one point another contributor raised off-list; we currently list all used sources with no regard to their reliability or reputation. That can see Fox News listed and "supposedly" on a par with the BBC, Reuters, or PBS. Your own critique of Iain's article on the Garuda pilot's conviction noted we'd not had contact with some key primary sources; as independents, with zero financial backing for our reporting activities, getting that can be challenging. International phone calls can soon mount up if you're looking for comment from the other side of the globe. Personally, I've sunk between €500 and €1,000 into setting up our wikinewsie.org domain, mostly used so we're not emailing people with addresses like "fluffykitteh1024@hotspace.com".
In any case, we should probably prioritize the tasks you are considering, so we know which is most important to you from an editorial standpoint.
I don't want to end up pushing NewsTrust to develop something that would have limited use outside that of Wikinews. However, I do think that the elimination of cross-site scripting vulnerabilities would be a big selling point for a published API.
Is it more important for you to have your own articles display a story rating, or to give a rating to the third-party articles cited as sources for your own articles?
Both, I think. But, that's the beauty of doing it with an API; anyone could do either.
If it's the latter, how often would you need this information to be updated? If it's an old story, its story rating is unlikely to change much once a month or two has passed since its release, so maybe you could settle for a one-time rating -- the source rating is more likely to change over time, but not by much. So maybe a once-a-month or less frequent update might be fine.
Bawolff's input on this suggests the volume of requests to NewsTrust would naturally tail off as articles age. Thus:
* Someone requests a Wikinews article.
* Javascript on Wikinews activates, parses required parameters from the source template, and sends them to our back end (the ToolServer).
* If less than 10-15 minutes have passed since NewsTrust was last queried, the back end returns cached data.
* Else the back end submits a new request.
* If NewsTrust returns updated data (instead of an "unchanged" response), the back end updates its stats and sends that on to the reader via the Javascript invoked above.
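The back-end part of that flow could be sketched roughly as below, in Python. This is only a sketch under assumptions: the `RatingCache` class, the `UNCHANGED` sentinel, and the `fetch(url, previous)` callable are all invented here to stand in for a real request to NewsTrust, and the 600-second default is just the low end of the 10-15 minute window.

```python
import time

# Hypothetical sentinel standing in for an "unchanged" reply from NewsTrust.
UNCHANGED = object()


class RatingCache:
    def __init__(self, fetch, ttl_seconds=600, clock=time.monotonic):
        self._fetch = fetch      # callable(url, previous_data) -> data or UNCHANGED
        self._ttl = ttl_seconds  # how long cached data is served without asking
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries = {}       # url -> (time of last query, data)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        # Queried recently enough: return cached data without contacting NewsTrust.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        previous = entry[1] if entry is not None else None
        fresh = self._fetch(url, previous)
        # An "unchanged" reply keeps the old data but still resets the timer.
        data = previous if fresh is UNCHANGED else fresh
        self._entries[url] = (now, data)
        return data
```

Because requests only happen when a reader actually loads an article, the volume sent to NewsTrust for any one source falls away as the article ages, with no separate scheduling needed.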
Either way, we would need to figure out how important all this is to you, and if we can squeeze in some simple technology that addresses most of your needs.
As you've mentioned, one of the other headaches we have is that something from AP, Reuters, or AFP can end up on dozens - if not hundreds - of newspaper sites. Wikinews tends to push for people to go back to the wire site or, say, Google News' hosting of these. We also push for the wire to be cited as the author (e.g. Reuters); that *might* help NewsTrust consolidate the different URLs because the article title is generally only changed if the site publishing it applies a house style for capitalisation.
If, perhaps as a more long-term goal for NewsTrust, you were getting that data you could tie up all the different URLs for a Reuters or AP story, group under a unique article identifier, and expose that in the API so, once you've queried with a URL, the API asks for future requests to use a much shorter identifier.
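Here is one way that grouping might look, as a small Python sketch. The `StoryIndex` class and `story_key` function are hypothetical, and the matching rule (same wire service plus the same title once capitalisation and spacing are ignored) is my assumption based on the house-style point above; a real system would need something more robust.

```python
import re


def story_key(wire, title):
    """Key that ignores capitalisation and spacing differences in the title."""
    normalised = re.sub(r"\s+", " ", title).strip().casefold()
    return (wire.strip().casefold(), normalised)


class StoryIndex:
    def __init__(self):
        self._id_by_key = {}   # (wire, normalised title) -> story id
        self._id_by_url = {}   # url -> story id
        self._urls_by_id = {}  # story id -> set of urls

    def add(self, url, wire, title):
        """Register a URL and return the short identifier for its story."""
        key = story_key(wire, title)
        # New stories get the next integer id; known stories reuse theirs.
        story_id = self._id_by_key.setdefault(key, len(self._id_by_key) + 1)
        self._id_by_url[url] = story_id
        self._urls_by_id.setdefault(story_id, set()).add(url)
        return story_id

    def id_for_url(self, url):
        return self._id_by_url.get(url)

    def urls_for_id(self, story_id):
        return set(self._urls_by_id.get(story_id, ()))
```

A first query by URL would return the identifier from `id_for_url`, and later requests could then use that instead of the full URL.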
But this is a good conversation to have, and we appreciate your thinking about these creative uses of NewsTrust for your site.
I did warn you Wikinewsies will steal anything that isn't nailed down and watched by armed guards. :-P
Oh, and the widget I said I'd suggest:
http://en.wikinews.org/wiki/MediaWiki_talk:Gadget-dictionaryLookupHover.js
Subbu will likely follow this most quickly; it's a freely licensed piece of Javascript that uses Wiktionary (another WMF project) to do a dictionary lookup of any word a user double-clicks on.
It is multilingual, so if NewsTrust account holders could set a "Mother tongue" option they wouldn't get definitions in an English default, but their chosen language. (The gadget looks up "example" in Wiktionary, tracks down the link to the definition in "Mother tongue", and displays it in a small pop-up window.)
If you'd like to try it out on Wikinews, sign up for an account, log in, select your preferences at the top of the page, go to the gadgets tab, and look for and enable Wiktionary Hover.
Here endeth another shameless plug for Bawolff's Javascript-fu.