Hi Reid,
My responses to your responses are inline.
-------- Original Message --------
Subject: Re: [Wiki-research-l] Proposal: build a wiki literature review wiki-style
From: Reid Priedhorsky reid@reidster.net
To: wiki-research-l@lists.wikimedia.org
Date: March-22-11 11:56:24 AM
(I don't know the details of MediaWiki search well, so some of the following may not be quite right.) What MediaWiki would give us is full-text search. So it would be easy to search for "John Smith", and that query would find papers authored by John Smith plus perhaps other stuff; however, one cannot search for "author = John Smith" and get only results where the author field matches John Smith and no others.
However, it does seem like Semantic MediaWiki has this type of search and otherwise behaves much like plain MediaWiki.
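To make the distinction concrete, here is a minimal Python sketch of full-text search versus field search. It is only an illustration: the `Paper` record, the two function names, and the sample entries are all made up for this example and do not reflect how MediaWiki or Semantic MediaWiki implement search.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical record layout, for illustration only.
    title: str
    authors: list[str]
    abstract: str


def fulltext_search(papers, phrase):
    """Match the phrase anywhere in the record, as a plain full-text search would."""
    needle = phrase.lower()
    return [
        p for p in papers
        if needle in " ".join([p.title, *p.authors, p.abstract]).lower()
    ]


def field_search(papers, author):
    """Match only records whose author field contains exactly this author."""
    return [p for p in papers if author in p.authors]


# Made-up sample data: one paper by John Smith, one that merely mentions him.
papers = [
    Paper("Sample paper A", ["John Smith"], "A study of reverts."),
    Paper("Sample paper B", ["Jane Doe"], "Builds on work by John Smith."),
]

fulltext_hits = fulltext_search(papers, "John Smith")  # both papers
field_hits = field_search(papers, "John Smith")        # only the first paper
```

The full-text query returns the "plus perhaps other stuff" case (the paper that only mentions the name), while the field query returns just the paper actually authored by John Smith.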
I actually wasn't familiar with the full functionality of Semantic MediaWiki (http://semantic-mediawiki.org/wiki/Semantic_MediaWiki) until I looked it up after your comments. From what I can see, it certainly seems able to maintain all the key metadata that would be necessary for me and, I assume, most other researchers (e.g. authors, dates, publication source, URLs to HTML or PDF versions, etc.).
There also appear to be various options for Semantic MediaWiki hosting: Wikia, Referata, etc. It would be nice to not have to deal with the sysadmin aspects of the project.
I agree that going with a reliable host would be the way to go. I think that for the nature of our project, choosing a paid Referata plan would probably be better than going for Wikia. I for one could probably easily find grant funding to keep it going.
One final note on bibliographic software: many of these claim to do automatic import of a reference simply by pointing the software at the publisher's web page for the references. But I have never seen this work correctly; always, the imported data needs significant cleanup, enough that personally I'd rather type it in manually anyway. For example, titles of ACM papers aren't even correctly cased on the official ACM pages (e.g., http://dx.doi.org/10.1145/1753326.1753615)!
My only experience with "scraping" pages is with Zotero, and it does it beautifully. I assume (but don't know) that the current generation of other bibliography software would also do a good job. Anyway, Zotero has a huge support community, and scrapers for major sources (including Google Scholar for articles and Amazon for books) are kept very well up to date for the most part.
Bibliographic software then also typically does not include the proper metadata for automatically lower-casing titles in citations. For example, the title "Path Selection: Novel Interaction Technique for Wikipedia" should be lower-cased as "Path selection: Novel interaction technique for Wikipedia". But so often I see papers with "Path selection: novel interaction technique for wikipedia". It's embarrassing.
That's definitely a software design flaw; Zotero is certainly rather weak on this point.
But, if we were writing our own (e.g.) MediaWiki -> BibTeX export script, we could automatically note that "Novel" should be capitalized (because it begins the subtitle) as well as provide for people to indicate explicitly title words that should remain capitalized. (In this instance, the proper BibTeX export syntax would be "Path Selection: {Novel} Interaction Technique for {Wikipedia}".)
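A rough Python sketch of the title-protection step such an export script might perform is below. The function name `protect_title` and its `keep_capitalized` parameter are hypothetical; the sketch assumes that the first word of the title needs no braces (standard BibTeX styles keep its capital) and that the caller supplies the list of words to protect.

```python
def protect_title(title, keep_capitalized=()):
    """Wrap words in braces so BibTeX styles do not lower-case them.

    Protects (a) the first word after a colon, i.e. the start of a subtitle,
    and (b) any word listed explicitly in keep_capitalized.
    """
    keep = set(keep_capitalized)
    out = []
    after_colon = False
    for i, word in enumerate(title.split()):
        core = word.strip(".,;:!?")  # the word without surrounding punctuation
        # The first word of the title is left alone (assumption: the
        # bibliography style already preserves its capital letter).
        if i > 0 and core and (after_colon or core in keep):
            word = word.replace(core, "{" + core + "}", 1)
        out.append(word)
        after_colon = word.endswith(":")
    return " ".join(out)


exported = protect_title(
    "Path Selection: Novel Interaction Technique for Wikipedia",
    keep_capitalized=["Wikipedia"],
)
print(exported)
```

For the example title, with "Wikipedia" marked as a word to keep capitalized, this produces the export syntax given above.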
I like the idea of including export facilities in our SMW version, giving users the option of what they would like to export to.
Would it be feasible to have both, and use them concurrently so that researchers could use one or the other, or both, as they prefer? I'm thinking of something like this (for purpose of illustration, let's call the chosen MediaWiki instance MW and the chosen dedicated online shared bibliographic tool BT):
Bi-directional synchronization is hard to get right, particularly when the two sides have different data models. I think we are much better off declaring one or the other to be the master and the rest should remain read-only (i.e. export rather than synchronization).
I like this idea; with SMW as the primary, editable source, a read-only Zotero library imported from SMW would work well. The problem, though, is that duplicate detection would be needed to keep each import from re-adding articles that already exist. A complete overwrite would not work, since it would break the article IDs that word processor integration relies on. Zotero has been slow on implementing duplicate detection, but they finally have a very impressive solution in alpha (http://www.zotero.org/blog/new-release-multilingual-zotero-with-duplicates-d...).
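As a sketch of what such a one-way import could look like, the Python below updates existing items in place and only assigns new IDs to genuinely new articles. Everything in it is an assumption made for illustration: the record layout, the `normalize_key` duplicate test (normalized title plus year), and the function names. It does not use or describe Zotero's actual duplicate detection or any Zotero API.

```python
import itertools


def normalize_key(record):
    """Hypothetical duplicate key: title reduced to lower-case alphanumerics, plus year."""
    title = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return (title, record.get("year"))


def import_from_master(library, master_records, next_id):
    """One-way import: the master wins, but existing item IDs are preserved.

    library: dict mapping item ID -> record (the read-only copy)
    master_records: records exported from the master source
    next_id: iterator yielding unused item IDs
    """
    by_key = {normalize_key(rec): item_id for item_id, rec in library.items()}
    for rec in master_records:
        key = normalize_key(rec)
        item_id = by_key.get(key)
        if item_id is None:
            # Genuinely new article: assign a fresh ID.
            item_id = next(next_id)
            by_key[key] = item_id
        # Existing article: overwrite its fields but keep its ID stable.
        library[item_id] = dict(rec)
    return library


# Made-up sample data.
library = {1: {"title": "Sample Paper A", "year": 2010}}
master = [
    {"title": "Sample paper A", "year": 2010, "url": "http://example.org/a"},
    {"title": "Sample paper B", "year": 2011},
]
import_from_master(library, master, itertools.count(2))
```

After the import, item 1 keeps its ID and picks up the new URL field, and only "Sample paper B" is added as a new item, so citations already inserted into a document would still resolve.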
Thanks, Reid, for your great suggestions. I hope this can become a reality.
~ Chitu