I have been working with Sam and others for some time now on brainstorming a proposal for the Foundation to create a centralized wiki of citations, a WikiCite so to speak, if that is not the eventual name. My plan is to continue to discuss with folks who are knowledgeable and interested in such a project and to have the feedback I receive go into the proposal which I hope to write this summer. The proposal white paper will then be sent around to interested parties for corrections and feedback, including on-wiki and mailing lists, before eventually landing at the Foundation officially. As we know WMF has not started a new project in some years, so there is no official process. Thus I find it important to get it right.
The basic idea is a centralized wiki that contains citation information that other MediaWikis and WMF projects can then reference using something like a {{cite}} template or a simple link. The community can document the citation, the author, the book, etc., and, in one idealization, all citations across all wikis would point to the same article on WikiCite. Users can use this wiki as their personal bibliography as well, since collections of citations can be exported in arbitrary citation formats. This general plan would allow community aggregation of metadata and community documentation of sources along arbitrary dimensions (quality, trust, reliability, etc.). The hope is that such a resource would then expand on that wiki and across the projects into summarizations of collections of sources (lit reviews) that make navigating entire fields of literature easier and more reliable, getting you out of the trap of not being aware of the global context that a particular source sits in.
To give all a more concrete view, here is an example from some software that I have implemented in our lab called WikiPapers. Please take note that while this is a scientific literature example, the idea is general to *all publications ever*. Also, while I have implemented a feature-full version of a WikiCite, it's important to point out that for the WMF project we will need a new extension that handles the needs of the project exactly, and in PHP (I use Python :).
The name of the wiki article is a unique key that is a combination of the author names and the year, in the following format: Author1Author2Author3EtAl10b. This works for scientific articles, but we may find we need to modify the key for other kinds of sources. The content of the wiki article is composed of an infobox constructed via the Citation template, plus any other text and media the community determines to be useful and legal to include in the article. Example article:
Screenshot of how this infobox renders on our wiki: http://grey.colorado.edu/mediawiki/sites/mingus/images/0/0e/KangHsuKrajbichE...
Title: KangHsuKrajbichEtAl09
{{Citation
|publisher=SAGE Publications
|dateadded=2010-07-17
|author=Kang M.J. and Hsu M. and Krajbich I.M. and Loewenstein G. and McClure S.M. and Wang J.T. and Camerer C.F.
|url=http://pss.sagepub.com/content/20/8/963.full
|abstract=Curiosity has been described as a desire for learning and knowledge, but its underlying mechanisms are not well understood. We scanned subjects with functional magnetic resonance imaging while they read trivia questions. The level of curiosity when reading questions was correlated with activity in caudate regions previously suggested to be involved in anticipated reward. This finding led to a behavioral study, which showed that subjects spent more scarce resources (either limited tokens or waiting time) to find out answers when they were more curious. The functional imaging also showed that curiosity increased activity in memory areas when subjects guessed incorrectly, which suggests that curiosity may enhance memory for surprising new information. This prediction about memory enhancement was confirmed in a behavioral study: Higher curiosity in an initial session was correlated with better recall of surprising answers 1 to 2 weeks later.
|title=The Wick in the Candle of Learning
|bibtex type=article
|number=8
|volume=20
|owner=Sethherd
|journal=Psychological Science
|year=2009
|cites=O'ReillyFrank06,Cowan95,Wise04,Fuster80,Panksepp98,KakadeDayan02b,DelgadoLockeStengerEtAl03,BrewerZhaoDesmondEtAl98,DelgadoNystromFiez00,Beatty82,Baddeley92,Waanabe96,Roland93lm,DelgadoNystromFissellEtAl00,WagnerSchacterRotteEtAl98,SeymourDawDayanEtAl07,ODoherty04,BandettiniMoonen99,ODohertyDayanFristonEtAl03,RogersOwenRobbins99,KnutsonWestdorpKaiserEtAl00,CircuitryMemory,OReillyFrank06,Watanabe96a,BrewerZhaoGabrieli98,WagnerSchacterBuckner98,RogersOwenMiddletonEtAl99,Baddeley86,Watanabe96,Rolls96a,PallerWagner02
|cited_by=Author1Author2Author3EtAl10,etc...
|pages=963
}}
Then, any other WMF wiki, or any other MediaWiki, could cite this universal entry by simply typing {{cite|KangHsuKrajbichEtAl09}}
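To make the lookup a little more concrete, here is a minimal Python sketch of how a tool might read such an entry into a dictionary of fields. The function name parse_citation is hypothetical, and the sketch assumes a flat template with no nested templates and no "|" characters inside values; a real extension would rely on MediaWiki's own parser instead.

```python
def parse_citation(wikitext):
    """Parse a flat {{Name |key=value ...}} template into (name, fields).

    Assumption: values contain no '|' and no nested templates.
    """
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("not a template")
    parts = text[2:-2].split("|")
    name = parts[0].strip()
    fields = {}
    for part in parts[1:]:
        # Split on the first '=' only, so URLs containing '=' survive.
        key, sep, value = part.partition("=")
        if sep:  # ignore positional parameters that have no '='
            fields[key.strip()] = value.strip()
    return name, fields


example = (
    "{{Citation |title=The Wick in the Candle of Learning "
    "|journal=Psychological Science |year=2009 |volume=20 |number=8 |pages=963 }}"
)
name, fields = parse_citation(example)
```

With the fields in a dictionary, rendering a {{cite|KangHsuKrajbichEtAl09}} call reduces to fetching the entry by its key and formatting the fields.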
Additionally, if a technology such as Semantic MediaWiki is used (as it is in WikiPapers), arbitrary lists of collections of literature can be generated by constructing simple queries that are boolean combinations of template properties. Given that SMW does not scale well, I have a plan that uses Lucene instead for fast, scalable dynamic generation of collections of citations. Imagine the possibilities..
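As an illustration of what "boolean combinations of template properties" could mean in practice, here is a small in-memory Python sketch. It is neither Semantic MediaWiki query syntax nor Lucene; all helper names (prop, all_of, any_of, negate, query) are hypothetical, and the two "Dummy" entries are made-up placeholders.

```python
records = {
    "KangHsuKrajbichEtAl09": {"journal": "Psychological Science", "year": "2009"},
    # Dummy entries, for illustration only.
    "DummyKeyA10": {"journal": "Psychological Science", "year": "2010"},
    "DummyKeyB09": {"journal": "Dummy Journal", "year": "2009"},
}


def prop(name, value):
    """Predicate: the record's property `name` equals `value`."""
    return lambda rec: rec.get(name) == value


def all_of(*preds):
    """Boolean AND of predicates."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds):
    """Boolean OR of predicates."""
    return lambda rec: any(p(rec) for p in preds)


def negate(pred):
    """Boolean NOT of a predicate."""
    return lambda rec: not pred(rec)


def query(records, pred):
    """Return the sorted keys of all records matching the predicate."""
    return sorted(key for key, rec in records.items() if pred(rec))


result = query(
    records,
    all_of(prop("journal", "Psychological Science"), negate(prop("year", "2010"))),
)
```

A linear scan like this is only meant to show the shape of the queries; a search index would be needed at the scale discussed in this thread.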
Feel free to provide your feedback on this idea, in addition to your own ideas, in this thread, or to me personally. I am especially interested in the potential benefits to the WMF projects that you see, and to hear your thoughts on the potential of this project on its own, as that will feature prominently in the proposal. Additionally, what do you think WikiCite would eventually be like, once it is fully matured?
Brian Mingus Graduate Student Computational Cognitive Neuroscience Lab University of Colorado at Boulder
On Mon, Jul 19, 2010 at 11:22 AM, phoebe ayers phoebe.wiki@gmail.com wrote:
There have been a number of proposals floated in the Wikimedia community over the years to build a wiki-based project for collecting journal citation information. For those interested in that topic, you might want to check out the University of Prince Edward Island's "knowledge for all" project proposal -- it proposes to build an open universal citation index (to serve as an alternative to the many hundreds of proprietary citation index products that libraries currently buy). This of course is not the first attempt at this problem, but it's an interesting proposal that's getting a bit of buzz in the library community. http://library.upei.ca/k4all
-- phoebe
--
* I use this address for lists; send personal messages to phoebe.ayers <at> gmail.com *
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Brian J Mingus, 19/07/2010 22:20:
The basic idea is a centralized wiki that contains citation information that other MediaWikis and WMF projects can then reference using something like a {{cite}} template or a simple link. The community can document the citation, the author, the book etc.. and, in one idealization, all citations across all wikis would point to the same article on WikiCite. Users can use this wiki as their personal bibliography as well, as collections of citations can be exported in arbitrary citation formats.
I have already mentioned it before, but this description looks quite similar to http://bibdex.org/ . Maybe we should join forces (i.e., send your proposal also to Sunir Shah).
Nemo
Hey folks,
I've been lurking on this list since the beginning of time and saw this fly by. Thanks Nemo for the shout out. That is pretty much what Bibdex is about. My inspiration was a Big Hairy Goal to provide a central place where the body of academic knowledge can be curated by the public in a wiki style. It's different than Wikipedia because there is no NPOV and often research needs to be secret.
I originally tried this with both MeatballWiki and a similar service called BibWiki. Bibdex is my latest adaptation based on what I learnt. The current iteration embraces the fact that academia is built on controversy. Different groups need to have space to express different opinions apart from others. So, I rebuilt the software so that research groups can create their own public annotated bibliographies and control who has access to write to those bibliographies, much like Google Groups has different levels of public and private access control.
My understanding is that WikiCite is focused specifically on the needs of the WMF projects. That has its own set of interesting use cases.
By the way, the http://www.openlibrary.org project is very inspiring and in a similar vein, albeit restricted to books.
Cheers, Sunir, Bibdex
Brian,
The meta process for new project proposals is still the cleanest one for suggesting a specific Project and presenting it alongside similar projects.
It would be helpful if you could update a related project proposal on meta -- say, [[m:WikiBibliography]], if that seems relevant. (I just cleaned that page up and merged in an older proposal that had been obfuscated.)
Or you can create a new project proposal... WikiCite as a name can be confusing, since it has been used to refer to this bibliographic idea, but also to refer to the idea of citations for every statement or fact - something closer to a blame or trust solution that includes citations in its transactions.
We should figure out how this project would work with acawiki, and possibly bibdex. Bibdex doesn't aim to And it would be helpful to have a publicly-viewable demo to play with -- could you clone your current wiki and populate the result with dummy data?
I love the idea of having a global place to discuss citations -- ALL citations -- something that OpenLibrary, the arXiv, and anyone else hosting cited documents could point to for every one of its works.
Sam.
On Mon, Jul 19, 2010 at 9:37 PM, Samuel Klein meta.sj@gmail.com wrote:
Brian,
The meta process for new project proposals is still the cleanest one for suggesting a specific Project and presenting it alongside similar projects.
It would be helpful if you could update a related project proposal on meta -- say, [[m:WikiBibliography]], if that seems relevant. (I just cleaned that page up and merged in an older proposal that had been obfuscated.)
Thanks for your work on this - definitely in the right direction! I will consider whether I feel it's the right way for me to get started. One point is that I am pointing more in the direction of a long-form proposal, and I have more experience writing white-paper proposals for academia. I certainly want it to end up on wiki, but when TPTB finally read the proposal perhaps they will find it more persuasive if it is a professional looking document that lands in their inbox.
Or you can create a new project proposal... WikiCite as a name can be confusing, since it has been used to refer to this bibliographic idea, but also to refer to the idea of citations for every statement or fact - something closer to a blame or trust solution that includes citations in its transactions.
Another name that I have come up with is OpenScholar. I still rather like it, but suspect it has too much of a scientific ring to it? Names are certainly very important so we should do more work on this avenue. Including a list of names in the proposal would be a good idea, and perhaps the final name will be a combination of existing name proposals.
We should figure out how this project would work with acawiki, and possibly bibdex. Bibdex doesn't aim to And it would be helpful to have a publicly-viewable demo to play with -- could you clone your current wiki and populate the result with dummy data?
The problem with WikiPapers is that it has too many features! A feature-thin version would be ideal for the proposal though, so I will plan to have some kind of a demo site available.
I love the idea of having a global place to discuss citations -- ALL citations -- something that OpenLibrary, the arXiv, and anyone else hosting cited documents could point to for every one of its works.
Exactly :)
Brian
-- Samuel Klein identi.ca:sj w:user:sj
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Hi Brian and others,
Interesting project. At WikiSym and Wikimania there were some discussions on the issue of bibliographic databases - and more generally about structured data in wikis - and I mentioned your project briefly in my talk. Daniel Kinzler (who might be on this mailing list) showed some initial efforts at bibliographic databasing in Wikipedia. He did not reveal much (I don't know if it is appropriate to tell about Daniel's project - but now I have done it anyway...). I have started to build a bibliographic wiki (Brede Wiki) that is entirely separate from Wikimedia. It is available from here:
My Wikimania talk about that wiki and related issues is available here (the video may come later):
http://commons.wikimedia.org/wiki/File:Finn_%C3%85rup_Nielsen_-_Wikipedia_is...
I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example). Furthermore, I find that scientific articles in particular often contain data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is what I also do in the Brede Wiki: I use the templates to store such data. So if a system such as yours is implemented, we should not think of it as just a bibliographic database but in broader terms: a data wiki.
Your system and mine share some similarities. Here are some differences:
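Two-way citation tracking can be derived mechanically from the forward "cites" lists. The following Python sketch shows one way to compute the reverse ("cited_by") direction; the function name invert_citations is hypothetical, and the "DummyKey10" entry is a made-up placeholder, not a real paper.

```python
from collections import defaultdict


def invert_citations(cites):
    """Map each cited key to the sorted list of keys that cite it.

    `cites` maps a citing key to the list of keys it cites.
    """
    cited_by = defaultdict(set)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].add(source)
    return {key: sorted(sources) for key, sources in cited_by.items()}


cites = {
    # A few of the forward citations from the example entry.
    "KangHsuKrajbichEtAl09": ["Cowan95", "Wise04", "Baddeley92"],
    # Dummy citing entry, for illustration only.
    "DummyKey10": ["KangHsuKrajbichEtAl09", "Cowan95"],
}
cited_by = invert_citations(cites)
```

Computing the reverse direction this way means only the forward list has to be maintained by hand.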
As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.
I have one field for each author so that I can automatically link authors. I use author1, author2, etc. fields. Likewise for URLs: url1, url2, etc. In this way I can also 'database' authors, i.e., I have a wiki page for each author (regardless of notability). Journals, organizations and events are also available in my wiki.
I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers regard the copyright for abstracts. Nor am I sure about the forward cites. Most commercial publishers hide the cites from non-paying viewers. Including cites in CC-by-sa material on a large scale may infringe publishers' copyright. Perhaps it is possible to negotiate with some publishers. We need to talk with 'closed access' publishers before we add such data.
I am not sure what 'owner' is in your format. Surely you can't have owners in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the revision history.
We probably need to check the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.
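As a rough check of how directly the template fields map onto one of these formats, here is a minimal Python sketch that renders a record as a BibTeX entry. The function name to_bibtex and the field order are assumptions made for illustration; the sketch does no LaTeX escaping and assumes values contain no unbalanced braces.

```python
def to_bibtex(key, fields, entry_type="article"):
    """Render a citation record as a BibTeX entry string.

    Assumption: no LaTeX escaping is needed for the given values.
    """
    order = ["author", "title", "journal", "year", "volume",
             "number", "pages", "publisher", "url"]
    lines = ["@%s{%s," % (entry_type, key)]
    for name in order:
        if name in fields:
            lines.append("  %s = {%s}," % (name, fields[name]))
    lines.append("}")
    return "\n".join(lines)


record = {
    "author": ("Kang M.J. and Hsu M. and Krajbich I.M. and Loewenstein G. "
               "and McClure S.M. and Wang J.T. and Camerer C.F."),
    "title": "The Wick in the Candle of Learning",
    "journal": "Psychological Science",
    "year": "2009",
    "volume": "20",
    "number": "8",
    "pages": "963",
    "publisher": "SAGE Publications",
}
entry = to_bibtex("KangHsuKrajbichEtAl09", record)
```

The "and"-separated author string already matches BibTeX's own author convention, so that field passes through unchanged; other target formats would likely need per-author fields.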
As I understand it, there are issues with Semantic MediaWiki with respect to performance and security that need to be resolved before a large-scale deployment within Wikimedia Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW, so there might be some changes to SMW in the future. A code audit of SMW is lacking.
It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (in the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning
I would say that a page called:
http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning
would be the way to do it. But that would never pass the deletionists. :-)
/Finn
___________________________________________________________________
Finn Aarup Nielsen, DTU Informatics, Denmark Lundbeck Foundation Center for Integrated Molecular Brain Imaging http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/ ___________________________________________________________________
On Tue, Jul 20, 2010 at 8:06 AM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
.. It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (in the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning
I would say that a page called:
http://en.wikipedia.org/wiki/The_wick_in_the_candle_of_learning
would be the way to do it. But that would never pass the deletionists. :-)
French Wikipedia already has a namespace dedicated to pages about references.
http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Index
There is quite a bit of activity in this namespace:
http://fr.wikipedia.org/w/index.php?namespace=104&tagfilter=&title=S...
English Wikipedia has a few groups of citation pages with bots that fill in the details.
http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_doi http://en.wikipedia.org/wiki/Special:PrefixIndex/Template:cite_pmid
-- John Vandenberg
Hi all
A central place for managing bibliographic data for use with citations is something that has been discussed by the German community for a long time. To me, it consists of two parts: a project for managing the structured data, and a mechanism for using that data on the wikis.
I have been working on the latter recently, and there's a working prototype: on http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion you can see how data records can be included from external sources. A demo for the actual on-wiki use can be found at http://prototype.wikimedia.org/wmde-sandbox-1/Ameisenigel#Literatur, where {{ISBN|0868400467}} is used to show the bibliographic info for that book. (side note: the prototype wikis are slow. sorry about that).
Fetching and showing the data is done using http://www.mediawiki.org/wiki/Extension:DataTransclusion. Care has been taken to make this secure and scalable.
For a first demo, I'm using the ISBN as the key, but any kind of key could be used to reference resources other than books.
To demo managing the data ourselves, I have set up an SMW instance. An example bib record is at http://prototype.wikimedia.org/wmde-bib/ISBN:0451526538, and it's used across wikis at http://prototype.wikimedia.org/wmde-sandbox-1/Wikipedia:DataTransclusion. Note that changes will show up with a delay, as the data is cached for a while.
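The delay mentioned above is the usual consequence of time-based caching. The following Python sketch illustrates the behaviour with a generic time-to-live cache in front of a record fetcher. It is not how the DataTransclusion extension is implemented; the class name CachedRecordSource, the fake backend, and the record contents are all assumptions made for illustration.

```python
import time


class CachedRecordSource:
    """Wrap a fetch function with a simple time-to-live (TTL) cache."""

    def __init__(self, fetch, ttl_seconds, clock=time.time):
        self._fetch = fetch          # callable: key -> record
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the demo can fake time
        self._cache = {}             # key -> (expiry_time, record)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # still fresh: serve the cached copy
        record = self._fetch(key)    # missing or expired: fetch again
        self._cache[key] = (now + self._ttl, record)
        return record


# Demo with a fake backend and a fake clock (dummy data).
backend = {"ISBN:0451526538": {"title": "first version"}}
now = [0.0]
source = CachedRecordSource(
    lambda key: dict(backend[key]), ttl_seconds=60, clock=lambda: now[0]
)

first = source.get("ISBN:0451526538")
backend["ISBN:0451526538"]["title"] = "edited version"
stale = source.get("ISBN:0451526538")   # edit not visible yet: cached copy
now[0] = 61.0
fresh = source.get("ISBN:0451526538")   # TTL elapsed: edit becomes visible
```

The trade-off is between load on the data source and how quickly edits propagate to the citing wikis.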
When discussing these things, please keep in mind that there are two components: fetching and displaying external data records, and managing structured data in a wiki style. The former is much simpler than the latter. I think we should really aim at getting both, but we can start off with transcluding external data much sooner, if we allow not-so-wiki data sources. For ISBN-based queries, we could simply fetch information from http://openlibrary.org - or the open knowledge foundation's http://bibliographica.org, once it's working.
In the context of bibdex, I recommend also having a look at http://bibsonomy.org - it's a university research project, open source, and is quite similar to bibdex (and to what citeulike used to be).
As to managing structured data ourselves: I have talked a lot with Erik Möller and Markus Krötzsch about this, and I'm in touch with the people who make DBpedia and OntoWiki. Everyone wants this. But it's not simple at all to get it right (efficient versioning of multilingual data in a document oriented database, anyone? want inference? reasoning, even? yay...). So the plan is currently to hatch a concrete plan for this. And I imagine that bibliographical and biographical info will be among the first use cases.
cheers, daniel
Hi Daniel,
Have you considered that Lucene is the perfect backend for this kind of project? What kinds of faults do you see with it? At least in my mind, we can mold it to our needs here. It has the core capabilities found in Semantic MediaWiki, and it is fast and scalable.
I say this as a serious user of Semantic MediaWiki. I have seen that it can't scale well without an alternate backend, and I wonder what kind of monumental effort will be required to make it scale to tens or hundreds of millions of documents, each containing 20-50 properties. Lucene can already do this; SMW, not so much ;-)
Brian
cheers, daniel
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
Hi Brian and others,
I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example). Furthermore, I find that in science articles in particular we often find data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is what I also do in the Brede Wiki. I use the templates to store such data. So if a system such as yours is implemented, we should not think of it as just a bibliographic database but in broader terms: a data wiki.
Although the technology required to make a WikiCite happen will be applicable to a more generalized wiki for storing data I think that is too broad for the current proposal. A WMF analogue to Google Base is an entirely new beast that has its own requirements. I certainly think it's an interesting and worthwhile idea, but I don't feel that we are there yet.
As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.
I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".
DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.
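The Author1Author2Author3EtAlYYb format can be sketched roughly as follows. This is a minimal illustration, not the WikiPapers code: the function name, the `taken` set of already-used keys, and the choice to hand out suffixes in order of insertion are all assumptions.

```python
def make_key(surnames, year, taken):
    """Build a key such as DecoRolls05, adding b, c, ... on collision.

    surnames: author last names in publication order.
    taken:    set of keys already in use (hypothetical bookkeeping).
    """
    stem = "".join(surnames[:3])
    if len(surnames) > 3:
        stem += "EtAl"
    base = f"{stem}{year % 100:02d}"  # two-digit year, as in the YY format
    if base not in taken:
        return base
    # The first entry keeps the bare key; later ones get b, c, d, ...
    for letter in "bcdefghijklmnopqrstuvwxyz":
        if base + letter not in taken:
            return base + letter
    raise ValueError(f"no free suffix left for {base}")


taken = set()
for _ in range(3):
    key = make_key(["Deco", "Rolls"], 2005, taken)
    taken.add(key)
    print(key)
```

Run in sequence for three papers by the same pair in 2005, this yields the three keys from the example above.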
I have one field to each author so that I can automatically link authors.
This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.
I do not include abstracts in my CC-by-sa'ed wiki, since I am not sure how publishers regard the copyright for abstracts. Nor am I sure about the forward cites. Most commercial publishers hide the cites for unpaid viewing. Including cites in CC-by-sa material on a large scale may infringe publishers' copyright. Perhaps it is possible to negotiate with some publishers. We need some talk with 'closed access' publishers before we add any such data.
Yes, I have added many nice features to WikiPapers that can unfortunately not make it into the proposed WMF project. Some can, some can't. For example, adding papers to the wiki is via a one click bookmarklet. First, you highlight the title of a paper anywhere on the web, be it a webpage, e-mail, or journal site. Then, you click your "Add to wiki" bookmarklet. On my webserver I am running the citation scraping software from Connotea, CiteULike, and Zotero. I also have a Google Scholar scraper and PubMed importer. You can choose to use one of those sources, or you can choose to merge all of the metadata together. It's automatically added to the wiki for you. Additionally, I have written a bash script that is very adept at getting the pdfs from journals, so it automatically tries to download the pdf and upload it to the wiki for you. I have also implemented the ability to compute the articles that an article cites, and vice versa. With respect to abstracts these scrapers aren't that great. Abstracts usually come from PubMed, whose database you can license, but you cannot change their metadata IIRC.
Ultimately, I think the community will have to take a very careful look at what data can be added to the wiki and design policies accordingly. On Wikipedia I believe copyright enforcement has largely been up to the community, and it takes a long time to converge on appropriate policies. Needless to say, many of the technologies I described in the last paragraph would not be considered legal on a public wiki.
I am not sure what 'owner' is in your format. Surely you can't have owners in a Wikimedia/MediaWiki wiki? And 'dateadded' would already be recorded in the revision history.
The 'owner' field is a misnomer, but in lieu of MySQL support it lets you know which individuals have that entry in their personal bibliographies. dateadded is needed due to what at least used to be a bug in Semantic MediaWiki.
We probably need to check on the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.
I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/
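To illustrate why BibTeX and template syntax map onto each other so easily, here is a minimal sketch that renders the same field dictionary both ways. It is not the interchange code mentioned above. The template name `cite`, the field names, and the placeholder values are hypothetical, and real BibTeX handling (escaping, nested braces, name lists) is left out.

```python
def to_template(name, fields):
    """Render fields as a MediaWiki template call, one parameter per line."""
    lines = ["{{" + name]
    lines += [f"|{key}={value}" for key, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


def to_bibtex(entry_type, cite_key, fields):
    """Render the same fields as a BibTeX entry."""
    body = ",\n".join(f"  {key} = {{{value}}}" for key, value in fields.items())
    return f"@{entry_type}{{{cite_key},\n{body}\n}}"


# Placeholder record; the values are illustrative, not real metadata.
record = {"author": "Deco, G. and Rolls, E. T.", "year": "2005"}
print(to_template("cite", {"key": "DecoRolls05", **record}))
print(to_bibtex("article", "DecoRolls05", record))
```

Both outputs are flat key/value lists wrapped in a delimiter, which is the similarity that makes templates a workable pivot format.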
As I understand it, there are issues with Semantic MediaWiki with respect to performance and security that need to be resolved before a large-scale deployment within Wikimedia Foundation projects. I heard that Markus Krötzsch is going to Oxford to work on core SMW, so there might be some changes to SMW in the future. A code audit of SMW is lacking.
As I was writing a custom Lucene search engine for WikiPapers I realized that it is a perfect replacement for Semantic MediaWiki. Lucene has fields, it supports boolean operators and you can format its output. All that is needed is to write the Lucene backend (perhaps just modifying MWLucene) and write a parser function that supports using templates for formatting of the output of queries. Lucene is extremely fast and can scale to whatever we can imagine doing. That's my proposed plan.
It is not 'necessarily necessary' to make a new Wikimedia project. There has been a suggestion (in the meta or strategy wiki) just to use a namespace in Wikipedia. You could then have a page called http://en.wikipedia.org/wiki/Bib:The_wick_in_the_candle_of_learning
I believe it is necessary. First, the idea is for any MediaWiki anywhere (and any software with appropriate extensions) to be able to cite the same source. Second, the project would be multilingual.
Cheers,
Brian Mingus Graduate Student Computational Cognitive Neuroscience Lab University of Colorado at Boulder
Hi Brian,
On 20 Jul 2010, at 18:02, Brian J Mingus wrote:
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
Hi Brian and others,
As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.
I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".
DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.
Citation keys of this sort work, but they have to be decided on by some external system. Who decides which paper is -, b, and c? Publication order would be one way to do it -- but that's complicated, especially with online first publication, or overlapping conferences.
I think whether they're memorable tokens might vary by person... Sure, the author and year will be identifiable, even memorable. But the a, b, c?
If you want to support more than recent works, I'd urge YYYY instead of YY. Then we only have an issue for pre-0 stuff. :)
Also consider differentiating authors from title and year, perhaps with slashes: author1-author2-author3-etal/YYYY/b. I'm not convinced that -'s are better than capital letters (author last names can have both)...
I have one field to each author so that I can automatically link authors.
This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.
Sure -- unless authors have the same name, or use different forms of the name.
One of my coauthors goes by John G. Breslin for disambiguation since his name is common -- but on the institute website he's credited as John Breslin, since that's the only name the system recognizes.
In other words, some authority control will be needed. Libraries have a long history with this. Groups of booklovers do it, too. For instance, here's the LibraryThing page for John Smith: http://www.librarything.com/author/smithjohn Notice that you can split and join authors -- LibraryThing's way of giving users the ability to join and separate. Or see http://www.librarything.com/author/carrolllewis Sometimes there are difficult questions -- such as "Is Lewis Carroll the same as Charles Dodgson?" - which depends on what you mean by "same".
For the scope of the potential problem, look at highly published authors -- for instance the "alternative names" list for Dante: http://www.worldcat.org/identities/lccn-n78-95495
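A very small sketch of what authority control means in data terms: name variants map to a stable author identifier, and records can be joined later. The class, its method names, and the identifiers are hypothetical. Matching on normalized strings alone cannot separate two different people who share a name, which is exactly why the explicit identifier is needed.

```python
class AuthorityFile:
    """Maps name variants to one canonical author id (illustrative only)."""

    def __init__(self):
        self._variant_to_id = {}

    @staticmethod
    def _normalize(name):
        # Lowercase, drop periods, collapse whitespace: "John G. Breslin" -> "john g breslin"
        return " ".join(name.lower().replace(".", " ").split())

    def register(self, author_id, *variants):
        for variant in variants:
            self._variant_to_id[self._normalize(variant)] = author_id

    def resolve(self, name):
        """Return the author id for a name variant, or None if unknown."""
        return self._variant_to_id.get(self._normalize(name))

    def join(self, keep_id, merge_id):
        """Point every variant of merge_id at keep_id (a 'join' of two authors)."""
        for variant, author_id in list(self._variant_to_id.items()):
            if author_id == merge_id:
                self._variant_to_id[variant] = keep_id


authors = AuthorityFile()
authors.register("breslin-john-g", "John G. Breslin", "John Breslin")
authors.register("carroll-lewis", "Lewis Carroll")
authors.register("dodgson-charles", "Charles Dodgson")
authors.join("carroll-lewis", "dodgson-charles")  # a community decision, not a fact of the data
print(authors.resolve("john breslin"), authors.resolve("Charles Dodgson"))
```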
We probably need to check on the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.
I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/
BibTeX is good for backwards compatibility, but I'd urge a richer data format -- probably based on bibo RDF: http://bibliontology.com/ It's already widely used: http://bibliontology.com/projects
I think somebody's mentioned OpenLibrary on this thread. In case not: http://openlibrary.org/ Its scope is limited to books, but their interests are similar.
-Jodi
On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider jodi.schneider@deri.org wrote:
Hi Brian,
On 20 Jul 2010, at 18:02, Brian J Mingus wrote:
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
Hi Brian and others,
As the 'key' (the wiki page title) I use the (lowercase) title of the article. That might be more reader friendly - but usually longer. I think that KangHsuKrajbichEtAl09 is too camel-cased. Neither the title nor author list + year will be unique, so we need some predictable disambig.
I noticed that AcaWiki is using the title, but I am personally not a fan of it. The motivation for using a key comes from BibTeX. When you cite an entry in a publication in LaTeX, you type \cite{key}. Also, I think most bibliographic formats support such a key. The idea is that there is a universal token that you can type into Google that will lead you to the right item. The predictable disambig is in the format I sent out (which likely needs modification for other kinds of sources). The format is Author1Author2Author3EtAlYYb. Here is a real world example from a pair of very prolific scientists, Deco & Rolls, who published at least three papers together in 2005. In our lab we have really come to love these keys - they are very memorable tokens that you can verbally pass on to other scientists in the midst of a discussion. Eventually, if they enter the key you have given them into Google, they will get the right entry at "WikiCite".
DecoRolls05 - Synaptic and spiking dynamics underlying reward reversal in the orbitofrontal cortex.
DecoRolls05b - Sequential memory: a putative neural and synaptic dynamical mechanism.
DecoRolls05c - Attention, short-term memory, and action selection: a unifying theory.
Citation keys of this sort work, but they have to be decided on by some external system. Who decides which paper is -, b, and c? Publication order would be one way to do it -- but that's complicated, especially with online first publication, or overlapping conferences.
I think whether they're memorable tokens might vary by person... Sure, the author and year will be identifiable, even memorable. But the a, b, c?
If you want to support more than recent works, I'd urge YYYY instead of YY. Then we only have an issue for pre-0 stuff. :)
Also consider differentiating authors from title and year, perhaps with slashes: author1-author2-author3-etal/YYYY/b. I'm not convinced that -'s are better than capital letters (author last names can have both)...
The key seems to be a very important point, so it's important that we get it right. My thinking is guided by several constraints. First, I strongly dislike the numeric keys used at sites such as CiteULike and most database sites (such as 7523225). To the greatest degree possible I believe the key should actually convey what is behind the link. On the other hand, the key should not be too long. Numeric keys maximize shortness while telling you nothing, whereas titles as keys are very long and don't give you some of the most important information - the authors and the year it was published. The key format I have suggested does seem to have a flaw, namely that it easily becomes ambiguous and you must then resort to a token that is not easily memorable. Then again, even though many authors and sets of authors will publish multiple items in a year, the vast majority of works have a unique set of authors for a given year.
I like your suggestion that the abc disambiguator be chosen based on the first date of publication, and I also like the prospect of using slashes since they can't be contained in names. Using the full year is a good idea too. We can combine these to come up with a key that, in principle, is guaranteed to be unique. This key would contain:
1) The first three author names separated by slashes
2) If there are more than three authors, an EtAl
3) Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar. If there are multiple publications on the same day, we can resort to abc. Redirects and disambiguation pages can be set up when a key changes.
Since the slashes are somewhat cumbersome, perhaps we can make them optional and use them only when they are necessary in order to "escape" a name. In the case that one of the authors does not have a slash in their name - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Author1Author2/Author-Three/2009
Author1Author2AuthorThree10032009
Author1Author2AuthorThree12312009
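One possible reading of this algorithm, as a hedged Python sketch. The function names and the `taken` set are assumptions, and so is the escaping rule (wrap a name in slashes when it contains anything other than letters and digits), which was chosen to reproduce the example keys. The sketch only proposes a key for the newly added source. Renaming the earlier YYYY key and leaving a redirect behind is not shown.

```python
from datetime import date

SUFFIXES = "bcdefghijklmnopqrstuvwxyz"


def author_stem(surnames):
    """CamelCase the first three surnames, escaping awkward ones with slashes."""
    parts = [name if name.isalnum() else f"/{name}/" for name in surnames[:3]]
    stem = "".join(parts).replace("//", "/")  # adjacent escapes share one slash
    return stem + ("EtAl" if len(surnames) > 3 else "")


def propose_key(surnames, published, taken):
    """Try YYYY first, then MMDDYYYY, then MMDDYYYY plus a letter."""
    stem = author_stem(surnames)
    year_key = f"{stem}{published:%Y}"
    if year_key not in taken:
        return year_key
    dated_key = f"{stem}{published:%m%d%Y}"
    if dated_key not in taken:
        return dated_key
    for letter in SUFFIXES:
        if dated_key + letter not in taken:
            return dated_key + letter
    raise ValueError(f"no free key for {dated_key}")


print(propose_key(["Author1", "Author2", "Author-Three"], date(2009, 1, 1), set()))
print(propose_key(["Author1", "Author2", "AuthorThree"], date(2009, 10, 3),
                  {"Author1Author2AuthorThree2009"}))
```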
I have one field to each author so that I can automatically link authors.
This is accomplished via Semantic Forms, using the arraymap parser function. You just provide a comma-separated list of authors, and they each get semantic property definitions and deep linking to all papers published by that author.
Sure -- unless authors have the same name, or use different forms of the name.
One of my coauthors goes by John G. Breslin for disambiguation since his name is common -- but on the institute website he's credited as John Breslin, since that's the only name the system recognizes.
In other words, some authority control will be needed. Libraries have a long history with this. Groups of booklovers do it, too. For instance, here's the LibraryThing page for John Smith: http://www.librarything.com/author/smithjohn Notice that you can split and join authors -- LibraryThing's way of giving users the ability to join and separate. Or see http://www.librarything.com/author/carrolllewis Sometimes there are difficult questions -- such as "Is Lewis Carroll the same as Charles Dodgson?" - which depends on what you mean by "same".
For the scope of the potential problem, look at highly published authors -- for instance the "alternative names" list for Dante: http://www.worldcat.org/identities/lccn-n78-95495
LibraryThing is a great example of how to do disambiguation. We can only hope that we can likewise someday have a user community as pedantic and dedicated as theirs ;-) A big part of their success is in providing their users with straightforward tools for doing the disambig work.
We probably need to check on the final format of the bibliographic template to make sure it is easily translatable to the most common bibliographic formats: bibtex, refman, Z3988 microformat, pubmed, etc.
I have written extensive amounts of Python interchange code between wiki template syntax and BibTeX. I chose BibTeX because it is rather standard, our lab uses it, and it is very similar to template syntax. Also, I use Bibutils to convert from BibTeX to most popular formats, and vice versa for mass import of bibliographies: http://www.scripps.edu/~cdputnam/software/bibutils/
BibTeX is good for backwards compatibility, but I'd urge a richer data format -- probably based on bibo RDF: http://bibliontology.com/ It's already widely used: http://bibliontology.com/projects
It was probably a mistake for me to describe WikiPapers as designed around BibTeX. In fact, it's designed around MediaWiki templates. Starting from templates, you can support any other format for both import and export.
Cheers,
Brian Mingus Graduate Student Computational Cognitive Neuroscience Lab University of Colorado at Boulder
Hi guys! I'm glad my little post helped re-start such a productive conversation.
Since some people are replying only to the research-l list and some to both research-l and foundation-l (my fault for cc'ing both) maybe we should centralize this discussion (at least of the nitty gritty metadata issues) on the research list for now? thread here: http://lists.wikimedia.org/pipermail/wiki-research-l/2010-July/thread.html
Of course the perennial issue of how to propose a new WMF project is very much a foundation-l topic.
regards, phoebe
On Tue, Jul 20, 2010 at 12:26 PM, Brian J Mingus Brian.Mingus@colorado.edu wrote:
On Tue, Jul 20, 2010 at 11:56 AM, Jodi Schneider jodi.schneider@deri.org wrote:
Hi Brian, On 20 Jul 2010, at 18:02, Brian J Mingus wrote:
On Mon, Jul 19, 2010 at 4:06 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
Hi Brian and others,
I also think that some bibliographic support would be interesting, for two-way citation tracking and commenting on articles (for example). Furthermore, I find that in science articles in particular we often find data that is worth structuring and putting in a database or a structured wiki, so that we can extract the data for meta-analysis and specialized information retrieval. That is what I also do in the Brede Wiki. I use the templates to store such data. So if a system such as yours is implemented, we should not think of it as just a bibliographic database but in broader terms: a data wiki.
On Tue, Jul 20, 2010 at 9:26 PM, Brian J Mingus Brian.Mingus@colorado.edu wrote:
I like your suggestion that the abc disambiguator be chosen based on the first date of publication, and I also like the prospect of using slashes since they can't be contained in names. Using the full year is a good idea too. We can combine these to come up with a key that, in principle, is guaranteed to be unique. This key would contain:
- The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and don't cause problems with wiki page titles.
- If there are more than three authors, an EtAl
don't think that's necessary if we get the abc part right.
- Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar.
I don't think it is a good idea to change one key as a function of updates on another, except for a generic disambiguation tag.
If there are multiple publications on the same day, we can resort to abc. Redirects and disambiguation pages can be set up when a key changes.
As Jodi pointed out already, the exact date is often not clearly identifiable, so I would go simply for the year. Instead of an alphabetic abc, one could use some function of the article title (e.g. the first three words thereof, or the initials of the first three words), always in lower case.
An even less ambiguous abc would be starting page (for printed stuff) or article number (for online only) but this brings us back to the 7523225 problem you mentioned above.
Since the slashes are somewhat cumbersome, perhaps we need not make them mandatory, but similarly use them only when they are necessary in order to "escape" a name. In the case that one of the authors does not have a slash in their name - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in or Kang+Hsu+Krajbich+2009+twi
Also note that the CamelCase key does not yield results in a Google search, whereas the first plussed variant brings up the right work, and the plussed one with the initialed title tends to bring up at least something written by or cited from these authors.
Author1Author2/Author-Three/2009
Author1+Author2+Author-Three+2009+just+another+article or Author1+Author2+Author-Three+2009+jat
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
Daniel
- The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL? Also note that using plusses in page names doesn't work with all server configurations, since plus has a special meaning in URLs.
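A small Python illustration of the plus problem, using only the standard library's urllib.parse. It shows the general form-encoding convention in which a raw "+" decodes to a space; how any particular web server or MediaWiki setup treats it is not modelled here and would need checking.

```python
from urllib.parse import quote, unquote, unquote_plus

key = "Kang+Hsu+Krajbich+2009+twi"

# Percent-encoding the key protects the pluses: each "+" becomes "%2B".
encoded = quote(key, safe="")
print(encoded)

# Decoding the percent-encoded form gives the key back unchanged.
print(unquote(encoded))

# A decoder that follows the form-encoding convention reads a raw "+" as a space,
# so the unencoded key silently turns into a different title.
print(unquote_plus(key))
```

So a plussed key would have to be percent-encoded wherever it appears in a URL, which costs some of the legibility that made it attractive.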
- Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar.
I don't think it is a good idea to change one key as a function of updates on another, except for a generic disambiguation tag.
I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way around, please.
Since the slashes are somewhat cumbersome, perhaps we need not make them mandatory, but similarly use them only when they are necessary in order to "escape" a name. In the case that one of the authors does not have a slash in their name - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in or Kang+Hsu+Krajbich+2009+twi
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive three-word suffix.
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB", etc. In the case of Chinese names, it's often not easy to decide which part is the last name.
To avoid this kind of ambiguity, I suggest automatically applying some type of normalization and/or hashing. There is quite a bit of research about this kind of normalization out there, generally with the aim of detecting duplicates. Perhaps we can learn from bibsonomy.org; have a look at how they do it: http://www.bibsonomy.org/help/doc/inside.html.
Gotta love open source university research projects :)
-- daniel
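As a rough idea of what "normalization and/or hashing" could look like, here is a Python sketch built on the standard unicodedata and hashlib modules. It is not BibSonomy's actual scheme (see their documentation for that); normalize_name, record_hash, the field order and the 12-character truncation are all assumptions made for this example.

```python
import hashlib
import unicodedata


def normalize_name(name):
    """Case-fold, strip diacritics, and drop everything that is not a letter or digit."""
    folded = name.casefold()                        # also maps "ß" to "ss"
    decomposed = unicodedata.normalize("NFKD", folded)
    # After NFKD, accents are separate combining marks, which isalnum() rejects.
    return "".join(ch for ch in decomposed if ch.isalnum())


def record_hash(surnames, year, title):
    """Short hash over normalized fields, intended only for spotting likely duplicates."""
    fields = [normalize_name(s) for s in surnames] + [str(year), normalize_name(title)]
    return hashlib.sha1("|".join(fields).encode("utf-8")).hexdigest()[:12]


print(normalize_name("Möller"), normalize_name("Moller"))    # same normalized form
print(normalize_name("Voß"), normalize_name("Voss"))         # same normalized form
print(normalize_name("Moeller"))                             # still different
```

This merges Möller with Moller and Voß with Voss, but not Moeller, Vosz or VoB, and letters such as ø and æ pass through unchanged. Catching those would need language-specific transliteration rules or fuzzy matching on top.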
On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in
This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive three-word suffix.
While that's a good idea, then we'd have to know all "indistinctive" words in all languages. (Die, Der, La, L', ...)
There are still going to be duplicates, alas...
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB", etc. In the case of Chinese names, it's often not easy to decide which part is the last name.
To avoid this kind of ambiguity, I suggest automatically applying some type of normalization and/or hashing. There is quite a bit of research about this kind of normalization out there, generally with the aim of detecting duplicates. Perhaps we can learn from bibsonomy.org; have a look at how they do it: http://www.bibsonomy.org/help/doc/inside.html.
Good idea!
-Jodi
Jodi Schneider schrieb:
On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in
This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive three-word suffix.
While that's a good idea, then we'd have to know all "indistinctive" words in all languages. (Die, Der, La, L', ...)
Stopword lists for major languages exist, and where they don't, they are easily created, even automatically. Word frequency analysis on a few megabytes of text is cheap these days :)
-- daniel
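A toy Python sketch of the frequency-based approach: count words in a text sample and treat the most frequent ones as stopwords, then skip leading stopwords when picking title words for a key. The function names, the top_n cut-off and the one-line corpus are assumptions for illustration; a real list would come from megabytes of text per language.

```python
import re
from collections import Counter

WORD = re.compile(r"[^\W\d_]+")  # runs of letters, in any script


def frequent_words(text, top_n):
    """Return the top_n most frequent words of a text sample as a candidate stopword set."""
    counts = Counter(WORD.findall(text.lower()))
    return {word for word, _ in counts.most_common(top_n)}


def distinctive_words(title, stopwords, n=3):
    """First n title words after skipping any leading stopwords."""
    words = WORD.findall(title.lower())
    start = 0
    while start < len(words) and words[start] in stopwords:
        start += 1
    return words[start:start + n]


# Tiny made-up sample standing in for a real corpus.
sample = "the cat and the dog and the bird"
stopwords = frequent_words(sample, top_n=2)
print(sorted(stopwords))
print(distinctive_words("The wick in", stopwords))
```

Only leading stopwords are skipped here, which matches the suggested convention of ignoring a leading "the" or "a"; whether to drop stopwords further into the title is a separate decision.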
On Wed, 21 Jul 2010, Jodi Schneider wrote:
On 21 Jul 2010, at 09:42, Daniel Kinzler wrote:
Kang+Hsu+Krajbich+2009+the+wick+in
This seems best to me of what's proposed so far.
Both seem good, though I would suggest adopting a convention of ignoring any leading "the" and "a", to get a more distinctive three-word suffix.
While that's a good idea, then we'd have to know all "indistinctive" words in all languages. (Die, Der, La, L', ...)
There are still going to be duplicates, alas...
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB", etc. In the case of Chinese names, it's often not easy to decide which part is the last name.
I have a large bibtex file where I (mostly) use Surname + one initial + year + first important word (http://neuro.imm.dtu.dk/software/lyngby/doc/lyngby.bib)
So for example: AaltoS2002Neuroanatomical
There are lots of special cases
"M. C. B. {\AA}berg" becomes AbergM2006Multivariate (transliterate Å)
"Anissa Abi-Dargham" becomes AbiDarghamA2000Measurement (discard dash).
ACM computer classification system "ACM1998Computing" (an organization as an author: do you use 'association' or 'ACM'?)
"A Content-Driven Reputation System for the {Wikipedia}" -> AdlerB2007ContentDriven (discarding the hyphen in the title and camelcasing)
"$[^{15}$O$]$water {PET}: More ``Noise'' than Signal?" -> StrotherS1996Owater (here we have square brackets that will be a problem in wiki text. I suppose that in chemistry it becomes even worse)
"On the Distribution of the Quotient of two chance variables" becomes CurtissJ1941On (as 'On' here is not regarded as a stopword).
"Modelling the fMRI response using smooth FIR filters" -> NielsenF2001ModelingfMRI (extra word because of a collision with "Modeling of locations in the {BrainMap} database: Detection of outliers")
With 3 author + year + title you sometimes run into collisions:
author = {J. M. Ollinger and Gordon L. Shulman and M. Corbetta}, title = {Separating Processes within a Trial in Event-Related Functional {MRI}. {II}. Analysis},
author = {J. M. Ollinger and Gordon L. Shulman and M. Corbetta}, title = {Separating Processes within a Trial in Event-Related Functional {MRI}. {I}. The Method},
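For comparison, a Python sketch of the "surname + one initial + year + first important word" convention, followed by a check of the Ollinger collision. The helper names and the small STOPWORDS set are assumptions for this example, and the many special cases listed above (organizations as authors, TeX markup, transliteration) are deliberately not handled.

```python
STOPWORDS = frozenset({"a", "an", "the"})  # assumed list; "on" is deliberately absent


def clean(word):
    """Keep only letters and digits, so 'Abi-Dargham' becomes 'AbiDargham'."""
    return "".join(ch for ch in word if ch.isalnum())


def first_important_word(title, stopwords=STOPWORDS):
    """First non-stopword of the title, with hyphenated parts CamelCased."""
    for word in title.split():
        if word.lower() not in stopwords:
            parts = [clean(p) for p in word.split("-")]
            return "".join(p[:1].upper() + p[1:] for p in parts)
    return ""


def surname_initial_key(surname, given, year, title):
    return clean(surname) + given[0].upper() + str(year) + first_important_word(title)


def title_suffix(title, n=3):
    """First n title words, lower-cased, as used in the plus-separated keys."""
    return "+".join(title.lower().split()[:n])


print(surname_initial_key("Adler", "B", 2007,
                          "A Content-Driven Reputation System for the Wikipedia"))
print(surname_initial_key("Curtiss", "J", 1941,
                          "On the Distribution of the Quotient of two chance variables"))

part_one = "Separating Processes within a Trial in Event-Related Functional MRI. I. The Method"
part_two = "Separating Processes within a Trial in Event-Related Functional MRI. II. Analysis"
# Same authors and same leading title words: the suffixes are identical.
print(title_suffix(part_one) == title_suffix(part_two))
```

The two Ollinger titles only diverge at the part number, so any key built from the leading title words needs an extra disambiguator for multi-part papers like these.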
When dealing with scientific articles it is not always possible to use the full given name, since sometimes you just know the initial.
I know one called Vibe Frøkjær. Presumably because she is afraid that PubMed and others will not be able to handle the Nordic letters, she writes her name as Vibe G. Frokjaer in scientific contexts. Other authors may write her name as Vibe G. Frøkjær.
Articles usually have only one edition. Sometimes you find reprinted versions here and there. For books there might be different versions, and you need to decide whether you want the key to refer to the 'Work', 'Expression', 'Manifestation' or 'Item', to use the wording from
http://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Recor...
The French Wikipedia has a page for each book title ('work' regardless of language and editions). Editions are listed with multiple infoboxes on the page. In this way there is not a one-to-one correspondence between wiki page and, say, ISBN. It seems best to me to have one page for a 'work' where you collect comments. However, in citations with page numbers you need the 'expression' because of page break differences between versions.
I like the French way, except that each book has two pages: One under the 'Reference' namespace and another under the 'Template' namespace.
The French tend to use "Title (authors)" as the key in the Reference namespace, mostly with full names:
http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Weaving_the_Web_(Tim_Berners-Lee)
But sometimes diverge a bit:
http://fr.wikipedia.org/wiki/R%C3%A9f%C3%A9rence:Theory_of_numbers_(HardyWright)
The associated template has a somewhat unpredictable name, e.g.,
http://fr.wikipedia.org/wiki/Mod%C3%A8le:HardyWright
They link in the template instantiations, e.g., "auteurs=[[Tim Berners-Lee]], Mark Fischetti" which I still don't like and would instead suggest:
author1=Tim Berners-Lee | author2=Mark Fischetti and templates [[{{{author1}}}]], [[{{{author2}}}]] or perhaps better for disambig [[{{{authorlink1}}}|{{{author1}}}]], [[{{{authorlink2}}}|{{{author2}}}]]. This way you allow for easier extraction and you do not need SMW array processing to distinguish the names.
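To illustrate the "easier extraction" point, here is a naive Python sketch that pulls the author names out of numbered parameters. extract_authors is a hypothetical helper; it splits on "|" and "=" only, so it is not a real wikitext parser and would break on pipes inside links or nested templates.

```python
import re


def extract_authors(params):
    """Return author names in numeric order from 'authorN=Name' parameters."""
    numbered = []
    for part in params.split("|"):
        name, sep, value = part.partition("=")
        match = re.fullmatch(r"author(\d+)", name.strip())
        if sep and match:
            numbered.append((int(match.group(1)), value.strip()))
    return [value for _, value in sorted(numbered)]


print(extract_authors("author1=Tim Berners-Lee | author2=Mark Fischetti"))
```

With a single "auteurs=" field, by contrast, a program has to strip the link brackets and guess where one name ends and the next begins, which gets ambiguous as soon as a name is written "Surname, Given".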
It seems to me that the French have come a long way. I am surprised that only John Vandenberg has pointed to the French efforts. I was not aware of them before.
Does anyone know anything about the French discussions on the introduction of the 'Reference' namespace? Should we just implement the French system on the English Wikipedia, and then we are there?
/Finn
Finn Aarup Nielsen, DTU Informatics, Denmark
Lundbeck Foundation Center for Integrated Molecular Brain Imaging
http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/
On Wed, Jul 21, 2010 at 5:49 AM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
Finn,
I'm not a fan of including a portion of the title, for a few reasons. First, it's not required to make the key unique. Second, it makes the key longer than necessary. Third, the first word or words from a title are not guaranteed to convey any meaning.
Regarding a Reference: namespace, I can see how this has some utility and why projects have moved to it. However, I consider it a stopgap solution that projects have implemented when what they really want is a proper wiki for citations. Here are a few quick things that you can't do (or would have to go out of your way to do) with just a Reference namespace that you can do with a wiki dedicated to all the world's citations:
- Custom reports that are boolean combinations of citation fields, à la SMW. This requires substantive new technology as SMW doesn't scale.
- User bibliographies which are a logical subset of all literature ever published.
- Conduct a search of the literature.
- A new set of policies that are not necessarily NPOV, regarding the creation of articles that discuss collections of literature (lit review-like concept). The content of these policies will emerge over years with the help of a community. These articles could, for instance, help people who are navigating a new area of a literature avoid getting stuck in local minima. They could point out the true global context to them. They could point out experimenter biases in the literature; for example, a recently published article found that citation networks in academic literature can tend to form around an assumption of authority when that authority is in fact false, bringing a whole thread of publications into doubt.
- Create wiki articles about individual sources.
While I am not dedicated to any of these things happening, I also do not wish to rule them out. The hope is that a new community will emerge around the project and guide it in the direction that is most useful. My hope in this thread is that we can identify some of the most likely cases and imagine what it will be like, so that we can convey this vision to the Foundation and they can get a sense of the potential importance of the project.
Brian
On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
Finn,
I'm not a fan of including a portion of the title, for a few reasons. First, it's not required to make the key unique. Second, it makes the key longer than necessary. Third, the first word or words from a title are not guaranteed to convey any meaning.
Regarding a Reference: namespace, I can see how this has some utility and why projects have moved to it. However, I consider it a stopgap solution that projects have implemented when what they really want is a proper wiki for citations. Here are a few quick things that you can't do (or would have to go out of your way to do) with just a Reference namespace that you can do with a wiki dedicated to all the world's citations:
- Custom reports that are boolean combinations of citation fields, à la SMW. This requires substantive new technology as SMW doesn't scale.
- User bibliographies which are a logical subset of all literature ever published.
Not sure why a Reference namespace couldn't do this.
- Conduct a search of the literature.
Or this (you can search just one namespace)
- A new set of policies that are not necessarily NPOV, regarding the creation of articles that discuss collections of literature (lit review-like concept). The content of these policies will emerge over years with the help of a community. These articles could, for instance, help people who are navigating a new area of a literature avoid getting stuck in local minima. It could point out the true global context to them. It could point out experimenter biases in the literature; for example, a recent article was published where it was found that citation networks in academic literature can have a tendency to form based on the assumption of authority, when in fact that authority is false, bringing a whole thread of publications into doubt.
I'm not sure that literature reviews belong in the same wiki as citations. That's definitely a different namespace. :)
- Create wiki articles about individual sources.
This might or might not be the same wiki -- but that could be interesting.
I could imagine a page for a journal being pulled in from several sources: the collection of citations in the wiki for that journal, RSS from the current contents (license permitting), a Wikipedia page about the journal (if it exists), a link to author guidelines/submission info, open access info from SHERPA/ROMEO, .... In this vision, very little of the content "lives" in this wiki itself. Rather, it's templated from numerous other places.... Perhaps in the way "buy this book" links are handled in librarything -- there are numerous external links which can be activated with a checkbox, and some external content that is pulled in based on copyright review.
While I am not dedicated to any of these things happening, I also do not wish to rule them out. The hope is that a new community will emerge around the project and guide it in the direction that is most useful. My hope in this thread is that we can identify some of the most likely cases and imagine what it will be like, so that we can convey this vision to the Foundation and they can get a sense of the potential importance of the project.
Scoping is a big problem, I think -- because it would help to have a vision of which of several related tasks/endpoints is primary.
I think an investigation of what fr.wikipedia is doing would be really useful -- does anybody edit there, or have an interest in digging into that? Questions might include: What is the reference namespace doing? What isn't it doing, that they wish it would? Did they consider alternatives to a namespace? How is maintenance going? Do they see the reference namespace as longstanding into the future, or as a stopgap?
-Jodi
The model for this is WP:Book sources, though this relies upon the user selecting the appropriate places to look, rather than guiding him.
On Wed, Jul 21, 2010 at 6:33 PM, Jodi Schneider jodi.schneider@deri.org wrote:
On Wed, Jul 21, 2010 at 4:33 PM, Jodi Schneider jodi.schneider@deri.org wrote:
More broadly speaking, a reference namespace does not accomplish the goal of having a free repository of all citations, complete with collections of citations curated by the community, and documentation of those citations by the community, in various forms to be determined by the community. While it is possible to create specialized cases that suit the narrow needs of individual projects, I and many of the people I have spoken to see a justification for a broader vision. This broader vision is directly in line with the WMF mission of giving free access to the world's knowledge. One of the first steps must be making the Wikipedias aware of that knowledge, and enabling them to build linked networks of information around it.
Brian
Sure, but first, is this capable of being done at all? I have never seen a method of bibliographic control that can cope with the complete range of publications, even just print publications. Perhaps we need to proceed within narrow domains.
Second, is this capable of being done by crowd-sourcing, or does it require enforceable standards? The work of Open Library is not a promising model, being an uncontrolled mix, done to many different standards. Actually, within the domain of scientific journal articles from the last 10 years in Western languages, the best current method seems to be a mechanical algorithm, the one used by Google Scholar. True, it does not aggregate perfectly--but it does aggregate better than any other existing database. And it does not get them all--nor could it, no matter how much improved, for many of the versions that are actually available are off limits to its crawlers.
On Wed, Jul 21, 2010 at 7:02 PM, Brian J Mingus brian.mingus@colorado.edu wrote:
On Wed, Jul 21, 2010 at 4:33 PM, Jodi Schneider jodi.schneider@deri.org wrote:
On 21 Jul 2010, at 19:47, Brian J Mingus wrote:
Finn, I'm not a fan of including a portion of the title for a few reasons. First, it's not required to make the key unique. Second, it makes the key longer than necessary. Third, the first word or words from a title are not guaranteed to convey any meaning. Regarding a Reference: namespace, I can see how this has some utility and why projects have moved to it. However, I consider it a stopgap solution that projects have implemented when what they really want is a proper wiki for citations. Here are a few quick things that you can't do (or would have to go out of your way to do) with just a Reference namespace that you can do with a wiki dedicated to all the world's citations:
- Custom reports that are boolean combinations of citation fields, à la SMW. This requires substantive new technology as SMW doesn't scale.
- User bibliographies which are a logical subset of all literature ever published.
Not sure why a Reference namespace couldn't do this.
- Conduct a search of the literature.
Or this (you can search just one namespace)
- A new set of policies that are not necessarily NPOV, regarding the creation of articles that discuss collections of literature (lit review-like concept). The content of these policies will emerge over years with the help of a community. These articles could, for instance, help people who are navigating a new area of a literature avoid getting stuck in local minima. It could point out the true global context to them. It could point out experimenter biases in the literature; for example, a recent article was published where it was found that citation networks in academic literature can have a tendency to form based on the assumption of authority, when in fact that authority is false, bringing a whole thread of publications into doubt.
I'm not sure that literature reviews belong in the same wiki as citations. That's definitely a different namespace. :)
- Create wiki articles about individual sources.
This might or might not be the same wiki -- but that could be interesting. I could imagine a page for a journal being pulled in from several sources: the collection of citations in the wiki for that journal, RSS from the current contents (license permitting), a Wikipedia page about the journal (if it exists), a link to author guidelines/submission info, open access info from SHERPA/ROMEO, .... In this vision, very little of the content "lives" in this wiki itself. Rather, it's templated from numerous other places.... Perhaps in the way "buy this book" links are handled in librarything -- there are numerous external links which can be activated with a checkbox, and some external content that is pulled in based on copyright review.
While I am not dedicated to any of these things happening, I also do not wish to rule them out. The hope is that a new community will emerge around the project and guide it in the direction that is most useful. My hope in this thread is that we can identify some of the most likely cases and imagine what it will be like, so that we can convey this vision to the Foundation and they can get a sense of the potential importance of the project.
Scoping is a big problem, I think -- because it would help to have a vision of which of several related tasks/endpoints is primary. I think an investigation of what fr.wikipedia is doing would be really useful -- does anybody edit there, or have an interest in digging into that? Questions might include: What is the reference namespace doing? What isn't it doing, that they wish it would? Did they consider alternatives to a namespace? How is maintenance going? Do they see the reference namespace as longstanding into the future, or as a stopgap? -Jodi
More broadly speaking, a reference namespace does not accomplish the goal of having a free repository of all citations, complete with collections of citations curated by the community, and documentation of those citations by the community, in various forms to be determined by the community. While it is possible to create specialized cases that suit the narrow needs of individual projects, I and many of the people I have spoken to see a justification for a broader vision. This broader vision is directly in line with the WMF mission of giving free access to the world's knowledge. One of the first steps must be making the Wikipedias aware of that knowledge, and enabling them to build linked networks of information around it. Brian
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
On Wed, Jul 21, 2010 at 5:47 PM, David Goodman dgoodmanny@gmail.com wrote:
Sure, but first, is this capable of being done at all? I have never seen a method of bibliographic control that can cope with the complete range of publications, even just print publications. Perhaps we need to proceed within narrow domains.
I assume that by range you mean the number of publications in a domain, and that by domain you mean the type of publication, be it a book, webpage or map.
The generic nature of a markup such as wiki template syntax allows us to easily adapt the same application to new domains. The challenge of the range within a domain is largely one of resolving ambiguities, which can be settled with policies that carefully adjudicate troublesome cases.
Second, is this capable of being done by crowd-sourcing, or does it require enforceable standards? The work of Open Library is not a promising model, being an uncontrolled mix, done to many different standards. Actually, within the domain of scientific journal articles from the last 10 years in Western languages, the best current method seems to be a mechanical algorithm, the one used by Google Scholar. True, it does not aggregate perfectly--but it does aggregate better than any other existing database. And it does not get them all--nor could it no matter how much improved, for many of the versions that are actually available are off limits to its crawlers.
In my conception the enforceable standards are to emerge in the meta pages of this project based on the actual issues that the community encounters.
Googlebot has many deep web accounts to journals online. When you search Google Scholar the relevance algorithm is actually comparing your query to the content of pdf pages which you do not have permission to access. Of course, Google can't access them all, but many publishers have found it in their interest to give them a complimentary account since it drives subscription rates.
We can rely on individuals, particularly academics, who have access to the deep web to help us curate the bibliography. And we can rely on the massive number of personal bibliographies already out there to help us get good coverage.
Cleaning up the mass of bibliographic content that I anticipate would be uploaded by users would require the writing of bots in coordination with the creation of policy pages.
Getting rid of copyright material would be handled in the same manner, I presume. After major content publishers see what we are doing, I am sure they will let us know their opinion about what we can and cannot do. It seems likely that they will overreach their bounds, and as I have seen on Wikipedia, the community members will happily ignore them. Or, if they think the requests are actually in compliance with the law, they will comply.
Brian
On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
.. Does anyone know anything about the French discussions on the introduction of the 'Reference' namespace? Should we just implement the French system on the English Wikipedia and we are there?
This was discussed on en.wp in late 2007...
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_...
The proposal on fr.wp in early 2006:
http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%...
-- John Vandenberg
Thanks for those links, John.
I agree that a separate project is needed to have a central source that all language versions of all projects can reference. The citations version of Commons.
I like the French model of using "Article name (Authors)" as a key. Perhaps with "Article name (Authors, Year)" if needed to disambiguate. This shares a design principle with the move away from CamelCase to freeform article titles: one should be able to insert an article name into a natural sentence, and link the appropriate section of the sentence, and have it take you to the appropriate article.
To DGG's question: in the long run, the scope of "all cited works" can be captured in such a project, at least for the works cited on a wiki Project -- anyone making a new citation would either find it already in the project or would add it. Whether this covers all works cited by active academics or scholars depends on how effectively we draw them into our community and help them see where an extra minute of work on their part will help thousands of their readers, reviewers, and reusers.
SJ.
There's a difference between a project to centralize the various references in Wikipedia, and an attempt to build a universal bibliographic database. The first is a reasonable project, though I think everyone involved has underestimated the extent to which normalization and manual aggregation will be needed.
David,
In the m:WikiBibliography draft proposal I have briefly tried to explain the difference you allude to. Wikipedia is a project dedicated to synthesizing every notable topic into an encyclopedia. Since Wikipedia doesn't contain original research, eventually every statement there should be able to be traced to its source. The opposite also holds true - eventually every notable topic will be able to be traced back to Wikipedia. We don't necessarily have to cite all sources that a topic is mentioned in within a given article, but it is desirable to document the relationships between these sources so that we understand the true context. These really are two sides of the same problem, and the project proposal aims to cover both sides.
Brian
ps: Once people top-post it makes it challenging to bottom post without breaking thread continuity. Since I always top-post at work I don't mind doing it, but I just wanted to note that I know it might irk some people:)
On Sat, Jul 24, 2010 at 12:30 PM, David Goodman dgoodmanny@gmail.com wrote:
There's a difference between a project to centralize the various references in Wikipedia, and an attempt to build a universal bibliographic database. The first is a reasonable project, though I think everyone involved has underestimated the extent to which normalization and manual aggregation will be needed.
On Thu, Jul 22, 2010 at 7:25 PM, Samuel Klein meta.sj@gmail.com wrote:
Thanks for those links, John.
I agree that a separate project is needed to have a central source that all language versions of all projects can reference. The citations version of Commons.
I like the French model of using "Article name (Authors)" as a key. Perhaps with "Article name (Authors, Year)" if needed to disambiguate. This shares a design principle with the move away from CamelCase to freeform article titles: one should be able to insert an article name into a natural sentence, and link the appropriate section of the sentence, and have it take you to the appropriate article.
To DGG's question: in the long run, the scope of "all cited works" can be captured in such a project, at least for the works cited on a wiki Project -- anyone making a new citation would either find it already in the project or would add it. Whether this covers all works cited by active academics or scholars depends on how effectively we draw them into our community and help them see where an extra minute of work on their part will help thousands of their readers, reviewers, and reusers.
SJ.
On Thu, Jul 22, 2010 at 12:13 AM, John Vandenberg jayvdb@gmail.com wrote:
On Wed, Jul 21, 2010 at 9:49 PM, Finn Aarup Nielsen fn@imm.dtu.dk wrote:
.. Does anyone know anything about the French discussions on the introduction of the 'Reference' namespace? Should we just implement the French system on the English Wikipedia and we are there?
This was discussed on en.wp in late 2007...
http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29/Archive_...
The proposal on fr.wp in early 2006:
http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Prise_de_d%C3%A9cision/Espace_r%...
-- John Vandenberg
-- Samuel Klein identi.ca:sj w:user:sj
-- David Goodman, Ph.D, M.L.S. http://en.wikipedia.org/wiki/User_talk:DGG
David wrote:
There's a difference between a project to centralize the various references in Wikipedia, and an attempt to build a universal bibliographic database. The first is a reasonable project, though I think everyone involved has underestimated the extent to which normalization and manual aggregation will be needed.
Well said. Reminds me of Erik Möller's Wikimania talk about Free Knowledge projects beyond the Encyclopedia: You need a clearly articulated mission. There already are many projects to create a universal bibliographic database (Worldcat, The Open Library, LibraryThing etc.) and all either failed or have a specific scope. A wiki-based bibliographic database for sources in Wikimedia projects ("citations version of Commons") is a reasonable scope, I think. "Let's just collect all bibliographic data we can get onto a gigantic pile of data" is not. Let's rather focus on real use cases, such as citations in Wikimedia projects.
SJ wrote:
I like the French model of using "Article name (Authors)" as a key. Perhaps with "Article name (Authors, Year)" if needed to disambiguate. This shares a design principle with the move away from CamelCase to freeform article titles: one should be able to insert an article name into a natural sentence, and link the appropriate section of the sentence, and have it take you to the appropriate article.
With free-form titles there will be no general 100% schema (there are always exceptions), but a general rule to start with is needed. There are at least 32 ways to combine only title and authors: which to put first, which character to separate author names, order of names and name-parts, ways to abbreviate etc. - and these are only the possibilities if it's a simple English title with English author names!
If you are looking for a method to define one schema please have a look at the Citation Style Language and use or define a citation style in CSL so users of Zotero, Mendeley and other bibliographic software can automatically create a key from given bibliographic data.
To DGG's question: in the long run, the scope of "all cited works" can be captured in such a project, at least for the works cited on a wiki Project -- anyone making a new citation would either find it already in the project or would add it. Whether this covers all works cited by active academics or scholars depends on how effectively we draw them into our community and help them see where an extra minute of work on their part will help thousands of their readers, reviewers, and reusers.
Again: there already *are* communities that collect and share bibliographic data - why should they move to a new project with an unclear mission and unusable software (we need much more than Liquid Threads) that was never created for this task? Everyone talking about a Wikimedia project with bibliographic data should *at least* have a look at Zotero, CSL, The Open Library, and LibraryThing first and then make clear what a new project should copy from these existing projects and what should be done differently. Please do not reinvent a wheel that nobody besides some Wikimediacs wants.
Don't get me wrong: I also want such a free bibliographic wiki database. But to attract more than a small fraction of the declining number of Wikipedia authors we need a clear mission and usable software for this task - I see neither the one nor the other.
Jakob
On 24 Jul 2010, at 23:01, Jakob wrote:
...to attract more than a small fraction of the declining number of Wikipedia authors we need a clear mission and usable software for this task - I see neither the one nor the other.
I think focusing on Wikimedia's citation needs is the most promising, especially if this is intended to be a WMF project.
As for mission -- yes -- let's talk about what problem we're trying to solve. Two central ones come to mind:
1. Improve verifiability by making it possible to start with a source and verify all claims made by referencing that source [1]
2. Make it easier for editors to give references, and readers to use them [2]
Are those the right problems? Are there others? [3]
To figure out what the right problems are, I think it would help to look at the pain points -- and their solutions -- the hacks and proposals related to citations. Hacks include plugins and templates people have made to make MediaWiki more citation-friendly. Proposals include the ones on strategy wiki.
Anybody want to take a look through?
Some of the hacks and proposals are listed here: http://strategy.wikimedia.org/wiki/Category:Proposals_related_to_citations Could you add other hacks, proposals, and conversations related to citations, if you know of them?
-Jodi
[1] This can be done using backlinks. http://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:Greenwood%26Earn... )
[2] I think of this as "actionable references" -- we'd have to explain exactly what the desirable qualities are. Adding to bibliographic managers in one click is one of mine. :)
[3] Other side-effects might be helping to identify what's highly cited in Wikipedia (which would be interesting -- and might help prioritize Wikisource additions), automatically adding quotes to Wikiquote, ...
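As a rough illustration of the backlinks idea in [1], the sketch below only builds a query URL for the MediaWiki web API's `embeddedin` list, which reports pages that transclude a given template. The template title and the helper name `transclusion_query_url` are hypothetical, and nothing is fetched; whether per-source templates are the right vehicle is exactly what this thread is debating.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def transclusion_query_url(template_title: str, limit: int = 50) -> str:
    """Build a query URL listing pages that transclude template_title."""
    params = {
        "action": "query",
        "list": "embeddedin",      # pages that embed (transclude) the title
        "eititle": template_title,
        "eilimit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


# Hypothetical per-source template name, used only for illustration.
print(transclusion_query_url("Template:Example source"))
```

Each page returned by such a query would be an article that cites the source, which is the starting point for checking every claim attributed to it.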
Jakob writes:
there already *are* communities that collect and share bibliographic data
I would be happy if anyone does what I was describing; no point in reinventing what already exists. But I have not found it:
I mean a public collection of citations, with reader-editable commentary and categorization, for published works. Something that Open Library could link to from each of its books, that arXiv.org and PLoS could link to from each of its articles. Something that, for better or worse, Wikipedia articles could link to also, when they are cited as sources.
Jodi Schneider jodi.schneider@deri.org wrote:
I think focusing on Wikimedia's citation needs is the most promising, especially if this is intended to be a WMF project.
Agreed. That is clearly the place to start, as it was with Commons.
And, as with Commons, the project should be free to develop its own scope, and be more than a servant project to the others. That scope may be grand (a collection of all educational freely licensed media; a general collection of citations), but shouldn't keep us from getting started now.
As for mission -- yes -- let's talk about what problem we're trying to solve. Two central ones come to mind:
1. Improve verifiability by making it possible to start with a source and verify all claims made by referencing that source [1]
2. Make it easier for editors to give references, and readers to use them [2]
Are there others? [3]
3. Enable commenting on sources, to discuss their reliability and notability, in a shared place. (Note the value of having a multilingual discussion here: currently notions of notability and reliability can change a great deal across language barriers)
4. Enable discussing splitting or merging sources, or providing disambiguations when different people are confusingly using a single citation to refer to more than one source.
To figure out what the right problems are, I think it would help to look at the pain points -- and their solutions -- the hacks and proposals related to citations. Hacks include plugins and templates people have made to make MediaWiki more citation-friendly. Proposals include the ones on strategy wiki.
Some of the hacks and proposals are listed here: http://strategy.wikimedia.org/wiki/Category:Proposals_related_to_citations Could you add other hacks, proposals, and conversations...?
Thanks for that link.
Sam.
On Tue, Jul 27, 2010 at 12:06 AM, Jodi Schneider jodi.schneider@deri.org wrote:
... [3] Other side-effects might be helping to identify what's highly cited in Wikipedia (which would be interesting -- and might help prioritize Wikisource additions), automatically adding quotes to Wikiquote, ...
I don't think this has been raised on this list.
The academic journals project hosts "Journals cited by Wikipedia" using the {{cite}} data. It is broken down by usage count.
http://en.wikipedia.org/wiki/WP:JCW
-- John Vandenberg
On Tue, 27 Jul 2010, John Vandenberg wrote:
On Tue, Jul 27, 2010 at 12:06 AM, Jodi Schneider jodi.schneider@deri.org wrote:
... [3] Other side-effects might be helping to identify what's highly cited in Wikipedia (which would be interesting -- and might help prioritize Wikisource additions), automatically adding quotes to Wikiquote, ...
I don't think this has been raised on this list.
The academic journals project hosts "Journals cited by Wikipedia" using the {{cite}} data. It is broken down by usage count.
I also have statistics of that sort. The page corresponding to your "Top journals"
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Academic_Journals/Journal...
is this:
http://neuro.imm.dtu.dk/services/wikipedia/enwiki-20080312-ref-articlejourna...
From the 2008 dump and based on the 'cite journal' template. For some of the statistics I skipped the citations added automatically from the "Protein Box Bot". I have built a small file which can aggregate the different names of popular journals. It is available from here:
http://neuro.imm.dtu.dk/services/brededatabase/wojous.xml
and may be useful for WP:JCW.
On the same site are results from different clusterings of the Wikipedia citations, for example:
http://neuro.imm.dtu.dk/services/wikipedia/enwiki-20080312-ref-articlejourna...
The main page is http://neuro.imm.dtu.dk/services/wikipedia/citejournalminer.html
/Finn
___________________________________________________________________
Finn Aarup Nielsen, DTU Informatics, Denmark Lundbeck Foundation Center for Integrated Molecular Brain Imaging http://www.imm.dtu.dk/~fn/ http://nru.dk/staff/fnielsen/ ___________________________________________________________________
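For readers who want a feel for this kind of count, here is a minimal sketch, assuming plain wikitext input and no nested templates inside the citation. The regular expressions, the `ALIASES` table and the journal names are all hypothetical; the alias table only stands in for the idea of aggregating name variants and does not reflect the format of the wojous.xml file or the method behind WP:JCW.

```python
import re
from collections import Counter

# Hypothetical alias table: normalized name variant -> canonical journal name.
ALIASES = {
    "j ex": "Journal of Examples",
    "journal of examples": "Journal of Examples",
}

# Matches one {{cite journal ...}} call; breaks on nested templates.
CITE_JOURNAL = re.compile(r"\{\{\s*cite journal\b(.*?)\}\}",
                          re.IGNORECASE | re.DOTALL)
JOURNAL_FIELD = re.compile(r"\|\s*journal\s*=\s*([^|}]*)", re.IGNORECASE)


def count_journals(wikitext: str) -> Counter:
    """Count journal names used in 'cite journal' templates."""
    counts = Counter()
    for template in CITE_JOURNAL.finditer(wikitext):
        field = JOURNAL_FIELD.search(template.group(1))
        if field is None:
            continue
        name = field.group(1).strip()
        if not name:
            continue
        # Drop punctuation and collapse whitespace before the alias lookup.
        lookup = " ".join(re.sub(r"[^\w\s]", "", name).lower().split())
        counts[ALIASES.get(lookup, name)] += 1
    return counts


sample = """
{{cite journal |author=A |title=T1 |journal=Journal of Examples |year=2008}}
{{Cite journal |author=B |journal=J. Ex. |year=2007}}
{{cite journal |author=C |journal=Other Letters}}
{{cite book |title=Not counted}}
"""
print(count_journals(sample).most_common())
```

A real run over a dump would also need to skip bot-added citations, as described above for the "Protein Box Bot".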
On Wed, Jul 21, 2010 at 10:42 AM, Daniel Kinzler daniel@brightbyte.de wrote:
- The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL?
%2B would seem to be the obvious choice to me.
Also note that using plusses in page names doesn't work with all server configurations, since plus has a special meaning in URLs.
Don't know too much about the double escaping business to comment on that, but if pluses are not acceptable, we still have equal signs (possibly with similar problems, but still useful for direct web search) and underscores (which would turn the whole key into one string for search engines).
Daniel
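The plus-sign concern can be seen with Python's standard `urllib.parse` module. This is only a small sketch of the encoding behaviour, using the key from this thread; it says nothing about how any particular web server or MediaWiki configuration treats such titles.

```python
from urllib.parse import quote, unquote, unquote_plus

key = "Kang+Hsu+Krajbich+2009"

# Percent-encoding turns every "+" into "%2B".
encoded = quote(key, safe="")
print(encoded)  # Kang%2BHsu%2BKrajbich%2B2009

# The round trip is lossless when the key is percent-encoded.
print(unquote(encoded) == key)  # True

# A form-style decoder reads a raw "+" as a space, which is the hazard.
print(unquote_plus(key))  # Kang Hsu Krajbich 2009
```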
On Wed, Jul 21, 2010 at 2:42 AM, Daniel Kinzler daniel@brightbyte.de wrote:
- The first three author names separated by slashes
why not separate by pluses? they don't form part of names either, and don't cause problems with wiki page titles.
I like this... however, how would you represent this in a URL? Also note that using plusses in page names doesn't work with all server configurations, since plus has a special meaning in URLs.
- Some or all of the date. For instance, if there is only one source by this set of authors that year, we can just use YYYY. However, once another source by that set of authors is added, the key should change to MMDDYYYY or similar.
I don't think it is a good idea to change one key as a function of updates on another, except for a generic disambiguation tag.
I agree. And if you *have* to use the full date, use YYYYMMDD, not the other way around, please.
Since the slashes are somewhat cumbersome, perhaps we need not make them mandatory, but similarly use them only when they are necessary in order to "escape" a name. In the case that none of the authors has a slash in their name - the dominant case - we can stick to the easily legible and nicely compact CamelCase format.
Example keys generated by this algorithm:
KangHsuKrajbichEtAl2009
Kang+Hsu+Krajbich+2009+the+wick+in or Kang+Hsu+Krajbich+2009+twi
Both seem good, though I would suggest forming a convention to ignore any leading "the" and "a", to get a more distinctive 3-word suffix.
Of course, it does not have to be _exactly_ three authors, nor three words from the title, and it does not solve the John Smith (or Zheng Wang) problem.
It also doesn't solve issues with transliteration: Merik Möller may become "Moeller" or "Moller", Jakob Voß may become "Voss" or "Vosz" or even "VoB", etc. In the case of Chinese names, it's often not easy to decide which part is the last name.
To avoid this kind of ambiguity, i suggest to automatically apply some type of normalization and/or hashing. There is quite a bit of research about this kind of normalisation out there, generally with the aim of detecting duplicates. Perhaps we can learn from bibsonomy.org, have a look how they do it: http://www.bibsonomy.org/help/doc/inside.html.
Gotta love open source university research projects :)
-- daniel
Hey Daniel,
Bibsonomy seems to suffer from the same problem as CiteULike - urls which convey no meaning. An example url id from CiteULike is 2434335, and one from Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer the conversation away from that direction. These IDs guarantee uniqueness, but I believe that we can create keys that both guarantee uniqueness and convey some meaning to humans. Consider that this key will be embedded in wiki articles any time a source is cited. It's important that it make some sense.
Plus signs and slashes in the key appear to be cumbersome. Perhaps we can avoid this by truncating last names that involve a slash to either the portion before or after the slash.
Changing the key seems to be a bad idea, so we want a key system that is unique from the start. That means we should use the full date, YYYYMMDD as suggested by Daniel.
In the event that multiple sources are published by the same set of authors on the same day, we can use a, b, c disambiguation.
This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
Brian
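A minimal sketch of the key scheme described in this message, under stated assumptions: surnames are supplied already split out, the first source on a given day keeps the bare key, and later ones receive "a", "b", "c" in order of arrival (the message does not say whether the first source also gets a letter). The helper names `surname_token` and `make_key` and the fourth author "Other" are hypothetical.

```python
import re
import string
from datetime import date


def surname_token(name: str) -> str:
    """Keep the part before any slash and CamelCase its alphabetic runs."""
    head = name.split("/")[0]
    parts = re.findall(r"[^\W\d_]+", head)
    return "".join(p[:1].upper() + p[1:] for p in parts)


def make_key(surnames, published: date, existing_keys) -> str:
    """Build FirstThreeSurnames + YYYYMMDD, adding a, b, c on collision."""
    base = "".join(surname_token(s) for s in surnames[:3])
    base += published.strftime("%Y%m%d")
    if base not in existing_keys:
        return base
    for suffix in string.ascii_lowercase:
        candidate = base + suffix
        if candidate not in existing_keys:
            return candidate
    raise ValueError("too many sources share the key " + base)


keys = set()
for _ in range(3):
    key = make_key(["Kang", "Hsu", "Krajbich", "Other"], date(2009, 10, 11), keys)
    keys.add(key)
    print(key)
```

Under these assumptions the third source entered for that day receives KangHsuKrajbich20091011b, the key quoted above. Uniqueness here holds only relative to the set of keys already issued.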
Hey Daniel,
Bibsonomy seems to suffer from the same problem as CiteULike - urls which convey no meaning. An example url id from CiteULike is 2434335, and one from Bibsonomy is 29be860f0bdea4a29fba38ef9e6dd6a09. I hope to continue to steer the conversation away from that direction. These IDs guarantee uniqueness, but I believe that we can create keys that both guarantee uniqueness and convey some meaning to humans. Consider that this key will be embedded in wiki articles any time a source is cited. It's important that it make some sense.
Oh, I didn't mean we should use hashes or IDs as keys or identifiers in the URL. I mean we can employ the hashing technique to detect dupes, because you will inadvertently get information about the same thing under two different keys, because of issues with transliteration, etc.
-- daniel
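To make the duplicate-detection idea concrete, here is a small sketch, assuming a record is just surnames, a year and a title. The normalization (case folding, stripping diacritics, dropping punctuation) and the use of MD5 are illustrative choices; this is not BibSonomy's actual algorithm, and the title is a placeholder.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Casefold, strip diacritics, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if ch.isalnum())


def fingerprint(surnames, year: int, title: str) -> str:
    """Hash the normalized fields; equal values flag likely duplicates."""
    fields = [normalize(s) for s in surnames] + [str(year), normalize(title)]
    return hashlib.md5("|".join(fields).encode("utf-8")).hexdigest()


a = fingerprint(["Möller"], 2010, "An Example Title")
b = fingerprint(["Moller"], 2010, "an example title!")
c = fingerprint(["Moeller"], 2010, "An Example Title")
print(a == b)  # True: the umlaut and punctuation no longer matter
print(a == c)  # False: the "oe" transliteration still slips through
```

The second comparison shows the limit of simple normalization: variants such as "Moeller" would need an extra rule or fuzzy matching, so a fingerprint like this only narrows the set of candidates for human review.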
Hi,
Talking about identifiers for bibliographic records I just want to stress one crucial point:
This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier" that can be derived from existing metadata. You will *always* have false positives (different publications get the same identifier [1]) and false negatives (same publication has different identifiers [2]). Fuzzy identifiers even occur if they are created by the publisher or author himself (for instance duplicate ISBNs for definitely different editions or even totally different books). If you argue about identifiers please keep in mind that you *always* talk about heuristics but not about something "unique per se". Existing identifiers only differ in the ratio of false positives and false negatives.
The only way you may get unique identifiers is to assign your own identifiers that are *not* derived from the content - such as auto-incremented record ids in a database. Even then they are not unique if you change the content because the identity of the object may change. An MD5 or SHA sum on the full content [3] or the version id in a versioning database (like MediaWiki) is unique but not practical if you want to change content. A solution to this problem is to let people decide in every single case what an identifier looks like and when it should change (example: Wikipedia article titles). But then the identifiers are not permanent (records may split and join and be renamed).
That's the way it is. You have to decide which problem to solve with an identifier and then be aware of its limitations. As Brooks [3] wrote there is no silver bullet - so there is no silver identifier.
Cheers Jakob
[1] For instance if you have a common name and a general title or if you want to distinguish the printed version and the presentation slides of the same publication etc.
[2] For instance different ways to abbreviate and/or write the name of an author and/or title, different years (year of preprint vs year of printed version) etc.
[3] See http://en.wikipedia.org/wiki/No_Silver_Bullet which cites an article that has been published in 1986 and 1987, and probably reprinted in another year - so what's the identifier? ;-)
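Jakob's contrast between an assigned record id and a content-derived checksum can be sketched as follows. This is only an illustration under assumptions: the record layout, the `content_hash` helper, and the in-memory counter are hypothetical stand-ins for a real database.

```python
import hashlib
from itertools import count

_record_ids = count(1)  # stand-in for a database auto-increment column


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical rendering of the record's fields."""
    canonical = "\n".join(f"{key}={record[key]}" for key in sorted(record))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {"author": "Brooks", "title": "No Silver Bullet", "year": "1986"}
record_id = next(_record_ids)   # assigned once, independent of the content
before = content_hash(record)

record["year"] = "1987"         # someone corrects or changes the year
after = content_hash(record)

print(record_id)        # 1
print(before == after)  # False
```

The assigned id survives the edit, while the checksum identifies one exact version only, which is the trade-off described above.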
On 07/21/2010 03:36 PM, Jakob wrote:
Hi,
Talking about identifiers for bibliographic records I just want to stress one crucial point:
This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier" that can be derived from existing metadata. You will *always* have false positives (different publications get the same identifier [1]) and false negatives (same publication has different identifiers [2]). Fuzzy identifiers even occur if they are created by the publisher or author himself (for instance duplicate ISBNs for definitely different editions or even totally different books). If you argue about identifiers please keep in mind that you *always* talk about heuristics but not about something "unique per se". Existing identifiers only differ in the ratio of false positives and false negatives.
The only way you may get unique identifiers is to assign your own identifiers that are *not* derived from the content - such as auto-incremented record ids in a database. Even then they are not unique if you change the content because the identity of the object may change.
I haven't been following this thread, but the way I addressed this in my own bibliography manager (http://yabman.sourceforge.net/) is: the BibTeX key is the first author's name (lowercased) plus an auto-incremented ID. So for example, one of my papers is "priedhorsky229". 229 is arbitrary, but there are only a few 3-digit numbers per author, so I don't get confused.
Now in a large system, that would obviously break down into the long, incomprehensible CiteULike-type IDs.
A compromise could be that the ID is the first author's name plus an auto-incremented ID per author. So for example, the first paper of mine the system learns is priedhorsky1, the second priedhorsky2, etc. So you get a system-generated ID for uniqueness but also something comprehensible for people.
HTH,
Reid
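Reid's compromise (first author's name plus a counter kept per author) might look like the sketch below. The class name and the in-memory counter are assumptions for illustration; a real system would keep the counters in its database.

```python
from collections import defaultdict


class PerAuthorKeys:
    """Hands out keys of the form <lowercased surname><n>, counting per author."""

    def __init__(self) -> None:
        self._last = defaultdict(int)

    def assign(self, first_author: str) -> str:
        # Keep letters only, so the trailing number is never confused with the name.
        name = "".join(ch for ch in first_author.lower() if ch.isalpha())
        self._last[name] += 1
        return f"{name}{self._last[name]}"


keys = PerAuthorKeys()
print(keys.assign("Priedhorsky"))  # priedhorsky1
print(keys.assign("Priedhorsky"))  # priedhorsky2
print(keys.assign("Schneider"))    # schneider1
```

Two different authors who share a surname would share one counter here, which keeps keys unique but makes the name part less informative.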
On 21 Jul 2010, at 21:43, Reid Priedhorsky wrote:
A compromise could be that the ID is the first author's name plus an auto-incremented ID per author. So for example, the first paper of mine the system learns is priedhorsky1, the second priedhorsky2, etc. So you get a system-generated ID for uniqueness but also something comprehensible for people.
Interesting. I'd really like ID's to be not only comprehensible but also to have a fair chance of being directly inputtable by humans.
For instance, on Wikipedia, if I know that I am looking for the article on "citation signals" I can type the URL directly, without searching.
In my ideal citation-wiki-in-the-sky, you could get to the citation directly in this way -- and sensible disambiguation pages would be automatically generated.
-Jodi
For items that have been assigned a DOI, isn't the DOI unique (in the absence of errors, which I cannot recall having ever encountered)? Of course the same item in its various manifestations may have multiple DOIs, or may have versions that do not have DOIs as well as versions that do have them, and the versions may or may not be identical. We also need to account for the presence of illegitimate as well as legitimate copies: a person entering a WP reference may have gotten it from a site that has an unauthorized copy, and quite a few scientific papers are present on the web in such versions.
There are really two problems: one is a pointer to the voucher authorized version of a document, which may well be the printed version, and the other problem is pointers to accessible legitimate versions. Crossref does a fairly nice job of this for online articles, but it is organized to provide access to paid publishers' versions preferentially, rather than to possible legitimate free versions.
On Wed, Jul 21, 2010 at 6:20 PM, Jodi Schneider jodi.schneider@deri.org wrote:
On 21 Jul 2010, at 21:43, Reid Priedhorsky wrote:
A compromise could be that the ID is the first author's name plus an auto-incremented ID per author. So for example, the first paper of mine the system learns is priedhorsky1, the second priedhorsky2, etc. So you get a system-generated ID for uniqueness but also something comprehensible for people.
Interesting. I'd really like ID's to be not only comprehensible but also to have a fair chance of being directly inputtable by humans.
For instance, on Wikipedia, if I know that I am looking for the article on "citation signals" I can type the URL directly, without searching.
In my ideal citation-wiki-in-the-sky, you could get to the citation directly in this way -- and sensible disambiguation pages would be automatically generated.
-Jodi _______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
"citation signals" will always work until a rock band takes that name and gets a page in Wikipedia. Try "game theory".
Making "semantic" identifiers seems to be a hard problem. If you put slashes in an identifier, you irritate the folks who want pure and simple REST URLs. If you put underscores, MediaWiki interprets them as spaces. Some other characters simply violate the rules of messages sent over HTTP just as putting apostrophes in strings gives SQL fits.
I do hope someone comes up with a nice, clean solution.
Jack
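The character problems Jack lists can be made concrete with a small sketch. The helper names are hypothetical; the only behaviors relied on are that MediaWiki treats underscores and spaces in titles as equivalent (as stated above) and that Python's `urllib.parse.quote` percent-encodes characters outside its safe set.

```python
from urllib.parse import quote


def title_form(key: str) -> str:
    """How a key reads as a wiki page title: underscores become spaces."""
    return key.replace("_", " ")


def url_form(key: str) -> str:
    """How a key would appear in a URL path segment, with nothing left unescaped."""
    return quote(key.replace(" ", "_"), safe="")


def is_plain(key: str) -> bool:
    """True if the key is pure ASCII letters and digits, so no escaping is needed."""
    return key.isascii() and key.isalnum()


print(title_form("Kang_Hsu_2009"))           # Kang Hsu 2009
print(url_form("O'Brien/2009"))              # O%27Brien%2F2009
print(is_plain("KangHsuKrajbich20091011b"))  # True
print(is_plain("O'Brien/2009"))              # False
```

Restricting keys to plain letters and digits sidesteps the escaping issues, at the cost of excluding names that are not written in ASCII.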
On 22 Jul 2010, at 03:13, Jack Park wrote:
"citation signals" will always work until a rock band takes that name and gets a page in Wikipedia. Try "game theory".
Still directly inputable -- and takes me close to my intended destination (if I'm a human, paying attention):
"This article is about the branch of applied mathematics. For the discipline of studying games, see Game studies. For other uses of "Game theory", see Game theory (disambiguation)."
Sensible disambiguation pages (ideally generated automatically) are needed for a wiki for citations.
-Jodi
Yes, being aware that (1) the rules for disambiguating can get very complicated (for examples, the LC cataloging rule interpretation series does very nicely; the amount of detail that arises in a very large file is hard to believe until you start working on it), and (2) there will be items which cannot be unambiguously assigned. There remain in literature many items of disputed authorship, and many items of very uncertain dates. For examples of handling them, see the LC authority file. And consider how many people-years of highly-trained expert work have gone into making that file.
I note that the projects of ISI and Scopus to produce an unambiguous list of authors of scientific articles have a remarkably high proportion of errors of every possible description, although both of them supplement their algorithms by manual correction.
On Thu, Jul 22, 2010 at 6:38 AM, Jodi Schneider jodi.schneider@deri.org wrote:
On 22 Jul 2010, at 03:13, Jack Park wrote:
"citation signals" will always work until a rock band takes that name and gets a page in Wikipedia. Try "game theory".
Still directly inputable -- and takes me close to my intended destination (if I'm a human, paying attention):
"This article is about the branch of applied mathematics. For the discipline of studying games, see Game studies. For other uses of "Game theory", see Game theory (disambiguation)."
Sensible disambiguation pages (ideally generated automatically) are needed for a wiki for citations.
-Jodi
Jodi Schneider wrote:
Interesting. I'd really like ID's to be not only comprehensible but also to have a fair chance of being directly inputtable by humans.
Usual bibliographic catalogs do not provide a mnemonic key as soon as their size is more than a few hundred entries. There are various IDs like ISBN and OCLC number but there is no large-scale system that has simple identifiers. Why do you want to type in the ID by hand anyway? What is the use-case?
For instance, on Wikipedia, if I know that I am looking for the article on "citation signals" I can type the URL directly, without searching.
In my ideal citation-wiki-in-the-sky, you could get to the citation directly in this way -- and sensible disambiguation pages would be automatically generated.
Why do you want to directly work with fragile identifiers? Every modern web application provides auto-suggest: you type in a keyword, title, author, anything and get a list of publications and a link to create a new one. Then you select a publication from the list and its ID gets copied into your editor (an ideal editor would also send a pingback to the citation database to know where a publication identifier is used). Done.
I also like mnemonic identifiers, they are useful if you have to read, memorize and type them in. But if your workflow is truly digital then their limitation is just a burden. I would value uniqueness and stability much more than readability - and you cannot get both!
Cheers Jakob
You continue to rest the basis of your argument on a small number of outlier cases, vis-a-vis stability.
Brian
On 22 Jul 2010, at 21:00, Jakob wrote:
Jodi Schneider wrote:
Interesting. I'd really like ID's to be not only comprehensible but also to have a fair chance of being directly inputtable by humans.
Usual bibliographic catalogs do not provide a mnemonic key as soon as their size is more than a few hundred entries. There are various IDs like ISBN and OCLC number but there is no large-scale system that has simple identifiers. Why do you want to type in the ID by hand anyway? What is the use-case?
I am looking up a paper that is cited somewhere else.
For instance, on Wikipedia, if I know that I am looking for the article on "citation signals" I can type the URL directly, without searching.
In my ideal citation-wiki-in-the-sky, you could get to the citation directly in this way -- and sensible disambiguation pages would be automatically generated.
Why do you want to directly work with fragile identifiers? Every modern web application provides auto-suggest: you type in a keyword, title, author, anything and get a list of publications and a link to create a new one.
Sure, that's fine, too. That's a kind of automatic disambiguation -- and better than what I proposed.
Then you select a publication from the list and its ID gets copied into your editor (an ideal editor would also send a pingback to the citation database to know where a publication identifier is used). Done.
um, sometimes the 'editor' is my brain, not software.
I also like mnemonic identifiers, they are useful if you have to read, memorize and type them in. But if your workflow is truly digital then their limitation is just a burden. I would value uniqueness and stability much more than readability - and you cannot get both!
Another reason to optimize readability is SEO. But you're right, it depends very much on who is using this, and how.
:) -Jodi
Cheers Jakob
On Wed, Jul 21, 2010 at 2:36 PM, Jakob jakob.voss@s1999.tu-chemnitz.de wrote:
Hi,
Talking about identifiers for bibliographic records I just want to stress one crucial point:
This gives us the following key, guaranteed to be unique: KangHsuKrajbich20091011b
There is absolutely no such thing as a "guaranteed unique identifier" that can be derived from existing metadata. You will *always* have false positives (different publications get the same identifier [1]) and false negatives (same publication has different identifiers [2]). Fuzzy identifiers even occur if they are created by the publisher or author himself (for instance duplicate ISBNs for definitely different editions or even totally different books). If you argue about identifiers please keep in mind that you *always* talk about heuristics but not about something "unique per se". Existing identifiers only differ in the ratio of false positives and false negatives.
The only way you may get unique identifiers is to assign your own identifiers that are *not* derived from the content - such as auto-incremented record ids in a database. Even then they are not unique if you change the content because the identity of the object may change. An MD5 or SHA-sum on the full content [3] or the version id in a versioning database (like MediaWiki) is unique but not practical if you want to change content. A solution to this problem is to let people decide in every single case what an identifier looks like and when it should change (example: Wikipedia article titles). But then the identifiers are not permanent (records may split and join and be renamed).
That's the way it is. You have to decide which problem to solve with an identifier and then be aware of its limitations. As Brooks [3] wrote there is no silver bullet - so there is no silver identifier.
Cheers Jakob
[1] For instance if you have a common name and a general title or if you want to distinguish the printed version and the presentation slides of the same publication etc.
[2] For instance different ways to abbreviate and/or write the name of an author and/or title, different years (year of preprint vs year of printed version) etc.
[3] See http://en.wikipedia.org/wiki/No_Silver_Bullet which cites an article that has been published in 1986 and 1987, and probably reprinted in another year - so what's the identifier? ;-)
Hi Jakob,
I would like to counter this point with the following rule: There is always a way to adjudicate ambiguity. It is easy to create a rule that works in 90% of cases:
Author1Author2Author3EtAl10
It is easy to modify this rule to work in 99% of cases:
Author1Author2Author3EtAl20101011b
Modifying the rule to work in 100% of cases requires a community of users to adjudicate the relatively small number of special cases.
Brian
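Brian's 99% rule can be sketched as below, under several assumptions that the thread does not spell out: that the digits in a key such as KangHsuKrajbich20091011b are a publication date in YYYYMMDD form, that the name part is three separate surnames, and that a trailing letter (b, c, ...) is added only when the bare key is already taken. The function names are hypothetical.

```python
from datetime import date
from string import ascii_lowercase
from typing import Sequence, Set


def base_key(surnames: Sequence[str], published: date) -> str:
    """First three surnames, 'EtAl' if there are more, then YYYYMMDD."""
    stem = "".join(surnames[:3])
    if len(surnames) > 3:
        stem += "EtAl"
    return stem + published.strftime("%Y%m%d")


def assign_key(surnames: Sequence[str], published: date, taken: Set[str]) -> str:
    """Return an unused key, appending b, c, ... on collision (assumed scheme)."""
    base = base_key(surnames, published)
    candidates = [base] + [base + letter for letter in ascii_lowercase[1:]]
    for candidate in candidates:
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    # Past 'z' the rule has no answer; this is where community adjudication comes in.
    raise LookupError(f"no free suffix left for {base}")


taken: Set[str] = set()
authors = ["Kang", "Hsu", "Krajbich"]
first = assign_key(authors, date(2009, 10, 11), taken)
second = assign_key(authors, date(2009, 10, 11), taken)
print(first)   # KangHsuKrajbich20091011
print(second)  # KangHsuKrajbich20091011b
```

Note that the suffix only records that a collision happened. It does not say whether the two records are the same publication, which is the false positive and false negative problem Jakob raises.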
From http://en.wikipedia.org/wiki/Zooko%27s_triangle
Names can have up to two of these three properties:
- Secure (Unique)
- Decentralized (Global)
- Human-meaningful
Decentralized and human-meaningful: this is true of nicknames people choose for themselves
Secure and human-meaningful: this is the property that domain names and URLs aim for
Secure and decentralized: this is a property of OpenPGP key fingerprints
Terrell
/bcc Zooko
On Thu, Jul 22, 2010 at 11:50 AM, Terrell Russell terrellrussell@gmail.com wrote:
From http://en.wikipedia.org/wiki/Zooko%27s_triangle
Names can have up to two of these three properties:
- Secure (Unique)
- Decentralized (Global)
- Human-meaningful
Decentralized and human-meaningful: this is true of nicknames people choose for themselves
Secure and human-meaningful: this is the property that domain names and URLs aim for
Secure and decentralized: this is a property of OpenPGP key fingerprints
Terrell
/bcc Zooko
Thanks for pointing this out Terrell.
The solution to Zooko's puzzle, as explained on the page, is "petnames." These petnames are basically the same thing that I am suggesting - there is never a truly ambiguous case that you cannot disambiguate. Even in the case when two separate records have exactly the same metadata you can disambiguate them by flipping a coin and adding an arbitrary a,b,c incrementer to one of them. In 99.9999999% of cases records won't be exactly identical, and some piece of the metadata can be incorporated into the key in order to disambiguate.
The crux of this thread seems to be: There will always be edge cases, therefore we must rely on meaningless unique hashes in order to index them. My solution is to create a key that gets almost 100% of all cases, and then create policies that disambiguate the rest by creating "petkeys" for them that are used instead of the default key system. This imbues almost all of our keys with meaning and uniqueness. It also makes it very clear - in the key itself - when there is ambiguity in a record.
The system that disambiguates nearly-identical records is rather simple. The community simply needs to create an order of precedence for properties, such that the bibliographic field that has the highest precedence and is not ambiguous goes into the key for that record. In almost all cases these unambiguous fields will be the authors and the date.
Brian
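The order-of-precedence rule Brian describes could be sketched like this. The precedence list, the field names, and the example records are all hypothetical; in the proposal the community would set the actual order.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]

# Hypothetical community-agreed order, highest precedence first.
PRECEDENCE = ["authors", "date", "title", "medium"]


def distinguishing_field(record: Record, rivals: List[Record]) -> Optional[str]:
    """Highest-precedence field whose value no rival record shares, if any."""
    for field in PRECEDENCE:
        value = record.get(field)
        if value and all(rival.get(field) != value for rival in rivals):
            return field
    # Nothing separates the records: fall back to an arbitrary a, b, c suffix.
    return None


printed = {"authors": "Example", "date": "2010", "title": "A Talk", "medium": "print"}
slides = {"authors": "Example", "date": "2010", "title": "A Talk", "medium": "slides"}

print(distinguishing_field(printed, [slides]))        # medium
print(distinguishing_field(printed, [dict(printed)])) # None
```

When the function returns a field, its value can be folded into the key; when it returns None, the records are identical on every listed field and only an arbitrary suffix or human judgment can separate them.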
On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus Brian.Mingus@colorado.edu wrote:
I have been working with Sam and others for some time now on brainstorming a proposal for the Foundation to create a centralized wiki of citations, a WikiCite so to speak, if that is not the eventual name. My plan is to continue to discuss with folks who are knowledgeable and interested in such a project and to have the feedback I receive go into the proposal which I hope to write this summer.
This sounds great. Just speaking as a community member, I've been thinking about this topic a long time myself, and have plenty to add to the conversation.
The proposal white paper will then be sent around to interested parties for corrections and feedback, including on-wiki and mailing lists, before eventually landing at the Foundation officially. As we know WMF has not started a new project in some years, so there is no official process. Thus I find it important to get it right.
I'd suggest finding an on-wiki spot to discuss this work. Here's one place this has been discussed in the past that may be a good place to revive the conversation: http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books...
Rather than commenting on list about the subject itself, I've commented on the discussion page there: http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_...
Rob
On Mon, Jul 19, 2010 at 8:08 PM, Rob Lanphier robla@robla.net wrote:
On Mon, Jul 19, 2010 at 1:20 PM, Brian J Mingus Brian.Mingus@colorado.edu wrote:
I have been working with Sam and others for some time now on brainstorming a proposal for the Foundation to create a centralized wiki of citations, a WikiCite so to speak, if that is not the eventual name. My plan is to continue to discuss with folks who are knowledgeable and interested in such a project and to have the feedback I receive go into the proposal which I hope to write this summer.
This sounds great. Just speaking as a community member, I've been thinking about this topic a long time myself, and have plenty to add to the conversation.
The proposal white paper will then be sent around to interested parties for corrections and feedback, including on-wiki and mailing lists, before eventually landing at the Foundation officially. As we know WMF has not started a new project in some years, so there is no official process. Thus I find it important to get it right.
I'd suggest finding an on-wiki spot to discuss this work. Here's one place this has been discussed in the past that may be a good place to revive the conversation:
http://strategy.wikimedia.org/wiki/Proposal:Building_a_database_of_all_books...
Rather than commenting on list about the subject itself, I've commented on the discussion page there:
http://strategy.wikimedia.org/wiki/Proposal_talk:Building_a_database_of_all_...
Rob
Rob,
Thanks for bringing my attention to this proposal. It certainly has some of the same ring as this project, with of course some important differences. Commonalities between the projects are that they are multilingual and require a powerful search engine. Differences are that this project is for all literary sources and that I believe it is best suited at the WMF. The widespread use of citations across the Wikipedias will drive user contributions towards adding richer metadata to those citations. And having a source of citations available will increase the quality of the Wikipedias as it becomes easier and easier to cite sources.
Brian
wiki-research-l@lists.wikimedia.org