On 1/21/07, Simetrical Simetrical+wikitech@gmail.com wrote:
On 1/21/07, Henry Skelton dimensiondude.oss@gmail.com wrote:
I'm interested in helping out with the development of MediaWiki. I think it's very good, but one thing that seems to be missing is a good system for metadata and cross-indexing information.
That might not be the best way to put it. As I see it, an article would include more than just the text of the article. It would have a summary (possibly present on the page), and also metadata (like the classification for an organism). E.g., on an article for a family of turtles, say Chelydridae, you might want to list all of the genera. Instead of searching and hoping you have them all, then copy/pasting or making your own summaries, you could just do something like
[[wikipedia searchForArticlesWithTags:genus,family=Testudinidae] listData:scientificname,commonname,conservationstatus]
This would then automatically get all of those genera, with the information specified, and would update if anything changed or new ones were added. You could also automatically grab summaries, which are quite common, with something like [someArticle getSummary].
Something like this could also keep data in sync. I've seen several instances of conflicting information between an article and a small summary of it in another article.
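To make the proposed query a little more concrete, here is a minimal Python sketch of how a tag search followed by a data listing might behave. Everything in it is an assumption made for illustration: the `Article` class, the function names `search_for_articles_with_tags` and `list_data`, and the placeholder article data are all hypothetical, and nothing like this exists in MediaWiki.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical article record: a title, bare tags, and key/value metadata."""
    title: str
    tags: set = field(default_factory=set)
    data: dict = field(default_factory=dict)


def search_for_articles_with_tags(articles, *tags, **attrs):
    """Return articles carrying every bare tag and matching every key=value pair."""
    return [
        a for a in articles
        if set(tags) <= a.tags
        and all(a.data.get(key) == value for key, value in attrs.items())
    ]


def list_data(articles, *fields):
    """Project each article down to the requested metadata fields."""
    return [{f: a.data.get(f) for f in fields} for a in articles]


# Placeholder data only; these are not real taxonomic records.
ARTICLES = [
    Article("Genus A", {"genus"}, {
        "family": "Testudinidae",
        "scientificname": "Genus A",
        "commonname": "placeholder tortoise",
        "conservationstatus": "unknown",
    }),
    Article("Genus B", {"genus"}, {
        "family": "Chelydridae",
        "scientificname": "Genus B",
        "commonname": "placeholder turtle",
        "conservationstatus": "unknown",
    }),
    Article("Testudinidae", {"family"}, {"scientificname": "Testudinidae"}),
]

genera = search_for_articles_with_tags(ARTICLES, "genus", family="Testudinidae")
table = list_data(genera, "scientificname", "commonname", "conservationstatus")
```

Because the listing is computed from the articles' own metadata each time, it would pick up new or changed genera without anyone editing the list by hand, which is the point of the proposal.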
Is there something like this in progress? I'd like to help, but I'd like to avoid duplicating effort on something someone is already working on. I know how to program, and know a small amount of PHP.
The major issue here tends to be optimization. Semantic MediaWiki more or less has what you're looking for, but as I understand it, it scales far too poorly to be used for a site like Wikipedia. Category intersection is a more specific, much less ambitious thing to do, and there appears to be some progress in that direction right now, although it's not necessarily ready for prime time.
But for the overall idea of what you posted, well, that strikes me as basically like allowing anyone to run arbitrary SELECT queries. You just can't do that with a database Wikipedia's size.
I once posted the idea (which, of course, was ignored ;-) to store the names and values of variables passed to templates from articles in an SQL table. If you write {{xyz|a=1|b=2}} in article BLA and save, it would store

BLA | xyz | a | 1
BLA | xyz | b | 2

in said table. Applied to {{Persondata}} [1], you could search for a specific birth date, or for "%January%1980%" to find people born in January 1980, which you cannot do with the current category system, even with intersections, AFAIK.
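A minimal sketch of such a table, using Python's built-in sqlite3 module purely for illustration (MediaWiki itself runs on PHP and MySQL). The table name `template_params`, its column names, the helper `save_template_call`, and the sample Persondata articles are all hypothetical; extracting the parameters from wikitext is assumed to have happened already.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE template_params (
           article  TEXT,  -- page the template call appears on
           template TEXT,  -- template name, e.g. 'Persondata'
           name     TEXT,  -- parameter name
           value    TEXT   -- parameter value as written in the article
       )"""
)


def save_template_call(article, template, params):
    """On save, replace the stored rows for this article/template pair."""
    conn.execute(
        "DELETE FROM template_params WHERE article = ? AND template = ?",
        (article, template),
    )
    conn.executemany(
        "INSERT INTO template_params VALUES (?, ?, ?, ?)",
        [(article, template, name, value) for name, value in params.items()],
    )


# {{xyz|a=1|b=2}} saved in article BLA
save_template_call("BLA", "xyz", {"a": "1", "b": "2"})

# Made-up people, for illustration only.
save_template_call("Example Person A", "Persondata", {"DATE OF BIRTH": "12 January 1980"})
save_template_call("Example Person B", "Persondata", {"DATE OF BIRTH": "3 March 1975"})

rows = conn.execute(
    "SELECT article, template, name, value FROM template_params "
    "WHERE article = 'BLA' ORDER BY name"
).fetchall()

born_jan_1980 = [
    article
    for (article,) in conn.execute(
        "SELECT article FROM template_params "
        "WHERE template = 'Persondata' AND name = 'DATE OF BIRTH' "
        "AND value LIKE ?",
        ("%January%1980%",),
    )
]
```

One caveat that ties back to the optimization concern: a LIKE pattern with a leading wildcard generally cannot use an ordinary index on `value`, so a real deployment would probably need normalized date columns or some other indexing scheme.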
Given the amount of data we put in navboxes via templates, this is a vast repository of unharvested metadata, IMHO.
Magnus