Re: [Wikitech-l] Development of a Metadata/Cross-Indexing System

22 Jan 2007


On 1/21/07, Simetrical <Simetrical+wikitech@gmail.com> wrote:
...
On 1/21/07, Henry Skelton <dimensiondude.oss@gmail.com> wrote:
...
I'm interested in helping out with the development of MediaWiki. I
think it's very good, but one thing that seems to be missing is a good
system for metadata and cross-indexing information.
That might not be the best way to put it. As I see it, an article
would include more than just the text of the article. It would have a
summary (possibly present on the page) and also metadata (like the
classification of an organism). For example, in an article on a family
of turtles, say Chelydridae, you might want to list all of the genera.
Instead of searching and hoping you have found them all, then copying
and pasting or writing your own summaries, you could just do something
like
[[wikipedia searchForArticlesWithTags:genus,family=Chelydridae]
listData:scientificname,commonname,conservationstatus]
This would then automatically fetch all of those genera with the
information specified, and would update if anything changed or new
ones were added. You could also automatically pull in summaries, which
are quite common, with something like [someArticle getSummary].
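To make the proposed query concrete, here is a minimal sketch in Python, assuming each article carries a set of free-form tags plus key/value metadata. The `Article` record, the `search_articles_with_tags` and `list_data` functions, and all of the records are hypothetical placeholders that only mirror the field names in the example query; nothing here is an existing MediaWiki API. The filter uses Chelydridae, the family the example article is about.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Article:
    """Hypothetical article record: free-form tags plus key/value metadata."""
    title: str
    tags: Set[str] = field(default_factory=set)
    data: Dict[str, str] = field(default_factory=dict)


# Placeholder records; the field names follow the example query, the values are made up.
ARTICLES: List[Article] = [
    Article("Genus A", {"genus"}, {"family": "Chelydridae", "scientificname": "Genus A",
                                   "commonname": "common name A", "conservationstatus": "status A"}),
    Article("Genus B", {"genus"}, {"family": "Chelydridae", "scientificname": "Genus B",
                                   "commonname": "common name B", "conservationstatus": "status B"}),
    Article("Genus C", {"genus"}, {"family": "Testudinidae", "scientificname": "Genus C",
                                   "commonname": "common name C", "conservationstatus": "status C"}),
]


def search_articles_with_tags(tag: str, **filters: str) -> List[Article]:
    """Articles carrying `tag` whose metadata matches every key=value filter."""
    # This is a full scan over every article: fine for a toy list, not for a large wiki.
    return [
        a for a in ARTICLES
        if tag in a.tags and all(a.data.get(k) == v for k, v in filters.items())
    ]


def list_data(articles: List[Article], *fields: str) -> List[Dict[str, str]]:
    """Project the requested fields; missing fields come back as empty strings."""
    return [{f: a.data.get(f, "") for f in fields} for a in articles]


genera = search_articles_with_tags("genus", family="Chelydridae")
rows = list_data(genera, "scientificname", "commonname", "conservationstatus")
for row in rows:
    print(row)
```

Because the rows are computed from each genus article's own metadata at query time, an edit to that metadata shows up in every list that queries it, which is the "keep data in sync" property described next. The linear scan in `search_articles_with_tags` is also exactly where the scaling concern in the reply comes from.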
Something like this could also keep data in sync. I've seen several
instances of conflicting information between an article and a short
summary of it in another article.
Is there something like this in progress? I'd like to help, but I'd
like to avoid duplicating effort on something someone is already
working on. I know how to program, and know a small amount of PHP.
The major issue here tends to be performance. Semantic MediaWiki
more or less has what you're looking for, but as I understand it, it
scales far too poorly to be used on a site like Wikipedia. Category
intersection is a more specific, much less ambitious thing to do, and
there appears to be some progress in that direction right now,
although it's not necessarily ready for prime time.
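For contrast with the open-ended query above, category intersection only asks "which pages are in all of these categories?". A minimal sketch, assuming category membership is available as a mapping from category name to a set of page titles; the `CATEGORIES` data and the `intersect_categories` helper are hypothetical:

```python
from functools import reduce
from typing import Dict, List, Set

# Hypothetical category membership: category name -> set of page titles.
CATEGORIES: Dict[str, Set[str]] = {
    "Turtles": {"Page 1", "Page 2", "Page 3"},
    "Endangered species": {"Page 2", "Page 3", "Page 4"},
    "Reptiles of North America": {"Page 3", "Page 5"},
}


def intersect_categories(*names: str) -> List[str]:
    """Return the pages that belong to every named category, sorted by title."""
    # Start from the smallest member set so the candidate set is small from the outset.
    member_sets = sorted((CATEGORIES.get(n, set()) for n in names), key=len)
    if not member_sets:
        return []
    common = reduce(lambda acc, s: acc & s, member_sets[1:], set(member_sets[0]))
    return sorted(common)


print(intersect_categories("Turtles", "Endangered species"))
print(intersect_categories("Turtles", "Endangered species", "Reptiles of North America"))
```

The query shape is fixed (set membership only), which is what makes it far easier to bound and index than letting users express arbitrary conditions on arbitrary fields.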
But as for the overall idea of what you posted, that strikes me as
basically equivalent to allowing anyone to run arbitrary SELECT
queries. You just can't do that with a database of Wikipedia's size.
I once posted the idea (which, of course, was ignored ;-) of storing the
names and values of parameters passed to templates from articles in an
SQL table. If you write {{xyz|a=1|b=2}} in article BLA and save, it
would store
BLA | xyz | a | 1
BLA | xyz | b | 2
in said table. Applied to {{Persondata}} [1], you could search for a
specific birth date, or use a pattern like "%January%1980%" to find
people born in January 1980, which you cannot do with the current
category system, even with intersections, AFAIK.
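The template-parameter table described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the table name `template_params`, its column names, the `extract_rows`/`build_index` helpers and the "Example Person" page are hypothetical, and `DATE OF BIRTH` is assumed as the Persondata parameter name. The regex deliberately handles only simple, non-nested template calls with named parameters; real wikitext (nested templates, `[[link|label]]` pipes, positional parameters) needs a proper parser.

```python
import re
import sqlite3
from typing import Dict, List, Tuple

# Matches a single, non-nested template call such as {{xyz|a=1|b=2}}.
# Simplification: nested templates, [[link|label]] pipes and unnamed
# (positional) parameters are not handled.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def extract_rows(page: str, wikitext: str) -> List[Tuple[str, str, str, str]]:
    """Turn every named template parameter into a (page, template, name, value) row."""
    rows = []
    for match in TEMPLATE_RE.finditer(wikitext):
        template = match.group(1).strip()
        for part in match.group(2).split("|"):
            if "=" in part:
                name, value = part.split("=", 1)
                rows.append((page, template, name.strip(), value.strip()))
    return rows


def build_index(pages: Dict[str, str]) -> sqlite3.Connection:
    """Store the parameters of every page in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE template_params (page TEXT, template TEXT, name TEXT, value TEXT)"
    )
    for page, text in pages.items():
        conn.executemany(
            "INSERT INTO template_params VALUES (?, ?, ?, ?)", extract_rows(page, text)
        )
    return conn


pages = {
    "BLA": "{{xyz|a=1|b=2}}",
    "Example Person": "{{Persondata|NAME=Person, Example|DATE OF BIRTH=12 January 1980}}",
}
conn = build_index(pages)
print(extract_rows("BLA", pages["BLA"]))
# [('BLA', 'xyz', 'a', '1'), ('BLA', 'xyz', 'b', '2')]

born = conn.execute(
    "SELECT page FROM template_params WHERE template = ? AND name = ? AND value LIKE ?",
    ("Persondata", "DATE OF BIRTH", "%January%1980%"),
).fetchall()
print(born)  # [('Example Person',)]
```

One design caveat: a pattern with a leading `%` cannot be answered from an ordinary B-tree index on `value`, so on a large table this kind of search turns into a scan. Storing dates in a normalized, sortable form would let a range query use an index instead.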
Given the amount of data we put in navboxes via templates, this is a
vast repository of unharvested metadata, IMHO.
Magnus
[1] http://en.wikipedia.org/wiki/Persondata
