So, when I was weeding through the RFEs on SourceForge last week, I noted quite a few having to do with metadata, including:
579758 Consider keyword meta tags
629323 Simple article categorisation system
766213 Provide Standardized Subject Search
839394 Meta ICBM tags when appropriate
I think I must have closed like 5 other RFEs and referred them to #629323. There's lots of people who want category metadata.
Anyways, I wanted to float a design idea for inclusion of metadata in MediaWiki articles. None of this is particularly new, and it seems like most has been suggested at one point or another.
---8<---
WHAT IS METADATA?
-----------------
Metadata is data about data. For example, the filename and byte count of a file on a filesystem are metadata: not part of the data, but _about_ the data. Metadata can arise naturally from the data (byte count) or can be added from outside the data (file name).
Metadata helps us sort data, find data, and make decisions about data. For example, you can sort files in a directory alphabetically by filename to find a file you want. Or you can sort them by size to find the biggest or smallest. You can delete a file called "TEMPFILE.$$$" because you know it's a temporary file.
We use metadata so often, we sometimes forget that it's "meta". But it is: changing the name of a file, or moving it to another folder, doesn't change the contents of the file. It just changes how we find and access the file.
METADATA IN MEDIAWIKI
---------------------
We use a lot of metadata for MediaWiki articles: page sizes, page histories, new pages page, recent changes page, etc. etc. Some of the metadata we use is calculated from the data itself -- sizes -- and some of it is not -- timestamps, who made changes, etc. It would probably be fair to say that article titles and redirects are metadata.
One form of metadata that we now use is interlanguage links. Interlanguage links are metadata that say: "There is an article in the language XX Wikipedia (or other installation) that covers this same topic." That's pretty cool metadata to have.
What I propose is that MediaWiki expand this metadata format to cover other types of metadata, such as:
* categorization -- saying that particle physics is in the physics category, or that Lord of the Rings is in the fantasy books category
* relationships between articles -- break up a single page into multiple chapters or sections, and note that they're all part of the same article
* synopses -- providing a synopsis or description of an article
* geography -- marking up pages to specify that they cover a particular geographical location
* customizable per-installation metadata -- metadata that may make sense for different installations.
I'm particularly interested in this last one, since it would help us with Wikitravel.
METADATA IN THE DATABASE
------------------------
I think it's pretty reasonable to just think of metadata as name-value pairs, where the name key is not necessarily unique. For example, we could have a table like this for the article on David Brinkley:
article id   metadata_name   metadata_value
------------------------------------------------
1            category        journalists
1            category        American people
1            see_also        CBS
1            see_also        CBS News
It should be relatively easy to slurp up the metadata on an article when rendering the article, and to pluck out the metadata from an article when saving it. We do this with links, broken_links, and image_links now.
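To make the table above concrete, here is a minimal sketch using Python's sqlite3 module rather than MediaWiki's actual PHP/MySQL code. The table name article_metadata and the helper functions are hypothetical, chosen only for illustration; the point is that a non-unique name key falls out naturally from a plain three-column table.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a name-value metadata table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article_metadata ("
        " article_id INTEGER NOT NULL,"
        " metadata_name TEXT NOT NULL,"
        " metadata_value TEXT NOT NULL)"
    )
    return conn


def save_metadata(conn, article_id, pairs):
    """Replace every pair stored for one article, as would happen on page save."""
    conn.execute("DELETE FROM article_metadata WHERE article_id = ?", (article_id,))
    conn.executemany(
        "INSERT INTO article_metadata VALUES (?, ?, ?)",
        [(article_id, name, value) for name, value in pairs],
    )


def load_metadata(conn, article_id):
    """Fetch the pairs for one article in insertion order, as would happen on render."""
    rows = conn.execute(
        "SELECT metadata_name, metadata_value FROM article_metadata"
        " WHERE article_id = ? ORDER BY rowid",
        (article_id,),
    )
    return [(name, value) for name, value in rows]


conn = open_store()
save_metadata(conn, 1, [
    ("category", "journalists"),
    ("category", "American people"),
    ("see_also", "CBS"),
    ("see_also", "CBS News"),
])
brinkley = load_metadata(conn, 1)
```

Deleting and re-inserting on every save is the simplest approach and mirrors how a links table could be refreshed; whether it is fast enough for a large wiki is an open question.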
METADATA IN WIKITEXT
--------------------
One way to do metadata is to have a different entry format for metadata than for the text and markup of an article. I'm going to reject that, since we already have a winning mechanism for marking up some metadata -- interlanguage links -- within the Wiki markup. Whether this would make it _really_ data instead of metadata is left as an exercise to the reader.
There's a couple of ways we could do this in wikitext:
<<name=value>>
<name:value>
[[meta:name=value]]
[[meta:name:value]]
[[name:value]]
The first couple are kinda radical, and don't really jibe with interlinks, anyways. The last one has the potential to clash with other namespaces.
I prefer [[meta:name=value]], just because it's kinda easy. I realize that the name "meta" clashes with "metawikipedia", so maybe another word would work.
It's not particularly important what format we use; what's important is that we have a way to enter arbitrary name-value pairs into the text of articles.
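As a rough illustration of plucking name-value pairs out of article text, here is a small Python sketch for the [[meta:name=value]] form. The regular expression and the function name extract_metadata are assumptions made for this example, not existing MediaWiki code, and a real parser would also have to cope with nowiki sections and the like.

```python
import re

# Hypothetical pattern for [[meta:name=value]]: the name may not contain
# "=", "|" or square brackets; the value may not contain square brackets.
META_LINK = re.compile(r"\[\[meta:([^=\]\[|]+)=([^\]\[]*)\]\]")


def extract_metadata(wikitext):
    """Return (pairs, body): the metadata found, and the text with the tags removed."""
    pairs = [
        (match.group(1).strip(), match.group(2).strip())
        for match in META_LINK.finditer(wikitext)
    ]
    body = META_LINK.sub("", wikitext)
    return pairs, body


text = (
    "David Brinkley was a newscaster."
    "[[meta:category=journalists]][[meta:see_also=CBS News]]"
)
pairs, body = extract_metadata(text)
```

Any of the other syntaxes would only change the pattern; the rest of the pipeline sees plain pairs either way.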
RENDERING METADATA
------------------
I think there are a couple of ways we can deal with metadata in an article when rendering it:
* Add it as a <meta> tag to the HTML <head>
* Add it as a <link> tag to the HTML <head>
* Add it as an out-of-page link, like Interlanguage links work now
* Render it in the text of the document (I can't see why this would be useful, but)
* Ignore it
I think we could have several pre-defined names, with predefined rendering, and other names with possible renderings configurable per-installation.
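One way to picture per-installation rendering is a lookup table from metadata name to rendering mode. The sketch below is in Python and everything in it (RENDER_MODES, the mode names "meta" and "sidebar", the function render) is hypothetical; it only covers two of the options listed above and ignores everything that has no configured mode.

```python
from html import escape

# Hypothetical per-installation configuration: metadata name -> rendering mode.
RENDER_MODES = {
    "keywords": "meta",     # emit as a <meta> tag in the HTML <head>
    "category": "sidebar",  # show as an out-of-page link, like interlanguage links
}


def render(pairs, modes=RENDER_MODES):
    """Split name-value pairs into <head> tags and sidebar entries; drop the rest."""
    head, sidebar = [], []
    for name, value in pairs:
        mode = modes.get(name, "ignore")
        if mode == "meta":
            # escape() also escapes quotes by default, so the attribute stays well-formed.
            head.append('<meta name="%s" content="%s">' % (escape(name), escape(value)))
        elif mode == "sidebar":
            sidebar.append("%s: %s" % (name, value))
    return head, sidebar


head, sidebar = render([
    ("keywords", "news & TV"),
    ("category", "journalists"),
    ("mood", "unconfigured names are ignored"),
])
```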
OTHER USES OF METADATA
----------------------
There are some other uses of metadata, of course. One would be automatically-built directories, by category.
---8<---
So, my second big proposal of the day.
~ESP
Evan Prodromou wrote:
So, when I was weeding through the RFEs on SourceForge last week, I noted quite a few having to do with metadata, including:
579758 Consider keyword meta tags
629323 Simple article categorisation system
766213 Provide Standardized Subject Search
839394 Meta ICBM tags when appropriate
I think I must have closed like 5 other RFEs and referred them to #629323. There's lots of people who want category metadata.
Anyways, I wanted to float a design idea for inclusion of metadata in MediaWiki articles. None of this is particularly new, and it seems like most has been suggested at one point or another.
All seems very reasonable; the tags will be invaluable in maintaining indexes like [[list of people]], which is missing thousands of bios that are scattered all over WP now. I thought Magnus or somebody had implemented it already?
Syntaxwise, I like [[meta:name:value]], with a general inclination to create new names that are recognized specially, so I can say [[category:person]] or even [[cat:person]] to be really brief, since nearly all articles will have one or more category tags.
Stan
"SS" == Stan Shebs shebs@apple.com writes:
SS> All seems very reasonable; the tags will be invaluable in maintaining indexes like [[list of people]], which is missing thousands of bios that are scattered all over WP now. I thought Magnus or somebody had implemented it already?
There is some code to do categorization, which appears to be mostly commented-out. It finds all the links in a page that start with a language-specific prefix -- most of the languages, including English, don't have the prefix defined -- and collects them into something called "category links". It doesn't do anything with them, though.
There's also a special page for listing categories.
I think this was code that got started and never got finished.
~ESP
Evan Prodromou wrote:
"SS" == Stan Shebs shebs@apple.com writes:
SS> All seems very reasonable; the tags will be invaluable in maintaining indexes like [[list of people]], which is missing thousands of bios that are scattered all over WP now. I thought Magnus or somebody had implemented it already?
There is some code to do categorization, which appears to be mostly commented-out. It finds all the links in a page that start with a language-specific prefix -- most of the languages, including English, don't have the prefix defined -- and collects them into something called "category links". It doesn't do anything with them, though.
There's also a special page for listing categories.
I think this was code that got started and never got finished.
Well, the code *I* wrote (which I assume is what you're talking about) is finished, in the sense of "it works, but it ain't too pretty".
Example: "[[Category:Stuff]]" somewhere in the article will show a link above (similar to language links), leading to the category page ("Category:Stuff"). That will automagically list all pages in that category (meaning, that link there via category link). A category page can be edited like a normal page (the article list is automatically shown below), and can include "super-categories" (e.g., "[[Category:Biology]]" on "Category:Zoology"). Any page can, of course, have many categories.
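For readers who want to see the idea in miniature, here is a Python sketch of how a category page's member list could be derived from [[Category:...]] links. It is not the PHP code described above; the pattern, the function build_category_index and the sample pages are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern for a category link such as [[Category:Stuff]].
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]\[|]+)\]\]")


def build_category_index(pages):
    """Map each category page title to the sorted titles of the pages that link to it.

    pages is a dict of page title -> wikitext. Category pages are ordinary pages,
    so a category page can itself be a member of a super-category.
    """
    index = defaultdict(set)
    for title, text in pages.items():
        for match in CATEGORY_LINK.finditer(text):
            index["Category:" + match.group(1).strip()].add(title)
    return {category: sorted(members) for category, members in index.items()}


pages = {
    "Lion": "A big cat. [[Category:Zoology]]",
    "Cell": "The basic unit of life. [[Category:Biology]]",
    "Category:Zoology": "The study of animals. [[Category:Biology]]",
}
index = build_category_index(pages)
```

Here "Category:Zoology" shows up as a member of "Category:Biology", which is the super-category case from the example.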
You'll have to enable categories in LocalSettings, but I forgot how :-(
Magnus
From: Magnus Manske
Friday, December 05, 2003 4:23 AM
Well, the code *I* wrote (which I assume is what you're talking about) is finished, in the sense of "it works, but it ain't too pretty".
<snip>
You'll have to enable categories in LocalSettings, but I forgot how :-(
Please, please do not turn on this feature. Human-inserted metadata is basically unwiki. There are better approaches to dealing with the problem of categorization/computer comprehension of data. The right approaches act like magic.
Moreover, this isn't really (only) a discussion for wikitech-l.
The Cunctator wrote:
Please, please do not turn on this feature. Human-inserted metadata is basically unwiki. There are better approaches to dealing with the problem of categorization/computer comprehension of data. The right approaches act like magic.
Would be great to have one of those. But, AFAIK, there's no way to implement a category scheme purely by code. That means there *has* to be some interface for humans changing categories around. That can be either some "Add/remove/change category" construct on every page; ugly, and a bitch to write, as one would have to re-implement "old version", delete/undelete, and the like *for the category system alone*. OTOH, we can use the way of language links; while I'd prefer a centralized language link facility (one for all wikipedias), it could work better for the category scheme, as that information stays within that 'pedia.
And since when is editing an article unwiki? :-)
Moreover, this isn't really (only) a discussion for wikitech-l.
The original question was a technical one, IIRC. I also vaguely remember support for that category scheme when I had it running at the test site.
But if we need Yet Another Discussion (TM) on wikipedia-l about the nature of the category system (I *know* we kinda decided to use one!), so be it.
Magnus
From: Magnus Manske
The Cunctator wrote:
Please, please do not turn on this feature. Human-inserted metadata is basically unwiki. There are better approaches to dealing with the problem of categorization/computer comprehension of data. The right approaches act like magic.
Would be great to have one of those. But, AFAIK, there's no way to implement a category scheme purely by code. That means there *has* to be some interface for humans changing categories around.
I didn't say we should implement a category scheme purely by code. I can think of several better methods of providing the utility of categories than inserting hidden metadata tags.
I should have more time in the coming weeks to make some explicit suggestions--that is, mock up examples, etc.
I think that having a categorization project (a la dmoz for Wikipedia) would be a fine idea, as long as the work, and the data, are separate from the root Wikipedia.
In fact, the best thing would be to work on developing ways for outside projects to hook easily into the Wikipedia content without having to be a Bomis-hosted project, prolly by having an XML hook.
If we did so, I could imagine a group at the MIT Media Lab or the Cyc project figuring out some bad-ass way of navigating Wikipedia content, etc.
And since when is editing an article unwiki? :-)
Adding hidden content is unwiki.
What I'd like to see is an explicit wiki-statement of what is the desired functionality--that is, what is the utility missing--that a category scheme would provide.
Then we can discuss particular implementations separately--for example, is it better to use a system which has a single ontology or a system which allows for dynamic ontologies?
Magnus Manske wrote:
The Cunctator wrote:
Please, please do not turn on this feature. Human-inserted metadata is basically unwiki. There are better approaches to dealing with the problem of categorization/computer comprehension of data. The right approaches act like magic.
Would be great to have one of those. But, AFAIK, there's no way to implement a category scheme purely by code. That means there *has* to be some interface for humans changing categories around. That can be either some "Add/remove/change category" construct on every page; ugly, and a bitch to write, as one would have to re-implement "old version", delete/undelete, and the like *for the category system alone*. OTOH, we can use the way of language links; while I'd prefer a centralized language link facility (one for all wikipedias), it could work better for the category scheme, as that information stays within that 'pedia.
And since when is editing an article unwiki? :-)
Moreover, this isn't really (only) a discussion for wikitech-l.
The original question was a technical one, IIRC. I also vaguely remember support for that category scheme when I had it running at the test site.
But if we need Yet Another Discussion (TM) on wikipedia-l about the nature of the category system (I *know* we kinda decided to use one!), so be it.
I agree that there seemed to be much support for some kind of category scheme. It's hard to see it as anything other than human-inserted; the alternative of leaving it to some bot horrifies me. I still think that some kind of category box that would leave room for any number of categories would be superior; these could have a comma or other appropriate delimiter. It seems more user-friendly if the user does not have to learn more mark-up and can simply type in the category that he wants.
If we're speaking of something being more wiki we should not prejudge what the categories themselves would be, but let them evolve naturally. Some types of categories may be obvious, but we have a lot of imaginative participants out there in Wikiland. Some will be just plain strange, but they are worth trying. If they are not used by others they can be removed at a much later time.
A categorization scheme could be especially welcome at Wiktionary where there has been a trend to give a long list of translations for any given word. A number of cross indexes have been devised which in principle would be useful if anybody bothered to keep them up to date. It is not enough to give the Finnish or Guaraní word for "dog"; the cross-indexes for those languages need to be adjusted as well. That's what hasn't happened. I'm also looking at what might evolve at Wikisource, and perhaps Wikibooks.
Ec
"TC" == The Cunctator cunctator@kband.com writes:
TC> Moreover, this isn't really (only) a discussion for wikitech-l.
How to put metadata into articles is a tech question.
What metadata fields to define -- like "category" -- and how to use them is an implementation question. It would vary from installation to installation.
~ESP
Evan Prodromou wrote:
"TC" == The Cunctator cunctator@kband.com writes:
TC> Moreover, this isn't really (only) a discussion for wikitech-l.
How to put metadata into articles is a tech question.
What metadata fields to define -- like "category" -- and how to use them is an implementation question. It would vary from installation to installation.
~ESP
How about just allowing links with equals signs in, like this:
[[predicate=object]] in RDF terminology, where the article is the subject.
Note that this has the advantage of currently being an illegal format for links, so we won't break anything.
So, we could in principle have things like
[[category=biology]]
[[author=J. R. Hartley]]
[[author=W. Mandella]]
[[latlong=21.2N 33.4E]]
[[OSGB=SN 045 055]]
There should be no restriction on either the predicate or object values, other than that neither can contain "]" or "=": hopefully, data formats that are useful will be invented and standardised by the normal Wikipedia process.
In this way, we don't overload the current use of colons for namespaces, but make the insertion of name-value pairs fairly intuitive. This way, both the chosen predicates, and the choices of values for those predicates, can be chosen by the community.
Again: the knotty issue is how to display these values in normal article rendering to users who are not editing, as these values will only accumulate cruft if they are not visible for peer review.
Suggestion: a line near the top of the article showing
Author: J. R. Hartley, W. Mandella; Category: biology; Latlong: 21.2N 33.4E; OSGB: SN 045 055
We can then have special links to [[Category:]] articles for the _predicate_ names (where their definitions, formats etc. can be defined and discussed), and to Wikipedia articles for the values.
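The summary line suggested above can be sketched in a few lines of Python. The pattern and the function summary_line are hypothetical; the sketch assumes predicates are sorted case-insensitively and shown with their first letter capitalised, which happens to reproduce the example line but is only one possible convention.

```python
import re

# Hypothetical pattern for [[predicate=object]]; neither side may contain
# square brackets or "=".
PAIR_LINK = re.compile(r"\[\[([^\]\[=]+)=([^\]\[=]+)\]\]")


def summary_line(wikitext):
    """Build a one-line 'Predicate: value, value; ...' summary from the pair links."""
    grouped = {}
    for match in PAIR_LINK.finditer(wikitext):
        predicate = match.group(1).strip()
        value = match.group(2).strip()
        if not predicate or not value:
            continue  # skip degenerate links such as [[ =x]]
        grouped.setdefault(predicate, []).append(value)
    parts = []
    for predicate in sorted(grouped, key=str.lower):
        label = predicate[0].upper() + predicate[1:]
        parts.append("%s: %s" % (label, ", ".join(grouped[predicate])))
    return "; ".join(parts)


article = (
    "[[category=biology]] [[author=J. R. Hartley]] [[author=W. Mandella]] "
    "[[latlong=21.2N 33.4E]] [[OSGB=SN 045 055]]"
)
line = summary_line(article)
```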
-- Neil
"NH" == Neil Harris usenet@tonal.clara.co.uk writes:
NH> How about just allowing links with equals signs in, like this:
NH> [[predicate=object]] in RDF terminology, where the article is the subject.
NH> Note that this has the advantage of currently being an illegal format for links, so we won't break anything.
I really like that, actually.
~ESP
On Fri, 05 Dec 2003 10:22:54 +0100, Magnus Manske wrote:
Well, the code *I* wrote (which I assume is what you're talking about) is finished, in the sense of "it works, but it ain't too pretty".
Example: "[[Category:Stuff]]" somewhere in the article will show a link above (similar to language links), leading to the category page ("Category:Stuff"). That will automagically list all pages in that category (meaning, that link there via category link). A category page can be edited like a normal page (the article list is automatically shown below), and can include "super-categories" (e.g., "[[Category:Biology]]" on "Category:Zoology"). Any page can, of course, have many categories.
You'll have to enable categories in LocalSettings, but I forgot how :-(
Magnus
Hello,
Maybe this should be activated on test.wikipedia.org so people can look at it and crash it ? Then we will have more feature requests to improve the system :)
cheers,
"MM" == Magnus Manske magnus.manske@web.de writes:
MM> Well, the code *I* wrote (which I assume is what you're talking about) is finished, in the sense of "it works, but it ain't too pretty".
OK. Right now, as far as I can tell, it doesn't seem to work, since it's commented out. I can find the part where it collects the category links, but I can't find the part where it renders them.
The whole thing is blocked out by a "CHECK MERGE" comment, which I guess means it was a merge problem.
MM> Example: "[[Category:Stuff]]" somewhere in the article will show a link above (similar to language links), leading to the category page ("Category:Stuff"). That will automagically list all pages in that category (meaning, that link there via category link). A category page can be edited like a normal page (the article list is automatically shown below), and can include "super-categories" (e.g., "[[Category:Biology]]" on "Category:Zoology"). Any page can, of course, have many categories.
Hrm. OK, I'm grokking part of this, at least. There's some stuff in the skin code to do category links. I'll see if I can reenable it in the unstable branch.
MM> You'll have to enable categories in LocalSettings, but I forgot how :-(
$wgUseCategoryMagic.
So, I think that this is a nice piece of work. Since it's not in use in production -- couldn't be, with all that commented-out stuff -- how would you feel about me generalizing it to just another kind of metadata? That is, instead of trying to find [[Category:Foo]], use [[metadata:category:foo]] instead?
This would leave the door open for other kinds of metadata.
~ESP
Evan Prodromou wrote:
[about Magnus' category scheme]
So, I think that this is a nice piece of work. Since it's not in use in production -- couldn't be, with all that commented-out stuff -- how would you feel about me generalizing it to just another kind of metadata? That is, instead of trying to find [[Category:Foo]], use [[metadata:category:foo]] instead?
I think that you should work out the integration of Magnus' own scheme with the present code, recreating what we had on [[test:]] back then. The reason is that this [[test:]] appearance had a great deal of support; there were just a few kinks to work out when people lost interest (because there were /other/ things to test at the same time). With fresh blood (you), we can get it straightened out this time, get it running on a big wiki (like [[en:]]) and working for real; then we'll all be in a good position to say «What's still missing?» and decide if we want more extended metadata.
-- Toby
"TB" == Toby Bartels toby+wikipedia@math.ucr.edu writes:
TB> I think that you should work out the integration of Magnus' own scheme with the present code, recreating what we had on [[test:]] back then. [...] With fresh blood (you), we can get it straightened out this time, get it running on a big wiki (like [[en:]]) and working for real; then we'll all be in a good position to say «What's still missing?» and decide if we want more extended metadata.
I'd be perfectly happy to do that, but I'd prefer to do it with a syntax and design that's extensible to other metadata. That way, if or when it's endorsed, we can actually *use* the other metadata we think up.
Magnus's design is pretty cool, and I think all I'd have to do is tweak a couple of things to make it work in a field-value pair context.
~ESP
Evan Prodromou wrote:
Toby Bartels wrote:
I think that you should work out the integration of Magnus' own scheme with the present code, recreating what we had on [[test:]] back then. [...] With fresh blood (you), we can get it straightened out this time, get it running on a big wiki (like [[en:]]) and working for real; then we'll all be in a good position to say «What's still missing?» and decide if we want more extended metadata.
I'd be perfectly happy to do that, but I'd prefer to do it with a syntax and design that's extensible to other metadata. That way, if or when it's endorsed, we can actually *use* the other metadata we think up.
Just change Magnus' [[category:foo]] to [[category=foo]].
Magnus's design is pretty cool, and I think all I'd have to do is tweak a couple of things to make it work in a field-value pair context.
Frankly, I find most of the metadata proposals on this thread rather fishy. (I mean, do we really need [[discussion=Talk:Foo]]???) But categories were popular, so you can get started. Then allowing more becomes a wiki-by-wiki decision, so Wikitravel's needs won't be limited by the cramped vision of me and The Cunctator (and many others) on the English Wikipedia. ^_^
-- Toby
Evan Prodromou wrote:
Anyways, I wanted to float a design idea for inclusion of metadata in MediaWiki articles. None of this is particularly new, and it seems like most has been suggested at one point or another.
While your definition of metadata is correct, and in full correspondence with every other definition, you fail to mention that there is also a "metadata movement" that has sprung up in the last ten years, as a reaction against the Internet, primarily among librarians. Today this movement is centered around Stu Weibel's "Dublin Core" metadata initiative at the Online Computer Library Center (OCLC) in Dublin, Ohio, http://www.dublincore.org/
The background was that librarians thought highly of themselves and considered the freedom of the Internet a dirty "chaos" and a threat to their professionalism as information specialists. The Internet (i.e. the World Wide Web) was characterized as "the world's biggest library with a drunken librarian after an earthquake", or something like this. In this community, hierarchical web indexes such as the commercial Yahoo or the non-commercial Dmoz were welcomed, but considered insufficient. Full-text search engines such as Altavista and Google were considered barbaric failures. The library catalog was the ideal, and "metadata" was the promised solution. If they could only teach all website creators to become librarians and "mark" their web pages with "metadata" corresponding to what's found on a library catalog card, order would form out of the previous chaos.
Ten years later, of course, we know that non-hierarchical solutions such as Google and Wikipedia are the winners. Data and metadata are presented in-line, without separation. This is what works, and what people tend to use. There are no useful global metadata search engines, and website maintainers don't include structured (Dublin Core) and honest metadata in their web pages. On the contrary, the "meta" HTML tag is most often used for dishonest "spamdexing".
In the wiki world, the article on Angola does not contain any formalized, hierarchical metadata markup such as [[part of:Africa]] or [[borders:Namibia]], but instead the plain English phrase "Angola is a country in southwestern [[Africa]], bordering [[Namibia]]". That's what wiki contributors can learn and be made to use. It doesn't require more complex parsing, it doesn't require any special database tables, and it doesn't require hours of user training.
"LA" == Lars Aronsson lars@aronsson.se writes:
LA> In the wiki world, the article on Angola does not contain any formalized, hierarchical metadata markup such as [[part of:Africa]] or [[borders:Namibia]], but instead the plain English phrase "Angola is a country in southwestern [[Africa]], bordering [[Namibia]]". That's what wiki contributors can learn and be made to use. It doesn't require more complex parsing, it doesn't require any special database tables, and it doesn't require hours of user training.
Some responses off the top of my head:
* I think it might be useful to back away from the politically-loaded use of metadata and just concentrate on the definition I gave. Metadata doesn't have to be for the Semantic Web or some remote librarians; it can be useful for in-house stuff.
I'm not particularly interested in the Semantic Web. I think Dublin Core is cool, but I think most of the Dublin Core data tags could be determined from the metadata (contributors, timestamps, license, etc.) we have now. So, this isn't really a Semantic Web/Dublin Core/yadda yadda issue.
* There _are_, in fact, systems using <meta> tags, although Wikitext metadata doesn't need to be rendered as <meta>. For example, GeoURL (http://geourl.org/) uses ICBM <meta> tags, which is pretty cool.
* Full-text search -- the Google approach -- is fine, but it sure does whack Wikipedia hard. It's disabled now. Some metadata-based navigation could be useful to offload some of the burden from full-text search.
* Wikipedia contributors do metadata markup now -- with Interlanguage tags. It's not unprecedented. It's not particularly hard.
* It should be obvious from the high number of [[list of X]] pages in Wikipedia that Wikipedia contributors think about categories and other metadata.
* Providing metadata makes our data more useful to downstream users. For example, consider if someone wanted to take Wikipedia's prodigious collection of information about science fiction and fantasy and produce an encyclopedia out of it. Metadata could help them do it.
I guess my point is that metadata isn't necessarily bad, authoritarian, or evil. It can make projects using MediaWiki more useful.
~ESP
Evan Prodromou wrote:
* Providing metadata makes our data more useful to downstream users. For example, consider if someone wanted to take Wikipedia's prodigious collection of information about science fiction and fantasy and produce an encyclopedia out of it. Metadata could help them do it.
You have to provide some incentive for the editor of an article to input the metadata. This is where Dublin Core utterly fails to attract followers among website maintainers. How are you going to succeed in attracting wikipedia article editors?
If I edit [[Jules Verne]] and input [[fr:Jules Verne]] I get the instant reward of a working link to the French article about the same person, which is sufficient motivation for my adding this markup.
But what immediate reward do I get for inputting something like [[genre:science fiction]]? If all I get is a link to the page [[science fiction]], then I already got that from mentioning in plain text that "Jules Verne was an early writer of [[science fiction]]". So what extra benefit is entering the metadata going to give me?
"LA" == Lars Aronsson lars@aronsson.se writes:
LA> You have to provide some incentive for the editor of an article to input the metadata. How are you going to succeed in attracting wikipedia article editors?
That's a worthwhile question. Here's some possible answers:
* Automatic indices. Use metadata to automatically create indices to the encyclopedia, like [[list of X]]. We could probably also do some fun automatic stuff with the timeline and date pages.
[[meta:year=1885,born]]
[[meta:year=1923,died]]
* Metadata-guided search. Currently we have three levels of search: exact title (the "Go" button), title match, and full-text match. I'd say that a metadata search (probably placed, in order of value, between exact title and title match) would be helpful. We could leave it on (like "Go") even when full-text search was too compute-intensive.
* Metadata in search results: even for full-text search, it can be useful to return metadata. Like, if I search for "Springfield", it'd be kind of nice to see:
- Springfield is-a: city, is-part-of: Kentucky
(matching text here)
- Springfield is-a: city, is-part-of: Missouri
(matching text here)
- Springfield is-a: fictional city, genre: animation
(matching text here)
Yes, the presentation is lame -- I don't think we'd ever show raw tags like that. But you get the picture.
* Geographical proximity. Frankly, I think ICBM tags make the whole thing worthwhile, just on their own. But that's my own bete noire.
* Breadcrumb navigation. It's fairly cumbersome to write, in [[cyclotron]], that "A cyclotron is an [[instrument]] used in [[particle physics]] which is a branch of [[physics]] which in turn is a [[natural science]] which is a kind of [[science]]." After all, the article isn't about natural science -- why describe it from here?
But with metadata we could have a breadcrumb link thing that says:
sciences > natural sciences > physics > particle physics > instruments
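A breadcrumb like that is just a walk up the category tree. Here is a minimal Python sketch, assuming (and it is a big assumption) that each category has at most one parent; the parent_of mapping and the function breadcrumb are hypothetical names for illustration.

```python
def breadcrumb(category, parent_of):
    """Walk from a category up through its parents and return a 'a > b > c' trail.

    parent_of maps a category name to its single parent (a simplifying
    assumption; real categories could have several). The seen set stops
    the walk if the data contains a loop.
    """
    trail = []
    seen = set()
    while category is not None and category not in seen:
        seen.add(category)
        trail.append(category)
        category = parent_of.get(category)
    return " > ".join(reversed(trail))


parent_of = {
    "instruments": "particle physics",
    "particle physics": "physics",
    "physics": "natural sciences",
    "natural sciences": "sciences",
}
trail = breadcrumb("instruments", parent_of)
```

With multiple parents per category there would be several possible trails, and picking one to display becomes a presentation decision.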
Frankly, what I think is that we just need to have one or two applications of the metadata, and people are going to think up brilliant new ones. They'll send patches for MediaWiki to do it, or they'll send RFEs, or they'll develop their own bots or whatever.
In other words, I don't think we're going to need to worry about getting people into doing metadata; we're going to have to worry about keeping up!
~ESP
I'm being quite woolly here, but I think we may be able to avoid the kind of issues Lars seems to be worried about (remote, centralised, irrelevant metadata) by wikifying the semantics themselves, making them fluid and negotiable like the rest of the wiki. For example, I could write:
/King Harold died in [[meta:DiedIn:1066]] during the [[Battle of Hastings]] in [[that year|meta:Battle:1066]]./
Now, neither DiedIn nor Battle need be defined beforehand. These could themselves be pages that describe further semantic relationships, e.g. that battles have a start and end time, participants and a location. These could be added by other participants (possibly prototyped in natural English with links to relevant Wiktionary or Wikipedia articles) along with references to existing relationships. And talk pages, of course.
Elsewhere I might write:
/Fred Bloggs lived until [[meta:LivedUntil:1977]]./
This could be dealt with as a synonym for DiedIn in a way similar to page redirects. I'm conscious that there are other examples where ways of stating relationships are incompatible, though I can't think of any at the moment. Consequently I can't conceive of how they might work yet.
There are relationships that are complementary, eg the ball is in the box if and only if the box contains the ball. How that and similar types of relationships can be 'understood' by the system to be equivalent without vast amounts of processing, I must admit I have no idea. Some of you seem to have studied this topic. Any thoughts?
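One cheap way to treat such pairs as equivalent, without any real reasoning, is to declare the synonyms and inverses in lookup tables and rewrite every statement into one canonical form when it is stored. The sketch below is only an illustration under that assumption: the relation names "isIn" and "contains" are made up, and in the spirit of the message the tables would themselves be editable wiki content.

```python
# Hypothetical, wiki-editable tables. Synonyms work like page redirects;
# inverses say that "A contains B" means the same as "B isIn A".
SYNONYM_OF = {"LivedUntil": "DiedIn"}
INVERSE_OF = {"contains": "isIn"}  # non-canonical -> canonical direction

def canonical(subject, relation, obj):
    """Rewrite a triple so equivalent statements are stored identically."""
    relation = SYNONYM_OF.get(relation, relation)
    if relation in INVERSE_OF:
        # Swap subject and object and use the canonical relation name.
        return (obj, INVERSE_OF[relation], subject)
    return (subject, relation, obj)

a = canonical("ball", "isIn", "box")
b = canonical("box", "contains", "ball")
c = canonical("Fred Bloggs", "LivedUntil", "1977")
```

Here a and b come out as the same triple, so a lookup needs only one comparison. This handles declared pairs only; anything that needs inference is outside its scope.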
Storing relationships and their semantics allows us to do things like automatically create timelines of battles and other categories of historical events. Webs of relationships between objects (and meanings of words) would start to be set up allowing other classification and possibly even reasoning of sorts to be performed. Whether these could be made practical to carry out, particularly as edits occur, I don't know.
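As a rough sketch of the timeline idea, the links could be pulled out of page text with a regular expression and grouped by year. Everything here is an assumption for illustration: the [[meta:Relation:Value]] syntax is the one used in the examples above (with optional display text before a "|"), the page texts are invented, and extract_relations and timeline are hypothetical helper names.

```python
import re
from collections import defaultdict

# Matches [[meta:Relation:Value]] and [[display text|meta:Relation:Value]].
META_LINK = re.compile(r"\[\[(?:[^\]|]*\|)?meta:([^:\]]+):([^\]]+)\]\]")

def extract_relations(title, wikitext):
    """Return (title, relation, value) triples found in one page's text."""
    return [(title, rel, val) for rel, val in META_LINK.findall(wikitext)]

def timeline(pages, relation):
    """Group page titles by year for one relation, in chronological order."""
    by_year = defaultdict(list)
    for title, text in pages.items():
        for _, rel, val in extract_relations(title, text):
            if rel == relation and val.isdigit():
                by_year[int(val)].append(title)
    return sorted(by_year.items())

pages = {
    "Harold Godwinson": "King Harold died in [[meta:DiedIn:1066]] during the "
                        "[[Battle of Hastings]] in [[that year|meta:Battle:1066]].",
    "Battle of Stamford Bridge": "Fought in [[meta:Battle:1066]].",
    "Battle of Agincourt": "Fought in [[meta:Battle:1415]].",
}
battles = timeline(pages, "Battle")
```

Note that the Harold page lands in the battle timeline too, because the tag only says that the page mentions a battle in 1066. That is exactly the kind of looseness the relation pages would have to pin down.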
There's lots more to be usefully said on this topic (possibly not by me, I don't think I really grasp it yet), but far more that is not useful. I hope the list will not be clogged by wild extrapolations, but debate around this topic may point to simple, implementable solutions.
However, is wikitech-L the place for such debate? Should there be a page (or pages) on meta.wiki.org called "applications of metadata" and "semantics in the wiki"?
Russell Jones
Evan Prodromou wrote:
"LA" == Lars Aronsson lars@aronsson.se writes:
LA> You have to provide some incentive for the editor of an
LA> article to input the metadata. How are you going to succeed
LA> in attracting wikipedia article editors?
That's a worthwhile question. Here's some possible answers:
- Automatic indices. Use metadata to automatically create indices to the encyclopedia, like [[list of X]]. We could probably also do some fun automatic stuff with the timeline and date pages.
[[meta:year=1885,born]] [[meta:year=1923,died]]
- Metadata-guided search. Currently we have three levels of search: exact title (the "Go" button), title match, and full-text match. I'd say that a metadata search (probably placed, in order of value, between exact title and title match) would be helpful. We could leave it on (like "Go") even when full-text search was too compute-intensive.
- Metadata in search results: even for full-text search, it can be useful to return metadata. Like, if I search for "Springfield", it'd be kind of nice to see:
  - Springfield is-a: city, is-part-of: Kentucky (matching text here)
  - Springfield is-a: city, is-part-of: Missouri (matching text here)
  - Springfield is-a: fictional city, genre: animation (matching text here)
Yes, the presentation is lame -- I don't think we'd ever show raw tags like that. But you get the picture.
- Geographical proximity. Frankly, I think ICBM tags make the whole thing worthwhile, just on their own. But that's my own bete noire.
- Breadcrumb navigation. It's fairly cumbersome to write, in [[cyclotron]], that "A cyclotron is an [[instrument]] used in [[particle physics]] which is a branch of [[physics]] which in turn is a [[natural science]] which is a kind of [[science]]." After all, the article isn't about natural science -- why describe it from here?
But with metadata we could have a breadcrumb link thing that says:
sciences > natural sciences > physics > particle physics > instruments
Frankly, what I think is that we just need to have one or two applications of the metadata, and people are going to think up brilliant new ones. They'll send patches for MediaWiki to do it, or they'll send RFEs, or they'll develop their own bots or whatever.
In other words, I don't think we're going to need to worry about getting people into doing metadata; we're going to have to worry about keeping up!
~ESP
Lars Aronsson lars-at-aronsson.se |wikipedia| wrote:
Evan Prodromou wrote:
Anyways, I wanted to float a design idea for inclusion of metadata in MediaWiki articles. None of this is particularly new, and it seems like most has been suggested at one point or another.
In the wiki world, the article on Angola does not contain any formalized, hierarchical metadata markup such as [[part of:Africa]] or [[borders:Namibia]], but instead the plain English phrase "Angola is a country in southwestern [[Africa]], bordering [[Namibia]]".
That is not very surprising, since meta tags are yet to be implemented.
That's what wiki contributors can learn and be made to use.
Many articles contain complicated and advanced usages of HTML, and that seems to work out fine. Editors who don't know HTML simply leave it for someone else to maintain. Meta tags would be much simpler to understand, probably not much harder than the interwiki links. There is the risk of scaring away newcomers if the first impression of wiki-text is too frightening, but meta tags could easily be hidden by default, if that is found to be a problem. (Or just hidden at the bottom of the article).
It doesn't require more complex parsing, it doesn't require any special database tables, and it doesn't require hours of user training.
It also doesn't have the possibilities of metadata.
// E23
E23 wrote:
That is not very surprising, since meta tags are yet to be implemented.
Not only are they "not yet" implemented in MediaWiki, but they are not implemented in any wiki software anywhere, as far as I know. And this is not because the problem hasn't been considered, but because those who have considered the problem have chosen to leave out this feature.
Many articles contain complicated and advanced usages of HTML, and that seems to work out fine.
I don't think anybody is happy with the need for HTML markup in wiki pages.
It also doesn't have the possibilities of metadata.
You can of course implement this in the wiki parser and store the harvested metadata in a database table. But how will that database table be used? Will it be useful enough to motivate users to input the necessary metadata in Wikipedia pages? Go ahead and show me. I'll believe it when I see it. The measure of your success is the number of Wikipedia pages that contain such metadata. Today it is zero.
On Fri, 5 Dec 2003 06:15:22 +0100 (CET), Lars Aronsson wrote:
Many articles contain complicated and advanced usages of HTML, and that seems to work out fine.
I don't think anybody is happy with the need for HTML markup in wiki pages.
Hello,
I personally like being able to code HTML in wikipedia articles. That's the first descriptive language I learned. I don't see why we should relearn a new language (wiki) when there is already one working well.
I mean <b> to bold a text is not THAT difficult to learn :)
cheers,
From: Ashar Voultoiz Friday, December 05, 2003 8:45 AM
On Fri, 5 Dec 2003 06:15:22 +0100 (CET), Lars Aronsson wrote:
Many articles contain complicated and advanced usages of HTML, and that seems to work out fine.
I don't think anybody is happy with the need for HTML markup in wiki pages.
Hello,
I personally like being able to code HTML in wikipedia articles. That's the first descriptive language I learned. I don't see why we should relearn a new language (wiki) when there is already one working well.
I mean <b> to bold a text is not THAT difficult to learn :)
The reason is that wiki markup is designed to be more comfortably human-readable than is HTML markup.
== Big heading ==
[[Link]] has ''many'' ways of going:
# One,
# Two,
# Three.

How about that!

is much terser and easier to parse than

<h2>Big heading</h2>
<a href="?Link">Link</a> has <i>many</i> ways of going:
<ol>
<li>One,
<li>Two,
<li>Three.
</ol>
<p>
How about that!
</p>

And we operate on the general assumption that the majority of people are familiar with neither.
The Cunctator wrote:
The reason is that wiki markup is designed to be more comfortably human-readable than is HTML markup.
== Big heading ==
[[Link]] has ''many'' ways of going:
# One,
# Two,
# Three.

How about that!

is much terser and easier to parse than

<h2>Big heading</h2>
<a href="?Link">Link</a> has <i>many</i> ways of going:
<ol>
<li>One,
<li>Two,
<li>Three.
</ol>
<p>
How about that!
</p>

And we operate on the general assumption that the majority of people are familiar with neither.
Also, our own wiki markup gives us a better chance of producing the /correct/ HTML -- which in this case is <em>, not <i> at all. ^_^
-- Toby
E23 wrote:
Many articles contain complicated and advanced usages of HTML, and that seems to work out fine.
I don't think that it works out fine! Most of this is for tables and image display, which is why people work on simplifying wiki editing of these.
Editors who don't know HTML simply leave it for someone else to maintain.
And that's what's not fine about it -- most editors ignore it.
This is not too relevant to the discussion /so long as/ we agree that any new metadata markup must be simple. Magnus' [[Category:foo]] markup is simple like that.
(Or just hidden at the bottom of the article).
I'm already one of the people that places interwiki links at the bottom.
-- Toby
"LA" == Lars Aronsson lars@aronsson.se writes:
LA> On the contrary, the "meta" HTML tag is most often used for
LA> dishonest "spamdexing".
One other thought: I don't think this is a big issue in a Wiki context. I doubt that metadata spamming in Wikipedia is really worth the effort (say, putting a page in every single possible category).
It would probably just end up being reverted like other abuse or NPOV behavior.
~ESP
I only started lurking recently, but I thought I'd better decloak at this point.
Metadata is (IMHO) a generally good thing, and the reasons have already been given in this thread. The primary benefit is in making individual applications more useful, but longer term there is likely to be huge benefit through exchange of that metadata. The beginnings of this are already visible with RSS feeds from blogs, news sites and Wikis.
From the immediate point of view of the developer, it makes sense to reuse existing work - and there has been a *lot* done in this area. There's a reasonably mature standard for describing resources: the Resource Description Framework (RDF) - some refs below. The key part of this framework is the model, and I'd strongly recommend that this was followed in WikiMedia projects. There are toolkits/APIs for practically every language. (A lot of people get hung up on the RDF/XML syntax, which looks pretty painful, but in practice it's the model that's important and that's pretty simple).
This is all aside from how you'd get metadata from the Wiki - there's all the mechanical stuff (e.g. page creation/modification dates), stuff that can be semi-automated (e.g. author) - management of this is pretty straightforward to hook up to an existing system (most of it's probably already there in the DB tables). Then there's stuff that's entirely human-created. I reckon the MediaWiki is in an excellent position to lead the way on how this is done (what syntax etc.).
Cheers, Danny.
RDF Specs : http://www.w3.org/RDF/
RDF One-pager http://dsg.port.ac.uk/projects/uisb/docs/design/rdf_one_pager.pdf
RDF in 500 Words: http://www.dannyayers.com/docs/rdf500.htm
Some nice pdfs: http://www.semaview.com/resources/resources.html
One of the RDF vocabularies that's been getting a lot of attention recently is FOAF (friend-of-a-friend), which (amongst many other things) can allow you to express social relationships: Fred foaf:knows Bert. People are encouraged to add a 'profile' of themselves to their web site: http://www.foaf-project.org/
A couple of PHP APIs -
RAP: http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/
Redland: http://www.redland.opensource.ac.uk/docs/php.html
I've been working on RDF-enabling a Wiki myself, with the help of the Jena (Java) toolkit: http://66.70.191.189/cgi-bin/mt-search.cgi?IncludeBlogs=1&search=stiki
http://www.hpl.hp.com/semweb/jena.htm
Already the ModWiki and Dublin Core vocabularies are finding application in Wikis, but there's a lot more interesting and useful stuff that could be done:
http://www.usemod.com/cgi-bin/mb.pl?ModWiki
It looks like 'categories' is a bit of a permathread around here - hopefully the ongoing work on RDF thesauri will help here:
"DA" == Danny Ayers danny666@virgilio.it writes:
DA> This is all aside from how you'd get metadata from the Wiki -
DA> there's all the mechanical stuff (e.g. page
DA> creation/modification dates), stuff that can be semi-automated
DA> (e.g. author) - management of this is pretty straightforward
DA> to hook up to an existing system (most of it's probably
DA> already there in the DB tables). Then there's stuff that's
DA> entirely human-created. I reckon the MediaWiki is in an
DA> excellent position to lead the way on how this is done (what
DA> syntax etc.).
A lot of what's considered important metadata, from a World Wide Web standpoint, could already be provided from the MediaWiki database. Scanning over the Dublin Core element set:
http://www.dublincore.org/documents/dces/
...I don't think there's much that we couldn't encode either as RDF or in HTML link/meta tags. Yeah, I know, Dublin Core is only one part of the magical world that is RDF and the Semantic Web and so forth.
Whether we do that for MediaWiki is a separate question from how and whether we do editor-assigned name-value pairs.
~ESP
Yes: I've explored this in another project I'm working on, which is a Wiki-meets-database-meets-Semantic Web hybrid that is still at the early experimental stages of alpha development. Based on this, I think we can place our trust in the Wiki-nature first, and let bindings to the Semantic Web develop subsequent to this.
For example, it's likely that the Wikipedia community will choose [[author=...]] to mean the same as the http://purl.org/dc/elements/1.1/creator RDF predicate. Or maybe they'll want to write [[dc:creator=...]], to be extra sure?
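To make that concrete, here is a minimal sketch of how a later binding could be layered over whatever names the community settles on. The mapping table is purely an assumption, and so are the helper names dc_predicate and dc_meta_tags; the only things taken from outside are the Dublin Core namespace URI quoted above and the common "DC.element" naming convention for HTML meta tags.

```python
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed, community-editable mapping from local field names to DC elements.
# Both spellings from the example above point at the same element.
FIELD_TO_DC = {"author": "creator", "dc:creator": "creator"}

def dc_predicate(name):
    """Return the RDF predicate URI bound to a field name, or None."""
    element = FIELD_TO_DC.get(name)
    return DC_NS + element if element else None

def dc_meta_tags(fields):
    """Render the (name, value) pairs that have a DC binding as <meta> tags."""
    tags = []
    for name, value in fields:
        element = FIELD_TO_DC.get(name)
        if element is not None:
            tags.append('<meta name="DC.%s" content="%s">'
                        % (element, escape(value, quote=True)))
    return tags

tags = dc_meta_tags([("author", "Neil"), ("mood", "cheerful")])
```

Fields without a binding are simply skipped, so editors can invent names freely and the mapping can catch up afterwards.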
It doesn't matter: we will naturally tend to absorb standards from the larger world, and things will sort themselves out.
-- Neil
"EP" == Evan Prodromou evan@wikitravel.org writes:
EP> Anyways, I wanted to float a design idea for inclusion of
EP> metadata in MediaWiki articles.
I probably oversold this proposal by using the word "metadata", since it's such a loaded term. Introducing categorization as a possible metadata application, which is also a loaded subject for Wikipedia, might have also been a misstep.
Perhaps I should have stuck with "editor-assigned field-value pairs". All *I* want is a way to:
* define field-value pairs for a page
* define different ways to render these field-value pairs: as in-page links, as out-of-page links (like interlanguage links now), as <meta> or <link> tags in the <head> of the HTML output, not render them at all, or whatever.
In other words, I want a technical feature for the MediaWiki software. How that feature gets used in different MediaWiki installations should be up to the communities using the installation.
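A sketch of what "different ways to render" might look like as per-installation configuration. This is an illustration only, in Python rather than MediaWiki's PHP; the RENDER_MODE table, the mode names, the example field names and the "/wiki/" URL prefix are all assumptions.

```python
from html import escape

# Assumed per-installation configuration: field name -> rendering mode.
# Anything not listed is ignored, i.e. left out of the page entirely.
RENDER_MODE = {
    "category": "link",      # in-page link
    "population": "text",    # in-page text
    "keywords": "meta",      # <meta> tag in <head>
    "license": "link-tag",   # <link> tag in <head>
}

def render_field(name, value):
    """Return (where, html); where is 'body', 'head', or None if ignored."""
    mode = RENDER_MODE.get(name, "ignore")
    n = escape(name, quote=True)
    v = escape(value, quote=True)
    if mode == "link":
        return ("body", '<a href="/wiki/%s">%s</a>' % (v.replace(" ", "_"), v))
    if mode == "text":
        return ("body", "%s: %s" % (n, v))
    if mode == "meta":
        return ("head", '<meta name="%s" content="%s">' % (n, v))
    if mode == "link-tag":
        return ("head", '<link rel="%s" href="%s">' % (n, v))
    return (None, "")
```

The point of the table is that the same stored pair can be shown, hidden, or pushed into the HTML head without touching the page text.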
I'd like to make the feature powerful enough that it can handle categorization of Wikipedia, which is one possible if complex application. But if it can't, I'd still like to have the feature.
So, to boil things down again, here's the technical proposal:
* Editors can put markup into a page like this:
[[field:name:value]]
* "field" is a per-installation fixed string, defining this link as a data field. It probably should not conflict with any namespace, language code, or InterWiki link.
* "name" can be any string that won't interfere with other wikitext parsing and doesn't contain a ":"; it is the field name.
* "value" is any string that won't interfere with other wikitext parsing, and doesn't contain "]]". It is the field value.
* The field tag can go anywhere in the page text. There can be multiple field tags with the same name.
* When a page is saved, the fields are extracted and put into a database table called "fields". The table has the following columns:
field_id: an autoincrementing unique identifier
field_cur_id: the cur_id of the cur row that contains this field
field_name: the contents of the name part
field_value: the contents of the value part
* When a page is rendered, the associated fields are retrieved, and depending on field name and the installation configuration, they can be:
  - ignored. Just left out of the page entirely.
  - displayed as a link in-page.
  - displayed as text in-page.
  - displayed as a link out-of-page (like interlanguage links)
  - displayed as text out-of-page
  - displayed as a <meta> field-value pair
  - displayed as a <link> field-value pair
Any application of this feature is up to the installation. Auto-indexed pages, bread-crumb navigation, field-based search, field display in search results, directories, etc., could be implemented if needed.
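The storage half of the proposal can be sketched in a few lines. What follows is only an illustration, not MediaWiki code: it is written in Python rather than PHP, SQLite stands in for the real database, the prefix "field" is an assumed value for the per-installation string, and extract_fields and save_fields are hypothetical helper names. The table and column names are the ones listed above.

```python
import re
import sqlite3

FIELD_PREFIX = "field"  # per-installation fixed string (assumed value)

# The name may not contain ":"; the value runs up to the closing "]]".
FIELD_TAG = re.compile(
    r"\[\[" + re.escape(FIELD_PREFIX) + r":([^:\[\]|]+):(.*?)\]\]"
)

def extract_fields(wikitext):
    """Return [(name, value), ...] in page order; repeated names are kept."""
    return [(n.strip(), v.strip()) for n, v in FIELD_TAG.findall(wikitext)]

def save_fields(db, cur_id, wikitext):
    """Replace the stored fields for one page, as would happen on save."""
    db.execute("DELETE FROM fields WHERE field_cur_id = ?", (cur_id,))
    db.executemany(
        "INSERT INTO fields (field_cur_id, field_name, field_value) "
        "VALUES (?, ?, ?)",
        [(cur_id, n, v) for n, v in extract_fields(wikitext)],
    )

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE fields ("
    " field_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " field_cur_id INTEGER NOT NULL,"
    " field_name TEXT NOT NULL,"
    " field_value TEXT NOT NULL)"
)
text = ("Some text. [[field:category:American poets]] More text. "
        "[[field:category:Poets]] [[field:discussion:Wikipedia talk:Foo]]")
save_fields(db, 42, text)
rows = db.execute(
    "SELECT field_name, field_value FROM fields "
    "WHERE field_cur_id = ? ORDER BY field_id", (42,)
).fetchall()
```

Deleting and re-inserting on every save keeps the table in step with the page text without having to diff old and new fields.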
~ESP
I agree completely with most of Evan's points.
Does anyone have any good ideas for the fixed string in Evan's scheme? Obviously, we can't use "meta".
Unless someone can think of a good string, I would suggest the [[name=value]] notation from my earlier post, which
* eliminates the need for a per-installation fixed string
* uses the "=" char, which is easily accessed in all international charsets, since it is needed to do math
* does not conflict with any existing link scheme
* needs fewer characters to type
* does not look like ordinary links
* is perhaps self-explanatory?
* allows namespaces to be used within the fields themselves, e.g. [[discussion=Wikipedia talk:Foo]], [[originally-from=en:Fish]]
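A minimal sketch of how the [[name=value]] form could be picked out of page text. The regular expression and the helper name parse_pairs are assumptions, not an agreed syntax; the one design choice shown is that the name may not contain ":" while the value may, which is what lets namespaces appear inside values.

```python
import re

# name: no "=", ":", "|" or brackets; value: anything up to the closing "]]".
PAIR = re.compile(r"\[\[([^\[\]=|:]+)=([^\[\]]*)\]\]")

def parse_pairs(wikitext):
    """Return [(name, value), ...] for every [[name=value]] tag, in order."""
    return [(n.strip(), v.strip()) for n, v in PAIR.findall(wikitext)]

pairs = parse_pairs(
    "[[discussion=Wikipedia talk:Foo]] [[originally-from=en:Fish]] [[Fish]]"
)
```

Ordinary links such as [[Fish]] or [[en:Fish]] contain no "=" before the first ":", so they are left alone.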
-- Neil
"NH" == Neil Harris usenet@tonal.clara.co.uk writes:
NH> Does anyone have any good ideas for the fixed string in Evan's
NH> scheme? Obviously, we can't use "meta".
NH> Unless someone can think of a good string, I would suggest the
NH> [[name=value]] notation from my earlier post.
So, I like this, as it's visually distinct from namespaces.
My only caveat is that it would prohibit any article titles with "=" in the name. For example, [[F=ma]] or [[Silence = Death]] or something. I don't know if that's an issue.
Using a little namespace-like prefix would make that a moot point, although it's a little less clear.
I can't really think of an article that would or should have an equal sign in the name, so that's probably unnecessary.
~ESP
On Fri, 5 Dec 2003, Evan Prodromou wrote:
So, I like this, as it's visually distinct from namespaces.
My only caveat is that it would prohibit any article titles with "=" in the name. For example, [[F=ma]] or [[Silence = Death]] or something. I don't know if that's an issue.
Not really, because article titles with '=' go to the title without '=' instead anyway.
Andre Engels
Evan Prodromou wrote:
- Editors can put markup into a page like this:
[[field:name:value]]
* "field" is a per-installation fixed string, defining this link as a data field. It probably should not conflict with any namespace, language code, or InterWiki link.
* "name" can be any string that won't interfere with other wikitext parsing and doesn't contain a ":"; it is the field name.
* "value" is any string that won't interfere with other wikitext parsing, and doesn't contain "]]". It is the field value.
* The field tag can go anywhere in the page text. There can be multiple field tags with the same name.
You know, if you combine with <somebody else>'s suggestion to employ "=" and reduce the overload on ":", then we could get a very nice general system here.
To start with, notice that interwiki links behave a bit oddly right now. On [[meta:]], I can (or could once) link to [[fr:]] with the link [[fr:Fou]]. But on [[en:]], that won't work, the link [[fr:fou]] behaves specially. It might be clearer if ":" was always for links and "=" always for magic.
But "=" magic wouldn't work unless turned on -- by recognising the "name". So the format is [[name=value]], and we might start by recognising:
* [[en=foo]], [[fr=fou]], [[eo=fu]] and other interwiki links
* [[category=nonsense word]]
But maybe Wikitravel will decide that it wants more.
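A sketch of that rule, with "=" reserved for magic and only switched on for names the installation recognises. The sets below and the function name classify are assumptions for illustration. A side effect worth noting is that a title like [[F=ma]] stays an ordinary link, because "F" is not a recognised name.

```python
# Assumed per-installation configuration.
INTERWIKI = {"en", "fr", "eo"}
MAGIC = INTERWIKI | {"category"}  # names for which "=" magic is turned on

def classify(link_body):
    """Classify the text between [[ and ]].

    Returns ('interwiki' | 'field', name, value) for recognised names,
    otherwise ('link', target, None) for an ordinary link.
    """
    name, sep, value = link_body.partition("=")
    name = name.strip()
    if sep and name in MAGIC:
        kind = "interwiki" if name in INTERWIKI else "field"
        return (kind, name, value.strip())
    return ("link", link_body.strip(), None)
```

For example, classify("fr=fou") is treated as an interwiki link, while classify("fr:Fou") and classify("F=ma") both come back as plain links.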
-- Toby
"TB" == Toby Bartels toby+wikipedia@math.ucr.edu writes:
TB> You know, if you combine with <somebody else>'s suggestion to
TB> employ "=" and reduce the overload on ":", then we could get a
TB> very nice general system here.
Yes, I like it too.
~ESP
So, I've made this a proposed feature on meta:
http://meta.wikipedia.org/wiki/Field-value_pairs
I think I got it close to what came up on this list.
There's also an RFE on SourceForge:
http://sourceforge.net/tracker/index.php?func=detail&aid=855890&grou...
Thanks to everyone who refined this fuzzy concept into something implementable.
~ESP
"EP" == Evan Prodromou evan@wikitravel.org writes:
EP> So, I've made this a proposed feature on meta:
EP> http://meta.wikipedia.org/wiki/Field-value_pairs
EP> I think I got it close to what came up on this list.
I also added a proposal for applying this feature to categorization; see here:
http://meta.wikipedia.org/wiki/Categorization_with_field-value_pairs
Thanks,
~ESP
* Evan Prodromou evan@wikitravel.org
| I also added a proposal for applying this feature to categorization;
| see here:
|
| http://meta.wikipedia.org/wiki/Categorization_with_field-value_pairs
Hi. I'm new to the list but have been following the discussion regarding metadata with some interest. I welcome this discussion because I believe that Wikipedia could gain a lot from adding "metadata" to the articles. I am a complete newbie when it comes to Wikipedia but have been interested in the categorisation of content for a while. I have just had a look at the above document and wish to make a few comments. There has also just been a flourish of mails on the list and this post applies to them as well.
1. What is actually meant by category?
I believe that the relationship "category" is perhaps being used too liberally and that it could lead to problems down the track. Certainly I can see some problems in the example already given. The main problem I see so far is that category has been used to cover different concepts: "type/instance", "type/subtype", "whole/part" and "related". Mixing these up will cause problems for those wanting to extract the metadata and do something with it. From a web surfer's or a display perspective it looks OK because all of these concepts can be treated as some kind of containment: type pages can list instances, type pages can list subtypes, and pages can list related pages.
I'll take a quick look at one of the examples given so far to show what I mean:
Example A:
w:William Carlos Williams
[[category=American poets]]
[[category=20th-century Americans]]
In this example category is being used as a type/instance:
- William Carlos Williams IsA American Poet
- William Carlos Williams IsA 20th-century American
Example B:
w:American poets
[[category=Americans]]
[[category=poets]]
In this example category is being used as a type/subtype:
- American poet IsASubType of American
- American poet IsASubType of Poet
Example C:
w:Canada
[[category=History of Canada]]
In this example it is being used as a "related" link. Maybe this could be considered a "facet" of the article.
- History of Canada isAbout Canada
Example D:
w:Muffler
[[category=Car]]
And here is a "part/whole" relationship thrown in for good measure.
- Muffler IsAPartOf Car
The point is that "category" is really disguising all of these "containment" relationships. I believe that it would be a big mistake for users to apply the term liberally across the whole site as you are really hiding just as much as you are revealing. You will have achieved your aim of being able to provide lists but I also believe that you would have missed a big opportunity to provide some structure to the textual content you already have.
I therefore believe that a reasonable case could be made for the following basic roles:
[[category=x]]   type/instance
[[superType=x]]  type/subtype
[[related=x]]    related
[[whole=x]]      part/whole
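With the roles separated like this, turning a page's tags into triples becomes a table lookup. The sketch below assumes the [[role=value]] notation from this thread; the predicate names IsA, IsASubTypeOf and IsAPartOf follow the examples above, while "isRelatedTo" and the helper name page_triples are made up for illustration.

```python
import re

# One predicate per role. "isRelatedTo" is a hypothetical name.
ROLE_TO_PREDICATE = {
    "category": "IsA",
    "superType": "IsASubTypeOf",
    "related": "isRelatedTo",
    "whole": "IsAPartOf",
}

PAIR = re.compile(r"\[\[(\w+)=([^\[\]]+)\]\]")

def page_triples(title, wikitext):
    """Return (subject, predicate, object) triples; the page is the subject."""
    triples = []
    for role, value in PAIR.findall(wikitext):
        predicate = ROLE_TO_PREDICATE.get(role)
        if predicate is not None:
            triples.append((title, predicate, value.strip()))
    return triples

poets = page_triples("American poets", "[[superType=Americans]] [[superType=poets]]")
muffler = page_triples("Muffler", "[[whole=Car]]")
```

Anyone extracting the data can then ask for instances, subtypes or parts separately, instead of guessing what each use of "category" meant.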
IMO the "category" or type/instance relationship will be the one most commonly used. This is where all of the low hanging fruit is - what with the list pages and all.
There are a few further shortcomings (but I guess that these were non-aims anyway):
- How do you express properties? e.g. Elvis hasProperty diedAtAge 42.
- How do you express other relationships? e.g. Elvis played Rock
If you were going to deal with these two situations then things get a bit more complicated syntax wise. You would also need to loosen up the kinds of roles which were allowed.
2. Where can assertions be made?
The current formulation assumes that the current page is the subject, however, there will be circumstances where this won't be the case. This will happen in situations where it is easier to group assertions together. For example, on any "lists" page an author is listing out instances of a certain type (x isA y). In this case x is the subject of the assertion but not the subject of the page - the subject is "List of Ys". It therefore might be of benefit to be able to specify all three parts of a triple on these pages.
e.g. List of Colours
red [red: category=colour]
blue [blue: category=colour]
[nb: I am not proposing a syntax here - just showing that all three parts of a triple should be able to be expressed.]
This means that type information can be managed in one place and that the author doesn't have to open up every Colour page to add the info. If this were possible then it would be very easy to mark up type information quickly. The same could probably be said for timelines. However, I do note that the rationale for the proposal was to avoid just this type of thing as it is tedious to keep lists of things. Personally, I do agree with this rationale, however, it might be more natural for the maintainers of lists to specify all of the relationships in one place. This would lead to duplication of information however.
A stronger case for being able to supply a full triple can be made by considering that some pages deal with complex relationships and it makes sense to be able to include the full triple on that page.
e.g. Complementary Colours
Blue and Green should never be seen. [blue:clashesWith=green]
Blue and Orange are cool. [blue:complements=orange]
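Purely to show that the three parts can be recovered mechanically, here is a sketch that reads the illustrative bracket notation used in these examples. The notation was explicitly not proposed as a syntax, so the pattern below is an assumption, as is the helper name triples. When the subject part is missing, the page title is used, which matches the current one-subject-per-page formulation.

```python
import re

# Illustrative notation only: [subject: predicate=object], subject optional.
TRIPLE = re.compile(r"\[(?:([^\[\]:=]+):\s*)?([^\[\]:=]+)=([^\[\]]+)\]")

def triples(page_title, wikitext):
    """Return (subject, predicate, object); subject defaults to the page."""
    found = []
    for subj, pred, obj in TRIPLE.findall(wikitext):
        found.append((subj.strip() or page_title, pred.strip(), obj.strip()))
    return found

colours = triples(
    "List of Colours",
    "red [red: category=colour] blue [blue: category=colour]",
)
clashes = triples(
    "Complementary Colours",
    "Blue and Green should never be seen. [blue:clashesWith=green]",
)
```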
These last features probably fall outside the 80/20 for the categorisation system. I'm just flagging them as something that users may want to do in the future.
[I have just seen that there has been some recent conversation on this on the list!]
3. What do the parts of the triple represent?
The point has been made by others that all parts of the triple can be represented by Wikipedia pages. IMO this is an amazingly strong point of the system, i.e. every page represents a subject and the subjects can be all parts of a triple: "objects" and "predicates" as well as "subjects". Don't forget the predicate!
The area where it falls down is in trying to represent discrete values which are generally represented by properties - the value doesn't really represent a subject, e.g.
Elvis ageAtDeath 42
Life hasMeaningOf 42
"42" could be a subject in Wikipedia by being given a page (e.g. this has been done for representing years) but this doesn't mean that all property assignments should necessarily link to this page. In the Elvis example it probably shouldn't link. In the Life example it could.
So, I am arguing that a metadata system needs a way of specifying plain property values. I guess that maybe I should get with the Wiki program and realise that "42" could just be a blank page waiting for someone to add some content to it. Wikis are very good at growing in the areas they need to.
4. A look at some of the Disadvantages listed at http://meta.wikipedia.org/wiki/Categorization_with_field-value_pairs
* Huge amount of manual work to do to categorize English Wikipedia
There is a hell of a lot of info contained in the lists pages. Wikipedia is largely about proper nouns and instances of things. This is all contained in the lists pages. You can get up and running very quickly with this.
* Lists of articles and sub-categories may get unreasonably long (although this can be ameliorated with careful categorization schemes).
If you output the lists with a script then it must be able to support paging. Imagine a list of all of the "people" in Wikipedia. Lists of types high in the type hierarchy will always have many instances.
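Paging of this kind is straightforward if the category memberships live in a database table. The sketch below is an illustration with made-up names: SQLite stands in for the real database, and the "members" table, its columns and the function category_page are all assumptions.

```python
import sqlite3

def category_page(db, category, page, per_page=50):
    """Return one page of article titles in a category, alphabetically."""
    offset = (page - 1) * per_page
    rows = db.execute(
        "SELECT title FROM members WHERE category = ? "
        "ORDER BY title LIMIT ? OFFSET ?",
        (category, per_page, offset),
    ).fetchall()
    return [title for (title,) in rows]

# Demo data: 120 invented entries in one category.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE members (category TEXT, title TEXT)")
db.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("people", "Person %03d" % i) for i in range(120)],
)
first = category_page(db, "people", 1)
last = category_page(db, "people", 3)
```

With 120 entries and 50 per page, the first page holds 50 titles and the third holds the remaining 20. For very large categories an index on (category, title) would matter more than the paging itself.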
* Unclear generalization of what the "type" field means. What other types would there be?
I didn't understand what you were trying to achieve with [[type=category]]. Assuming that categories are for type/instance relationships then you could have:
w:American poets [[category=type]] - American poets IsA Type
This way you can have a "Types" page which would list American Poet and all other types in the system. I would just drop the [type=x] because that is exactly what [category=] is doing, assuming that category represents type/instance :)
* Unclear whether categories refer to the article or to the subject of the article. For example, would [[category=biography]] be appropriate for w:William Carlos Williams?*
Good question. In general I would say the "subject of the article". When I say "Blue isA Colour" I am talking about the subjects (Blue and Green), not about the articles (the resources). This proposal is therefore about making assertions between subjects, rather than applying metadata to articles.
Having said that, the biography is clearly about the article and I would feel uneasy about saying [[category=biography]] as it would not be strictly correct. The subject is a person, not an article. If you were doing it properly you would have to create an "articleType" role, [[articleType=biography]]. If you were being anal about it and wanted to signify that the articleType role was not about the subject, you could say about article type [[category=resourceRole]]. Also, "biography" could be an instance of "article type". Wikipedia readers would be happy enough to see "Article Type: Biography" and a link to "biography" which would list all of the biographies.
Erik has just stated that:
| 2) Category pages are not articles. Like talk pages and meta pages, they
| should be logically separated from articles, which has numerous benefits
| (easier searching/filtering, counting etc.)
I would disagree and say that all pages represent subjects. Each article is designed to represent a subject in a computer in an addressable way. The subjects can be anything we wish to talk about: instances and types. This is a huge advantage for Wikipedia, since all subjects are treated the same. Why should they be logically separated? You already have a great system for managing the textual content for all subjects. This proposal is just a layer on top. Separation = complexity. The system is ultra flexible at the moment and drawing lines through it would be a difficult task.
E.g. the "american poet" page has some text which explains what an American poet is. It then gets the added goodies that the categorisation system offers.
* May get abused for purposes other than categorization where more specific fields may be more appropriate. For example, a status field for articles ([[status=stub]]) may be useful, but could be preempted by [[category=stub]] before the "status" field is implemented.*
This is the same case as [[category=biography]]. The assertion is about the article not the subject. So, if it were possible to create new roles for things such as status you could do:
Article X [[status=stub]]
Status [[category=resourceRole]]
resourceRole [[category=type]]
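For what it's worth, pulling such assertions out of page text need not be complicated. Below is a minimal Python sketch that assumes the [[role=value]] syntax used in the examples above. The regular expression and the function name extract_assertions are illustrative only, not an existing MediaWiki parser hook.

```python
import re
from typing import List, Tuple

# Assumed tag syntax, taken from the examples above: [[role=value]].
# Ordinary links such as [[Biology]] contain no "=" and are ignored.
TAG_RE = re.compile(r"\[\[\s*([A-Za-z]\w*)\s*=\s*([^\]|]+?)\s*\]\]")


def extract_assertions(page_title: str, wikitext: str) -> List[Tuple[str, str, str]]:
    """Return (subject, role, value) triples for every [[role=value]] tag."""
    return [(page_title, role, value) for role, value in TAG_RE.findall(wikitext)]


# The three pages from the example, with made-up surrounding text.
pages = {
    "Article X": "A short article. See also [[Biology]]. [[status=stub]]",
    "Status": "Describes the editorial state of an article. [[category=resourceRole]]",
    "resourceRole": "Roles that describe the article itself. [[category=type]]",
}

assertions = [
    triple
    for title, text in pages.items()
    for triple in extract_assertions(title, text)
]
```

Each page yields one assertion here, and the chain Article X -> Status -> resourceRole -> type can be followed by looking up each value as a page title in turn.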
* Fuzzy distinction between part-whole relationships and category-member relationships.*
I agree that this is a disadvantage for the reasons outlined above. I think that you can get around this by specifying that "category=" can only be used for type/instance kinds of relationships. Don't forget type/subtype and related as well.
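As a sketch of why keeping the relationship kinds separate pays off, here is some Python that lists the instances of a type, including instances of its subtypes, while ignoring containment. It assumes the role names "category" (type/instance), "superType" (type/subtype) and "contains" as used elsewhere in this thread. The sample facts, including the "Poets" supertype and the "Homer" entry, are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Assertion = Tuple[str, str, str]  # (subject, role, value)


def instances_of(type_name: str, assertions: Iterable[Assertion]) -> Set[str]:
    """Collect instances of a type and of all its subtypes, transitively."""
    by_category: Dict[str, Set[str]] = defaultdict(set)
    subtypes: Dict[str, Set[str]] = defaultdict(set)
    for subject, role, value in assertions:
        if role == "category":      # subject is an instance of value
            by_category[value].add(subject)
        elif role == "superType":   # subject is a subtype of value
            subtypes[value].add(subject)
        # Other roles (contains, whole, related) do not imply membership.

    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [type_name]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against cycles in the hierarchy
            continue
        seen.add(current)
        found |= by_category.get(current, set())
        stack.extend(subtypes.get(current, ()))
    return found


# Illustrative data only.
facts = [
    ("William Carlos Williams", "category", "American poets"),
    ("American poets", "superType", "Poets"),
    ("Homer", "category", "Poets"),
    ("Biology", "contains", "Aerobiology"),
]
```

If everything were lumped under "category=", the walk above could not tell that Aerobiology is a part of Biology rather than an instance of it.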
5. Back to work
Maybe the subject line of this email is a bit erroneous - I've broadened things out. Sorry about that. Just a few thoughts I had to get down. You might want to take a look at an Italian opera topic map which provides examples of Subject Indexes, Resource Indexes and Role Indexes, as well as the instances of these types. There are a few features in this topic map which we would never be able to do in Wikipedia, but the basics are achievable with a few well-selected relationship types.
http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=opera.xtm
cheers
Murray Woodman -- http://www.murraywoodman.com/ http://www.veryhappening.com/ http://www.topicmap.com/
* Murray
| I therefore believe that a reasonable case could be made for the
| following basic roles:
| [[category=x]] type/instance
| [[superType=x]] type/subtype
| [[related=x]] related
| [[whole=x]] part/whole
And of course I forgot one: contains. Thanks for reminding me, Toby.
* Toby
| That's why [[List of biology topics]] exist now, of course.
Biology [[contains=Aerobiology]]
cheers
Murray
From: Murray Woodman
Sent: Thursday, December 11, 2003 7:46 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] Re: Narrowing focus
- Murray
| I therefore believe that a reasonable case could be made for the
| following basic roles:
| [[category=x]] type/instance
| [[superType=x]] type/subtype
| [[related=x]] related
| [[whole=x]] part/whole
And of course I forgot one: contains. Thanks for reminding me, Toby.
- Toby
| That's why [[List of biology topics]] exist now, of course.
Biology [[contains=Aerobiology]]
My god, this is getting ugly. Please don't take that judgment personally--I've had plenty of flights of baroque fancy when I've engaged in a brainstorming process, and I've been helped when someone reminded me that the goal was to make something that's usable by normal people, not something that's perfect and insanely complex.
One reason this should be obvious: needing a "related" category is silly when links in a non-hierarchical structure already do exactly that.
Simple is better. Natural is better. Don't try to reinvent language or account for every possibility, because you'll fail.
Instead, look at what already works and identify what the gaps are, and start from that point.
* The Cunctator cunctator@kband.com
| My god, this is getting ugly. Please don't take that judgment
| personally--I've had plenty of flights of baroque fancy when I've
| engaged in a brainstorming process, and I've been helped when someone
| reminded me that the goal was to make something that's usable by
| normal people, not something that's perfect and insanely complex.
| ...
| Simple is better. Natural is better. Don't try to reinvent language or
| account for every possibility, because you'll fail.
|
| Instead, look at what already works and identify what the gaps are,
| and start from that point.
Sure. I'm sorry if it was a bit over the top. I was just throwing a few ideas into the ring. I did, however, state somewhere in the ramble that "category" dealing with type/instance is what most people would use, i.e. Wikipedia is mostly about proper nouns and instances and "category" is good for this. Category will be fine in the majority of places. "Things should be made as simple as possible, but not any simpler."
I can just see problems when "category" is used in a way in which it loses its meaning. If you guys want to build type hierarchies and subject containment then "category" might be too simple in these cases. I just tried to show this. If it is a catch-all for everything then a "lists" page as it currently stands already serves that purpose. You have gained only a little over what the lists pages offer. If you want to provide something which has wider applications then maybe a couple more relationship types (type/subtype, contains) would be very helpful.
cheers
Murray
From: Murray Woodman on Thursday, December 11, 2003 8:42 PM
<snip>
| I can just see problems when "category" it is used in a way which it
| loses its meaning. If you guys want to build type hierarchies and
| subject containment then "category" might be too simple in these cases.
| I just tried to show this. If it is a hold all for everything then a
| "lists" page as it currently stands already serves that purpose. You
| have gained only a little over what the lists pages offer. If you want
| to provide something which has wider applications then maybe a couple of
| more relationship types (type/subtype, contains) would be very helpful.
That's a valid criticism; at the same time I think it might actually be reasonably considered a point in its favor that we start with something that's only a little better than what we presently have.
In other words, if we start with trying to improve what we currently have (the hackish list pages), rather than grafting on something completely different and of a new order of complexity,
a) we're much less likely to do something we will later regret
b) widespread adoption will be much more likely
c) new contributors to Wikipedia are more likely to understand it
etc.
At the same time, I'd like us to work on making it easier for people to experiment with Wikipedia content, so that people with interesting, complicated (but possibly very good) ideas can try them without running the risk of screwing up the core project.
The classic OS-application divide. Or for another analogy, Google does a very good job of keeping its core utility simple but allowing for experiments on the side or the outside to happen.
It's easier for a search engine than a hyperlinked omnipedia, but the principle should remain the same.
wikitech-l@lists.wikimedia.org