Sounds to me like it would be a sort of "expert" system. Consider:
The Wikipedia article Anne Frank could have tags at the bottom like
{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}
You get the idea. These facts or relationships tie different articles together in a more structured way. Presumably a user interface could then be written that would take all the data and do things like list all the female German authors of the twentieth century.
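A minimal sketch of how such tags might be harvested and queried, assuming the tags use the {Relation:Value} form shown above and that each article's title serves as the subject. The names `Fact`, `extract_facts` and `female_german_authors_20th_century` are hypothetical, and treating a birth year from 1900 to 1999 as "twentieth century" is a simplification.

```python
import re
from typing import NamedTuple

# Sketch only: all names here are hypothetical.
# Matches tags of the form {Relation:Value}, e.g. {BornIn:Germany}.
TAG = re.compile(r"\{(\w+):([^{}]+)\}")


class Fact(NamedTuple):
    subject: str   # article title, e.g. "Anne_Frank"
    relation: str  # e.g. "BornIn"
    value: str     # e.g. "Germany"


def extract_facts(title: str, wikitext: str) -> list[Fact]:
    """Turn every {Relation:Value} tag in an article into a Fact."""
    return [Fact(title, rel, val.strip()) for rel, val in TAG.findall(wikitext)]


def subjects_with(facts, relation, predicate) -> set[str]:
    """Titles having `relation` with a value that satisfies `predicate`."""
    return {f.subject for f in facts if f.relation == relation and predicate(f.value)}


def female_german_authors_20th_century(facts) -> set[str]:
    # Each condition yields a set of titles; the answer is their intersection.
    women = subjects_with(facts, "IsA", lambda v: v == "Woman")
    german = subjects_with(facts, "BornIn", lambda v: v == "Germany")
    authors = subjects_with(facts, "Wrote", lambda v: True)
    born_1900s = subjects_with(
        facts, "WasBorn", lambda v: v.isdigit() and 1900 <= int(v) <= 1999)
    return women & german & authors & born_1900s


facts = extract_facts(
    "Anne_Frank",
    "{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}")
print(female_german_authors_20th_century(facts))  # {'Anne_Frank'}
```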
Since facts aren't tied to one language, the "relationship database" could span all languages (there is already a system for linking articles on the same subject in different languages). A relationship would have an internal ID with "translations" into different languages. So after {BornIn:Germany} is typed into the English version, the computer would find the French equivalents of the relationship BornIn and of Germany, and update the French Anne_Frank article accordingly.
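One way such a language-neutral store could look, sketched under assumptions: the internal IDs (r1, e1, e2) and the French label "LieuDeNaissance" are invented for illustration. Facts are kept only as ID triples, so any language version can render them with its own labels.

```python
# Sketch only: IDs and labels are made up for illustration.
LABELS: dict[str, dict[str, str]] = {
    "r1": {"en": "BornIn", "fr": "LieuDeNaissance"},
    "e1": {"en": "Anne_Frank", "fr": "Anne_Frank"},
    "e2": {"en": "Germany", "fr": "Allemagne"},
}

# Reverse index: (language, label) -> internal ID.
IDS = {(lang, label): ident
       for ident, names in LABELS.items()
       for lang, label in names.items()}

# The shared, language-neutral store: (subject ID, relation ID, value ID).
FACTS: set[tuple[str, str, str]] = set()


def add_tag(lang: str, article: str, relation: str, value: str) -> None:
    """Record a tag typed in one language version as a language-neutral fact."""
    FACTS.add((IDS[(lang, article)], IDS[(lang, relation)], IDS[(lang, value)]))


def tags_for(lang: str, article: str) -> list[str]:
    """Render every fact about `article` as {Relation:Value} tags in `lang`."""
    subject = IDS[(lang, article)]
    return sorted("{%s:%s}" % (LABELS[r][lang], LABELS[v][lang])
                  for s, r, v in FACTS if s == subject)


add_tag("en", "Anne_Frank", "BornIn", "Germany")
print(tags_for("fr", "Anne_Frank"))  # ['{LieuDeNaissance:Allemagne}']
```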
Anyway, I'm no specialist in the area, but I do know that there are some Internet projects of this sort trying to teach computers "common sense" ({Lion IsA Animal}). I don't know how much success they've had. It might be useful, but I'm not so sure it fits into Wikipedia's mandate.
Just my three cents.
Matt
How about "BnPlc", "Gndr", "BnTim", etc.?
And why not store this sort of information in an entirely separate offsite database? For example, http://www.sklop-tsikn.dom/wiki/Lacabacadacanaca could read:
Lang= (article language)
Type= (article type, i.e. Plc place, Psn person, Tim time, Cpt concept, etc.)
Plc.lng= (longitude)
Plc.lat= (latitude)
Plc.alt= (alternative names)
Plc.ati= (altitude)
Plc.att= (attitude)
Plc.pop= (population)
Plc.are= (area)
Pof= (is a part of...)
Icl= (includes... [things that are a part of it])
Rel= (otherwise related)
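A sketch of reading such a record, assuming one "Key= value" pair per line and the type codes listed above. `parse_record` and `fields_for` are hypothetical helpers, and the sample coordinates in the usage are made up.

```python
# Sketch only: helper names are hypothetical; type codes come from the proposal.
TYPE_CODES = {"Plc": "place", "Psn": "person", "Tim": "time", "Cpt": "concept"}


def parse_record(text: str) -> dict[str, str]:
    """Parse 'Key= value' lines into a dict, skipping blank lines."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        record[key.strip()] = value.strip()
    if record.get("Type") not in TYPE_CODES:
        raise ValueError(f"unknown Type: {record.get('Type')!r}")
    return record


def fields_for(record: dict[str, str], prefix: str) -> dict[str, str]:
    """Collect type-specific fields, e.g. Plc.lat -> {'lat': ...}."""
    p = prefix + "."
    return {k[len(p):]: v for k, v in record.items() if k.startswith(p)}


record = parse_record("Lang= en\nType= Plc\nPlc.lat= 52.37\nPlc.lng= 4.90\n")
print(fields_for(record, "Plc"))  # {'lat': '52.37', 'lng': '4.90'}
```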
Mark
On Mon, 14 Mar 2005 12:43:29 -0500 (EST), Matt Kingston matt_kingston@yahoo.com wrote:
_______________________________________________ Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
Hi,
Sounds like relational data to me, and Wikidata is going to happen :)
Thanks,
GerardM
Yes, I meant it to be relational data. Maybe a simple version: one table per kind of index, and on wiki pages, data records like:
Person = {Wiki: en.wikipedia.org/Anne_Frank, Name: Anne Frank, Born: 1929, Died: 1944, ....}
Writer = {Wiki: en.wikipedia.org/Anne_Frank, Name: Anne Frank, Language: German, Dutch}
WWII_Victims = {Wiki: en.wikipedia.org/Anne_Frank}
The parser would add the data to the appropriate tables.
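A rough sketch of what that parser might do, assuming the Name = {Field: value, ...} syntax above. One design choice is needed for entries like "Language: German, Dutch": here a comma-separated piece with no "Field:" prefix is treated as an extra value of the preceding field. `parse_records` is a hypothetical name, not existing MediaWiki code.

```python
import re
from collections import defaultdict

# Sketch only: hypothetical parser, not MediaWiki code.
# One record: TableName = {Field: value, Field: value, ...}
RECORD = re.compile(r"(\w+)\s*=\s*\{([^{}]*)\}")


def parse_records(wikitext: str) -> dict[str, list[dict[str, list[str]]]]:
    """Return {table name: [row, ...]}; each row maps a field to its values."""
    tables: dict[str, list[dict[str, list[str]]]] = defaultdict(list)
    for table, body in RECORD.findall(wikitext):
        row: dict[str, list[str]] = {}
        field = None
        for piece in body.split(","):
            piece = piece.strip()
            if not piece:
                continue
            name, sep, value = piece.partition(":")
            if sep:
                field = name.strip()
                row[field] = [value.strip()]
            elif field is not None:
                # No "Field:" prefix: another value for the previous field.
                row[field].append(piece)
        tables[table].append(row)
    return dict(tables)


page = ("Person = {Wiki: en.wikipedia.org/Anne_Frank, Name: Anne Frank, Born: 1929, Died: 1944} "
        "Writer = {Wiki: en.wikipedia.org/Anne_Frank, Name: Anne Frank, Language: German, Dutch} "
        "WWII_Victims = {Wiki: en.wikipedia.org/Anne_Frank}")
print(parse_records(page)["Writer"][0]["Language"])  # ['German', 'Dutch']
```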
Gerard, what do you mean by, "it will happen"?
-----Original Message----- From: Gerard Meijssen [mailto:gerard.meijssen@gmail.com] Sent: Tuesday, March 15, 2005 7:29 AM To: wikipedia-l@Wikimedia.org Cc: k.andris@gmail.com Subject: Re: [Wikipedia-l] Re:WikiIndex (idea)
Matt Kingston wrote:
Sounds to me like it would be a sort of "expert" system. Consider:
The Wikipedia article Anne Frank could have tags at the bottom like
{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}
How about this: [[Category:1929 births]] [[Category:German authors]] [[Category:Women]]
and on the article for "The Diary of a Young Girl" we could put [[Category:Books by Anne Frank]]
Since facts aren't tied to one language, the "relationship database" could span all languages (there is already a system for linking articles on the same subject in different languages). A relationship would have an internal ID with "translations" into different languages. So after {BornIn:Germany} is typed into the English version, the computer would find the French equivalents of the relationship BornIn and of Germany, and update the French Anne_Frank article accordingly.
How about interwiki links on the category pages?
Categorization is pretty free-form at the moment, but standards have been developing and the category structures have been slowly coalescing, so eventually they might be adequate for these sorts of activities. The birth/death categories are very standardized; the nationality and occupation categories less so, but they are still usable. Most book articles don't have authorship categories, but most fall into genre and publication-date categories.
There's no [[Category:Women]], but if there were a need for one it could theoretically be created. It might not be popular at the moment, though, considering how categories currently work: it would be a fearsomely huge category, and splitting the nationality/occupation/birthdate categories up instead would be rather awkward. Maybe once better category-sorting functions (intersections, unions, etc.) are supported.
We're 90% of the way to what you've suggested already, let's not reinvent the wheel and recategorize every article in the Wikipedias if we can help it. :)
Bryan Derksen wrote:
Matt Kingston wrote:
Sounds to me like it would be a sort of "expert" system. Consider: The Wikipedia article Anne Frank could have tags at the bottom like
{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}
How about this: [[Category:1929 births]] [[Category:German authors]] [[Category:Women]]
and on the article for "The Diary of a Young Girl" we could put [[Category:Books by Anne Frank]]
I would question the wisdom and usefulness of a separate category for all the books of any person who only wrote one book.
Ec
How about this: [[Category:1929 births]] [[Category:German authors]] [[Category:Women]]
Hi all, this is my first mail to the list,
First, a small gripe of mine: the current category system does not list pages recursively. Consider a Category:Cats with two subcategories, Category:WhiteCats and Category:BlackCats. If I go to the Category:Cats page, I am not shown any pages that are in Category:WhiteCats or Category:BlackCats. Also, someone could give Category:BlackCats the subcategory Category:Cats. This creates a loop in the category hierarchy: Cats -> BlackCats -> Cats. In my opinion:
- When displaying Category:Cats, all members of this category and its subcategories should be displayed to the user.
- If a user adds a subcategory to a category, Wikipedia should check for loops as a post-condition of the edit, and reject the edit if it creates a loop.
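Both behaviours can be sketched in a few lines, assuming the hierarchy is held in memory as one dict mapping each category to its subcategories and another mapping each category to its pages (a hypothetical layout, not MediaWiki's actual schema). The `seen` set keeps the recursive listing from running forever even if a loop does slip in.

```python
from collections import deque

# Sketch only: the dict-based layout and function names are hypothetical.


def all_members(category, subcats, pages):
    """Pages in `category` or any descendant, visiting each category once."""
    seen = {category}
    queue = deque([category])
    members = set()
    while queue:
        cat = queue.popleft()
        members |= pages.get(cat, set())
        for child in subcats.get(cat, ()):
            if child not in seen:  # guards against loops such as Cats -> BlackCats -> Cats
                seen.add(child)
                queue.append(child)
    return members


def creates_loop(subcats, parent, child):
    """True if making `child` a subcategory of `parent` would close a cycle,
    i.e. `parent` is already reachable from `child` (or is the same category)."""
    stack, seen = [child], set()
    while stack:
        cat = stack.pop()
        if cat == parent:
            return True
        if cat not in seen:
            seen.add(cat)
            stack.extend(subcats.get(cat, ()))
    return False


subcats = {"Cats": {"WhiteCats", "BlackCats"}}
pages = {"Cats": {"Cat"}, "WhiteCats": {"Snowball"}, "BlackCats": {"Salem"}}
print(sorted(all_members("Cats", subcats, pages)))  # ['Cat', 'Salem', 'Snowball']
print(creates_loop(subcats, "BlackCats", "Cats"))    # True: the edit would be rejected
```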
Now, this thread is about adding metadata (data about the data) to pages. Broadly speaking, two suggestions have been put forward to accomplish this goal:
1. To use better categories to store data about pages.
2. To invent a new, separate way of classifying pages.
Each suggestion has benefits and drawbacks:
1. To use better categories to store data about pages:
This method does not store any information about the relationships between pieces of data. For example, there is a subtle semantic difference between putting a person in Category:Irish and putting a ship in Category:Irish.
2. To invent a new, separate way of classifying pages:
This way of adding metadata to pages would be custom-designed to achieve the goal, and would be better than the current category system. However, new code would have to be written for it, which would take time, labour and money, and would introduce bugs that need fixing, costing more time, labour and money. The current category system exists and works. I therefore feel it would be better to augment the current category system than to create a second system that shares many features with it. Also, most pages in Wikipedia currently have categories; if a new system were created, all existing pages would have to be tagged (mostly by hand) for it.
If I were in charge of Wikipedia, I would overhaul the current categories, removing the very specific ones and adding the ability to perform the basic set operations on categories.
For example, there is currently a Category:Irish_Poets. I would delete this category and add all pages in it to Category:Irish and Category:Poets. A user who wants a list of Irish poets would ask Wikipedia for (Irish) Intersection (Poets).
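A minimal sketch of those set operations over categories, assuming each category is available as a set of page titles. The function names and the sample membership data are illustrative only.

```python
# Sketch only: function names and data are illustrative, not a Wikipedia API.


def intersection(categories, first, *rest):
    """Pages that are in every named category."""
    result = set(categories.get(first, ()))
    for name in rest:
        result &= categories.get(name, set())
    return result


def union(categories, *names):
    """Pages that are in at least one named category."""
    result = set()
    for name in names:
        result |= categories.get(name, set())
    return result


def difference(categories, base, *excluded):
    """Pages in `base` but in none of the `excluded` categories."""
    return set(categories.get(base, ())) - union(categories, *excluded)


categories = {
    "Irish": {"W. B. Yeats", "Seamus Heaney", "Liam Neeson"},
    "Poets": {"W. B. Yeats", "Seamus Heaney", "John Keats"},
}
print(sorted(intersection(categories, "Irish", "Poets")))  # ['Seamus Heaney', 'W. B. Yeats']
```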
Also, I would create Category:Person, to which all people in Wikipedia would belong. I would also create Category:Men and Category:Women as subcategories of Category:Person, and every person in Wikipedia would be in either Category:Men or Category:Women. A user looking for articles about people could simply search within Category:Person to find what they are looking for.
Now we must think about data storage. Consider the article on Liam Neeson. Do we mark it as Category:Person and Category:Men, or just as Category:Men? If we store both, we have redundancy: since Person is a superset of Men, membership of Category:Men implies membership of Category:Person. But for Wikipedia to reason that a man is a person takes computation. The fact that a man is a person should never change, so we could mark the page as Category:Person automatically whenever it is marked Category:Men. We would be storing more information than is necessary, but future reasoning would be faster.
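The precompute-on-write option might look like this sketch, assuming a hypothetical `parents` map from each category to its direct supercategories. The trade-off is the one described above: storage grows, but membership tests become simple lookups, at the cost of recomputing stored pages if a category's parents ever change.

```python
# Sketch only: `parents`, `ancestors` and `materialize` are hypothetical names.


def ancestors(parents, category):
    """Every category that `category` is a subcategory of, directly or indirectly."""
    found, stack = set(), [category]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def materialize(parents, declared):
    """Categories to store for a page: those the editor typed plus all implied ones."""
    stored = set(declared)
    for cat in declared:
        stored |= ancestors(parents, cat)
    return stored


parents = {"Men": {"Person"}, "Women": {"Person"}}
print(sorted(materialize(parents, {"Men"})))  # ['Men', 'Person']
```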
Disjointness must also be considered: Category:Men is disjoint from Category:Women. How can we store this information in the database? We may have to add a new tag to category pages:
Category:Men [[DisjointFrom:Women]]
(A page cannot be a member of both Men and Women. For the sake of argument, please ignore transgender and transsexual people, etc.; this is just a blackboard example.)
Also, when someone adds the above tag, Category:Women must automatically be marked [[DisjointFrom:Men]].
What happens when a Wikipedia user edits the Liam Neeson page and adds Category:Women? Should Wikipedia search for categories that have been marked as disjoint (a computationally expensive operation) and disallow the edit?
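One possible answer, sketched under assumptions: store each disjoint pair as an unordered `frozenset`, which makes the relation symmetric without a second tag, and check only the pairs of categories on the page being edited. That costs roughly the square of the number of categories on that one page rather than a search over all categories. The names here are hypothetical.

```python
from itertools import combinations

# Sketch only: names are hypothetical.
# Each disjoint pair is stored once, as an unordered frozenset, so declaring
# Men/Women automatically covers Women/Men as well.
disjoint: set[frozenset[str]] = set()


def declare_disjoint(a: str, b: str) -> None:
    disjoint.add(frozenset((a, b)))


def conflicts(page_categories: set[str]) -> list[tuple[str, str]]:
    """Disjoint pairs among one page's categories; empty means the edit may be saved."""
    return [(a, b) for a, b in combinations(sorted(page_categories), 2)
            if frozenset((a, b)) in disjoint]


declare_disjoint("Men", "Women")
print(conflicts({"Men", "Actors"}))           # []
print(conflicts({"Men", "Women", "Actors"}))  # [('Men', 'Women')]
```

Run on a page's materialized categories, the same check would presumably also catch conflicts inherited through supercategories.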
We could also use the category system to save virtual categories. For example, suppose a VCategory is a virtual category:
VCategory:Irish_Poets would simply be a re-direct to: Category:Irish (intersection) Category:Poets
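A virtual category could store an expression instead of a member list and evaluate it on demand. This sketch assumes a hypothetical nested-tuple encoding, ("and", ...), ("or", ...) and ("not", base, excluded, ...), plus a depth limit to guard against virtual categories that refer to each other in a loop.

```python
# Sketch only: the expression encoding and `evaluate` are hypothetical.


def evaluate(expr, categories, virtual, depth=0):
    """Members of a real or virtual category, or of a set expression.

    expr is a category name, or a tuple ("and" | "or", e1, e2, ...) or
    ("not", base, excluded, ...)."""
    if depth > 20:
        raise ValueError("virtual categories nest too deeply (possible loop)")
    if isinstance(expr, str):
        if expr in virtual:  # e.g. VCategory:Irish_Poets
            return evaluate(virtual[expr], categories, virtual, depth + 1)
        return set(categories.get(expr, ()))
    op, *args = expr
    sets = [evaluate(a, categories, virtual, depth + 1) for a in args]
    if op == "and":
        return set.intersection(*sets)
    if op == "or":
        return set.union(*sets)
    if op == "not":
        base, *excluded = sets
        return base.difference(*excluded)
    raise ValueError(f"unknown operator: {op!r}")


categories = {"Irish": {"W. B. Yeats", "Liam Neeson"}, "Poets": {"W. B. Yeats", "John Keats"}}
virtual = {"Irish_Poets": ("and", "Irish", "Poets")}
print(evaluate("Irish_Poets", categories, virtual))  # {'W. B. Yeats'}
```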
What do other people think?
Best Regards,
Marc O'Morain
Marc O'Morain wrote:
Also I would create a category Person. All people in Wikipedia would be part of this category. Also I would create Category:Men and Category:Women. These would be subcategories of Category:Person. All people in Wikipedia would be part of either Category:Men or Category:Women. If a user is looking for articles about people, they can simply search within Category:Person to find the information that they are looking for.
(A page cannot be a member of both Men and Women. For the sake of argument, please ignore transgender and transsexual people, etc.; this is just a blackboard example.)
Hermaphrodites excepted.
What happens when a Wikipedia user edits the Liam Neeson page and adds Category:Women? Should Wikipedia search for categories that have been marked as disjoint (a computationally expensive operation) and disallow the edit?
Also, with the category system we could save Categories as virtual Categories. For example, consider that a VCategory is a virtual category:
VCategory:Irish_Poets would simply be a re-direct to: Category:Irish (intersection) Category:Poets
What do other people think?
The biggest disadvantage that this proposal carries is that it is so logical.
I remember raising something of the sort before categories became a reality. It would certainly save us from developing a lot of one-item categories.
Ec
I'd also agree with augmenting the existing category system so that categories relate to one another logically. For example, on the Liam Neeson page, listing the categories in the order
[[Category:Human]] [[Category:Men]] [[Category:Americans]] [[Category:Actors]]
would place him in a logical tree: human -> men -> Americans -> actors. It could also be arranged differently, perhaps with other category tags that are automatically hidden, similar to those suggested earlier:
[[Gender:Male]] [[Animal:Homo Sapiens]] [[Nationality:American]]
The Category+ tags would be defined on an "All Category Tags" special page, so they would be translated for each language's Wikipedia but remain usable across Wikipedias, enabling cross-wiki searching and indexing. I would hope to see some basic categories that allow any article to be categorized and logically structured in the Wiki-DB, such as Person, Place, Animal, Thing, Event (wars, time periods, etc.), then Fictitious/Non-Fictitious, and so forth.
I would like to see recursive category listings, so that if an article is in Cats > Black Cats, then on the category page for "Black Cats" I can click on Cats to go up one category and find cats of other colors.
Page: Category: Cats
    Subcategory: Black Cats
------------------
If it gets to more than three subcategories down, show the uppermost category at the top, then a vertical ellipsis, then the bottom two categories. The ellipsis would be clickable, revealing the categories hidden to save page space.
Page: Cats
    :
    :
    Subcategory: Himalayan Cats in Entertainment
        Subcategory: Cats on Television
------------------------
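The collapsing rule can be sketched as follows, assuming "more than three subcategories down" means a path of more than three categories from the top. `collapse_path` is hypothetical, ":" stands in for the clickable vertical ellipsis, and the intermediate category "Himalayan Cats" in the usage is invented.

```python
# Sketch only: hypothetical helper; ":" stands in for the clickable vertical ellipsis.
ELLIPSIS = ":"


def collapse_path(path: list[str], max_shown: int = 3, keep_bottom: int = 2) -> list[str]:
    """path runs from the top category down to the current one.
    Paths of up to `max_shown` categories are shown whole; longer ones keep
    the top category, an ellipsis, and the last `keep_bottom` categories."""
    if len(path) <= max_shown:
        return list(path)
    return [path[0], ELLIPSIS] + path[-keep_bottom:]


path = ["Cats", "Himalayan Cats", "Himalayan Cats in Entertainment", "Cats on Television"]
print(collapse_path(path))
# ['Cats', ':', 'Himalayan Cats in Entertainment', 'Cats on Television']
```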
Something like that.
James
-----Original Message----- From: wikipedia-l-bounces@Wikimedia.org [mailto:wikipedia-l-bounces@Wikimedia.org] On Behalf Of Marc O'Morain Sent: Tuesday, March 15, 2005 7:25 PM To: wikipedia-l@wikimedia.org Subject: Re: [Wikipedia-l] Re:WikiIndex (idea)
Ray Saintonge wrote:
Bryan Derksen wrote:
and on the article for "The Diary of a Young Girl" we could put [[Category:Books by Anne Frank]]
I would question the wisdom and usefulness of a separate category for all the books of any person who only wrote one book.
As would I, but the only alternative that came to mind at the time was to put [[Category:Authors of The Diary of a Young Girl]] on the Anne Frank page which would have been even worse. :)
Matt Kingston wrote:
Sounds to me like it would be a sort of "expert" system. Consider:
The Wikipedia article Anne Frank could have tags at the bottom like
{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}
I'm working on something like this already: an experimental wiki with the ability to embed special links, in this case with the format [[property = value]]. These are used to maintain entries in a tuple table, which is in turn used to generate an alternative RDF view of the page. Yes, you can do [[is a = thing]]. Bits of web GUI are generated from class definitions, and you can also do things like import CSV tables (and, I hope, export them as well). I'm currently working on [[same as = thing]] and logical inference, as well as extending this to trans-wiki inference.
Hopefully some of this experience will be useful when it comes time to add this to Wikipedia. For now, it won't hurt at all to use templates to encode this sort of information, so that some data will already be available before the knowledge-mining software gets written.
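For readers curious what the [[property = value]] approach involves, here is a rough sketch, not Neil's code. It extracts rows for a tuple table with a regular expression (plain links such as [[Germany]] contain no "=" and are skipped) and handles [[same as = ...]] with a union-find, so facts stated on either page name apply to both. All names are hypothetical.

```python
import re

# Sketch only: not Neil's implementation; all names are hypothetical.
# [[property = value]]; ordinary links like [[Germany]] contain no "=" and are skipped.
PROPERTY_LINK = re.compile(r"\[\[\s*([^\[\]=|]+?)\s*=\s*([^\[\]|]+?)\s*\]\]")


def tuples_from_page(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Rows (page, property, value) for the tuple table."""
    return [(page, prop, value) for prop, value in PROPERTY_LINK.findall(wikitext)]


class SameAs:
    """Union-find over page names, so pages linked by [[same as = ...]] share one identity."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)


def facts_about(page: str, rows: list[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """(property, value) pairs stated on `page` or on any page declared the same as it."""
    same = SameAs()
    for subject, prop, value in rows:
        if prop == "same as":
            same.union(subject, value)
    target = same.find(page)
    return {(prop, value) for subject, prop, value in rows
            if prop != "same as" and same.find(subject) == target}


rows = tuples_from_page("Anne Frank", "[[is a = person]] born in [[Germany]], [[born = 1929]]")
rows += tuples_from_page("Annelies Marie Frank",
                         "[[same as = Anne Frank]] [[wrote = The Diary of a Young Girl]]")
print(sorted(facts_about("Annelies Marie Frank", rows)))
# [('born', '1929'), ('is a', 'person'), ('wrote', 'The Diary of a Young Girl')]
```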
-- Neil
Matt Kingston wrote:
Sounds to me like it would be a sort of "expert" system. Consider:
The Wikipedia article Anne Frank could have tags at the bottom like
{WasBorn:1929} {IsA:Woman} {Wrote:The_Diary_Of_A_Young_Girl} {BornIn:Germany}
you get the idea.
In a collaborative project, the German Wikipedia has tagged approximately 30,000 biography entries for the DVD currently being produced. There are tools that help with this process.
More information: http://de.wikipedia.org/wiki/Wikipedia:Personendaten
For an example see http://de.wikipedia.org/wiki/Max_Ernst
The data is automatically hidden by CSS (display can be switched on and off in the stylesheet).
greetings, elian
wikipedia-l@lists.wikimedia.org