An idea. I'll try to be short.
Wikipedia has a lot of information, and it is heavily crosslinked. But it's not indexed. I mean an index of people, an index of places and an index of things. And events. And countries. And lakes. And whatever. Each index is a table (in database terms), with a few required fields. You could then add a page (or a part of it) to an index (or to more indexes) by specifying these required fields of the index (probably in the wiki source). The MediaWiki software would create real database tables based on this information.
Using this you could look up things that happened, or people who were born or died, on a given day. Or things that happened in Tokyo, or in 1923, and put that on a Google Map. Look at Wikipedia as an intelligent "who's who" (searching not only by name). Or list books or movies that have wiki pages about them. The possibilities are quite broad. Look up pages that are in multiple indexes, "events" and "presidents of the world" for example.
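A minimal sketch of the "index as database table" idea described above, using Python's built-in sqlite3. The table name, required fields, and sample records are all hypothetical stand-ins, not anything MediaWiki actually provides:

```python
import sqlite3

# A hypothetical "people" index: one table, a few required fields,
# each record pointing back to the wiki page it came from.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE people (
        page       TEXT PRIMARY KEY,  -- wiki page the record came from
        name       TEXT NOT NULL,     -- required field of the index
        born       TEXT NOT NULL,     -- ISO date, another required field
        birthplace TEXT
    )
""")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Marcel_Proust", "Marcel Proust", "1871-07-10", "France"),
        ("Erwin_Schrodinger", "Erwin Schrodinger", "1887-08-12", "Austria"),
    ],
)

# A "people born in France before a given date" style query.
rows = conn.execute(
    "SELECT name FROM people"
    " WHERE birthplace = 'France' AND born < '1880-01-01'"
).fetchall()
print(rows)  # [('Marcel Proust',)]
```

The point is only that once pages declare their required fields, lookups like these become ordinary SQL queries rather than full-text searches.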
"indexers" would be wikipedians who index things. Make and index, like "countries" or "operating systems" or "mysteries". And then collect things into that index. And specify the attributes (database fields) of that index. There are pages like this, I know, for database systems for example, but you see this is a different level. You could create an index of abbreviations for example...
(I don't have much time to discuss it, but if anyone finds it worth working on, please let me know. Later I might join in. Have a nice day.)
That would certainly be an excellent thing to have, but I see two difficulties:
- readability: manual annotations in the source code make it much more difficult to read (it's easier to do this once the document is frozen, which never happens in wikis). It's a bit like hyperlinks.
- quality: I am afraid that automatic indexing would not be better than a Google search, but if authors have to do it manually, it's hard to maintain.
So I am not sure that indexing for web documents is as useful as in books due to the presence of hyperlinks :-)
my 2 cents
Martin
On Wed, 11 Jan 2006, András Kardos wrote:
An idea. I'll try to be short.
MediaWiki-l mailing list MediaWiki-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
-- Martin Jambon, PhD http://martin.jambon.free.fr
Visit http://wikiomics.org, the Bioinformatics Howto Wiki
I like the idea of such indices very much. It would be uber-cool if you could use SQL syntax to search the content of a wiki.
What I understand is that you want to create a tag/extension/template construction where the editor of the page can enter the fields to go into the table (and presumably select the table they should go to). Something like this (using the template syntax): {{indexthis|cities|name=Vlaardingen|Country=NL|population=70000|.....etc....}}
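The template call above could be parsed into a table name plus a field dictionary before anything touches the database. A rough sketch, assuming the {{indexthis|...}} syntax is exactly as proposed (the template name and parameter layout are hypothetical):

```python
import re

def parse_indexthis(wikitext):
    """Parse a hypothetical {{indexthis|table|key=value|...}} template
    call into (table_name, fields). Purely illustrative syntax."""
    m = re.search(r"\{\{indexthis\|([^}]*)\}\}", wikitext)
    if not m:
        return None
    parts = m.group(1).split("|")
    table = parts[0]
    # Remaining parts are key=value pairs; ignore anything malformed.
    fields = dict(p.split("=", 1) for p in parts[1:] if "=" in p)
    return table, fields

table, fields = parse_indexthis(
    "{{indexthis|cities|name=Vlaardingen|Country=NL|population=70000}}"
)
print(table, fields)
```

In a real extension this would hang off the parser rather than a regex, but the shape of the data — one target table, a bag of named fields — is the same.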
While this may seem a good idea, the first thought that sprang to mind was that this makes for a very exploitable structure (for wikispammers). With a "normal" page the spammers are annoying, but simply revert the edit and the information is gone from the wiki (from the search, anyway). But how does this work with indices?
* How can I delete records from the index again? For example, I made a page for a city with an index record and later decided to delete the entire page. How does the record get removed from the index automagically?
* How can I change records in the index (correcting mistakes/typos...)? I cannot foresee a structure where I can uniquely identify individual records in the index/database, other than adding another tag/template to specifically change a record, which is rather cumbersome.
* How do we avoid creating multiple records in the index table?
What might (repeat *might*) make more sense is that some clever coders (like the guys at Google, or 'our own' MediaWiki coders) develop algorithms that yield better search results on wiki pages. (Because of the sometimes amazing number of cross links on a page, I imagine that pages in a wiki can be searched/linked in more efficient ways than "just any other" HTML page.)
Just my two cents (OK maybe 5).
Hans Voss.
On 1/11/06, András Kardos k.andris@gmail.com wrote:
An idea. I'll try to be short.
-- Met vriendelijke groeten / With kind regards, Hans Voss
Hans Voss <hans.voss@...> writes:
While this may seem a good idea, the first thought that sprang to mind was that this makes for a very exploitable structure (for wikispammers). With a "normal" page the spammers are annoying, but simply revert the edit and the information is gone from the wiki (from the search, anyway). But how does this work with indices:
Hans understood what I meant.
Some more things: these tables are filled (records added, deleted) just after a page is updated, when it is parsed. If you add a "record" to a page, in the inline syntax, it will be added to the appropriate database table at that time too. If you revert a page to a previous version, or remove the declaration of an "inline" record, then the database record it corresponds to is deleted too, for example if it was spam. Maybe database records are only kept for current versions of pages, since they can be recreated anytime from the page sources. So database records refer to pages by their name.
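This "records follow the current revision" model also answers the earlier spam/update worries: since records are keyed by page name and fully rebuilt on every save, reverting or deleting a page drops its records automatically. A toy sketch, with the store and hook names made up for illustration:

```python
# index maps table name -> {page_name -> field dict}.
index = {}

def on_page_saved(page_name, records):
    """records: list of (table, field_dict) parsed from the new wikitext."""
    # Remove everything this page contributed before...
    for table in index.values():
        table.pop(page_name, None)
    # ...then re-add whatever the current revision declares.
    for table_name, fields in records:
        index.setdefault(table_name, {})[page_name] = fields

def on_page_deleted(page_name):
    for table in index.values():
        table.pop(page_name, None)

on_page_saved("Vlaardingen",
              [("cities", {"name": "Vlaardingen", "Country": "NL"})])
on_page_saved("Vlaardingen", [])  # a revert that removed the record
print(index["cities"])            # {} -- the spammed record is gone
```

Because each page owns exactly one set of records per save, there is no separate "edit the index" operation to get wrong, and no duplicate records to deduplicate.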
And I thought this indexing would be done by hand. BTW, I'm studying to be a "real" indexer (a librarian) at the moment...
Please do searches for prior code and discussion:
Already implemented:
* manually created lists
* categories

Discussion:
* "semantic wikipedia"
* wikidata
* geographic data
* metadata
etc.
-- brion vibber (brion @ pobox.com)
On 1/11/06, András Kardos k.andris@gmail.com wrote:
Using this you could look up things that happened, or people who were born or died, on a given day. Or things that happened in Tokyo, or in 1923, and put that on a Google Map. Look at Wikipedia as an intelligent "who's who" (searching not only by name). Or list books or movies that have wiki pages about them. The possibilities are quite broad. Look up pages that are in multiple indexes, "events" and "presidents of the world" for example.
There are a couple of hacks and extensions for mediawiki which can automate some of this.
One allows you to create a list from the information for different categories. So you can create a list of all pages in {category x} and {category y} but not {category z}.
Another lets you do similar things, but with backlinks.
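The category combination described above is plain set algebra over category membership. A sketch with made-up category contents:

```python
# Hypothetical category memberships (page titles per category).
category = {
    "x": {"Alpha", "Beta", "Gamma"},
    "y": {"Beta", "Gamma", "Delta"},
    "z": {"Gamma"},
}

# Pages in category x AND category y, but NOT in category z.
result = (category["x"] & category["y"]) - category["z"]
print(sorted(result))  # ['Beta']
```

The same intersection/difference could be expressed as a join over MediaWiki's categorylinks table, which is presumably why such extensions can be CPU-intensive on large wikis.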
I don't think either would ever make it into the MediaWiki code, or into Wikipedia, if they are very CPU-intensive to operate.
I don't like the idea of editors needing to do things like this manually. I like the idea of having MediaWiki do it. Martin mentioned Google... they index Wikipedia very well, and it's a "free" * way to have good indexing done.
* Not totally free because of advertising on search result pages.
Sy Ali <sy1234@...> writes:
I don't like the idea of editors needing to do things like this manually. I like the idea of having MediaWiki do it. Martin mentioned Google... they index Wikipedia very well, and it's a "free" * way to have good indexing done.
- Not totally free because of advertising on search result pages.
The whole Wikipedia is done "manually". This indexing is just a way to make this knowledge more accessible. Google is fine for page-like information, but not database-like. You can't list people born in France during World War 1, though this information is "hidden" in Wikipedia.
Some people like to write articles, others (like myself) would do this indexing stuff better. It's a personality thing...
On 1/14/06, András Kardos k.andris@gmail.com wrote:
The whole Wikipedia is done "manually". This indexing is just a way to make this knowledge more accessible. Google is fine for page-like information, but not database-like. You can't list people born in France during World War 1, though this information is "hidden" in Wikipedia.
Hmm... good point. The "see also" or related-topics ideas really do help the searchability of things, but it's nowhere near perfect, nor is it accessible. Plus it doesn't give the "map" of related items the way a good system would.
Some people like to write articles, others (like myself) would do this indexing stuff better. It's a personality thing...
I for one would love to move all the "see also" links to the top where they're more accessible. I hate doing fuzzy searches, finding a related article and needing to scroll through to the bottom to find related articles.
For me.. doing indexing and information management would actually be quite fun. Breadcrumbs, disambiguation pages and see also links have made my own site particularly easy to get around.. even if it's mostly for me. =)