Is a separate integer namespace property actually necessary? I got the impression that namespaces were designated by a prefix in the article title, such as Special:, User:, or Wiki:
In that case, one could find all the articles in the Wiki: namespace by doing this query:
SELECT article_id FROM cur WHERE title LIKE "Wiki:%" ;
And if we had some indexes defined, that search would actually be extremely quick to do.
Are there any reasons why we need to keep a separate column for "namespace"? If we want to keep namespaces in their own column, maybe we should have a table of namespace IDs and their corresponding prefixes, and then NOT put that prefix in the article's actual title.
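A minimal sketch of the alternative proposed above, using Python's built-in sqlite3 module as a stand-in for MySQL/PostgreSQL. The table and column names (`namespace`, `ns_id`, `ns_prefix`, `cur_namespace`, `cur_title`) and the ID 4 for "Wiki" are hypothetical, chosen only for illustration: the prefix lives in the lookup table and is joined back in when a full title is needed. (Bound parameters are used here; note that in standard SQL and PostgreSQL a string literal needs single quotes, while the double-quoted "Wiki:%" above relies on MySQL's default behaviour.)

```python
import sqlite3

def build_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Hypothetical lookup table: one row per namespace.
        CREATE TABLE namespace (
            ns_id     INTEGER PRIMARY KEY,
            ns_prefix TEXT NOT NULL        -- empty string for ordinary articles
        );
        -- Page titles are stored WITHOUT their namespace prefix.
        CREATE TABLE cur (
            cur_id        INTEGER PRIMARY KEY,
            cur_namespace INTEGER NOT NULL REFERENCES namespace(ns_id),
            cur_title     TEXT NOT NULL,
            UNIQUE (cur_namespace, cur_title)  -- same title may exist in two namespaces
        );
    """)
    db.executemany("INSERT INTO namespace VALUES (?, ?)",
                   [(0, ""), (4, "Wiki")])  # ID 4 is an illustrative choice
    db.executemany("INSERT INTO cur (cur_namespace, cur_title) VALUES (?, ?)",
                   [(0, "Vulture"), (4, "Vulture"), (4, "Policy")])
    return db

def full_titles_in(db: sqlite3.Connection, prefix: str) -> list[str]:
    """Return display titles ("Prefix:Title") of every page in one namespace."""
    rows = db.execute("""
        SELECT n.ns_prefix || ':' || c.cur_title
        FROM cur AS c JOIN namespace AS n ON n.ns_id = c.cur_namespace
        WHERE n.ns_prefix = ?
        ORDER BY c.cur_title
    """, (prefix,))
    return [r[0] for r in rows]

db = build_db()
print(full_titles_in(db, "Wiki"))  # ['Wiki:Policy', 'Wiki:Vulture']
```

Renaming a namespace in this layout touches one row of the lookup table rather than every title that carries the prefix.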
Jonathan
Jonathan Walther wrote:
> Is a separate integer namespace property actually necessary? I got the impression that namespaces were designated by a prefix in the article title, such as Special:, User:, or Wiki:
> In that case, one could find all the articles in the Wiki: namespace by doing this query:
> SELECT article_id FROM cur WHERE title LIKE "Wiki:%" ;
That's how I did it in Phase II. Did work, but this works better. And, namespaces are currently *not* part of the title in the database!
Magnus
On Tue, Nov 19, 2002 at 09:55:35AM +0100, Magnus Manske wrote:
>> In that case, one could find all the articles in the Wiki: namespace by doing this query:
>> SELECT article_id FROM cur WHERE title LIKE "Wiki:%" ;
> That's how I did it in Phase II. Did work, but this works better. And, namespaces are currently *not* part of the title in the database!
Ah, ok. In Phase II was an index defined on the article_title column? Postgres has b-tree, r-tree, hash, and various other kinds of indexes that make even string searching blazing fast.
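The reason an ordinary B-tree index can serve a left-anchored pattern such as `LIKE 'Wiki:%'` is that, in the index's sort order, every title with that prefix falls in one contiguous key range. A minimal sketch of that idea, using Python's `bisect` on a sorted list as a stand-in for the index; the function name `prefix_range` is hypothetical, and whether a particular database actually uses its index for `LIKE` depends on its collation and settings, which this does not model.

```python
from bisect import bisect_left

def prefix_range(sorted_titles: list[str], prefix: str) -> list[str]:
    """Return all titles starting with `prefix` using two binary searches.

    Mimics an index range scan: find the first key >= prefix, then the first
    key >= the smallest string that sorts after every prefixed string.
    """
    if not prefix:
        return list(sorted_titles)
    # Upper bound: bump the last character, e.g. "Wiki:" -> "Wiki;".
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect_left(sorted_titles, prefix)
    hi = bisect_left(sorted_titles, upper)
    return sorted_titles[lo:hi]

titles = sorted(["Vulture", "User:Vulture", "Wiki:Policy", "Wiki:FAQ", "Wikipedia"])
print(prefix_range(titles, "Wiki:"))  # ['Wiki:FAQ', 'Wiki:Policy']
```

The cost is two logarithmic lookups plus the size of the result, which is why a prefix search need not scan the whole table, while a pattern such as `LIKE '%Wiki'` cannot be answered this way.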
In the sample database Brion posted, I must have overlooked the table that mapped between namespace IDs and the names of the namespaces.
Namespaces are convenient conceptually, but are they really necessary?
Jonathan
Jonathan Walther wrote:
> Postgres has b-tree, r-tree, hash, and various other kinds of indexes that make even string searching blazing fast.
OK, we can change that, *once* we've switched to Postgres...
> In the sample database Brion posted, I must have overlooked the table that mapped between namespace IDs and the names of the namespaces.
It's no table, it is in Language.php...
> Namespaces are convenient conceptually, but are they really necessary?
Yes! How else to tell apart [[Vulture]] and [[user:Vulture]]? And where to talk? etc.
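A minimal sketch of how link text can be split into a (namespace, title) pair, so that [[Vulture]] and [[user:Vulture]] resolve to different pages. The prefix table, the numeric IDs and the function name `parse_link_target` are illustrative assumptions, not the actual Phase III code. Only recognised prefixes are treated as namespaces, so a title that merely contains a colon stays in the article namespace.

```python
# Hypothetical prefix table (lower-cased prefix -> namespace ID); IDs are illustrative.
NAMESPACES = {
    "talk": 1,
    "user": 2,
    "user talk": 3,
    "wikipedia": 4,
    "wikipedia talk": 5,
    "image": 6,
    "image talk": 7,
}
ARTICLE_NS = 0

def parse_link_target(target: str) -> tuple[int, str]:
    """Split link text like "user:Vulture" into (namespace ID, title)."""
    prefix, sep, rest = target.partition(":")
    # Normalise "User_talk" / "user talk" to one spelling before the lookup.
    key = prefix.strip().replace("_", " ").lower()
    if sep and key in NAMESPACES:
        return NAMESPACES[key], rest.strip()
    # No colon, or an unknown prefix such as "Star Trek: Voyager".
    return ARTICLE_NS, target.strip()

print(parse_link_target("Vulture"))             # (0, 'Vulture')
print(parse_link_target("user:Vulture"))        # (2, 'Vulture')
print(parse_link_target("Star Trek: Voyager"))  # (0, 'Star Trek: Voyager')
```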
Magnus
On Tue, Nov 19, 2002 at 10:13:35AM +0100, Magnus Manske wrote:
>> Postgres has b-tree, r-tree, hash, and various other kinds of indexes that make even string searching blazing fast.
> OK, we can change that, *once* we've switched to Postgres...
I'm doing preparatory work for a mod_wiki written in C. It may never be suitable for the Wikipedia proper, but I have found the Wikipedia version of Wiki to be the most useful as far as community building goes, so I want it to at least be capable of running the Wikipedia.
Once it's running I'll experiment with the current database, and then solicit feedback. Don't expect it before Christmas.
> It's no table, it is in Language.php...
Ah hah. Magic numbers embedded in the code. Not nice :-(
>> Namespaces are convenient conceptually, but are they really necessary?
> Yes! How else to tell apart [[Vulture]] and [[user:Vulture]]? And where to talk? etc.
So we have the following namespaces: User, User_talk, Global (or Root), Talk, and Special? Is Wikipedia actually a separate namespace, or is it just a prefix to some articles?
Jonathan
> So we have the following namespaces: User, User_talk, Global (or Root), Talk, and Special? Is Wikipedia actually a separate namespace, or is it just a prefix to some articles?
Yes, Wikipedia is a namespace, as are Wikipedia talk, Image and Image talk. I'm not 100% sure about Special, though - there are no 'real' pages in there, just query results.
Andre Engels
On Tue, Nov 19, 2002 at 10:58:01AM +0100, Andre Engels wrote:
>> So we have the following namespaces: User, User_talk, Global (or Root), Talk, and Special? Is Wikipedia actually a separate namespace, or is it just a prefix to some articles?
> Yes, Wikipedia is a namespace, as are Wikipedia talk, Image and Image talk. I'm not 100% sure about Special, though - there are no 'real' pages in there, just query results.
What is the purpose of namespaces? Why not have the following tables:
current_articles current_binaries current_discussions
previous_articles previous_binaries previous_discussions
Knowing whether something is an article, binary, or a discussion seems to cover what we need.
What other uses were envisioned for namespaces that can't be accomplished with a "properties" field for each article? For each possible value of that field, another table would define a host of things, such as who exactly can edit the page, who exactly can look at it, and who exactly can even tell that it exists from links pointing to it in other articles.
Jonathan
On Tue, Nov 19, 2002 at 02:09:37AM -0800, Jonathan Walther wrote:
> On Tue, Nov 19, 2002 at 10:58:01AM +0100, Andre Engels wrote:
>>> So we have the following namespaces: User, User_talk, Global (or Root), Talk, and Special? Is Wikipedia actually a separate namespace, or is it just a prefix to some articles?
>> Yes, Wikipedia is a namespace, as are Wikipedia talk, Image and Image talk. I'm not 100% sure about Special, though - there are no 'real' pages in there, just query results.
> What is the purpose of namespaces? Why not have the following tables:
> current_articles current_binaries current_discussions
> previous_articles previous_binaries previous_discussions
> Knowing whether something is an article, binary, or a discussion seems to cover what we need.
You would have to duplicate a lot of effort: you need separate SQL statements for things like update_article, update_discussion, update_wikipedia, ... Using a namespace, you have just one additional attribute to keep track of, and your code stays simple and readable.
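A minimal sketch of this point, again with sqlite3 standing in for the production database: with a namespace column, one parameterised statement saves any kind of page, whereas a table per kind of page would need one statement (or a dynamically chosen table name) per table. The function `save_page`, the `cur_text` column and the use of 1 for the talk namespace are assumptions for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE cur (
        cur_namespace INTEGER NOT NULL,
        cur_title     TEXT NOT NULL,
        cur_text      TEXT NOT NULL,
        PRIMARY KEY (cur_namespace, cur_title)
    )
""")

def save_page(ns: int, title: str, text: str) -> None:
    """Insert or update one page; the same SQL serves articles, talk pages, user pages..."""
    with db:  # the connection commits on success and rolls back on error
        cur = db.execute(
            "UPDATE cur SET cur_text = ? WHERE cur_namespace = ? AND cur_title = ?",
            (text, ns, title),
        )
        if cur.rowcount == 0:  # no existing row: create the page
            db.execute("INSERT INTO cur VALUES (?, ?, ?)", (ns, title, text))

save_page(0, "Vulture", "A bird.")         # article
save_page(1, "Vulture", "Needs sources.")  # its talk page (namespace 1 assumed)
save_page(0, "Vulture", "A large bird.")   # an edit: same statement, no new table
print(db.execute("SELECT COUNT(*) FROM cur").fetchone()[0])  # 2
```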
JeLuF
On Tue, Nov 19, 2002 at 11:26:09AM +0100, Jens Frank wrote:
>> current_articles current_binaries current_discussions
>> previous_articles previous_binaries previous_discussions
>> Knowing whether something is an article, binary, or a discussion seems to cover what we need.
> You would have to duplicate a lot of effort: you need separate SQL statements for things like update_article, update_discussion, update_wikipedia, ... Using a namespace, you have just one additional attribute to keep track of, and your code stays simple and readable.
Au contraire. I believe the code would become more simple and readable. Notice I never suggested "a table per namespace". I suggested only the 6 tables that covered every circumstance I could think of.
And there are already a lot of queries; having fewer queries and making some of them refer to a slightly different set of tables doesn't strike me as complicating things in any way. The complication comes when you shoehorn different things into the same table.
My question remains, do we need namespaces for any other reason than to know whether the row represents an article, binary, or discussion?
Jonathan
On Tue, 2002-11-19 at 11:24, Jonathan Walther wrote:
> My question remains, do we need namespaces for any other reason than to know whether the row represents an article, binary, or discussion?
As has already been noted, namespaces are also used for User pages and Wikipedia-specific pages. I don't think there's much sense in splitting stuff up into separate tables, or adding these types of pages as properties (how is that cleaner than what we have now?). As for storing binaries in tables, this is not desirable with our current config (table is already too big), and probably not desirable with PostgreSQL (we only have your word on performance, and having them as files has various advantages).
Having the names of the namespaces in the code may not look very clean, but it actually is, because we need them in various translations ("Talk" is called "Diskussion" in the German WP etc.), and it's better to have all localization stuff in one place (Language*.php). If this is properly documented, I don't see a problem.
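A minimal sketch of keeping namespace names as per-language data keyed by a language-independent number, in the spirit of Language*.php: the database stores only the number, and the display prefix is looked up at render time. Apart from "Talk"/"Diskussion", which is mentioned above, the entries, IDs and the function name `display_title` are illustrative assumptions.

```python
# Per-language namespace names keyed by namespace ID (illustrative subset).
NAMESPACE_NAMES = {
    "en": {0: "", 1: "Talk", 2: "User"},
    "de": {0: "", 1: "Diskussion", 2: "Benutzer"},
}

def display_title(lang: str, ns: int, title: str) -> str:
    """Render a stored (namespace, title) pair as a localised page title."""
    prefix = NAMESPACE_NAMES[lang][ns]
    return f"{prefix}:{title}" if prefix else title  # articles carry no prefix

print(display_title("en", 1, "Vulture"))  # Talk:Vulture
print(display_title("de", 1, "Geier"))    # Diskussion:Geier
print(display_title("de", 0, "Geier"))    # Geier
```

Because queries and stored rows use only the number, adding a language means adding one more dictionary, not touching the schema.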
Can we stick to the "if it's not broken, don't fix it" rule? If you want to work on mod_wiki, I think it would be better to do it on a separate mailing list (I'd actually be interested in that). If you want to help with the PostgreSQL port of Wikipedia, we should focus on the table structure, fulltext search, MySQLisms etc. It would probably be most efficient for one PostgreSQL expert to just convert everything that's necessary and then put the code+db dump somewhere; then we can work on a conversion script.
Other than that, we have many urgent issues, like fixing the current IP blocking mechanism and giving sysops better tools to restore vandalized pages.
Regards,
Erik
Jonathan Walther wrote:
> My question remains, do we need namespaces for any other reason than to know whether the row represents an article, binary, or discussion?
Distinguishing between:
- article ("blank:")
- "talk":
- "user:"
- "user talk:"
- "wikipedia:"
- "wikipedia talk:"
- "image:"
- "image talk:"
- "special:" (not stored in the database, generated on-the-fly)
You seem to be quite eager to remodel the database structure. I am not really qualified to tell whether your proposed changes will improve the whole thing or not. But please keep in mind the countless changes to the software one (you?) would have to make for that.
IMHO we could live with a good-but-imperfect database until we have explored other ways of optimizing the db/software. One of them would be switching to Postgres on a test server.
Magnus
On Tue, Nov 19, 2002 at 12:56:18PM +0100, Magnus Manske wrote:
> - article ("blank:")
> - "talk":
> - "user:"
> - "user talk:"
> - "wikipedia:"
> - "wikipedia talk:"
> - "image:"
> - "image talk:"
> - "special:" (not stored in the database, generated on-the-fly)
Thanks for the enumeration, Magnus. It is very helpful. Could you tell me, how are Wikipedia namespace articles different from regular articles? Are they special in that only sysops can edit them? Are they special in that what is in them results in changes being made to the database and the behavior of the Wikipedia?
> IMHO we could live with a good-but-imperfect database until we have explored other ways of optimizing the db/software. One of them would be switching to Postgres on a test server.
That is what I'm exploring here.
Jonathan
Jonathan Walther wrote:
> Could you tell me, how are Wikipedia namespace articles different from regular articles? Are they special in that only sysops can edit them? Are they special in that what is in them results in changes being made to the database and the behavior of the Wikipedia?
They are special in that they are one of several categories of "stuff that isn't articles". When we want to count or show *encyclopedia articles*, it's very convenient to strip out such things with "WHERE cur_namespace=0" rather than a whole series of string comparisons.
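A minimal sketch of the comparison described here, using sqlite3 with made-up rows. With a namespace column, "encyclopedia articles only" is a single integer test; with prefixes embedded in titles, every non-article prefix has to be excluded by hand, and a newly added namespace is silently miscounted until someone adds it to the list. Apart from `cur_namespace` and the value 0 for articles, the table layout (including the `full_title` column used to model the prefix-in-title design) and the data are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# full_title models the prefix-in-title design alongside the namespace column.
db.execute("CREATE TABLE cur (cur_namespace INTEGER, cur_title TEXT, full_title TEXT)")
rows = [
    (0, "Vulture", "Vulture"),
    (0, "Condor", "Condor"),
    (1, "Vulture", "Talk:Vulture"),
    (2, "Vulture", "User:Vulture"),
    (4, "FAQ", "Wikipedia:FAQ"),
]
db.executemany("INSERT INTO cur VALUES (?, ?, ?)", rows)

# With a namespace column: one integer comparison.
by_ns = db.execute("SELECT COUNT(*) FROM cur WHERE cur_namespace = 0").fetchone()[0]

# With prefixes in the title: one string test per non-article prefix.
prefixes = ["Talk:", "User:", "User talk:", "Wikipedia:", "Wikipedia talk:",
            "Image:", "Image talk:"]
where = " AND ".join("full_title NOT LIKE ?" for _ in prefixes)
by_prefix = db.execute(f"SELECT COUNT(*) FROM cur WHERE {where}",
                       [p + "%" for p in prefixes]).fetchone()[0]

print(by_ns, by_prefix)  # 2 2
```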
And, of course, there's the issue of keeping talk page namespaces assigned in a consistent way. If we treated Wikipedia: pages as regular articles, we'd get Talk:Wikipedia:Foobar in English, and similar monstrosities in other languages.
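A minimal sketch of keeping talk namespaces paired consistently, assuming a numbering in which every subject namespace has an even ID and its talk namespace is the next odd ID (this numbering and the names `talk_of`/`talk_title` are assumptions for illustration). The talk page of Wikipedia:Foobar then comes out as "Wikipedia talk:Foobar" rather than "Talk:Wikipedia:Foobar", and the rule is the same in every language because only the names are translated.

```python
# Assumed numbering: subject namespaces even, their talk namespaces odd.
NAMES = {
    0: "", 1: "Talk",
    2: "User", 3: "User talk",
    4: "Wikipedia", 5: "Wikipedia talk",
    6: "Image", 7: "Image talk",
}

def talk_of(ns: int) -> int:
    """Return the talk namespace paired with `ns` (a talk namespace maps to itself)."""
    if ns < 0:
        raise ValueError("special pages have no talk pages")
    return ns | 1  # set the low bit: 0 -> 1, 4 -> 5, and 5 stays 5

def talk_title(ns: int, title: str) -> str:
    """Display title of the talk page belonging to (ns, title)."""
    return f"{NAMES[talk_of(ns)]}:{title}"

print(talk_title(0, "Foobar"))  # Talk:Foobar
print(talk_title(4, "Foobar"))  # Wikipedia talk:Foobar
```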
Also note that image: pages are _descriptions_ of uploaded files (not necessarily images; it's a misnomer and I would very much prefer to change the namespace to file: or somesuch, particularly since image: is overloaded in links to show the image inline).
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org