If you remember back in November we were asked what would be on a technical wishlist for Wikibooks. One of the things we said[1] we wanted was a way to list all books & enumerate them (all *books* as opposed to all pages). Ramac and Pietrodn have been working on an extension[2] which does this.
Darklama and I have taken an initial look at things, and Simetrical found one error that has been corrected. Anyone who is technically-minded is invited to review & test the code. Everyone else is invited to discuss improvements.
The extension is documented at mediawiki.org, but here's a quick overview for the sake of convenience:
*Adds the {{NUMBEROFBOOKS}} variable
*Adds [[Special:AllBooks]], which lists all books in specified namespaces
*List books beginning at some prefix using ?offset=whatever
*The regex used to determine what is a book and what isn't is a system message, so administrators may edit it, though the default should work for most (all?) Wikibooks languages
After this much time with apparently no interest in developing anything for Wikibooks it's exciting to finally see such active development on something needed specifically for this project. Many thanks to Ramac and Pietrodn for their work, and also those who have helped them along the way. Hopefully this is the beginning of a trend, and more tools will be developed to fulfill our wishlist!
-Mike.lifeguard
[1] http://lists.wikimedia.org/pipermail/textbook-l/2007-November/001164.html
[2] http://www.mediawiki.org/wiki/Extension:AllBooks
CC'ing to Wikitech-l. Some more review:
* The way NUMBEROFBOOKS is calculated (scanning the page namespace) is not acceptable, performance-wise. It needs to work like NUMBEROFPAGES, etc., with a site_stats row or similar.
* The allbooks-regex message might not be a great way of doing things:
** Since results will have to be stored in various ways, like the count being stored in site_stats, it's probably not feasible to change the regex used without running a maintenance script. A config option might therefore be better.
** Directly inputting the regex into a query is VERY VERY BAD! It's an immediate SQL injection attack. Also, it will cause the query to break in the presence of things like single quotes. Use Database::addQuotes() here.
* Generating AllBooks using a REGEXP query is . . . maybe not ideal, altogether. Usually we have a flag in the table, for instance, page_is_redirect. It might be okay given that you'd "just" be scanning a few times as many rows, in the average case, and only for views of a certain special page. But a flag would be nice.
* If you're using the Xml functions, you don't have to use htmlspecialchars explicitly. It will double-escape the variable.
* Variable names that are in Italian are kind of funny, but not really in accordance with our coding standards. :) $conta -> $count, $numero -> $number
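To illustrate the quoting point, here is a minimal sketch in Python with SQLite standing in for MediaWiki's PHP and MySQL; it is not the extension's code. The one-column `page` table, the `find_books` helper, and the sample titles are assumptions made for the example. The idea it shows is the same one Database::addQuotes() serves: the regex reaches the database as data, never as part of the SQL text, so a pattern containing a single quote neither breaks the query nor injects anything.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling the user function regexp(Y, X).
    return 1 if re.search(pattern, value) else 0


def find_books(conn, pattern):
    # The pattern is passed as a bound parameter, not spliced into the SQL string.
    rows = conn.execute(
        "SELECT page_title FROM page WHERE page_title REGEXP ? ORDER BY page_title",
        (pattern,),
    )
    return [title for (title,) in rows]


# Hypothetical, simplified stand-in for the real page table.
conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE page (page_title TEXT)")
conn.executemany(
    "INSERT INTO page VALUES (?)",
    [("Cookbook",), ("Cookbook/Bread",), ("O'Reilly_Guide",)],
)

# Titles without a slash are treated as books (an assumed default pattern).
print(find_books(conn, "^[^/]+$"))
# A pattern that itself contains a single quote is handled safely.
print(find_books(conn, "^O'[^/]*$"))
```

Building the same query by string concatenation would fail on the second pattern, because its quote would end the SQL string literal early.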
This is only a quick glance, mind you. I haven't actually tested it and probably won't find the time to do so.
Overall, I'm not sure this is the best way to tackle the problem. Has Wikibooks considered, for instance, having the books' "main pages" be in the main namespace, and the various pages be in a namespace like "Page"? Then the problem is partly solved right away: you can use Special:AllPages set to the main namespace to get a list of books. For NUMBEROFBOOKS, you just need PAGESINNAMESPACE to be enabled, which is fairly feasible at some point if someone's willing to do a little optimization work, since it's a generally-requested feature.
On Thu, Apr 17, 2008 at 5:33 PM, mike.lifeguard mike.lifeguard@gmail.com wrote:
If you remember back in November we were asked what would be on a technical wishlist for Wikibooks. One of the things we said[1] we wanted was a way to list all books & enumerate them (all *books* as opposed to all pages). Ramac and Pietrodn have been working on an extension[2] which does this.
Darklama and I have taken an initial look at things, and Simetrical found one error that has been corrected. Anyone who is technically-minded is invited to review & test the code. Everyone else is invited to discuss improvements.
The extension is documented at mediawiki.org, but here's a quick overview for the sake of convenience:
*Adds the {{NUMBEROFBOOKS}} variable
*Adds [[Special:AllBooks]], which lists all books in specified namespaces
*List books beginning at some prefix using ?offset=whatever
*The regex used to determine what is a book and what isn't is a system message, so administrators may edit it, though the default should work for most (all?) Wikibooks languages
After this much time with apparently no interest in developing anything for Wikibooks it's exciting to finally see such active development on something needed specifically for this project. Many thanks to Ramac and Pietrodn for their work, and also those who have helped them along the way. Hopefully this is the beginning of a trend, and more tools will be developed to fulfill our wishlist!
-Mike.lifeguard
[1] http://lists.wikimedia.org/pipermail/textbook-l/2007-November/001164.html
[2] http://www.mediawiki.org/wiki/Extension:AllBooks
Textbook-l mailing list Textbook-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/textbook-l
First of all, thanks for your reply and your suggestions.
CC'ing to Wikitech-l. Some more review:
- The way NUMBEROFBOOKS is calculated (scanning the page namespace) is not acceptable, performance-wise. It needs to work like NUMBEROFPAGES, etc., with a site_stats row or similar.
OK, I'll learn how to use caching :-D
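As a rough picture of what "work like NUMBEROFPAGES" could mean, the sketch below keeps a stored count and adjusts it when a page is created or deleted, so that reading the number never requires a scan. It is written in Python purely for illustration; the `BookCounter` class, its method names, and the default pattern are all hypothetical and are not MediaWiki's site_stats implementation.

```python
import re


class BookCounter:
    """Keeps a running count of book base pages instead of rescanning every title."""

    def __init__(self, titles, book_pattern=r"^[^/]+$"):
        # book_pattern is an assumed default: any title without a slash is a book.
        self._book_re = re.compile(book_pattern)
        # One full scan up front, comparable to running a maintenance script once.
        self.count = sum(1 for title in titles if self.is_book(title))

    def is_book(self, title):
        return self._book_re.search(title) is not None

    def page_created(self, title):
        # Only base pages change the stored count; subpages are ignored.
        if self.is_book(title):
            self.count += 1

    def page_deleted(self, title):
        if self.is_book(title):
            self.count -= 1


counter = BookCounter(["Cookbook", "Cookbook/Bread"])
counter.page_created("Oracle")
counter.page_created("Oracle/Contents")
print(counter.count)
```

The sketch also shows why the review calls changing the regex costly: the stored count is only valid for the pattern it was built with, so a new pattern means a fresh full scan.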
- The allbooks-regex message might not be a great way of doing things:
** Since results will have to be stored in various ways, like the count being stored in site_stats, it's probably not feasible to change the regex used without running a maintenance script. A config option might therefore be better.
** Directly inputting the regex into a query is VERY VERY BAD! It's an immediate SQL injection attack. Also, it will cause the query to break in the presence of things like single quotes. Use Database::addQuotes() here.
OK, I'm going to correct it.
- Generating AllBooks using a REGEXP query is . . . maybe not ideal, altogether. Usually we have a flag in the table, for instance, page_is_redirect. It might be okay given that you'd "just" be scanning a few times as many rows, in the average case, and only for views of a certain special page. But a flag would be nice.
What do you mean? An "is_book" flag? I saw that a kind of "is subpage" flag exists, but unfortunately not every Wikibooks uses the subpage convention :(
- If you're using the Xml functions, you don't have to use htmlspecialchars explicitly. It will double-escape the variable.
- Variable names that are in Italian are kind of funny, but not really in accordance with our coding standards. :) $conta -> $count, $numero -> $number
Yes, it was a beta, I'll fix them (Italian was for developing :D )
This is only a quick glance, mind you. I haven't actually tested it and probably won't find the time to do so.
Overall, I'm not sure this is the best way to tackle the problem. Has Wikibooks considered, for instance, having the books' "main pages" be in the main namespace, and the various pages be in a namespace like "Page"? Then the problem is partly solved right away: you can use Special:AllPages set to the main namespace to get a list of books. For NUMBEROFBOOKS, you just need PAGESINNAMESPACE to be enabled, which is fairly feasible at some point if someone's willing to do a little optimization work, since it's a generally-requested feature.
I think it is not very intuitive -- the real content should be in the main namespace. The best thing would be for each Wikibooks to follow the subpage naming convention (book title/chapter). It may be a multilingual policy or something like that.
By the way, is there any developer who can help me? I have knowledge of PHP, but I'm not a very experienced MediaWiki developer. It would be better to have some help from an expert developer, mostly for caching and other advanced stuff :-).
--Ramac
On Fri, Apr 18, 2008 at 11:54 AM, Raffaele raffaelemac@tiscali.it wrote:
I think it is not very intuitive -- the real content should be in the main namespace.
Not necessarily. Content that is logically different and that you want to be distinguished technically should go in separate namespaces. For instance, mediawiki.org has most of its content in non-main namespaces, such as Manual:, Extension:, and Help:. In fact, enwikibooks already has a Cookbook namespace where a substantial amount of content resides. There's a concept of a "content namespace", so the pages could be made to behave much like the main namespace: counted as articles on Special:Statistics, searched by default, etc. If you preferred, you could have the books in a Book: namespace and the pages in the main namespace.
The point is, don't try writing an extension when reorganization would do the same thing to better effect, using already-existent core software features. I'm not a shell user, but if I were, I would reject any extension to this effect unless it were shown that suitable reorganization couldn't use built-in namespace features instead. The layout of the site should be determined by technical requirements where those have an effect on things. Deciding on how to organize the site without any consideration for how well that works with the software you're using, then trying to patch the software to match your preordained specifications, is not the sensible way to do things.
By the way, is there any developer who can help me? I have knowledge of PHP, but I'm not a very experienced MediaWiki developer. It would be better to have some help from an expert developer, mostly for caching and other advanced stuff :-).
I'm a MediaWiki developer. I'm not a shell user and can't enable new extensions, but I can provide review and add things to Subversion. In this case I've given my review: I don't think the extension's concept is a good idea and I think it should be scrapped in favor of reorganization, and work on pagesinnamespace, if these features are desired.
The ones who could enable your extension are Brion Vibber or Tim Starling, either of whom may disagree with my opinion on this idea.
On Fri, Apr 18, 2008 at 12:07 PM, Simetrical Simetrical+wikilist@gmail.com wrote:
On Fri, Apr 18, 2008 at 11:54 AM, Raffaele raffaelemac@tiscali.it wrote:
I think it is not very intuitive -- the real content should be in the main namespace.
Not necessarily. Content that is logically different and that you want to be distinguished technically should go in separate namespaces.
Content from a sub-page is not logically different from the content of the book's base page. "Books" are more of a concept than an actual entity; they are a collection of pages. It is up to the author to determine how those pages are organized, and what the logical relationship between them is. Sometimes the base page is a title page, sometimes it is a table of contents, sometimes it is merely a redirect to one of these. Sometimes we have "Book/Chapter/Page", sometimes we have "Book/Page", sometimes we just have a one-page "Book". The base page has no special significance besides being the most common landing point for new readers, and a prefix to differentiate the pages of one book from all the others.
Having subpages in a different namespace from the book's main page is completely illogical and, from my standpoint, unacceptable. The vast majority of all content pages on our site would need to be moved to a new namespace. Links would need to be updated, including links which are dynamically generated for in-book navigation. Consider the common snippet from a navigation template:
{{#if:{{{next|}}}|[[Bookname/{{{next|}}}|Next Page]]|}}
A bot likely can't edit all these links automatically, so it is going to involve a huge waste of manpower to perform a task that could easily be handled by a single programmer writing a small extension instead.
Moving subpages to a new namespace would cause us to lose the automatically-generated back-links at the top of the page, which many books still rely on for navigation. Writing an extension or a JavaScript gadget to replace these backlinks for pages which are written cross-namespace would be no easier than writing this AllBooks extension.
Books are already separated from their subpages technically, using slash delimiters. Different books have different prefixes.
Books are single self-contained units, not to be constructed from bits and pieces strewn across multiple namespaces.
For instance, mediawiki.org has most of its content in non-main namespaces, such as Manual:, Extension:, and Help:. In fact, enwikibooks already has a Cookbook namespace where a substantial amount of content resides.
The cookbook is a single book. Prior to creating the namespace, most of the pages in the cookbook were already titled using the pseudo-namespace "Cookbook:Subpage", which was deemed problematic. We created the new namespace for it in order to leverage searching and listing tools. If you think that this is the way to go, then every single book on Wikibooks should get its own namespace. We would also need a form for easily creating new namespaces, since we don't want to have to wait on developers every time we want to start a new book. We would also need a {{NUMBEROFNAMESPACES}} variable to count the number of books.
What works for one project, like Wikipedia or mediawiki.org, is not going to work for all projects. We want to do this the way that is right for us. If we wanted to do it the wrong way, we would just stay as we are now.
If you preferred, you could have the books in a Book: namespace and the pages in the main namespace.
This doesn't work either, for the same reasons. All content on Wikibooks is a "book". We shouldn't have to write [[Book:Bookname]] because all top-level pages are already books. You're trying to draw an unnatural distinction between a "book" and a "page" which is just not supported in the way we work. The pages in a book belong together.
The point is, don't try writing an extension when reorganization would do the same thing to better effect, using already-existent core software features.
The effect would be much worse, and I would consider either situation you suggested to be intolerable. All Wikibookians know that Wikibooks is being written on the "software that runs Wikipedia"; we've been a round peg in a square hole since our project was created. We've had to come up with lots of hack-job solutions because what is best for our project is rarely well-supported in the software. We were asked for a wishlist of things that we would like as a project, not a list of more hack-jobs that we could throw together ourselves at the expense of doing things the right way.
Deciding on how to organize the site without any consideration for how well that works with the software you're using, then trying to patch the software to match your preordained specifications, is not the sensible way to do things.
The way we are organized is logical, and well-supported by the software. Using forward slashes to separate subpages is common on most Wikimedia projects, including occasional usage on Wikipedia. The recently added {{#titleparts:}} parser function supports this convention. The existence of backlinks at the top of the page supports this convention. The only thing we don't have is an automatically-generated count and list of base pages. If the answer is, as it usually turns out to be, "too bad, nobody cares about Wikibooks", then we'll just do things our own way.
--Andrew Whitworth
On Fri, Apr 18, 2008 at 1:02 PM, Andrew Whitworth wknight8111@gmail.com wrote:
Content from a sub-page is not logically different from the content of the book's base page.
Of course they are. Otherwise you wouldn't want to list and count it separately. If you prefer, the book *page* is logically different from the book's *subpages*, even if their respective contents may not be logically much different.
Having subpages in a different namespace from the book's main page is completely illogical and, from my standpoint, unacceptable. The vast majority of all content pages on our site would need to be moved to a new namespace. Links would need to be updated, including links which are dynamically generated for in-book navigation. Consider the common snippet from a navigation template:
{{#if:{{{next|}}}|[[Bookname/{{{next|}}}|Next Page]]|}}
A bot likely can't edit all these links automatically, so it is going to involve a huge waste of manpower to perform a task that could easily be handled by a single programmer writing a small extension instead.
Moving subpages to a new namespace would cause us to lose the automatically-generated back-links at the top of the page, which many books still rely on for navigation.
All of this is probably solvable using redirects, except for conducting the moves themselves. If, as the author of this extension thinks, a regular expression is reliable enough to distinguish books from their subpages, a bot could be used to do all the moves and fix any resulting double redirects.
Books are already separated from their subpages technically, using slash delimiters. Different books have different prefixes.
An alternative approach to fix the problem would be to improve support for subpages. This is, from a software point of view, a path of greater resistance, since namespaces are already well-distinguished and subpages are not. It would have to be thought out very carefully. One thing that occurs to me is that theoretically, it doesn't have to be set in stone that different namespaces correspond to different colon-delimited prefixes. Hypothetically one could have "Book" in the main namespace and "Book/subpage" in another namespace, even though they have the same prefix. But this would require a considerable amount of effort to implement.
This doesn't work either, for the same reasons. All content on Wikibooks is a "book". We shouldn't have to write [[Book:Bookname]] because all top-level pages are already books. You're trying to draw an unnatural distinction between a "book" and a "page" which is just not supported in the way we work. The pages in a book belong together.
They would still be kept together, by a naming convention and by templates. I think part of your objection stems from the fact that you're used to a particular way of doing things, while I don't use Wikibooks to any great extent and am not committed to any particular way of doing things. I submit that the way I suggest is not so much worse than the current way, in terms of aesthetics or utility. It's just a convention, that people would get used to if it were adopted.
It's not great: I agree that adding "Book:" or "Page:" is logically redundant and should not really be necessary. But to call the difference between the concept of a book and the concept of its pages "an unnatural distinction", to use arguments like "the pages in a book belong together" (as if a few letters in a name makes them more or less together), is a little much. I was just making a suggestion.
All Wikibookians know that Wikibooks is being written on the "software that runs Wikipedia"; we've been a round peg in a square hole since our project was created. We've had to come up with lots of hack-job solutions because what is best for our project is rarely well-supported in the software. We were asked for a wishlist of things that we would like as a project, not a list of more hack-jobs that we could throw together ourselves at the expense of doing things the right way.
Well, look at it this way. The overwhelming majority of websites are using software not designed specifically for them, and have to put up with all sorts of annoying assumptions that don't fit their usage patterns. Only the very largest websites tend to have decent software designed in-house. Wikibooks is, comparatively, in the excellent position that if anyone wanted to become a developer, they could fairly easily become one, code up any features that they wanted, and have them incorporated into the main product without having to maintain hacks.
That most MediaWiki developers (including me) come from Wikipedia probably speaks to the fact that Wikipedia is one of the largest websites in the world, while Wikibooks, Wiktionary, and so on are not. Not as many people seem to be able (or willing?) to step forward from the smaller projects, so they get much less done for them.
If anything, I would encourage anyone who wants to improve MediaWiki's utility for Wikibooks to work on improving the core software, not creating what are (as you put it) hack-job extensions. Features of more general interest and usefulness are more likely to be accepted and maintained by others, even if they are of particular interest to Wikibooks, plus you help out more sites than just Wikibooks. Only if a feature is intrinsically of very narrow interest should it be put in an extension.
I may be primarily interested in the English Wikipedia, by the way, but I've more than once opposed the implementation of feature requests from there based on their narrowness and the existence of better and more general solutions. And on the other hand, almost none of the changes I've made are of use only to Wikipedia, as this proposal would be of use only to Wikibooks. This is not a question of Wikibooks vs. Wikipedia; it's a question of coding practice in general.
The best question to ask from a development standpoint is not "How can I get this narrow feature implemented for Wikibooks?", but rather "What flaws in the software does this problem we're having exhibit, and how can I fix it in the simplest and most generally useful way possible?" When I asked the latter question, the answer I saw is this: this issue seems to indicate that the namespace system is either not being used properly, or is inadequate for the purposes at question. I suggested it be used to better effect. Failing that, if the presence of a few letters before the page name is really a killer issue, the clean and correct (if difficult) course is to improve the namespace system, since it's a feature that's already meant to serve the basic purpose desired here. Creating a new system orthogonal to very similar existing functionality is not the right way to go from the perspective of maintaining generally-useful software.
But again, this is just my opinion. I didn't mean to cause a fuss. I was asked to review the extension and I gave my honest opinion on it.
On Fri, Apr 18, 2008 at 1:43 PM, Simetrical Simetrical+wikilist@gmail.com wrote:
On Fri, Apr 18, 2008 at 1:02 PM, Andrew Whitworth wknight8111@gmail.com wrote:
Content from a sub-page is not logically different from the content of the book's base page.
Of course they are. Otherwise you wouldn't want to list and count it separately. If you prefer, the book *page* is logically different from the book's *subpages*, even if their respective contents may not be logically much different.
We're not interested in counting the individual pages (although I suspect that's one of the easiest ways to go about this); we're interested in counting groups of pages with the same prefix name. Logically, it doesn't matter whether a particular page is named "Bookname" or "Bookname/Subpagename". Consider a book with slashes in the title, like "P/L programming". We wouldn't have a page named "P" at all to be counted, but "P/" still serves as a unique prefix for all pages in that book. Or consider the book we had about Oracle, which until just recently didn't have a base page at all: there were [[Oracle/Contents]] and [[Oracle/Cheatsheet]], but no [[Oracle]].
What we are after here is a simple count, a single integer value that is probably in the neighborhood of 3000-4000 for en.wikibooks, and is significantly lower for most other wikibooks or wikiversity projects (since wikiversity is likely to want a feature like this too). Massive across-the-board reorganization of the entire project is simply not an acceptable solution for this problem.
We would also like a listing of books, although most of our books are already well-categorized, and we can obtain mostly-accurate lists from these categories (although it would be nice to know which "books" are not properly categorized, that's a small issue).
Consider that a solution from our point of view is to create two bots:
1) Bot 1 makes repeated calls to api.php?action=query&list=allpages to download the complete Wikibooks page list.
2) Bot 2 applies a simple regex to that page list and counts the number of unique prefixes. It posts an edit to the page [[Template:NUMBEROFBOOKS]] with that number, and posts an edit to [[Wikibooks:List of all books]] with the complete list.
I doubt this would be an ideal solution, but it would give us exactly what we want.
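The second bot's counting step can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the `PREFIX_RE` pattern, the helper names, and the sample titles are made up for the example, and downloading the page list and posting the edits are left out.

```python
import re

# Assumed pattern: everything before the first slash is the book prefix.
PREFIX_RE = re.compile(r"^[^/]+")


def book_prefix(title):
    # Fall back to the whole title if it starts with a slash.
    match = PREFIX_RE.match(title)
    return match.group(0) if match else title


def list_books(titles):
    # A set collapses all pages that share a prefix into one book.
    return sorted({book_prefix(title) for title in titles})


titles = ["Oracle/Contents", "Oracle/Cheatsheet", "Cookbook", "Cookbook/Bread"]
books = list_books(titles)
print(len(books), books)
```

Note that a book with no base page, such as the Oracle example, is still counted, while a title like "P/L programming" would be reduced to "P" by this simple pattern and would need a special case.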
An alternative approach to fix the problem would be to improve support for subpages. This is, from a software point of view, a path of greater resistance, since namespaces are already well-distinguished and subpages are not. It would have to be thought out very carefully. One thing that occurs to me is that theoretically, it doesn't have to be set in stone that different namespaces correspond to different colon-delimited prefixes. Hypothetically one could have "Book" in the main namespace and "Book/subpage" in another namespace, even though they have the same prefix. But this would require a considerable amount of effort to implement.
Again, measure the amount of implementation effort against the desired goal. All we want is a single integer value. A massive programming effort is no more desired on our end than a massive reorganization of our books.
Wikibooks is, comparatively, in the excellent position that if anyone wanted to become a developer, they could fairly easily become one, code up any features that they wanted, and have them incorporated into the main product without having to maintain hacks.
This is exactly what we are trying to do. Ramac is trying to code this feature up as an extension and have it added for Wikibooks.
Only if a feature is intrinsically of very narrow interest should it be put in an extension.
I would venture to guess that this feature is of particularly narrow interest. I can only imagine that, based on the way I know other projects to handle naming, only Wikibooks and possibly Wikiversity would use it. Hence, the extension.
--Andrew Whitworth
On Fri, Apr 18, 2008 at 2:34 PM, Andrew Whitworth wknight8111@gmail.com wrote:
Consider that a solution from our point of view is to create two bots:
1) Bot 1 makes repeated calls to api.php?action=query&list=allpages to download the complete Wikibooks page list.
2) Bot 2 applies a simple regex to that page list and counts the number of unique prefixes. It posts an edit to the page [[Template:NUMBEROFBOOKS]] with that number, and posts an edit to [[Wikibooks:List of all books]] with the complete list.
I doubt this would be an ideal solution, but it would give us exactly what we want.
In all honesty, there *is* a place for bots on wikis, and features that aren't so easy to implement cleanly in software are a good place for them. In many ways bots are more flexible and easier to handle. In this case, I suggest getting someone with a toolserver account to run the regex on the page table instead of using the API to retrieve the list in bits and pieces. (The current number not matching Ramac's regex is 6,085.)
I would venture to guess that this feature is of particularly narrow interest. I can only imagine that, based on the way I know other projects to handle naming, only Wikibooks and possibly Wikiversity would use it. Hence, the extension.
I had a discussion with some people (Mike, wknight8111, darkcode, Ramac) about this on #wikibooks. I still think these couple of features, and a few others, fit most naturally in with the concept of namespaces, and that it would make the most sense to reorganize the site (but I doubt that will happen, due to inertia). Some other features would be better suited to implementation as better subpage handling. For instance, the ability to move all subpages together with the base page is a perfectly logical thing to have in the core software, and in fact I just added it in r33565.
I think few of the features Wikibooks wants are really narrow enough, intrinsically, to belong in extensions. Allowing automatic TOCs, navigation, etc. for particular namespaces was the only one I saw that really doesn't fit with any more generic feature that is or should be present, and that would make a good extension.
At any rate, the decision as to whether these features would make a good extension (and if so, what about it needs to be improved) is up to Brion, who I've pointed to this discussion.
textbook-l@lists.wikimedia.org