Hi all,
I'm new to this list, but I have read a lot of the material on the goals and methods of Wikibooks, Wikiversity, and Wikipedia, as well as the last few months of this group's archives. I was not able to find my concerns addressed in that material, so I'm trying this mailing list. I hope my comments and questions do not cover ground that has long since been addressed.
It seems that the major problem with Wikibooks, and the cause of its very slow development relative to Wikipedia, is that the goal of a book, or even a textbook, is too vague and means many things to many people. An encyclopedia, on the other hand, can be defined fairly narrowly, both in its goals and in its "methods," which include the selection of content and the style of presentation. For any collaborative project, if the goal can be agreed on, the "methods" can generally be discussed and agreed on by the participants. Developing a set of goals is always the first step in the "acquisition" of any product of creative endeavor, including book publishing and software development, and this is also true for collaborative open-source projects.
Finding a common goal (or a set of goals to be met by the proposed project) is essential BEFORE any project work can proceed effectively. This is a MAJOR problem that must be dealt with before collaborative, open source books can thrive like Wikipedia. I have some ideas for solving it, but first, some more analysis of the problem.
Let me start by illustrating the problem with the case of open source development of a college-level textbook. Of course, this is quite distinct from K-12 textbooks and from nonfiction books that are not textbooks, but it simply illustrates the difficulty of reaching a common goal for any book.
As a university professor, I have taught Biochemistry for 22 years, using textbooks from several publishers and authors. These are thick, expensive ($120) textbooks, and about 12 of them are currently in print. All of them have problems with accuracy, clarity, topic order, timeliness, and depth of coverage. These problems make choosing one of them a nightmare for me, and probably for every instructor of Biochemistry in the country, if not the world. No instructor enjoys changing texts, because this means changing at least the syllabus, the topic order, and the lecture content (new figures, slides, terms, etc.). In the first year a new text is used, this can double or triple the preparation time compared with years in which the previous text is kept. But changing texts is essential, because they go out of date as science progresses with new facts and new paradigms, and because publishers change topic coverage or discontinue titles for various reasons. Also, textbooks tend to drive the selection of topics taught, since teaching without a text means more work for both the instructor and the students. As a result, courses are designed around the texts rather than the other way around (which is how it should be), and the slow turnover of textbooks tends to keep the teaching of a subject behind the current understanding of it. An open source textbook (eText or print format, it doesn't matter) would be the ideal solution, as long as each instructor could tailor the content to the desired selection, order, and depth of coverage of each topic in the field.
The purposes of, and markets for, biochemistry textbooks vary quite a bit. Sometimes the material is presented in abbreviated form and at shallow depth in a single semester or quarter (30 to 45 hours of lecture), while other courses take two quarters or semesters (60 to 90 hours of lecture). Medical, pharmacy, and nursing schools need a medically oriented human biochemistry course, while cell biologists, microbiologists, chemists, and food and agricultural science students need a more broadly oriented course that covers microbes and plants and puts less focus on clinical aspects such as diseases and drugs. Sometimes the subject is taught to high school seniors; most students take it as college seniors, and still others take it as graduate or professional students. But there are not enough students in each of these market subcategories to justify creating an individual text for each. Thus, the available texts try to be everything for everyone, and as a result they are universally poor for everyone.
I see this motivation, having the ideal, custom-tailored textbook that can be revised every year, as quite distinct from the motivation of states and schools to save money by preparing their own textbooks. The latter appears to be the major motivation for the K-12 crowd. However, the same problem hits the K-12 groups: there is no single set of standards or goals for all the markets and purposes of a potential book, and people will never, ever agree on a common purpose or set of goals. This should be obvious: an introductory book on the Spanish language (or any other foreign language) for English-speaking students could be written for grades 4 to 8, for grades 9 to 12, or for college students. Each audience would require a book that varies dramatically in depth of coverage and style of presentation, and probably even in the order of presentation. Yet all of these versions would need to contain the same facts and much of the same instructional material, including reading and translation passages, exercises, vocabulary, verb and noun ending tables, irregular word tables, and some cultural and geographic information. This shared material makes up more than 50% of such a book (usually closer to 90%), and it is simply a waste of time to recreate it for each of the three versions.
I have a lot of experience in software development, and the holy grail in that field is reusable software modules. This has never been achieved at a fine-grained level, but it has certainly succeeded for any module that can be built to serve a generic function and purpose. This is especially true for modules incorporated into a programming language's library, or better yet, into the operating system or a stand-alone but generic application. An outstanding example is the relational database. Virtually no modern application includes its own data storage and retrieval code for anything beyond sequentially written log files or sequentially read configuration files. Most programs today use database engines for data access, because the engines do it far faster and with much less application code than if one "rolls one's own."
How is this relevant here? Well, as I see it, the answer for open source textbooks is not to attempt the impossible by seeking consensus on the goals of each book project, but to avoid reinventing the wheel (creating the facts, paradigms, explanations, figures, etc.) with every book. We need to separate the reusable information from the non-reusable. We should put the reusable material in one or two places, perhaps the facts in Wikipedia and the learning modules in another. Then, for each book, one instructor (or a small group of instructors who share a common need) can glue these reusable pieces together in a custom order, with a small amount of content created specifically for that book.
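To make the separation between reusable and book-specific material concrete, here is a minimal sketch. It assumes a hypothetical shared repository keyed by resource ID; the names `SHARED`, `Ref`, `Glue`, and `render_book` are illustrative only and do not refer to any existing Wikimedia tool. A book is nothing more than an ordered list of references to shared pieces plus small amounts of instructor-written glue text.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical shared repository of reusable material, keyed by ID.
# In the proposal this role would be played by Wikipedia and a module store.
SHARED = {
    "glycolysis-overview": "Glycolysis converts one glucose molecule into two pyruvate molecules.",
    "enzyme-kinetics-basics": "Michaelis-Menten kinetics relates initial reaction rate to substrate concentration.",
}

@dataclass(frozen=True)
class Ref:
    """A pointer to a reusable piece; the content itself lives only in the repository."""
    resource_id: str

@dataclass(frozen=True)
class Glue:
    """Book-specific text written by the instructor to connect reusable pieces."""
    text: str

Part = Union[Ref, Glue]

def render_book(parts: list[Part], repo: dict[str, str]) -> str:
    """Resolve references against the repository and join everything in the instructor's order."""
    out = []
    for part in parts:
        if isinstance(part, Ref):
            out.append(repo[part.resource_id])  # raises KeyError for a dangling reference
        else:
            out.append(part.text)
    return "\n".join(out)

# Two different books reuse the same shared pieces in different orders with different glue.
medical_book = [
    Glue("Why clinicians care about enzymes:"),
    Ref("enzyme-kinetics-basics"),
    Ref("glycolysis-overview"),
]
general_book = [
    Ref("glycolysis-overview"),
    Glue("Next, how the enzymes involved are characterized:"),
    Ref("enzyme-kinetics-basics"),
]
```

The design choice here is that books never copy shared text; they only point at it, so a correction in the repository reaches every book that references the piece.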
My vision for how this might work is as follows:
1. Facts are added via Wikipedia articles and other resource repositories. The latter might include dictionaries; various types of network diagrams (spatial networks, i.e. maps; concept networks; metabolic pathway networks; biomolecular regulatory networks; phylogenetic trees; etc.); and possibly separate databases of images, figures, and other data that might require special viewers (though these are best kept inside Wikipedia and/or the dictionaries, using shims to let their data be included).
2. People interested in creating teaching/learning modules from the resources listed in #1 would have a collaborative tool, like Wikipedia's editing function, for creating such modules with only the most generic aspects a module requires: its title, scope, connections to other modules (including recommended prerequisite modules), Wikipedia readings/links, dictionary terms/vocabulary, a network subsection, and possibly exercises. The idea would be to cover a topic at all relevant levels of treatment. In my Biochemistry example, that would mean medical vs. general courses; secondary, college, and graduate students; and two-semester vs. one-semester courses.
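As a rough illustration of how small such a generic module record could be, here is a sketch of one possible data structure. The class name `LearningModule`, its field names, and the helper `missing_prerequisites` are hypothetical choices that simply mirror the list of generic aspects above; they are not part of any existing tool.

```python
from dataclasses import dataclass, field

@dataclass
class LearningModule:
    """Hypothetical record holding only the generic parts of a learning module."""
    title: str
    scope: str
    prerequisites: list[str] = field(default_factory=list)       # recommended prerequisite modules
    related: list[str] = field(default_factory=list)             # other, non-prerequisite connections
    readings: list[str] = field(default_factory=list)            # Wikipedia article titles or links
    terms: list[str] = field(default_factory=list)               # dictionary/vocabulary entries
    network_subsection: list[str] = field(default_factory=list)  # node IDs in a shared network diagram
    exercises: list[str] = field(default_factory=list)

    def missing_prerequisites(self, available: set[str]) -> list[str]:
        """Return the prerequisites not found among the modules already available."""
        return [p for p in self.prerequisites if p not in available]

# Example module; the biochemistry content is illustrative only.
glycolysis = LearningModule(
    title="Glycolysis",
    scope="Conversion of glucose to pyruvate and its regulation",
    prerequisites=["Enzyme kinetics", "Carbohydrate structure"],
    readings=["Glycolysis"],  # e.g. the Wikipedia article of that title
    terms=["hexokinase", "phosphofructokinase"],
    network_subsection=["glucose", "glucose-6-phosphate", "pyruvate"],
)
```

Note that the module stores only references (article titles, term names, node IDs), not copies of the facts themselves, which stay in the shared repositories from #1.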
How to deal with the multiple levels of coverage remains to be worked out. My idea is to require that each prerequisite, reading/term, and exercise in a module be categorized as appropriate for one of three levels: beginner, intermediate, or advanced. These levels could refer both to the learning style of the intended student and to the student's prior experience with the topic. Alternatively, separate specifiers could be developed for each, with beginner, intermediate, and advanced referring to the student's learning style and, separately, to the student's familiarity with the prerequisite material.
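Both tagging schemes can be represented with one small structure, as in the sketch below. It assumes an ordered three-value `Level` type and a two-field `Tag` (learning style and familiarity); the single-level scheme is then just the special case where both fields are equal. All names here (`Level`, `Tag`, `ModuleElement`, `single_level`) are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    """Ordered levels, so comparisons such as 'lower than' are meaningful."""
    BEGINNER = 1
    INTERMEDIATE = 2
    ADVANCED = 3

@dataclass(frozen=True)
class Tag:
    """Two-specifier scheme: learning style and prior familiarity tagged separately."""
    style: Level
    familiarity: Level

@dataclass(frozen=True)
class ModuleElement:
    """One prerequisite, reading/term, or exercise inside a module, with its level tag."""
    kind: str  # "prerequisite", "reading", "term", or "exercise"
    content: str
    tag: Tag

def single_level(level: Level) -> Tag:
    """The single-level scheme expressed as a Tag with both specifiers equal."""
    return Tag(style=level, familiarity=level)

# Illustrative elements for one module.
elements = [
    ModuleElement("reading", "Glycolysis (overview)", single_level(Level.BEGINNER)),
    ModuleElement("exercise", "Compute the net ATP yield per glucose", Tag(Level.INTERMEDIATE, Level.BEGINNER)),
    ModuleElement("reading", "Allosteric regulation of phosphofructokinase", single_level(Level.ADVANCED)),
]
```

Using an ordered enum rather than free-text labels is what makes "this level or lower" selections straightforward later on.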
Given such a specification of the learning module components, when a module is selected for inclusion in a book (or possibly a chapter; see below), it inherits the level(s) chosen for the book, and only the module elements appropriate for that level are incorporated into the book. Optionally, one or more lower levels of coverage depth (at the same learning-style level) could also be incorporated. Alternatively, modules on the same topic but at different coverage depths could be incorporated into the text (or chapter) serially and progressively, immediately after each other, or perhaps interspersed with other topics at other depth levels. Many biochemistry texts currently have introductory chapters on more general topics (proteins, nucleic acids, cells, metabolism) not only before the more detailed chapters but as the first 3 or 4 chapters of the text, so that each textbook contains an introductory mini-course. Of course, it would make the most sense to specify that all the modules selected for a book be at the same level of student learning "style."
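The inheritance rule above can be sketched as a simple filter. This assumes each module element carries a learning-style level and a depth level (the two-specifier variant), and that a book fixes one style and one depth; `Level`, `Element`, and `select_for_book` are hypothetical names. The optional flag implements "also include lower depths at the same style," ordered from shallow to deep so coverage is progressive.

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    BEGINNER = 1
    INTERMEDIATE = 2
    ADVANCED = 3

@dataclass(frozen=True)
class Element:
    content: str
    style: Level  # learning style of the intended student
    depth: Level  # depth of coverage / prior familiarity

def select_for_book(elements: list[Element], book_style: Level, book_depth: Level,
                    include_lower_depths: bool = False) -> list[Element]:
    """Return the module elements a book inherits at its chosen level.

    Elements must match the book's learning style. By default only the book's
    depth is kept; optionally, shallower depths at the same style are kept too.
    """
    if include_lower_depths:
        chosen = [e for e in elements if e.style == book_style and e.depth <= book_depth]
    else:
        chosen = [e for e in elements if e.style == book_style and e.depth == book_depth]
    # sorted() is stable, so authoring order is preserved within each depth.
    return sorted(chosen, key=lambda e: e.depth)

# Illustrative module content.
module = [
    Element("Allosteric regulation of PFK", Level.BEGINNER, Level.ADVANCED),
    Element("What glycolysis does", Level.BEGINNER, Level.BEGINNER),
    Element("Net ATP yield exercise", Level.INTERMEDIATE, Level.BEGINNER),
    Element("The ten steps of glycolysis", Level.BEGINNER, Level.INTERMEDIATE),
]
```

For example, a book set to beginner style and advanced depth with `include_lower_depths=True` would receive the three beginner-style elements in the order beginner, intermediate, advanced.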
3. A second development tool would be used to create the textbook by selecting the modules, their order, and their level of treatment of the topics.
There will probably need to be an intermediate level of organization between a book and its modules, something like a chapter, but not numbered serially. For any subject, most chapters fall naturally into a rational order for teaching, so identifying the prerequisites should not be a problem. Issues of order and prerequisites could be solved by allowing more basic treatments of a subject to precede the more advanced treatments in later chapters. The development tools needed to create chapters would be similar or identical to those needed to create texts: both arrange lower-level components (chapters or modules) in a customizable order with a selectable level of treatment, and both provide a way to add custom text and/or figures to glue the sub-components together. This way, not only could different texts be made from the same modules, but even different chapters could be made from the same modules, depending on the purpose, the emphasis on each topic, the topic order, and the learning level/style of the students.
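Because chapters and books are built the same way, one recursive structure can serve both, as in the sketch below. It assumes a hypothetical `Container` type whose children are either modules or other containers, and a hypothetical `order_problems` check that walks the book in teaching order and reports any prerequisite that has not been covered yet. This is one possible design, not a worked-out tool.

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Module:
    name: str
    prerequisites: list[str] = field(default_factory=list)

@dataclass
class Container:
    """A chapter or a whole book: the same structure at both levels."""
    title: str
    children: list[Union["Container", Module]] = field(default_factory=list)
    glue: str = ""  # custom connecting text written for this container

def flatten(node: Union[Container, Module]) -> list[Module]:
    """List the modules in teaching order (depth-first, left to right)."""
    if isinstance(node, Module):
        return [node]
    result: list[Module] = []
    for child in node.children:
        result.extend(flatten(child))
    return result

def order_problems(book: Container) -> list[str]:
    """Report prerequisites that are missing or only appear after the module needing them."""
    seen: set[str] = set()
    problems = []
    for module in flatten(book):
        for prereq in module.prerequisites:
            if prereq not in seen:
                problems.append(f"{module.name} needs {prereq} earlier")
        seen.add(module.name)
    return problems

# An introductory chapter of basic treatments precedes the more advanced chapters.
intro = Container("Introduction", [Module("Proteins (intro)"), Module("Metabolism (intro)")])
enzymes = Container("Enzymes", [Module("Enzyme kinetics", ["Proteins (intro)"])])
glycolysis = Container("Glycolysis", [Module("Glycolysis", ["Enzyme kinetics", "Metabolism (intro)"])])
book = Container("Biochemistry for nursing", [intro, enzymes, glycolysis])
```

Here `order_problems(book)` returns an empty list, while a book that places the glycolysis chapter first would be flagged for both of its missing prerequisites.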
4. This vision will need not only these two new development tools (one for modules and one for chapters and texts), but also a means to create a "freeze" of the reusable parts used in the text (chapters, modules, Wikipedia articles, dictionary entries, etc.) at a user-selectable time. This is essential for any instructor, because it would be impossible to teach a course in which the "facts," their presentation, or their arrangement were changed mid-course by some well-meaning contributor. The instructor will also need some way to selectively "refresh" an article or other resource during the course if it is found to be defective in the freeze.
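One way to implement a freeze without copying any content is to pin each resource to the revision that was current at the chosen time, since wikis already keep a full revision history. The sketch below assumes a hypothetical in-memory `VersionedStore` standing in for such a history, with revisions appended in chronological order; `freeze` and `read_frozen` are likewise illustrative names.

```python
from datetime import datetime

class VersionedStore:
    """Hypothetical stand-in for a wiki: every edit appends a new, timestamped revision.

    Assumes edits are appended in chronological order.
    """
    def __init__(self) -> None:
        self.history: dict[str, list[tuple[datetime, str]]] = {}

    def edit(self, resource_id: str, text: str, when: datetime) -> None:
        self.history.setdefault(resource_id, []).append((when, text))

    def revision_at(self, resource_id: str, when: datetime) -> int:
        """Index of the newest revision made at or before `when`."""
        candidates = [i for i, (t, _) in enumerate(self.history[resource_id]) if t <= when]
        if not candidates:
            raise LookupError(f"{resource_id} did not exist at {when}")
        return candidates[-1]

    def text(self, resource_id: str, revision: int) -> str:
        return self.history[resource_id][revision][1]

def freeze(store: VersionedStore, resource_ids: list[str], when: datetime) -> dict[str, int]:
    """Pin every resource used by a text to the revision current at the chosen time."""
    return {rid: store.revision_at(rid, when) for rid in resource_ids}

def read_frozen(store: VersionedStore, pins: dict[str, int], resource_id: str) -> str:
    """Students always see the pinned revision, however the live article changes later."""
    return store.text(resource_id, pins[resource_id])

store = VersionedStore()
store.edit("Glycolysis", "Revision used all semester", datetime(2024, 1, 10))
pins = freeze(store, ["Glycolysis"], datetime(2024, 8, 20))  # freeze before classes start
store.edit("Glycolysis", "Well-meaning mid-semester rewrite", datetime(2024, 10, 3))
```

After the later edit, `read_frozen(store, pins, "Glycolysis")` still returns the revision that was current when the course was frozen.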
Unfortunately, the need for a freeze creates an infrastructure requirement for deployment at the institution that "adopts"/creates the book. The requirement for a web server with middleware and database engine support to deploy a freeze, along with its modules and chapters, would be a substantial disincentive to this approach. However, the IT support at most colleges and universities, and at a great many K-12 school districts/authorities, is sufficiently advanced to provide this level of infrastructure. Clearly, the pain can be minimized if there is some data encoding method that lets different software engines (and operating systems) support this need. If open source textbooks become popular, there would be a tremendous need for such deployment software, and such need tends to inspire healthy competition among vendors and even among open source developers.
A better approach would probably be to develop a third tool that does both the freezing and the deployment at once. It would create a complete PostScript or PDF document, or a set of HTML files with images forming a web site, based on the book's specification and including a snapshot (freeze) of the chapters, modules, and resources such as Wikipedia articles. Since this could be a rather large document or set of web pages, an alternative would be to create one PDF or PostScript document per chapter, but to generate them all at once so that the reused components are identical in each chapter. It would be easy to write a program that creates these PDF or PostScript documents for distribution as is, and also rolls them up into something that can be distributed as needed, such as a self-installing web site directory hierarchy. Yet all three formats would retain their nonlinear, hyperlinked basis. Such deployment methods can fairly easily handle the copyright notices, terms of use, and authorship attributions that would be needed if the PDF or PostScript file is printed or if the HTML material is posted to a public web site. Other formats, such as MS Word or OpenOffice XML documents, could be supported through a system of output filters like those developed by the OpenOffice project.
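The key property of the combined freeze-and-deploy step, that every chapter is rendered from one snapshot in a single pass, can be sketched for the HTML case. PDF/PostScript generation is out of scope here. The function `deploy`, its parameters, and the sample chapter names are hypothetical; the attribution string is only an example of the kind of notice the output would carry.

```python
import html
import tempfile
from pathlib import Path

def deploy(snapshot: dict[str, str], chapters: dict[str, list[str]],
           attribution: str, out_dir: Path) -> list[Path]:
    """Write one HTML page per chapter from a single frozen snapshot.

    `snapshot` maps resource IDs to frozen text; `chapters` maps a chapter file
    name (assumed to be filesystem-safe) to the resource IDs it uses.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    names = list(chapters)
    written = []
    for name in names:
        # Every chapter reads from the same snapshot, so shared resources match exactly.
        body = "\n".join(
            f"<section><p>{html.escape(snapshot[rid])}</p></section>" for rid in chapters[name]
        )
        # Keep the material hyperlinked: each page links to the other chapters.
        nav = " ".join(
            f'<a href="{other}.html">{html.escape(other)}</a>' for other in names if other != name
        )
        page = (
            "<!DOCTYPE html>\n"
            f'<html><head><meta charset="utf-8"><title>{html.escape(name)}</title></head>\n'
            f"<body><nav>{nav}</nav>\n{body}\n"
            f"<footer>{html.escape(attribution)}</footer></body></html>\n"
        )
        path = out_dir / f"{name}.html"
        path.write_text(page, encoding="utf-8")
        written.append(path)
    return written

# Example run into a throwaway directory; chapter names and texts are made up.
snapshot = {
    "atp": "ATP is the main energy currency of the cell.",
    "enzymes": "Enzymes speed up reactions by lowering their activation energy.",
}
pages = deploy(snapshot, {"ch1-energy": ["atp", "enzymes"], "ch2-metabolism": ["atp"]},
               "Contains text from Wikipedia (CC BY-SA); see the article histories for authors.",
               Path(tempfile.mkdtemp()))
```

The resulting directory of linked pages is the kind of self-contained web site hierarchy described above; the same loop could feed a PDF renderer instead of writing HTML.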
Of course, such a freezing and deployment tool would need a way for an instructor (or a curriculum committee, or a district/state textbook authority) to do a mid-semester "refresh," or update, of a single resource in any module, to fix problems as they are encountered. This is a bit like the CVS code-branch problem, so solutions to this issue already exist.
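A refresh can be treated much like a commit on a branch: the course's freeze is left intact, and a new freeze is derived in which exactly one resource moves to a newer revision, with a log entry recording why. The sketch below assumes a freeze is a map from resource ID to pinned revision number; `Freeze` and `refresh` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Freeze:
    """A map from resource ID to pinned revision number, plus a change log.

    Treated as immutable by convention: refresh() returns a new Freeze.
    """
    pins: dict[str, int]
    log: list[str] = field(default_factory=list)

def refresh(freeze: Freeze, resource_id: str, new_revision: int, reason: str) -> Freeze:
    """Return a new freeze with exactly one resource moved to a newer revision.

    The original freeze is left untouched, much like committing a fix on a branch
    rather than moving the whole course onto the latest content.
    """
    if resource_id not in freeze.pins:
        raise KeyError(f"{resource_id} is not part of this text")
    old = freeze.pins[resource_id]
    if new_revision <= old:
        raise ValueError("a refresh should move to a newer revision")
    pins = dict(freeze.pins)  # copy so the original freeze is not mutated
    pins[resource_id] = new_revision
    entry = f"{resource_id}: r{old} -> r{new_revision} ({reason})"
    return Freeze(pins=pins, log=freeze.log + [entry])

fall = Freeze({"Glycolysis": 12, "Citric acid cycle": 40})
fixed = refresh(fall, "Glycolysis", 13, "corrected ATP count")
```

Keeping the log alongside the pins gives instructors and committees a record of every mid-semester change, which helps when the same text is reused the following year.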
Thanks to anyone who has the time to read this long message and offer feedback on how we can get there from where we are now. I have a certain amount of influence in my field and at my university to make this approach more popular. Once some of these problems are solved (if we can agree on a solution), we can move on to making this promising approach a reality in a big way.
Curt Ashendel ashendel@purdue.edu