Very general discussion of goals and methods (long post) - Textbook-l

22 Dec 2004


Hi all,
I'm new to this list, but have read a lot of the material on the goals and 
methods of Wikibooks, Wikiversity, and Wikipedia as well as the last few 
months of archives for this group. I was not able to find my concerns 
addressed in that material, so I'm trying out this mailing group. I hope my 
comments and questions are not all old hat that has long since been addressed.
It seems that the very large problem with Wikibooks, and the cause of its 
very slow development (relative to Wikipedia), is that the goal of a book, 
or even a textbook, is too vague and represents many things to many people. 
On the other hand, an encyclopedia can be fairly narrowly defined both in 
its goals and its "methods," which include selection of content and style 
of presentation. Generally, for any collaborative project, if the goal can 
be agreed on, the "methods" can be discussed and agreed on by participants. 
Developing a set of goals is always the first step in the "acquisition" of 
any product of creative endeavor, including book publishing and software 
development. This is also true for collaborative open-source projects.
Finding a common goal (or set of goals to be solved by the proposed 
project) is essential BEFORE any project work can proceed effectively. This 
is a MAJOR problem that must be dealt with before collaborative, open 
source books can thrive like Wikipedia. I have some ideas for solving this, 
but first some more analysis of this problem.
Let me start by illustrating the nature of the problem with the case of 
open source development of a college level textbook. Of course this is 
quite distinct from K-12 level textbooks as well as non-text non-fiction 
books, but this is simply to illustrate the difficulty of reaching a common 
goal for any book.
As a university professor, I have taught Biochemistry for 22 years, using 
several textbooks from several publishers and authors. These are thick, 
expensive ($120) textbooks and there are about 12 or so current versions 
that are in print. All of them have problems with accuracy, clarity, topic 
order, timeliness, and depth of coverage. These problems make choosing one 
of them a nightmare for me, and probably for every instructor of 
Biochemistry in the country/world. No instructor enjoys changing texts, 
because this means a change in the syllabus, topic order, and lecture 
content (new figures, slides, terms, etc), at the very least. This can 
double or triple the prep time needed for teaching the first time a new 
text is used compared to years when a previously used text is continued. 
But changing texts is essential because they become out of date as science 
progresses with new facts and new paradigms, as well as changing topic 
coverage and discontinuation of a title for various reasons. Also, 
textbooks tend to drive the selection of topics taught, since teaching 
without a text means more work for both the instructor and the students. 
This means that the courses are designed around the texts, rather than vice 
versa (the way it should be). It also means that the slow pace of textbook 
revision tends to keep the teaching of a topic behind the current 
understanding of the topic. An open source textbook (eText or print 
format, it doesn't matter) would be the ideal solution to this, as long as 
each instructor could tailor the textbook content to the desired selection, 
order, and depth of coverage of each topic in the field.
The purposes of/markets for biochemistry textbooks vary quite a bit. 
Sometimes the material is presented in abbreviated form and shallow depth in 
a single semester or quarter (30 to 45 hours of lecture), while other 
courses take two quarters or semesters (60 to 90 hours of lecture). 
Medical, pharmacy, and nursing schools need a medically oriented Human 
biochemistry course, while cellular biologists, microbiologists, chemists, 
and food and agricultural science students need a more broadly oriented 
course that covers microbes and plants and has less focus on clinical 
aspects such as diseases and drugs. Sometimes this topic is taught to 
seniors in high school while most get it as seniors in college, yet others 
take this class as graduate or professional students. But there are not 
enough students in each of these various market subcategories to justify 
creating individual texts for each. Thus, the texts available try to be 
everything for everyone, but as a result they are universally poor for 
everyone.
I see this motivation, to have the ideal, custom tailored textbook that can 
be revised every year, as quite distinct from the motivation of states and 
schools to save money by preparing their own text books. The latter appears 
to be the major motivation for the K-12 crowd. However, the same problem 
hits the K-12 groups: There is not one set of standards and/or goals for 
all the markets/purposes of a potential book, and people will never, ever 
agree on a common purpose or set of goals. This should be obvious: An 
introductory book on learning the Spanish language (or any other foreign 
language) for English speaking students could be written for grades 9-12, 
for college students, or for grades 4 to 8. Each audience requires that the 
books would vary dramatically in depth of coverage as well as style of 
presentation, and probably even in the order of presentation. Yet all the 
different level books would need to contain the same facts and much of the 
other instructional material, including reading/translation passages, 
exercises, vocabulary, verb and noun ending tables, irregular word tables, 
and some cultural and geographic information. This information makes up 
over 50% of the book (usually more like 90% of the book), and it is simply 
a waste of time to recreate it for each of the three different versions of 
the book.
I have a lot of experience in software development, and the holy grail in 
that field is reusable software modules. It turns out that this has never 
been achieved at a finely grained level, but it certainly has been 
successful with any module that can be created to function in a generic 
manner and purpose. This is especially true for software modules that are 
incorporated into the coding language library, or better yet, into the 
operating system or as a stand alone, but generic, application. An 
outstanding example of this is the relational database application. Few 
modern application programs include code for their own data storage 
and retrieval system for anything beyond serially outputted log files or 
serially inputted configuration files. Most programs today use database 
engines to do data access because they can do it far faster and with much 
less application coding than if one "rolls one's own code".
How is this relevant here? Well, the task for open source textbooks, as 
I see it, is not to do the impossible and seek consensus on the goals for 
each book project, but to avoid reinventing the wheel 
(recreating the facts, paradigms, explanations, figures, etc) with every 
book. We need to segregate the reusable information from the non-reusable. 
We should put the reusable material in one or two places, perhaps the facts 
in Wikipedia, and the learning modules in another. Then for each book, let 
one instructor (or a small group of them that share a common need) glue 
these reusable pieces together in a custom order with a small amount of 
content that is created for each book.
My vision for how this might work is as follows:
1. Facts are added via Wikipedia articles and other resource repositories. 
The latter might include dictionaries, various types of network diagrams 
(spatial networks = maps, concept networks, metabolic pathway networks, 
biomolecular regulatory networks, phylogenetic trees, etc) and possibly 
separate databases of images, figures, and other data that might require 
special viewers (though these are best kept inside Wikipedia and/or 
dictionaries by using shims to let their data be included.)
2. People interested in creating teaching/learning modules using the 
resources listed in #1, would have a collaborative tool, like the Wikipedia 
editing function, for creating such learning modules with only the most 
generic aspects required for a module (its title, scope, connections to 
other modules including recommended prerequisite modules, Wikipedia 
readings/links, dictionary terms/vocabulary, network subsection, and 
possibly exercises). The idea would be to cover a topic at all relevant 
levels of treatment. For my Biochemistry example, that would mean for 
medical vs general courses, for secondary, college, and graduate students, 
and for 2 semester vs 1 semester courses.
A way to deal with the multiple levels of coverage remains to be 
developed. My idea for doing this is to require that each prerequisite, 
reading/term, and exercise in a module be categorized as appropriate for 
one of three levels: beginner, intermediate, and advanced. These levels 
could refer to both the learning style of the intended student and their 
prior experience with the topic. Alternatively separate specifiers could be 
developed for each, with beginner, intermediate, and advanced referring to 
the student learning style and separately to the familiarity with the 
prerequisite material.
Given such specification of the learning module components, when a module 
is selected for inclusion in a book (or possibly a chapter; see below), it 
inherits the level(s) chosen for the book and only the module elements 
appropriate for that level are incorporated into the book. Optionally, 
one or more lower levels of material coverage depth (at the same learning 
style level) also could be incorporated. Alternatively, modules on the same 
topic but with different coverage depth levels could be incorporated into 
the text (or chapter) in a serial and progressive fashion immediately after 
each other, or perhaps interspersed with other topics at other depth 
levels. Currently many biochemistry texts have introductory chapters on 
more general topics (proteins, nucleic acids, cells, metabolism) not only 
before the more detailed chapters, but as the first 3 or 4 chapters in the 
text, so that there is an introductory mini-course inside each of the 
textbooks. Of course, it would make the most sense to specify that all the 
modules selected for a book be at the same level of student learning 
"style".
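
The level-based selection described above might be sketched as follows. This is only an illustration under stated assumptions: the names Level, Element, Module, and select_elements are hypothetical, the sample module content is invented, and a single level scale is used rather than separate scales for learning style and prior familiarity.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale for tagging module elements."""
    BEGINNER = 1
    INTERMEDIATE = 2
    ADVANCED = 3


@dataclass
class Element:
    kind: str       # e.g. "prerequisite", "reading", "term", "exercise"
    content: str
    level: Level


@dataclass
class Module:
    title: str
    elements: list[Element] = field(default_factory=list)


def select_elements(module: Module, book_level: Level,
                    include_lower: bool = False) -> list[Element]:
    """Return the elements a book at book_level would take from module.

    With include_lower=True, elements tagged at lower levels are kept too.
    """
    if include_lower:
        return [e for e in module.elements if e.level <= book_level]
    return [e for e in module.elements if e.level == book_level]


# Illustrative module; the content strings are made up for this sketch.
glycolysis = Module("Glycolysis", [
    Element("reading", "Overview of glucose breakdown", Level.BEGINNER),
    Element("reading", "Enzyme mechanisms of each step", Level.INTERMEDIATE),
    Element("exercise", "Derive the net ATP yield", Level.ADVANCED),
])

chosen = select_elements(glycolysis, Level.INTERMEDIATE, include_lower=True)
```

Here a book set to the intermediate level would pick up the beginner and intermediate readings but not the advanced exercise.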
3. A second development tool would be used to create the text book by 
selecting the modules and their order and specifying their level of treatment 
of the topics.
There will probably need to be an intermediate level of organization of 
material between a book and modules, something like a chapter, but not 
serialized numerically. For any subject, most chapters fall naturally into 
a rational order for teaching, so identifying the prerequisites should not 
be a problem. One way issues about order and prerequisites could be solved 
is by allowing some more basic treatments of a subject to precede more 
advanced treatments that occur in later chapters. However, the development 
tools needed to create chapters would be similar or identical to the tools 
needed to create texts. Both use lower levels (chapters or modules) in a 
customizable order with selectable level of treatment and would provide a 
means to provide custom text and/or figures to glue the sub-components 
together. This way, there not only could be different texts made from the 
same modules, but even different chapters made from the same modules, 
depending on the purpose, emphasis on each topic, topic order, as well as 
the learning level/style of the students.
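
If each module records its recommended prerequisites, a default teaching order could be derived automatically and then adjusted by the instructor. A minimal sketch, assuming the prerequisites are available as a mapping from each module title to the titles it depends on (the function name teaching_order and the module titles are illustrative only), using the graphlib module from the Python standard library (Python 3.9 or later):

```python
from graphlib import TopologicalSorter


def teaching_order(prerequisites: dict[str, set[str]]) -> list[str]:
    """Order modules so that every prerequisite comes before its dependents.

    graphlib raises CycleError if the prerequisites are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Illustrative prerequisite map: module title -> titles it depends on.
prereqs = {
    "Amino acids": set(),
    "Proteins": {"Amino acids"},
    "Enzymes": {"Proteins"},
    "Metabolism": {"Enzymes"},
}

order = teaching_order(prereqs)
```

The same function would serve for ordering modules within a chapter or chapters within a text, since both are built from lower-level pieces with prerequisite links.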
4. This vision will not only need these two new development tools (one for 
modules and one for chapters and texts), but also will need a means to 
create a "freeze" of the reusable parts used in the text (chapters, 
modules, Wikipedia articles, dictionary entries, etc) at a user selectable 
time. This is essential for any instructor because it would be impossible 
to teach a course where the "Facts," their presentation, or their 
arrangement are changed in the middle of teaching a course by some 
well-meaning contributor. Some means will be needed for the instructor to 
selectively "refresh" an article or other resource during the duration of 
the course if it is found to be defective in the freeze.
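
One way to picture the freeze and the selective refresh is as a table that pins each reusable resource to a revision. The sketch below is an assumption-laden illustration: the Freeze class, its methods, and the integer revision numbers are all hypothetical and do not describe any existing wiki software.

```python
from dataclasses import dataclass, field


@dataclass
class Freeze:
    """Pinned revision of every reusable resource used by one book."""
    pinned: dict[str, int] = field(default_factory=dict)

    @classmethod
    def take(cls, live_revisions: dict[str, int], used: list[str]) -> "Freeze":
        """Snapshot the current revision of each resource the book uses."""
        return cls({name: live_revisions[name] for name in used})

    def refresh(self, name: str, live_revisions: dict[str, int]) -> None:
        """Re-pin a single resource to its current live revision."""
        if name not in self.pinned:
            raise KeyError(f"{name!r} is not part of this freeze")
        self.pinned[name] = live_revisions[name]


# Hypothetical live revision numbers at the start of the course.
live = {"Glycolysis": 41, "ATP": 17, "Krebs cycle": 8}
freeze = Freeze.take(live, ["Glycolysis", "ATP"])

# Contributors keep editing during the semester...
live["Glycolysis"] = 42
live["ATP"] = 18

# ...but the instructor refreshes only the one article found defective.
freeze.refresh("ATP", live)
```

After the refresh, the book still uses the frozen revision of "Glycolysis" while "ATP" has moved to the corrected revision.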
Unfortunately, the need for a freeze imposes an infrastructure requirement 
for its deployment at the institution that "adopts"/creates the book. This 
requirement for a web server with middleware and database engine support 
for "deployment" of a freeze along with its modules and chapters would be a 
substantial disincentive to use this approach. However, the IT support at 
most colleges and universities, and a great many K-12 school 
districts/authorities, is sufficiently advanced to provide this level of 
infrastructure. Clearly, the pain of this can be minimized if there can be 
some sort of data encoding method that lets different software engines 
(and operating systems) support this need. If open source textbooks become 
popular, there would be a tremendous need for such deployment software, and 
such need tends to inspire healthy competition among vendors and even 
different open source developers.
Probably a better approach would be to develop a third tool that 
accomplishes both the freezing and the deployment simultaneously. It would 
create a complete PostScript or pdf document or set of html files with 
images to create a web site based on the specifications of the book, 
including a snapshot (freeze) of the chapters, modules, and resources such 
as Wikipedia articles, etc. While this could be a rather large document or 
set of web pages, an alternative would be to create one pdf or ps document 
for each chapter, but do it simultaneously so that the reused components 
are identical in each chapter. It would be easy to write a program to 
create these pdf or ps documents for distribution as is, as well as to roll 
them up into something that can be distributed as needed, such as a self 
installing web site directory hierarchy. Yet all three formats would retain 
their nonlinear hyperlinked basis. Such deployment methods can fairly 
easily deal with the copyright notices, terms of use, and authorship 
attributions that would be needed if the pdf or ps file is printed or if 
the html material is posted to a public web site. Other formats such as MS 
Word or OpenOffice XML documents could be included by using a system of 
output filters such as those that have been developed by the OpenOffice.org 
project.
Of course, such a development tool for freezing and deploying would have to 
have a means for an instructor (or curriculum committee, or district/state 
text authority) to do a mid-semester "refresh" or update of a single 
resource for any module used, to fix problems as they are encountered. This 
is a bit like the CVS code-branch problem, and so solutions to this issue 
exist.
Thanks to anyone who has the time to read this long document and to offer 
feedback on how we can get there from where we are now. I have a certain 
amount of influence in my field and my University to make this approach 
more popular. If some of these problems can be solved (and we can agree on a 
solution), then we can move on to making this promising approach a reality 
in a big way.
Curt Ashendel
ashendel@purdue.edu