Hi all,
I'm new to this list, but have read a lot of the material on the goals and
methods of Wikibooks, Wikiversity, and Wikipedia as well as the last few
months of archives for this group. I was not able to find my concerns
addressed in that material, so I'm trying out this mailing group. I hope my
comments and questions are not old hat that has long since been addressed.
It seems that the very large problem with Wikibooks, and the cause of its
very slow development (relative to Wikipedia), is that the goal of a book,
or even a textbook, is too vague and represents many things to many people.
On the other hand, an encyclopedia can be fairly narrowly defined both in
its goals and its "methods," which include selection of content and style
of presentation. Generally, for any collaborative project, if the goal can
be agreed on, the "methods" can be discussed and agreed on by participants.
Developing a set of goals is always the first step in the "acquisition" of
any product of creative endeavor, including book publishing and software
development. This is also true for collaborative open-source projects.
Finding a common goal (or set of goals to be solved by the proposed
project) is essential BEFORE any project work can proceed effectively. This
is a MAJOR problem that must be dealt with before collaborative, open
source books can thrive like Wikipedia. I have some ideas for solving this,
but first some more analysis of this problem.
Let me start by illustrating the nature of the problem with the case of
open source development of a college level textbook. Of course this is
quite distinct from K-12 level textbooks as well as non-text non-fiction
books, but this is simply to illustrate the difficulty of reaching a common
goal for any book.
As a university professor, I have taught Biochemistry for 22 years, using
several textbooks from several publishers and authors. These are thick,
expensive ($120) textbooks and there are about 12 or so current versions
that are in print. All of them have problems with accuracy, clarity, topic
order, timeliness, and depth of coverage. These problems make choosing one
of them a nightmare for me, and probably for every instructor of
Biochemistry in the country/world. No instructor enjoys changing texts,
because this means a change in the syllabus, topic order, and lecture
content (new figures, slides, terms, etc), at the very least. This can
double or triple the prep time needed for teaching the first time a new
text is used compared to years when a previously used text is continued.
But changing texts is essential because they become out of date as science
progresses with new facts and new paradigms, as well as changing topic
coverage and discontinuation of a title for various reasons. Also,
textbooks tend to drive the selection of topics taught, since teaching
without a text means more work for both the instructor and the students.
This means that the courses are designed around the texts, rather than vice
versa (the way it should be). Also, this means that the slow pace of change
in textbooks tends to keep the teaching of a topic behind the current
understanding of the topic. An open source text book (eText or print
format, it doesn't matter) would be the ideal solution to this, as long as
each instructor could tailor the textbook content to the desired selection,
order, and depth of coverage of each topic in the field.
The purposes of/markets for biochemistry textbooks vary quite a bit.
Sometimes the material is presented in abbreviated form and shallow depth in
a single semester or quarter (30 to 45 hours of lecture), while other
courses take two quarters or semesters (60 to 90 hours of lecture).
Medical, pharmacy, and nursing schools need a medically oriented Human
biochemistry course, while cellular biologists, microbiologists, chemists,
and food and agricultural science students need a more broadly oriented
course that covers microbes and plants and has less focus on clinical
aspects such as diseases and drugs. Sometimes this topic is taught to
seniors in high school while most get it as seniors in college, yet others
take this class as graduate or professional students. But there are not
enough students in each of these various market subcategories to justify
creating individual texts for each. Thus, the texts available try to be
everything for everyone, but as a result they are universally poor for
everyone.
I see this motivation, to have the ideal, custom tailored textbook that can
be revised every year, as quite distinct from the motivation of states and
schools to save money by preparing their own text books. The latter appears
to be the major motivation for the K-12 crowd. However, the same problem
hits the K-12 groups: There is not one set of standards and/or goals for
all the markets/purposes of a potential book, and people will never, ever
agree on a common purpose or set of goals. This should be obvious: An
introductory book on learning the Spanish language (or any other foreign
language) for English speaking students could be written for grades 9-12,
for college students, or for grades 4 to 8. The books for these audiences
would vary dramatically in depth of coverage as well as style of
presentation, and probably even in the order of presentation. Yet all the
different level books would need to contain the same facts and much of the
same instructional material, including reading/translation passages,
exercises, vocabulary, verb and noun ending tables, irregular word tables,
and some cultural and geographic information. This information makes up
over 50% of the book (usually more like 90% of the book), and it is simply
a waste of time to recreate it for each of the three different versions of
the book.
I have a lot of experience in software development, and the holy grail in
that field is reusable software modules. It turns out that this has never
been achieved at a finely grained level, but it certainly has been
successful with any module that can be created to function in a generic
manner and purpose. This is especially true for software modules that are
incorporated into the coding language library, or better yet, into the
operating system or as a stand alone, but generic, application. An
outstanding example of this is the relational database application. Few
modern application programs include code for their own data storage
and retrieval system for anything beyond serially outputted log files or
serially inputted configuration files. Most programs today use database
engines to do data access because they can do it far faster and with much
less application coding than if one "rolls their own code".
How is this relevant here? Well, the challenge for open source textbooks,
as I see it, is not to do the impossible and seek consensus on the goals
for each book project, but to avoid reinventing the wheel (recreating the
facts, paradigms, explanations, figures, etc) with every book. We need to
segregate the reusable information from the non-reusable.
We should put the reusable material in one or two places, perhaps the facts
in Wikipedia, and the learning modules in another. Then for each book, let
one instructor (or a small group of them that share a common need) glue
these reusable pieces together in a custom order with a small amount of
content that is created for each book.
My vision for how this might work is as follows:
1. Facts are added via Wikipedia articles and other resource repositories.
The latter might include dictionaries, various types of network diagrams
(spatial networks = maps, concept networks, metabolic pathway networks,
biomolecular regulatory networks, phylogenetic trees, etc) and possibly
separate databases of images, figures, and other data that might require
special viewers (though these are best kept inside Wikipedia and/or
dictionaries by using shims to let their data be included.)
2. People interested in creating teaching/learning modules using the
resources listed in #1, would have a collaborative tool, like the Wikipedia
editing function, for creating such learning modules with only the most
generic aspects required for a module (its title, scope, connections to
other modules including recommended prerequisite modules, Wikipedia
readings/links, dictionary terms/vocabulary, network subsection, and
possibly exercises). The idea would be to cover a topic at all relevant
levels of treatment. For my Biochemistry example, that would mean for
medical vs general courses, for secondary, college, and graduate students,
and for 2 semester vs 1 semester courses.
A way to deal with the multiple levels of coverage remains to be
developed. My idea for doing this is to require that each prerequisite,
reading/term, and exercise in a module be categorized as appropriate for
one of three levels: beginner, intermediate, and advanced. These levels
could refer to both the learning style of the intended student and their
prior experience with the topic. Alternatively separate specifiers could be
developed for each, with beginner, intermediate, and advanced referring to
the student learning style and separately to the familiarity with the
prerequisite material.
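
To make the tagging scheme concrete, here is a minimal Python sketch. It is
only an illustration under assumptions: the class names (Level, Element,
Module), the field names, and the sample biochemistry entries are all
hypothetical, and it uses a single level scale rather than the two separate
specifiers mentioned as an alternative. The select method shows one way a
book could later pick out only the elements at its own level, optionally
with the lower levels too.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; ordering lets us compare levels."""
    BEGINNER = 1
    INTERMEDIATE = 2
    ADVANCED = 3


@dataclass(frozen=True)
class Element:
    """One prerequisite, reading, term, or exercise, tagged with a level."""
    kind: str    # "prerequisite", "reading", "term", or "exercise"
    ref: str     # illustrative title of the article, term, or exercise
    level: Level


@dataclass
class Module:
    """A learning module holding only generic, reusable information."""
    title: str
    scope: str
    elements: list[Element] = field(default_factory=list)

    def select(self, level: Level, include_lower: bool = False) -> list[Element]:
        """Return the elements that a book at `level` would incorporate."""
        if include_lower:
            return [e for e in self.elements if e.level <= level]
        return [e for e in self.elements if e.level == level]


# Sample data, invented for illustration only.
glycolysis = Module(
    title="Glycolysis",
    scope="Conversion of glucose to pyruvate",
    elements=[
        Element("reading", "Glycolysis", Level.BEGINNER),
        Element("term", "Substrate-level phosphorylation", Level.INTERMEDIATE),
        Element("exercise", "Net ATP yield per glucose", Level.INTERMEDIATE),
        Element("reading", "Regulation of phosphofructokinase", Level.ADVANCED),
    ],
)

intermediate_only = glycolysis.select(Level.INTERMEDIATE)
with_lower_levels = glycolysis.select(Level.INTERMEDIATE, include_lower=True)
print([e.ref for e in intermediate_only])
print([e.ref for e in with_lower_levels])
```

Keeping the level on each element, rather than on the module as a whole,
is what would let one module serve several books.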
Given such specification of the learning module components, when a module
is selected for inclusion in a book (or possibly a chapter; see below), it
inherits the level(s) chosen for the book and only the module elements
appropriate for that level are incorporated into the book. Optionally,
one or more lower levels of material coverage depth (at the same learning
style level) also could be incorporated. Alternatively, modules on the same
topic but with different coverage depth levels could be incorporated into
the text (or chapter) in a serial and progressive fashion immediately after
each other, or perhaps interspersed with other topics at other depth
levels. Currently many biochemistry texts have introductory chapters on
more general topics (proteins, nucleic acids, cells, metabolism) not only
before the more detailed chapters, but as the first 3 or 4 chapters in the
text, so that there is an introductory mini-course inside each of the
textbooks. Of course, it would make the most sense to specify that all the
modules selected for a book be at the same level of student learning
"style".
3. A second development tool would be used to create the text book by
selecting the modules and their order and specifying their level of treatment
of the topics.
There will probably need to be an intermediate level of organization of
material between a book and modules, something like a chapter, but not
serialized numerically. For any subject, most chapters fall naturally into
a rational order for teaching, so identifying the prerequisites should not
be a problem. One way issues about order and prerequisites could be solved
is by allowing some more basic treatments of a subject to precede more
advanced treatments that occur in later chapters. However, the development
tools needed to create chapters would be similar or identical to the tools
needed to create texts. Both use lower levels (chapters or modules) in a
customizable order with selectable level of treatment and would provide a
means to provide custom text and/or figures to glue the sub-components
together. This way, there not only could be different texts made from the
same modules, but even different chapters made from the same modules,
depending on the purpose, emphasis on each topic, topic order, as well as
the learning level/style of the students.
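
The observation that chapters and texts could share one tool can be
sketched as a single nested structure, with prerequisites checked by a
topological sort. This is a rough Python sketch under assumptions, not a
design: the Unit class, its fields, the teaching_order helper, and the
sample topics are hypothetical names chosen for illustration. It relies on
graphlib from the Python standard library (3.9 or later).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Unit:
    """A module, chapter, or book. Chapters and books simply hold children."""
    title: str
    level: str = "intermediate"
    glue_text: str = ""                      # custom text tying the parts together
    children: list["Unit"] = field(default_factory=list)

    def outline(self, depth: int = 0) -> list[str]:
        """Return an indented outline of this unit and everything below it."""
        lines = ["  " * depth + f"{self.title} [{self.level}]"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


def teaching_order(prerequisites: dict[str, set[str]]) -> list[str]:
    """Order topics so that every prerequisite comes before its dependents."""
    return list(TopologicalSorter(prerequisites).static_order())


# Invented example: each topic maps to the topics it requires first.
prerequisites = {
    "Proteins": set(),
    "Enzymes": {"Proteins"},
    "Metabolism": {"Enzymes"},
    "Glycolysis": {"Enzymes", "Metabolism"},
}
order = teaching_order(prerequisites)

book = Unit(
    "Biochemistry for a one-semester course",
    level="beginner",
    children=[
        Unit("Foundations", level="beginner",
             glue_text="Why structure comes before function.",
             children=[Unit(t, level="beginner") for t in order[:2]]),
        Unit("Energy metabolism", level="beginner",
             children=[Unit(t, level="beginner") for t in order[2:]]),
    ],
)
print("\n".join(book.outline()))
```

If two modules listed each other as prerequisites, static_order would
raise graphlib.CycleError, which gives the tool a natural place to report
an ordering problem to the person assembling the book.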
4. This vision will not only need these two new development tools (one for
modules and one for chapters and texts), but also will need a means to
create a "freeze" of the reusable parts used in the text (chapters,
modules, Wikipedia articles, dictionary entries, etc) at a user selectable
time. This is essential for any instructor because it would be impossible
to teach a course where the "Facts," their presentation, or their
arrangement are changed in the middle of teaching a course by some
well-meaning contributor. Some means will be needed for the instructor to
selectively "refresh" an article or other resource during the duration of
the course if it is found to be defective in the freeze.
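
One simple way to think about a freeze is as a table that pins each
reusable resource to the last revision made on or before a chosen date.
The Python sketch below assumes each resource keeps a dated revision
history; the freeze function, the resource names, the revision ids, and the
dates are all made up for illustration and do not describe any existing
wiki interface.

```python
from datetime import datetime

# Hypothetical revision histories: resource name -> (timestamp, revision id),
# listed oldest first.
HISTORY = {
    "Wikipedia:Glycolysis": [
        (datetime(2007, 1, 10), "r101"),
        (datetime(2007, 3, 2), "r117"),
    ],
    "Dictionary:Kinase": [
        (datetime(2006, 11, 20), "r7"),
    ],
}


def freeze(history: dict[str, list[tuple[datetime, str]]],
           as_of: datetime) -> dict[str, str]:
    """Pin every resource to its newest revision made on or before `as_of`."""
    snapshot = {}
    for name, revisions in history.items():
        eligible = [rev for stamp, rev in revisions if stamp <= as_of]
        if not eligible:
            raise ValueError(f"{name} has no revision on or before {as_of}")
        snapshot[name] = eligible[-1]
    return snapshot


# An instructor freezes the course material at the start of a term.
snapshot = freeze(HISTORY, datetime(2007, 2, 1))
print(snapshot)
```

Later edits, such as the invented revision r117 above, would not reach
students until the instructor chose to take them.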
Unfortunately, the need for a freeze imposes an infrastructure requirement
for its deployment at the institution that "adopts"/creates the book. This
requirement for a web server with middleware and database engine support
for "deployment" of a freeze along with its modules and chapters would be a
substantial disincentive to use this approach. However, the IT support at
most colleges and universities, and a great many K-12 school
districts/authorities, is sufficiently advanced to provide this level of
infrastructure. Clearly, the pain of this can be minimized if there can be
some sort of data encoding method that lets different software engines
(and operating systems) support this need. If open source textbooks become
popular, there would be a tremendous need for such deployment software, and
such need tends to inspire healthy competition among vendors and even
different open source developers.
Probably a better approach would be to develop a third tool that
accomplishes both the freezing and the deployment simultaneously. It would
create a complete PostScript or PDF document, or a set of HTML files with
images to create a web site, based on the specifications of the book,
including a snapshot (freeze) of the chapters, modules, and resources such
as Wikipedia articles, etc. While this could be a rather large document or
set of web pages, an alternative would be to create one PDF or PS document
for each chapter, but do it simultaneously so that the reused components
are identical in each chapter. It would be easy to write a program to
create these PDF or PS documents for distribution as is, as well as to roll
them up into something that can be distributed as needed, such as a
self-installing web site directory hierarchy. Yet all three formats would
retain their nonlinear hyperlinked basis. Such deployment methods can fairly
easily deal with the copyright notices, terms of use, and authorship
attributions that would be needed if the PDF or PS file is printed or if
the HTML material is posted to a public web site. Other formats such as MS
Word or OpenOffice XML documents could be included by using a system of
output filters such as those that have been developed by the OpenOffice
project.
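
As a rough sketch of the HTML variant of such a tool, the Python below
renders every chapter in one pass from the same frozen text, so a resource
reused in two chapters comes out identical in both, and it stamps each page
with a notice. The function name render_site, its parameters, the sample
text, and the notice wording are assumptions for illustration; a real tool
would also need images, cross-links, and the other output formats.

```python
from html import escape


def render_site(book_title, chapters, frozen_text, notice):
    """Build {filename: html} for a book from one frozen set of resources.

    chapters    -- list of (chapter title, [resource names]) in teaching order
    frozen_text -- resource name -> text captured at freeze time
    notice      -- copyright/attribution line placed on every page
    """
    pages = {}
    toc_items = []
    for number, (chapter_title, resources) in enumerate(chapters, start=1):
        filename = f"chapter{number}.html"
        sections = "\n".join(
            f"<h2>{escape(name)}</h2>\n<p>{escape(frozen_text[name])}</p>"
            for name in resources
        )
        pages[filename] = (
            f"<html><head><title>{escape(chapter_title)}</title></head>\n"
            f"<body>\n<h1>{escape(chapter_title)}</h1>\n{sections}\n"
            f"<p><small>{escape(notice)}</small></p>\n</body></html>\n"
        )
        toc_items.append(
            f'<li><a href="{filename}">{escape(chapter_title)}</a></li>'
        )
    toc = "\n".join(toc_items)
    pages["index.html"] = (
        f"<html><head><title>{escape(book_title)}</title></head>\n"
        f"<body>\n<h1>{escape(book_title)}</h1>\n<ol>\n{toc}\n</ol>\n"
        f"<p><small>{escape(notice)}</small></p>\n</body></html>\n"
    )
    return pages


# Invented sample content.
frozen_text = {
    "Enzyme": "Enzymes are biological catalysts.",
    "Glycolysis": "Glycolysis converts glucose to pyruvate.",
}
chapters = [
    ("Foundations", ["Enzyme"]),
    ("Energy metabolism", ["Enzyme", "Glycolysis"]),
]
pages = render_site("Sample Biochemistry Text", chapters, frozen_text,
                    "Text from contributors; see each article for authors.")
print(sorted(pages))
```

The returned dictionary could then be written to a directory, for example
with pathlib.Path(name).write_text(html) for each entry.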
Of course, such a development tool for freezing and deploying would have to
have a means for an instructor (or curriculum committee, or district/state
text authority) to do a mid-semester "refresh" or update of a single
resource for any module used to fix problems as they are encountered. This
is a bit like the CVS code-branch problem, and so solutions to this issue
exist.
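
In the spirit of the branch analogy, a mid-semester refresh can be pictured
as a single override laid on top of an otherwise untouched freeze, with a
log of what changed and why. The Python sketch below is an assumption-laden
illustration: the FrozenBook class, its methods, and the revision ids are
hypothetical and are not taken from CVS or from any wiki software.

```python
from dataclasses import dataclass, field


@dataclass
class FrozenBook:
    """A freeze plus the few resources deliberately refreshed since."""
    pinned: dict[str, str]                       # resource -> frozen revision
    overrides: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)

    def refresh(self, resource: str, new_revision: str, reason: str) -> None:
        """Replace one frozen resource; everything else stays as frozen."""
        if resource not in self.pinned:
            raise KeyError(f"{resource} is not part of this freeze")
        self.overrides[resource] = new_revision
        self.log.append(
            f"{resource}: {self.pinned[resource]} -> {new_revision} ({reason})"
        )

    def revision_of(self, resource: str) -> str:
        """Return the revision students currently see for a resource."""
        return self.overrides.get(resource, self.pinned[resource])


# Invented example: one article is found to contain an error mid-course.
course = FrozenBook(pinned={"Wikipedia:Glycolysis": "r101",
                            "Dictionary:Kinase": "r7"})
course.refresh("Wikipedia:Glycolysis", "r117", "corrected pathway figure")
print(course.revision_of("Wikipedia:Glycolysis"))
print(course.revision_of("Dictionary:Kinase"))
print(course.log)
```

Keeping the original freeze intact and recording overrides separately
means the instructor can always see, or undo, exactly what was changed
during the term.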
Thanks to anyone who has the time to read this long document and to offer
feedback on how we can get there from where we are now. I have a certain
amount of influence in my field and my University to make this approach
more popular. With some of these problems solved (if we can agree on a
solution), then we can move on to making this promising approach a reality
in a big way.
Curt Ashendel
ashendel(a)purdue.edu