To: A list for people interested in the use of open source tools and open access in humanities teaching and research
Hello Humanities!
I've been working on a project called GITenberg.
The aim is to move Project Gutenberg's books to GitHub.
As you probably know, Project Gutenberg (PG) is an amazing organization that has been digitizing public domain books since the 1970s. They have around 45,000 books.
But PG is hesitant to upgrade its tools and has limited resources to work on new projects. Meanwhile, there are issues with the current collection: some typos and transcription errors remain, and many books use old encoding formats (PG predates Unicode).
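To make the encoding issue concrete, here is a minimal Python sketch of re-encoding a legacy text as UTF-8. The sample string, the `to_utf8` helper, and the choice of ISO-8859-1 as the source encoding are assumptions for illustration only; a real book's encoding would have to be checked file by file.

```python
# Illustrative sketch: the sample text and the ISO-8859-1 default are
# assumptions, not a description of any particular PG file.
def to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8."""
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# A title with an accented character, stored in a single-byte legacy encoding.
legacy = "Les Misérables".encode("iso-8859-1")
converted = to_utf8(legacy)

# The accented character takes one byte in ISO-8859-1 and two in UTF-8.
print(len(legacy), len(converted))
print(converted.decode("utf-8"))
```

Keeping the source encoding as an explicit parameter avoids guessing silently, since decoding with the wrong encoding can produce garbled text without raising an error.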
I want to help with that, and along the way, produce something that more developers, OKFN hackers, digital humanists and other groups can readily build upon.
Enter GITenberg.
GITenberg uses git and GitHub to keep track of books. This adds a number of features right out of the gate, including:
+ version control via git
+ public bug tracking (PG uses a private RT instance to track reported issues)
+ public collaboration (pull requests under public review)
PG's metadata is provided as RDF/XML in a 230 MB zip file. While this is a wonderful resource, RDF isn't the easiest format for most developers to pick up and use. On top of that, the zip file has so many top-level folders that it can't be completely unpacked on some filesystems (ext3).
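For a sense of what working with RDF/XML involves, here is a rough Python sketch that pulls a couple of fields out of a record using only the standard library. The sample record, its example.org URL, and the `extract_metadata` helper are made up for illustration; the real PG records carry more fields and may be structured differently, so treat the element names as assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the lookups below (standard RDF and Dublin Core URIs).
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical, heavily simplified record. Not an actual PG file.
SAMPLE_RDF = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="http://example.org/ebooks/1">
    <dcterms:title>An Example Title</dcterms:title>
    <dcterms:language>en</dcterms:language>
  </rdf:Description>
</rdf:RDF>
"""


def extract_metadata(rdf_xml: str) -> dict:
    """Return the identifier, title, and language of the first described resource."""
    root = ET.fromstring(rdf_xml.strip())
    description = root.find("rdf:Description", NS)
    if description is None:
        return {}
    return {
        # Attributes are looked up by their fully qualified name.
        "about": description.get("{%s}about" % NS["rdf"]),
        "title": description.findtext("dcterms:title", namespaces=NS),
        "language": description.findtext("dcterms:language", namespaces=NS),
    }


print(extract_metadata(SAMPLE_RDF))
```

Even for two fields, the namespace handling is most of the code, which is part of why a simpler metadata format is attractive.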
I've created repos for 43,000 of PG's books, including the book source files (often with images!), and put them on GitHub.
There is still a lot I hope to do, but I would love to get OKFN's feedback, requests, or assistance!
All the best,
Seth