This is an ooooooold issue: Wikisource does not have proper metadata.

I'm one of the people that Nemo was mentioning: I'm a digital librarian working for MLOL, and in the past 2 years, among dozens of other digital libraries, I've managed to import Wikisource metadata into MLOL in 3 languages (English, French, Italian):

It has been a pain in the ass because I actually had to web scrape the websites, from lists derived from categories. There is no easy way to have metadata about Wikisource books, at the moment.

There are prototypes like WS-search, from Sam Wilson:

And I'm sure that Tpt has some scripts to parse Wikisource (also, there was OPDS).

The Wikisource community made some attempts to solve the issue via Wikidata, but discovered another black hole: modelling books on Wikidata is very tricky, and even after 2 Wikicite conferences (with Tpt and other Wikisource people) I must confess I'm still confused...
For personal reasons I haven't worked on the matter in the last few months; hopefully I will in the near future.

I remember Vigneron was braver than me and recently tried to revive the discussion:

IF (and it's a big IF) we find a workable solution with Wikidata (theoretical and practical) and
IF (and it's a big IF) we find some skilled Wikidata people to help us with customised queries and bots to help with the transition (we need to import Wikisource data into Wikidata, and we need to clear existing Wikidata items following the standard model, and we need to maintain them in the future),
then it's doable and we have solved the metadata problem.

Unfortunately, I spent most of my adult and professional life complaining about this (also, I really tried),
and nothing really changed... ;-)


On Thu, Oct 12, 2017 at 3:05 PM, Sam Wilson wrote:
Sorry, Sam, the other thing I meant to say was: that's a brilliant idea! :-) I'd love to help make it happen, if I can be of any use. :)

On Thu, 12 Oct 2017, at 08:51 PM, Sam Wilson wrote:
It's slightly tricky at the moment to extract data for validated works, for one thing because we don't have solid data linking Index pages to their corresponding main namespace (i.e. "work") pages. The Index pages have the status, but the mainspace pages are what we think of as the work. There's P1957 now, which is the connection we need, but the data for that isn't complete.
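
As a rough illustration of how far the P1957 data could already take us, here is a minimal Python sketch that asks Wikidata for items carrying a P1957 statement and returns the item/Index-page pairs. It assumes the public Wikidata Query Service endpoint and the standard SPARQL JSON result format; the function names and the User-Agent string are made up for the example, and nothing here is part of wsexport or any existing Wikisource tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit=100):
    """Build a SPARQL query for items that have a P1957 statement."""
    return (
        "SELECT ?item ?index WHERE { ?item wdt:P1957 ?index . } "
        f"LIMIT {int(limit)}"
    )


def parse_bindings(payload):
    """Turn a SPARQL JSON result into (item URI, Index page URL) pairs."""
    rows = payload["results"]["bindings"]
    return [(row["item"]["value"], row["index"]["value"]) for row in rows]


def fetch_index_links(limit=100):
    """Run the query over HTTP (hypothetical helper; needs network access)."""
    params = urlencode({"query": build_query(limit), "format": "json"})
    request = Request(
        WDQS_ENDPOINT + "?" + params,
        headers={"User-Agent": "p1957-sketch/0.1 (example only)"},
    )
    with urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))
```

Calling `fetch_index_links(10)` would list up to ten such pairs. Because the P1957 data is incomplete, any result is only a lower bound on the works that actually have an Index page.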

We've also got (incomplete) support for OPDS in the wsexport tool, which I think is probably a brilliant way forward for sharing the Wikisource catalogue with other systems. Once we have better structural support in Wikisource itself (e.g. structured data for querying validation status), we'll be able to produce all sorts of output for sharing much more efficiently and easily.

On Thu, 12 Oct 2017, at 08:37 PM, Sam Walton wrote:
Hi all,

I work on the Wikipedia Library program, and wanted to jump in with a passing thought I'd had about Wikisource and TWL. We'll be building search and discovery tools into the library card platform that's currently under ongoing development. They'll index all the usually-paywalled resources we have access to, but also open access content. As part of that process it's a desire of mine to index completed Wikisource works, though I haven't given it much thought beyond 'that would be nice'. This might be able to function as a kind of centralised search for all completed Wikisource works, if implemented.

If you're interested, the relevant Phab task is, where your thoughts are very welcome. It won't be worked on for a while and I can't guarantee that it will definitely happen, but if it's something the Wikisource community would benefit from, then that would absolutely increase the likelihood we'll work on it.


On 12 October 2017 at 13:07, Federico Leva (Nemo) wrote:
Gerard Meijssen, 12/10/2017 15:04:
Given the discussion about finished books on the Korean Wikisource, I think this demonstrates that we really need to advertise the finished books to a reading public.

In Italy, after many years of talk with local libraries, the Wikisource books are included in the catalogs of many libraries (also via a local ebook provider, MLOL, who hired some wikimedians to work on the "open collection", big kudos to them).


Wikisource-l mailing list

Sam Walton
Partnerships Coordinator
The Wikipedia Library
