@Sam, Tpt, my personal experience is also that HTML is the way to pull out the important Wikisource metadata, but also that every Wikisource displays it in a slightly different way, meaning that you need to tweak your scraper for each Wikisource. Is that still true? The last time I did it was more than a year ago, but I need to try it again soon.
Aubrey
On Wed, Nov 1, 2017 at 1:00 AM, Sam Wilson sam@samwilson.id.au wrote:
Yes I think you're definitely right! The easier way to send Wikisource data to Wikidata is going to be a clever gadget that reads the microformat or schema'd info in each page. My hack was just a quick and easy test at getting some things added. :)
Ultimately, I'm actually not that excited about working on the tools that we need to transfer the data. No, no, I don't mean that! Well, just that the end point we're aiming at is that a bunch of info *won't be* in Wikisource at all, but will be pulled from Wikidata, and so I am much more interested in making better tools for working with the data in Wikidata. :-) If you see what I mean.
My idea with ws-search is that it will progressively pull more and more data from Wikidata, and only resort to HTML scraping where the data is missing from Wikidata. I'm attempting to encapsulate this logic in the `wikisource/api` PHP library.
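The "Wikidata first, HTML scraping only as a fallback" logic described above could be sketched roughly as follows. This is only an illustration in Python, not the code of the `wikisource/api` PHP library; the function name `merge_metadata`, the field names, and the `scrape_html` callback are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional

Metadata = Dict[str, Optional[str]]


def merge_metadata(
    wikidata: Metadata,
    scrape_html: Callable[[], Metadata],
    fields: Iterable[str],
) -> Metadata:
    """Prefer Wikidata values; scrape the HTML only if a field is missing.

    `wikidata` holds whatever was already fetched from Wikidata.
    `scrape_html` is a hypothetical callback that parses the Wikisource
    page; it is called at most once, and only when needed.
    """
    fields = list(fields)
    result: Metadata = {f: wikidata.get(f) for f in fields}
    missing = [f for f in fields if not result[f]]
    if missing:
        scraped = scrape_html()  # lazy: skipped when Wikidata is complete
        for f in missing:
            result[f] = scraped.get(f)
    return result
```

As more data moves into Wikidata, the scraping callback is simply called less and less often, without the caller having to change.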
On Tue, 31 Oct 2017, at 11:14 PM, Thomas Pellissier Tanon wrote:
Hello Sam,
Thank you for this nice feature!
A few months ago I created a prototype of a Wikisource-to-Wikidata import tool for the French Wikisource, based on the schema.org annotation I have added to the main header template (I definitely think we should move from our custom microformat to this schema.org markup, which could be much more structured). It's not ready yet, but I plan to move it forward in the coming weeks. An early version of the frontend, which you can add to your Wikidata common.js, is here: https://www.wikidata.org/wiki/User:Tpt/ws2wd.js We should probably find a way to merge the two projects.
Cheers,
Thomas
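To illustrate why schema.org markup is easier to read than per-wiki scraping, here is a minimal Python sketch that collects `itemprop` values from a page using only the standard library. It is not how ws2wd.js works; the class and function names are hypothetical, the sample markup in the usage note is invented rather than copied from the French Wikisource header template, and the parser assumes well-formed HTML with properly closed tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"meta", "link", "br", "img", "input", "hr"}


class ItempropParser(HTMLParser):
    """Collect schema.org microdata properties (first occurrence wins)."""

    def __init__(self):
        super().__init__()
        self.props = {}
        self._open = []  # one (itemprop name or None, text buffer) per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if tag in VOID_TAGS:
            # e.g. <meta itemprop="datePublished" content="1862">
            if name and attrs.get("content"):
                self.props.setdefault(name, attrs["content"])
            return
        self._open.append((name, []))

    def handle_data(self, data):
        # Text belongs to every open element that carries an itemprop.
        for name, buf in self._open:
            if name:
                buf.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        name, buf = self._open.pop()
        if name:
            self.props.setdefault(name, "".join(buf).strip())


def extract_itemprops(html: str) -> dict:
    parser = ItempropParser()
    parser.feed(html)
    parser.close()
    return parser.props
```

For example, feeding it `<span itemprop="author">Victor Hugo</span>` yields a dictionary with an `author` key, regardless of which Wikisource the page came from, as long as that wiki uses the same property names.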
On 31 Oct 2017, at 15:10, Nicolas VIGNERON vigneron.nicolas@gmail.com wrote:
2017-10-31 13:16 GMT+01:00 Jane Darnell jane023@gmail.com:
Sorry, I am much more of a Wikidatan than a Wikisourcerer! I was referring to items like this one: https://www.wikidata.org/wiki/Q21125368
No need to be sorry, that is actually a good question and this example is even better (I totally forgot this kind of case).
For now, it is probably better to deal with it by hand (and I'm not sure what this tool can even do for it).
Regards, ~nicolas
_______________________________________________
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l