Hello, everyone, here is what happened in the last 2 weeks (and a few days):
(adapted from https://meta.wikimedia.org/wiki/BHL/Our_outcomes/WiR/Status_updates/2025-04-... )
04 April 2025 - 21 April 2025
This biweekly update took a bit longer due to (1) me getting a nasty flu last week and (2) Easter celebrations around Brazil. Better late than never, so here they are!
General Updates
- *Session approved for Living Data 2025*: https://www.livingdata2025.com/ Anabela Plos (GBIF Argentina) and I are organizing the session *Wikimedia and Biodiversity Data: A Mutualistic Relationship in the Open Knowledge Ecosystem* at Living Data in October. The description says that this session *explores the intersection of Wikimedia projects (Wikipedia, Wikidata, and Commons) with global biodiversity infrastructures like GBIF, iNaturalist, and the Biodiversity Heritage Library (BHL), emphasizing their alignment for data mobilization, standardization, and public engagement and knowledge dissemination. (...)* The session may run anywhere from 60 to 120 minutes in total. *Abstracts are super welcome, including virtual presentations*, as Living Data will be a hybrid conference. Registration is already open: https://www.livingdata2025.com/registration.html. Note, though, that there is a registration fee for virtual participants (USD 120).
- *Grant submitted for the Wikimedia Research Fund 2025*: Titled *Biodiversity Knowledge Gaps on Wikipedia: A Cross-Lingual Analysis of Species Coverage and Contribution Patterns*, the grant is a possible way to try and extend this work. It is a 9-month research project to investigate the flow of biodiversity content in the Portuguese, Spanish and English Wikipedias. It is not directly about BHL, but it has tight links to strategies for making BHL content reach wider audiences. I made the 12-page project available on Zenodo too: https://zenodo.org/records/15236084. The grant could also fund my travel to Living Data; let's see how it goes!
Technical Updates
- *Reporting the WiR work as a paper:* I have been investigating the Biodiversity Data Journal https://bdj.pensoft.net/about#Author-Guidelines as a venue for publishing a little article about the Wikimedian-in-Residence process. I lean towards a *Data Paper* treating the BHL — SDC subset as a dataset on its own. It would be a matter of extracting the relevant triples from the Commons + Wikidata pair and making it available for reuse. Possible uses include machine learning applications (like Mike Trizna did for the Flickr subset https://huggingface.co/spaces/MikeTrizna/bhl_flickr_search) or interactive art applications, as suggested in BHL's Annual Meeting. The BHL Image Explorer https://bhl-gallery.toolforge.org/ is also, on its own, an example of reuse of the dataset. Other options for reporting on what we did include GigaScience https://academic.oup.com/gigascience and the RIO Journal https://riojournal.com/, as well as releasing a pre-print in the ArphaHub https://preprints.arphahub.com/.
- *Acknowledgement of Harmful Content*: Due to the pace of changes in the U.S. federal landscape for DEI support, Bianca raised the possibility of hosting BHL's Acknowledgement of Harmful Content on the Meta Wiki page. Benefits include the possibility of translating the information into multiple languages. I have transcluded it into Meta Wiki https://meta.wikimedia.org/wiki/Biodiversity_Heritage_Library/Harmful_Content as a fork of the content on the BHL website; we should probably discuss whether to turn this into the main source of the information, to avoid the two versions drifting apart in the future.
- *One more QuickStatements generator*: The BHL Title QuickStatements generator https://bhl-qs-generator-production.up.railway.app/ is online. Rod Page’s excellent bhl2wiki https://bhl2wiki.herokuapp.com/ should still be the go-to tool for adding BHL DOIs to Wikidata. This tool is just slightly different: it (1) is built in Python/Flask; (2) looks up BHL authors reconciled to Wikidata via BHL creator ID (P4081) https://www.wikidata.org/wiki/Property:P4081; (3) adds BHL bibliography ID (P4327) https://www.wikidata.org/wiki/Property:P4327; (4) uses "written work" https://www.wikidata.org/wiki/Q47461344 as the instance of (P31) https://www.wikidata.org/wiki/Property:P31 value; (5) includes an API endpoint at /api/quickstatements; and (6) uses the new multiple language system https://www.wikidata.org/wiki/Help:Default_values_for_labels_and_aliases for the titles. Eventually, though, there should be only one *bhl2wiki* tool. Merging the two will need some coordination and will take more time than this quick workaround, so I apologize for any confusion in the meantime.
- *The Wikimedia Hackathon*: I am joining the Wikimedia Hackathon https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2025 from May 2nd to May 4th in Istanbul, Turkey. It should be fun. I still haven't picked a hackathon project. I might leverage the gathering of knowledgeable tech-savvy Wikimedians to try and start the tool for direct upload of images from BHL --> Commons. There are many possible projects, though, and other ideas may appear!
- *Cleaning code*: I kept working on cleaning the code for the structured data uploads https://github.com/lubianat/bhl_sdc_data_curation, making it more readable and reusable in the future. The idea is that after this Wikimedian-in-Residence contract is finished, the script is ready for bite-sized, fun, volunteer-driven contributions (by myself and others).
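For those curious about the QuickStatements generator item above, here is a minimal sketch of the kind of output such a tool could produce. It is not the code of the actual tool: the function name, the example title, the bibliography ID and the author QID are all made up for illustration, and I am assuming that authors are added with author (P50) and that the default-values label is written with the `Lmul` command.

```python
def build_quickstatements(title, bibliography_id, author_qids=()):
    """Build QuickStatements V1 commands (tab-separated) for one new item.

    Hypothetical helper, not the deployed tool. Assumes the title
    contains no double quotes, since escaping is not handled here.
    """
    lines = [
        "CREATE",                                # create a new Wikidata item
        f'LAST\tLmul\t"{title}"',                # label in the "mul" default language (assumed syntax)
        "LAST\tP31\tQ47461344",                  # instance of: written work
        f'LAST\tP4327\t"{bibliography_id}"',     # BHL bibliography ID
    ]
    # One statement per author already reconciled to Wikidata (P50 is an assumption).
    lines += [f"LAST\tP50\t{qid}" for qid in author_qids]
    return "\n".join(lines)


# Illustrative values only: placeholder title, ID and QID.
print(build_quickstatements("Example title", 12345, ["Q4115189"]))
```

The resulting text can be pasted into QuickStatements for review before running, which keeps a human in the loop for every batch.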
And that is it! Thank you again for reading!
Have a great week,
Tiago
*——————————————————————————* *Tiago Lubiana* *Wikimedian-in-Residence, Biodiversity Heritage Library https://www.biodiversitylibrary.org/*
*tiago.bio.br https://tiago.bio.br*