Hello, everyone!
Here are the status updates for the past 2 weeks, transcluded from
https://meta.wikimedia.org/wiki/BHL/Our_outcomes/WiR/Status_updates/2025-03….
Thanks for reading!
07 March 2025 - 21 March 2025
As the residency nears its end (*only a month and a half left*), the
focus for the coming weeks will be on making workflows future-proof,
providing high-quality training material and documenting everything. If you
have any requests or ideas, this is a great time to bring them up!
[image: image.png]
South-American Orchids in the Toolforge app BHL-Wiki-GBIF Image Gallery,
reachable at
https://bhl-gallery.toolforge.org/?taxonKey=7689&continent=SOUTH_AMERICA
General updates
- The thoughts on how to show the value of Structured Data on Commons
led to a *portal/gallery for BHL images with GBIF filters
<https://bhl-gallery.toolforge.org/>*! It uses the GBIF API to navigate
taxa and locations for the species in the 18.6k depicts statements on BHL
images on Commons <https://w.wiki/DVLX>.
- The next few weeks will see the *1Pic1Bio events, promoted by the
Wikimedia Foundation* to increase usage of BHL images on Wikipedia. The
events will happen with live translation to English, but natively in Spanish
on March 26 <https://meta.wikimedia.org/wiki/Event:1Pic1Bio_(Spanish)>,
in French on March 28
<https://meta.wikimedia.org/wiki/Event:1Pic1Bio_(French)> and in Portuguese
on April 2 <https://meta.wikimedia.org/wiki/Event:1Pic1Bio_(Portuguese)>.
Anyone interested may join the events by clicking on those links.
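The gallery links shared in this update follow a simple pattern: a GBIF taxon key plus an optional continent filter. Below is a minimal sketch of how such a link could be built. The parameter names `taxonKey` and `continent` are taken from the links above; the helper name `gallery_link` is hypothetical, and whether the app accepts a taxon key without a continent is an assumption.

```python
from urllib.parse import urlencode

GALLERY_BASE = "https://bhl-gallery.toolforge.org/"


def gallery_link(taxon_key, continent=None):
    """Build a shareable gallery URL (hypothetical helper, not part of the app)."""
    params = {"taxonKey": taxon_key}
    if continent:
        # Continent values are written in upper case in the shared links,
        # e.g. SOUTH_AMERICA or AFRICA.
        params["continent"] = continent
    return GALLERY_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Orchids in South America, the example link from this update.
    print(gallery_link(7689, "SOUTH_AMERICA"))
```

Calling `gallery_link(7689, "SOUTH_AMERICA")` reproduces the orchid link shown above.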
Technical updates
- *Bot for plant depictions*ː A bot/automatic operation was approved
<https://commons.wikimedia.org/wiki/Commons:Bots/Requests/TiagoLubianaBot>
to infer depicts (P180) <https://www.wikidata.org/wiki/Property:P180>
statements from Commons categories. The bot script finished running,
updating 55k files containing botanical illustrations, including at
least 8.1k
BHL images <https://w.wiki/DVLj>. These had been previously manually
catalogued to particular species by the Commons community in a tremendous
tour-de-force. Next steps: (1) try to add BHL Page IDs to the BHL images
missing them and (2) try to reproduce this for other taxa!
- *BHL images missing page ID:* This maintenance query for images in the
BHL category that have a depicts statement, but miss a page ID
<https://commons.wikimedia.org/w/index.php?search=incategory%3A%22Files+from…>
may help prioritize targets for adding Structured Data. These files are just
one BHL page ID (P687) <https://www.wikidata.org/wiki/Property:P687> away
from appearing in tools like the BHL Image Explorer
<https://bhl-gallery.toolforge.org/>.
- *WMF's Research Fund grantː* The WMF's Research Fund grant
<https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…>
is open for applications and the submission deadline is April 16, 2025. I
did write some thoughts on *Mapping Indigenous and Common Names in Latin
American Biodiversity Texts
<https://docs.google.com/document/d/1bPQSx7fldXFi2zlQbt_73CCfLdnnGguRsfUpDEV…>*
for this grant. The fund has changed its scope a bit, though, and now seems
more focused on social science and computer science inquiries yielding
generalizable insights into the Wikimedia ecosystem. I likely *won't* send
an application, but if anyone thinks differently, just let me know!
- *Adding public domain statements to images: *After conversation in the
BHL-Wiki working group meeting (thank you, Bianca, for bringing up the
subjectǃ), we decided to use the public domain (Q19652)
<https://www.wikidata.org/wiki/Q19652> value on Commons for works deemed
Public Domain on Commons. The Commons community has very strict
requirements on copyright
<https://commons.wikimedia.org/wiki/Commons:Copyright_rules>, so this
decision was harder than it may seem!
- *Removing wrong CC-BY statements:* We also decided to remove
inaccurate CC-BY statements, a decade-long legacy from Flickr limitations
at the time. The Structured Data script is removing those
<https://commons.wikimedia.org/w/index.php?title=File:Histoire_naturelle_des…>
from structured data, and the new best practices are reflected in the
Minimum BHL Image Metadata Model
<https://docs.google.com/spreadsheets/d/1ocqDQBFaKAQvPsP3HMlrh52faiHiaDU-D9P…>,
which has now reached v0.1.6. The statements still need to be removed from
the Wikitext, though. Changing Wikitext in batch would need different bot
code, but seems doable.
- *Technical details of the Image Gallery:* The BHL Image Explorer or
BHL Image Gallery — I am still looking for a name — (source code here)
<https://github.com/lubianat/bhl-gallery> started as an all client-side
page in JavaScript, but after quite some tech work, it is now a
simple-but-functional Flask application hosted on Toolforge
<https://bhl-gallery.toolforge.org/>. It has some fun perks, like
shareable links for particular taxa or locations (e.g. parrots from
Africa
<https://bhl-gallery.toolforge.org/?taxonKey=1445&continent=AFRICA>). It
is still in testing, so expect some bugs (and not of the good,
*Coleoptera* kind). If you find any, do let me know!
- *Continued uploads of structured metadataː* The directed structured
data uploads
<https://docs.google.com/spreadsheets/d/1YhMSb_iBylJaWPX37kZbVzdyWoFidT9a31P…>
continued, now covering >7.5k images, about 2.5k more than in the last
report! The code for uploads is available at
github.com/lubianat/bhl_sdc_data_curation. I am improving the docs, but
it is a somewhat complex workflow, as there are a lot of corner cases. I
will still try and refactor and make it usable for other tech-savvy
volunteers in the future. I considered making a web app but that would take
a lot of time to do well.
- *BHL Day Workshopː* It will soon be BHL Day (April 9-10
<https://about.biodiversitylibrary.org/get-involved/events/bhl-day-2025/>)
and Siobhan and Sabine will be in Berlin to discuss all kinds of nice Wiki
things! I'll attend remotely and share more news in the next update, on
April 4 :)
- *Internet Archive and Machine Learningː* I had a quick call with Mike
Trizna about BHL and ARCH (Archives Research Compute Hub)
<https://webservices.archive.org/pages/arch/> in preparation for a
meeting to happen on March 25th with Karl Blumenthal, from the ARCH team.
He brought up some good ideas and told me a bit about what he and others at
the Smithsonian have been doing. Let's see what we can do with ARCH!
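The maintenance query mentioned above (files in a category that have a depicts statement but no BHL page ID) can be expressed with the `incategory:` and `haswbstatement:` search keywords available on Commons. The sketch below only builds the search string and URL; the function names are hypothetical and the category name is left as a parameter, since the exact category used in the linked query is not reproduced here.

```python
from urllib.parse import urlencode

COMMONS_SEARCH = "https://commons.wikimedia.org/w/index.php"


def missing_page_id_query(category):
    """Search string: has depicts (P180) but lacks a BHL page ID (P687)."""
    return f'incategory:"{category}" haswbstatement:P180 -haswbstatement:P687'


def search_url(query):
    """Turn a search string into a Commons search URL."""
    return COMMONS_SEARCH + "?" + urlencode({"search": query})


if __name__ == "__main__":
    # "Example category" is a placeholder, not a real Commons category.
    print(search_url(missing_page_id_query("Example category")))
```

The leading minus sign on the second `haswbstatement:` excludes files that already carry a P687 statement, which is what makes the result a to-do list.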
That is it, and once again thank you for reading it through! If you have any
comments or questions, just let me know. See you soon!
Tiago
*——————————————————————————*
*Tiago Lubiana*
*Wikimedian-in-Residence, Biodiversity Heritage Library
<https://www.biodiversitylibrary.org/>*
*tiago.bio.br <https://tiago.bio.br>*
Hi, everyone, here is what happened in the last 2 weeks of the
Wikimedian-in-Residence position:
(adapted from
https://meta.wikimedia.org/wiki/BHL/Our_outcomes/WiR/Status_updates/2025-03…)
21 February 2025 - 07 March 2025
*General updates*
- A milestone was reached: *I have added Structured Data for over 5,000
files on Commons,* the target outlined in the Statement of Work for
the position! That is not the end of it, though: the workflow is getting
faster and more scalable, as the last manual bits are automated. It now
integrates 3 APIs (BHL, Flickr and GBIF) to add metadata to Commons.
- Impressively, other users (well, mostly Siobhan!) have also contributed
SDC to over 5,000 other items using OpenRefine and other semi-automatic
workflows. That means *over 10,000 image files with SDC!* Congrats to
the BHL-Wiki working group! (See this query for the updated count:
https://w.wiki/DLB5)
<https://meta.wikimedia.org/wiki/File:Animalia_nova_sive_species_novae_testu…>
The big-headed Amazon river turtle
<https://commons.wikimedia.org/wiki/File:Animalia_nova_sive_species_novae_te…>,
identified in Flickr as *Emys tracaxa*, now has the metadata on Commons
pointing to *Peltocephalus dumerilianus
<https://www.wikidata.org/wiki/Q2716575>*, a match made swiftly by the GBIF
species-matching API <https://techdocs.gbif.org/en/openapi/v1/species>.
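To illustrate the kind of synonym resolution described in the caption above, here is a minimal sketch built around the GBIF species match endpoint. It is not the code used in the upload workflow. The field names `matchType`, `usageKey` and `acceptedUsageKey` follow the GBIF v1 match response as I understand it and should be checked against the API documentation; the function names are hypothetical, and no network call is made here.

```python
from urllib.parse import urlencode

GBIF_MATCH = "https://api.gbif.org/v1/species/match"


def match_url(name):
    """URL for matching a scientific name against the GBIF backbone."""
    return GBIF_MATCH + "?" + urlencode({"name": name})


def accepted_key(match):
    """Return the key of the accepted taxon from a parsed match response.

    Assumes synonyms carry an `acceptedUsageKey` pointing at the accepted
    taxon, and that accepted names only carry `usageKey`.
    """
    if match.get("matchType") == "NONE":
        return None  # GBIF found no usable match
    return match.get("acceptedUsageKey") or match.get("usageKey")


if __name__ == "__main__":
    # Name as tagged on Flickr; fetching and parsing the JSON is left out.
    print(match_url("Emys tracaxa"))
```

The key returned by `accepted_key` could then be looked up in Wikidata to find the item to use in the depicts statement.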
*Technical updates*
- *Bibliography ids and BHL DOIsː* Leveraging the new Wikidata batch
upload tool, Quickstatements V3, I have run a batch to add BHL Title IDs
for Wikidata items with BHL DOIs
<https://qs-dev.toolforge.org/batch/256/>. There were 728 of those. I am
afraid they were left off the last Wikidata roundtripping
<https://blog.biodiversitylibrary.org/2023/02/round-tripping-persistent-iden…>
events, because I don't see the links on BHL. Maybe it is a good
opportunity to re-run the roundtripping scriptsǃ
- *Better code:* Reduced reliance on legacy wikitext information, now
using only the "pageid" and pulling all the other pieces of metadata from
the BHL API (with some local caching to reduce calls).
- *Updates to model and scriptsː* The Minimum BHL Image Data Model
<https://docs.google.com/spreadsheets/d/1ocqDQBFaKAQvPsP3HMlrh52faiHiaDU-D9P…>
has reached v0.1.5 and is maturing towards supporting more automation. The
script supported by this model leverages the GBIF API to detect the
accepted name for synonyms that may appear as a Flickr tag or in the BHL
OCR. The model now allows references such as inferred from GBIF scientific
name matching service (Q132907038)
<https://www.wikidata.org/wiki/Q132907038>, so it is clear where each
match came from.
- *QLever, a fast alternative to the Commons Query Service:* QLever
<https://qlever.cs.uni-freiburg.de/wikimedia-commons> is an alternative
backend for handling structured data from Wikidata/Commons which is much
faster for the majority of cases. I have added an issue to their tracker
<https://github.com/ad-freiburg/qlever/issues/1836> asking about new
Structured Data updates + opportunities for collaboration as a BHL WiR.
They have promptly replied and are updating the data, a good sign for a
potential partnership.
- *Copyright statements*ː As imports from Flickr had, in some cases,
elements of accidental Flickrwashing
<https://commons.wikimedia.org/wiki/Commons:License_laundering>, I
started adding no known copyright restrictions (Q99263261)
<https://www.wikidata.org/wiki/Q99263261> as a value for the copyright
status (P6216) <https://www.wikidata.org/wiki/Property:P6216> property
to counterbalance the "copyrighted" claims. This was already in the Data
Model
<https://docs.google.com/spreadsheets/d/1ocqDQBFaKAQvPsP3HMlrh52faiHiaDU-D9P…>,
as an "optional" field, since cleaning this information up is complex, and
so is copyright. In version v0.1.3 I bumped it from "optional" to
"recommended" (example diff
<https://commons.wikimedia.org/w/index.php?title=File:A_natural_history_of_b…>
).
- *BHL-to-Commons upload toolː* We discussed in the BHL-Wiki Working
Group meeting the possibility of a direct BHL-to-Commons upload tool,
parallel to Flickr uploads. Of note, this seems to have been done in the
past, as some BHL files in Commons are not in Flickr, e.g.
https://commons.wikimedia.org/wiki/Category:The_natural_history_of_the_Tine…
and
https://commons.wikimedia.org/wiki/Category:Contributions_%C3%A0_la_connais….
This won't be done as part of this Wikimedian-in-Residence scope of work,
but I am personally excited about the idea of making this real, perhaps as
part of my volunteer activities in the future, slow but steady.
- *Inferring depicts from categoriesː* A Bot Request
<https://commons.wikimedia.org/wiki/Commons:Bots/Requests> is up on
Wikimedia Commons for inferring the depicts (P180) statements from the
community-curated categories. This will help add depicts statements to
>10,000 botanical illustrations, a large part of them from BHL.
- *A GBIF-BHL-Commons portal for searching images.* With the robust
metadata on Commons, I started thinking about how we can show the world that
good metadata matters not only in theory, but in practice. One option is a
simple tool that uses GBIF to select taxa based on, e.g., geographic
location or a taxonomic group (like "birds") and retrieves BHL images from
Commons that match those taxa. The backend could either query Commons via,
say, QLever, or *we could release a dataset (on Zenodo, figshare or
similar) with all BHL image files on Commons, with metadata* and P180
statements + identifiers derived from Wikidata (GBIF, iNaturalist). I like
the idea of
the dataset because it creates a persistent record of the work we are
doing, complementing the live APIs.
- *Internet Archive and Machine Learningː* JJ shared an open call for
ideas to work together with the ARCH (Archives Research Compute Hub)
<https://webservices.archive.org/pages/arch/>, and I have applied to
partner up to test a few things, such as dedicated OCR for text in names
(detecting species names and author contributions) and improving page type
detection. It was well received, they replied promptly, and we should have
a first meeting on March 25 to evaluate the feasibility. JJ also put me in
contact with Mike Trizna, who has been working on similar ideas for a while
now and is already providing valuable feedback. If you also have any ideas
or suggestions, just let me knowǃ
- *Page Types as bottleneck for automationː* The current bottleneck for
a complete batch upload is the mismatch between the BHL Image Data Model
and the current curation of page types in BHL. The biggest issue is that
both Photographs and Drawings are labelled as "Illustration" on BHL, while we
try to differentiate the two in the Commons Data Model. Maybe with the ARCH
project we can try and bridge that gap.
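The idea of inferring depicts (P180) statements from community-curated categories, mentioned in the list above, can be sketched as a simple lookup. This is only an illustration under stated assumptions, not the bot's actual code: the function name, the category-to-item mapping and the way existing statements are passed in are all hypothetical.

```python
def infer_depicts(file_categories, category_to_qid, existing_depicts=()):
    """Propose depicts (P180) values for one file from its categories.

    category_to_qid maps a category name to a Wikidata item ID
    (hypothetical input; a real workflow would need to build it).
    Values already present on the file are skipped.
    """
    existing = set(existing_depicts)
    proposed = []
    for category in file_categories:
        qid = category_to_qid.get(category)
        if qid and qid not in existing and qid not in proposed:
            proposed.append(qid)
    return proposed


if __name__ == "__main__":
    # Q2716575 is the Wikidata item for Peltocephalus dumerilianus,
    # as linked earlier in this update.
    mapping = {"Peltocephalus dumerilianus": "Q2716575"}
    print(infer_depicts(["Peltocephalus dumerilianus", "Unmapped category"], mapping))
```

Categories without a mapped item are ignored, so the sketch only proposes statements where the community curation already points to a specific taxon.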
And that is itǃ Thank you for reading this update and see you soonǃ
Cheers,
Tiago
[apologies for cross-posting]
Dear BHL Partners, Staff, Committees, and Working Groups,
The Call for Speakers for BHL Day 2025 is now open! In conjunction with the BHL Annual Meeting, the Museum für Naturkunde Berlin will host a public symposium on 9 April 2025 with the theme: Bridging Data and Nature: Connecting Information, Technology, and Biodiversity. For complete details, visit https://s.si.edu/bhlday2025
We invite our BHL colleagues to present your work on biodiversity data and literature at BHL Day 2025. The events take place over two days, 9-10 April, in a lunch-to-lunch format that allows additional time for interactive engagement with the extended BHL community, biodiversity community, and invited guests. The 15-20 minute talks will be held in-person and streamed live during the symposium on the afternoon of 9 April. On the morning of 10 April, in-person workshops will be held for participants to discuss topics from across the BHL consortium. Selected speakers may be asked to lead breakout sessions focused on their symposium talks. The final program for the symposium and workshops will be announced in early March.
Submission process:
* Submit your proposal <https://forms.gle/QzW533GuRuerbdQN9> by 21 February 2025.
* The planning committee and BHL Executive Committee will make the final selection of speakers.
* Preference will be given to speakers attending in person. Speakers may also present virtually or via recording.
* Speakers will be notified of selection by the end of February.
Colleen Funkhouser
Program Manager, Biodiversity Heritage Library
she/her/hers
phone: 202.633.1709
funkhouserc(a)si.edu<mailto:funkhouserc@si.edu>
biodiversitylibrary.org<https://biodiversitylibrary.org/>
Smithsonian Libraries and Archives
10th St. & Constitution Ave. NW
PO Box 37012 MRC 154
Washington, D.C. 20013-7012
librariesarchives.si.edu<https://librariesarchives.si.edu/>
Hi everyone,
I’m sharing this Call for Speakers on behalf of Colleen Funkhouser from the Biodiversity Heritage Library. I gave a presentation at the BHL Annual Meeting Public Day 2024 and thoroughly enjoyed the experience. If you are interested, I encourage you to submit a proposal. I’m also happy to discuss this with anyone who wants to know more.
FYI - potentially of interest, feel free to forward on.
________________________________
From: BHL-Techteam <BHL-TECHTEAM(a)SI-LISTSERV.SI.EDU> on behalf of Funkhouser, Colleen <0000035f8c42e8cb-dmarc-request(a)SI-LISTSERV.SI.EDU>
Sent: Monday, February 3, 2025 4:19 PM
To: Listserv BHL-Techteam <BHL-TECHTEAM(a)SI-LISTSERV.SI.EDU>
Subject: Call for Speakers: BHL Annual Meeting Symposium