Hello everyone,
I'm excited to invite you to the 32nd edition of the DCW Conversation Hour,
featuring Abbad Diraneyya, a long-time Wikimedian, writer, and advocate for
open knowledge. Active in the Wikimedia movement since 2009, he has
authored *The Story of Wikipedia in Arabic*, a Creative Commons–licensed
work that documents the growth of Arabic free knowledge online, and is
currently building expertise as a Conversational AI Designer.
In this session, we will explore how artificial intelligence is reshaping
knowledge production and consumption, the risks of AI hallucinations, and
the urgent need for robust fact-checking. Drawing on his experiences across
Wikimedia, authorship, knowledge strategy, and AI design, Abbad will guide
us through critical questions on how communities can safeguard knowledge
integrity in the age of AI.
The Conversation Hour is scheduled for Sunday, 31 August 2025, at 13:30 UTC
(7:00 p.m. IST).
We have integrated registration directly into the event page; the link is
provided below:
https://dcwwiki.org/dc-o8
We look forward to your participation. Please bring along your questions!
Kind regards,
Ariba
Deoband Community Wikimedia
On Mon, 25 Aug 2025 at 18:43, Leppert, Greg <gleppert(a)law.harvard.edu> wrote:
> I fear I should have mentioned that our goal isn’t to replicate or strictly adhere to existing movements.
"Public domain" isn't a movement.
If the nearly 1M books in your collection are public domain, that
status is (subject only to a massive and highly improbable change in
various laws) immutable.
> Given that I’m new to the list, I wouldn’t be surprised to learn that that means my original message was off topic or, at the least confusing.
Why would you think anyone is confused? Your message seemed very clear.
--
Andy Mabbett
https://pigsonthewing.org.uk
Hi Greg (CCing the "Wikimedia & GLAM collaboration" mailing list),
First, as there has been no reaction here yet: Congrats to you and Harvard
Law School Library on this release! A dataset of one million
high-quality-OCR public domain books sounds very impressive.
However, your message here, and in particular its highlighting of
*"time-bounded Terms of Service that attempts to privilege open and
noncommercial actors"*, gives the distinct impression that you are unaware
of some central aspects
of Wikipedia and the Wikimedia movement, or indeed the wider free-culture
movement as well. While the Wikimedia Foundation is indeed a nonprofit
organization, and Wikipedia and the other Wikimedia projects are indeed
noncommercial, they have never accepted content licenses or terms that are
confined to "open and noncommercial actors". So let me link some
explanatory material:
The Wikimedia Foundation's licensing policy
<https://foundation.wikimedia.org/wiki/Resolution:Licensing_policy> (which
governs the content on Wikipedia and all other Wikimedia projects) relies
on *a definition of "free content" that excludes licenses limited to
noncommercial usage*, like your terms are. Summarizing the rationales for
this long-standing decision would go too far here - if you are interested
in those, this
<https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org…>
might be a good starting point. But to highlight one well-known problem
with such licenses (in particular the -NC variants of the Creative
Commons licenses), because it may help to illustrate some especially
problematic restrictions that you/your lawyers attempt to impose: People
have found out time and again that it is difficult to actually define
commercial usage, in a way that doesn't have unintended consequences. (E.g.
could a hobbyist blogger be sued for using an NC-licensed image because her
blog features some Google ads?) Creative Commons even ran a whole study in
an attempt to retroactively clarify such boundaries.
But in any case, despite these well-documented complications, legal
restrictions about the commercial *usage* of particular material still seem
more straightforward to figure out than the *restrictions on "intent" and
"affiliation" of the *user** that you (or Harvard's lawyers?) try to impose
in the terms of use for this release
<https://huggingface.co/datasets/institutional/institutional-books-1.0>:
"Open-source projects and other public-use efforts are welcome, even if
they may indirectly support commercial use, so long as they are
unaffiliated with commercial actors or intent."
Your requirement that an open source project must not even be "affiliated"
with "commercial ... intent", would likely exclude, say, the majority of
widely used (e.g. by Wikimedia organizations
<https://meta.wikimedia.org/wiki/FLOSS-Exchange>) open source software
projects, which are frequently either maintained by a commercial company,
or by volunteers who also have a related day job as developer or may offer
paid support. Even the most anticapitalist purists in the free software
movement shy away from such restrictions in their licenses.
In any case, we can be pretty sure that your clause rules out the Wikimedia
Foundation, as it is not just "affiliated" with a commercial actor but has
one directly incorporated as a subsidiary, namely the for-profit Wikimedia
LLC. You don't seem to be aware of this, given that you came here with the
apparent impression that an offer to "privilege open and noncommercial
actors" may enable a cooperation.
The second clause of your terms
<https://x.com/tilmanbayer/status/1933311788688552165> (*"No
Redistribution"*) is likewise a non-starter for "open actors" - it is
almost the definition of non-open.
I do realize of course that there will be many AI/ML folks on HF and
elsewhere who are happy to use such a dataset while blissfully ignoring such
attempts to impose restrictions on public domain content, perhaps assuming
- possibly correctly - that you didn't think these terms of use through
very thoroughly and are thus unlikely to enforce them, or who are simply
not yet as familiar with the long-term effects of such legal footguns as
Wikimedians and FLOSS developers have become over many years. That said,
I've seen your terms cause consternation in the open AI/ML world too, e.g.
on the EleutherAI Discord.
You should also be aware that in the history of the Wikimedia movement
there have been some ugly *legal disputes with GLAMs* (galleries,
libraries, archives and museums, i.e. organizations like yours) who *attempted
to restrict reproduction of public domain works* in their possession with
similar rationales (i.e. an alleged need to extract revenue to refinance
digitization efforts or such, which I hear echoing in your vague remarks
about "sustainability", "ecosystem", etc.). Two examples:
- https://en.wikipedia.org/wiki/National_Portrait_Gallery_and_Wikimedia_Found…
- https://en.wikipedia.org/wiki/Reiss_Engelhorn_Museum#Wikimedia_lawsuit
(While that museum prevailed in court against the Wikimedia Foundation, the
EU Copyright Directive subsequently made such assertions of copyright over
faithful reproductions of public domain works impossible.)
I'm not saying that the Institutional Books project is likely to become
similarly contentious (if only for the simple reason that Wikimedians have
long already been importing the same underlying Google Books scans
<https://commons.wikimedia.org/wiki/Category:Scans_from_Google_Books>,
often to do their own OCR and proofreading on Wikisource). I'm just trying
to help you understand that the restrictions on public access that you
attempt to impose here under the label of "public-interest leverage" - i.e.
your own institution retaining control over the content so you can monetize
it - are likely to be seen as unacceptable by the open content movement.
Another point you should be aware of is that while Wikimedia volunteers
spend a lot of time diligently enforcing the copyrights of third parties
(by deleting infringing material uploaded to Wikimedia projects), they
*explicitly reject
<https://commons.wikimedia.org/wiki/Commons:Non-copyright_restrictions>
enforcing non-copyright terms* imposed by such third parties.
Lastly, a question:
You say here
<https://www.institutionaldatainitiative.org/posts/open-call-for-collaborato…>
that you (the Institutional Data Initiative) are "one of the
Harvard-affiliated beneficiaries of OpenAI's new NextGenAI consortium". *Is
OpenAI also one of your customers* paying for privileged access to the
Institutional Books dataset (while your terms exclude the general public
from it for the time being)?
I'm not arguing that OpenAI is evil per se, or that academic institutions
and GLAMs must never collaborate with Big Tech companies. (After all,
Google Books, which your project is based on, was such a collaboration
between Big Tech and academic libraries in the first place. And many
Wikipedians can testify to its great value and usefulness for the general
public.) However, the obfuscatory language in your post here regarding
commercial partnerships and monetization ("garnering support from
commercial actors as we iterate on sustainability"), combined with vague
gesturing at a possible time-delayed free release at an undetermined point
in the future, doesn't exactly inspire trust in this matter. If the project
provides more transparent information about this question elsewhere, feel
free to provide pointers. It would also be interesting to learn how much
revenue the Institutional Data Initiative projects to derive from this
monetization of public domain works.
Regards, Tilman ([[User:HaeB]])
On Mon, Jul 7, 2025 at 7:32 AM Leppert, Greg <gleppert(a)law.harvard.edu>
wrote:
> Hi all. Great to meet you and thank you to Leila for inviting me to join
> the list. I’m the Executive Director of the Institutional Data Initiative<
> https://www.institutionaldatainitiative.org> (IDI) at Harvard and I
> wanted to share our recent data release—Institutional Books<
> https://www.institutionaldatainitiative.org/institutional-books>, a
> collection of nearly 1M public domain books, scanned at Harvard Library
> through the Google Books project.
>
> IDI works with libraries and other knowledge institutions to publish their
> collections as data with the goal of establishing public-interest leverage
> in the AI ecosystem while improving collections for traditional patron
> usage. With each project, we look for novel ways to structure and analyze
> the collection and set standards along the way. With Institutional Books,
> we tackled language analysis, topic classification, and OCR correction, and
> our technical report<https://arxiv.org/abs/2506.08300> has even more. We
> hope to evolve the collection over time and release new formats as we go,
> such as EPUB and Markdown.
>
> We’re also using this moment to experiment with a time-bounded Terms of
> Service that attempts to privilege open and noncommercial actors while
> garnering support from commercial actors as we iterate on sustainability.
> The goal is to eventually make the collection and all of its scans
> available under a more traditional open model.
>
> Thoughts, questions, and collaboration welcomed. We also have a Slack
> where we’re talking about this collection and others. Our next project is to
> dig in on a new collection of old newspapers, in collaboration with Boston
> Public Library, as we work toward building a global commons.
>
> —Greg
> _______________________________________________
> Wiki-research-l mailing list -- wiki-research-l(a)lists.wikimedia.org
> To unsubscribe send an email to wiki-research-l-leave(a)lists.wikimedia.org
>
Hello, Wikimedia GLAM folks!
Ahead of state and local elections this November, the Harvard Kennedy School (HKS) Library, in collaboration with Harvard Library's Information & Technical Services and UX & Discovery departments, is hosting two civically engaged, participatory events focused on the Political Buttons at HKS Collection<https://www.hks.harvard.edu/faculty-research/library-knowledge-services/col…>, which features nearly 3,000 buttons representing U.S. political history from 1904 through today.
The events will focus on improving collection metadata and investigating copyright status, enabling us to make collection items available on platforms beyond Harvard (e.g., ✨Wikimedia Commons✨) to expand the collection's reach and research potential. At the events, we'll also have information on how participants can register to vote in municipal elections taking place on November 4.
View event details and register:
* Research-A-Thon<https://libcal.library.harvard.edu/event/14759570>. In-person on Friday, October 17, 10:30 a.m. – 2:30 p.m. Identify dates associated with the buttons of candidates who have run for political office.
* Copyright-A-Thon<https://libcal.library.harvard.edu/event/14759577>. Virtual via Zoom on Tuesday, October 28, 10:30 a.m. – 2:30 p.m. Investigate the copyright status of candidate buttons with identified dates.
You are invited to participate in one (or more!) of three ways:
* Attend the event(s) yourself! Register using the links above.
* Spread the word about the events in your circles.
* Volunteer to support the events by testing participant instructions and/or staffing the event(s) day-of.
What would volunteering entail?
We are asking for your insight and support in one or both of two ways:
* By late September, review and test the draft documentation/instructions we plan to provide to participants. We estimate this will require approximately 2–3 hours, including a brief meeting.
* Help staff the events by answering participant questions and supporting their workflows. Time commitment varies depending on your availability but will involve attending a 45-minute onboarding meeting and staffing one or both 4-hour events.
If you are willing and able to provide support, please fill out this form to sign up<https://urldefense.proofpoint.com/v2/url?u=https-3A__harvard.az1.qualtrics.…>.
By participating in and/or supporting these events, you will:
* Help to enhance Harvard's digital collections metadata to promote open knowledge and expand access.
* Promote copyright literacy and practice one method for determining a work's copyright status.
* Promote civic engagement by providing local voter registration information.
* Explore the many political campaigns run in the U.S. over the past century.
Warm thanks for considering. We hope to see you at one or both events!
—
Chelcie Juliet Rowell (she, they)
Associate Head of Digital Collections Discovery
UX & Discovery, Harvard Library
Hello GLAM Wiki community,
It gives us immense pleasure and excitement to announce that the *Wikisource
Reader mobile application
<https://play.google.com/store/apps/details?id=org.cis_india.wsreader>* is
now available for Android users on the Google Play Store. It lets them read
books which are completely proofread and transcluded on the digital library
websites of Wikisource. The GitHub repo is here
<https://github.com/cis-india/wikisource-reader> and a website for the app
<https://cis-india.github.io/wikisource-reader-app/> has also been created.
The metadata of the books is fetched directly from Wikidata and strictly
follows the bibliographical book model
<http://www.wikidata.org/wiki/Wikidata:Books> of Wikidata. So to appear in
the app, any completed book must fulfill the three mandatory criteria
mentioned below. Each book needs to have a corresponding:
- Wikidata item
- Wikisource sitelink with a proofread or validated badge
- P1957 <https://www.wikidata.org/wiki/Property:P1957> property set
on the item.
A sample Wikidata item of one such book is here
<https://www.wikidata.org/wiki/Q51614301>. A sample SPARQL query to list
the books to be displayed in the app for one specific language is here
<https://w.wiki/F4Av>. Detailed documentation can be found on this Meta-Wiki
page <https://meta.wikimedia.org/wiki/Wikisource_reader_app/Selection>.
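For anyone who wants to check a book locally before it shows up in the app, the three criteria above could be sketched roughly as follows. This is only an illustration, not the app's or the WSindex API's actual code. It assumes the item is given as a Python dict in the JSON shape returned by the Wikidata entity API (with "id", "sitelinks" and "claims" keys). The function name, the sample item and the badge identifiers are hypothetical placeholders; the real badge item IDs should be looked up on Wikidata.

```python
from typing import Iterable


def eligible_for_reader(entity: dict, site: str, accepted_badges: Iterable[str]) -> bool:
    """Check one Wikidata entity dict against the three criteria listed above.

    `site` is a Wikisource site key such as "bnwikisource".
    `accepted_badges` holds the badge item IDs to accept (supplied by the
    caller; this sketch does not hard-code the real IDs).
    """
    accepted = set(accepted_badges)

    # Criterion 1: a Wikidata item exists (it carries a Q-identifier).
    if not str(entity.get("id", "")).startswith("Q"):
        return False

    # Criterion 2: a sitelink for this Wikisource with at least one accepted badge.
    sitelink = entity.get("sitelinks", {}).get(site)
    if sitelink is None or not accepted.intersection(sitelink.get("badges", [])):
        return False

    # Criterion 3: at least one P1957 statement on the item.
    return bool(entity.get("claims", {}).get("P1957"))


# Hypothetical sample item; the IDs and title below are placeholders.
sample = {
    "id": "Q00000",
    "sitelinks": {
        "bnwikisource": {"title": "Example book", "badges": ["Q_PROOFREAD_BADGE"]}
    },
    "claims": {"P1957": [{"mainsnak": {}}]},
}
print(eligible_for_reader(sample, "bnwikisource", {"Q_PROOFREAD_BADGE", "Q_VALIDATED_BADGE"}))
```

The badge IDs are passed in rather than hard-coded so that the check stays correct if the accepted set is adjusted; the SPARQL query linked above remains the authoritative selection.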
The app currently hosts more than 7300 books in 22 Wikisource language
editions: Assamese, Bangla, Catalan, Czech, Danish, English, French,
Hindi, Indonesian, Italian, Javanese, Marathi, Malay, Polish, Punjabi,
Spanish, Sundanese, Swedish, Tamil, Telugu, Ukrainian and Vietnamese. More
languages will be added in future releases once at least 1-5 books in each
fulfill the mandatory criteria.
The app has the following features:
1. Clean and beautiful user interface
2. Dark and Light theme
3. Option to browse free e-books in multiple languages
4. Option to import books from non-Wikisource external sources
5. Option to filter books in different literary forms
6. Option to download books for offline access
7. Option to store, read and delete books from library
8. Option to jump through chapters
9. In-built e-book reader
10. Customization of font color, size and weight
11. Light, Dark, Sepia and customized color mode for reading
12. Adjustment of page margins
13. RTL and LTR support
14. System default typeface along with options for Literata, Sans Serif,
iA Writer Duospace, AccessibleDfA and OpenDyslexic typefaces
15. Option to choose among left, right and justified text alignments
16. Customization of line height, paragraph indent, paragraph spacing,
word spacing and letter spacing
17. Options to highlight, underline and annotate texts
18. Option to bookmark
19. Text to Speech in different languages with customizable speed and
pitch
The app is dependent on
- WSindex API <https://wsindex.toolforge.org/books/>, which was built
specifically to fetch books for the app. The source code is here
<https://codeberg.org/ph4ni/wsindex>.
- WS export <https://wikisource.org/wiki/Wikisource:WS_Export> tool to
generate Epubs
- Myne app <https://github.com/Pool-Of-Tears/Myne/> by Shivam
<https://krsh.dev/> for user interface
- Readium mobile <https://github.com/readium/kotlin-toolkit> by The
European Digital Reading Lab (EDRLab)
<https://www.edrlab.org/software/readium-mobile/> for the actual reading
experience.
The development of the app was initially financially supported by the Centre
for Internet and Society <https://meta.wikimedia.org/wiki/CIS-A2K> until
March 2025, which also hosts the app on the Google Play Store now. The app is
now developed and maintained in a volunteer capacity, and we welcome all open
source developers and experienced Wikisourcerers to contribute to its
future development.
We sincerely thank everyone who was involved in supporting the app in
different ways, without whom this app could not have been developed.
Regards,
Sai Phanindra and Bodhisattwa
(both in a volunteer capacity)