Hello everyone,
I’m delighted to invite you to the 36th edition of the DCW Conversation
Hour, featuring Pavan Santhosh, an experienced Wikimedian and open
knowledge practitioner who has been actively contributing to the Telugu
Wikipedia and its sister projects since 2015. Currently, he is the Program
Manager for the Open Knowledge Initiatives at IIIT Hyderabad, continuing
his work at the intersection of open knowledge, communities, and
institutions.
In this conversation, Pavan will focus on reimagining how Wikimedia
communities engage and collaborate beyond the movement, moving past
traditional outreach models toward more meaningful and reciprocal
partnerships. He will also emphasize the importance of understanding the
needs, priorities, and contexts of external communities—rather than merely
inviting them to contribute—so that collaborations are built on shared
goals and mutual benefit.
The Conversation Hour is scheduled for Wednesday, 24 December 2025, at
13:30 UTC (7:00 p.m. IST).
You can find event details and registration on the page below:
https://dcwwiki.org/dcw-3Q
We look forward to your participation and thoughtful engagement in this
important discussion.
Kind regards,
Ariba
Deoband Community Wikimedia
Hello everyone,
I am Quinn (User:SuperGrey) from Chinese Wikisource (zh.wikisource.org). I am writing to request advice and precedent from the wider Wikisource community and the Wikimedia Foundation regarding a proposed large-scale import of Chinese court judgments from the national database known as China Judgments Online (中国裁判文书网, often abbreviated as CJO).
I would like to begin with some background, because many non-Chinese Wikimedia contributors may not be aware of how significant CJO has been for judicial transparency in China and how sharply access to it has been reduced in recent years.
China Judgments Online was launched in 2014 by the Supreme People’s Court (SPC) as a major transparency initiative. For nearly a decade, courts across the country uploaded tens of millions of decisions, creating what was widely regarded as one of the world’s largest publicly accessible judicial databases. At its peak, CJO hosted over 140 million documents and received tens of billions of page views. Researchers inside and outside China used the site extensively to study judicial behavior, local governance, criminal justice, and institutional changes.
However, since around 2021, and especially in 2023–2024, the Chinese government has significantly reversed this openness. Multiple independent investigations and media reports have documented the systematic removal of previously public judgments, particularly those that reflect poorly on local authorities, expose procedural misconduct, involve politically sensitive issues, or contradict preferred political narratives. In late 2023, leaked SPC documents revealed instructions to migrate judgments into a new internal-only database accessible solely within the court system, while sharply reducing what remains publicly visible. Studies have shown that vast numbers of cases have already disappeared from public view. Major news organizations such as MIT Technology Review, Radio Free Asia, the South China Morning Post, and Reuters have all reported on this rollback of judicial transparency:
– https://www.technologyreview.com/2023/12/20/1085741/china-judgements-online…
– https://www.rfa.org/english/news/china/china-court-records-12142023132626.h…
– https://www.scmp.com/news/china/politics/article/3246067/china-cut-back-acc…
– https://www.reuters.com/world/china/china-vows-judicial-disclosure-after-ou…
For our purposes, the important point is this: CJO has removed or restricted access to large portions of its historical archive, including documents that were originally public, legally non-copyrightable under Chinese law, and crucial for understanding the functioning of China’s legal system. Many judgments that were once easily verifiable on the official site can no longer be checked against their original source. These documents are at risk of disappearing entirely from public access.
An independent archiving project, caseopen.org, has preserved a large HTML snapshot of CJO’s judgments spanning 2013 to October 2024. The maintainers of caseopen.org have donated this dataset to Chinese Wikisource. The files capture the “online version” as it originally appeared on CJO, including formatting and errors, and therefore represent a unique opportunity to preserve a historical record of China’s legal system prior to this wave of censorship and delisting. In practical terms, this may be the last comprehensive public snapshot that will ever exist.
On Chinese Wikisource, I have proposed importing this dataset through a bot (User:SuperGrey-bot). The local discussion, including technical details and code links, is here (in Chinese):
https://zh.wikisource.org/wiki/Wikisource:机器人#User:SuperGrey-bot
The scale of the corpus is extremely large: tens of millions of judgments, potentially more if we include non-judgment document types such as 裁定书 (ruling document) and 通知书 (notification document). We are planning a staged import, beginning with small test batches, then individual months, and only later the full corpus, once the community settles questions about formatting, titling, metadata, and scope.
Because this project involves politically sensitive material and has unusual archival value, and because the scale is unprecedented for our language Wikisource, I would greatly appreciate advice and precedent from the international community. This is not only a technical or organizational task; it is also a preservation effort. We are attempting to safeguard public domain legal documents that have been systematically removed from public access. Wikisource may be one of the last neutral, open, global platforms capable of preserving this historical record.
Given the potential size of the import, I would also appreciate input from the Wikimedia Foundation on any operational considerations. A multimillion-page import may affect storage, dumps, CirrusSearch indexing, and overall site performance. Before proceeding beyond small test batches, I would like to understand whether such an import is feasible within the current technical limits of Chinese Wikisource, and whether coordination with SRE or Cloud Services is recommended.
Specifically, I would like to ask for input on the following areas:
1. Scope and suitability
Have other Wikisources hosted similarly massive, uniform corpora of government or legal documents? How did you determine whether they fit the mission of Wikisource? Were there concerns about overwhelming the project or changing its character?
2. Verifiability and provenance
In our case, the source is an independent mirror of a government website that is now selectively removing documents. While Wikimedia projects have long preserved public domain government documents after originals were taken down or censored, I am unsure how Wikisource communities have handled this scenario in practice. Are mirrored datasets acceptable when the original public source has been altered or removed? How should we document provenance and authenticity for future readers?
3. Organizational and technical considerations
If we proceed, how should we structure this corpus so the project remains usable? Are there recommended practices for:
– titling, metadata, and Wikidata integration for legal documents,
– organizing millions of pages so they do not overwhelm categories and search,
– mitigating strain on job queues, dumps, and indexing,
– making future partial deletions or corrections feasible if political pressure or legal demands (e.g., DMCA takedown notices) ever arise?
4. Political and archival importance
Wikisource has historically preserved documents at risk of censorship or disappearance, whether due to authoritarian restrictions or institutional neglect. Do other communities have experience with politically sensitive archival projects where the preservation value itself was a central motivation?
At present, Chinese Wikisource is still deliberating basic formatting and policy questions. No large imports will be performed until a local consensus is clear. Although we are working from the independent caseopen.org snapshot rather than relying on ongoing availability of the official CJO site, the broader context is that public access to Chinese judicial decisions has already been substantially reduced in recent years. Because our dataset preserves a historical record that may not remain accessible through official channels, we believe this is an appropriate moment to seek broader input and learn from other Wikisource communities with similar archival experiences.
Thank you very much for your time, advice, and any examples or concerns you can share. Even understanding which questions we should be asking would be extremely helpful.
Best regards,
Quinn Gao (User:SuperGrey)
https://meta.wikimedia.org/wiki/User:SuperGrey
Hello Wikisourcerers,
We are happy to introduce Levy Muguro to the Wikisource community. He has
been selected as an Outreachy intern and will work on improving the
Wikisource reader app over the next couple of months.
The tasks related to the internship can be tracked here:
https://phabricator.wikimedia.org/T405593 . All tasks related to the
Wikisource reader app can be tracked on this workboard:
https://phabricator.wikimedia.org/project/board/8233/query/all/ . If you
want to request a new feature or report a bug in the app, please feel
free to do so there.
Regards,
Sai and Bodhisattwa