Dan Nessett wrote:
The discussion about whether to support license data in the database has settled down. There seems to be some support. So, I think the next step is to determine the best technical approach. Below I provide a strawman proposal. Note that this is only to foster discussion on technical requirements and approaches. I have nothing invested in the strawman.
Implementation location: In an extension
Permissions: include two new permissions - 1) addlicensedata, and 2) modifylicensedata. These are pretty self-explanatory. Sites that wish to give all users the ability to provide and modify licensing data would assign these permissions to everyone. Sites that wish to allow all users to add licensing data, but restrict those who are allowed to modify it, would give the first permission to everyone and the second to a limited group.
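As a rough illustration of the two-permission model, here is a minimal Python sketch. The group names and the mapping are assumptions made up for the example, and this is not how any particular wiki engine stores permissions.

```python
# Hypothetical group-to-permission mapping; group names are illustrative only.
GROUP_PERMISSIONS = {
    "*": {"addlicensedata"},                           # everyone may add
    "sysop": {"addlicensedata", "modifylicensedata"},  # limited group may modify
}

def user_can(groups, permission):
    """Return True if any of the user's groups (plus the implicit '*') grants the permission."""
    effective = set(groups) | {"*"}
    return any(permission in GROUP_PERMISSIONS.get(g, set()) for g in effective)
```

With this mapping, every user can add licensing data but only members of the limited group can modify it, which is the second site policy described above.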
Database schema: Add a "licensing" table to the db with the following columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4) content_source, 5) license_id, 6) user_id.
The first three columns identify the revision or image with which the licensing data is associated.
That's ugly. I would prefer one licensing table for revisions and another for images. (By the way, there is no such thing as an image_id: images are identified by name, or by the id of their description page, plus a timestamp if you also want to address old versions.)
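To make the two-table idea concrete, here is a sketch using Python's sqlite3 module. All table and column names (revision_licensing, image_licensing, the rl_ and il_ prefixes) are made up for illustration, and treating a NULL timestamp as "current version" is an assumption.

```python
import sqlite3

# Hypothetical schema: one licensing table per kind of content.
SCHEMA = """
CREATE TABLE revision_licensing (
    rl_id          INTEGER PRIMARY KEY,
    rl_revision_id INTEGER NOT NULL,
    rl_source      TEXT,
    rl_license_id  INTEGER NOT NULL
);
CREATE TABLE image_licensing (
    il_id         INTEGER PRIMARY KEY,
    il_image_name TEXT NOT NULL,   -- images are identified by name
    il_timestamp  TEXT,            -- assumption: NULL means the current version
    il_source     TEXT,
    il_license_id INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One licensing row for a revision and one for an old version of an image.
conn.execute(
    "INSERT INTO revision_licensing (rl_revision_id, rl_source, rl_license_id) "
    "VALUES (?, ?, ?)",
    (1234, "https://example.org/source", 1),
)
conn.execute(
    "INSERT INTO image_licensing (il_image_name, il_timestamp, il_source, il_license_id) "
    "VALUES (?, ?, ?, ?)",
    ("Example.png", "20090801120000", "https://example.org/photo", 2),
)
```

Splitting the tables removes the revision_or_image discriminator column and the column that would always be empty in each row.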
The content_source column is a string containing a URL or other reference that specifies the source of the content under license. The license_id identifies the specific license for the content. The user_id identifies the user who added the licensing information. The user_id may be useful if a site wishes to allow whoever added the licensing information to delete or modify it. However, there are complications with this. Since IP addresses are easily spoofed, this entry should only be valid for logged-in users.
The user id could be stored in the logging table. You may want to add a licensing_id column to identify rows in the licensing table.
Add a "license" table with the following columns - 1) license_id, 2) license_text, 3) license_name and 4) license_version. The license_id in the licensing table references rows in this table.
You could begin by hardcoding the available licenses in the extension, and then add support for the license table. There are a number of issues there: When can you remove a license (maybe never, once it is used)? Which licenses are shown as available? Do you have licenses which will "change" (e.g. when you want to change the default license from "CC-BY-SA 3.0 or later" to "CC-BY-SA 4.0 or later")? Note that the license_version could also be part of the license_name. To make it useful you probably need a boolean to mark that it is an "or later" license.
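A sketch of what the hardcoded starting point could look like, in Python for brevity. The License class, the sample entries, and the can_remove rule ("never once it is used") are assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class License:
    license_id: int
    name: str
    version: str
    or_later: bool  # True for "version X or later" licensing

# Hypothetical hardcoded list; a real extension would choose its own entries.
LICENSES = [
    License(1, "CC-BY-SA", "3.0", True),
    License(2, "CC-BY-SA", "4.0", True),
    License(3, "GFDL", "1.2", False),
]

def display_name(lic):
    """Format a license for display, appending 'or later' when flagged."""
    suffix = " or later" if lic.or_later else ""
    return f"{lic.name} {lic.version}{suffix}"

def can_remove(license_id, used_license_ids):
    """Allow removal only if no licensing row references the license."""
    return license_id not in used_license_ids
```

Keeping the version and the "or later" flag as separate fields means changing the default from 3.0 to 4.0 adds a new entry rather than editing one that existing rows already reference.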
One complication is when a page or image is reverted, the licensing table must be modified to reflect the current state.
If you are associating licenses with revisions (instead of pages), you don't need to change the state in the licensing table on further edits (just copy the license of the previous revision).
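For example, copying the licensing rows forward could look roughly like this (Python with sqlite3; the table layout and the copy_licensing helper are hypothetical, and parent_rev_id stands for whichever revision the new one is based on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision_licensing ("
    "revision_id INTEGER NOT NULL, content_source TEXT, license_id INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO revision_licensing VALUES (100, 'https://example.org/source', 1)"
)

def copy_licensing(conn, parent_rev_id, new_rev_id):
    """Copy every licensing row of the parent revision to the new revision."""
    conn.execute(
        "INSERT INTO revision_licensing (revision_id, content_source, license_id) "
        "SELECT ?, content_source, license_id "
        "FROM revision_licensing WHERE revision_id = ?",
        (new_rev_id, parent_rev_id),
    )

# A new edit (or a revert) creates revision 101 based on revision 100.
copy_licensing(conn, 100, 101)
rows = conn.execute(
    "SELECT revision_id, content_source, license_id "
    "FROM revision_licensing ORDER BY revision_id"
).fetchall()
```

The rows for revision 100 are never touched, so the history stays intact whatever happens to later revisions.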
Data manipulation: The extension would use suitable hooks to insert, modify and render licensing data. Insertion and modification would probably use a relevant Edit Page or Article Management hook. Rendering would probably use a Page Rendering Hook.
Page rendering: You probably don't want to dump licensing data directly onto a page. Instead, it is preferable to output a short licensing statement like:
"Content on this page uses licensed content. For details, see licensing data."
The phrase "licensing data" would be a link to a special page that accesses the licensing table and displays the license data associated with the page.
That's fine. You could even use "Content on this page uses licensed content from XXXX under [[Special:Licenses/YYY|YYY license]]"
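A small sketch of how such a statement might be assembled, in Python. The licensing_notice function and the joining of several sources with "; " are assumptions; Special:Licenses is just the page name used in the example above.

```python
def licensing_notice(sources):
    """Build a wikitext notice from (source, license_name) pairs.

    Returns an empty string when the page has no licensing rows.
    """
    if not sources:
        return ""
    parts = [
        f"{src} under [[Special:Licenses/{lic}|{lic} license]]"
        for src, lic in sources
    ]
    return "Content on this page uses licensed content from " + "; ".join(parts)
```

Calling licensing_notice([("Example Source", "CC-BY-SA 3.0")]) yields one sentence with a link to Special:Licenses/CC-BY-SA 3.0.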
Do you want to support multilicensing? You could have revisions with data coming from several sources. That means you must allow duplicate revision_id values in the licensing table.
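In schema terms that just means revision_id is not unique and each row gets its own key. A minimal sketch with Python's sqlite3, where the table and column names are again hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision_licensing ("
    "licensing_id INTEGER PRIMARY KEY, "  # per-row key, as suggested earlier
    "revision_id INTEGER NOT NULL, "      # deliberately not UNIQUE
    "content_source TEXT, "
    "license_id INTEGER NOT NULL)"
)

# Two sources, under two different licenses, feeding the same revision.
conn.executemany(
    "INSERT INTO revision_licensing (revision_id, content_source, license_id) "
    "VALUES (?, ?, ?)",
    [
        (200, "https://example.org/a", 1),
        (200, "https://example.org/b", 2),
    ],
)

licenses_for_200 = [
    row[0]
    for row in conn.execute(
        "SELECT license_id FROM revision_licensing "
        "WHERE revision_id = ? ORDER BY licensing_id",
        (200,),
    )
]
```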