On Fri, 10 Sep 2010 23:11:27 +0000, Dan Nessett wrote:
We are currently attempting to refactor some specific modifications to
the standard MW code we use (1.13.2) into an extension so we can upgrade
to a more recent maintained version. One modification we have keeps a
flag in the revisions table specifying that article text was imported
from WP. This flag generates an attribution statement at the bottom of
the article that acknowledges the import.
I don't want to start a discussion about the various legal issues
surrounding text licensing. However, assuming we must acknowledge use of
licensed text, a legitimate technical issue is how to associate state
with an article in a way that records the import of licensed text. I
bring this up here because I assume we are not the only site that faces
this issue.
Some of our users want to encode the attribution information in a
template. The problem with this approach is anyone can come along and
remove it. That would mean the organization legally responsible for the
site would entrust the integrity of site content to any arbitrary
author. We may go this route, but for the sake of this discussion I
assume such a strategy is not viable. So, the remainder of this post
assumes we need to keep such licensing state in the db.
After asking around, one suggestion was to keep the licensing state in
the page_props table. This seems very reasonable and I would be
interested in comments by this community on the idea. Of course, there
has to be a way to get this state set, but it seems likely that could be
achieved using an extension triggered when an article is edited.
Since this post is already getting long, let me close by asking whether
support for associating licensing information with articles might be
useful to a large number of sites. If so, then perhaps it belongs in the
core.
The discussion about whether to support license data in the database has
settled down. There seems to be some support. So, I think the next step
is to determine the best technical approach. Below I provide a strawman
proposal. Note that this is only to foster discussion on technical
requirements and approaches. I have nothing invested in the strawman.
Implementation location: In an extension
Permissions: include two new permissions - 1) addlicensedata, and 2)
modifylicensedata. These are pretty self-explanatory. Sites that wish to
give all users the ability to provide and modify licensing data would
assign these permissions to everyone. Sites that wish to allow all users
to add licensing data, but restrict those who are allowed to modify it,
would give the first permission to everyone and the second to a limited
group.
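
To make the split between the two permissions concrete, here is a minimal
sketch in Python (a real extension would be written in PHP against
MediaWiki's own permission system). The group table and the helper names
user_rights and can_edit_license are hypothetical, used only to
illustrate the "everyone may add, a limited group may modify" setup.

```python
# Hypothetical sketch: the group table and helper names are illustrative
# only and are not part of MediaWiki.
GROUP_PERMISSIONS = {
    "*": {"addlicensedata"},                           # everyone may add
    "sysop": {"addlicensedata", "modifylicensedata"},  # limited group may modify
}


def user_rights(groups):
    """Return the union of rights from the implicit '*' group and the user's groups."""
    rights = set(GROUP_PERMISSIONS.get("*", set()))
    for group in groups:
        rights |= GROUP_PERMISSIONS.get(group, set())
    return rights


def can_edit_license(groups, record_exists):
    """Adding a record needs addlicensedata; changing an existing one needs modifylicensedata."""
    needed = "modifylicensedata" if record_exists else "addlicensedata"
    return needed in user_rights(groups)
```

With this table, a user in no special group can attach licensing data to
content that has none, but cannot change a record once it exists.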
Database schema: Add a "licensing" table to the db with the following
columns - 1) revision_or_image, 2) revision_id, 3) image_id, 4)
content_source, 5) license_id, 6) user_id.
The first three columns identify the revision or image with which the
licensing data is associated. I am not particularly adept with SQL, so
there may be a better way to do this. The content_source column is a
string that is a URL or other reference that specifies the source of the
content under license. The license_id identifies the specific license for
the content. The user_id identifies the user that added the licensing
information. The user_id may be useful if a site wishes to allow someone
who added the licensing information to delete or modify it. However,
there are complications with this. Since IP addresses are easily spoofed,
this entry should be considered valid only for logged-in users.
Add a "license" table with the following columns - 1) license_id, 2)
license_text, 3) license_name, and 4) license_version. The license_id in
the licensing table references rows in this table.
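
As a rough illustration of the two tables, the following sketch creates
them in an in-memory SQLite database from Python. The column names come
from the proposal above; the column types, the CHECK constraints that
tie revision_or_image to exactly one of revision_id or image_id, and the
create_db helper are assumptions made for the sketch, not a worked-out
MediaWiki schema.

```python
import sqlite3

# Assumed types and constraints; only the column names come from the proposal.
SCHEMA = """
CREATE TABLE license (
    license_id      INTEGER PRIMARY KEY,
    license_text    TEXT NOT NULL,
    license_name    TEXT NOT NULL,
    license_version TEXT
);
CREATE TABLE licensing (
    revision_or_image TEXT NOT NULL
        CHECK (revision_or_image IN ('revision', 'image')),
    revision_id       INTEGER,
    image_id          INTEGER,
    content_source    TEXT NOT NULL,  -- URL or other reference to the source
    license_id        INTEGER NOT NULL REFERENCES license (license_id),
    user_id           INTEGER,        -- assumed NULL when the user was not logged in
    CHECK (
        (revision_or_image = 'revision'
            AND revision_id IS NOT NULL AND image_id IS NULL)
        OR (revision_or_image = 'image'
            AND image_id IS NOT NULL AND revision_id IS NULL)
    )
);
"""


def create_db():
    """Create an in-memory database holding both tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.executescript(SCHEMA)
    return conn
```

The discriminator column plus two nullable id columns mirrors the
proposal directly; the CHECK constraint simply rejects rows where the
discriminator and the ids disagree.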
One complication is that when a page or image is reverted, the licensing
table must be updated to reflect the current state.
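
One possible way to handle a revert, sketched below in Python, is to
copy the licensing records of the restored revision onto the new
revision that the revert creates. This assumes a revert produces a new
revision id, and it models the licensing table as a plain list of dicts;
the carry_over_on_revert helper is hypothetical.

```python
def carry_over_on_revert(licensing, restored_rev_id, new_rev_id):
    """Copy the licensing rows of the restored revision onto the new revision.

    `licensing` is a list of dicts standing in for rows of the licensing
    table. The copies are appended to it and also returned.
    """
    copies = [
        dict(row, revision_id=new_rev_id)  # same data, pointed at the new revision
        for row in licensing
        if row["revision_id"] == restored_rev_id
    ]
    licensing.extend(copies)
    return copies
```

Other policies are possible, for example looking up licensing data by
content rather than by revision id; this is only the simplest one to
state.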
Data manipulation: The extension would use suitable hooks to insert,
modify and render licensing data. Insertion and modification would
probably use a relevant Edit Page or Article Management hook. Rendering
would probably use a Page Rendering Hook.
Page rendering: You probably don't want to dump licensing data directly
onto a page. Instead, it is preferable to output a short licensing
statement like:
"Content on this page uses licensed content. For details, see licensing
data."
The phrase "licensing data" would be a link to a special page that
accesses the licensing table and displays the license data associated
with the page.
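
The rendering step could look something like the following Python
sketch, which returns the short statement with "licensing data" linked
to a special page. The special page name Special:LicensingData, the
/wiki/ URL prefix, and the licensing_notice helper are all hypothetical;
a real extension would build the link through MediaWiki itself.

```python
from html import escape
from urllib.parse import quote


def licensing_notice(page_title, has_license_data):
    """Return the HTML notice for a page, or an empty string if it has no licensing data."""
    if not has_license_data:
        return ""
    # Hypothetical special page; spaces become underscores as in wiki URLs.
    target = "/wiki/Special:LicensingData/" + quote(page_title.replace(" ", "_"))
    return (
        "Content on this page uses licensed content. For details, see "
        '<a href="{}">licensing data</a>.'.format(escape(target, quote=True))
    )
```

Pages without any row in the licensing table get no notice at all, so
unlicensed content renders exactly as before.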
--
-- Dan Nessett