Hi all,
We used the revision history metadata from the official WMF dumps site in a forthcoming paper on coeditor network structure (linked below). We're publishing our code and (hopefully) the derived networks on the UMich DeepBlue archival server.
I can find information on the copyright/terms-of-use for text and image data, but nothing explicit about the metadata. Anyone know if that exists?
Thanks!
E.L. Platt & D.M. Romero. "Network Structure, Efficiency, and Performance in WikiProjects." ICWSM 2018. https://arxiv.org/abs/1804.03763
We're using the pages-meta-history XML files (user IDs, timestamps, article IDs, etc.). Everything I can find on the WMF site refers to "textual content," which leaves the status of metadata unclear. Our archival librarians would be a lot more comfortable if I could point them to something very explicit about terms of use for metadata and derivatives.
On Wed, May 16, 2018 at 12:15 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Edward L Platt, 16/05/2018 18:57:
I can find information on the copyright/terms-of-use for text and image data, but nothing explicit about the metadata.
Which metadata are you talking about? The copyright license applies to the whole XML text.
Federico
Edward L Platt, 16/05/2018 19:23:
We're using the pages-meta-history XML files (user ids, timestamps, article ids, etc). Everything I can find on the WMF site refers to "textual content" which is a bit unclear about metadata.
The legal page was added only recently and is probably unclear, but "textual content" just means everything that is not a multimedia file. The term is used in the sense of the terms of use: https://meta.wikimedia.org/wiki/Terms_of_use#7d
Our archival librarians would be a lot more comfortable if I could point them to something very explicit about terms of use for metadata
From our point of view, that's hardly even metadata. It's just MediaWiki-internal material, which tells little if anything about the data. It's also below the threshold of originality and produced automatically by software, and therefore clearly ineligible for copyright.
If this is about problems in the EU, we can add a CC0 note to waive any hypothetical sui generis database rights on MediaWiki's internal identifiers. But it's useless anyway, because those are generated, stored and published in the USA.
and derivatives.
What derivatives?
Federico
The derivatives in this case are coeditor networks for each WikiProject, based on which editors have edited the same articles.
Edward L Platt, 16/05/2018 23:16:
The derivatives in this case are coeditor networks for each WikiProject, based on which editors have edited the same articles.
Is this something you produce yourself? I cannot find such a dataset in https://dumps.wikimedia.org/other/.
Are you in the EU?
Federico
We're in the US. We created the dataset ourselves, based on the info in the pages-meta-history XML files (combined with the WikiProject info from the articles themselves).
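For anyone wondering what such a derivative looks like, here is a minimal sketch of the general idea, not the code from the paper. It assumes a tiny, simplified stand-in for a pages-meta-history file (real dumps are much larger and declare an XML namespace), and it assumes an edge weight equal to the number of articles two editors share. The names SAMPLE_XML, editors_by_article, and coeditor_edges are hypothetical, and the step that restricts articles to a single WikiProject is omitted.

```python
from collections import defaultdict
from itertools import combinations
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a pages-meta-history file.
# Real dumps declare an XML namespace and carry many more fields.
SAMPLE_XML = """
<mediawiki>
  <page>
    <id>10</id>
    <revision><contributor><id>1</id></contributor></revision>
    <revision><contributor><id>2</id></contributor></revision>
    <revision><contributor><id>1</id></contributor></revision>
  </page>
  <page>
    <id>20</id>
    <revision><contributor><id>2</id></contributor></revision>
    <revision><contributor><id>3</id></contributor></revision>
  </page>
  <page>
    <id>30</id>
    <revision><contributor><id>1</id></contributor></revision>
    <revision><contributor><id>2</id></contributor></revision>
  </page>
</mediawiki>
""".strip()


def editors_by_article(xml_text):
    """Map each article id to the set of user ids that edited it."""
    root = ET.fromstring(xml_text)
    editors = defaultdict(set)
    for page in root.findall("page"):
        article_id = int(page.findtext("id"))
        for revision in page.findall("revision"):
            user_id = revision.findtext("contributor/id")
            # Skip revisions without a user id (e.g. anonymous edits).
            if user_id is not None:
                editors[article_id].add(int(user_id))
    return dict(editors)


def coeditor_edges(editors):
    """Count, for each pair of editors, how many articles they share."""
    weights = defaultdict(int)
    for users in editors.values():
        for a, b in combinations(sorted(users), 2):
            weights[(a, b)] += 1
    return dict(weights)


edges = coeditor_edges(editors_by_article(SAMPLE_XML))
print(edges)
```

Only user IDs and article IDs enter the result, which is why the licensing question here is about identifiers rather than article text.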
wiki-research-l@lists.wikimedia.org