Technical means for tagging content (was: [Foundation-l] Statement on appropriate educational content)


Chad

9 May 2010

1:55 p.m.

On Sun, May 9, 2010 at 6:37 AM, David Gerard <dgerard(a)gmail.com> wrote:

...

On 9 May 2010 06:09, K. Peachey <p858snake(a)yahoo.com.au> wrote:

Bugzilla 982[1]: MediaWiki should support ICRA's PICS content labeling. From my understanding, without having read much about it, it [ICRA] is meant to be an "international" standard, or at least a standard for these things that most people seem to abide by (I see it splashed around on a lot of education sites saying they are compliant with that standard).

This came up in discussion a while ago on WHATWG - PICS is actually dead. Even its creators have given up on it. No-one implements it. As a standard, it's got no backing. So we'd be the first significant organisation to actually take it seriously, and would be reviving it. - d.

Forwarding this to wikitech-l solely for the technical discussion. I was unaware that it had died. Do you have any links from ICRA on the issue? I know Derk-Jan has been looking at resurrecting that bug, and I'd be interested to know what the actual state of the standard is. If nobody uses it and they've declared it dead, then we don't need to bother implementing it. Do we know if there is some standard that is used widely, or does every web filtering package reinvent the wheel? -Chad


K. Peachey

9 May

3:36 p.m.

Some people have pointed out that we already have a categorization system in use. That system doesn't work well for end-user or third-party filtering. (Note: I haven't played with the API in regard to this.) Currently a category page shows its contents, which is fine when you're looking for something in it, but to filter something you need to be able to easily get the file names of its contents en masse so they can be added to the relevant systems.

For example, using real content, there is a difference between adding any of these three to a filter list:

* http://en.wikipedia.org/wiki/Category:Cats (the category)
* http://en.wikipedia.org/wiki/File:ParliamentaryCats.jpg (the description page)
* http://upload.wikimedia.org/wikipedia/en/e/ec/ParliamentaryCats.jpg (the actual file)

The last one (and the thumbnail files generated from it) is what needs to be easily identifiable [the filenames and full paths] so the filters can deal with them as required; for example, a list produced automatically per category that can be imported. (Are there any standards for importable filter lists?) -Peachey
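The third URL above follows MediaWiki's hashed upload layout, so a per-category list of file names can be turned into full paths mechanically. A minimal sketch, assuming the default `$wgHashedUploadDirectory` scheme (the `e/ec/` directories come from the MD5 of the underscore-normalized file name) and a plain one-URL-per-line output, which is a hypothetical choice since the thread leaves the import format open. `UPLOAD_BASE`, `upload_paths` and `filter_list` are illustrative names, not part of MediaWiki.

```python
import hashlib

# Assumption: default hashed upload layout; directory = md5(name)[0] / md5(name)[:2].
# Hypothetical per-wiki base URL for English Wikipedia uploads.
UPLOAD_BASE = "http://upload.wikimedia.org/wikipedia/en"


def upload_paths(filename: str, base: str = UPLOAD_BASE) -> list[str]:
    """Return the original-file URL and the thumbnail-directory prefix for one file.

    The filename is assumed to be already normalized as MediaWiki stores it
    (first letter capitalized); only spaces are converted here. Percent-encoding
    of non-ASCII names is omitted for brevity.
    """
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    hashed = f"{digest[0]}/{digest[:2]}/{name}"
    return [
        f"{base}/{hashed}",         # the actual file
        f"{base}/thumb/{hashed}/",  # prefix covering every generated thumbnail
    ]


def filter_list(filenames: list[str]) -> str:
    """Render a plain one-URL-per-line list (hypothetical import format)."""
    lines: list[str] = []
    for filename in filenames:
        lines.extend(upload_paths(filename))
    return "\n".join(lines) + "\n"
```

Blocking the `thumb/.../Name.jpg/` prefix rather than individual thumbnails covers every size that has been or will be rendered, which is the "subsequent thumb files" problem raised above.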

Ævar Arnfjörð Bjarmason

3:50 p.m.

On Sun, May 9, 2010 at 12:36, K. Peachey <p858snake(a)yahoo.com.au> wrote:

...


It's pretty easy to do arbitrary content tagging (and filtering) now. You just add a template or external link to the page, e.g. {{PG-13}}. Then all a third party has to do is download templatelinks.sql.gz (or externallinks.sql.gz) in addition to the image dump. You just have to start getting people to tag things consistently. The good thing is that you can start now without any additional software support.
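On the third-party side, the template approach amounts to a streaming filter over the templatelinks dump. A minimal sketch, assuming the 2010-era column order (`tl_from`, `tl_namespace`, `tl_title`) and mysqldump's extended-INSERT lines; `tagged_page_ids` and `tagged_from_dump` are hypothetical helper names. The result is page IDs, which would still need to be joined against the page dump to recover file names.

```python
import gzip
import re
from typing import Iterable, Iterator

# Assumption: rows look like (tl_from,tl_namespace,'tl_title') inside
# "INSERT INTO `templatelinks` VALUES ...;" statements, with backslash escaping.
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")
TEMPLATE_NS = 10  # the Template: namespace


def tagged_page_ids(lines: Iterable[str], tag: str) -> Iterator[int]:
    """Yield the tl_from page ID of every row that transcludes Template:<tag>."""
    title = tag.replace(" ", "_")
    for line in lines:
        if not line.startswith("INSERT INTO `templatelinks`"):
            continue  # skip DDL, comments and other tables
        for page_id, ns, tl_title in ROW.findall(line):
            tl_title = re.sub(r"\\(.)", r"\1", tl_title)  # undo dump escaping
            if int(ns) == TEMPLATE_NS and tl_title == title:
                yield int(page_id)


def tagged_from_dump(path: str, tag: str) -> set[int]:
    """Stream a gzipped templatelinks dump without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return set(tagged_page_ids(fh, tag))
```

For example, `tagged_from_dump("templatelinks.sql.gz", "PG-13")` would return the IDs of every page carrying {{PG-13}}, under the schema assumption above.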

K. Peachey

4:08 p.m.

On Sun, May 9, 2010 at 10:50 PM, Ævar Arnfjörð Bjarmason <avarab(a)gmail.com> wrote:

...

The dumps would have to be produced fairly regularly, if not daily (would that even be possible, especially on the bigger sites?), for them to be really useful to outside [filtering] companies, and in an easy format. If it would take them too long to work it out, they would probably just block everything instead of wasting time on it.

K. Peachey

4:09 p.m.

On Sun, May 9, 2010 at 11:08 PM, K. Peachey <p858snake(a)yahoo.com.au> wrote:

...


Also, do we even have image dumps anymore? -Peachey

Daniel Schwen

4:46 p.m.

...

The dumps would have to be done fairly regularly if not daily (would that even be possible, especially on the bigger sites) for them to be

If it is only about obtaining a list of tagged images, that can even be done on the toolserver with a simple SQL query. It should not take more than a few minutes for a large Wikipedia. So yes, daily updates would be no problem.
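The kind of query described here might look like the following, run against a throwaway SQLite database so it works anywhere. The tables are simplified stand-ins for MediaWiki's `page` and `templatelinks` (assumed column names, most columns omitted), the sample rows are invented, and `tagged_files` is a hypothetical helper; the SQL itself is plain enough that it should also run on MySQL.

```python
import sqlite3

# Simplified stand-ins for MediaWiki's tables (assumed 2010-era column names).
SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT);
CREATE TABLE templatelinks (tl_from INTEGER, tl_namespace INTEGER, tl_title TEXT);
"""

# File pages (namespace 6) that transclude Template:<tag> (namespace 10).
TAGGED_FILES = """
SELECT p.page_title
FROM templatelinks AS tl
JOIN page AS p ON p.page_id = tl.tl_from
WHERE tl.tl_namespace = 10
  AND tl.tl_title = ?
  AND p.page_namespace = 6
ORDER BY p.page_title
"""


def tagged_files(conn: sqlite3.Connection, tag: str) -> list[str]:
    """Return the titles of file pages tagged with Template:<tag>."""
    return [row[0] for row in conn.execute(TAGGED_FILES, (tag.replace(" ", "_"),))]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, 6, "ParliamentaryCats.jpg"),
        (2, 6, "Sunset.jpg"),
        (3, 0, "Cat"),
    ])
    conn.executemany("INSERT INTO templatelinks VALUES (?, ?, ?)", [
        (1, 10, "PG-13"),
        (3, 10, "PG-13"),  # an article, excluded by the namespace filter
        (2, 10, "Stub"),
    ])
    print(tagged_files(conn, "PG-13"))  # ['ParliamentaryCats.jpg']
```

Restricting to namespace 6 keeps articles that happen to carry the same template out of an image filter list; dropping that condition would list every tagged page instead.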

Marco Schuster

10 May

3:59 a.m.

On Sun, May 9, 2010 at 3:09 PM, K. Peachey <p858snake(a)yahoo.com.au> wrote:

...

On Sun, May 9, 2010 at 11:08 PM, K. Peachey <p858snake(a)yahoo.com.au> wrote:

Also, Do we even have image dumps anymore?

AFAIR backups exist, but I don't know if there are any public dumps. I'd imagine they're simply too big to maintain and archive... Marco

Tim Starling

5:26 a.m.

New subject: Technical means for tagging content (was: [Foundation-l] Statement on appropriate educational content)

On 09/05/10 20:55, Chad wrote:

...

PICS was a W3C proposed standard for tagging content. It is obsolete. It has been replaced by RDF Content Labels, a.k.a. POWDER:

http://www.w3.org/2004/12/q/doc/content-labels-schema.htm

Both PICS and RDF Content Labels are technical schemes with no moral values attached. ICRA provides a set of labels (the "ICRA vocabulary") relevant to prevailing Christian morality. It can be used with either PICS or RDF.

Companies that sell filtering software tend to be coy about how they classify pages, since content analysis heuristics certainly play a big role. ICRA gives links to two content filters that support their tags. One is a simple browser plugin; the other is a large and complex content classification system suitable for filtering internet access for schools, businesses or ISPs.

http://www.profiltechnology.com/en/index.aspx

Profil looks big enough that if it does indeed support ICRA/RDF, then I think that's a good enough reason to write an extension.

Note that there are lots of other applications for RDF Content Labels. In particular, accessibility and copyright/license tagging have been promoted. I think we could have some generic support for RDF in the core, with the ICRA vocabulary and UI in an extension.

-- Tim Starling


wikitech-l@lists.wikimedia.org


7 comments

6 participants


participants (6)

Chad
Daniel Schwen
K. Peachey
Marco Schuster
Tim Starling
Ævar Arnfjörð Bjarmason