[Foundation-l] Project Proposal: Wikicat

Jonathan Leybovich

28 Jul 2006

1:33 a.m.

All-

I hereby propose for your consideration Wikicat, a project to create an open bibliographic catalog. The purpose of Wikicat is both to lay the groundwork for a scholarly apparatus to be used within Wikipedia and to create a unique and valuable information resource in its own right. In particular, Wikicat will:

* facilitate the process of citation by automatically fetching bibliographic data based upon unique keys such as ISBN, ISSN, and LCCN
* allow users to more easily navigate between information resources by grouping them in a functionally significant manner (in particular, according to the principles of [[w:FRBR]]) so that, for example, different editions, translations, etc. are all joined together
* apply Wikipedia's collaborative content creation model to bibliographic data, resulting in a catalog of unprecedented detail
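
The FRBR-style grouping mentioned in the list above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the `Edition` record, its `work_key` field, and `group_by_work` are hypothetical names rather than part of the Wikicat design, the sample rows are placeholders, and real FRBR distinguishes more levels (work, expression, manifestation, item) than the two shown here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """One published edition or translation (hypothetical record layout)."""
    isbn: str       # placeholder strings below, not real ISBNs
    title: str
    language: str
    year: int
    work_key: str   # assumed identifier shared by every edition of one work


def group_by_work(editions):
    """Return {work_key: [editions sorted by year]} so related editions sit together."""
    groups = defaultdict(list)
    for edition in editions:
        groups[edition.work_key].append(edition)
    return {key: sorted(items, key=lambda e: e.year) for key, items in groups.items()}


# Placeholder data, deliberately out of order to show the sorting.
SAMPLE = [
    Edition("placeholder-2", "Example Work (revised edition)", "en", 2001, "work-a"),
    Edition("placeholder-1", "Example Work", "en", 1990, "work-a"),
    Edition("placeholder-4", "Beispielwerk", "de", 1995, "work-a"),
    Edition("placeholder-3", "Another Work", "en", 1985, "work-b"),
]

if __name__ == "__main__":
    for work, items in group_by_work(SAMPLE).items():
        print(work, [(e.year, e.language, e.title) for e in items])
```

With this shape, a citation that names any one edition can still lead the reader to every other edition or translation of the same work.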

In terms of implementation, Wikicat will be defined like any other [[m:Wikidata]] dataset and will integrate with other datasets such as WiktionaryZ to share common entities and perhaps someday support something along the lines of a Semantic Mediawiki. As Wikidata is currently not code complete, though, Wikicat will be deployed in stages, during the first of which it will exist as a read-only database that populates itself on an as-needed/"as-cited" basis by importing data from the open catalog servers of such institutions as the Library of Congress, the University of California library system, the U.S. National Library of Medicine, etc.
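
The "as-needed/as-cited" population described above behaves like a read-through cache. The following Python sketch shows that idea only; `OnDemandCatalog`, `fake_remote`, and the record fields are hypothetical, and the stub stands in for whatever query a real deployment would send to an external catalog server.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, str]


class OnDemandCatalog:
    """Read-only catalog that imports a record the first time it is cited."""

    def __init__(self, fetch_remote: Callable[[str], Optional[Record]]):
        self._fetch_remote = fetch_remote   # assumed hook for an external catalog query
        self._store: Dict[str, Record] = {}
        self.remote_calls = 0               # counted only to make the caching visible

    def lookup(self, key: str) -> Optional[Record]:
        # Normalize the key so "0-00-000000-0" and "0000000000" hit the same entry.
        key = key.replace("-", "").strip().upper()
        if key in self._store:
            return self._store[key]
        self.remote_calls += 1
        record = self._fetch_remote(key)
        if record is not None:
            self._store[key] = record       # imported once, served locally afterwards
        return record


def fake_remote(key: str) -> Optional[Record]:
    """Hypothetical stub with placeholder data; not a real catalog API."""
    data = {"0000000000": {"title": "Example Title", "author": "Example Author"}}
    return data.get(key)


if __name__ == "__main__":
    catalog = OnDemandCatalog(fake_remote)
    print(catalog.lookup("0-00-000000-0"))
    print(catalog.lookup("0000000000"), catalog.remote_calls)
```

Keys that the remote source does not know are not stored in this sketch, so a later lookup would ask the remote source again.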

Details about the project, in increasing technical detail, are available on the following pages:

http://meta.wikimedia.org/wiki/Proposals_for_new_projects#Wikicat
http://meta.wikimedia.org/wiki/Wikicat
http://meta.wikimedia.org/wiki/Wikicat_Technical_Design

Coding of the first stage of the project is nearly complete, and a list of its operational requirements will follow soon. Here is a demo of Wikicat integration with the Cite/<ref> extension:

http://meta.wikimedia.org/wiki/Image:Wikicat_Cite_screenshoot.png

Thank you for your time and I look forward to your comments.

Jonathan Leybovich

Oldak Quill

28 Jul

5:10 a.m.

On 28/07/06, Jonathan Leybovich jleybov@yahoo.com wrote:

...

I'm normally against the creation of new projects, but this sounds like a pretty good idea. Presumably it will be a little like Commons, but handling citations instead of images. I suppose other Wikimedia projects will make use of this; do you hope to allow non-Wikimedia projects to use it as well?

The project will aim to catalogue books, news, journals, what else? Film?

How will different referencing styles be handled?

-- Oldak Quill (oldakquill@gmail.com)

GerardM

10:34 a.m.

Hoi, The project that you propose has a very large overlap with the WikiAuthors project. As described on Meta, that project is about, among other things, the disambiguation of PubMed articles. It is extremely likely to be realised. The functionality that you describe can and will, by and large, be modelled using Wikidata technology.

As this project does use the Wikidata technology, it would make a great deal of sense to collaborate, share our efforts, and make one big and beautiful project.

Please let us discuss how we can and will collaborate. For your information, WiktionaryZ's codebase is updated daily. This means, for instance, that from today people will have definitions shown in the language of their user interface. If there is no definition in that language, the English one will be shown. When there is no English definition, any available definition will be shown.
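
The fallback order described above (user-interface language, then English, then anything available) can be sketched in a few lines of Python. `pick_definition` is a hypothetical helper, not WiktionaryZ code, and picking the alphabetically first language code for the "any definition" case is an assumption made only so the result is predictable.

```python
def pick_definition(definitions, ui_language, fallback="en"):
    """Return (language, text) using the order: UI language, English, any; None if empty.

    `definitions` maps a language code to the definition text in that language.
    """
    if ui_language in definitions:
        return ui_language, definitions[ui_language]
    if fallback in definitions:
        return fallback, definitions[fallback]
    for language in sorted(definitions):
        # Assumption: "any definition" means the first language code alphabetically.
        return language, definitions[language]
    return None


if __name__ == "__main__":
    sample = {"de": "Definition auf Deutsch", "en": "Definition in English"}
    print(pick_definition(sample, "de"))   # UI language available
    print(pick_definition(sample, "fr"))   # falls back to English
    print(pick_definition({"nl": "Definitie", "it": "Definizione"}, "fr"))
```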

Thanks, GerardM


Erik Moeller

12:20 p.m.

WiktionaryZ currently supports multilingual free-text attributes and will support other data type attributes in the future. The data you can associate with a WiktionaryZ entity will likely depend on what class that entity belongs to. For instance, if an entity is of type person, you could associate a birthdate, death date, place of birth, place of death, first name/last name (though that might be handled on another level), etc.

If the entity belongs to the class "publication", you could associate references to specific editions; if it belongs to class "author", you could associate publications, and so on.
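
As a rough illustration of class-dependent attributes, here is a minimal Python sketch. The `ALLOWED_ATTRIBUTES` table, the attribute names, and the `Entity` class are all hypothetical; they only mirror the examples in the text and say nothing about how WiktionaryZ actually stores its data.

```python
# Assumed mapping from entity class to the attributes that class may carry.
ALLOWED_ATTRIBUTES = {
    "person": {"birth_date", "death_date", "place_of_birth", "place_of_death"},
    "publication": {"editions"},
    "author": {"publications"},
}


class Entity:
    """An entity whose permitted attributes depend on its class."""

    def __init__(self, name, entity_class):
        if entity_class not in ALLOWED_ATTRIBUTES:
            raise ValueError(f"unknown class: {entity_class!r}")
        self.name = name
        self.entity_class = entity_class
        self.attributes = {}

    def set_attribute(self, key, value):
        # Reject attributes that make no sense for this class of entity.
        if key not in ALLOWED_ATTRIBUTES[self.entity_class]:
            raise KeyError(f"{key!r} is not allowed for class {self.entity_class!r}")
        self.attributes[key] = value


if __name__ == "__main__":
    person = Entity("Example Person", "person")
    person.set_attribute("birth_date", "1900-01-01")   # placeholder value
    print(person.attributes)
    try:
        person.set_attribute("editions", ["first"])
    except KeyError as error:
        print("rejected:", error)
```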

In this generic model, a reference is just one instance of an entity whose associated data you might want to display in a Wikipedia article. So we would probably come up with a special type of template that can fetch data from WiktionaryZ and insert it into a page layout. Then you could, for instance, use

<<person:Jimmy Wales>>

to display the person data about Jimbo, or

<<country:Germany>>

to make a nice country infobox. Then again, you could also do

<<ref:The Origin of Species|author=Charles Darwin|class=book>>

to show _any_ book edition of Darwin's work, or

<<ref:The Origin of Species|ISBN=whatever>>

to cite a specific edition. If the title is not ambiguous, you might even only have to do something like

<<ref:The Cosmic and the Comic: Einstein's Scientific Spirituality>>

and all the properties would be derived automatically from that. In fact, the WiktionaryZ model very much matches the work/expression distinction, though we use the terminology the other way around: an expression refers to _any_ possible meaning of a string of characters, whereas each meaning ("work" in this context) has its own defined meaning ID and can also be disambiguated by its relations and other associated data (for publications: author, year, various codes, etc.).
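
To make the tag idea concrete, here is a small Python sketch that parses the `<<kind:expression|key=value>>` form used in the examples above and then narrows an expression down to matching meanings. The syntax itself is only a suggestion in this message, and `parse_tag`, `resolve`, and the sample meanings table are hypothetical names and placeholder data, not an existing WiktionaryZ or MediaWiki API.

```python
import re

# kind, a colon, then everything up to the closing ">>" (titles may contain colons).
TAG_RE = re.compile(r"<<(?P<kind>\w+):(?P<body>[^<>]*)>>")


def parse_tag(text):
    """Parse '<<kind:expression|key=value|...>>' into (kind, expression, params)."""
    match = TAG_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a data tag: {text!r}")
    expression, *pairs = match.group("body").split("|")
    params = {}
    for pair in pairs:
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key.strip()] = value.strip()
    return match.group("kind"), expression.strip(), params


def resolve(expression, params, meanings):
    """Return the meanings of `expression` whose data match every given parameter."""
    candidates = meanings.get(expression, [])
    return [m for m in candidates if all(m.get(k) == v for k, v in params.items())]


# Placeholder data: one expression (a title string) with two defined meanings.
SAMPLE_MEANINGS = {
    "Example Title": [
        {"id": 1, "class": "book", "author": "A. Author"},
        {"id": 2, "class": "film", "author": "B. Director"},
    ]
}

if __name__ == "__main__":
    kind, expression, params = parse_tag("<<ref:Example Title|class=book>>")
    print(kind, expression, params)
    print(resolve(expression, params, SAMPLE_MEANINGS))
```

When no parameters are given, `resolve` returns every meaning of the expression; a single result corresponds to the unambiguous-title case, and more than one would call for extra parameters such as author or class.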

One advantage, in my view, is that we do not require people to pass around codes and numbers unless they really need to and want to, and even then, we retain the expression (work title) in the wiki source text, making it easy to see what a particular reference is about.

Erik

Jeff V. Merkey

12:59 p.m.

Erik Moeller wrote:

...

Erik,

Anything you can do to reduce the dependencies on extensions is a great move. You should incorporate as many of them as you can into the main MediaWiki tree. You also need to set up the exportDump.php program so that it outputs dynamically generated image names as the actual names.

www.wikipedia.org/wiki/Chess

is a great example of HOW NOT to code a template for image generation. Outputting the actual names makes it a lot easier to keep images synced up between distributed Wikipedia mirrors. For now, I have modified your PHP code to create a collision listing of missing image names when the page is converted into HTML for the first time, so the wikix tools can grab the images and input them into the database.

Jeff

Ray Saintonge

29 Jul

11:25 p.m.

Jonathan Leybovich wrote:

...

While I have deep sympathy for the intentions of this proposal, I also find that the kind of theoretical discussions that are linked to the proposal offer very little encouragement to the average contributor.

Citations and verifiability are absolutely essential to the credibility of Wikipedia and its sister projects. Nevertheless, a person undertaking to substantiate his contributions should not need a professional librarianship background to do so. Any manner of clearly identifying the source should be acceptable. If someone else considers it important to bring the format of citations up to modern library standards, he should feel free to do that without blame being attributable to the original contributor.

There is also a need to begin referencing the material that is relatively easy to access online or in other relatively inexpensive sources of public domain material, like CDs sold for $5.00 each that can each easily contain 100 books or more. In the last few years this material has been produced at a phenomenal rate. It is available in image, plain-Jane ASCII, or more scholarly annotatable formats. We could begin by including our own Wikisource material in the catalogue.

Erik Moeller

11:53 p.m.

On 7/30/06, Ray Saintonge saintonge@telus.net wrote:

...

Citations and verifiability are absolutely essential to the credibility of Wikipedia and its sister projects. Nevertheless, a person undertaking to substantiate his contributions should not need a professional librarianship background to do so. Any manner of clearly identifying the source should be acceptable.

Absolutely. However, we should also make the tools of professional referencing as easy to use as possible. You're right that freely available (not necessarily freely licensed) content will be the first to be referenced. Thus, it is likely that Wikimedia will become both a primary beneficiary and driving force of the open access movement.

What is saddening to me is that even better referencing tools and systematic source checking processes will likely not be sufficient to deal adequately with the vast amount of knowledge that is _not_ free or not even digital. Indeed, already today, I've seen quite a lot of cases where Wikipedians have reacted with intense frustration to the citation of sources that they could not verify simply by following a link.

One of my great hopes is that a broad international coalition of NGOs will eventually emerge to call for harmonization of copyright terms to a reasonable length. Perhaps Wikimedia could be part of such a coalition. If I look at the fantastic work Project Gutenberg is doing on even the most obscure publications, I cannot begin to imagine the profound effects it would have on our culture if copyright lasted, say, 14 years, with the option to renew for another 14: http://creativecommons.org/projects/founderscopyright/

Erik

Ray Saintonge

31 Jul

1:20 a.m.

Erik Moeller wrote:

...

On 7/30/06, Ray Saintonge saintonge@telus.net wrote:

...
Citations and verifiability are absolutely essential to the credibility of Wikipedia and its sister projects. Nevertheless, a person undertaking to substantiate his contributions should not need a professional librarianship background to do so. Any manner of clearly identifying the source should be acceptable.

Absolutely. However, we should also make the tools of professional referencing as easy to use as possible. You're right that freely available (not necessarily freely licensed) content will be the first to be referenced. Thus, it is likely that Wikimedia will become both a primary beneficiary and driving force of the open access movement.

I can be patient.

...

What is saddening to me is that even better referencing tools and systematic source checking processes will likely not be sufficient to deal adequately with the vast amount of knowledge that is _not_ free or not even digital. Indeed, already today, I've seen quite a lot of cases where Wikipedians have reacted with intense frustration to the citation of sources that they could not verify simply by following a link.

That's not just sad; it's scary. It's on a par with saying, "If it's on the internet it must be true." It reflects a series of tendencies in the developed world with profound societal effects. When the most important factor for gaining knowledge is convenience, it puts us on track for a Fahrenheit 451 kind of world. And I'm sure there is a certain segment of society that will be quite happy to encourage the people to take their new form of opium.

...

One of my great hopes is that a broad international coalition of NGOs will eventually emerge to call for harmonization of copyright terms to a reasonable length. Perhaps Wikimedia could be part of such a coalition. If I look at the fantastic work Project Gutenberg is doing on even the most obscure publications, I cannot begin to imagine the profound effects it would have on our culture if copyright lasted, say, 14 years, with the option to renew for another 14: http://creativecommons.org/projects/founderscopyright/

Reducing the terms of copyright back to a reasonable level will be an uphill fight, and the way some of our colleagues bend over backwards to prevent the least suggestion of a copyright violation and stay law-abiding does not give me a lot of hope.
