Bibliographical properties on Wikidata are listed here:
https://www.wikidata.org/wiki/Wikidata:Books_task_force
Over the last few months, we have tried to create a metadata scheme to "cover" the
main elements of book classification.
It is not MARC21, of course, but I think it covers pretty much all of simple
Dublin Core.
Early on, I drafted a mapping between the templates of the different Wikimedia
projects (Wikipedia's book infobox, Commons' Book template, Wikisource's
Index metadata form):
https://docs.google.com/spreadsheet/ccc?key=0AlPNcNlN2oqvdFQyR2F5YmhrMWpXaU…
It is far from perfect, but it gives an idea of which things could be
missing.
I'd also love to collaborate with openlibrary, but at the beginning of our
IEG project Micru and I contacted them, in the person of Karen Coyle
(User:Kcoyle),
a very well-known and skilled metadata librarian who is now more or less in
charge of the project.
She told us that openlibrary is frozen at the moment, and that there are
neither staff nor funds to get it going again.
Openlibrary was previously funded by the Internet Archive.
If someone could build the tool you proposed, Luiz, that would be awesome,
but I'm not a technical person and I'm not able to tell whether that is
feasible or not.
If we get more feedback on this, we could propose it as a project for
the next Google Summer of Code: that is a great way to get technical
things done.
Aubrey
On Sun, Dec 8, 2013 at 5:04 AM, Luiz Augusto <lugusto(a)gmail.com> wrote:
>
> On Sat, Dec 7, 2013 at 12:47 PM, Thomas Douillard <
> thomas.douillard(a)gmail.com> wrote:
>>
>>
>> That's why I think we must do a lot more with such data than just
>> importing it from openlibrary, as it is really important to MediaWiki
>> in general, and the community as a whole is a powerful driving force
>> for bibliographical data. I'm not against cooperating with openlibrary,
>> but we should seek deep cooperation and integration with them so that both
>> projects can benefit from each other's community.
>>
>
> +1 on this
>
> openlibrary.org has a limited set of fields.
>
> Moreover, simply importing some random records at some random time
> will benefit neither openlibrary nor Wikimedia.
>
> You would first need to check whether Wikidata already has the needed
> information, search for it again in openlibrary, create the content in
> openlibrary, import the content into Wikidata, make the desired local
> changes and send back to openlibrary any relevant local changes.
>
> But I had an idea: a MediaWiki user interface for openlibrary data.
>
> openlibrary.org offers access to records in 3 ways:
>
> * read/write of individual records through API;
> * read of individual records through RDF and JSON;
> * bulk download of the entire dataset
>
> So it's possible to:
>
> 1) Import the bulk data;
> 2) Catch all changes from openlibrary.org in real time;
> 3) Allow the synced data to be browsed and edited at any time
> on MediaWiki/Wikidata instances;
> 4) Send back to openlibrary the changes, storing locally the data from
> custom fields in the MediaWiki instance (allowing a later import into the
> openlibrary instance if they create the corresponding fields in their DB);
> 5) Send back to openlibrary all new book records created on MediaWiki
> instances.
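[As a rough illustration of the "read individual records" path above, here is
a minimal, unverified sketch; the https://openlibrary.org/books/<OLID>.json URL
pattern and the field names are from memory, and the OLID is a placeholder.]

// Minimal sketch (unverified): fetch one openlibrary edition record as JSON.
// "OL12345M" is a placeholder OLID; field names are as recalled, not checked.
async function fetchOpenLibraryEdition(olid) {
    const response = await fetch("https://openlibrary.org/books/" + olid + ".json");
    if (!response.ok) {
        throw new Error("openlibrary returned HTTP " + response.status);
    }
    const record = await response.json();
    // A few of the fields a sync tool would map to MediaWiki/Wikidata values.
    return {
        title: record.title,
        publishDate: record.publish_date,
        publishers: record.publishers
    };
}

fetchOpenLibraryEdition("OL12345M").then(console.log).catch(console.error);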
>
>
>
>
This is the simple script I'm using to reduce the edit textarea in the Page
namespace (nsPage) to a comfortable size; it also "sniffs" layout toggling:
function resizeBox() {
    // In the Page: namespace, while editing, if the editor column is as wide
    // as the textarea (i.e. the text pane takes the full width), shrink the
    // textarea to 10 rows; otherwise use 31 rows.
    if ((wgCanonicalNamespace == "Page" &&
            (wgAction == "edit" || wgAction == "submit")) &&
            $(".wikiEditor-ui-left").css("width") == $("#wpTextbox1").css("width")) {
        $("#wpTextbox1").attr("rows", "10");
    } else {
        $("#wpTextbox1").attr("rows", "31");
    }
}

$(document).ready(function () {
    // Re-run the resize whenever the layout toggle button is clicked.
    $("img[rel='toggle-layout']").attr("onclick", "resizeBox()");
    resizeBox();
});
Rough, but running. :-)
Alex
Denny Vrandečić, 07/12/2013 00:59:
> Thanks for reviving this thread, Luiz. I also wanted to ask whether we
> should be updating parts of DNB and similar data. Maybe not create new
> entries, but for those that we already have, add some of the available
> data and point to the DNB dataset?
Or maybe use openlibrary.org as a staging area for such data and fetch
it from there? I'm not sure Wikidata should "compete" with openlibrary:
it's a huge amount of work and they already have an infrastructure for it;
Wikidata/Wikimedia could "just" let the users easily import the data
when it's needed. An obvious example is pre-filling of book/work
metadata on Wikipedia articles, Wikisource books, Commons files (and
associated Wikidata entries).
Nemo
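[A hedged sketch of the pre-filling idea above: turning an already-fetched book
record (e.g. from openlibrary.org) into wikitext for Commons' Book template.
The template parameter names and the record field names here are assumptions,
not checked against the live template.]

// Hedged sketch: build pre-filled {{Book}} wikitext from a fetched record.
// Template parameters and record fields are assumptions, for illustration only.
function buildBookTemplate(record) {
    const author = (record.authors && record.authors[0]) || "";
    const publishers = (record.publishers || []).join("; ");
    return "{{Book\n" +
        " |title     = " + (record.title || "") + "\n" +
        " |author    = " + author + "\n" +
        " |publisher = " + publishers + "\n" +
        " |date      = " + (record.publish_date || "") + "\n" +
        "}}";
}

console.log(buildBookTemplate({
    title: "Example Title",
    authors: ["Example Author"],
    publishers: ["Example Press"],
    publish_date: "1901"
}));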
Hi all,
I am from ml.wikisource.org and I have a question regarding the MediaWiki API
and PDF files. I want to know if I can use Pywikipedia to grab the text
layer of a PDF file (in the File namespace, obviously). Does the MediaWiki
API offer any such functionality? Thanks in advance.
Regards,
Balasankar C
http://balasankarc.in
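[A hedged sketch of the kind of API query involved: asking for a file's
imageinfo metadata and looking for a text layer there. Whether the PDF handler
actually exposes the text layer this way is exactly the open question above;
the file title is a placeholder.]

// Hedged sketch: query the MediaWiki API for a file's metadata and inspect it
// for a text layer (whether one appears under iiprop=metadata is uncertain).
// "File:Example.pdf" is a placeholder title. Run e.g. in Node 18+; a browser
// call from another site may additionally need the API's origin parameter.
async function fetchFileMetadata(title) {
    const params = new URLSearchParams({
        action: "query",
        titles: title,
        prop: "imageinfo",
        iiprop: "metadata",
        format: "json"
    });
    const response = await fetch("https://ml.wikisource.org/w/api.php?" + params);
    const data = await response.json();
    // The result is keyed by page ID; take the first (and only) page.
    const page = Object.values(data.query.pages)[0];
    return page.imageinfo ? page.imageinfo[0].metadata : null;
}

fetchFileMetadata("File:Example.pdf").then(console.log).catch(console.error);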
There are reasons for editing and there are also reasons for not editing.
One big reason *for* editing is when data from new sources is being
imported.
We in librarianship/information science make decisions on how the data
will be made available to our users/customers. Take an author's name: there are
many ways to write the same name for the same individual. The same individual
can adopt dozens of nicknames during their life, change their last name if they
get married, and so on. The rule chosen in a particular library can be the same
as in other libraries, or an entirely different one (based on how the local
community of users of a library searches for and wants the data), or no
rule is chosen at all and the data is recorded "as is", exactly as it appears
in the publication. Some libraries have additional records devoted specifically
to the variant forms of the same name, some do not.
Google Book Search simply imported data from many libraries without making
any attempt to standardize it, resulting in the large number of
duplicates and nonsense found in some searches (especially those where the
imprints didn't standardize the data themselves).
Certain kinds of data about the same work can also be stored in
different sets of "fields" and "subfields" of MARC21 records across
different libraries, again because the users'/clients' need for information
about the works can vary from place to place (i.e. you get data duplication
in the same record if you simply merge records from different libraries).
The MARC21 specification also has an overall design that IMHO is impossible
to reflect in the current MediaWiki schema, even with Semantic MediaWiki.
And sometimes libraries say that their data is stored in MARC21
fields when it is actually in USMARC ones (yep, there are as many flavours of
MARC as there are flavours of Ubuntu). Or it is *based* on MARC21 fields, with
dozens of local adaptations.
I've just finished an internship in a library with 45k records that was
migrating data from REPIDISCA-*based* fields (let's call it a FreeBSD
flavour) to MARC21-*based* fields (in this comparison, an Ubuntu flavour;
and yep, *based*, with local adaptations, we need those changes). The data
is migrated in an automated fashion, but it still needs to be validated record
by record if the library wants those records in the proper MARC21 fields.
What I'm saying is:
1) You can't simply import data from many sources without validation and
expect a good-quality end product. You will get "search engine"-quality
data (tons of random information that will only make sense with a
continuously developed set of algorithms, possibly more time- and
resource-consuming than standardizing the data);
2) Data standardization is an epic work, dozens of times more epic than writing
a comprehensive encyclopedia about all subjects in all languages.
Institutional support will be needed, and in more comprehensive ways,
embracing more than just releasing the data to play around with (i.e. with
additional hands for the standardization itself).
[[Paul Otlet]] (1868-1944) tried it, in efforts that lead some to argue he was
the conceptual designer of the Internet and hypertext. He had no success, which
is very unfortunate. Will the Wikimedians achieve any level of success at it?
[[:m:User:555]]
On Fri, Dec 6, 2013 at 9:59 PM, Denny Vrandečić <vrandecic(a)gmail.com> wrote:
> Thanks for reviving this thread, Luiz. I also wanted to ask whether we
> should be updating parts of DNB and similar data. Maybe not create new
> entries, but for those that we already have, add some of the available data
> and point to the DNB dataset?
>
>
> On Fri, Dec 6, 2013 at 3:24 PM, Luiz Augusto <lugusto(a)gmail.com> wrote:
>
>> Just found this thread while browsing my email archives (I'm/was inactive
>> on Wikimedia for at least 2 years)
>>
>> IMHO it would be very helpful if a central place hosting metadata from
>> digitized works were created.
>>
>> In my past experience, I've found lots of PD-old books in languages
>> like French, Spanish and English in repositories from Brazil and Portugal,
>> with UIs mostly in Portuguese (i.e. with a very low probability of being
>> found by volunteers from those languages' subdomains), for example.
>>
>> I personally love validating metadata more than proofreading books.
>> Perhaps a tool/place like this would open new ways to contribute to Wikisource
>> and help with user retention (based on the fact that some Wikipedians have fun
>> writing good articles but also sometimes just love making trivial changes in
>> their spare time)?
>>
>> I know that the thread was focused on general metadata for all kinds and
>> ages of books, but I had this idea while reading it.
>>
>> [[:m:User:555]]
>>
>>
>> On Mon, Aug 26, 2013 at 10:42 AM, Thomas Douillard <
>> thomas.douillard(a)gmail.com> wrote:
>>
>>> I know, I started a discussion about porting the bot to Wikidata in the
>>> scientific journal WikiProject. One answer I got: the bot owner had other
>>> things to do in his life than running the bot and was not around very often
>>> any more. Having everything in Wikidata already will be a lot more reliable
>>> and lazier: no tool that works one day but not the next, no effort to
>>> tell newbies that they should go to another website, no significant
>>> problem.
>>>
>>> Maybe one objection would be that the data could be vandalised easily,
>>> but maybe we should find a way to deal with imported, sourced data which
>>> has no real reason to be modified, only to be marked deprecated or updated
>>> by another import from the same source.
>>>
>>>
>>> 2013/8/26 David Cuenca <dacuetu(a)gmail.com>
>>>
>>>> If the problem is to automate bibliographic data importing, one solution
>>>> is what you propose: to import everything. Another one is to have an import
>>>> tool that automatically imports the data for the item that needs it. In WP
>>>> they do that: there is a tool to import book/journal info by ISBN/DOI. The
>>>> same can be done in WD.
>>>>
>>>> Micru
>>>>
>>>>
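[A hedged sketch of the ISBN lookup mentioned just above, against
openlibrary.org's Books API; the endpoint and its bibkeys/format/jscmd
parameters are from memory, and the ISBN is a placeholder.]

// Hedged sketch: look up book data by ISBN via the openlibrary Books API.
// Endpoint and parameters are from memory; the ISBN below is a placeholder.
async function lookupByIsbn(isbn) {
    const url = "https://openlibrary.org/api/books?bibkeys=ISBN:" + isbn +
        "&format=json&jscmd=data";
    const response = await fetch(url);
    const data = await response.json();
    // The response object is keyed by the requested bibkey, e.g. "ISBN:<isbn>".
    return data["ISBN:" + isbn] || null;
}

lookupByIsbn("9780000000000").then(console.log).catch(console.error);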
>>>> On Mon, Aug 26, 2013 at 9:23 AM, Thomas Douillard <
>>>> thomas.douillard(a)gmail.com> wrote:
>>>>
>>>>> If Wikidata has the ambition to be a really reliable database, we
>>>>> should do everything we can to make it easy for users to use any source
>>>>> they want. From this perspective, if we get data of guaranteed high
>>>>> quality, it becomes easy for Wikidatians to find and use these references.
>>>>> Entering a reference in the database seems to me a highly tedious,
>>>>> boring, and easily automated task.
>>>>>
>>>>> With that in mind, any reference that the user will not have to enter
>>>>> by hand is a good thing, and importing high-quality source data should
>>>>> pass every Wikidata community barrier easily. If there is no problem for
>>>>> the software to handle that much information, I say we really have no
>>>>> reason not to do the imports.
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Etiamsi omnes, ego non
>>>>
>>>
>>
>
Hi!
Thanks a lot for your help proposition.
I'm currently writing unit tests where possible, as part of the refactoring of the ProofreadPage extension that was begun by a GSoC project last summer [1]. But as I have no special knowledge in this domain, I'm not sure I'm doing it well.
I would also like to write some parser tests for the tags managed by the extension. These parser tests haven't been written before because the <pages> and <pagelist> tags often rely on the presence of multi-page files and so require specific setup in the parser test runner. An easy way to fix this problem might be to load 'by default' the test file introduced by [2], which would also be useful in core parser tests for testing the page= parameter of image inclusion (as in [[File:test.djvu|page=3]]).
Thanks again,
Thomas
[1] https://www.mediawiki.org/wiki/Extension:Proofread_Page/GSoC
[2] https://gerrit.wikimedia.org/r/#/c/98258/
On 6 Dec 2013, at 09:33, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
-------- Original Message --------
Subject: Re: [QA] [Wikisource-l] Issues with ProofreadPage
Date: Thu, 5 Dec 2013 10:22:30 -0700
From: Chris McMahon <cmcmahon(a)wikimedia.org>
Reply-To: QA (software quality assurance) for Wikimedia projects.
<qa(a)lists.wikimedia.org>
To: QA (software quality assurance) for Wikimedia projects.
<qa(a)lists.wikimedia.org>
CC: discussion list for Wikisource, the free library
<wikisource-l(a)lists.wikimedia.org>
On Thu, Dec 5, 2013 at 8:24 AM, Federico Leva (Nemo)
<nemowiki(a)gmail.com> wrote:
Andrea Zanni, 05/12/2013 15:09:
Thank you Thomas,
On Thu, Dec 5, 2013 at 2:56 PM, Thomas Tanon
<thomaspt(a)hotmail.fr> wrote:
I'm going to work on automated tests in the next weeks in
order to avoid such a large number of bugs next time.
Hello Thomas,
Can you say more about these tests? We may be able to help there.
-Chris
---------- Forwarded message ----------
From: "Stephen LaPorte" <slaporte(a)wikimedia.org>
Date: 05/12/2013 20:57
Subject: [Advocacy Advisors] Joining a letter on copyright term in the TPP?
To: "Advocacy Advisory Group for WMF LCA" <
advocacy_advisors(a)lists.wikimedia.org>
Hello advocacy advisers,
Current drafts of the Trans-Pacific Partnership[0], a new trade treaty
currently being negotiated, contain language that would require countries
that sign the treaty to extend the minimum copyright term to the
life of the author plus 70 years. Global treaties currently require only
life + 50 years, so the TPP would represent a widespread extension of
copyright terms by 20 years, and make it hard to roll back the copyright
term in countries that already have life + 70.
The letter below[1], addressed to the TPP negotiators, directly addresses
this issue. We’re considering signing, because the letter is specifically
targeted at an issue (copyright term) that is core to our encyclopedic
mission, and affects (at present) 14 different countries.
Does the advisory group have any thoughts about joining the letter? We
would like to let KEI know if we will join the letter before December 7,
2013.
[0] https://en.wikipedia.org/wiki/Trans-Pacific_Partnership ;
http://tppinfo.org/
(We briefly mentioned TPP in the Wikilegal fact sheet on ACTA in January
2012. If anyone is interested in updating that document, feel free to get
in touch! See: https://meta.wikimedia.org/wiki/Wikilegal/ACTA)
[1] http://keionline.org/nolifeplus70intpp
--
The letter was prepared by Knowledge Ecology International, and will be
joined by like-minded organizations including the Open Knowledge
Foundation, Electronic Frontier Foundation, and Free Software Foundation.
Full copy of the letter:
*Dear TPP negotiators,*
*In a December 7-10 meeting in Singapore you will be asked to endorse a
binding obligation to grant copyright protection for 70 years after the
death of an author. We urge you to reject the life+ 70 year term for
copyright.*
*There is no benefit to society of extending copyright beyond the 50 years
mandated by the WTO. While some TPP countries, like the USA, Mexico, Peru,
Chile or Australia, already have life+ 70 (or longer) copyright terms,
there is growing recognition that such terms were a mistake, and should be
shortened, or modified by requiring formalities for the extended periods.*
*The primary harm from the life+ 70 copyright term is the loss of access to
countless books, newspapers, pamphlets, photographs, films, sound
recordings and other works that are “owned” but largely not commercialized,
forgotten, and lost. The extended terms are also costly to consumers and
performers, while benefiting persons and corporate owners that had nothing
to do with the creation of the work.*
*Life+70 is a mistake, and it will be an embarrassment to enshrine this
mistake into the largest regional trade agreement ever negotiated.*
--
Stephen LaPorte
Legal Counsel
Wikimedia Foundation
*This message might have confidential or legally privileged information in
it. If you have received this message by accident, please delete it and let
us know about the mistake. For legal reasons, I may only serve as an
attorney for the Wikimedia Foundation. This means I may not give legal
advice to or serve as a lawyer for community members, volunteers, or staff
members in their personal capacity.*