Bibliographical properties on Wikidata are listed here:
https://www.wikidata.org/wiki/Wikidata:Books_task_force
Over the last few months, we have tried to create a metadata scheme to "cover" the
main elements of book classification.
It is not MARC21, of course, but I think it covers pretty much all of simple
Dublin Core.
Early on, I drafted a mapping between the templates of the different Wikimedia
projects (Wikipedia's book infobox, Commons' Book template, Wikisource's
Index metadata form):
https://docs.google.com/spreadsheet/ccc?key=0AlPNcNlN2oqvdFQyR2F5YmhrMWpXaU…
It is far from perfect, but it gives an idea of which things could be
missing.
I'd also love to collaborate with openlibrary, but at the beginning of our
IEG project Micru and I contacted them, in the person of Karen Coyle
(User:Kcoyle),
a very well-known and skilled metadata librarian who is now more or less in
charge of the project.
She told us that openlibrary is frozen at the moment, and that there are
neither staff nor funds to get it going again.
Openlibrary was previously funded by the Internet Archive.
If someone could build the tool you proposed, Luiz, that would be awesome,
but I'm not a technical person and I'm not able to tell whether that is
feasible or not.
If we get more feedback on this, we could propose it as a project for
the next Google Summer of Code: that is a great way to get technical
things done.
Aubrey
On Sun, Dec 8, 2013 at 5:04 AM, Luiz Augusto <lugusto(a)gmail.com> wrote:
>
> On Sat, Dec 7, 2013 at 12:47 PM, Thomas Douillard <
> thomas.douillard(a)gmail.com> wrote:
>>
>>
>> That's why I think we must do a lot more with such data than just
>> importing it from openlibrary, as it is really important to MediaWiki
>> in general, and the community as a whole is a powerful driving force
>> for bibliographical data. I'm not against cooperating with openlibrary,
>> but we should seek deep cooperation and integration with them so that both
>> projects can benefit from each other's community.
>>
>
> +1 on this
>
> openlibrary.org has a limited set of fields.
>
> Moreover, simply importing some random records at some random time
> will benefit neither openlibrary nor Wikimedia.
>
> You would first need to check whether Wikidata already has the needed
> information, search for it again in openlibrary, create the content in
> openlibrary, import the content into Wikidata, make the desired local
> changes and send back to openlibrary any relevant local changes.
>
> But I had an idea: a MediaWiki user interface for openlibrary data.
>
> openlibrary.org offers access to records in 3 ways:
>
> * read/write of individual records through API;
> * read of individual records through RDF and JSON;
> * bulk download of the entire dataset
>
> So it's possible to:
>
> 1) Import the bulk data;
> 2) Catch all changes from openlibrary.org in real time;
> 3) Allow the synced data to be browsed and edited at any time
> on MediaWiki/Wikidata instances;
> 4) Send back to openlibrary the changes, storing locally the data from
> custom fields in the MediaWiki instance (allowing a later import into the
> openlibrary instance if they create the corresponding fields in their DB);
> 5) Send back to openlibrary all new book records created on MediaWiki
> instances.
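[As a rough illustration of the "read individual records" path above, here is
a minimal, unverified sketch; the https://openlibrary.org/books/<OLID>.json URL
pattern and the field names are from memory, and the OLID is a placeholder.]

// Minimal sketch (unverified): fetch one openlibrary edition record as JSON.
// "OL12345M" is a placeholder OLID; field names are as recalled, not checked.
async function fetchOpenLibraryEdition(olid) {
    const response = await fetch("https://openlibrary.org/books/" + olid + ".json");
    if (!response.ok) {
        throw new Error("openlibrary returned HTTP " + response.status);
    }
    const record = await response.json();
    // A few of the fields a sync tool would map to MediaWiki/Wikidata values.
    return {
        title: record.title,
        publishDate: record.publish_date,
        publishers: record.publishers
    };
}

fetchOpenLibraryEdition("OL12345M").then(console.log).catch(console.error);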
>
>
>
>
This is the simple script I'm using to reduce the edit textarea in the Page
namespace (nsPage) to a comfortable size; it also "sniffs" layout toggling:
function resizeBox() {
    // In the Page: namespace, while editing, if the editor column is as wide
    // as the textarea (i.e. the text pane takes the full width), shrink the
    // textarea to 10 rows; otherwise use 31 rows.
    if ((wgCanonicalNamespace == "Page" &&
            (wgAction == "edit" || wgAction == "submit")) &&
            $(".wikiEditor-ui-left").css("width") == $("#wpTextbox1").css("width")) {
        $("#wpTextbox1").attr("rows", "10");
    } else {
        $("#wpTextbox1").attr("rows", "31");
    }
}

$(document).ready(function () {
    // Re-run the resize whenever the layout toggle button is clicked.
    $("img[rel='toggle-layout']").attr("onclick", "resizeBox()");
    resizeBox();
});
Rough, but running. :-)
Alex
Denny Vrandečić, 07/12/2013 00:59:
> Thanks for reviving this thread, Luiz. I also wanted to ask whether we
> should be updating parts of DNB and similar data. Maybe not create new
> entries, but for those that we already have, add some of the available
> data and point to the DNB dataset?
Or maybe use openlibrary.org as a staging area for such data and fetch
it from there? I'm not sure Wikidata should "compete" with openlibrary:
it's a huge amount of work and they already have an infrastructure for it;
Wikidata/Wikimedia could "just" let the users easily import the data
when it's needed. An obvious example is pre-filling of book/work
metadata on Wikipedia articles, Wikisource books, Commons files (and
associated Wikidata entries).
Nemo
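[A hedged sketch of the pre-filling idea above: turning an already-fetched book
record (e.g. from openlibrary.org) into wikitext for Commons' Book template.
The template parameter names and the record field names here are assumptions,
not checked against the live template.]

// Hedged sketch: build pre-filled {{Book}} wikitext from a fetched record.
// Template parameters and record fields are assumptions, for illustration only.
function buildBookTemplate(record) {
    const author = (record.authors && record.authors[0]) || "";
    const publishers = (record.publishers || []).join("; ");
    return "{{Book\n" +
        " |title     = " + (record.title || "") + "\n" +
        " |author    = " + author + "\n" +
        " |publisher = " + publishers + "\n" +
        " |date      = " + (record.publish_date || "") + "\n" +
        "}}";
}

console.log(buildBookTemplate({
    title: "Example Title",
    authors: ["Example Author"],
    publishers: ["Example Press"],
    publish_date: "1901"
}));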
Hi all,
I am from ml.wikisource.org and I have a question regarding the MediaWiki API
and PDF files. I want to know if I can use Pywikipedia to grab the text
layer of a PDF file (in the File namespace, obviously). Does the MediaWiki
API offer any such functionality? Thanks in advance.
Regards,
Balasankar C
http://balasankarc.in
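[A hedged sketch of the kind of API query involved: asking for a file's
imageinfo metadata and looking for a text layer there. Whether the PDF handler
actually exposes the text layer this way is exactly the open question above;
the file title is a placeholder.]

// Hedged sketch: query the MediaWiki API for a file's metadata and inspect it
// for a text layer (whether one appears under iiprop=metadata is uncertain).
// "File:Example.pdf" is a placeholder title. Run e.g. in Node 18+; a browser
// call from another site may additionally need the API's origin parameter.
async function fetchFileMetadata(title) {
    const params = new URLSearchParams({
        action: "query",
        titles: title,
        prop: "imageinfo",
        iiprop: "metadata",
        format: "json"
    });
    const response = await fetch("https://ml.wikisource.org/w/api.php?" + params);
    const data = await response.json();
    // The result is keyed by page ID; take the first (and only) page.
    const page = Object.values(data.query.pages)[0];
    return page.imageinfo ? page.imageinfo[0].metadata : null;
}

fetchFileMetadata("File:Example.pdf").then(console.log).catch(console.error);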
There are reasons for editing and there are also reasons for not editing.
One big reason *for* editing is when data from new sources is being
imported.
We in librarianship/information science make decisions on how the data
will be made available to our users/customers. Take an author's name: there are
many ways to write the same name for the same individual. The same individual
can adopt dozens of nicknames during their life, change their last name if they
get married, and so on. The rule chosen in a particular library can be the same
as in other libraries, or an entirely different one (based on how the local
community of users of a library searches for and wants the data), or no
rule is chosen at all and the data is recorded "as is", exactly as it appears
in the publication. Some libraries have additional records devoted specifically
to the variant forms of the same name, some do not.
Google Book Search simply imported data from many libraries without making
any attempt to standardize it, resulting in the large number of
duplicates and nonsense found in some searches (especially those where the
imprints didn't standardize the data themselves).
Certain kinds of data about the same work can also be stored in
different sets of "fields" and "subfields" of MARC21 records across
different libraries, again because the users'/clients' need for information
about the works can vary from place to place (i.e. you get data duplication
in the same record if you simply merge records from different libraries).
The MARC21 specification also has an overall design that IMHO is impossible
to reflect in the current MediaWiki schema, even with Semantic MediaWiki.
And sometimes libraries say that their data is stored in MARC21
fields when it is actually in USMARC ones (yep, there are as many flavours of
MARC as there are flavours of Ubuntu). Or it is *based* on MARC21 fields, with
dozens of local adaptations.
I've just finished an internship in a library with 45k records that was
migrating data from REPIDISCA-*based* fields (let's call it a FreeBSD
flavour) to MARC21-*based* fields (in this comparison, an Ubuntu flavour;
and yep, *based*, with local adaptations, we need those changes). The data
is migrated in an automated fashion, but it still needs to be validated record
by record if the library wants those records in the proper MARC21 fields.
What I'm saying is:
1) You can't simply import data from many sources without validation and
expect a good-quality end product. You will get "search engine"-quality
data (tons of random information that will only make sense with a
continuously developed set of algorithms, possibly more time- and
resource-consuming than standardizing the data);
2) Data standardization is an epic work, dozens of times more epic than writing
a comprehensive encyclopedia about all subjects in all languages.
Institutional support will be needed, and in more comprehensive ways,
embracing more than just releasing the data to play around with (i.e. with
additional hands for the standardization itself).
[[Paul Otlet]] (1868-1944) tried it, in efforts that lead some to argue he was
the conceptual designer of the Internet and hypertext. He had no success, which
is very unfortunate. Will the Wikimedians achieve any level of success at it?
[[:m:User:555]]
On Fri, Dec 6, 2013 at 9:59 PM, Denny Vrandečić <vrandecic(a)gmail.com> wrote:
> Thanks for reviving this thread, Luiz. I also wanted to ask whether we
> should be updating parts of DNB and similar data. Maybe not create new
> entries, but for those that we already have, add some of the available data
> and point to the DNB dataset?
>
>
> On Fri, Dec 6, 2013 at 3:24 PM, Luiz Augusto <lugusto(a)gmail.com> wrote:
>
>> Just found this thread while browsing my email archives (I'm/was inactive
>> on Wikimedia for at least 2 years)
>>
>> IMHO it would be very helpful if a central place hosting metadata from
>> digitized works were created.
>>
>> In my past experience, I've found lots of PD-old books in languages
>> like French, Spanish and English in repositories from Brazil and Portugal,
>> with UIs mostly in Portuguese (i.e. with a very low probability of being
>> found by volunteers from those languages' subdomains), for example.
>>
>> I personally love validating metadata more than proofreading books.
>> Perhaps a tool/place like this would open new ways to contribute to Wikisource
>> and help with user retention (based on the fact that some Wikipedians have fun
>> writing good articles but also sometimes just love making trivial changes in
>> their spare time)?
>>
>> I know that the thread was focused on general metadata for all kinds and
>> ages of books, but I had this idea while reading it.
>>
>> [[:m:User:555]]
>>
>>
>> On Mon, Aug 26, 2013 at 10:42 AM, Thomas Douillard <
>> thomas.douillard(a)gmail.com> wrote:
>>
>>> I know, I started a discussion about porting the bot to Wikidata in the
>>> scientific journal WikiProject. One answer I got: the bot owner had other
>>> things to do in his life than running the bot and was not around very often
>>> any more. Having everything in Wikidata already will be a lot more reliable
>>> and lazier: no tool that works one day but not the next, no effort to
>>> tell newbies that they should go to another website, no significant
>>> problem.
>>>
>>> Maybe one objection would be that the data could be vandalised easily,
>>> but maybe we should find a way to deal with imported, sourced data which
>>> has no real reason to be modified, only to be marked deprecated or updated
>>> by another import from the same source.
>>>
>>>
>>> 2013/8/26 David Cuenca <dacuetu(a)gmail.com>
>>>
>>>> If the problem is to automate bibliographic data importing, one solution
>>>> is what you propose: to import everything. Another one is to have an import
>>>> tool that automatically imports the data for the item that needs it. In WP
>>>> they do that: there is a tool to import book/journal info by ISBN/DOI. The
>>>> same can be done in WD.
>>>>
>>>> Micru
>>>>
>>>>
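[A hedged sketch of the ISBN lookup mentioned just above, against
openlibrary.org's Books API; the endpoint and its bibkeys/format/jscmd
parameters are from memory, and the ISBN is a placeholder.]

// Hedged sketch: look up book data by ISBN via the openlibrary Books API.
// Endpoint and parameters are from memory; the ISBN below is a placeholder.
async function lookupByIsbn(isbn) {
    const url = "https://openlibrary.org/api/books?bibkeys=ISBN:" + isbn +
        "&format=json&jscmd=data";
    const response = await fetch(url);
    const data = await response.json();
    // The response object is keyed by the requested bibkey, e.g. "ISBN:<isbn>".
    return data["ISBN:" + isbn] || null;
}

lookupByIsbn("9780000000000").then(console.log).catch(console.error);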
>>>> On Mon, Aug 26, 2013 at 9:23 AM, Thomas Douillard <
>>>> thomas.douillard(a)gmail.com> wrote:
>>>>
>>>>> If Wikidata has the ambition to be a really reliable database, we
>>>>> should do everything we can to make it easy for users to use any source
>>>>> they want. From this perspective, if we get data of guaranteed high
>>>>> quality, it becomes easy for Wikidatians to find and use these references.
>>>>> Entering a reference in the database seems to me a highly tedious,
>>>>> boring, and easily automated task.
>>>>>
>>>>> With that in mind, any reference that the user will not have to enter
>>>>> by hand is a good thing, and importing high-quality source data should
>>>>> pass every Wikidata community barrier easily. If there is no problem for
>>>>> the software to handle that much information, I say we really have no
>>>>> reason not to do the imports.
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Etiamsi omnes, ego non
>>>>
>>>
>>
>
Hi!
Thanks a lot for your help proposition.
I'm currently writing unit tests where possible, as part of the refactoring of the ProofreadPage extension that was begun by a GSoC project last summer [1]. But as I have no special knowledge in this domain, I'm not sure I'm doing it well.
I would also like to write some parser tests for the tags managed by the extension. These parser tests haven't been written before because the <pages> and <pagelist> tags often rely on the presence of multi-page files and so require specific setup in the parser test runner. An easy way to fix this problem might be to load 'by default' the test file introduced by [2], which would also be useful in core parser tests for testing the page= parameter of image inclusion (as in [[File:test.djvu|page=3]]).
Thanks again,
Thomas
[1] https://www.mediawiki.org/wiki/Extension:Proofread_Page/GSoC
[2] https://gerrit.wikimedia.org/r/#/c/98258/
On 6 Dec 2013, at 09:33, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
-------- Original Message --------
Subject: Re: [QA] [Wikisource-l] Issues with ProofreadPage
Date: Thu, 5 Dec 2013 10:22:30 -0700
From: Chris McMahon <cmcmahon(a)wikimedia.org>
Reply-To: QA (software quality assurance) for Wikimedia projects.
<qa(a)lists.wikimedia.org>
To: QA (software quality assurance) for Wikimedia projects.
<qa(a)lists.wikimedia.org>
CC: discussion list for Wikisource, the free library
<wikisource-l(a)lists.wikimedia.org>
On Thu, Dec 5, 2013 at 8:24 AM, Federico Leva (Nemo)
<nemowiki(a)gmail.com> wrote:
Andrea Zanni, 05/12/2013 15:09:
Thank you Thomas,
On Thu, Dec 5, 2013 at 2:56 PM, Thomas Tanon
<thomaspt(a)hotmail.fr> wrote:
I'm going to work on automated tests in the next weeks in
order to avoid such a large number of bugs next time.
Hello Thomas,
Can you say more about these tests? We may be able to help there.
-Chris
---------- Forwarded message ----------
From: "Stephen LaPorte" <slaporte(a)wikimedia.org>
Date: 05/12/2013 20:57
Subject: [Advocacy Advisors] Joining a letter on copyright term in the TPP?
To: "Advocacy Advisory Group for WMF LCA" <
advocacy_advisors(a)lists.wikimedia.org>
Hello advocacy advisers,
Current drafts of the Trans-Pacific Partnership[0], a new trade treaty
currently being negotiated, contain language that would require countries
that sign the treaty to extend the minimum copyright term to the
life of the author plus 70 years. Global treaties currently require only
life + 50 years, so the TPP would represent a widespread extension of
copyright terms by 20 years, and make it hard to roll back the copyright
term in countries that already have life + 70.
The letter below[1], addressed to the TPP negotiators, directly addresses
this issue. We’re considering signing, because the letter is specifically
targeted at an issue (copyright term) that is core to our encyclopedic
mission, and affects (at present) 14 different countries.
Does the advisory group have any thoughts about joining the letter? We
would like to let KEI know if we will join the letter before December 7,
2013.
[0] https://en.wikipedia.org/wiki/Trans-Pacific_Partnership ;
http://tppinfo.org/
(We briefly mentioned TPP in the Wikilegal fact sheet on ACTA in January
2012. If anyone is interested in updating that document, feel free to get
in touch! See: https://meta.wikimedia.org/wiki/Wikilegal/ACTA)
[1] http://keionline.org/nolifeplus70intpp
--
The letter was prepared by Knowledge Ecology International, and will be
joined by like-minded organizations including the Open Knowledge
Foundation, Electronic Frontier Foundation, and Free Software Foundation.
Full copy of the letter:
*Dear TPP negotiators,*
*In a December 7-10 meeting in Singapore you will be asked to endorse a
binding obligation to grant copyright protection for 70 years after the
death of an author. We urge you to reject the life+ 70 year term for
copyright.*
*There is no benefit to society of extending copyright beyond the 50 years
mandated by the WTO. While some TPP countries, like the USA, Mexico, Peru,
Chile or Australia, already have life+ 70 (or longer) copyright terms,
there is growing recognition that such terms were a mistake, and should be
shortened, or modified by requiring formalities for the extended periods.*
*The primary harm from the life+ 70 copyright term is the loss of access to
countless books, newspapers, pamphlets, photographs, films, sound
recordings and other works that are “owned” but largely not commercialized,
forgotten, and lost. The extended terms are also costly to consumers and
performers, while benefiting persons and corporate owners that had nothing
to do with the creation of the work.*
*Life+70 is a mistake, and it will be an embarrassment to enshrine this
mistake into the largest regional trade agreement ever negotiated.*
--
Stephen LaPorte
Legal Counsel
Wikimedia Foundation
*This message might have confidential or legally privileged information in
it. If you have received this message by accident, please delete it and let
us know about the mistake. For legal reasons, I may only serve as an
attorney for the Wikimedia Foundation. This means I may not give legal
advice to or serve as a lawyer for community members, volunteers, or staff
members in their personal capacity.*