Hello,
I have been lurking around for some months now. I stumbled upon the Wiktionary-in-Wikidata project via, for instance, this PDF: https://upload.wikimedia.org/wikipedia/commons/6/60/Wikidata_for_Wiktionary_...
Now I'd like to help. For that, I want to build a bot to achieve that goal.
My understanding is that a proof of concept of page 11 of the above PDF would be a good start. But I have never really done any site scraping. Is there any abstraction that helps in this regard?
My setup:
- a homegrown RDF-like database, with Wikidata loaded from the JSON dumps and miniKanren querying
- GNU Guile
- soon enough, dumps from https://en.wiktionary.org/api/
Thanks!
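As a starting point for the scraping question above, here is a minimal sketch (not from the thread, and only one possible approach): rather than scraping HTML, a bot can ask the MediaWiki API for a page's raw wikitext with `action=parse`. The endpoint and parameters are the standard MediaWiki ones; the canned JSON below stands in for a real HTTP response so the example is self-contained.

```python
import json
from urllib.parse import urlencode

# English Wiktionary's standard MediaWiki API endpoint.
API = "https://en.wiktionary.org/w/api.php"

def wikitext_url(title):
    """Build the API URL that returns a page's raw wikitext as JSON."""
    params = {"action": "parse", "page": title,
              "prop": "wikitext", "format": "json"}
    return API + "?" + urlencode(params)

def extract_wikitext(response_text):
    """Pull the wikitext string out of the API's JSON response."""
    data = json.loads(response_text)
    return data["parse"]["wikitext"]["*"]

# A tiny canned response, standing in for an actual HTTP call:
canned = json.dumps({"parse": {"title": "dictionary",
                               "wikitext": {"*": "==English==\n..."}}})
print(wikitext_url("dictionary"))
print(extract_wikitext(canned))
```

The same request can also be made against the dumps mentioned above, which avoids hammering the live site.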
Hi,
It is my understanding that Wikidata for Wiktionary requires new data structures, or at least new namespaces (L, F and S), and that is what is holding people back.
What would be interesting to have is a prototype (not necessarily built with MediaWiki+Wikibase) to see whether the suggested scheme is OK.
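Such a prototype's data model could be sketched in a few lines. The following is my own rough reading of the L/F/S proposal, not an official schema: a Lexeme (L-id) carries Forms (F-ids) and Senses (S-ids); all identifiers and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Form:           # F-id: one spelling plus its grammatical features
    fid: str
    representation: str
    features: list = field(default_factory=list)

@dataclass
class Sense:          # S-id: one meaning, linkable to a Wikidata item
    sid: str
    gloss: str
    item: str = ""    # e.g. a Q-id for the concept

@dataclass
class Lexeme:         # L-id: lemma + language + lexical category
    lid: str
    lemma: str
    language: str
    category: str
    forms: list = field(default_factory=list)
    senses: list = field(default_factory=list)

hard = Lexeme("L1", "hard", "en", "adjective",
              forms=[Form("L1-F1", "harder", ["comparative"])],
              senses=[Sense("L1-S1", "resistant to pressure")])
print(hard.forms[0].representation)
```

A toy model like this is enough to load sample entries and stress-test the scheme without touching MediaWiki or Wikibase.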
Finn Årup Nielsen
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hello Amirouche,
Thanks a lot for your interest in this project and your proposal to help. Currently, the development team is still working on the new data structure for lexemes, and we don't have anything to demo yet. As soon as we can provide a viable structure to test, we will announce it here and on the talk page of the project: https://www.wikidata.org/wiki/Wikidata_talk:Wiktionary.
Cheers,
Hello all!
On 02/03/2017 at 10:34, Léa Lacroix wrote:
Hello Amirouche,
Thanks a lot for your interest in this project and your proposal to help. Currently, the development team is still working on the new datatype structure for lexemes, and we don't have something to demo yet.
I don't need Wikibase support for L, F and S right now.
What I am wondering is whether there is already work done on the Wikimedia side regarding the *extraction* of Lexemes, Forms and Senses from Wiktionary pages.
I have started scraping the English Wiktionary. I will have a demo ready by the end of the week. But I'd like to avoid duplicate work and focus on other things if Wikimedia already plans to do this.
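For the curious, the first step of that scraping might look like the sketch below (my own illustration, not the actual demo): split a Wiktionary page's wikitext into its `==Language==` sections, inside which the `===Noun===`-style part-of-speech headings sit, from which Lexemes, Forms and Senses could later be derived. The sample page text is made up.

```python
import re

def language_sections(wikitext):
    """Map language name -> body of its ==Language== section.

    The split pattern matches exactly level-2 headings: '==' followed by
    a non-'=' character, so '===Noun===' subsections stay in the body.
    """
    parts = re.split(r"^==([^=].*?)==\s*$", wikitext, flags=re.M)
    # parts = [preamble, lang1, body1, lang2, body2, ...]
    return dict(zip(parts[1::2], parts[2::2]))

page = """==English==
===Noun===
# A reference work listing words.

==French==
===Nom===
# dictionnaire
"""
sections = language_sections(page)
print(sorted(sections))
```

Real pages have many irregularities (templates, nested headings, per-language conventions), so a robust extractor would need much more than this, but it shows the general shape of the task.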
Fixed the subject of the mail
Hello,
No, there is nothing from our side regarding extracting data from Wiktionary. This is not in the development team's plans; we think that this decision (to extract or not), and the ways to possibly do it, should be made by both communities (Wikidata and Wiktionary).
If you have any experiments or demo, feel free to share :)
On 21/03/2017 at 09:14, Léa Lacroix wrote:
Hello,
No, there is nothing from our side regarding extracting data from Wiktionary. This is not in the development team's plans; we think that this decision (to extract or not), and the ways to possibly do it, should be made by both communities (Wikidata and Wiktionary).
If you have any experiments or demo, feel free to share :)
My understanding is that Wiktionary's (and Wikipedia's) CC BY-SA license is incompatible with Wikidata's CC0 license.
See https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_da...
--
Léa Lacroix
Project Manager Community Communication for Wikidata
On 22/03/2017 at 10:10, Amirouche wrote:
My understanding is that Wiktionary's (and Wikipedia's) CC BY-SA license is incompatible with Wikidata's CC0 license.
That is true for any copyrighted information on Wiktionary. That will mainly be definitions, and maybe example sentences. Facts, such as word type or morphology, are not copyrightable.