Re: [Foundation-l] [Commons-l] Wikidata

Michael Peel

22 Nov 2010

9:24 p.m.

(also including foundation-l as this isn't really a commons-specific discussion)

On 22 Nov 2010, at 21:04, Samuel Klein wrote:

...

...
A wikidata project could use semantic mediawiki from the outset, and be seeded with data from dbpedia.

A lot of existing & proposed projects would benefit from a centralised wikidata project. e.g. a genealogy wiki could use the relationships stored on the wikidata project. wikisource and commons could use the central data wiki for their Author and Creator details.

+1

Could this be part of dbpedia?

dbpedia is about collating the information available on Wikipedia and providing that as a database for others to use. This is about having a central information store that can be edited to add information. Whilst dbpedia could seed wikidata, they're very different projects in the way they would operate.

In my opinion, the Wikimedia Foundation should very seriously look into starting something like wikidata. I don't suppose there's a facilitator that could be hired that knows about Wikimedia sufficiently to facilitate an on-wiki discussion and formation of a comprehensive proposal to start this project, including bringing together the various people interested in this project?

Mike Peel

John Vandenberg

22 Nov

10:31 p.m.

New subject: [Foundation-l] [Commons-l] Wikidata

On Tue, Nov 23, 2010 at 8:24 AM, Michael Peel email@mikepeel.net wrote:

...

...
Could this be part of dbpedia?

dbpedia is about collating the information available on Wikipedia and providing that as a database for others to use. This is about having a central information store that can be edited to add information. Whilst dbpedia could seed wikidata, they're very different projects in the way they would operate.

I agree.

...

In my opinion, the Wikimedia Foundation should very seriously look into starting something like wikidata. I don't suppose there's a facilitator that could be hired that knows about Wikimedia sufficiently to facilitate an on-wiki discussion and formation of a comprehensive proposal to start this project, including bringing together the various people interested in this project?

As it is the first new project in quite a long time, having a WMF staff member assigned to it would be brilliant. As this would/should involve the first deployment of semantic mediawiki by WMF, it would be good for that someone to already be experienced with semantic mediawiki.

-- John Vandenberg

Andrea Zanni

11:06 p.m.

...

As it is the first new project in quite a long time, having a WMF staff member assigned to it would be brilliant. As this would/should involve the first deployment of semantic mediawiki by WMF, it would be good for that someone to already be experienced with semantic mediawiki.

Agree. Could starting with SMW on a brand-new data project solve all the issues that have prevented it from being used until now? I hope so. It would be extremely helpful for projects like Commons and Wikisource (just talking about data for now).

Aubrey.

Brian J Mingus

11:17 p.m.

On Mon, Nov 22, 2010 at 4:06 PM, Andrea Zanni zanni.andrea84@gmail.com wrote:

Agree. Could starting with SMW on a brand-new data project solve all the issues that have prevented it from being used until now? I hope so. It would be extremely helpful for projects like Commons and Wikisource (just talking about data for now).

Aubrey.

SMW would have to be completely redesigned for use in a project with millions of pages and millions of attributes where arbitrary queries are possible.

- Brian

Michael Peel

11:30 p.m.

On 22 Nov 2010, at 23:17, Brian J Mingus wrote:

SMW would have to be completely redesigned for use in a project with millions of pages and millions of attributes where arbitrary queries are possible.

OK - a) why, b) how, c) is this feasible, d) is SMW the right way to go?

Mike

John Vandenberg

11:33 p.m.

On Tue, Nov 23, 2010 at 10:17 AM, Brian J Mingus brian.mingus@colorado.edu wrote:

SMW would have to be completely redesigned for use in a project with millions of pages and millions of attributes where arbitrary queries are possible.

What limitations would be useful to get the project off the ground?

Some ideas:

The data project is initially only used/queried by Wikipedia projects, and then cached on the Wikipedia side

The data project is initially limited to only geographic entities.

-- John Vandenberg

Jan Kucera (Kozuch)

24 Nov

9:48 a.m.

I too support the creation of Wikidata. Unfortunately, amid the global decline in Wikimedia participation, even the Foundation is unable to help ignite a valuable new project that is desperately needed in order to bring in some new participation... sad situation. I wonder what the 50+ Foundation staffers are working on, if they are not able to help out a little bit with such an important idea...

Kozuch

foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l

Erik Moeller

7:25 p.m.

Hi all,

as you may know I've been involved in the structured data community for a few years (through the original "Wikidata" proposal in 2004 as well as architecting and developing OmegaWiki, together with the OpenProgress team and others from 2005-2007). I've been following Semantic MediaWiki, Freebase and other projects from the beginning. You don't need to sell me on the value or importance of structured data.

The problem space is very complex, especially when taking into account that Wikimedia is a fully multilingual system. There is still low-hanging fruit, especially for a project like Wikimedia Commons, but I agree w/ Michael that a more holistic approach to how to access and manage data in WMF projects is much preferable to, for example, throwing SMW into some wikis and not others, etc.

When I joined WMF, I couldn't justify arguing for higher priority on data tech projects more so than, for example, the 2009-10 usability initiative and continuing efforts in this area, especially given that we still have only a tiny engineering staff. I don't believe that structured data is going to be the principal driver of participation -- that problem space is more about social and technical barriers to entry, interaction with new users, mentoring, etc. And we're continuing to fall behind the rest of the web in terms of usability.

That being said, it's clear that it's a key enabling technology (including for _some_ usability improvements, although many of them can be made without a full-fledged structured data support system). I particularly think it has huge potential in bootstrapping small languages by more closely interconnecting useful and translatable bits of information (start a page about "Germany" in a new language and immediately pull all relevant data, possibly including translations of labels if available).

Danese and I have been working on a "Data Summit" this year to bring together both the key players in the structured data field (DBPedia, SMW, etc.), as well as some of the research and analytics community. Unfortunately we've had to reschedule it, but it'll happen in Q1 2011. We're not going to be able to dedicate lots of resources to engineering in this area in the near future, but since there are already so many disparate efforts that focus on making WP data usable, we do hope that we can partner up with others to move things forward.

In a nutshell, I think we should aim to establish a “Wikidata Commons” project at data.wikimedia.org which serves all Wikimedia projects with structured data in a language-neutral fashion, analogous to “Wikimedia Commons” for multimedia files, and which becomes the central location to curate, maintain and discuss such data. Wikidata Commons should provide standard interfaces for querying, importing, and exporting data. This project could be built incrementally (starting with clunky but reasonably future-proof ways to manage and retrieve data).

The key challenges as I see them continue to be, as ever:

1) maintaining predictable and reasonable system performance as the DB scales, more and increasingly complex queries are performed, etc.,
2) consistently improving rather than degrading user experience,
3) handling multilingual representations of all translatable content well without giving undue prominence to any one language,
4) effectively caching and purging data wherever it's used,
5) versioning/transactioning relational data to be maximally useful and conducive to collaboration.
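
Challenge 4 (caching and purging) can be made concrete with a small sketch. It is an illustration only, assuming a push-style purge in which the central repository notifies each client wiki whenever a value changes. The names `DataRepo` and `ClientCache` are hypothetical and do not correspond to any MediaWiki or SMW API.

```python
class DataRepo:
    """Hypothetical central data store: (entity, property) -> value."""

    def __init__(self):
        self.values = {}
        self.subscribers = []  # client caches to notify on change

    def set(self, entity, prop, value):
        self.values[(entity, prop)] = value
        # Purge the stale copy held by every client wiki.
        for cache in self.subscribers:
            cache.purge(entity, prop)


class ClientCache:
    """Hypothetical per-wiki cache sitting in front of the repo."""

    def __init__(self, repo):
        self.repo = repo
        self.local = {}
        self.fetches = 0  # counts round trips to the central repo
        repo.subscribers.append(self)

    def get(self, entity, prop):
        key = (entity, prop)
        if key not in self.local:
            # Cache miss: fetch once from the central repo.
            self.fetches += 1
            self.local[key] = self.repo.values[key]
        return self.local[key]

    def purge(self, entity, prop):
        # Drop the cached value; the next get() refetches it.
        self.local.pop((entity, prop), None)
```

Repeated reads on the client side cost one fetch until the central value is edited, at which point the purge forces exactly one refetch. A real deployment would also have to decide what happens when a purge notification is lost, which this sketch ignores.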

Earlier this week, Danese and I met with Denny Vrandecic from SMW, who's recently put together a prototype called "Shortipedia" that allows language-independent (using multilingual labels) annotation of concepts with SMW-style properties through a minimal form-based interface, interfacing with whichever triple store is configured for SMW. It's still very much a hack, and he's aiming to clean it up for the summit. But it looks potentially very interesting, and like a concept we could rally energy behind. The data from such a repository could then be pulled into WP templates, accessed through "wizards" that auto-generate template data for new articles, etc.
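
To illustrate what language-independent annotation with multilingual labels might look like, here is a minimal sketch. It is not Shortipedia's actual data model: the `Concept` class, the `render` helper, the identifiers, and the label fallback rule are all assumptions made for the example.

```python
class Concept:
    """Hypothetical language-neutral item; labels are kept per language."""

    def __init__(self, concept_id, labels):
        self.concept_id = concept_id
        self.labels = dict(labels)   # language code -> label
        self.statements = []         # list of (property Concept, value)

    def label(self, lang, fallbacks=("en",)):
        # Try the requested language first, then the fallback chain.
        for code in (lang, *fallbacks):
            if code in self.labels:
                return self.labels[code]
        # No label in any candidate language: show the neutral identifier.
        return self.concept_id


def render(concept, lang):
    """Return display lines for a concept in the reader's language."""
    lines = [concept.label(lang)]
    for prop, value in concept.statements:
        shown = value.label(lang) if isinstance(value, Concept) else str(value)
        lines.append(f"{prop.label(lang)}: {shown}")
    return lines


germany = Concept("c1", {"en": "Germany", "de": "Deutschland", "cs": "Německo"})
berlin = Concept("c2", {"en": "Berlin", "cs": "Berlín"})
capital = Concept("p1", {"en": "capital", "de": "Hauptstadt"})
germany.statements.append((capital, berlin))
```

Calling `render(germany, "de")` or `render(germany, "cs")` reuses the same stored statement and only swaps the labels, which is the property that would let a template in a new language pull data immediately. The fallback chain is a parameter rather than a fixed choice, since hard-coding one language would work against the goal of not giving any language undue prominence.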

Anyone who wants to advance the thinking in this space should also consider what can be done today with Wikimedia Commons and SMW. Since Wikimedia Commons is an intrinsically multilingual database with focus on annotating individual files, its operational requirements are somewhat different from those of most other projects. It would be useful to have an instance of SMW running using a copy of the Wikimedia Commons database and possibly Semantic Forms to see what such annotation could look like in practice. Anyone with time and technical skills can put together prototypes like this that'll help us move forward.

Again, I think the likely path forward here is for us to ally effectively with the key players in the space, rather than doing all the work ourselves.

-- Erik Möller
Deputy Director, Wikimedia Foundation

Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate

Jan Kucera (Kozuch)

27 Nov

11:23 a.m.

Hi there,

so how do we move forward with Wikidata? There are a bunch of proposals on both Strategy and Meta, but I guess we need a clearly dedicated place for serious discussion on the topic. So let's either create a wiki on data.wikimedia.org or a dedicated mailing list here... or both.

Kozuch

Gerard Meijssen

12:22 p.m.

Hoi,

At OmegaWiki we are MediaWiki-based. We link to Wikipedia and to Commons. The data is multilingual and the relations show in "your" language when a translation exists. I hope that you will have a look at what can already be done. Try for instance котка

Thanks,
GerardM

http://www.omegawiki.org/Expression:%D0%BA%D0%BE%D1%82%D0%BA%D0%B0

On 27 November 2010 12:23, Jan Kucera (Kozuch) garbage5@seznam.cz wrote:

...

Hi there,

so how do we move forward with Wikidata? There are a bunch of proposals on both Strategy and Meta, but I guess we need a clearly dedicated place for serious discussion on the topic. So let's either create a wiki on data.wikimedia.org or a dedicated mailing list here... or both.

Kozuch

aude

22 Nov

11:18 p.m.

On Mon, Nov 22, 2010 at 5:31 PM, John Vandenberg jayvdb@gmail.com wrote:

...

On Tue, Nov 23, 2010 at 8:24 AM, Michael Peel email@mikepeel.net wrote:

...
...
Could this be part of dbpedia?

dbpedia is about collating the information available on Wikipedia and providing that as a database for others to use. This is about having a central information store that can be edited to add information. Whilst dbpedia could seed wikidata, they're very different projects in the way they would operate.

I agree.

...
In my opinion, the Wikimedia Foundation should very seriously look into starting something like wikidata. I don't suppose there's a facilitator that could be hired that knows about Wikimedia sufficiently to facilitate an on-wiki discussion and formation of a comprehensive proposal to start this project, including bringing together the various people interested in this project?

+1 Definitely want to see this implemented for Wikimedia. We had a bunch of related strategy proposals calling for us to do something like this:

http://strategy.wikimedia.org/wiki/Proposal:Data.wikimedia.org

http://strategy.wikimedia.org/wiki/Proposal:Data-driven_content

http://strategy.wikimedia.org/wiki/Proposal:Structured_Data

more...

We have our own data, like coordinates, that would be great to share across projects. Seeing governments and organisations (e.g. http://data.worldbank.org/, http://data.gov, http://data.gov.uk/ ...) jumping in on doing *open* data, we have an opportunity to make use of it for infoboxes, charts, etc. Then, there's geodata from OpenStreetMap and elsewhere...

-Katie (@aude)

