Hi everyone,
As I said previously, one of the major topics we need to work on is trust in our data. Among other things, this means adding reliable sources for statements. I would like us to keep an eye on the actual numbers there and track the progress. Is there anyone who'd like to hack up a small script/bot/page that shows us:
* number of statements over time
* number of sourced statements over time
* number of statements with a source other than "imported from foo Wikipedia" etc.
Having that would be really sweet.
Cheers Lydia
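A minimal sketch of the kind of counting script requested here, assuming the line-per-entity JSON dumps Wikidata publishes (the dump path is a placeholder):

import bz2
import json

# Placeholder path: a Wikidata JSON entity dump (one entity per line,
# wrapped in a single JSON array).
DUMP_PATH = "wikidata-all.json.bz2"

total = sourced = 0
with bz2.open(DUMP_PATH, "rt", encoding="utf-8") as dump:
    for line in dump:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip the array brackets around the entities
        entity = json.loads(line)
        # "claims" maps each property ID to a list of statements.
        for statements in entity.get("claims", {}).values():
            for statement in statements:
                total += 1
                if statement.get("references"):
                    sourced += 1

print(f"{total} statements, {sourced} of them sourced")

Running this over a series of dated dumps gives one data point per dump, i.e. the numbers over time; the "source other than imported from" count needs one extra per-reference check (sketched later in this thread).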
Hi Lydia,
I would be willing to support here... are there other people who could offer support and/or guidance on where to start?
Kind regards,
Daniel @dakoller
On Sun, Oct 6, 2013 at 10:35 PM, Daniel Koller <daniel@dakoller.net> wrote:
Hi Lydia,
I would be willing to support here... are there other people who could offer support and/or guidance on where to start?
Great! The best place to start is probably downloading the database dumps. You can find them here: https://www.wikidata.org/wiki/Wikidata:Database_download If you run into any issues or have questions, just ask here. I'm sure there are people around to help.
Cheers Lydia
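One way to fetch a dump without holding it all in memory, sketched here with an assumed URL and file name (check the Database_download page above for the current ones):

import shutil
import urllib.request

# Assumed URL and file name; see the download page for current dumps.
URL = "https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2"

with urllib.request.urlopen(URL) as response, \
        open("latest-all.json.bz2", "wb") as out:
    shutil.copyfileobj(response, out, length=1 << 20)  # 1 MiB chunks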
I have generated some stats and will present them once Tools Labs has regained the ability to run PHP :-(
OK, quickly, while Labs is up :-)
http://tools.wmflabs.org/wikidata-todo/stats.php
On Thu, Oct 10, 2013 at 2:05 PM, Magnus Manske <magnusmanske@googlemail.com> wrote:
OK, quickly, while Labs is up :-)
Magnus, you're awesome. Thank you! This is very useful, and from a quick look these numbers look really good.
Cheers Lydia
On 07/10/13 09:53, Lydia Pintscher wrote:
Great! The best place to start is probably downloading the database dumps. You can find them here: https://www.wikidata.org/wiki/Wikidata:Database_download If you run into any issues or have questions, just ask here. I'm sure there are people around to help.
First, if you want to download and analyse dumps automatically, then the wda script is your friend [1]. It knows where to get the dumps, it can fetch all relevant dump files (including dailies), and it can iterate through all dumps (or through the most recent revision of every page across all dumps) and run custom processing on each revision.
Second, I created some stats on development over time a while back, using 14-day scan intervals. This is also done by the wda script, but using a MySQL database to store partly aggregated information from the dumps. The problem is that you have no random access to the dumps: they mostly contain all revisions of one page in sequence, while analysing development over time requires looking at the revisions of all pages at one point in time, so all the data needs to be reshuffled first. The software also does some basic calendar calculations to find out which 14-day interval a revision belongs to.
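A sketch of that calendar step, assuming intervals are simply numbered from a fixed start date (the epoch chosen here is arbitrary and only shifts the interval labels):

from datetime import datetime, timezone

# Assumed epoch; any fixed date works.
EPOCH = datetime(2012, 10, 29, tzinfo=timezone.utc)

def interval_of(timestamp: str) -> int:
    """Map a MediaWiki timestamp like '2013-07-14T12:00:00Z' to the
    index of the 14-day interval it falls into."""
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return (ts.replace(tzinfo=timezone.utc) - EPOCH).days // 14

With this, the reshuffling amounts to keying the partly aggregated data by (page, interval) and keeping the newest revision per key.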
I attach a figure produced from the resulting data. It is from mid-July, so not current any more, but the software still exists if anyone wants to redo it now (most of the code should be in wda, though I might have some local scripts for actually invoking it, since this is not the standard operation of wda, obviously ;-)). The code does not currently capture the number of references that are different from "imported from", but it would not be hard to add this. We will update these stats at some point in the future, but maybe not this weekend.
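A sketch of that missing check, assuming the current JSON statement layout and that P143 ("imported from Wikimedia project") is what marks the plain "imported from" references:

IMPORTED_FROM = "P143"  # "imported from Wikimedia project"

def has_real_source(statement: dict) -> bool:
    """True if at least one reference on the statement uses a property
    other than P143, i.e. it is more than a bare 'imported from'."""
    return any(
        prop != IMPORTED_FROM
        for reference in statement.get("references", [])
        for prop in reference.get("snaks", {})
    )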
Cheers,
Markus