Hi all,
I'd like to share a little Wikidata application: I just used Wikidata to
guess the sex of people based on their (first) name [1]. My goal was to
determine gender bias among the authors in several research areas. This
is how some people spend their free time on weekends ;-)
In the process, I also created a long list of first names with
associated sex information from Wikidata [2]. It is not super clean but
it served its purpose. If you are a researcher, then maybe the gender
bias of journals/conferences is interesting to you as well. Details and
some discussion of the results are online [1].
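If you want to play with the same idea, here is a rough, hypothetical sketch of the lookup step in Python (the CSV file name, its name,sex column layout and the sample authors are assumptions, not the setup I actually used):

    # Hypothetical sketch: guess sex from first names using a name -> sex
    # table exported from Wikidata (a CSV with "name,sex" columns is assumed).
    import csv
    from collections import Counter

    def load_name_table(path):
        table = {}
        with open(path, newline='', encoding='utf-8') as f:
            for name, sex in csv.reader(f):
                table[name.strip().lower()] = sex.strip()
        return table

    def guess_sex(full_name, table):
        first = full_name.split()[0].lower()
        return table.get(first, 'unknown')

    table = load_name_table('wikidata_first_names.csv')   # hypothetical file
    authors = ['Ada Lovelace', 'Alan Turing', 'J. Doe']   # sample input
    print(Counter(guess_sex(a, table) for a in authors))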
Cheers,
Markus
[1] http://korrekt.org/page/Note:Sex_Distributions_in_Research
[2]
https://docs.google.com/spreadsheet/ccc?key=0AstQ5xfO-xXGdE9UVkxNc0JMVWJzNm…
Crossposting to all lists of interest; sorry for the fancy long title, but it
should make this easier to find for future reference.
I recently had the opportunity to mentor during the GHCOSD[0] for the
Wikimedia Foundation. We were two mentors from the Foundation, and I
took on mentoring what we called the challenging tasks: contributing
your first patch and writing your first bot.
This e-mail is about my approach to the "writing your first bot" task;
I'm posting it here for future reference in case someone finds it useful,
and for comments/opinions. My approach consisted of challenging the
participants to write a game called Wikiflashcards. The game would use
pygame[1] to display an index card with the name of a country and,
after a click, it would reveal the name of the capital city of that
country. The frontend was all given[2] so that participants wouldn't
have to worry about pygame at all (still, we learned all the possible
ways to install pygame on a relatively old Mac, which turned out to be
pretty complicated); instead, their task was to implement the backend
using pywikibot to generate the list of countries and get the capital
for each country. This would naturally introduce the concepts of listing
a set of pages of interest, searching through the wikitext, mining
templates, filtering links, etc.
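For illustration, a minimal sketch of what such a backend can look like (the category name and the infobox parameter name are assumptions; part of the exercise is discovering how messy these are in practice):

    # Minimal sketch of the wikitext-mining backend: list country articles
    # from a category and pull the capital out of the infobox parameters.
    # Category and parameter names are assumptions and vary between articles.
    import pywikibot
    from pywikibot import pagegenerators

    site = pywikibot.Site('en', 'wikipedia')

    def country_pages(limit=10):
        cat = pywikibot.Category(site, 'Category:Member states of the United Nations')
        return pagegenerators.CategorizedPageGenerator(cat, total=limit)

    def capital_of(page):
        # Walk the templates used on the page and look for a "capital" parameter.
        for template, params in page.templatesWithParams():
            for param in params:
                if '=' in param and param.split('=', 1)[0].strip().lower() == 'capital':
                    return param.split('=', 1)[1].strip()
        return None

    cards = {page.title(): capital_of(page) for page in country_pages()}
    print(cards)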
This approach differs from that of teaching people how to use
pywikibot to contribute directly to Wikipedia. My hypothesis is
that teaching people to use these tools to "scratch their own itch"
(personal research, hobbies, etc.) would make them match pywikibot with
their own interests, turn them into active users of the framework, and
eventually lead them to use their expertise to contribute to
any of the WMF projects.
After finishing a first version of the backend, I introduced the
concept and purpose of Wikidata and challenged the participants to
rewrite the backend using Wikidata items and properties and to compare
the two approaches, in particular the complexity of the first
approach vs. the advantages of having a new backend ready for i18n and
whatnot. The goal was to introduce, in a natural way, the need for a
structured way to store and retrieve data, since I believe a direct
introduction to Wikidata for someone who has never had to mine data
out of a Wikipedia looks very artificial.
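For comparison, a minimal sketch of the Wikidata-backed version (P36 is the "capital" property; the sample item and the naive error handling are deliberate simplifications):

    # Minimal sketch of the Wikidata-backed version: resolve the article to
    # its item and read the "capital" claim (P36) instead of parsing wikitext.
    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')

    def capital_of(title, lang='en'):
        item = pywikibot.ItemPage.fromPage(pywikibot.Page(site, title))
        item.get()
        claims = item.claims.get('P36', [])
        if not claims:
            return None
        capital = claims[0].getTarget()
        capital.get()
        # Any language with a label works here, which is the i18n advantage.
        return capital.labels.get(lang)

    print(capital_of('Germany'))         # expected: Berlin
    print(capital_of('Germany', 'de'))   # same item, German label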
At the end the challenge seemed to be very engaging for the
participants, and I had positive feedback about it but that doesn't
really tell if the goals listed above were achieved or not. If you
have further comments or questions just let me know.
Disclaimer: I'm not implying this is a good idea (in particular, I'm
not implying this was the best idea for this particular event), just
my idea.
David E. Narvaez
[0] http://gracehopper.org/2013/conference/grace-hopper-open-source-day/
[1] http://pygame.org/news.html
[2] https://gitorious.org/wiki-flash-cards/wiki-flash-cards
Hoi,
I think it makes more sense to contribute it to the upcoming Wiktionary
effort on Wikidata.
Thanks,
GerardM
On 8 October 2013 12:21, Judit Ács <acs.judit(a)sztaki.hu> wrote:
> Dear Wiktionary Community,
>
> We have been working on a triangulation method to expand existing
> dictionaries in many languages. We were able to parse translations from 40
> Wiktionary editions and, using these as seed dictionaries (approx. 3.6M
> translation pairs), we created an additional 16M pairs in 50 languages. It
> is possible to extend the number of languages.
>
> While the automatically generated dictionary is not 100% correct, with
> proper filtering 90%+ accuracy can be reached.
>
> One version of the parsed Wiktionaries and the generated pairs can be found
> here: https://www.dropbox.com/sh/r95tdr52o5rzzrw/a54Y66YGOJ
> We used dumps from August to create these.
> The software used to build dictionaries:
> https://github.com/juditacs/wikt2dict
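> For illustration, a minimal sketch of just the pivot step of the triangulation
> (toy data and a hypothetical helper, not the actual wikt2dict code, which also
> does scoring and filtering):
>
>     # Triangulation: if (a, b) and (b, c) are translation pairs in two
>     # seed dictionaries, propose (a, c) as a new pair via the pivot b.
>     def triangulate(pairs_ab, pairs_bc):
>         by_pivot = {}
>         for a, b in pairs_ab:
>             by_pivot.setdefault(b, set()).add(a)
>         return {(a, c) for b, c in pairs_bc for a in by_pivot.get(b, ())}
>
>     seed_en_de = [("dog", "Hund")]
>     seed_de_fr = [("Hund", "chien")]
>     print(triangulate(seed_en_de, seed_de_fr))   # {('dog', 'chien')}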
>
> Do you think there is a way to contribute this dictionary back to
> Wiktionary?
>
> Best,
> Judit Ács
Hello everyone,
Since Lydia stated that her last summary update was last week's, I have spoken
with her and have taken on the summaries myself in order to keep them
going. With that said, here is the summary of all the things that have
happened this week.
https://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_10_11
This week includes some sourcing statistics, a request from the development
team to help prioritise their work and an interesting interview with Lydia.
John
--
Hi everyone,
As I said previously, one of the major topics we need to work on is
trust in our data. Among other things, this means adding reliable
sources for statements. I would like us to keep an eye on the actual
numbers there and track the progress.
Is there anyone who'd like to hack up a small script/bot/page that shows us:
* number of statements over time
* number of sourced statements over time
* number of statements with a source other than "imported from foo
Wikipedia" etc
Having that would be really sweet.
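To make the counting part concrete, here is a rough sketch for a handful of sample items (P143 "imported from" is used to tell the trivial sources apart; a real script would walk a dump and log the totals with a timestamp to get the numbers over time):

    # Rough sketch: count statements, sourced statements, and statements with
    # a source other than "imported from" (P143) for a few sample items.
    import requests

    API = 'https://www.wikidata.org/w/api.php'

    def count_item(qid):
        entity = requests.get(API, params={
            'action': 'wbgetentities', 'ids': qid,
            'props': 'claims', 'format': 'json',
        }).json()['entities'][qid]
        total = sourced = nontrivial = 0
        for statements in entity.get('claims', {}).values():
            for statement in statements:
                total += 1
                refs = statement.get('references', [])
                if refs:
                    sourced += 1
                    # A reference counts as non-trivial if it uses any
                    # property other than P143 ("imported from").
                    if any(set(ref['snaks']) - {'P143'} for ref in refs):
                        nontrivial += 1
        return total, sourced, nontrivial

    for qid in ('Q42', 'Q64', 'Q183'):   # sample items only
        print(qid, count_item(qid))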
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge). Registered in the register of
associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz.
Recognised as charitable by the Finanzamt für Körperschaften I Berlin,
tax number 27/681/51985.
Dear all,
At the end of the GSoC 2013 period, and as an output of the Wikidata
integration into the DBpedia project, a new set of updated RDF DBpedia dumps
for Wikidata data is now available for download.
The main features of these dumps compared to
V0.1 <https://github.com/hadyelsahar/extraction-framework/wiki/WikiData-DBpedia-D…>
are:
   - dumps with properties mapped from Wikidata to DBpedia
   - dumps with better datatype handling (automatic assignment of the Wikidata
     datatypes Time, GlobeCoordinate, Commons media files and String to their
     equivalent xsd datatypes and equivalent DBpedia properties)
Your feedback is of course always welcome, to review the exported data and
enhance it.
thanks
Regards
-------------------------------------------------
Hady El-Sahar
Research Assistant
Center of Informatics Sciences | Nile University<http://nileuniversity.edu.eg/>
Hi.
In Wikimedia Slovakia we are preparing the Wikimedia Central and Eastern
Europe Meeting; see https://meta.wikimedia.org/wiki/Wikimedia_CEE_Meeting_2013
[1]
There is interest in Wikidata and I would like to invite a Wikidata
representative to give a talk there. Unfortunately, the core WD team cannot
attend, so I would like to invite a community member. We prefer a member
from the CEE region; travel costs will be reimbursed.
The main goal of the meeting is to strengthen interstate and international
collaboration between various Wikimedia chapters, thematic
organisations, user groups and other communities in Central and Eastern
Europe and its regions, so it would be great if the talk about Wikidata
could be in line with this topic. Of course, that is desirable but not
compulsory ;)
Please contact me if you feel qualified for such a talk / seminar.
Thanks in advance for your response!
--
Michal Matúšov
[[:w:sk:Redaktor:KuboF]]
Wikimedia Slovenská republika
Dom služieb ALFA
Februárová 1478/2
958 01 Partizánske
predseda / president
Web: http://wikimedia.sk [2]
Facebook: www.facebook.com/WikimediaSK [3]
Twitter: https://twitter.com/WikimediaSK [4]
Links:
------
[1] https://meta.wikimedia.org/wiki/Wikimedia_CEE_Meeting_2013
[2] http://wikimedia.sk
[3] http://www.facebook.com/WikimediaSK
[4] https://twitter.com/WikimediaSK
Since recently, an item has a section for a link to Commons. I noticed that only one item can be linked to one category on Commons, while for many years we have had two places from which we would like to link to the same category on Commons.
Example:
We have a category and an article about Germany. From both the article and the category we would like to link to Category:Germany on Commons. This system is in use on many Wikipedias.
So please make it possible for items that contain only categories to also add the Commons category to that page. Now we keep having Commonscat links in the "Statements" section and in the "Wikimedia Commons page linked to this item" section, and it is only half implemented...
Secondly, I would also like to see in the sidebar on Commons both which articles and which categories on the Wikipedias link to that Commons category. So please split the section in the sidebar into two sections of interwiki links: one for Wikipedia categories about that subject and one for Wikipedia articles about that subject.
Romaine