Dear all,
I have a vacancy in my group which should be of particular interest to
people who are into Wikisource and/or Wikidata.
We are looking for a Wikipedian in Residence for a duration of up to
three years.
The job is part of a new project "NOA - The replication of open access
images: development of a method for automatic harvesting, indexing and
provision of open access images from technical subjects using the
infrastructure of Wikimedia Commons and Wikidata".
You will play a leading role in the conceptual design of harvesting Open
Access articles issued by various publishers.
Have a look at the full English job description here:
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/stellenausschreibung-nr-372016/
Please note that applications should be sent before 12 August 2016!
For any questions about the project and the job, please contact me. I'm
looking forward to all questions and applications!
Best,
Lambert
Lambert Heller
Technische Informationsbibliothek (TIB)
German National Library of Science and Technology
Open Science Lab
Welfengarten 1 B // 30167 Hannover, Germany
T +49 511 762-5348
lambert.heller(a)tib.eu
www.tib.eu
Hi all,
I have just joined this list to answer questions about the results we got when
experimenting with Wikidata data in different engines (the paper that Aidan
commented on two days ago).
About the issue that we had with Blazegraph timeouts, you can see the discussion
on the Blazegraph mailing list:
https://sourceforge.net/p/bigdata/mailman/message/35028029/
Stas, I tried using the analytic mode, but it does not solve the timeout issue.
I also tried different garbage collectors and heap sizes, but no configuration
solves it: once the first timeout occurs, every subsequent query also times out.
Cheers,
Daniel
Full disclosure: I am the creator of the Project Grant application for
Arc.heolo.gy (http://arc.heolo.gy/), located here:
https://meta.wikimedia.org/wiki/Grants:Project/Arc.heolo.gy
I hope for this to be a general discussion on potential applications,
criticisms, questions, technological recommendations, and community
discussion about a graph representation of Wikipedia.
Currently, the project has a live Neo4j Graph database built and parsed
from a download of the English language Wikipedia from April. I have
temporarily hosted the database instance both on my local machine and a
SoftLayer server provided under a temporary entrepreneur credit.
My goal is twofold.
On the backend: refine the parsing algorithm (I am getting some incorrect
relationships in the database), automate the parsing so that it updates the
database frequently, expand language support, and perform semantic parsing
to weight individual relationships to strengthen the ability to filter out
extraneous relationships.
On the frontend: I have done little to no work here beyond pure
conceptualization. I would hope to use an asynchronous front-end JavaScript
framework to build both a 2D (D3) and a 3D (WebGL) interface for exploring
the database with a high degree of control and ease.
If any of you would like to access the database for exploration, please
contact me privately and I will give you credentials.
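In case it helps while you wait for credentials, below is a rough sketch of how a client could query such a graph with the official Neo4j Java driver (1.x). The Bolt URL, credentials, and the Page/LINKS_TO/title schema are made-up placeholders for illustration, not the project's actual layout:

import org.neo4j.driver.v1.*;

public class GraphExploreExample {
    public static void main(String[] args) {
        // Hypothetical connection details; replace with the credentials you receive.
        Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
        Session session = driver.session();

        // Pages directly linked from a given article (hypothetical labels/properties).
        StatementResult result = session.run(
                "MATCH (a:Page {title: {title}})-[:LINKS_TO]->(b:Page) "
                        + "RETURN b.title AS linked LIMIT 25",
                Values.parameters("title", "Graph database"));

        while (result.hasNext()) {
            Record record = result.next();
            System.out.println(record.get("linked").asString());
        }

        session.close();
        driver.close();
    }
}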
Any recommendations on parsing, hosting, visualization, or anything else are
appreciated. Endorsements and volunteers are also highly appreciated!
p.s. I am new to directly engaging with the Wiki community, and if I
committed some faux pas in starting this thread please let me know and I
will do my best to correct it.
Hey all,
Recently we wrote a paper discussing the query performance for Wikidata,
comparing different possible representations of the knowledge-base in
Postgres (a relational database), Neo4J (a graph database), Virtuoso (a
SPARQL database) and BlazeGraph (the SPARQL database currently in use)
for a set of equivalent benchmark queries.
The paper was recently accepted for presentation at the International
Semantic Web Conference (ISWC) 2016. A pre-print is available here:
http://aidanhogan.com/docs/wikidata-sparql-relational-graph.pdf
Of course there are some caveats with these results: perhaps other engines
would perform better on different hardware or with different styles of
queries. For this reason we tried to use the most general types of queries
possible and to test different representations in different engines (we did
not vary the hardware).
Also in the discussion of results, we tried to give a more general
explanation of the trends, highlighting some strengths/weaknesses for
each engine independently of the particular queries/data.
I think it's worth a glance for anyone who is interested in the
technology/techniques needed to query Wikidata.
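(Not from the paper, but for anyone who wants to experiment hands-on: here is a minimal sketch of running a simple query against the public SPARQL endpoint at query.wikidata.org using Apache Jena 3.x; the example query is purely illustrative.)

import org.apache.jena.query.*;

public class WikidataQueryExample {
    public static void main(String[] args) {
        // English labels of Douglas Adams's (Q42) occupations (P106).
        String query =
                "PREFIX wd: <http://www.wikidata.org/entity/>\n"
              + "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
              + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
              + "SELECT ?occupation ?label WHERE {\n"
              + "  wd:Q42 wdt:P106 ?occupation .\n"
              + "  ?occupation rdfs:label ?label .\n"
              + "  FILTER(LANG(?label) = \"en\")\n"
              + "} LIMIT 10";

        QueryExecution qe = QueryExecutionFactory.sparqlService(
                "https://query.wikidata.org/sparql", query);
        try {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            qe.close();
        }
    }
}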
Cheers,
Aidan
P.S., the paper above is a follow-up to a previous work with Markus
Krötzsch that focussed purely on RDF/SPARQL:
http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf
(I'm not sure if it was previously mentioned on the list.)
P.P.S., as someone who's somewhat of an outsider but who's been watching
on for a few years now, I'd like to congratulate the community for
making Wikidata what it is today. It's awesome work. Keep going. :)
Dear Glorian, I need my October 1971 psychological evaluation from the AFEES Center in Milwaukee, unedited and unredacted.
-------- Original message --------
From: Gerard Meijssen <gerard.meijssen(a)gmail.com>
Date: 8/5/16 1:12 PM (GMT-06:00)
To: "Discussion list for the Wikidata project." <wikidata(a)lists.wikimedia.org>
Subject: Re: [Wikidata] Hello Wikidata!
Hoi,
Are there any specific tasks for you? What is it we can bother you with?
PS Welcome :)
Thanks,
GerardM
On 5 August 2016 at 15:20, Glorian Yapinus <glorian.yapinus(a)wikimedia.de> wrote:
Hi folks!
My name is Glorian Yapinus, but you can simply call me Glorian ;) . For the next 6 months, I will assist Lydia in supporting you all.
Regarding my educational background, I hold a bachelor's degree in Information Technology, and I am currently working on my master's in Software Engineering and Management.
I am a warm and nice person, so please do not hesitate to reach out to me with any queries :-)
Last but not least, I am looking forward to working with you.
Cheers,
Glorian
--
Glorian Yapinus
Product Management Intern for Wikidata
Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
I created a pretty rough, but hopefully still usable, help page for
manually inputting and moving Authority control metadata from Wikipedia to
Wikidata for new pages that have no associated Wikidata item.
https://www.wikidata.org/wiki/Wikidata:Tours/Authority_control
I wasn't sure if it was okay to create a page on Wikidata, but I followed the
Be Bold! mantra and just went for it.
Again, this is not the most elegant solution, but I wanted there to be
some sort of graphic documentation on how to update Authority Control.
Best,
- Erika
Erika Herzog
Wikipedia User:BrillLyle (https://en.wikipedia.org/wiki/User:BrillLyle)
Dear all,
I hereby announce the release of Wikidata Toolkit 0.7.0 [1], the Java
library for programming with Wikidata and Wikibase.
This is a maintenance release that implements several fixes to ensure
that WDTK can be used with recent Wikidata API outputs and future
Wikidata JSON dumps.
The new version also ships the code used to generate the basic
statistics used in the backend of the SQID Wikidata Browser [2].
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and of course the sources [4] and updated
JavaDocs [5].
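As a quick orientation (not taken from the release notes), a minimal usage sketch could look like the following; the fetcher and findLabel calls follow the class names used in the WDTK examples, so please double-check them against the 0.7.0 JavaDocs [5]:

import org.wikidata.wdtk.datamodel.interfaces.EntityDocument;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.wikibaseapi.WikibaseDataFetcher;

public class FetchItemExample {
    public static void main(String[] args) throws Exception {
        // Fetcher preconfigured for the live Wikidata API (www.wikidata.org).
        WikibaseDataFetcher fetcher = WikibaseDataFetcher.getWikidataDataFetcher();
        // Retrieve a single entity document by its Q-ID (Q42 = Douglas Adams).
        EntityDocument doc = fetcher.getEntityDocument("Q42");
        if (doc instanceof ItemDocument) {
            System.out.println("English label: " + ((ItemDocument) doc).findLabel("en"));
        }
    }
}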
As usual, feedback is welcome. Developers are also invited to contribute
via GitHub.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://tools.wmflabs.org/sqid/
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
[5] http://wikidata.github.io/Wikidata-Toolkit/
--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en
Saw this posted on Twitter.
https://meta.wikimedia.org/wiki/Grants:Project/Putnik/Wikidata_module
This proposal is my greatest fear with Wikidata: deprecating infoboxes in favor of Wikidata so that casual Wikipedia editors can't edit them on-wiki and are forced to use Wikidata (comparable to the existing Authority Control deprecation). That is a huge barrier for Wikipedia end-users.
Before I voice my concerns on the grant page, I wondered whether the end-user issue has been discussed here, whether someone could explain why this is such a good idea, and what user issues have been or could be addressed before the project is implemented.
I understand something like this is already part of Russian Wikipedia. How did that community respond to what I see as a significant change?
- Erika
-------------------------------------------------------------------------------
WSDM Cup 2017: Call for Participation
-------------------------------------------------------------------------------
We invite you to take part in one of the following shared tasks:
Task 1.
Vandalism Detection -- Given a Wikidata revision, is it damaging?
This task is about detecting vandalism as well as all other kinds of damaging
edits to Wikidata. Detecting them protects not only Wikidata's integrity, but
also that of all information systems making use of the knowledge base.
Task 2.
Triple Scoring -- Compute relevance scores for triples from type-like relations.
For example, the triple "Johnny_Depp profession Actor" should get a high score,
because acting is Depp's main profession, whereas "Quentin_Tarantino profession
Actor" should get a low score, because Tarantino is more of a director than an
actor. Such scores are a basic ingredient for ranking results in entity search.
Learn more at http://www.wsdm-cup-2017.org
Register now at https://goo.gl/forms/JaVQwFFewLtVFCik2
-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
now open         Registration
Sep 1, 2016      Training data release
Dec 8, 2016      Final software submission
Dec 22, 2016     Announcement of evaluation results
Jan 5, 2017      Paper submission
Feb 6-10, 2017   Conference and WSDM Cup workshop
All deadlines are 11:59 PM, anywhere on earth (AoE).
-------------------------------------------------------------------------------
Special Announcements
-------------------------------------------------------------------------------
Evaluation as a Service.
For the sake of reproducibility, we ask you to submit your software instead of
just its run output. Software submissions allow for preserving your software
in working condition, and for re-evaluating it as new datasets appear.
To facilitate software submissions, we will make use of the cloud-based
evaluation platform TIRA (www.tira.io).
Open Source Proceedings.
We encourage the open source release of your software. To maximize the impact
of your software, we collect it in a central repository on GitHub:
https://github.com/wsdm-cup-2017
Private repositories can be assigned to you on request during the
competition.
Benefits for early birds.
Submitting your software or your notebook early, as well as registering early
for the conference, will be rewarded. Check out the specific benefits on
our web page at http://www.wsdm-cup-2017.org