Hi,
Just wanted to express my belated support for such dumps:
- We encounter the same problem in research, and for efficiency,
reproducibility, and authoritativeness alike, a centralized solution would
be great.
- Besides filtering by existence in Wikipedia, I see a lot of potential
in removing labels. In most of our use cases labels are not needed in the
computations, and where introspection is needed, one can selectively add
them post hoc. Alternatively, retaining only English labels would also save
a lot of space (and I don't see concerns about cultural bias, as long as we
only use them as decoration, not inside computations).
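For anyone who wants to try this locally in the meantime, here is a minimal Python sketch of the two filters discussed in this thread: the sitelink filter from the quoted proposal and the label pruning. It assumes the dump has already been parsed into (subject, predicate, object, language) tuples and that the set of Wikipedia-linked items has been precomputed. The `Triple` type, the `concise` function, the item IRI prefix, and the exact set of label predicates are illustrative assumptions, not part of any Wikidata or WDumper API.

```python
from typing import AbstractSet, Iterable, Iterator, NamedTuple, Optional

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    lang: Optional[str] = None  # language tag of a literal object, if any

# Assumed prefix for Wikidata item IRIs (Q-ids) in the RDF dump.
ITEM_PREFIX = "http://www.wikidata.org/entity/Q"

# Assumed set of predicates that carry labels/descriptions in the RDF dump.
LABEL_PREDICATES = frozenset({
    "http://www.w3.org/2000/01/rdf-schema#label",
    "http://www.w3.org/2004/02/skos/core#prefLabel",
    "http://www.w3.org/2004/02/skos/core#altLabel",
    "http://schema.org/name",
    "http://schema.org/description",
})

def is_item(term: str) -> bool:
    """True if the term is a Q item IRI (under the assumed prefix)."""
    return term.startswith(ITEM_PREFIX)

def concise(triples: Iterable[Triple],
            sitelinked: AbstractSet[str],
            keep_langs: AbstractSet[str] = frozenset({"en"})) -> Iterator[Triple]:
    """Yield only the triples that survive both filters.

    sitelinked: IRIs of items linked to Wikipedia (hypothetical precomputed input).
    keep_langs: label languages to retain; pass an empty set to drop all labels.
    """
    for t in triples:
        # Sitelink filter: drop the triple if its subject or object is a
        # Q item that is not linked to Wikipedia.
        if any(is_item(term) and term not in sitelinked
               for term in (t.subject, t.obj)):
            continue
        # Label pruning: drop label-like triples outside the kept languages.
        if t.predicate in LABEL_PREDICATES and t.lang not in keep_langs:
            continue
        yield t
```

Passing `keep_langs=frozenset()` drops all labels, matching the first option above; labels could then be joined back in post hoc where inspection is needed. Because `concise` is a generator, the triples themselves are streamed, although the set of sitelinked items still has to fit in memory.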
Thanks also for pointing out the WDumper tool; it looks great. Maybe it
would be worth highlighting selected dumps prominently on its start page?
(The names in "recent dumps" alone are not always informative, so one has
to inspect the specs one by one, and arguably the large number of dumps
also costs some authoritativeness.)
Cheers,
Simon
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the increasing dump sizes, where even the truthy version now
> costs considerable time and machine resources to process and handle. In
> some cases we just grin and bear the costs, while in other cases we
> apply an ad hoc sampling to be able to play around with the data and try
> things quickly.
>
> More generally, I think the growing data volumes might inadvertently
> scare people off taking the dumps and using them in their research.
>
> One idea we had recently to reduce the data size for a student project
> while keeping the most notable parts of Wikidata was to keep only claims
> whose items are all linked to Wikipedia; in other words, if the
> statement involves a Q item (in the "subject" or "object") not linked to
> Wikipedia, the statement is removed.
>
> I wonder whether it would be possible for Wikidata to provide such a dump
> to download (e.g., in RDF) for people who prefer to work with a more
> concise sub-graph that still maintains the most "notable" parts. While
> of course one could compute this from the full dump locally, making such
> a version available as a dump directly would save clients some
> resources, potentially encourage more research using/on Wikidata, and
> having such a version "rubber-stamped" by Wikidata would also help to
> justify the use of such a dataset for research purposes.
>
> ... just an idea I thought I would float out there. Perhaps there is
> another (better) way to define a concise dump.
>
> Best,
> Aidan
Hello all,
Wikimedia Deutschland is currently looking for a Partner Relationship
Manager for Wikidata and Wikibase. Working inside the software department,
the person would help create, develop, and strengthen relationships
with GLAM institutions, science organizations, companies, etc. related to
Wikidata and Wikibase.
You will find all the details about the position and the application
process here.
<https://wikimedia-deutschland.softgarden.io/job/5732700/Partner-Relationshi…>
For this position, both English and German skills are required.
If you have any remaining questions about the position, feel free to email
me directly.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by
the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Hello all,
Wikimedia Deutschland is currently looking for a community communication
manager for Wikidata and Wikibase. This person would work with me and my
colleagues to take care of communication between our software department
and the Wikidata and Wikibase communities: gathering feedback from the
communities, communicating about our features, supporting community
initiatives and helping them to grow.
You will find all the details about the position and the application
process here.
<https://wikimedia-deutschland.softgarden.io/job/5730915/Community-Communica…>
If you have any remaining questions about the position, feel free to email
me directly.
Cheers,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
============================
Call for Papers - ACM UMAP 2020
28th ACM International Conference on User Modeling, Adaptation and
Personalization
Theme: "Responsible Personalization"
Genoa (Italy)
July 14-17, 2020
Website: https://www.um.org/umap2020/
============================
Abstracts due: January 31, 2020 (mandatory)
Papers due: February 7, 2020
============================
BACKGROUND AND SCOPE:
============================
ACM UMAP – User Modeling, Adaptation and Personalization – is the premier
international conference for researchers and practitioners working on
systems that adapt to individual users or to groups of users, and that
collect, represent, and model user information. ACM UMAP is sponsored by
ACM SIGCHI and SIGWEB, and organized with User Modeling Inc. as the core
Steering Committee, extended with past years’ chairs. The proceedings are
published by ACM and will be part of the ACM Digital Library.
ACM UMAP covers a wide variety of research areas where personalization and
adaptation may be applied. These include a number of domains in which
researchers are driving significant innovations based on advances in
user modeling and adaptation, recommender systems, adaptive educational
systems, intelligent user interfaces, e-commerce, advertising, digital
humanities, social networks, personalized health, entertainment, and many
more.
We welcome submissions related to user modeling, personalization and
adaptation in any area; the conference website provides a detailed (but
not prescriptive) list of topics and sub-topics of importance to the
conference. As the theme for UMAP 2020 is “Responsible Personalization,”
submissions in all areas that emphasize ethical dimensions of personalized
systems are welcome.
CONFERENCE TOPICS:
============================
For details, see the conference website (https://www.um.org/umap2020/)
* Personalized Recommender Systems
* Adaptive Hypermedia and the Semantic Web
* Intelligent User Interfaces
* Personalized Social Web
* Technology-Enhanced Adaptive Learning
* Privacy, Fairness, and Transparency
* Personalized Health
* User Modeling and Personalization Applications
* Theory, Opinion, Reflection
SUBMISSION AND REVIEW PROCESS
============================
Papers must be submitted through EasyChair:
https://easychair.org/conferences/?conf=acmumap2020
We invite long (8 pages + references) and short (4 pages + references)
papers in ACM style. Original research papers addressing the theory and/or
practice of UMAP are welcome, as are papers that showcase innovative uses of
UMAP and explore the benefits and challenges of applying UMAP technology in
real-life applications and contexts.
Long papers should present original reports of substantive new research
techniques, findings, and applications of UMAP. They should place the work
within the field and clearly indicate its innovative aspects. Research
procedures and technical methods should be presented in sufficient detail
to ensure scrutiny and reproducibility. Results should be clearly
communicated and implications of the contributions/findings for UMAP and
beyond should be explicitly discussed.
Short papers should present original and highly promising research or
applications. Merit will be assessed in terms of originality and importance
rather than maturity, extensive technical validation, and user studies.
The separation of long and short papers will be strictly enforced: papers
will compete only within their own category, not across categories. Papers
that receive high scores and are considered promising by reviewers, but
do not make the acceptance cut, may be revised and resubmitted as posters.
Papers must be formatted using the ACM SIG Standard (SIGCONF) proceedings
template: https://www.acm.org/publications/proceedings-template.
All accepted papers will be published by ACM and will be available via the
ACM Digital Library. To be included in the Proceedings, at least one author
of each accepted paper must register for the conference and present the
paper there. Students presenting a student paper may register at the
student rate.
IMPORTANT DATES
============================
Abstracts: January 31, 2020 (mandatory)
Full paper: February 7, 2020
Notification: March 27, 2020
Camera-ready: May 3, 2020
Note: All deadlines are at 11:59 pm AoE (Anywhere on Earth).
ORGANIZERS
============================
General chairs
Tsvi Kuflik, The University of Haifa
Ilaria Torre, University of Genoa
Program chairs
Robin Burke, University of Colorado, Boulder
Cristina Gena, University of Turin
RELATED EVENTS
============================
Separate calls will be sent for Workshops and Tutorials, Doctoral
Consortium, Posters, and Late Breaking Results, as these have different
deadlines and submission requirements.
Regards,
Oghenemaro Anuyah
Graduate Research Assistant
Dept. of Computer Science, Boise State University
ACM UMAP Conference 2020 - *Co-publicity Chair*
oghenemaroanuyah(a)u.boisestate.edu