A friend of mine, noting that Wikipedia uses SVG images
for diagrams, asked if it also had some written guidelines
for how to write the SVG source code, in particular to
express measurement values in the original units rather
than on a pixel scale. He had found some SVG diagram that
made a curve from 140 pixels to 190 pixels, rather than
from 7 million to 9.5 million inhabitants, which was the
unit that the y axis displayed. (Or something like that.)
I said "probably not, your thinking is likely 5 years
ahead of the Wikipedia community".
As this all happened in April 2009, he came back yesterday
to ask where we are now.
Do we have any guidelines for how to hand-write the
source code of SVG diagrams? Should we?
Maybe this is related to Wikidata?
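For illustration, here is a minimal sketch of what such hand-written source
could look like (my own example, not an existing guideline; the population
figures are invented): a small Python script writes an SVG in which the curve's
coordinates are years and millions of inhabitants, and a single transform maps
them onto the drawing area.

    # Illustrative sketch only: the path coordinates are the original data
    # values (year, million inhabitants), not pixels. Data points are invented.
    points = [(1900, 7.0), (1950, 8.1), (2000, 9.5)]

    x_scale = 400 / (2000 - 1900)   # map 100 years onto 400 px
    y_scale = 300 / 10              # map 0-10 million onto 300 px (flipped below)
    path = " ".join(f"{'M' if i == 0 else 'L'} {x},{y}"
                    for i, (x, y) in enumerate(points))

    svg = f"""<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">
      <!-- coordinates inside this group are (year, million inhabitants) -->
      <g transform="translate({-1900 * x_scale},300) scale({x_scale},{-y_scale})">
        <path d="{path}" fill="none" stroke="black"
              vector-effect="non-scaling-stroke"/>
      </g>
    </svg>"""

    with open("population.svg", "w", encoding="utf-8") as f:
        f.write(svg)

The point is only that the numbers in the source stay meaningful; whether this
is done with a transform, a viewBox, or a generating script is exactly the kind
of thing a guideline could settle.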
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Apologies for cross-posting,
Submissions are invited for the 3rd Workshop on Issues of Sentiment Discovery
and Opinion Mining (WISDOM), an ICML14 workshop exploring the new frontiers of
big data computing for opinion mining through machine-learning techniques and
sentiment learning methods. For more information, please visit:
http://sentic.net/wisdom
RATIONALE
The distillation of knowledge from social media is an extremely difficult task
as the content of today's Web, while perfectly suitable for human consumption,
remains hardly accessible to machines. The opportunity to capture the opinions
of the general public about social events, political movements, company
strategies, marketing campaigns, and product preferences has raised growing
interest both within the scientific community, where it leads to many exciting
open challenges, and in the business world, due to the remarkable benefits to be
had in marketing and financial market prediction.
Statistical NLP has been the mainstream NLP research direction since the late 1990s.
It relies on language models based on popular machine-learning algorithms such
as maximum-likelihood estimation, expectation maximization, conditional random fields, and
support vector machines. By feeding a large training corpus of annotated texts
to a machine-learning algorithm, it is possible for the system to not only learn
the valence of keywords, but also to take into account the valence of other
arbitrary keywords, punctuation, and word co-occurrence frequencies. However,
standard statistical methods are generally semantically weak as they merely
focus on lexical co-occurrence elements with little predictive value
individually.
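As a concrete (though toy) illustration of the statistical approach just
described - not part of this call, and assuming scikit-learn - a linear
classifier trained on an annotated corpus ends up assigning a valence weight to
every keyword and co-occurring word pair it has seen:

    # Illustrative sketch with toy data: the learned feature weights play the
    # role of keyword valences.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    texts = ["great phone, love the screen",
             "terrible battery, very disappointed",
             "love it, great value",
             "disappointed, the screen is terrible"]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (toy annotations)

    vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
    X = vectorizer.fit_transform(texts)
    clf = LinearSVC().fit(X, labels)

    # Print the valence learned for each term.
    for term, index in sorted(vectorizer.vocabulary_.items()):
        print(f"{term:30s} {clf.coef_[0][index]:+.3f}")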
Endogenous NLP, instead, involves the use of machine-learning techniques to
perform semantic analysis of a corpus by building structures that approximate
concepts from a large set of documents. It does not involve prior semantic
understanding of documents; instead, it relies only on the endogenous knowledge
of the documents themselves (rather than on external knowledge bases). The advantages of this
approach over the knowledge engineering approach are effectiveness, considerable
savings in terms of expert manpower, and straightforward portability to
different domains. Endogenous NLP includes methods based either on lexical
semantics, which focuses on the meanings of individual words (e.g., LSA, LDA,
and MapReduce), or compositional semantics, which looks at the meanings of
sentences and longer utterances (e.g., HMM, association rule learning, and
probabilistic generative models).
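As a similarly hedged illustration of the lexical-semantics flavour (toy
documents, scikit-learn assumed), an LSA-style sketch derives latent "concept"
dimensions from the document collection alone, with no external knowledge base:

    # Illustrative LSA sketch: concept-like dimensions are learned
    # endogenously from the corpus itself.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    docs = ["the battery life of this phone is great",
            "battery drains fast, poor phone",
            "the movie plot was boring",
            "great movie, wonderful plot and acting"]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    lsa = TruncatedSVD(n_components=2, random_state=0)  # two latent concepts
    doc_concepts = lsa.fit_transform(tfidf)

    # Each row shows how strongly a document expresses each latent concept.
    print(doc_concepts)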
TOPICS
WISDOM aims to provide an international forum for researchers in the field of
machine learning for opinion mining and sentiment analysis to share information
on their latest investigations in social information retrieval and their
applications both in academic research areas and industrial sectors. The broader
context of the workshop comprises opinion mining, social media marketing,
information retrieval, and natural language processing. Topics of interest
include but are not limited to:
• Endogenous NLP for sentiment analysis
• Sentiment learning algorithms
• Semantic multi-dimensional scaling for sentiment analysis
• Big social data analysis
• Opinion retrieval, extraction, classification, tracking and summarization
• Domain adaptation for sentiment classification
• Time evolving sentiment analysis
• Emotion detection
• Concept-level sentiment analysis
• Topic modeling for aspect-based opinion mining
• Multimodal sentiment analysis
• Sentiment pattern mining
• Affective knowledge acquisition for sentiment analysis
• Biologically-inspired opinion mining
• Content-, concept-, and context-based sentiment analysis
SPEAKER
Rui Xia is currently an assistant professor at the School of Computer Science
and Engineering, Nanjing University of Science and Technology, China. His
research interests include machine learning, natural language processing, text
mining, and sentiment analysis. He received his Ph.D. from the Institute of
Automation, Chinese Academy of Sciences, in 2011. He has published several
refereed papers at artificial intelligence and natural language processing
conferences, including IJCAI, AAAI, ACL, and COLING. He has served as a program
committee member of several international conferences and workshops, including
IJCAI, COLING, the WWW Workshop on MABSDA, the KDD Workshop on WISDOM, and the
ICDM Workshop on SENTIRE. He is a member of ACM, ACL, and CCF, and an operating
committee member of YSSNLP.
KEYNOTE
On the one hand, most of the existing domain adaptation studies in the field of
NLP belong to feature-based adaptation, while research on instance-based
adaptation is very scarce. On the other hand, due to the explosive growth of
online reviews on the Internet, we can easily collect a large amount of labeled
reviews from different domains. But only some of them are beneficial for
training a desired target-domain sentiment classifier. Therefore, it is
important for us to identify the samples that are most relevant to the target
domain and use them as training data. To address this problem, we propose two
instance-based domain adaptation approaches for NLP applications. The first,
comprising PUIS and PUIW, conducts instance adaptation based on instance
selection and instance weighting via PU learning. The second is in-target-domain
logistic approximation (ILA), where instance adaptation is conducted with a
joint logistic approximation model. Both methods achieve sound performance on
high-dimensional NLP tasks such as cross-domain text categorization and
sentiment classification.
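To make the idea concrete, here is a rough instance-weighting sketch in the same
spirit (my own illustration with toy data and scikit-learn, not the speaker's
PUIS/PUIW/ILA implementations): a logistic domain model estimates how
"in-target-domain" each labeled source review looks, and those probabilities are
used as sample weights when training the sentiment classifier.

    # Illustrative sketch: weight labeled source-domain reviews by how
    # target-domain-like they appear, then train the sentiment model.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    source_texts = ["great plot and acting", "boring movie",
                    "loved the soundtrack", "terrible script"]
    source_labels = [1, 0, 1, 0]             # labeled movie reviews (toy)
    target_texts = ["great battery life", "terrible battery",
                    "the screen is boring"]  # unlabeled phone reviews (toy)

    vec = TfidfVectorizer()
    X_all = vec.fit_transform(source_texts + target_texts)
    X_src, X_tgt = X_all[:len(source_texts)], X_all[len(source_texts):]

    # 1) Domain model: probability that an instance comes from the target domain.
    domain_y = [0] * len(source_texts) + [1] * len(target_texts)
    weights = LogisticRegression().fit(X_all, domain_y).predict_proba(X_src)[:, 1]

    # 2) Sentiment model trained on source data, re-weighted toward
    #    target-like instances.
    sentiment_clf = LogisticRegression().fit(X_src, source_labels,
                                             sample_weight=weights)
    print(sentiment_clf.predict(X_tgt))

PU learning and the joint logistic approximation in ILA refine this basic
weighting principle; the sketch only shows the shared idea.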
SUBMISSIONS AND PROCEEDINGS
Authors are required to follow the Springer LNCS proceedings template and to submit
their papers through EasyChair. The paper length is limited to 12 pages,
including references, diagrams, and appendices, if any. As per ICML tradition,
reviews are double-blind, and author names and affiliations should not be
listed. Each submitted paper will be evaluated by three PC members with respect
to its novelty, significance, technical soundness, presentation, and
experiments. Accepted papers will be published in Springer LNCS Proceedings.
Selected, expanded versions of papers presented at the workshop will be invited
to a forthcoming Special Issue of Cognitive Computation on opinion mining and
sentiment analysis.
TIMEFRAME
• May 11th, 2014: Submission deadline
• May 25th, 2014: Notification of acceptance
• June 1st, 2014: Final manuscripts due
• June 25th, 2014: Workshop date
ORGANIZERS
• Yunqing Xia, Tsinghua University (China)
• Erik Cambria, Nanyang Technological University (Singapore)
• Yongzheng Zhang, LinkedIn Inc. (USA)
• Newton Howard, MIT Media Laboratory (USA)
Hey folks,
Due to an excessively high edit frequency there is currently a massive lag in
the change propagation to Wikipedia and co. This means changes don't show up
there in a timely manner (but they will eventually). We've already started
stopping the worst offenders. I hope the lag goes down now; if not, I will have
to forcefully stop a few more. If you are editing at a high frequency with a bot
or Widar, please stop until the lag goes down.
We're taking measures to prevent this in the future.
Please keep an eye on the lag at
https://www.wikidata.org/wiki/Special:DispatchStats
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Hi!
I've implemented in Wikibase, with the help of the Wikidata
development team, a piece of code that allows an "Other
projects" sidebar managed by Wikidata to be displayed, just as has been
done for interlanguage links [1]. This means that a link to Commons, for
example, can automatically be added to a Wikipedia article's sidebar
based on the data in Wikidata.
The goal is to replace the JavaScript-based hacks used by a lot of
wikis, like nl.wikipedia, that build this kind of sidebar from
templates inserted in the wiki text.
This new feature was successfully deployed last Monday to French
Wikisource. See, for example, this page [2] (the sidebar section is
called "Autres projets" in French). I've also written a JavaScript
hack that adds a link to Wikidata in this sidebar (this isn't supported
by the extension yet) and that allows links from the old template to be
overridden in order to ensure a smooth migration.
If you want to see this feature installed in your wiki, please start a
discussion on your local project chat and, when a consensus is
reached, open a bug in bugzilla [3] (component: "Site requests")
linking to the discussion and giving the ordered list of the sites to
display (this can be one or several of "wikipedia", "commons",
"wikiquote", "wikivoyage", "wikisource").
Feel free to ask if you have any questions.
Cheers,
Thomas (User:Tpt)
On Thursday, April 10, 2014, Magnus Manske <magnusmanske(a)googlemail.com>
wrote:
> There's a tool for that:
>
> http://tools.wmflabs.org/wikidata-terminator/index.php
>
> Check the third row ("Top 1000 items with missing articles").
http://tools.wmflabs.org/wikidata-terminator/index.php?list&lang=ca&mode=tx…
These are mostly:
* Categories of years e.g. [[Category:1923]]
* Categories of born in...
* Categories of died in...
* Country data categories
All these categories exist in ca.wiki, but there are hundreds of them. Is
there a bot that, given the corresponding translations, could go fix those?
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
See below for an extract of the discussion on the recent recurring
disappearance of interface messages. It was a mistake for the discussion to
unfold on an internal list, but it happened quite by chance, starting with
an incident report and developing from there.
---
Ori Livneh
ori(a)wikimedia.org
---------- Forwarded message ----------
From: Ori Livneh <ori(a)wikimedia.org>
Date: Thu, Apr 10, 2014 at 1:23 AM
Subject: Re: [Engineering] Localisation not working on MediaWiki.org
To: "Brad Jorsch (Anomie)" <bjorsch(a)wikimedia.org>
Cc: Bryan Davis <bd808(a)wikimedia.org>, Development and Operations Engineers
<engineering(a)lists.wikimedia.org>
On Tue, Apr 8, 2014 at 6:56 AM, Brad Jorsch (Anomie)
<bjorsch(a)wikimedia.org>wrote:
> On Mon, Apr 7, 2014 at 9:37 PM, Bryan Davis <bd808(a)wikimedia.org> wrote:
>
>> The obvious change that caused this was that `mwversionsinuse
>> --withdb` changed from returning "1.23wmf21=testwiki" to
>> "1.23wmf21=test2wiki". This result is used within scap by the
>> mw-update-l10n script to run the maintenance script that builds the
>> ExtensionMessages file. In theory the exact wiki passed to `mwscript
>> mergeMessageFileList.php --wiki=<WIKIDB>` shouldn't matter, but
>> obviously there are now some circumstances where it does indeed
>> matter.
>>
>
> It looks to me like it has always mattered to an extent: the final result
> from maintenance/mergeMessageFileList.php is the combination of extensions
> loaded for the --wiki wiki (e.g. in CommonSettings.php) and the extensions
> loaded by the script itself from the passed list of extensions. Hopefully
> the latter is always a superset of the former so that turns out not to
> matter.
>
Interface messages went missing again on wikidata.org. l10nupdate ran
updates on cawikibooks, where $wmgUseWikibaseClient is false. The theory
that the exact wiki shouldn't make a difference is pretty shaky. The script
should expect to run on testwiki and fail loudly if it can't.
We should rethink our whole approach; I don't have any confidence in the
architecture. What is especially damning is not so much the recurrence of
failures as the way they were discovered (that is to say: by chance) and
the hard time we have had reasoning about their cause and the state of
localization on the cluster generally.
Dear all,
I am happy to announce the very first release of Wikidata Toolkit [1],
the Java library for programming with Wikidata and Wikibase. This
initial release can download and parse Wikidata dump files for you, so
as to process all Wikidata content in a streaming fashion. An example
program is provided [2]. The library can also be used with MediaWiki
dumps generated by other Wikibase installations (if you happen to work
in EAGLE ;-).
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and, of course, the sources [4].
Version 0.1.0 is of course alpha, but the code that we have is already
well-tested and well-documented. Improvements that are planned for the
next release include:
* Faster and more robust loading of Wikibase dumps
* Support for various serialization formats, such as JSON and RDF
* Initial support for Wikibase API access
Nevertheless, you can already give it a try. In later releases, it
is also planned to support more advanced processing after loading,
especially for storing and querying the data.
Feedback is welcome. Developers are also invited to contribute via github.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2]
https://github.com/Wikidata/Wikidata-Toolkit/blob/v0.1.0/wdtk-examples/src/…
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
(you'll also need to install the third party dependencies manually when
using this)
[4] https://github.com/Wikidata/Wikidata-Toolkit/
Hey folks :)
Just wanted to let you know that we have just enabled interwiki links
for Wikiquote via Wikidata. Issues, questions and more please to
https://www.wikidata.org/wiki/Wikidata:Wikiquote
Welcome, Wikiquote!
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.