So I got a copy of the paper (thanks, Phoebe!) and skimmed it quickly, and
I'm not thrilled with the result.
Their translation of questions into Wikipedia queries was sophisticated
from a language processing point of view, but naive from a search point of
view. "How tall is Claudia Schiffer?" became search terms (Claudia
Schiffer, tall), though any sophisticated searcher should know that height
is usually listed under "height", not "tall". (The query still works
because it gets to the Claudia Schiffer wiki page.) They drop the word
"produce" from a question about where beer is produced, but leave it in for
a question about a producer (without using "producer", which is the
specific title one would expect). They also don't take advantage of any
knowledge of Wikipedia, and
don't search for the obvious "list of X" articles that often answer the
questions with sortable tables. In one paragraph they mentioned the first
page of 20 results, and in the next they said they only looked at 5. So
Wikipedia got short shrift, especially compared to how a moderately
sophisticated user would actually search it.
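To make that concrete, here's a sketch of the kind of search-savvy
rewriting I have in mind. The rewrite rules and stop list are made-up toys,
not anything from the paper:

    import re

    # Toy rewrite rules: question words -> the terms articles actually use.
    REWRITES = {"tall": "height", "old": "age", "born": "birth"}
    STOP = {"how", "is", "was", "who", "what", "where", "the", "a", "an", "of"}

    def to_search_terms(question):
        words = re.findall(r"\w+", question.lower())
        return " ".join(REWRITES.get(w, w) for w in words if w not in STOP)

    print(to_search_terms("How tall is Claudia Schiffer?"))
    # -> "height claudia schiffer"

A similar rule could route superlative or enumeration questions to the
"list of X" searches mentioned above.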
They also skewed their scores by dropping two queries that were too complex
and computing recall, precision, and F-score without them.
They didn't seem to mention in this paper the manual effort of mapping
infoboxes to whatever representation they used, and they never mentioned
the computational work required of the human to map the question to the
infobox components, or the advantage this gives; again, this matters
especially in comparison to the way they naively adapted the queries to
Wikipedia search terms. A commensurate level of effort put into the wiki
searches would give much, much better results.
Still very interesting food for thought in terms of mapping infoboxes to
properties and entities.
Software Engineer, Discovery
On Tue, Aug 25, 2015 at 9:47 AM, Trey Jones <tjones(a)wikimedia.org> wrote:
On Tue, Aug 25, 2015 at 7:58 AM, Oliver Keyes wrote:
So it's a comparison of two search systems,
neither of which we use?
Well, sure... but they describe an interesting search paradigm that I
don't think we've even been considering (in the available paper). It's not
the type of query-by-example I'm used to seeing.
They intercept requests for wiki pages and convert infoboxes into
structured query forms that allow some basic boolean syntax. The system
converts these queries into SPARQL and hits DBpedia to get results. Sounds
interesting.
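I don't have the paper's actual queries, but I'd guess the generated SPARQL
looks something like this (real DBpedia endpoint and dbo:/dbr: namespaces;
the specific query shape is my assumption):

    from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

    # My guess at the query SWiPE might generate when a user fills in the
    # "spouse(s)" field of a person infobox form.
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX dbr: <http://dbpedia.org/resource/>
        SELECT ?spouse WHERE { dbr:Tom_Cruise dbo:spouse ?spouse }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["spouse"]["value"])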
They do mention briefly in section 3.1 (last paragraph) that they
basically need a custom ("page-dependent") mapping from any given infobox
to appropriate internal representations for mapping to SPARQL. There are
some obvious machine learning approaches to try there. Since they don't
mention any machine learning, I assume they built the mappings manually, which
may or may not scale, depending on how many queries of the sort they are
interested in are covered by *n* manually mapped infobox types. Either way,
it's potentially brittle, since the Wikipedians tending the infoboxes won't
know about SWIPE.
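For illustration, I'd imagine the manual mapping looks roughly like this;
the field and property names are my guesses, not taken from the paper:

    # Hypothetical hand-built mapping from infobox template fields to
    # DBpedia ontology properties -- my guess at what "page-dependent"
    # means in practice.
    INFOBOX_PERSON_MAP = {
        "spouse(s)":  "dbo:spouse",
        "born":       "dbo:birthDate",
        "height":     "dbo:height",
        "occupation": "dbo:occupation",
    }
    # Brittle by construction: rename a template field on-wiki and the
    # mapping silently breaks, with no way for editors to know.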
As for the comparison to Xser (which I'm not familiar with, though it's
described here: http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-XuEt2014.pdf)
and plain keyword searches in Wikipedia, I'd really need to see the full
paper to comment properly, but I have some questions (which they may well
answer in the paper).
Plain keyword searches in Wikipedia are a fine baseline, though I wonder
if they preprocessed the natural language queries, or just tossed the whole
question into search (which it is not meant to handle, though it often
works anyway). And I don't know what counts as success—one of the first *n*
results contains the answer? How hard would a human have to look on a page
for the answer?
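The baseline is easy enough to poke at with the real search API, though;
the comparison below (whole question vs. preprocessed terms) is my guess at
the kind of thing they measured:

    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def top_titles(query, n=5):
        # Hits the real MediaWiki search API; returns the top-n page titles.
        r = requests.get(API, params={
            "action": "query", "list": "search",
            "srsearch": query, "srlimit": n, "format": "json",
        })
        return [hit["title"] for hit in r.json()["query"]["search"]]

    print(top_titles("How tall is Claudia Schiffer?"))  # whole question
    print(top_titles("Claudia Schiffer height"))        # preprocessed terms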
It seems that the SWIPE system requires a human to translate the query
into the infobox template (and know which template to use!). So, for the
query "who has Tom Cruise been married to?" (from the Xser paper), it seems
the user has to convert "married to" into the "spouse(s)" field of
the person infobox, which amounts to pushing the NLP processing onto the
human (an approach I'm a fan of, though it isn't automatic).
I don't like that they claim 96% recall "among all answered
questions"; you don't get to ignore the ones you failed to answer when
calculating recall! The 100% precision is nice, though.
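With made-up counts in the same ballpark as their figures, the inflation is
easy to see:

    # Hypothetical counts, just to show how dropping unanswered queries
    # inflates "recall" -- not the paper's actual numbers.
    total, answered, correct = 27, 25, 24
    print(correct / answered)  # 0.96: "recall among answered questions"
    print(correct / total)     # ~0.89: recall over everything asked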
Xser seems more like the NLP system I would have first imagined—parse a
query, convert it into a structured format, and hit the RDF store for
answers. SWIPE seems to get the human to do the hard parts (parsing and
converting to a structured format, with the help of existing infoboxes), so
of course it does better than Xser.
So what do we get out of this? If you haven't already thought of WDQS,
then you weren't paying attention! We could make things easier (for us, for
SWIPE, for anyone) if we could develop a standard way to map infobox
template fields to WDQS properties and contents to entities (someone
must've thought of this already).
Parsing the content of those fields (if you know what they are supposed to
contain) is easier than parsing random queries or other chunks of text.
That info could be used to automatically or semi-automatically populate
WDQS, or to refer WDQS results back to relevant wiki pages, or to turn
templates into query forms as SWIPE does.
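For instance, the Tom Cruise question from above is nearly a one-liner
against WDQS, assuming I have the IDs right (P26 = spouse, Q37079 = Tom
Cruise):

    import requests

    # The "married to" question as a WDQS query. The wd:/wdt:/rdfs:
    # prefixes are predefined on the WDQS endpoint.
    QUERY = """
    SELECT ?spouseLabel WHERE {
      wd:Q37079 wdt:P26 ?spouse .
      ?spouse rdfs:label ?spouseLabel .
      FILTER(LANG(?spouseLabel) = "en")
    }
    """
    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["spouseLabel"]["value"])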
Whether any of this gets onto our roadmap this century is a different
question, but there are some interesting things to think about here.
So, can anyone get me a copy of the full paper?
Thanks for the pointer, Tilman!
Software Engineer, Discovery
On 25 August 2015 at 10:54, Tilman Bayer wrote:
FYI just in case it's of interest and hasn't shown up on the team's
radar yet:
Quote from the abstract:
"This paper discusses expressivity and accuracy of the By-Example
Structured (BESt) Query paradigm implemented on the SWiPE system
through the Wikipedia interface. We define an experimental setting
based on the natural language questions made available by the QALD-4
challenge, in which we compare SWiPE against Xser, a state-of-the-art
Question Answering system, and plain keyword search provided by the
Wikipedia Search Engine. The experiments show that SWiPE outperforms
the results provided by Wikipedia, and it also performs sensibly
better than Xser, obtaining an overall 85% of totally correct answers
vs. 68% of Xser."
(For context, there's an earlier paper where they describe an earlier
version of the SWiPE - "Search Wikipedia by example" - project:
IRC (Freenode): HaeB