So I got a copy of the paper (thanks, Phoebe!) and skimmed it quickly, and I'm not thrilled with the result.
Their translation of questions into Wikipedia queries was sophisticated from a language-processing point of view, but naive from a search point of view. "How tall is Claudia Schiffer?" became the search terms (Claudia Schiffer, tall), though any sophisticated searcher should know that height is usually listed under "height", not "tall". (The query still works because it gets you to the Claudia Schiffer wiki page.) They drop the word "produce" from a question about where beer is produced, but leave it in for a question about a producer (and don't use "producer", which is the specific title you'd expect). They also don't take advantage of any knowledge of Wikipedia, and don't search for the obvious "list of X" articles that often answer these questions with sortable tables. In one paragraph they mentioned the first page of 20 results, and in the next they said they only looked at 5. So Wikipedia got short shrift, especially as used by a moderately sophisticated searcher.
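To make that concrete: even a toy attribute map (mine, not anything from the paper) captures the rewrite a moderately sophisticated searcher does in their head before typing:

    # Toy attribute map -- purely illustrative, not the paper's method.
    ATTRIBUTE_MAP = {"tall": "height", "old": "age", "heavy": "weight"}

    def rewrite(terms):
        """Swap question adjectives for the field names articles actually use."""
        return [ATTRIBUTE_MAP.get(t.lower(), t) for t in terms]

    rewrite(["Claudia", "Schiffer", "tall"])  # ['Claudia', 'Schiffer', 'height']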
They also skewed their scores by dropping two queries as too complex, then computing recall, precision, and F-score without them.
They didn't seem to mention in this paper the manual effort of mapping infoboxes to whatever representation they use, and they never acknowledged the cognitive work the human does in mapping the question onto infobox components, or the advantage that gives, especially in comparison to the naive way they adapted the questions into Wikipedia search terms. A commensurate level of effort put into the wiki searches would give much, much better results.
Still very interesting food for thought in terms of mapping infoboxes to properties and entities.
—Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On Tue, Aug 25, 2015 at 9:47 AM, Trey Jones tjones@wikimedia.org wrote:
On Tue, Aug 25, 2015 at 7:58 AM, Oliver Keyes okeyes@wikimedia.org wrote:
So it's a comparison of two search systems, neither of which we use?
Well, sure... but they describe an interesting search paradigm that I don't think we've even been considering (in the available paper). It's not the type of query-by-example I'm used to seeing.
They intercept requests for wiki pages and convert infoboxes into structured query forms that allow some basic boolean syntax. The system then converts these form-based queries into SPARQL and hits DBpedia for results. Sounds reasonable.
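I haven't seen the queries they actually generate, but I imagine the compiled SPARQL looks something like this sketch against the public DBpedia endpoint (the property mapping here is my guess, not theirs):

    import requests

    # dbo:height is a real DBpedia property; whether SWiPE maps the infobox
    # field to exactly this predicate is my assumption.
    QUERY = """
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?height WHERE { dbr:Claudia_Schiffer dbo:height ?height . }
    """

    r = requests.get("http://dbpedia.org/sparql",
                     params={"query": QUERY,
                             "format": "application/sparql-results+json"})
    for row in r.json()["results"]["bindings"]:
        print(row["height"]["value"])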
They do mention briefly in section 3.1 (last paragraph) that they basically need a custom ("page-dependent") mapping from any given infobox to an appropriate internal representation for conversion to SPARQL. There are some obvious machine-learning approaches to try there, and since they don't mention any, I assume the mappings were built manually, which may or may not scale, depending on how many of the queries they care about are covered by *n* manually mapped infobox types. Either way, it's potentially brittle, since the Wikipedians tending the infoboxes won't know about SWiPE.
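For concreteness, I picture the page-dependent mapping as a hand-built table per template, something like this (purely my sketch of what that might mean, not their code):

    # Hypothetical per-template mappings from infobox fields to DBpedia
    # ontology properties. One hand-built table per template is exactly
    # why this may not scale, and why it breaks when templates change.
    INFOBOX_MAPPINGS = {
        "Infobox person": {
            "spouse": "dbo:spouse",
            "height": "dbo:height",
            "birth_date": "dbo:birthDate",
        },
        "Infobox beverage": {
            "manufacturer": "dbo:manufacturer",
        },
    }

    def field_to_property(template, field):
        """Look up the SPARQL predicate for a template field, if mapped."""
        return INFOBOX_MAPPINGS.get(template, {}).get(field)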
As for the comparison to Xser (which I'm not familiar with, though it's described here: http://ceur-ws.org/Vol-1180/CLEF2014wn-QA-XuEt2014.pdf ) and plain keyword searches in Wikipedia, I'd really need to see the full paper to comment properly, but I have some questions (which they may well answer in the paper).
Plain keyword searches in Wikipedia are a fine baseline, though I wonder if they preprocessed the natural language queries, or just tossed the whole question into search (which it is not meant to handle, though it often works anyway). And I don't know what counts as success—one of the first *n* results contains the answer? How hard would a human have to look on a page for the answer?
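Here's roughly what I mean by the baseline, as a sketch against the real MediaWiki search API, tossing the raw question in unmodified:

    import requests

    def top_titles(question, n=5):
        """Feed the unprocessed question to Wikipedia's search API and
        return the first n result titles."""
        r = requests.get("https://en.wikipedia.org/w/api.php", params={
            "action": "query", "list": "search", "format": "json",
            "srsearch": question, "srlimit": n})
        return [hit["title"] for hit in r.json()["query"]["search"]]

    # Whether "success" means the answer is somewhere in one of these pages,
    # and how hard a human has to dig once there, is the unclear part.
    print(top_titles("How tall is Claudia Schiffer?"))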
It seems that the SWiPE system requires a human to translate the query into the infobox template (and to know which template to use!). So, for the query "who has Tom Cruise been married to?" (from the Xser paper), it seems the user has to convert "married to" into the "spouse(s)" field of the person infobox, which pushes the NLP work onto the human (an approach I'm a fan of, though it's not automatic).
I don't like that they claim 96% recall "among all answered questions": you don't get to ignore the questions you failed to answer when calculating recall! The 100% precision is nice, though.
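To see how much that flatters the system, take some made-up numbers: 50 questions asked, 40 answered, 38 of those correct. Recall "among all answered questions" is 38/40 = 95%, but recall over everything asked is 38/50 = 76%. The ten unanswered questions are misses, not footnotes.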
Xser seems more like the NLP system I would have first imagined: parse a query, convert it into a structured format, and hit the RDF store for answers. SWiPE seems to get the human to do the hard parts (parsing and converting to a structured format, with the help of existing infoboxes), so of course it does better than Xser.
So what do we get out of this? If you haven't already thought of WDQS, then you weren't paying attention! We could make things easier (for us, for SWiPE, for anyone) if we could develop a standard way to map infobox template fields to WDQS properties and their contents to entities (someone must've thought of this already).
Parsing the content of those fields (when you know what they're supposed to contain) is easier than parsing arbitrary queries or other chunks of text. That information could be used to automatically or semi-automatically populate WDQS, to refer WDQS results back to the relevant wiki pages, or to turn templates into query forms as SWiPE does.
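As a sketch of where that could go (my example against Wikidata's live SPARQL endpoint; P26 is the "spouse" property, and the label match stands in for a real field-content-to-entity mapping):

    import requests

    # WDQS predeclares the wdt/wikibase/bd/rdfs prefixes. Matching the
    # entity by English label is a placeholder for the infobox-content-
    # to-entity mapping discussed above.
    QUERY = """
    SELECT ?spouseLabel WHERE {
      ?person rdfs:label "Tom Cruise"@en .
      ?person wdt:P26 ?spouse .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    """

    r = requests.get("https://query.wikidata.org/sparql",
                     params={"query": QUERY, "format": "json"})
    for row in r.json()["results"]["bindings"]:
        print(row["spouseLabel"]["value"])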
Whether any of this gets onto our roadmap this century is a different question, but there are some interesting things to think about here.
So, can anyone get me a copy of the full paper?
Thanks for the pointer, Tilman!
—Trey
Trey Jones Software Engineer, Discovery Wikimedia Foundation
On 25 August 2015 at 10:54, Tilman Bayer tbayer@wikimedia.org wrote:
FYI just in case it's of interest and hasn't shown up on the team's radar yet:
http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7194368 - paywalled, unfortunately.
Quote from the abstract:
"This paper discusses expressivity and accuracy of the By-Example Structured (BESt) Query paradigm implemented on the SWiPE system through the Wikipedia interface. We define an experimental setting based on the natural language questions made available by the QALD-4 challenge, in which we compare SWiPE against Xser, a state-of-the-art Question Answering system, and plain keyword search provided by the Wikipedia Search Engine. The experiments show that SWiPE outperforms the results provided by Wikipedia, and it also performs sensibly better than Xser, obtaining an overall 85% of totally correct answers vs. 68% of Xser."
(For context, there's an earlier paper describing an earlier version of the SWiPE - "Search Wikipedia by example" - project: http://web.cs.ucla.edu/~zaniolo/papers/AtzoriZ12 ) -- Tilman Bayer Senior Analyst Wikimedia Foundation IRC (Freenode): HaeB
-- Oliver Keyes Count Logula Wikimedia Foundation