Hello, As we know, wiki (mainly Wikipedia) articles go into a lot of detail about their subject and often become verbose. Sometimes individual sections grow as long as articles. The information about a topic is split across various pages that are linked from the article, so we have to open several such links to get a good understanding of it.
Navigation popups/Hovercards make this a bit simpler, but the information they provide is often out of context. They give an introduction to the linked article rather than explaining its connection to the page being read, which makes them feel disconnected and muddled. They help a reader figure out the importance of a linked page, but not its relevance.
As part of a GSoC project, I was thinking of making a summarization tool that could automatically create a comprehensive summary of an article. The links, categories, infoboxes and other wiki-specific features make this different from, and more interesting than, simple text summarization. They make it easier to gauge the context and relevance of articles, and the link structure makes it possible to crawl to relevant pages (as Hovercards do). Finally, by combining only the important and relevant information from all sections, we can form a coherent and lucid summary for the reader. The intro paragraphs only introduce the article, whereas the tool would provide the gist of the entire article (and hence would be longer in most cases).
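A minimal sketch of the "important information from all sections" idea, assuming the article is already available as plain text keyed by section title. This is not the method from the linked research; it swaps in a simple word-frequency score, and the names `summarize`, `tokenize`, `split_sentences` and the stopword list are hypothetical, chosen only for illustration. Links, categories and infoboxes are not used here.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "was",
             "it", "for", "on", "as", "by", "with", "that", "this"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS]


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(sections, per_section=1):
    """Pick the highest-scoring sentence(s) from every section.

    `sections` maps a section title to its plain text. A sentence is scored
    by the average article-wide frequency of its (non-stopword) words.
    """
    freq = Counter(w for text in sections.values() for w in tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    summary = []
    for title, text in sections.items():
        sentences = split_sentences(text)
        top = sorted(sentences, key=score, reverse=True)[:per_section]
        # Emit the chosen sentences in their original order.
        summary.append((title, [s for s in sentences if s in top]))
    return summary
```

Because every section contributes at least one sentence, the result covers the whole article rather than only the introduction, at the cost of being longer than the intro paragraphs.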
Though there has been some independent research (http://lms.comp.nus.edu.sg/sites/default/files/publication.../acl09-yesr.pdf) on this, the possibility of such a tool has never been discussed at length on Wikimedia. So I would like to ask all members for their opinion on such a tool, in the above or some other form. Also, does it seem like something that can be done as a GSoC project (MVP)? Would any mentors be interested?
On Mon, 2016-02-22 at 20:19 +0530, Ultimate Supreme wrote:
Though there has been some independent research http://lms.comp.nus.edu.sg/sites/default/files/publication.../acl09-yesr.pdf
"The requested page "/sites/default/files/publication.../acl09-yesr.pdf" could not be found."
Cheers, andre -- Andre Klapper | Wikimedia Bugwrangler http://blogs.gnome.org/aklapper/
Sorry, here is the correct link: http://lms.comp.nus.edu.sg/sites/default/files/publication-attachments/acl09...
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I don't know anything about document summarization, but from a Hovercards perspective, it'd be more helpful to have a contextual summary than a complete one. For example:
Person A, leader of the B movement, studied at University C in 1923.
Then, apart from the Hovercard having a basic summary of University C, it would also be nice to see:
University C saw a lot of opposition to movement B during the 1920s.
I am not sure the research you linked to aims to do this though.
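One rough way to picture the contextual summary described above, as a sketch under stated assumptions: take the sentence in which the link appears, and rank the sentences of the linked article by how many of its words they share, ignoring the words of the linked title itself. The function `contextual_sentences`, its parameters and the stopword list are hypothetical names for illustration; this is plain word overlap, not anything Hovercards or the linked paper is known to do.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "at", "in", "is", "it", "of", "on", "the", "to"}


def terms(text):
    """Return the set of lowercase word tokens, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contextual_sentences(source_sentence, target_title, target_text, limit=1):
    """Return up to `limit` sentences of the target article that share the
    most words with the sentence the link came from."""
    # Words of the title match almost every target sentence, so drop them.
    context = terms(source_sentence) - terms(target_title)
    scored = []
    for position, sentence in enumerate(split_sentences(target_text)):
        overlap = len(context & terms(sentence))
        if overlap:
            # Sort by overlap (descending), then by position in the article.
            scored.append((-overlap, position, sentence))
    return [sentence for _, _, sentence in sorted(scored)[:limit]]


source = "Person A, leader of the B movement, studied at University C in 1923."
target = ("University C is a public university. "
          "University C saw a lot of opposition to movement B during the 1920s. "
          "It has three campuses.")
print(contextual_sentences(source, "University C", target))
```

With the example from this message, the sentence about opposition to movement B is the only one that overlaps with the source sentence, so it is the one selected to sit next to the basic summary.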
Also, the paper used the data in the infobox to figure out important parts of the document. Using Wikidata alongside this might make it easier to grasp the concepts that are being talked about.
CCing Aaron and Magnus who would know more about all of this.
—prtksxna
wikitech-l@lists.wikimedia.org