Mediawiki-api February 2023

mediawiki-api@lists.wikimedia.org

4 participants
3 discussions

Need to extract abstract of a wikipedia page

by aditya srinivas

Hello,

I am writing a Java program to extract the abstract of a Wikipedia page, given the page title. From my research, the abstract is in section 0 of the page (rvsection=0). For example, to get the abstract of the "Eiffel Tower" page, I query the API as follows:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…

I then parse the returned XML and take the wikitext inside the <rev xml:space="preserve"> tag, which holds the abstract of the page. However, this wikitext also contains the infobox data, which I do not need.

Is there any way to remove the infobox data and get only the wikitext of the page's abstract? Or is there an alternative method that returns the abstract directly?

Looking forward to your help. Thanks in advance.

Aditya Uppu
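One possible alternative to stripping the infobox by hand is the TextExtracts extension, which is installed on Wikipedia and exposes prop=extracts. The sketch below is in Python for brevity; the query parameters are the same from Java. It builds a request for the plain-text lead section and reads the result from a JSON response. The function names and the sample response are hypothetical and only serve to illustrate the shape of the data; the extract is generated from the rendered page, so it is plain text rather than wikitext.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_intro_url(title):
    """Build a TextExtracts query URL for the lead section of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the content before the first section heading
        "explaintext": 1,   # plain text instead of limited HTML
        "redirects": 1,     # follow redirects to the target page
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_intro(response_text):
    """Return the extract of the first page in the response, or None."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")


# Hypothetical, hand-written response used only to exercise the parser offline.
SAMPLE_RESPONSE = json.dumps({
    "query": {
        "pages": [
            {"pageid": 1, "title": "Example", "extract": "Example lead paragraph."}
        ]
    }
})

if __name__ == "__main__":
    print(build_intro_url("Eiffel Tower"))
    print(parse_intro(SAMPLE_RESPONSE))
```

To run this against the live site, fetch the URL with any HTTP client (sending a descriptive User-Agent header) and pass the response body to parse_intro.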

5 months

Get wikitext of all matching pages

by Dan Jacobson

What is the best way to get the wikitext of all matching pages, as I asked here: https://stackoverflow.com/questions/75305175/api-call-to-get-wikimedia-comm… Thanks. P.S. This month's archives contain three spam messages.
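One common pattern for this kind of request is to combine a generator with prop=revisions, so that a single query both selects the pages and returns their content. The sketch below assumes that "matching" means matching a search string (generator=search); the linked question is truncated here, so that is a guess, and another generator such as allpages or categorymembers may fit better. The function names and the sample response are hypothetical.

```python
import json


def build_search_wikitext_params(search, limit=10, cont=None):
    """Query parameters that return the wikitext of pages matching a search."""
    params = {
        "action": "query",
        "generator": "search",
        "gsrsearch": search,
        "gsrlimit": limit,
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
        "formatversion": 2,
    }
    if cont:
        # Merge the "continue" object of the previous response to get the next batch.
        params.update(cont)
    return params


def wikitext_by_title(response_text):
    """Map page title to wikitext for every page that carries a revision."""
    data = json.loads(response_text)
    result = {}
    for page in data.get("query", {}).get("pages", []):
        revisions = page.get("revisions")
        if revisions:  # content may be absent from a batch and arrive in a later one
            result[page["title"]] = revisions[0]["slots"]["main"]["content"]
    return result


def next_continue(response_text):
    """Return the continuation parameters, or None when the result set is complete."""
    return json.loads(response_text).get("continue")


# Hypothetical, hand-written response used only to exercise the parser offline.
SAMPLE_RESPONSE = json.dumps({
    "continue": {"gsroffset": 10, "continue": "gsroffset||"},
    "query": {
        "pages": [
            {"pageid": 1, "title": "Page A",
             "revisions": [{"slots": {"main": {"content": "''wikitext A''"}}}]},
            {"pageid": 2, "title": "Page B"},
        ]
    },
})

if __name__ == "__main__":
    print(wikitext_by_title(SAMPLE_RESPONSE))
    print(build_search_wikitext_params("example", cont=next_continue(SAMPLE_RESPONSE)))
```

A caller would send the parameters to the wiki's api.php, collect the wikitext, and repeat with the returned continuation until next_continue gives None.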

1 year, 2 months

Old Rendering or new approach

by Max Vlasov

Hi,

Wikipedia now renders with a new design, so my tool, which obtained the text by downloading the page and applying an XPath expression, has to be adjusted. My results so far are mixed, so I have two questions:

- Is there a plan to support the old design through some additional parameters? Even if not forever, it would be useful to me for comparison purposes.
- Is there a better way to get the text? At the moment I rely on guesswork, converting classic tags such as H1/H2 into pseudo-headings, bullet tags into bullet characters, and so on.

The issue with the new design is that floating content is now at the same level as all the other items under the //main[@id='content'] element, so I will have to do some filtering to get the main content without the supplemental information.

Thanks,
Max
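Two options that may help here, sketched under assumptions. First, the useskin URL parameter can request the legacy Vector skin for a single page view, which is handy for comparison; the thread does not say how long that skin will remain available. Second, action=parse with prop=text returns only the rendered article body, without the skin markup around it, so no filtering of the surrounding layout is needed. The helper names, the heading and bullet markers, and the sample HTML below are hypothetical.

```python
import json
from html.parser import HTMLParser
from urllib.parse import quote, urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def legacy_skin_url(title):
    """Page URL that asks for the legacy Vector skin for this one view."""
    return "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_")) + "?useskin=vector"


def build_parse_url(title):
    """API URL that returns the rendered article body only."""
    params = {"action": "parse", "page": title, "prop": "text",
              "format": "json", "formatversion": 2}
    return API_ENDPOINT + "?" + urlencode(params)


def content_html(response_text):
    """Extract the HTML string from an action=parse JSON response."""
    return json.loads(response_text)["parse"]["text"]


class PseudoTextParser(HTMLParser):
    """Turn headings and list items into simple plain-text markers."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.parts.append("\n" + "=" * int(tag[1]) + " ")  # pseudo-heading
        elif tag == "li":
            self.parts.append("\n* ")                          # bullet character
        elif tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_pseudo_text(html):
    parser = PseudoTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    print(legacy_skin_url("Eiffel Tower"))
    print(build_parse_url("Eiffel Tower"))
    print(html_to_pseudo_text("<h2>History</h2><p>Text.</p><ul><li>One</li><li>Two</li></ul>"))
```

The converter is deliberately minimal: it keeps all text, including text inside tables and reference lists, so real pages would still need some filtering by element class.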

1 year, 2 months
