Hello,
I am writing a Java program to extract the abstract of a Wikipedia page
given the page's title. I have done some research and found
out that the abstract will be in rvsection=0.
So, for example, if I want the abstract of the "Eiffel Tower" wiki page, I
query the API in the following way:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…
I then parse the XML that is returned and take the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way to remove the infobox data
and get only the wikitext related to the page's abstract, or if there is an
alternative method by which I can get the abstract of the page directly.
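
One possible way to drop the infobox from the section-0 wikitext is to skip everything between balanced {{ and }} pairs, since an infobox is a template call. The following is only a minimal sketch under that assumption. The class and method names (InfoboxStripper, stripTemplates) are made up for illustration, and the approach removes every template in the lead, not just the infobox, so inline templates in the prose would be lost as well.

```java
final class InfoboxStripper {

    /**
     * Removes every {{...}} template call from the given wikitext,
     * tracking nesting depth so that templates inside templates
     * (common in infoboxes) are removed together with their parent.
     * Assumption: braces in the input are balanced.
     */
    static String stripTemplates(String wikitext) {
        StringBuilder out = new StringBuilder();
        int depth = 0; // how many {{ are currently open
        int i = 0;
        while (i < wikitext.length()) {
            if (wikitext.startsWith("{{", i)) {
                depth++;
                i += 2;
            } else if (depth > 0 && wikitext.startsWith("}}", i)) {
                depth--;
                i += 2;
            } else {
                // Keep characters only when we are outside all templates.
                if (depth == 0) {
                    out.append(wikitext.charAt(i));
                }
                i++;
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not real article text.
        String sample = "{{Infobox building\n| name = Example {{small|tower}}\n}}\n"
                + "'''Example Tower''' is a tower.";
        System.out.println(stripTemplates(sample));
    }
}
```

The result is still wikitext (bold markers, links, references), so further cleanup would be needed for plain text. As an alternative, and as far as I know, the TextExtracts extension's prop=extracts query with the exintro and explaintext parameters returns the lead section as plain text directly; please check the current API documentation before relying on it.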
Looking forward to your help.
Thanks in advance,
Aditya Uppu
---------- Forwarded message ---------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Wed, Mar 22, 2023 at 4:45 AM
Subject: Service Decommission Notice: Mobile Content Service - July 2023
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
TL;DR: The legacy Mobile Content Service is going away in July 2023. Please
switch to Parsoid or another API before then to ensure service continuity.
Hello World,
I'm writing about a service decommission we hope to complete mid-July 2023.
The service to be decommissioned is the legacy Mobile Content Service
("MCS"), which is maintained by the Wikimedia Foundation's Content
Transform Team. We will be marking this service as deprecated soon.
We hope that with this notice, people will have ample time to update their
systems for use of other endpoints such as Parsoid [1] (n.b., MCS uses
Parsoid HTML).
The MCS endpoints are the ones with the relative URL path pattern
/page/mobile-sections* on the Wikipedias. For examples of the URLs see the
"Mobile" section on the online Swagger (OpenAPI) specification
documentation with matching URLs here:
https://en.wikipedia.org/api/rest_v1/#/Mobile
== History ==
The Mobile Content Service ("MCS") is the historical aggregate service that
originally provided support for the article reading experience on the
Wikipedia for Android native app, as well as some other experiences. We
have noticed that there are other users of the service. We are not able to
determine all of the users, as it's hard to tell with confidence from the
web logs.
The Wikimedia Foundation had already transitioned the Wikipedia for
Android and iOS apps to the newer Page Content Service ("PCS") several
years ago. PCS has some similarities with MCS in terms of its mobility
focus, but it also has different request-response signatures in practice.
PCS, as with MCS, is intended to primarily satisfy Wikimedia
Foundation-maintained user experiences only, and so this is classified with
the "unstable" moniker.
== Looking ahead ==
Generally, as noted in the lead, we recommend that folks who use MCS (or
PCS, for that matter) switch over to Parsoid for accessing Wikipedia
article content programmatically for the most predictable service.
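
For readers auditing their own code, a first step might be finding URLs that match the /page/mobile-sections* path pattern and pointing them at a Parsoid HTML endpoint instead. The sketch below assumes the REST path /page/html/{title} is the Parsoid HTML endpoint you want; that path, the class name McsUrlMigration, and its methods are assumptions for illustration, not part of this notice. Only the URL is rewritten here. The response format differs from MCS, so the calling code that consumes the response would still need to be changed by hand.

```java
import java.util.regex.Pattern;

final class McsUrlMigration {

    // Matches the path segment described in the notice: /page/mobile-sections*
    // followed by the page title.
    private static final Pattern MCS_PATH =
            Pattern.compile("/page/mobile-sections[^/]*/");

    /** Returns true if the URL looks like an MCS endpoint. */
    static boolean isMcsUrl(String url) {
        return MCS_PATH.matcher(url).find();
    }

    /**
     * Rewrites an MCS URL to the assumed Parsoid HTML path /page/html/{title}.
     * URLs that do not match the MCS pattern are returned unchanged.
     */
    static String toParsoidHtmlUrl(String url) {
        return MCS_PATH.matcher(url).replaceFirst("/page/html/");
    }

    public static void main(String[] args) {
        String old = "https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Eiffel_Tower";
        if (isMcsUrl(old)) {
            System.out.println(toParsoidHtmlUrl(old));
        }
    }
}
```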
The HTML produced by Parsoid has a versioned specification [2] and, because
Parsoid is accessed regularly by a number of components across the globe,
tends to have fairly well cached responses. However, please note that
Parsoid may be subject to stricter rate limits that can apply under certain
traffic patterns.
At this point, I do also want to note that in order to keep up with
contemporary HTML standards, particularly those favoring accessibility and
machine readability enhancements, Parsoid HTML will undergo change as we
further converge parsing stacks [3]. Generally, you should expect iteration
on the Parsoid HTML spec. And of course, as you may have come to appreciate,
the shape of HTML in practice can vary nontrivially wiki-by-wiki as
practices across wikis vary.
You may also want to consider Wikimedia Enterprise API options, which range
from no cost to higher volume access paid options.
https://meta.wikimedia.org/wiki/Wikimedia_Enterprise#Access
== Forking okay, but not recommended ==
Because MCS acts as a service aggregate and makes multiple backend API
calls, caveats can apply for those subresources - possibility of API
changes, deprecation, and the like. We do not recommend a plain fork of MCS
code because of the subresource fetch behavior. This said, of course you
are welcome to fork in a way compatible with MCS's license.
== Help spread the word ==
Although we are aware of the top two remaining consumers of MCS, we also
are not sure who else is accessing MCS and anticipate that some downstream
tech may break when MCS is turned off. As we are cross-posting this
message, we hope most people who have come to rely upon MCS will see this
message. Please feel free to forward this message to contacts if you know
they are using MCS.
== Help ==
Although we intend to decommission MCS in July 2023, we would like to share
resources if you need some help. We plan to hold office hours in case you
would like to meet with us to discuss this or other Content Transform Team
matters. We will host these events on Google Meet. We will provide notice
of these office hours on the wikitech-l mailing list in the coming weeks
and months.
Additionally, if you would like to discuss your MCS transition plans,
please visit the Content Transform Team talk page:
https://www.mediawiki.org/wiki/Talk:Content_Transform_Team
Finally, some Content Transform Team members will also be at the Wikimedia
Hackathon [4] if you would like some in-person support.
Thank you.
Adam Baso (he/him/his/Adam), on behalf of the Content Transform Team
Director of Engineering
Wikimedia Foundation
[1] https://www.mediawiki.org/wiki/Parsoid
[2] https://www.mediawiki.org/wiki/Specs/HTML
[3] https://www.mediawiki.org/wiki/Parsoid/Parser_Unification/Updates
[4] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023
_______________________________________________
Mediawiki-api-announce mailing list -- mediawiki-api-announce(a)lists.wikimedia.org
To unsubscribe send an email to mediawiki-api-announce-leave(a)lists.wikimedia.org