Mediawiki-api November 2018

mediawiki-api@lists.wikimedia.org

2 participants
4 discussions

Need to extract abstract of a wikipedia page

by aditya srinivas

Hello,

I am writing a Java program to extract the abstract of a Wikipedia page given the page's title. I have done some research and found out that the abstract will be in rvsection=0.

So, for example, if I want the abstract of the "Eiffel Tower" wiki page, I query the API in the following way:

http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…

I then parse the XML data that is returned and take the wikitext in the tag <rev xml:space="preserve">, which represents the abstract of the Wikipedia page. But this wikitext also contains the infobox data, which I do not need.

I would like to know if there is any way in which I can remove the infobox data and get only the wikitext related to the page's abstract, or if there is any alternative method by which I can get the abstract of the page directly.

Looking forward to your help.

Thanks in advance,
Aditya Uppu
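One possible approach to the question above, sketched in Python for brevity (the logic ports directly to Java). Infoboxes are templates wrapped in {{ }}, so the wikitext returned for rvsection=0 can be filtered by tracking brace depth. This is a rough sketch, not a wikitext parser: `strip_templates`, `extracts_url` and the sample text are hypothetical names and data, every template is removed (not only infoboxes), and {{{parameter}}} syntax, tables and file links are not handled. The second helper only builds a query URL for prop=extracts with exintro and explaintext, which assumes the TextExtracts extension is installed on the target wiki; where it is, the plain-text intro comes back directly.

```python
from urllib.parse import urlencode


def strip_templates(wikitext: str) -> str:
    """Drop everything inside {{ ... }}, including nested templates."""
    out = []
    depth = 0  # how many templates we are currently inside
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                out.append(wikitext[i])
            i += 1
    return "".join(out).strip()


def extracts_url(title: str,
                 endpoint: str = "https://en.wikipedia.org/w/api.php") -> str:
    """Build a query URL asking for the plain-text intro of a page.

    Assumes the TextExtracts extension (prop=extracts) is available.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,       # only the content before the first section
        "explaintext": 1,   # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical sample resembling the start of an article's section 0.
sample = (
    "{{Infobox building\n"
    "| name = Eiffel Tower\n"
    "| height = {{convert|300|m}}\n"
    "}}\n"
    "The '''Eiffel Tower''' is a tower in Paris."
)
print(strip_templates(sample))
print(extracts_url("Eiffel Tower"))
```

The brace-depth filter keeps the existing rvsection=0 request unchanged, while the prop=extracts route avoids wikitext processing altogether.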

5 months

[Mediawiki-api-announce] Deprecation of list=allusers 'recenteditcount' result property

by Brad Jorsch (Anomie)

When list=allusers is used with auactiveusers, a property 'recenteditcount' is returned in the result. In bug 67301[1] it was pointed out that this property also counts various other logged actions, and so should really be named something like "recentactions".

Gerrit change 130093,[2] merged today, adds the "recentactions" result property. "recenteditcount" is also returned for backwards compatibility, but will be removed at some point during the MediaWiki 1.25 development cycle.

Any clients using this property should be updated to use the new property name. The new property will be available on WMF wikis with 1.24wmf12, see https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap for the schedule.

[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=67301
[2]: https://gerrit.wikimedia.org/r/#/c/130093/

--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
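A client that has to work during the transition could read the new property and fall back to the old one. The following is a minimal sketch, assuming each list=allusers entry has already been decoded from JSON into a dict; the function name `recent_action_count` and the sample entries are hypothetical.

```python
from typing import Optional


def recent_action_count(user: dict) -> Optional[int]:
    """Return the recent-activity count from one list=allusers entry.

    Prefers the new 'recentactions' property and falls back to the
    deprecated 'recenteditcount' on wikis that do not return it yet.
    """
    if "recentactions" in user:
        return user["recentactions"]
    return user.get("recenteditcount")


# Hypothetical entries, shaped like items of a decoded allusers result.
new_style = {"name": "ExampleA", "recentactions": 5, "recenteditcount": 5}
old_style = {"name": "ExampleB", "recenteditcount": 3}
print(recent_action_count(new_style), recent_action_count(old_style))
```

Once "recenteditcount" is removed, the fallback branch simply stops matching and can be deleted.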

5 years

[Mediawiki-api-announce] DEPRECATION: Action API uncaught exception error codes

by Brad Jorsch (Anomie)

Currently the codes for uncaught exceptions include the class name, for example "internal_api_error_ReadOnlyError", or "internal_api_error_DBQueryError", or possibly something like "internal_api_error_MediaWiki\Namespace\FooBarException". As you can see in that last example, that can get rather ugly and complicates recent attempts to verify that all error codes use a restricted character set.

Thus, we are deprecating these error codes. In the future all such errors will use the code "internal_api_error". The date for that change has not yet been set.

If a client for some reason needs to see the class of the uncaught exception, this is available in a new 'errorclass' data property in the API error. This will be returned beginning in 1.33.0-wmf.8 or later, see https://www.mediawiki.org/wiki/MediaWiki_1.33/Roadmap for a schedule. Note that database errors will report the actual class, such as "MediaWiki\rdbms\DBQueryError", rather than the old unprefixed name that had been maintained for backwards compatibility.

Clients relying on specific internal error codes or detecting internal errors by looking for an "internal_api_error_" prefix should be updated to recognize "internal_api_error" and to use 'errorclass' in preference to using any class name that might be present in the error code.

In JSON format with errorformat=bc, an internal error might look something like this:

    {
        "error": {
            "code": "internal_api_error_InvalidArgumentException",
            "info": "[61e9f71eedbe401f17d41dd2] Exception caught: Testing",
            "errorclass": "InvalidArgumentException",
            "trace": "InvalidArgumentException at ..."
        },
        "servedby": "hostname"
    }

With modern errorformats, it might look like this:

    {
        "errors": [
            {
                "code": "internal_api_error_InvalidArgumentException",
                "text": "[61e9f71eedbe401f17d41dd2] Exception caught: Testing",
                "data": {
                    "errorclass": "InvalidArgumentException"
                }
            }
        ],
        "trace": "InvalidArgumentException at ...",
        "servedby": "hostname"
    }

--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
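The client-side update described in this announcement could be sketched as follows. This is an illustration under assumptions, not code from MediaWiki: the function names are hypothetical, the response is assumed to be already decoded from JSON, and the fallback that derives a class name from the old-style code suffix is only there for servers that do not yet send 'errorclass'.

```python
from typing import Optional, Tuple

PREFIX = "internal_api_error"


def is_internal_code(code: str) -> bool:
    """Match both the future bare code and the deprecated prefixed codes."""
    return code == PREFIX or code.startswith(PREFIX + "_")


def internal_error(response: dict) -> Tuple[bool, Optional[str]]:
    """Return (is_internal, errorclass) for a decoded API response."""
    candidates = []
    if "error" in response:  # errorformat=bc: single "error" object
        err = response["error"]
        candidates.append((err.get("code", ""), err.get("errorclass")))
    for err in response.get("errors", []):  # modern errorformats
        data = err.get("data", {})
        candidates.append((err.get("code", ""), data.get("errorclass")))

    for code, errorclass in candidates:
        if is_internal_code(code):
            if errorclass is None and code != PREFIX:
                # Older server: recover the class from the code suffix.
                errorclass = code[len(PREFIX) + 1:]
            return True, errorclass
    return False, None


# The two example payloads from the announcement, as Python dicts.
bc_response = {
    "error": {
        "code": "internal_api_error_InvalidArgumentException",
        "info": "[61e9f71eedbe401f17d41dd2] Exception caught: Testing",
        "errorclass": "InvalidArgumentException",
        "trace": "InvalidArgumentException at ...",
    },
    "servedby": "hostname",
}
modern_response = {
    "errors": [
        {
            "code": "internal_api_error_InvalidArgumentException",
            "text": "[61e9f71eedbe401f17d41dd2] Exception caught: Testing",
            "data": {"errorclass": "InvalidArgumentException"},
        }
    ],
    "trace": "InvalidArgumentException at ...",
    "servedby": "hostname",
}
print(internal_error(bc_response))
print(internal_error(modern_response))
```

Checking 'errorclass' first means the same client keeps working after the codes collapse to plain "internal_api_error".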

5 years, 5 months

Fwd: [Wikitech-l] Content Negotiation Protocol for Parsoid HTML in the REST API

by Marko Obrovac

FYI

---------- Forwarded message ----------
From: Subramanya Sastry <ssastry(a)wikimedia.org>
Date: 14 November 2018 at 21:48
Subject: [Wikitech-l] Content Negotiation Protocol for Parsoid HTML in the REST API
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>

Hello everyone,

The Core Platform and Parsing teams at the Wikimedia Foundation are glad to announce the implementation of a content negotiation protocol for Parsoid HTML in the REST API [1]. This was deployed to the Wikimedia cluster on October 1, 2018.

TL;DR
-----

Parsoid HTML clients can now use the Accept header to specify which version of content they expect when requesting Parsoid HTML from the REST API. If omitted, as before, they will get whatever version of the HTML is in storage, regardless of any breaking changes it may contain.

Parsoid's HTML is versioned
---------------------------

An advantage of Parsoid’s HTML output is that it is both specced and versioned [2]. By adhering to the principles of semantic versioning [3], Parsoid can signal to clients what kinds of changes can be expected in the output between versions. However, until recently, Parsoid always returned the latest version of its HTML. Naturally, this posed challenges when deploying breaking changes since clients had to be prepared to consume the newer version.

Rolling out new HTML versions without breaking clients
------------------------------------------------------

Throughout its history, Parsoid developers have had close enough contact with the developers of Parsoid clients (they are internal to the Wikimedia Foundation for the most part) to coordinate deployment of breaking changes to the HTML. This mainly involved ensuring all known clients were forward and backwards compatible with the newer HTML version before deploying the change. Needless to say, as more clients were coming along, this informal process would not suffice; a scalable and predictable version upgrade solution was needed.

Content Negotiation Protocol
----------------------------

To solve this problem, a content negotiation protocol [4] relying on HTTP Accept headers was implemented. See RESTBase’s documentation [5] for the exact details of the protocol. What follows is just an informal description.

Parsoid clients are expected to pass an Accept header that specifies the HTML version they can handle. If the version present in storage does not satisfy the request, RESTBase will attempt to resolve the inconsistency. However, if the requested version cannot be satisfied, an (HTTP 406) error will be returned. The meaning of “satisfied” here mostly follows semver’s caret semantics [6] (the main difference being that the patch level is ignored).

If a client does not pass the Accept header, everything works exactly like before, with all the downsides of the previous behaviour: no protection from breaking changes; you get whatever HTML version is currently in storage.

Caveat emptors
--------------

The deployed Parsoid version generates HTML versions 1.8.0 [7] and 2.0.0 [8]. But, it is worth mentioning that the oldest acceptable version supported is 1.6.0, so if you’re sending an Accept header with a version less than 1.6.0, your application will break. The reason for this odd constraint is that we mistakenly released that version without bumping the major version [9] even though it introduced a breaking change. Mea culpa!

Also, RESTBase only stores the latest version so, as content gets rerendered and storage gets replaced, clients requesting older content have to pay a latency penalty while the stored content is downgraded to an appropriate version. Hence, we encourage Parsoid HTML clients to pay attention to announcements about major version changes and upgrade promptly.

Going forward, we’ll send announcements about Parsoid HTML version changes on the mediawiki-api-announce mailing list.

How does this impact 3rd party wikis?
-------------------------------------

Finally, astute readers will have noted that this announcement is concerning the REST API. However, many 3rd party installs have VE communicating directly with Parsoid and may be wondering how they’ll be impacted by the change. Parsoid has had a similar protocol (the difference is mainly in respecting the patch level) implemented since the v0.9.0 release [7]. So, going forward, when upgrading Parsoid or VE, if the HTML version requested by VE can be provided by Parsoid, the upgrade will be safe.

In Conclusion
-------------

Content negotiation now allows us to deploy new Parsoid features to the Wikimedia cluster without needing prior coordination with all clients. Clients can continue to request older versions until they are ready to update (assuming they don’t fall too far behind since we only plan on supporting two major versions concurrently). And, conversely, they can request newer versions with the guarantee that they will not receive incompatible content.

[1]: https://phabricator.wikimedia.org/T128040
[2]: https://www.mediawiki.org/wiki/Specs/HTML
[3]: https://semver.org/
[4]: https://tools.ietf.org/html/rfc7231#section-5.3
[5]: https://www.mediawiki.org/wiki/API_versioning#Content_format_stability_and_negotiation
[6]: https://www.npmjs.com/package/semver#caret-ranges-123-025-004
[7]: https://www.mediawiki.org/wiki/Specs/HTML/1.8.0
[8]: https://www.mediawiki.org/wiki/Specs/HTML/2.0.0
[9]: https://lists.wikimedia.org/pipermail/mediawiki-l/2018-March/047337.html
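The informal "satisfied" rule from the forwarded announcement (caret semantics with the patch level ignored) can be sketched as below. This is an illustration, not RESTBase's implementation: the function names are hypothetical, major version 0 (where caret ranges behave differently) is not handled, and the profile-style Accept header value is an assumption about the format that should be checked against the RESTBase documentation cited as [5].

```python
from typing import Tuple

# Spec URL prefix taken from the references in the announcement.
SPEC_BASE = "https://www.mediawiki.org/wiki/Specs/HTML/"


def parse_version(version: str) -> Tuple[int, int]:
    """Return (major, minor); the patch level is deliberately dropped."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def satisfies(stored: str, requested: str) -> bool:
    """True if the stored HTML version can serve the requested one.

    Same major version and an equal or newer minor version, which is
    caret-range matching with the patch level ignored (major >= 1 only).
    """
    s_major, s_minor = parse_version(stored)
    r_major, r_minor = parse_version(requested)
    return s_major == r_major and s_minor >= r_minor


def accept_header(version: str) -> str:
    """Build an Accept header value naming the wanted spec version.

    The profile="..." form is an assumption, not confirmed by the
    announcement itself.
    """
    return f'text/html; charset=utf-8; profile="{SPEC_BASE}{version}"'


print(satisfies("1.8.0", "1.6.0"))  # stored 1.8.0, client asks for 1.6.0
print(satisfies("2.0.0", "1.8.0"))  # stored 2.0.0, client asks for 1.8.0
print(accept_header("2.0.0"))
```

When `satisfies` is false, as in the second call, the announcement says RESTBase attempts to resolve the mismatch (for example by downgrading stored content) and returns HTTP 406 only if that fails.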

5 years, 5 months
