History pages do not (obviously) offer a way to search old versions of an article. One can certainly retrieve article versions one by one, and could (theoretically) do that in batch mode and search them locally, but that would be suboptimal with respect to both user bandwidth and Wikipedia server load. Is there a way to search old versions of an article?
For instance, one knows that there was a version of http://en.wikipedia.org/wiki/Back_to_the_future that contained a reference to http://en.wikipedia.org/wiki/Mall, the phrase "twin pines", and the word "destroyed". The date of the version is unknown. How does one find that version starting from http://en.wikipedia.org/w/index.php?title=Back_to_the_Future&action=history?
Ilya N. Golubev wrote:
History pages do not (obviously) offer a way to search old versions of an article. One can certainly retrieve article versions one by one, and could (theoretically) do that in batch mode and search them locally, but that would be suboptimal with respect to both user bandwidth and Wikipedia server load. Is there a way to search old versions of an article?
No. Maintaining a fulltext search index of old revisions is hypothetically possible but would be prohibitive in terms of, e.g., disk space usage. (As it is, we put a lot of effort into compressing old revision data, as that makes up the bulk of our database.)
We can't afford to dedicate the necessary resources for it at this time.
Pulling revisions of a particular page one-by-one and searching in them might be possible but will also be an expensive runtime operation.
-- brion vibber (brion @ pobox.com)
On Sat, 05 Mar 2005 18:21:31 +0300, Ilya N. Golubev <gin@mo.msk.ru> wrote:
For instance, one knows that there was a version of http://en.wikipedia.org/wiki/Back_to_the_future that contained a reference to http://en.wikipedia.org/wiki/Mall, the phrase "twin pines", and the word "destroyed". The date of the version is unknown. How does one find that version starting from http://en.wikipedia.org/w/index.php?title=Back_to_the_Future&action=history?
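Use http://en.wikipedia.org/wiki/Special:Export (clearing the checkbox that limits the export to the current revision, so that the full history is included) and import the result into a database. Make a fulltext field there and run your search against it. ;)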
I already found out just from searching the XML that "Mall" does not occur anywhere in the article. (:
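For anyone who would rather skip the database step, searching the exported XML directly could look roughly like the sketch below. It assumes the export follows the usual page/revision/text layout; the function names `find_revisions` and `local_name` and the `SAMPLE` data are made up for illustration and are not part of MediaWiki.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_revisions(xml_source, terms):
    """Yield (revision id, timestamp) for each revision whose text contains
    every term, ignoring case. xml_source is a file name or a binary file
    object holding a Special:Export dump (assumed layout, see above)."""
    wanted = [t.lower() for t in terms]
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        # Only direct children are read, so the contributor's nested <id>
        # is not confused with the revision's own <id>.
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        text = fields.get("text", "").lower()
        if all(t in text for t in wanted):
            yield fields.get("id", ""), fields.get("timestamp", "")
        elem.clear()  # release the revision text once it has been checked


# Invented two-revision dump, only to show the call; not real article history.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Back to the Future</title>
    <revision>
      <id>1</id>
      <timestamp>2005-01-01T00:00:00Z</timestamp>
      <contributor><username>Example</username><id>7</id></contributor>
      <text>Marty meets Doc at the mall.</text>
    </revision>
    <revision>
      <id>2</id>
      <timestamp>2005-01-02T00:00:00Z</timestamp>
      <contributor><username>Example</username><id>7</id></contributor>
      <text>One of the Twin Pines is destroyed in 1955.</text>
    </revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for rev_id, stamp in find_revisions(io.BytesIO(SAMPLE), ["twin pines", "destroyed"]):
        print(rev_id, stamp)
```

With a real export saved to disk, passing the file name instead of the `io.BytesIO` object should work the same way. Plain substring matching is used here rather than a fulltext index, which is probably enough for the history of a single article.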
wikitech-l@lists.wikimedia.org