On 04/06/05, QuotationsBook.com Webmaster/Support <quotationsbook@gmail.com> wrote:
Thanks very much for your note. I had a good read of these links, but being new to Wikipedia's data access methods, I still couldn't work out the best course of action. Essentially, I need to pass a search query to Wikipedia, e.g. "Wilde, Oscar", and for the first article returned, display the article text on quotationsbook.com for that author.
Well, if you want to use a large amount of such content, but don't mind it lagging slightly behind the copy on Wikipedia itself, the "cleanest" solution is to download a copy of the database and extract the information yourself. However, that might seem a technically complex solution, and the content you want may actually only be a fraction of the Wikipedia database.
In that case, you have a further two options. Either ask for the "wikitext" source of an article, using Special:Export or en.wikipedia.org/w/index.php?title=<some article>&action=raw, and run a local copy of MediaWiki (or one of the programs at http://meta.wikimedia.org/wiki/Alternative_parsers) to turn that into HTML for you (less load on the Wikipedia servers, but more complex for you); or just request the rendered article and separate the content from the navigation stuff (easy enough to automate if you look at the source, though you may want to play around with some styling to make things look right).
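As a rough illustration of the first option, here is a minimal Python sketch that builds the action=raw URL described above and fetches the wikitext. The function names are mine; only the URL pattern comes from the message itself, and there is no error handling:

```python
import urllib.parse
import urllib.request

def raw_wikitext_url(title):
    """Build the index.php?title=<article>&action=raw URL for an article."""
    return ("http://en.wikipedia.org/w/index.php?"
            + urllib.parse.urlencode({"title": title, "action": "raw"}))

def fetch_wikitext(title):
    """Fetch the raw wikitext source of an article (network call)."""
    with urllib.request.urlopen(raw_wikitext_url(title)) as resp:
        return resp.read().decode("utf-8")
```

The wikitext you get back still needs a parser (a local MediaWiki install or one of the alternative parsers linked above) before it is presentable HTML.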
BTW, note that person articles in Wikipedia are generally titled "Firstname Lastname", not "Lastname, Firstname" (e.g. http://en.wikipedia.org/wiki/Oscar_Wilde). You could probably have your software guess the correct name in most cases, but http://en.wikipedia.org/wiki/Special:Search?search=<some terms> may also be useful: this will return the article with an exactly matching name if one exists, and search results otherwise.
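The name-guessing idea might look something like the following sketch. The conversion rule (swap around the first comma) and the fallback to Special:Search are assumptions on my part; plenty of names won't fit this pattern:

```python
import urllib.parse

def guess_article_title(name):
    """Turn 'Wilde, Oscar' into 'Oscar Wilde'; pass other names through."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return first + " " + last
    return name

def search_url(terms):
    """Fallback: Special:Search gives the exact match if it exists,
    otherwise a results page."""
    return ("http://en.wikipedia.org/wiki/Special:Search?"
            + urllib.parse.urlencode({"search": terms}))
```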
I don't know whether I should do this dynamically (request by request) or store a single cached copy of each article on my site, so that the request is only made once.
Well, that's almost entirely up to you - once you've downloaded data, you can do what you like with it; nothing Wikipedia does could allow or prevent a particular caching scheme at your end. However, out of consideration for the frequently overloaded servers maintained by the non-profit Wikimedia Foundation, some form of caching would be considered far preferable to making a fresh request every time. A standard HTTP "If-Modified-Since" header, like most browsers and proxies use, would let you stay up to date; but how you actually store the cached copies is entirely up to you.
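One possible shape for the If-Modified-Since approach, sketched in Python with a simple in-memory cache (a real site would persist this to disk or a database). The structure is my own assumption; the server answers 304 Not Modified when the stored copy is still current, and urllib surfaces that as an HTTPError:

```python
import urllib.request
from urllib.error import HTTPError

_cache = {}  # url -> (Last-Modified header value, body)

def fetch_cached(url, opener=urllib.request.urlopen):
    """Fetch url, revalidating any cached copy with If-Modified-Since."""
    cached = _cache.get(url)
    req = urllib.request.Request(url)
    if cached:
        req.add_header("If-Modified-Since", cached[0])
    try:
        with opener(req) as resp:
            body = resp.read().decode("utf-8")
            # Remember the server's timestamp for the next revalidation.
            _cache[url] = (resp.headers.get("Last-Modified", ""), body)
            return body
    except HTTPError as e:
        if e.code == 304 and cached:
            return cached[1]  # not modified: serve the stored copy
        raise
```

The `opener` parameter exists only so the HTTP layer can be swapped out for testing; in production you would call `fetch_cached(url)` and let it hit the live server.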
Also note that relying on Wikipedia responding before returning any of your own content could slow down your site *a lot*, as the servers often have heavy load or even go down for hours at a time. Not necessarily a problem, but worth bearing in mind when designing your caching solution.
wikitech-l@lists.wikimedia.org