Thanks for the link. I will surely use it for some other screen
scraping project, but in this context I was looking for pointers to
previous work on screen scraping in MediaWiki in general, and
especially for Wikidata-like sites. The simple REST-like prebuilt
tables are pretty easy to handle in tag and parser functions,
but the stateful pages where queries are built interactively are
very hard to automate.
John
On Sun, Mar 18, 2012 at 11:32 AM, Leonard Wallentin
<leo_wallentin(a)hotmail.com> wrote:
Are you trying to achieve this from within MediaWiki?
Otherwise, Google Docs
is a good tool for screen scraping that can be used to produce CSV files
for your wiki from sources without an API. I wrote about it here, in Swedish:
http://blogg.svt.se/nyhetslabbet/2012/01/screen-scraping-sa-har-gar-det-til… (assuming
you are Norwegian).
/Leo
________________________________
Leonard Wallentin
leo_wallentin(a)hotmail.com
+46 (0)735-933 543
Twitter: @leo_wallentin
Skype: leo_wallentin
http://svt.se/nyhetslabbet
http://säsongsmat.nu
WikiSkills:
http://wikimediasverige.wordpress.com/2012/03/01/1519/
http://nairobikoll.se
Date: Sun, 18 Mar 2012 09:57:34 +0100
From: jeblad(a)gmail.com
To: wikidata-l(a)lists.wikimedia.org
Subject: [Wikidata-l] Import from external sources
>
> sources, especially those that do not have any prepared and
> well-defined API?
>
> A rather simple example from the Statistics Norway website is an
> article like this
>
> http://www.ssb.no/fobstud/
> and a table like this
>
> http://www.ssb.no/fobstud/tab-2002-11-21-02.html
>
> In that example you must follow a link to a new page, which you then
> must monitor for changes. Inside that page you can use XPath to
> extract a field, and then optionally use something like a regexp to
> identify and split fields. As an alternative solution you might use XSLT
> to transform the whole page.
>
> Anyhow, this can quite easily be formulated both as a parser function
> and a tag function.
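The extract-and-split approach described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the embedded page, its table id "stats", and the cell layout are made up, and the standard-library ElementTree parser used here handles only well-formed markup and a limited XPath subset, so a real page would likely need a more lenient HTML parser. The fingerprint helper is one simple way to notice that a monitored page has changed.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a statistics table page.
SAMPLE_PAGE = """
<html><body>
<table id="stats">
  <tr><th>Region</th><th>Students (share)</th></tr>
  <tr><td>Oslo</td><td>12 345 (6.1 %)</td></tr>
  <tr><td>Bergen</td><td>6 789 (5.4 %)</td></tr>
</table>
</body></html>
"""

# Splits a cell such as "12 345 (6.1 %)" into a count and a percentage.
CELL_RE = re.compile(r"^\s*([\d\s]+?)\s*\(([\d.]+)\s*%\)\s*$")


def fingerprint(page_source):
    """Return a hash of the page so a later fetch can be compared for changes."""
    return hashlib.sha256(page_source.encode("utf-8")).hexdigest()


def extract_rows(page_source):
    """Select table rows with an XPath-style query, then split cells by regexp."""
    root = ET.fromstring(page_source.strip())
    rows = []
    for tr in root.findall(".//table[@id='stats']/tr"):
        cells = tr.findall("td")
        if len(cells) != 2:
            continue  # header row (th cells) or unexpected layout
        match = CELL_RE.match(cells[1].text or "")
        if match is None:
            continue  # cell does not follow the assumed "count (share %)" form
        count = int(re.sub(r"\s", "", match.group(1)))
        share = float(match.group(2))
        rows.append(((cells[0].text or "").strip(), count, share))
    return rows


if __name__ == "__main__":
    print(fingerprint(SAMPLE_PAGE))
    for row in extract_rows(SAMPLE_PAGE):
        print(row)
```

A parser or tag function would presumably wrap something like extract_rows and re-run it only when the fingerprint of the fetched page differs from the stored one.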
>
> At the same site there is something called "Statistikkbanken"
> (http://statbank.ssb.no/statistikkbanken/) where you can (must) log on
> and then iterate through a sequence of pages.
>
> Data similar to that in the previous example can be found at
>
> http://statbank.ssb.no/statistikkbanken/selectvarval/Define.asp?MainTable=F…
> but it is very difficult to formulate a kind of click sequence inside that
> page.
>
> Any ideas? Some kind of click-sequence recording?
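One way to think about a recorded click sequence is as an ordered list of requests that is replayed while the session state (cookies) is carried from step to step. The Python sketch below shows only that idea and is not tied to any real site: the Step type, the replay function, the example.invalid URLs, the form values, and the fake_send transport are all hypothetical. A real transport would have to issue actual HTTP requests and return the response body and any new cookies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One recorded interaction: the request and the form fields submitted."""
    method: str
    url: str
    form: Dict[str, str] = field(default_factory=dict)


# A transport takes a step plus the current cookies and returns
# (response body, cookies set by the response).
Transport = Callable[[Step, Dict[str, str]], Tuple[str, Dict[str, str]]]


def replay(steps: List[Step], send: Transport,
           cookies: Optional[Dict[str, str]] = None) -> Tuple[str, Dict[str, str]]:
    """Replay the steps in order, carrying cookies forward between them."""
    jar = dict(cookies or {})
    body = ""
    for step in steps:
        body, new_cookies = send(step, jar)
        jar.update(new_cookies)
    return body, jar


def fake_send(step: Step, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Stand-in transport for demonstration; it never touches the network."""
    if step.url.endswith("/login"):
        return "logged in", {"session": "abc"}
    if cookies.get("session") != "abc":
        return "denied", {}
    return "table for " + step.form.get("MainTable", "?"), {}


# Hypothetical recording: log on first, then request a table definition.
RECORDED = [
    Step("POST", "http://example.invalid/login", {"user": "guest"}),
    Step("POST", "http://example.invalid/define", {"MainTable": "Example"}),
]

if __name__ == "__main__":
    print(replay(RECORDED, fake_send))
```

Keeping the transport separate from the recording means the same recorded sequence could be checked against a stand-in like fake_send before it is pointed at a live site.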
>
> Statistics Norway publishes statistics about Norway for free reuse as
> long as it is credited as appropriate.
>
> http://www.ssb.no/english/help/
>
> John
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l(a)lists.wikimedia.org
>
https://lists.wikimedia.org/mailman/listinfo/wikidata-l