For now, Wikidata does not plan to cover scraping data automatically from the Web, but only to provide a place where such data can be edited, stored, and re-published, including references. My assumption is that the community might create bots to perform such scraping, and perhaps upload the results to Wikidata, but whether they want this and how it might happen is a decision for the community, once it exists.

Cheers,
Denny

2012/3/18 Martynas Jusevicius <martynas@graphity.org>
Hey,

I think you should take a look at
GRDDL http://www.w3.org/TR/grddl/
ScraperWiki https://scraperwiki.com/
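
A minimal sketch of the GRDDL idea in Python -- fetch an XHTML page, look up the XSLT transformation it declares via rel="transformation", and apply it -- assuming the requests and lxml packages; the URLs are illustrative placeholders, not real pages:

    import requests
    from lxml import etree

    PAGE_URL = "http://example.org/profile.xhtml"   # hypothetical GRDDL page
    page = etree.fromstring(requests.get(PAGE_URL).content)

    # GRDDL pages link to their own transformation in the document head.
    ns = {"h": "http://www.w3.org/1999/xhtml"}
    xslt_url = page.xpath(
        "//h:head/h:link[@rel='transformation']/@href", namespaces=ns)[0]

    # Fetch and apply the declared XSLT; the result is typically RDF/XML.
    xslt = etree.XSLT(etree.fromstring(requests.get(xslt_url).content))
    print(str(xslt(page)))
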

Martynas
graphity.org

On Sun, Mar 18, 2012 at 11:55 AM, John Erling Blad <jeblad@gmail.com> wrote:
> Thanks for the link, I will surely use this for some other screen
> scraping project, but in this context I was looking for pointers to
> previous work on screen scraping in MediaWiki in general, and
> especially for Wikidata-like sites. The simple, REST-like, previously
> built tables are pretty easy to handle in tag and parser functions,
> but the stateful pages where queries are built interactively are
> very hard to automate.
>
> John
>
> On Sun, Mar 18, 2012 at 11:32 AM, Leonard Wallentin
> <leo_wallentin@hotmail.com> wrote:
>> Are you trying to achieve this from within MediaWiki? Otherwise Google Docs
>> is a good tool for screen scraping that can be used to produce CSV files
>> for your wiki from sources without an API. I wrote about it here, in Swedish
>> (assuming you are Norwegian):
>> http://blogg.svt.se/nyhetslabbet/2012/01/screen-scraping-sa-har-gar-det-till/
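
A rough Python equivalent of that spreadsheet trick, as a minimal sketch assuming the pandas package (plus lxml) is installed; the URL is the Statistics Norway table that comes up later in this thread:

    import pandas as pd

    URL = "http://www.ssb.no/fobstud/tab-2002-11-21-02.html"
    tables = pd.read_html(URL)          # one DataFrame per <table> on the page
    tables[0].to_csv("fobstud.csv", index=False)   # CSV ready for the wiki
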
>>
>> /Leo
>>
>> ________________________________
>> Leonard Wallentin
>> leo_wallentin@hotmail.com
>> +46 (0)735-933 543
>> Twitter: @leo_wallentin
>> Skype: leo_wallentin
>>
>> http://svt.se/nyhetslabbet
>> http://säsongsmat.nu
>> WikiSkills: http://wikimediasverige.wordpress.com/2012/03/01/1519/
>> http://nairobikoll.se
>>
>>> Date: Sun, 18 Mar 2012 09:57:34 +0100
>>> From: jeblad@gmail.com
>>> To: wikidata-l@lists.wikimedia.org
>>> Subject: [Wikidata-l] Import from external sources
>>>
>>> [...] sources, especially those that do not have any prepared and
>>> well-defined API?
>>>
>>> A rather simple example from the website of Statistics Norway is an
>>> article page like this:
>>> http://www.ssb.no/fobstud/
>>> and a table like this:
>>> http://www.ssb.no/fobstud/tab-2002-11-21-02.html
>>>
>>> In that example you must follow a link to a new page, which you then
>>> must monitor for changes. Inside that page you can use XPath to
>>> extract a field, and then optionally use something like a regexp to
>>> identify and split fields. As an alternative solution you might use
>>> XSLT to transform the whole page.
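
A minimal sketch of that XPath-plus-regexp step in Python, assuming the requests and lxml packages; the XPath expression and the regexp are illustrative guesses, not tested against the real page:

    import re
    import requests
    from lxml import html

    URL = "http://www.ssb.no/fobstud/tab-2002-11-21-02.html"
    raw = requests.get(URL).content
    doc = html.fromstring(raw)

    # Extract the text of every table cell, keep only the numeric ones,
    # stripping ordinary and non-breaking spaces used as thousands separators.
    cells = doc.xpath("//table//td/text()")
    numbers = [int(re.sub(r"[ \u00a0]", "", c))
               for c in cells
               if re.fullmatch(r"[\d \u00a0]+", c.strip())]
    print(numbers[:10])

    # To monitor the page for changes, one could store a hash of `raw`
    # and compare it against the previous run (not shown here).
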
>>>
>>> Anyhow, this can quite easily be formulated both as a parser function
>>> and a tag function.
>>>
>>> At the same site there is something called "Statistikkbanken"
>>> (http://statbank.ssb.no/statistikkbanken/) where you can (must) log on
>>> and then iterate through a sequence of pages.
>>>
>>> Data similar to that in the previous example can be found at
>>> http://statbank.ssb.no/statistikkbanken/selectvarval/Define.asp?MainTable=FoBKhtab12III&SubjectCode=02&planguage=0&nvl=True&mt=1&nyTmpVar=true
>>> but it is very difficult to formulate a kind of click sequence inside
>>> that page.
>>>
>>> Any idea? Some kind of click-sequence recording?
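
One possible shape for such a recording, as a minimal sketch in Python with the requests package: replay the clicks as plain HTTP requests inside one session, so cookies and server-side state survive between steps. The second URL and the form fields below are made-up placeholders; the real values would have to be captured from a browser first:

    import requests

    session = requests.Session()

    # Step 1: open the variable-selection page, establishing the session.
    session.get(
        "http://statbank.ssb.no/statistikkbanken/selectvarval/Define.asp"
        "?MainTable=FoBKhtab12III&SubjectCode=02&planguage=0"
        "&nvl=True&mt=1&nyTmpVar=true")

    # Step 2: post the selections a browser would send (placeholder data).
    result = session.post(
        "http://statbank.ssb.no/statistikkbanken/selectvarval/"
        "saveselections.asp",                      # hypothetical endpoint
        data={"var1": "value1"})                   # hypothetical form fields
    print(result.status_code)
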
>>>
>>> Statistics Norway publishes statistics about Norway for free reuse,
>>> as long as the source is credited appropriately:
>>> http://www.ssb.no/english/help/
>>>
>>> John

_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l