For now, Wikidata does not plan to scrape data automatically from the
Web, but only to provide a place where such data can be edited, stored,
and re-published, including references. My assumption is that the
community might create bots to perform such scraping and upload the
results to Wikidata, but whether they want this, and how it might
happen, is a decision for the community, once it exists.
Cheers,
Denny
2012/3/18 Martynas Jusevicius <martynas(a)graphity.org>
Hey,
I think you should take a look at:

GRDDL: http://www.w3.org/TR/grddl/
ScraperWiki: https://scraperwiki.com/
Martynas
graphity.org
On Sun, Mar 18, 2012 at 11:55 AM, John Erling Blad <jeblad(a)gmail.com> wrote:
Thanks for the link; I will surely use this for some other screen
scraping project, but in this context I was looking for pointers to
previous work on screen scraping in MediaWiki in general, and
especially for Wikidata-like sites. The simple REST-like, previously
built tables are pretty easy to handle in tag and parser functions, but
the stateful pages where queries are built interactively are very hard
to automate.
John
On Sun, Mar 18, 2012 at 11:32 AM, Leonard Wallentin
<leo_wallentin(a)hotmail.com> wrote:
> Are you trying to achieve this from within MediaWiki? Otherwise Google
> Docs is a good tool for screen scraping that can be used to produce
> csv files for your wiki from sources without an API. I wrote about it
> here, in Swedish (assuming you are Norwegian):
> http://blogg.svt.se/nyhetslabbet/2012/01/screen-scraping-sa-har-gar-det-til…
>
> /Leo
>
> ________________________________
> Leonard Wallentin
> leo_wallentin(a)hotmail.com
> +46 (0)735-933 543
> Twitter: @leo_wallentin
> Skype: leo_wallentin
> http://svt.se/nyhetslabbet
> http://säsongsmat.nu <http://xn--ssongsmat-v2a.nu>
> WikiSkills: http://wikimediasverige.wordpress.com/2012/03/01/1519/
> http://nairobikoll.se
>
>> Date: Sun, 18 Mar 2012 09:57:34 +0100
>> From: jeblad(a)gmail.com
>> To: wikidata-l(a)lists.wikimedia.org
>> Subject: [Wikidata-l] Import from external sources
>
>> How can data be imported from external sources, especially those that
>> do not have any prepared and well-defined API?
>>
>> A rather simple example from the website of Statistics Norway is an
>> article on a page like this:
>> http://www.ssb.no/fobstud/
>> and a table like this:
>> http://www.ssb.no/fobstud/tab-2002-11-21-02.html
>>
>> In that example you must follow a link to a new page, which you then
>> must monitor for changes. Inside that page you can use XPath to
>> extract a field, and then optionally use something like a regexp to
>> identify and split fields. As an alternative solution you might use
>> XSLT to transform the whole page.
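The extract-then-split step described above can be sketched in Python. This is a minimal illustration, not part of MediaWiki: the table fragment, the function names, and the use of the standard library's limited ElementTree XPath subset are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative, well-formed stand-in for the SSB table page; the real
# page would be fetched over HTTP (and tidied into XML) first.
PAGE = """<table>
  <tr><th>Municipality</th><th>Students</th></tr>
  <tr><td>Oslo</td><td>45 118</td></tr>
  <tr><td>Bergen</td><td>21 303</td></tr>
</table>"""

def extract_cells(source):
    """Return every <td> cell as plain text, via an XPath-style query."""
    tree = ET.fromstring(source)
    return ["".join(td.itertext()).strip() for td in tree.iterfind(".//td")]

def split_field(field):
    """The regexp step: pull the digit groups out of a formatted number."""
    return re.findall(r"\d+", field)

print(extract_cells(PAGE))    # ['Oslo', '45 118', 'Bergen', '21 303']
print(split_field("45 118"))  # ['45', '118']
```

A real scraper would fetch the page, tidy it, and then apply exactly these two steps, or use an XSLT transform over the whole document as suggested above.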
>>
>> Anyhow, this can quite easily be formulated both as a parser function
>> and as a tag function.
>>
>> At the same site there is something called "Statistikkbanken"
>> (http://statbank.ssb.no/statistikkbanken/) where you can (must) log on
>> and then iterate through a sequence of pages.
>>
>> Data similar to the previous example can be found in
>> http://statbank.ssb.no/statistikkbanken/selectvarval/Define.asp?MainTable=F…
>>
>> But it is very difficult to formulate a kind of click-sequence inside
>> that page.
>>
>> Any idea? Some kind of click-sequence recording?
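One way to approximate the click-sequence recording asked about above is to store each "click" as a (method, URL, form data) step and replay the steps over a cookie-carrying session. Below is a minimal sketch using only Python's standard library; the step format, function names, and example URLs are assumptions, and the real Statistikkbanken flow (log on, then iterate through pages) is not modeled.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

def build_request(method, url, form=None):
    """Turn one recorded click into an HTTP request object."""
    data = urllib.parse.urlencode(form).encode() if form else None
    return urllib.request.Request(url, data=data, method=method)

def replay(steps):
    """Replay recorded clicks in order, carrying cookies between them."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return [opener.open(build_request(*step), timeout=10).read()
            for step in steps]

# A recorded sequence would look like this (hypothetical URLs):
# replay([
#     ("POST", "http://example.org/login", {"user": "x", "pass": "y"}),
#     ("GET",  "http://example.org/step2", None),
# ])
```

The cookie jar is what makes the session stateful: each replayed step sees the cookies set by the previous ones, which is the part that plain per-page fetching misses.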
>>
>> Statistics Norway publishes statistics about Norway for free reuse, as
>> long as it is credited appropriately.
>> http://www.ssb.no/english/help/
>>
>> John
>>
>> _______________________________________________
>> Wikidata-l mailing list
>> Wikidata-l(a)lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l