Hello
I'm looking through the dump files and am not sure 'what contains what'. Maybe there's a descriptive page that I've missed somewhere?
I'd like XML or HTML, no images, to crawl content about UK local elections, either via keywords or by simulating a web crawler (or a mixture of both: some pruning, then crawling).
Sorry about the question. Best regards, Hugh Barnard
--------- https://www.hughbarnard.org Twitter: @hughbarnard
Hi Hugh,
Have you taken a look here? https://meta.wikimedia.org/wiki/Data_dumps
I think this might have what you are looking for.
Also, this page can come in handy for understanding the database structure: https://www.mediawiki.org/wiki/Manual:Database_layout
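For the "keywords, then pruning" idea, a minimal sketch of streaming through a pages-articles style XML dump and keeping only pages whose title matches a keyword might look like the following. The sample XML is a tiny hand-written stand-in for a real dump, the names `local_name` and `iter_matching_pages` are made up for illustration, and the namespace in the sample is an assumption (the code ignores it, so it should not matter which export version the dump uses).

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_matching_pages(xml_stream, keywords):
    """Yield (title, wikitext) for pages whose title contains any keyword."""
    lowered = [k.lower() for k in keywords]
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        if any(k in title.lower() for k in lowered):
            yield title, text
        elem.clear()  # drop the page's children to keep memory use low


# Hand-written sample standing in for a real dump file (assumption).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>2021 United Kingdom local elections</title>
    <revision><text>Elections were held on 6 May 2021.</text></revision></page>
  <page><title>Badger</title>
    <revision><text>A mammal.</text></revision></page>
</mediawiki>"""

matches = list(iter_matching_pages(io.BytesIO(SAMPLE), ["local elections"]))
for title, _text in matches:
    print(title)
```

With a real dump, the same function should accept the file object returned by `bz2.open(path, "rb")`, so the compressed file never has to be unpacked on disk. Matching on titles only is a deliberate simplification; matching on the page text as well would be a small change but much slower on a full dump.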
Regards, Tom
Not a silly question at all, as there are many options.
On 06/02/22 18:33, Hugh Barnard via Xmldatadumps-l wrote:
I'd like XML or HTML, no images, to make a crawl of UK local elections, [...]
It sounds like you are in an exploratory phase, where you may benefit from a higher-level look at the data access options. See also:
https://www.mediawiki.org/wiki/API:Get_the_contents_of_a_page
https://www.wikidata.org/wiki/Wikidata:Data_access
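As a rough illustration of the API route, here is a sketch of fetching the wikitext of a single page through the Action API, along the lines of the first link. The choice of English Wikipedia, the helper names, and the User-Agent string are all assumptions for the example; only `fetch_wikitext` touches the network, so the other two functions can be tried offline.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia


def build_query_url(title):
    """Build an Action API URL asking for the current wikitext of one page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(response):
    """Pull the wikitext out of a formatversion=2 response, or None if missing."""
    page = response["query"]["pages"][0]
    if page.get("missing"):
        return None
    return page["revisions"][0]["slots"]["main"]["content"]


def fetch_wikitext(title):
    """Perform the request (network access needed) and return the wikitext."""
    request = urllib.request.Request(
        build_query_url(title),
        # Hypothetical User-Agent; replace with your own project and contact.
        headers={"User-Agent": "uk-local-elections-sketch/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.load(reply))
```

Calling `fetch_wikitext("2021 United Kingdom local elections")` would then return that article's wikitext, or `None` if no page has that title. For a few hundred pages this is usually simpler than working from a dump; for a crawl of many thousands of pages the dumps are likely the better fit.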
The database dumps are only useful after you've identified what exactly you need to extract from the database and how.
Federico
xmldatadumps-l@lists.wikimedia.org