Does anyone know where old database dumps are kept (all revisions preferable)? I asked in #wikimedia-tech but was told that Wikimedia does not keep that kind of thing.
Anyone have any ideas? It's for a project to develop a new grammar checker that needs to see how articles are created and deleted over time - thus just the old revisions wouldn't work.
I thought this quote was a good one, and would be an acceptable solution.
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" Torvalds, Linus (1996-07-20). Post to linux.dev.kernel newsgroup. Retrieved on 2006-08-28.
Thanks, User:mboverload
On Fri, Aug 22, 2008 at 8:33 AM, mboverload mboverloadlister@gmail.com wrote:
Does anyone know where old database dumps are kept (all revisions preferable)? I asked in #wikimedia-tech but was told that Wikimedia does not keep that kind of thing.
There is not much sense in keeping historical dumps, because the only "historical information" in such dumps would be a timestamp and, possibly, a different file format (it is XML now; it was SQL in the past). All relevant historical information kept in those dumps is also in the latest database dump.
On Fri, Aug 22, 2008 at 12:19 PM, Milos Rancic millosh@gmail.com wrote:
There is not much sense in keeping historical dumps, because the only "historical information" in such dumps would be a timestamp and, possibly, a different file format (it is XML now; it was SQL in the past). All relevant historical information kept in those dumps is also in the latest database dump.
Deleted articles, oversighted versions, anyone?
Hi,
Deleted articles are also in the dump, and so are oversighted revisions. If you delete an article, it stays in the database.
Greetings, Huib
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
On Fri, Aug 22, 2008 at 1:39 PM, Huib Laurens sterkebak@gmail.com wrote:
Hi,
Deleted articles are also in the dump, and so are oversighted revisions. If you delete an article, it stays in the database.
This is wrong. The deleted data (and oversighted data, of course) are not available at down.wm.org.
On Fri, Aug 22, 2008 at 1:39 PM, Huib Laurens sterkebak@gmail.com wrote:
Hi,
Deleted articles are also in the dump, and so are oversighted revisions. If you delete an article, it stays in the database.
Yes, but these are private data: e.g. http://download.wikimedia.org/enwiki/20080724/ «Deleted page and revision data. (private)».
Nemo
Once I had this idea: a tool that shows Wikipedia at a certain, chosen point in time. For example, I'd like to browse through Wikipedia always seeing the state of January 1st 2003. Imagine if Wikipedia were already decades old and we could read the state of 1965. (One can always use the version history, yes, but that's more work for the reader.) Maybe this is something more interesting to a historian like me than to other people. :-) Ziko
Ziko van Dijk wrote:
Once I had this idea: a tool that shows Wikipedia at a certain, chosen point in time. For example, I'd like to browse through Wikipedia always seeing the state of January 1st 2003. Imagine if Wikipedia were already decades old and we could read the state of 1965. (One can always use the version history, yes, but that's more work for the reader.) Maybe this is something more interesting to a historian like me than to other people. :-)
Me too, that would be excellent :) Not sure how light on the database it could be made, but it shouldn't be too hard to make static pages frozen at a certain point of time.
On Fri, Aug 22, 2008 at 5:12 AM, Nikola Smolenski smolensk@eunet.yu wrote:
Me too, that would be excellent :) Not sure how light on the database it could be made, but it shouldn't be too hard to make static pages frozen at a certain point of time.
Like http://nostalgia.wikimedia.org, but for more dates? :)
Oh, this nostalgia wp still exists, yes.
I thought about a tool or a user interface where I simply type "2003-01-01" (as an example) and Wikipedia shows me the articles as of that point in time. I understand that there might be problems with deleted images and merged articles. But it would still be interesting enough, all the more so the older Wikipedia grows. I do not know much about technical matters, but I cannot imagine that such a tool would be very complicated. (?)
Greetings Ziko
What would be ideal is a client-side wiki reader that could load past revisions at runtime.
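A minimal sketch of how such a reader might ask for the revision of a page that was current on a given date, assuming the wiki exposes the standard MediaWiki `api.php` revisions query. The endpoint URL, the function name `revision_query_url` and the example title are illustrative assumptions, and nothing is fetched here; the code only builds the request URL.

```python
from urllib.parse import urlencode


def revision_query_url(title, date, api="https://en.wikipedia.org/w/api.php"):
    """Build a query URL for the newest revision of `title` at or before `date`.

    `date` is a YYYY-MM-DD string; midnight UTC on that day is used as the cutoff.
    """
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,               # only the single closest revision
        "rvdir": "older",           # enumerate backwards in time from rvstart
        "rvstart": f"{date}T00:00:00Z",
        "rvprop": "ids|timestamp",  # revision ID and its timestamp
        "format": "json",
    }
    return f"{api}?{urlencode(params)}"


print(revision_query_url("Amsterdam", "2003-01-01"))
```

The revision ID in the response could then be opened through the wiki's usual `index.php?oldid=...` view. Templates, images and skins would still be the current ones, so this is only an approximation of the page as it looked on that date.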
I would really like to be able to access at least the _previous_ dumps: this can be very useful when the current dumps are either still running, or are aborted/stopped (as many of them are right now due to disk space issues). Are the previous dumps available anywhere?
Luca
Client-side might be ideal but also a lot harder to make. Something live-mirrorish would be fairly easy, but would of course violate the "no live mirrors" rule. To go completely server-side would require a *lot* of disk space, and/or some tricky db compression which would eat up lots of CPU cycles. Plus you'd need to find or make a full history dump. I guess if you've got the bandwidth, disk space, and/or CPU cycles to spare you could relatively easily scrape up your own full history dump, though.
A MediaWiki extension probably wouldn't be too hard, but I don't know MediaWiki well enough to be volunteering. For performance reasons it might require a new db table/column/index. I don't think MediaWiki tables are optimized for looking up the latest version of a page on a particular date.
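A rough sketch of the lookup in question, done in memory rather than in the database: given one page's revisions sorted by timestamp, a binary search finds the latest revision at or before a date. The function name `revision_at` and the sample history are made up for illustration; the 14-digit timestamp strings follow the `YYYYMMDDHHMMSS` form that MediaWiki uses, which sorts correctly as plain text. An index on (page, timestamp) would let a database do essentially the same thing.

```python
from bisect import bisect_right


def revision_at(revisions, when):
    """Return the ID of the latest revision with timestamp <= `when`.

    `revisions` is a list of (timestamp, rev_id) tuples sorted by timestamp.
    Returns None if the page had no revision yet at that time.
    """
    timestamps = [ts for ts, _ in revisions]
    i = bisect_right(timestamps, when)  # number of revisions at or before `when`
    return revisions[i - 1][1] if i else None


# Hypothetical history of a single page.
history = [
    ("20011005120000", 101),
    ("20021220083000", 250),
    ("20030215101500", 317),
]

print(revision_at(history, "20030101000000"))
```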
I might try hacking up a live-mirrorish version next time I get enough free time. Let's see - I'd have to find the right templates, articles, categories, and images, presumably working from the stub dump, and then merge them all together. Anything else? Historical skins would be nice but unnecessary, historical parsing algorithms would be cool but probably overkill. Anyone have a tool to recursively parse templates? I always get stuck there trying to make a perfect parser. On a similar note, is there a standalone parser yet, or would I have to import it all into a database?
Seems neat, though. One thing that comes to mind is checking out various articles on the days on and around 9/11/01.
Anthony
On Thu, Aug 28, 2008 at 8:12 AM, Anthony wikimail@inbox.org wrote:
Let's see - I'd have to find the right templates, articles, categories, and images, presumably working from the stub dump, and then merge them all together. Anything else?
Red vs. Blue links... Boy, I hope there's a standalone parser.
On Thu, Aug 28, 2008 at 5:14 AM, Anthony wikimail@inbox.org wrote:
Red vs. Blue links... Boy, I hope there's a standalone parser.
You also need the correct versions of the CSS and JS files, which is a pain since those file locations have changed over time. If you wanted to be really thorough you'd have to look at the MediaWiki namespace as well (for example to capture the evolution of the sidebar), but that has the extra wrinkle that the way the MediaWiki engine parses content in the MediaWiki namespace has also evolved over time.
Being completely accurate would be nearly impossible, but one could do a good approximation with enough effort.
-Robert Rohde
I think that's a fascinating idea. But then, I'm a history buff.
Ziko van Dijk wrote:
Once I had this idea: a tool that shows Wikipedia at a certain, chosen point in time. For example, I'd like to browse through Wikipedia always seeing the state of January 1st 2003. Imagine if Wikipedia were already decades old and we could read the state of 1965. (One can always use the version history, yes, but that's more work for the reader.) Maybe this is something more interesting to a historian like me than to other people. :-) Ziko
On Fri, Aug 22, 2008 at 1:53 PM, Ziko van Dijk zvandijk@googlemail.com wrote:
Once I had this idea: a tool that shows Wikipedia at a certain, chosen point in time. For example, I'd like to browse through Wikipedia always seeing the state of January 1st 2003. Imagine if Wikipedia were already decades old and we could read the state of 1965. (One can always use the version history, yes, but that's more work for the reader.) Maybe this is something more interesting to a historian like me than to other people. :-)
Yes, it is interesting. I was thinking about that, too :) The only problem is that such a possibility would use a lot of computing resources or a lot of storage. So a lot of work is needed to make it available on the web. While the computing rule "extract pages earlier than" shouldn't be very complex, generating such an extract may take a lot of time (a couple of hours? a couple of days? -- on an ordinary computer).
Such a tool may be very interesting for getting the big picture not only of Wikipedia (and other Wikimedia projects), but also of events, global and local social developments, public persons, and Wikimedians themselves.
Also, for a lot of historical information it is not necessary to build exactly such a tool. It is possible to browse the histories of the pages or to make some much simpler tool for connecting them. And it is true that historians whose job will be to explore the first decade of the 21st century will have much better material than historians whose job is to explore the decade before.
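To make the "extract pages earlier than" rule concrete, here is a hedged sketch that streams through an XML export and keeps, for each page, the newest revision at or before a cutoff. It assumes the usual `<page>` / `<title>` / `<revision>` / `<timestamp>` / `<text>` layout of the XML dumps with ISO 8601 timestamps; the function names and the tiny `SAMPLE` document are invented for illustration, and a real full-history dump would of course take far longer to scan.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def snapshot(dump, cutoff):
    """Yield (title, text) of each page's newest revision with timestamp <= cutoff.

    Pages that did not exist yet at `cutoff` are skipped.
    """
    title, best = None, None
    for _, elem in ET.iterparse(dump, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            ts = text = None
            for child in elem:
                if local(child.tag) == "timestamp":
                    ts = child.text
                elif local(child.tag) == "text":
                    text = child.text or ""
            # ISO 8601 timestamps in UTC compare correctly as strings.
            if ts is not None and ts <= cutoff and (best is None or ts > best[0]):
                best = (ts, text)
            elem.clear()  # free the revision text once it has been inspected
        elif name == "page":
            if best is not None:
                yield title, best[1]
            title, best = None, None
            elem.clear()


# Invented miniature dump, for illustration only.
SAMPLE = b"""<mediawiki>
  <page><title>Foo</title>
    <revision><timestamp>2002-05-01T10:00:00Z</timestamp><text>old Foo</text></revision>
    <revision><timestamp>2004-01-01T10:00:00Z</timestamp><text>new Foo</text></revision>
  </page>
  <page><title>Bar</title>
    <revision><timestamp>2005-03-01T09:00:00Z</timestamp><text>Bar</text></revision>
  </page>
</mediawiki>"""

print(list(snapshot(io.BytesIO(SAMPLE), "2003-01-01T00:00:00Z")))
```

With the 2003 cutoff only the older revision of Foo is returned, and Bar, which was created later, is left out.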
Ziko van Dijk wrote:
Once I had this idea: a tool that shows Wikipedia at a certain, chosen point in time. For example, I'd like to browse through Wikipedia always seeing the state of January 1st 2003. Imagine if Wikipedia were already decades old and we could read the state of 1965. (One can always use the version history, yes, but that's more work for the reader.) Maybe this is something more interesting to a historian like me than to other people. :-) Ziko
I think it was discussed before on wikitech-l, probably when talking about implementing stable versions. It wouldn't be too hard to restrict a given page to its history data as of date X. However:
- You would need to reverse page moves.
- When a page was deleted and some revisions were restored before the epoch, you don't know which ones were restored.
- Page merges would be especially difficult, as they're a mix of the two above.
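As an illustration of the first point only, here is a small sketch of undoing page moves to find the title a page carried at a given date. It assumes a move log is available as (timestamp, old title, new title) entries; the function name `title_at` and the sample log are hypothetical, and the sketch ignores the harder cases above (deletions, partial restores, merges) as well as titles later reused by a different page.

```python
def title_at(current_title, moves, when):
    """Walk the move log backwards and undo every move made after `when`.

    `moves` is an iterable of (timestamp, old_title, new_title) tuples with
    14-digit YYYYMMDDHHMMSS timestamps.
    """
    title = current_title
    for ts, old, new in sorted(moves, reverse=True):  # newest move first
        if ts > when and new == title:
            title = old  # the page had not been moved yet at `when`
    return title


# Hypothetical move log for one page.
moves = [
    ("20040310000000", "Foo bar", "Foo Bar"),
    ("20060101000000", "Foo Bar", "Foo Bar (band)"),
]

print(title_at("Foo Bar (band)", moves, "20050101000000"))
```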
On Fri, Aug 22, 2008 at 6:19 AM, Milos Rancic millosh@gmail.com wrote:
All relevant historical information kept in those dumps is also in the latest database dump.
Anyone know when the last valid full history database dump was? I've got a 134 gig one from 20080103, but I seem to remember that being corrupt. I'm also not sure if I missed a more recent one.
mboverload wrote:
Does anyone know where old database dumps are kept (all revisions preferable)? I asked in #wikimedia-tech but was told that Wikimedia does not keep that kind of thing.
Anyone have any ideas? It's for a project to develop a new grammar checker that needs to see how articles are created and deleted over time - thus just the old revisions wouldn't work.
You have the delete log, but you'd need some approximation of when they were created (unless you have the deletedhistory right on that wiki).
I thought this quote was a good one, and would be an acceptable solution.
"Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)" Torvalds, Linus (1996-07-20). Post to linux.dev.kernel newsgroup. Retrieved on 2006-08-28.
Then bug WMF about those new storage servers :)
wikimedia-l@lists.wikimedia.org