Gentlemen, if your personal Mediawiki wiki has been around since early 2007, you might want to clean out the thousands of Mediawiki: namespace rows that were left in the database by maintenance/deleteDefaultMessages.php . Wouldn't it make you feel good to clean out thousands of wasted rows, leaving behind e.g., on a small wiki, perhaps just a few hundred rows that are actually related to us?
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables. But OK, we proceed to clean them out by hand. I hope I got this right:
$ mysqlshow --count myDatabase > before.txt $ mysql myDatabase SELECT COUNT(*) FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default'; COUNT(*) 1518 DELETE FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default'; $ php purgeOldText.php --purge Purge Old Text
Searching for active text records in revisions table...done. Searching for active text records in archive table...done. Searching for inactive text records...done. 1518 inactive items found. Deleting...done. $ mysql myDatabase SELECT COUNT(*) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8; COUNT(*) 1510 SELECT MIN(log_timestamp),MAX(log_timestamp) FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8; MIN(log_timestamp) MAX(log_timestamp) 20070226185326 20070226194040 DELETE FROM logging WHERE log_comment = 'No longer required' AND log_namespace = 8; $ mysqlshow --count myDatabase|diff before.txt -|sed '/|/!d' < | archive | 15 | 2206 |
| archive | 15 | 688 |
< | logging | 10 | 2597 |
| logging | 10 | 1087 |
< | text | 3 | 4466 |
| text | 3 | 2948 |
On Mar 24, 2009, at 1:02 PM, jidanni@jidanni.org wrote:
Gentlemen, if your personal Mediawiki wiki has been around since early
mediawiki-l@, please
On Tue, Mar 24, 2009 at 10:02 PM, jidanni@jidanni.org wrote:
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables. But
Just run update.php. It does all that crap for you.
Hi!
I asked similar question week ago. Can anybody from Wikimedia stuff answer this question?
Eugene.
PS
We have new sysadmin there. May be it's good task to start?
On Tue, Mar 24, 2009 at 4:34 AM, Andrew Garrett andrew@epstone.net wrote:
On Tue, Mar 24, 2009 at 10:02 PM, jidanni@jidanni.org wrote:
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables. But
Just run update.php. It does all that crap for you.
-- Andrew Garrett
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
mediawiki-l@, please
OK, sorry. Next time...
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables. But
Just run update.php. It does all that crap for you.
I am sorry but update.php just calls deleteDefaultMessages.php which just leaves them sitting in the text and archive tables forever!
It just does a courteous delete, leaving all 1500 yes, deleted, but not expunged, forever.
If your wiki has been around since early 2007, have a look in your archive and text database tables. The 1500 hundred message strings should be very obvious there in the text table, and their corresponding items in the logging and archive tables.
Years and years will go by and one day you get curious and look in the database and surprise, for a small wiki perhaps 90% of the database is occupied by useless bloat.
On Wed, 25 Mar 2009 04:38:52 +0800, jidanni wrote:
mediawiki-l@, please
OK, sorry. Next time...
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables. But
Just run update.php. It does all that crap for you.
I am sorry but update.php just calls deleteDefaultMessages.php which just leaves them sitting in the text and archive tables forever!
It just does a courteous delete, leaving all 1500 yes, deleted, but not expunged, forever.
If your wiki has been around since early 2007, have a look in your archive and text database tables. The 1500 hundred message strings should be very obvious there in the text table, and their corresponding items in the logging and archive tables.
Years and years will go by and one day you get curious and look in the database and surprise, for a small wiki perhaps 90% of the database is occupied by useless bloat.
maintenance/nukeNS.php was written for pretty much that purpose; to purge all the obsolete stuff from that namespace without flooding delete logs, etc.
"SS" == Steve Sanbeg ssanbeg@ask.com writes:
SS> maintenance/nukeNS.php was written for pretty much that purpose; to purge SS> all the obsolete stuff from that namespace without flooding delete logs, SS> etc.
Alas, looking in
* Remove pages with only 1 revision from the MediaWiki namespace, without * flooding recent changes, delete logs, etc. * Irreversible (can't use standard undelete) and does not update link tables * * This is mainly useful to run before maintenance/update.php when upgrading * to 1.9, to prevent flooding recent changes/deletion logs. It's intended * to be conservative, so it's possible that a few entries will be left for * deletion by the upgrade script. It's also possible that it hasn't been * tested thouroughly enough, and will delete something it shouldn't; so * back up your DB if there's anything in the MediaWiki that is important to ^namespace * you.
The problem is nukeNS.php was and is never called by update.php! Check with ls -ltu (if your access times are tracked in mount(8).) It is never referred to by any other file but itself: $ find|LC_ALL=C xargs grep -il nukens ./maintenance/nukeNS.php
Hence its goal of * run before maintenance/update.php when upgrading to 1.9, was never achieved!
And, calling it now years later won't help, as update.php invocation of deleteDefaultMessages.php has already long ago put the messages etc. in the areas of the database where nukeNS.php doesn't look.
Hence I have proved that wikis that have been around since 1.8 will have about 1500 rows of useless messages still in the text table, and corresponding entries in the archive table.
Sure, on Wikipedia that is a speck of sand, but e.g., on a small quiet wiki the vast majority of the rows in the database will be useless trash. Repent now!
Another problem for the reader of this thread is "well gosh, has my wiki been around since 1.8? How can I tell for sure if my memory is hazy?" Well, maybe somebody can mention how to detect that. Maybe version increments are logged or should be, for those tracking SVN or not. Anyways, you can always just look for those useless entries in your database, that ought to tell you if your wiki has been around since 1.8.
Hoi, I admire your wish for cleaning up. My question is what are we talking about. Is this about cluttering up disk space or are these messages in memory. Thanks, GerardM
2009/3/26 jidanni@jidanni.org
"SS" == Steve Sanbeg ssanbeg@ask.com writes:
SS> maintenance/nukeNS.php was written for pretty much that purpose; to purge SS> all the obsolete stuff from that namespace without flooding delete logs, SS> etc.
Alas, looking in
- Remove pages with only 1 revision from the MediaWiki namespace, without
- flooding recent changes, delete logs, etc.
- Irreversible (can't use standard undelete) and does not update link
tables
- This is mainly useful to run before maintenance/update.php when
upgrading
- to 1.9, to prevent flooding recent changes/deletion logs. It's intended
- to be conservative, so it's possible that a few entries will be left for
- deletion by the upgrade script. It's also possible that it hasn't been
- tested thouroughly enough, and will delete something it shouldn't; so
- back up your DB if there's anything in the MediaWiki that is important
to ^namespace
- you.
The problem is nukeNS.php was and is never called by update.php! Check with ls -ltu (if your access times are tracked in mount(8).) It is never referred to by any other file but itself: $ find|LC_ALL=C xargs grep -il nukens ./maintenance/nukeNS.php
Hence its goal of
- run before maintenance/update.php when upgrading to 1.9,
was never achieved!
And, calling it now years later won't help, as update.php invocation of deleteDefaultMessages.php has already long ago put the messages etc. in the areas of the database where nukeNS.php doesn't look.
Hence I have proved that wikis that have been around since 1.8 will have about 1500 rows of useless messages still in the text table, and corresponding entries in the archive table.
Sure, on Wikipedia that is a speck of sand, but e.g., on a small quiet wiki the vast majority of the rows in the database will be useless trash. Repent now!
Another problem for the reader of this thread is "well gosh, has my wiki been around since 1.8? How can I tell for sure if my memory is hazy?" Well, maybe somebody can mention how to detect that. Maybe version increments are logged or should be, for those tracking SVN or not. Anyways, you can always just look for those useless entries in your database, that ought to tell you if your wiki has been around since 1.8.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Thu, Mar 26, 2009 at 7:02 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, I admire your wish for cleaning up. My question is what are we talking about. Is this about cluttering up disk space or are these messages in memory.
Apparently they are in the MySQL database... and the less data is present in it, the better, I think.
Marco
Marco Schuster marco@harddisk.is-a-geek.org wrote:
I admire your wish for cleaning up. My question is what are we talking about. Is this about cluttering up disk space or are these messages in memory.
Apparently they are in the MySQL database... and the less data is present in it, the better, I think.
First rule of optimization: Don't think, measure.
Tim
On Thu, Mar 26, 2009 at 2:02 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
I admire your wish for cleaning up. My question is what are we talking about. Is this about cluttering up disk space or are these messages in memory.
They use some disk space, which should be negligible for most people.
Aryeh Gregor wrote:
On Thu, Mar 26, 2009 at 2:02 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
I admire your wish for cleaning up. My question is what are we talking about. Is this about cluttering up disk space or are these messages in memory.
They use some disk space, which should be negligible for most people.
All entries at NS_MEDIAWIKI are loaded from db (or memcached) on each page request. It's a tiny amount, but I'd be wary of things moved so often.
On Thu, Mar 26, 2009 at 7:05 PM, Platonides Platonides@gmail.com wrote:
All entries at NS_MEDIAWIKI are loaded from db (or memcached) on each page request. It's a tiny amount, but I'd be wary of things moved so often.
Only pages that exist will be loaded. The pages in question have been deleted and therefore will not be loaded. The only question is whether the rows are kept in the text table.
"AG" == Aryeh Gregor Simetrical+wikilist@gmail.com writes:
AG> The only question is whether the rows are kept in the text table.
Yes, for wikis that have been around since 1.8, run this simple test $ mysql MyDatabase <<\EOF SELECT COUNT(*) FROM archive WHERE ar_namespace = 8 AND ar_user_text = 'MediaWiki default'; SELECT COUNT(*) FROM logging WHERE log_namespace = 8 AND log_comment = 'No longer required'; EOF COUNT(*) 1518 COUNT(*) 1510
(Note you may have to say e.g., FROM wiki_archive depending on how your tables are named.)
Now allow us to examine a small, but "healthy" wiki, http://transgender-taiwan.org/ , which was born after the 1.8/1.9 boundary in the Mediawiki stratigraphic record ( http://en.wikipedia.org/wiki/Cretaceous%E2%80%93Tertiary_extinction_event ).
$ mysqlshow --count transgender|egrep text|\ archive|Rows | Tables | Columns | Total Rows | | archive | 15 | 7 | | text | 3 | 62 |
This small wiki escaped having 1500 dinosaur corpse rows being added to each of those two tables.
Sure, the sanitary landfill that our database has now become will still function fine, but doesn't it feel bad to know that at least for a smaller wiki the majority of the rows may very well just be junk from a chance mishap at the 1.8-1.9 boundary? Yes, counting bytes instead of rows might help one sleep at night.
Anyway, how would a WikiSysop who feels this is bad correct it for his own wiki? I hope the method I used at the beginning of this thread is safe and correct.
jidanni@jidanni.org wrote:
I don't know why the design decision was made to just leave those Mediawiki: namespace items sitting in the archive and text tables.
The backup feature was added after initial testing, due to the potential for loss of important data. I considered that it was worth using a few hundred KB of disk space in exchange for theoretical disaster recovery.
-- Tim Starling
"TS" == Tim Starling tstarling@wikimedia.org writes:
TS> I considered that it was worth using a few hundred KB of disk TS> space in exchange for theoretical disaster recovery.
OK, I just hope the method mentioned at the beginning of this thread is a safe way to clean up the 1500 entries from the three affected tables, a total of 4500 rows, for those who now wish to.
A quick slob check of how many rows my small unaffected (post 1.8-1.9 boundary) wiki has: $ mysqlshow --count transgender| perl -nlwe 'next if / set./;if(/(\d+)\D+$/){$tot+=$1};END{print $tot}' 328
wikitech-l@lists.wikimedia.org