On 9/11/07, brion@svn.wikimedia.org brion@svn.wikimedia.org wrote:
Log Message:
Revert r25768, r25771 I really don't like inserting bogus entries into the link tables, that looks fragile and generally horrifying.
domas was the one who suggested it, and Tim didn't object. It seems reasonable enough to me.
It does feel messy and fragile to me. Maybe a separate table could store this?
Like an empty_links table having columns page_id, link_type (cat,pag,transclude) or such.
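For readers following along, a minimal sketch of what such a separate table might look like. The table name and the two columns come from the suggestion above; the column types, the primary key, the record_links helper, and the use of SQLite in place of MySQL are all assumptions made for illustration, and the purpose (listing pages that have no links of a given type) is inferred from the rest of the thread.

```python
import sqlite3

# Hypothetical schema: names follow the suggestion above; types and the
# primary key are assumptions. SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
    CREATE TABLE empty_links (
        page_id   INTEGER NOT NULL,
        link_type TEXT    NOT NULL,   -- 'cat', 'pag' or 'transclude'
        PRIMARY KEY (page_id, link_type)
    );
""")

def record_links(conn, page_id, categories):
    """Replace a page's category links; mark the page as empty if it has none."""
    conn.execute("DELETE FROM categorylinks WHERE cl_from = ?", (page_id,))
    conn.execute(
        "DELETE FROM empty_links WHERE page_id = ? AND link_type = 'cat'",
        (page_id,))
    if categories:
        conn.executemany("INSERT INTO categorylinks VALUES (?, ?)",
                         [(page_id, c) for c in categories])
    else:
        conn.execute("INSERT INTO empty_links VALUES (?, 'cat')", (page_id,))

record_links(conn, 1, ["Physics"])
record_links(conn, 2, [])          # page 2 has no categories

# Pages without categories come straight from the marker table.
uncategorized = [r[0] for r in conn.execute(
    "SELECT page_id FROM empty_links WHERE link_type = 'cat' ORDER BY page_id")]
print(uncategorized)
```

Note that the marker row still has to be written and cleared on every links update, so the maintenance cost is about the same as for a marker kept in the link table itself.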
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi!
It does feel messy and fragile to me. Maybe a separate table could store this?
A separate table would have to be maintained in the very same way - there's nothing less fragile about it. Having it all in the main links tables allows data views with more information :)
Hi!
Log Message:
Revert r25768, r25771 I really don't like inserting bogus entries into the link tables, that looks fragile and generally horrifying.
Brion, did you find any problems with it, or is that just "I don't like it"? There was a lengthy discussion and working session on that yesterday - pity you missed it, probably you were sleeping.
Cheers,
On 12/09/2007, Domas Mituzas midom.lists@gmail.com wrote:
Hi!
Log Message:
Revert r25768, r25771 I really don't like inserting bogus entries into the link tables, that looks fragile and generally horrifying.
Brion, did you find any problems with it, or is that just "I don't like it"? There was a lengthy discussion and working session on that yesterday - pity you missed it, probably you were sleeping.
Aside from the "bogus entries" and "looks fragile" bits?
I don't like it either.
Rob Church
Hi!
Aside from the "bogus entries" and "looks fragile" bits?
Bogus entry? What about broken links? Are they bogus entries? "Looks fragile"? Of course, having query recaching scripts, that quite often get killed, is not fragile at all. And everyone knows that we don't have any fragile features, site_stats is the most trusted and stable counter in the world. :)
I don't like it either.
Well, no wonder. You're after elegant solutions and procedures: "The schema change requirement was noted and made quite clear. If it wasn't taken live before the software was updated, it's no fault of the development team." As you remember, that schema change was not needed at all, and a one-line change fixed the performance completely. It wasn't elegant (dedicated indexing, tables, columns, etc - wow, how nice), but it was practical.
And I appreciate practical solutions, cause we have a site to run.
Cheers,
On 12/09/2007, Domas Mituzas midom.lists@gmail.com wrote:
Well, no wonder. You're after elegant solutions and procedures: "The schema change requirement was noted and made quite clear. If it wasn't taken live before the software was updated, it's no fault of the development team." As you remember, that schema change was not needed at all, and a one-line change fixed the performance completely. It wasn't elegant (dedicated indexing, tables, columns, etc - wow, how nice), but it was practical.
And I appreciate practical solutions, cause we have a site to run.
I'm well aware we have a site to run. If I committed code which wasn't acceptable, then WHY THE FUCK WASN'T IT REVERTED? If I missed out the "elegant" compromise (a bit of clever condition work), then WHY THE FUCK DIDN'T YOU TELL ME?
I am a volunteer developer. I have the ability to commit code. I do not have, and do not want, the ability or power to take that code live. It's not my responsibility if something hasn't been reviewed properly and goes live. It's not my job. I contributed, and still contribute, to MediaWiki, because I want to, and because I could and have benefit(ed) the project.
In all honesty, I'm no worse than any of the junior committers who frequently cock up, break things, etc. but because I have been around for a bit longer, I'm screamed at a bit more when I screw up. Er, forget that - I get screamed at, and they don't. You've been doing this a lot longer than I have, and you're a damn sight better at database management and maintenance than I ever will be, and I defer to that. I completely resent, however, your attitude when somebody makes a mistake - you are patronising. You don't have the budget to be patronising; you can't afford to piss off all your volunteers.
You have got to remember, Domas, that yes, we are running a site - and yes, it's a big site, and we are on a less-than-shoestring budget - but if the database schema doesn't support everything we need it to do "fast enough", then IT HAS GOT TO BE CHANGED. There is nothing wrong with adding new features, yet you seem to thoroughly hate the idea of some rather necessary schema changes.
I furthermore resent your snide attitude towards my opinions on programming. I'm not the world's greatest programmer, and I doubt I'll ever be that, but I do like a nice, clean elegant solution - yes. Tim Starling is pretty much the same, actually, and so is Brion Vibber. Do you know why? Because "elegant" solutions have a habit of working much faster, and being easier to maintain in the long run than a quick live hack.
Rob Church
Hello!
I'm well aware we have a site to run. If I committed code which wasn't acceptable, then WHY THE FUCK WASN'T IT REVERTED? If I missed out the "elegant" compromise (a bit of clever condition work), then WHY THE FUCK DIDN'T YOU TELL ME?
Compromise: http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/DifferenceEngine.php?r1=25409&r2=25525
Revert: http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=25527
And you were not on channel for quite a while. :)
I am a volunteer developer.
We're closer than you'd think - I am too.
I have the ability to commit code. I do not have, and do not want, the ability or power to take that code live. It's not my responsibility if something hasn't been reviewed properly and goes live. It's not my job.
So why do you intervene with "I don't like it either", without any arguments, in a discussion about code that was reviewed properly and was ready to go live? It's not your job either.
I contributed, and still contribute, to MediaWiki, because I want to, and because I could and have benefit(ed) the project.
Same here :)
In all honesty, I'm no worse than any of the junior committers who frequently cock up, break things, etc. but because I have been around for a bit longer, I'm screamed at a bit more when I screw up.
Oh, you just hate criticism far more :)
to that. I completely resent, however, your attitude when somebody makes a mistake - you are patronising. You don't have the budget to be patronising; you can't afford to piss off all your volunteers.
Oh come on, I'm a volunteer too, and I also have my right to be pissed when someone comes up with a "this sucks" attitude without even trying to understand who did that and why.
You have got to remember, Domas, that yes, we are running a site - and yes, it's a big site, and we are on a less-than-shoestring budget - but if the database schema doesn't support everything we need it to do "fast enough", then IT HAS GOT TO BE CHANGED.
And we're changing that. Though every change has to be thought through. We can't index every field, because indexes have to be maintained. There always has to be a compromise between an absolutely amazing fantastic feature set and something we can run.
There is nothing wrong with adding new features, yet you seem to thoroughly hate the idea of some rather necessary schema changes.
Which necessary schema change did I ever oppose? It is not me doing schema changes usually anyway :-) I'm quite ready to extend the data environment in any needed direction - that is what we do.
I furthermore resent your snide attitude towards my opinions on programming. I'm not the world's greatest programmer, and I doubt I'll ever be that, but I do like a nice, clean elegant solution - yes. Tim Starling is pretty much the same, actually, and so is Brion Vibber.
My opinion of Tim is that he's the best at looking for (and finding) compromises.
Do you know why? Because "elegant" solutions have a habit of working much faster, and being easier to maintain in the long run than a quick live hack.
Actually, some of the quick live hacks we did evolved into standard practices (shame oh shame, we should've gone the elegant industry path with big iron). Because they were simply "good enough" for the job. That "good enough" seems to be core for the kind of operation we're having. Of course, that's already engineering - not development :)
BR,
If I recall correctly, I had to spend a good deal of time convincing you that page moves don't affect `recentchanges`, and thus you can't use "rc_namespace = X AND rc_title = Y" in the WHERE clause (just rc_timestamp) without the code failing to find matches that it should. So you probably could lose the patronizing attitude ;)
But yes, the timestamp index use was a good practice measure, but it also wasn't fragile and cryptic. Adding dummy rows defies the name/definition of the table, and it thus becomes harder for other devs to figure out (especially if dummy rows are not mentioned in the help/MW.org stuff, which I suspect they won't be). It's annoying as hell to have to review patches/fix bugs/add features to cryptic code. I remember getting hung up on tail() in checkuser just to add a search method to it.
I am always trying to write my code cleaner, and I already use more comments. Readability is important, and I'm sure Brion can attest to that.
Hi!
So you probably could lose the patronizing attitude ;)
What is patronizing? Is that "give encouragement", as the dictionary says? Sorry, I will encourage people further ;-p As you remember, I engaged in the discussion and resolved issues in it. Of course, I don't know all schema issues completely (haha, it's not my work!), so I consult with people around, thanks for helping, by the way.
But whenever I spot anything wrong, going into defensive stance will never help. Going into cooperative and communicative stance helps a lot.
Adding dummy rows defies the name/definition of the table, and it thus becomes harder for other devs to figure out (especially if dummy rows are not mentioned in the help/MW.org stuff, which I suspect they won't be).
Would having NULL instead of '' be less cryptic? :) Cause then, if followed literally, it would mean: "I don't know about any categories/links the page is in". We actually discussed that, and quite a few people chose ''. And mw.org/help is editable, if anyone needs to understand what empty categories mean :)
This is a trunk change, and it had an entry in RELEASE NOTES. Who is not reading RELEASE NOTES?
It's annoying as hell to have to review patches/fix bugs/add features to cryptic code.
What makes adding empty category rows cryptic? Of course, not having that code at all helps a lot - nothing to fix, nothing to review, no new features can be built on top.
I remember getting hung up on tail() in checkuser just to add search method to it.
That code is still broken, actually. It fails to tail if concurrent NFS appends put zeroes into the file :)
I am always trying to write my code cleaner, and I already use more comments. Readability is important, and I'm sure Brion can attest to that.
So, are we talking about clean code, more comments, or the problem with more information about links in the links tables?
When did all the MediaWiki developers turn into children? Enough with the personal attacks, already!
Hi!
When did all the MediaWiki developers turn into children? Enough with the personal attacks, already!
All communities need drama. And: "All generalizations are false, including this one." -- Mark Twain
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
Domas Mituzas wrote:
Log Message:
Revert r25768, r25771 I really don't like inserting bogus entries into the link tables, that looks fragile and generally horrifying.
Brion, did you find any problems with it, or is that just "I don't like it"? There was a lengthy discussion and working session on that yesterday - pity you missed it, probably you were sleeping.
1) You can't tell whether the update has been applied or not
2) You can't tell on-purpose blank entries from software-is-broken entries
3) Various operations on the link tables may break in interesting ways when encountering the blank entries
A purpose-built table would be much more sensible.
- -- brion vibber (brion @ wikimedia.org)
Brion,
- You can't tell whether the update has been applied or not
I'm not sure it matters that much, as it affects special pages only. Of course, having a "use old schema" toggle could be the way to go. Non-Wikimedia sites should run updaters whenever they update code anyway.
- You can't tell on-purpose blank entries from software-is-broken entries
You can, by seeing non-blank entries for the same page together with a blank entry :) It is a matter of a self-join, if you need to find such. On the other hand, software-is-broken entries should be resolved in any case, whether we use empty/null entries or not.
- Various operations on the link tables may break in interesting ways when encountering the blank entries
That's where we fix them :)
A purpose-built table would be much more sensible.
The only problematic area is how the application treats empty (or NULL) entries. The other concerns would exist in any case.
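The self-join mentioned in this reply can be sketched roughly as follows. This is an illustration only: the three-column pagelinks layout is a simplification, the sample rows are made up, and SQLite stands in for MySQL. A page that carries both a blank marker row and a real link row is treated as inconsistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the real pagelinks table.
conn.execute(
    "CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT)")
conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", [
    (1, 0, "Foo"),   # page 1: real link only
    (2, 0, ""),      # page 2: blank marker only (page has no links)
    (3, 0, ""),      # page 3: blank marker ...
    (3, 0, "Bar"),   # ... next to a real link, which should not happen
])

# Self-join: pair every blank row with any non-blank row of the same page.
inconsistent = [r[0] for r in conn.execute("""
    SELECT DISTINCT blank.pl_from
    FROM pagelinks AS blank
    JOIN pagelinks AS nonblank ON nonblank.pl_from = blank.pl_from
    WHERE blank.pl_title = '' AND nonblank.pl_title <> ''
    ORDER BY blank.pl_from
""")]
print(inconsistent)
```

This only catches the "blank plus real" case; a blank row written by mistake for a page that really has no links looks the same as an intentional one.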
On 9/13/07, Brion Vibber brion@wikimedia.org wrote:
- You can't tell whether the update has been applied or not
Actually, Simetrical and I discussed this, and we determined that something like "SELECT 1 FROM pagelinks WHERE pl_namespace=0 AND pl_title='' LIMIT 1;" would work okay, although it's admittedly not ideal. The update doesn't really hurt anything if it's run multiple times, either -- the way the update works is that it will add a bogus entry for any page with no entries whatsoever (not even bogus ones).
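A rough sketch of the check and the update described in this message. The check query is the one quoted above; the INSERT ... SELECT form of the update, the minimal page and pagelinks tables, and the function names are assumptions for illustration (the actual updater code is not shown in the thread), with SQLite standing in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY);
    CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT);
    INSERT INTO page VALUES (1), (2);
    INSERT INTO pagelinks VALUES (1, 0, 'Foo');   -- page 2 has no links
""")

def update_applied(conn):
    """The check quoted above: is there at least one blank entry?"""
    row = conn.execute(
        "SELECT 1 FROM pagelinks WHERE pl_namespace=0 AND pl_title='' LIMIT 1"
    ).fetchone()
    return row is not None

def apply_update(conn):
    """Add a blank entry for every page with no pagelinks rows at all."""
    conn.execute("""
        INSERT INTO pagelinks (pl_from, pl_namespace, pl_title)
        SELECT page_id, 0, '' FROM page
        WHERE NOT EXISTS (SELECT 1 FROM pagelinks WHERE pl_from = page_id)
    """)

before = update_applied(conn)
apply_update(conn)
apply_update(conn)   # second run finds no page without rows, so adds nothing
after = update_applied(conn)
blank_rows = conn.execute(
    "SELECT COUNT(*) FROM pagelinks WHERE pl_title = ''").fetchone()[0]
print(before, after, blank_rows)
```

Running the update twice leaves a single blank row for page 2, which is the sense in which a repeated run "doesn't really hurt anything", although it still has to scan every page each time.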
Andrew Garrett wrote:
On 9/13/07, Brion Vibber brion@wikimedia.org wrote:
- You can't tell whether the update has been applied or not
Actually, Simetrical and I discussed this, and we determined that something like "SELECT 1 FROM pagelinks WHERE pl_namespace=0 AND pl_title='' LIMIT 1;" would work okay, although it's admittedly not ideal.
1) It may detect existing incorrect entries (false positive), thus failing to apply the update.
2) You'll get a false negative if there are no applicable pages, thus running the full ugly query unnecessarily.
The update doesn't really hurt anything if it's run multiple times, either -- the way the update works is that it will add a bogus entry for any page with no entries whatsoever (not even bogus ones).
The update is potentially very slow; running it every time update checks are done would be disruptive to a large site, thus unacceptable.
- -- brion vibber (brion @ wikimedia.org)
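The two failure modes listed in this reply can be reproduced on toy data. This is only a sketch: the check query is the one quoted in the thread, while the helper name, the simplified table, and the sample rows are made up for illustration, with SQLite standing in for MySQL.

```python
import sqlite3

CHECK = "SELECT 1 FROM pagelinks WHERE pl_namespace=0 AND pl_title='' LIMIT 1"

def looks_applied(rows):
    """Run the proposed check against a throwaway pagelinks table holding rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER, pl_title TEXT)")
    conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", rows)
    return conn.execute(CHECK).fetchone() is not None

# Case 1: the update already ran, but every page had at least one real link,
# so no blank rows were added. The check reports "not applied" (false
# negative) and the slow update would be run again.
after_update_no_empty_pages = looks_applied([(1, 0, "Foo"), (2, 0, "Bar")])

# Case 2: the update never ran, but a stray blank row from some earlier bug
# exists. The check reports "applied" (false positive) and the update is skipped.
stray_blank_row = looks_applied([(1, 0, "Foo"), (2, 0, "")])

print(after_update_no_empty_pages, stray_blank_row)
```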
Brion Vibber wrote:
The update doesn't really hurt anything if it's run multiple times, either -- the way the update works is that it will add a bogus entry for any page with no entries whatsoever (not even bogus ones).
The update is potentially very slow; running it every time update checks are done would be disruptive to a large site, thus unacceptable.
- -- brion vibber (brion @ wikimedia.org)
Add a field updates_applied to site_stats?
On Thu, Sep 13, 2007 at 04:36:47PM +0200, Platonides wrote:
Brion Vibber wrote:
The update doesn't really hurt anything if it's run multiple times, either -- the way the update works is that it will add a bogus entry for any page with no entries whatsoever (not even bogus ones).
The update is potentially very slow; running it every time update checks are done would be disruptive to a large site, thus unacceptable.
- -- brion vibber (brion @ wikimedia.org)
Add a field updates_applied to site_stats?
Or a field schema_version that's increased with each installed patch?
jens
Jens Frank wrote:
On Thu, Sep 13, 2007 at 04:36:47PM +0200, Platonides wrote:
Brion Vibber wrote:
The update doesn't really hurt anything if it's run multiple times, either -- the way the update works is that it will add a bogus entry for any page with no entries whatsoever (not even bogus ones).
The update is potentially very slow; running it every time update checks are done would be disruptive to a large site, thus unacceptable.
- -- brion vibber (brion @ wikimedia.org)
Add a field updates_applied to site_stats?
Or a field schema_version that's increased with each installed patch?
I was thinking along the same lines, but something a bit more general. A site_props table, with key-value pairs. It would be like objectcache but guaranteed persistent, and it could be used to hold things such as a boolean value showing when the update in question is applied.
-- Tim Starling
Or a field schema_version that's increased with each installed patch?
I was thinking along the same lines, but something a bit more general. A site_props table, with key-value pairs. It would be like objectcache but guaranteed persistent, and it could be used to hold things such as a boolean value showing when the update in question is applied.
I prefer that idea. A version number requires that patches be installed in a particular order, which could cause problems (especially if 2 patches are being developed simultaneously). I don't think a boolean value would be required - just the existence of the row would be enough.
Thomas Dalton wrote:
Or a field schema_version that's increased with each installed patch?
I was thinking along the same lines, but something a bit more general. A site_props table, with key-value pairs. It would be like objectcache but guaranteed persistent, and it could be used to hold things such as a boolean value showing when the update in question is applied.
I prefer that idea. A version number requires that patches be installed in a particular order, which could cause problems (especially if 2 patches are being developed simultaneously). I don't think a boolean value would be required - just the existence of the row would be enough.
You may have versions of the same sub-schema.
On 13/09/2007, Platonides Platonides@gmail.com wrote:
Thomas Dalton wrote:
Or a field schema_version that's increased with each installed patch?
I was thinking along the same lines, but something a bit more general. A site_props table, with key-value pairs. It would be like objectcache but guaranteed persistent, and it could be used to hold things such as a boolean value showing when the update in question is applied.
I prefer that idea. A version number requires that patches be installed in a particular order, which could cause problems (especially if 2 patches are being developed simultaneously). I don't think a boolean value would be required - just the existence of the row would be enough.
You may have versions of the same sub-schema.
A boolean won't help with that, though. You would need an integer. Or just append the version number to the key.
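A minimal sketch of the site_props idea as discussed so far, with the row's existence acting as the flag and the version appended to the key. Only the table name and the key-value notion come from the thread; the column names, the key format, and the helper functions are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: column names and types are assumptions.
conn.execute("CREATE TABLE site_props (sp_key TEXT PRIMARY KEY, sp_value TEXT)")

def marker_key(name, version):
    # Version appended to the key, so no separate integer column is needed.
    return f"update:{name}:v{version}"

def is_applied(conn, name, version=1):
    row = conn.execute("SELECT 1 FROM site_props WHERE sp_key = ?",
                       (marker_key(name, version),)).fetchone()
    return row is not None

def run_once(conn, name, version, update):
    """Run update only if no marker row exists, then record the marker."""
    if is_applied(conn, name, version):
        return False
    update(conn)
    conn.execute(
        "INSERT OR IGNORE INTO site_props (sp_key, sp_value) VALUES (?, '1')",
        (marker_key(name, version),))
    return True

calls = []
def slow_update(conn):
    calls.append("ran")   # placeholder for the expensive link-table update

first = run_once(conn, "blank-link-rows", 1, slow_update)
second = run_once(conn, "blank-link-rows", 1, slow_update)   # skipped
third = run_once(conn, "blank-link-rows", 2, slow_update)    # new version, new key
print(first, second, third, len(calls))
```

Because each update has its own key, updates can be applied in any order, which is the advantage over a single schema_version counter raised in the message above.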
Thomas Dalton wrote:
On 13/09/2007, Platonides Platonides@gmail.com wrote:
Thomas Dalton wrote:
Or a field schema_version that's increased with each installed patch?
I was thinking along the same lines, but something a bit more general. A site_props table, with key-value pairs. It would be like objectcache but guaranteed persistent, and it could be used to hold things such as a boolean value showing when the update in question is applied.
I prefer that idea. A version number requires that patches be installed in a particular order, which could cause problems (especially if 2 patches are being developed simultaneously). I don't think a boolean value would be required - just the existence of the row would be enough.
You may have versions of the same sub-schema.
A boolean won't help with that, though. You would need an integer. Or just append the version number to the key.
That's part of the reason we've used targeted checks for specific fields, indexes, etc. in the past. Don't have to worry about version numbers... :)
- -- brion
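A sketch of what a targeted check can look like: each schema change is applied only when the field or index it adds is missing, so no version number is involved. This is not MediaWiki's updater code; the helper names are made up, the column and index used are illustrative, and the PRAGMA calls are SQLite-specific stand-ins for whatever introspection the real database offers.

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info yields one row per column; position 1 is the name.
    return any(row[1] == column
               for row in conn.execute(f"PRAGMA table_info({table})"))

def index_exists(conn, table, index):
    # PRAGMA index_list yields one row per index; position 1 is the name.
    return any(row[1] == index
               for row in conn.execute(f"PRAGMA index_list({table})"))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges (rc_id INTEGER PRIMARY KEY, rc_timestamp TEXT)")

def update_schema(conn):
    """Apply each change only if the thing it adds is missing."""
    applied = []
    if not column_exists(conn, "recentchanges", "rc_deleted"):
        conn.execute(
            "ALTER TABLE recentchanges ADD COLUMN rc_deleted INTEGER NOT NULL DEFAULT 0")
        applied.append("rc_deleted")
    if not index_exists(conn, "recentchanges", "rc_timestamp"):
        conn.execute("CREATE INDEX rc_timestamp ON recentchanges (rc_timestamp)")
        applied.append("rc_timestamp")
    return applied

first = update_schema(conn)
second = update_schema(conn)   # everything already present, nothing to do
print(first, second)
```

This works well for structural changes; a data-only change such as the blank link rows leaves no field or index to test for, which is the difficulty raised earlier in the thread.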