On 21 Oct 2004, at 23:31, Magnus Manske wrote:
Jens Ropers wrote:
Anyway, just musing and mulling:
Could the PULL method be implemented w/ a checksum?
Say, generate a fairly short checksum (we're talking versioning here,
not security) for every article revision.
Then, with each request hitting an Intarweb-facing (caching)
webserver, have that cache box check whether there's a version stored
in its cache for said article (if not, fetch the article from the
actual DB, etc.). IF, however, there is a cached version, ask the DB
server for the checksum of its current version. If this matches the
checksum the cache has for its version, just don't bother the DB any
further and serve the page from the cache. If the checksums differ,
then again fetch the article from the DB and serve that (and cache
the new article and checksum for potential subsequent requests). This
entire checksum thing will NOT be required for any cached non-current
revisions, because they won't change. So, yes, for each request
hitting the cache server, there'd be a short checksum PULL against the
actual DB server, but other than that (and provided the article
hasn't changed) the page can just be served from the cache.
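The flow described above could be sketched roughly like this. The
`db` and `cache` dicts here are stand-ins for the real DB server and
cache box, and `zlib.crc32` is just one example of a fairly short,
non-security checksum — all of it illustrative, not an actual
implementation:

```python
import zlib

db = {}     # title -> (checksum, text): stand-in for the DB server
cache = {}  # title -> (checksum, text): stand-in for the cache box

def save_revision(title, text):
    """Store a new revision along with a short (non-cryptographic) checksum."""
    db[title] = (zlib.crc32(text.encode()), text)

def serve(title):
    """Serve a page, asking the DB only for a checksum whenever possible."""
    current_sum, _ = db[title]          # cheap checksum-only query
    if title in cache:
        cached_sum, cached_text = cache[title]
        if cached_sum == current_sum:   # unchanged: don't bother the DB further
            return cached_text
    _, text = db[title]                 # changed or uncached: full fetch
    cache[title] = (current_sum, text)  # remember for subsequent requests
    return text
```

The point being that the common case (article unchanged) costs only
the one short checksum query, never a full article fetch.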
So, the DB server keeps a list with a checksum (or a version number;
this is supposed to be wiki-like) for each data entry, and likewise
does the article, right?
Yup.
What if there is more than a single data entry in that article? Like
the list of species I mentioned. Say a new species was added at
wikidata; how to handle that one?
What if there are multiple queries in one article?
What if (in my example) the actual query is in a template?
What if that template includes other templates that contain queries?
I wouldn't have a clue, to be honest. In my non-coder's and possibly
naive imagination, I'd reckon that a "one article, one checksum"
principle should work for most pages. As regards Wikidata et alia,
well, I dunno -- counting templates (and possibly single data sets;
but I don't really know a lot about what you're building/have built
there) as articles may work as well. Then again, just having (only)
ordinary articles intelligently cached as per the above proposal
might solve the biggest part of our problem.
Yes, I think that it could be done. But, and I say that as someone who
started programming with "spaghetti code", it looks like a mess to me.
A dependency nightmare. We are already suffering from such effects
(think categories in templates) without wikidata to look out for.
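To see why this turns into a dependency nightmare: once templates and
wikidata queries nest, a page's effective checksum would have to fold
in the checksums of everything it transcludes, recursively. A toy
illustration — the page names, the `deps` map, and the helper are all
made up for the sake of the example:

```python
import zlib

# Hypothetical dependency map: page -> things it transcludes/queries.
deps = {
    "Species_article": ["Template:SpeciesList"],
    "Template:SpeciesList": ["wikidata:specieslist"],
    "wikidata:specieslist": [],
}
content = {
    "Species_article": "Some prose plus {{SpeciesList}}",
    "Template:SpeciesList": "a query against the species list",
    "wikidata:specieslist": "Foo, Bar",
}

def effective_checksum(page):
    """Checksum of a page folded together with those of all its dependencies."""
    s = zlib.crc32(content[page].encode())
    for dep in deps[page]:
        # fold in each dependency's (recursive) checksum
        s = zlib.crc32(effective_checksum(dep).to_bytes(4, "big"), s)
    return s
```

Adding one species at wikidata changes the effective checksum of
every article that transcludes the template — which is exactly the
bookkeeping (and the extra queries) that makes this messy.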
Also, you will have to query the DB server and wait for its answer on
*every* page view, including cached pages and anonymous views, to
deliver the checksum(s).
True.
I'm really not trying to deliberately complicate things, but "we"
could also make the DB servers report all checksums to a group of
separate, dedicated checksum cache servers (which would replicate
between each other like there's no tomorrow, to make damn sure that
all checksum cache servers ALWAYS have the identical set of
checksums). One of these redundant checksum servers would then be
queried (which should spread the load), and if there's so much as a
delay with one of them, the checksum query could fail over (round
robin) to the next checksum cache box. There should prolly also
remain a last-resort fail-over option of querying the DB server
direct and doing away with the entire checksum thing (which, after
all, is only there to save time and shield the DB server from
excessive queries).
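The fail-over part of that idea is simple enough to sketch. Here the
checksum servers are modelled as plain lookup functions, and the
`ChecksumUnavailable` exception and `make_server` helper are
inventions purely for illustration:

```python
class ChecksumUnavailable(Exception):
    """Raised when a checksum cache box can't answer (down or slow)."""
    pass

def make_server(sums, up=True):
    """Build a toy checksum server backed by a dict; `up=False` simulates an outage."""
    def lookup(title):
        if not up:
            raise ChecksumUnavailable(title)
        return sums[title]
    return lookup

def get_checksum(title, servers, db_lookup):
    """Try each checksum box in turn; fall back to the DB server as last resort."""
    for server in servers:
        try:
            return server(title)
        except ChecksumUnavailable:
            continue            # fail over to the next checksum box
    return db_lookup(title)     # last resort: ask the DB server direct
```

A real deployment would of course also need the replication between
the checksum boxes, which this sketch leaves out entirely.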
And this will work only with the most rudimentary database structure,
like "SELECT * from specieslist where name='Foo'". If wikidata is to
become more complex than that (and I don't say it should, just
speculating), if wikidata tables can be interlinked, then there will
be no "simple" dependency on a single data entry anymore.
Does that make sense to people?
Or am I reinventing the wheel or something?
I'm just brainstorming, I'm not even a real programmer. (Translation:
The above may be--or may not be--rubbish.)
Definitely not rubbish. But a lot more complicated than it looks at
first glance, IMHO.
Yeah, that's probably true. All the worse since I won't be the one
coding it, because I, uh, lack the requisite programming skills :-/
--ropers
Magnus
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l