Jeremy Dunck wrote:
On 9/10/05, Timwi <timwi@gmx.net> wrote:
Do we still query the database for all these links to decide whether they exist and (depending on user settings) whether they are stubs? As I recall, someone once had an idea to write a simple, light-weight daemon for this that uses a dictionary data structure (I presume they were thinking of a DAWG (directed acyclic word graph)) for this.
I'm not familiar with DAWG, but why wouldn't a nested sparse hash work? link[from-id][to-title] = 0 if no link, 1 if link.
Well, firstly, we're not interested in links between pages here, but merely in whether a page exists and, if so, whether it is a "stub".
The answer to your question is that a hash table uses much more memory. The page you linked to says it has "2 bits per entry overhead", so the entire hash table would be at least the size of the list of all article titles plus two bits per article. A DAWG, by contrast, is usually much smaller than the plain list of all its entries, because common prefixes and suffixes are shared between words.
The downside, of course, is that in a hash you can store an arbitrary value with each key, while a DAWG is just a binary (yes/no) decider.
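To make the yes/no-decider point concrete, here is a minimal sketch (not the proposed daemon) of a trie-based membership test in Python. A trie shares common prefixes between titles; a real DAWG would additionally merge nodes with identical suffixes, which is what makes it smaller than the plain list of entries:

```python
# Minimal trie supporting only membership queries (yes/no), like a DAWG.
# A full DAWG would also merge subtrees with identical suffixes to save
# further memory; this sketch only shares prefixes.

class Trie:
    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True  # end-of-word marker

    def __contains__(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node

# Hypothetical sample titles, just for illustration.
titles = Trie()
for t in ["Main Page", "MediaWiki", "MediaWiki talk"]:
    titles.add(t)

print("MediaWiki" in titles)  # True
print("Media" in titles)      # False -- a mere prefix is not a title
```

Note that, as discussed, the structure answers only "does this title exist?"; attaching extra data (such as a stub flag) to each entry would give up part of the memory advantage over a hash table.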
Timwi