On Tue, Dec 18, 2012 at 11:10 PM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
The linkTitle foreach causes 18 more API calls to start getting the links, all with plcontinue, before it yields even a single link.
Yeah, it's certainly possible there will be many plcontinue calls just to get the first link. But that doesn't mean you have to get all plcontinues when you want only some links.
On Tue, Dec 18, 2012 at 11:16 PM, Yuri Astrakhan yuriastrakhan@gmail.com wrote:
Petr, when you say you have two nested foreach(), the outer foreach does not iterate through the blocks, it iterates through pages. Which means you still must iterate through every plcontinue in the set before issuing the next gapcontinue.
It doesn't mean that. For example, in the extreme case where you don't want to know any links from this page (say, because you want to filter the articles in a way that cannot be expressed directly by the API), you don't have to use plcontinue for this page at all.
A specific example might be changing your code into (yeah, built specifically to make my point):
foreach (var page in source.Where(p => p.Info.title.Contains("\u2014")).Take(2000))
In this case, the link for "—All You Zombies—" will be retrieved by the first call, so no plcontinue is needed. The link for "—And He Built a Crooked House—" will be retrieved using one plcontinue call. There are no more articles with that character in the title in the first page of results, so no more plcontinue calls are necessary, and gapcontinue can be used right away. The second page of results contains no articles with that character at all, so gapcontinue will be used again without any plcontinue calls.
With your “dumb query-continue”, doing this would require many more calls.
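To make the call counts concrete, here is a rough simulation of the behaviour described above. It is only a sketch: FakeApi, PageData, LoadedBlock and Demo are hypothetical names, the data and the limit of 3 links per response are made up, and nothing here talks to the real API or reproduces the actual library code. The one assumption that matters is that each response says at which page the link continuation resumes, so the client can tell when a page's links are complete.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

sealed class PageData
{
    public readonly string Title;
    public readonly string[] Links;
    public PageData(string title, params string[] links) { Title = title; Links = links; }
}

// Hypothetical in-memory stand-in for the API; it only counts requests.
sealed class FakeApi
{
    private readonly PageData[][] blocks;
    private readonly int linkLimit;   // assumed maximum number of links per response
    public int GapCalls, PlCalls;

    public FakeApi(int linkLimit, params PageData[][] blocks)
    {
        this.linkLimit = linkLimit;
        this.blocks = blocks;
    }

    public int BlockCount => blocks.Length;

    // Titles arrive with the generator response, so this is not counted as a call.
    public string[] Titles(int block) => blocks[block].Select(p => p.Title).ToArray();

    // Chunk 0 rides along with the generator (gapcontinue) call; every later chunk
    // costs one plcontinue-style call. nextPage is the index of the page where the
    // continuation resumes, or null when the block has no more links.
    public List<(int Page, string Link)> Fetch(int block, int chunk, out int? nextPage)
    {
        if (chunk == 0) GapCalls++; else PlCalls++;
        var all = blocks[block]
            .SelectMany((p, i) => p.Links.Select(l => (Page: i, Link: l)))
            .ToList();
        int end = (chunk + 1) * linkLimit;
        nextPage = end < all.Count ? all[end].Page : (int?)null;
        return all.Skip(chunk * linkLimit).Take(linkLimit).ToList();
    }
}

// One block of pages plus whatever link chunks have been requested so far.
sealed class LoadedBlock
{
    private readonly FakeApi api;
    private readonly int block;
    private readonly List<(int Page, string Link)> received;
    private int? next;
    private int chunks = 1;

    public LoadedBlock(FakeApi api, int block)
    {
        this.api = api;
        this.block = block;
        received = api.Fetch(block, 0, out next);
    }

    // Lazy: another chunk is requested only while this page may have unseen links.
    public IEnumerable<string> LinksOf(int page)
    {
        for (int i = 0; ; i++)
        {
            while (i >= received.Count && next != null && next <= page)
                received.AddRange(api.Fetch(block, chunks++, out next));
            if (i >= received.Count) yield break;
            if (received[i].Page == page) yield return received[i].Link;
        }
    }
}

static class Demo
{
    static IEnumerable<(string Title, IEnumerable<string> Links)> Pages(FakeApi api)
    {
        for (int b = 0; b < api.BlockCount; b++)
        {
            var loaded = new LoadedBlock(api, b);   // one generator call per block
            var titles = api.Titles(b);
            for (int p = 0; p < titles.Length; p++)
                yield return (titles[p], loaded.LinksOf(p));
        }
    }

    // Enumerates the links of every page whose title passes the filter.
    public static (int Gap, int Pl, int Links) Run(Func<string, bool> filter)
    {
        var api = new FakeApi(3,
            new[] {
                new PageData("!!!", "a", "b"),
                new PageData("\u2014All You Zombies\u2014", "c"),
                new PageData("\u2014And He Built a Crooked House\u2014", "d"),
                new PageData("0", "e", "f", "g", "h", "i"),
            },
            new[] {
                new PageData("A", "j", "k", "l", "m"),
                new PageData("B", "n", "o", "p", "q"),
            });
        int links = 0;
        foreach (var page in Pages(api).Where(p => filter(p.Title)))
            links += page.Links.Count();   // counting forces the lazy link requests
        return (api.GapCalls, api.PlCalls, links);
    }

    public static void Main()
    {
        var f = Run(t => t.Contains("\u2014"));
        Console.WriteLine($"filtered: gap={f.Gap} pl={f.Pl} links={f.Links}"); // gap=2 pl=1 links=2
        var a = Run(t => true);
        Console.WriteLine($"all:      gap={a.Gap} pl={a.Pl} links={a.Links}"); // gap=2 pl=4 links=17
    }
}
```

With the filter, the simulated client makes two generator calls and a single plcontinue call; reading every link of every page on the same data takes four plcontinue calls. The exact numbers depend entirely on the made-up data and limit, but the shape of the difference is the point.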
Petr Onderka [[en:User:Svick]]