Sorry, hit send too fast:

Petr, when you say you have two nested foreach() loops, the outer foreach does not iterate through the blocks; it iterates through pages. That means you still must iterate through every plcontinue in the set before issuing the next gapcontinue. In other words, your library does exactly that: a simple iteration. You don't skip blocks of results midway, and your lib would benefit from the change. (All this assumes I understood your code correctly.)


On Tue, Dec 18, 2012 at 5:10 PM, Yuri Astrakhan <yuriastrakhan@gmail.com> wrote:
Petr, I played with your library a bit. It has some interesting and creative pieces and uses some cool tech (love Roslyn). It might need a bit of love and polishing, as I think the syntax is too verbose, but that's irrelevant here.

This is your code to list link titles from all non-redirect pages in a wiki.

var source = wiki.Query.allpages()
    .Where(p => p.filterredir == allpagesfilterredir.nonredirects)
    .Pages
    .Select(p => PageResult.Create(p.info, p.links().Select(l => l.title).ToEnumerable()));

foreach (var page in source.Take(2000))            // the first 2000 pages
    foreach (var linkTitle in page.Data.Take(1))   // first 1 link from each page
        Console.WriteLine(linkTitle);

The "page" foreach starts by getting
http://en.wikipedia.org/w/api.php: action=query & meta=siteinfo & siprop=namespaces

The linkTitle foreach causes 18 more api calls to start getting the links, all with plcontinue, before it yields even a single link.

And the reason for it, as Brad correctly noted, is that links are sorted in a different order from titles. At this point you are halfway through the current block, you have made 19 fairly expensive api calls, and if (and that's a big if) you decide to continue with the next gapcontinue based on the first link you get, you still need to follow each "plcontinue" so that you don't miss any pages.

The only thing you can really do with minimal calls is get a block of data, take a RANDOM page with links on it, check the first link, and decide to go on to the next block. I see absolutely no sense in this use.

In short - there is no way you can say "next page" until you iterate through every plcontinue in the current set. EXCEPT! if you go one page at a time (gaplimit=1) - in which case you can safely skip to the next gapcontinue. But this is exactly what I am trying to avoid, because then using the generator gives no benefit whatsoever. I even suspect that it costs much more, because running the generator, even with limit=1, is more expensive than just querying one specific page's info and filling it out.
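
To make that concrete, here is a minimal sketch with a fake in-memory "API". Everything in it is made up for illustration (the Wiki data, Request, AllPagesWithLinks, the integer offset standing in for plcontinue); it is not LINQ-to-Wiki and not the real api.php. The only assumptions carried over from above are that the generator orders pages by title while the link rows come back in a different order (page id here).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ContinuationSketch
{
    // Hypothetical wiki contents: page id, title, outgoing link titles.
    static readonly (int Id, string Title, string[] Links)[] Wiki =
    {
        (30, "A", new[] { "x", "y" }),
        (10, "B", new[] { "z" }),
        (20, "C", new[] { "u", "v", "w" }),
        (5,  "D", new[] { "q" }),
    };

    public static int Requests;   // number of simulated api calls

    // One simulated call: link rows for the pages in the batch, ordered by
    // page id (not by title), at most plLimit rows starting at offset.
    // Returns the next offset, or null when the batch is exhausted.
    static (List<(string Page, string Link)> Rows, int? Next) Request(
        List<(int Id, string Title, string[] Links)> batch, int offset, int plLimit)
    {
        Requests++;
        var all = batch.OrderBy(p => p.Id)
            .SelectMany(p => p.Links.Select(l => (Page: p.Title, Link: l)))
            .ToList();
        var rows = all.Skip(offset).Take(plLimit).ToList();
        int next = offset + rows.Count;
        return (rows, next < all.Count ? next : (int?)null);
    }

    // Yields pages in title order. A page's links are only known to be
    // complete once every "plcontinue" of its batch has been followed.
    public static IEnumerable<(string Title, List<string> Links)> AllPagesWithLinks(
        int gapLimit, int plLimit)
    {
        var byTitle = Wiki.OrderBy(p => p.Title, StringComparer.Ordinal).ToList();
        for (int gap = 0; gap < byTitle.Count; gap += gapLimit)   // one "gapcontinue" step
        {
            var batch = byTitle.Skip(gap).Take(gapLimit).ToList();
            var links = batch.ToDictionary(p => p.Title, _ => new List<string>());
            int? pl = 0;
            while (pl != null)                                    // follow every "plcontinue"
            {
                var (rows, next) = Request(batch, pl.Value, plLimit);
                foreach (var (page, link) in rows)
                    links[page].Add(link);
                pl = next;
            }
            foreach (var p in batch)
                yield return (p.Title, links[p.Title]);
        }
    }

    static void Main()
    {
        Requests = 0;
        var first = AllPagesWithLinks(gapLimit: 3, plLimit: 2).First();
        var joined = string.Join(", ", first.Links);
        Console.WriteLine($"{first.Title}: {joined} after {Requests} request(s)");
        // prints "A: x, y after 3 request(s)"

        Requests = 0;
        first = AllPagesWithLinks(gapLimit: 1, plLimit: 2).First();
        Console.WriteLine($"gaplimit=1: {Requests} request(s) for page {first.Title}");
        // prints "gaplimit=1: 1 request(s) for page A"
    }
}
```

In this toy data page "A" sorts first by title but its link rows arrive last, so nothing can be said about it until the whole batch is drained. With gapLimit 1 the skip becomes safe, but every page then costs its own generator step.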


On Tue, Dec 18, 2012 at 3:59 PM, Petr Onderka <gsvick@gmail.com> wrote:
Well, I can't tell you any use cases from my library users, because
there aren't any
(like I said, I didn't actually publicize it yet).

And my library would solve most of those cases the way I explained before:
IEnumerable inside IEnumerable (the exact shape depends on the user).

In the case of more than one prop being used,
it always continues all props, even if the user iterates only one of them.

Petr Onderka
[[en:User:Svick]]

On Tue, Dec 18, 2012 at 9:00 PM, Yuri Astrakhan <yuriastrakhan@gmail.com> wrote:
> Petr, thanks, I will look closely at your library and post my thoughts.
>
> Could you take look at
> http://www.mediawiki.org/wiki/Manual:Pywikipediabot/Recipes and see how your
> library would solve these? Also, if you can think of other common use cases
> from your library users (not your library internals, as it is just an
> intermediary), please post them too. I posted cases I saw in interwiki &
> casechecker bots.
>
> Thanks!
>
>
> On Tue, Dec 18, 2012 at 2:25 PM, Petr Onderka <gsvick@gmail.com> wrote:
>>
>> On Tue, Dec 18, 2012 at 5:39 PM, Yuri Astrakhan <yuriastrakhan@gmail.com>
>> wrote:
>> > Same goes for iterating through a collection - none of the programming
>> > languages offering IEnumerable have stream control functionality - too
>> > complicated without clear benefits.
>>
>> Actually in my C# library [1] (I plan to publicize it more later)
>> a query like generator=allpages&prop=links might result in something
>> like IEnumerable<IEnumerable<Link>> [2].
>> And iterating the outer IEnumerable corresponds to iterating gapcontinue,
>> while iterating the inner IEnumerable corresponds to plcontinue
>> (of course it's not that simple, since I'm not using limit=1, but I
>> hope you get the idea).
>>
>> And while this means some more work for the library writer (in this case,
>> me)
>> than your alternative, it also means the user has more control over
>> what exactly is retrieved.
>>
>> Petr Onderka
>> [[en:User:Svick]]
>>
>> [1] https://github.com/svick/LINQ-to-Wiki/
>> [2] Or, more realistically, IEnumerable<Tuple<Page, IEnumerable<Link>>>,
>> but I didn't want to complicate it with even more generics.
>>
>> _______________________________________________
>> Mediawiki-api mailing list
>> Mediawiki-api@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/mediawiki-api