As the Zero team starts to think about pre-loaded content, the question of how search will function in an offline environment has come up. While it will be up to each individual community to decide the size of the pre-loaded collection, we should assume these collections will be longer than a user would want to scroll through given only our article title search.
Thus I'm eager to get a discussion going about how we would support the following user story:
"As a user who has a Wikipedia pre-loaded device with little or no internet connectivity, I would like to search by article text, so that I can find multiple articles that could be relevant to me"
Given this, an article title search is not good enough.
* What would we have to change about our underlying data storage architecture to do full-text search?
* How fast would it be?
* Would it scale to 100s/1000s/etc. of articles?
* What would the user experience look like?
...
* ... other bits I haven't thought about?
This is not yet a discussion about a resourced feature, but I'd like to get some engineering thoughts on it before we get there.
--tomasz
I thought this had been solved for years by applications like Kiwix and the ZIM format?
Rupert
On Sep 30, 2014 11:27 PM, "Tomasz Finc" tfinc@wikimedia.org wrote: [...]
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
Looking at the Kiwix app for Android, it doesn't seem to have full-text search within articles (unless I missed it). I assume that this was left out of the Android version for performance reasons. So, this is precisely the problem that would need to be solved...
-Dmitry
On Tue, Sep 30, 2014 at 3:05 PM, rupert THURNER rupert.thurner@gmail.com wrote:
I thought this is something solved for years now with applications like kiwix and Zim format?
Dmitry Brant, 01/10/2014 01:07:
Looking at the Kiwix app for Android, it doesn't seem to have full-text search within articles (unless I missed it). I assume that this was left out of the Android version for performance reasons.
Don't assume, ask Emmanuel (cc Offline-l). If I understand correctly, you're talking about small selections of articles (hundreds or thousands). Making an index for tens of GB of text takes hours, so Kiwix doesn't always make one (on desktop, you usually download the pre-made index). But for a few articles the problem is easier (and one can also reduce compression). In general, I have no idea what it means that the "zero team starts to think about pre-loaded content": it sounds a lot like reinventing Kiwix, which would be a disastrous idea. :)
Nemo
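For the small-selection case mentioned above (hundreds or thousands of articles), a rough sketch of what a full-text index does might look like the following. It is an in-memory inverted index in Python, assuming the article text is already available as plain strings. The sample articles and the function names are invented for illustration, and this is not how Kiwix or the Wikipedia apps implement search.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each token to the set of article titles whose text contains it."""
    index = defaultdict(set)
    for title, body in articles.items():
        for token in tokenize(body):
            index[token].add(title)
    return index


def search(index, query):
    """Return titles containing every query token (AND semantics), sorted."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Use .get so unknown tokens do not add empty entries to the defaultdict.
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


# Hypothetical sample collection, invented for illustration only.
articles = {
    "Malaria": "Malaria is a mosquito-borne infectious disease.",
    "Mosquito": "Mosquitoes are small flies; some species transmit disease.",
    "Bicycle": "A bicycle is a human-powered vehicle with two wheels.",
}
index = build_index(articles)
print(search(index, "disease"))
print(search(index, "mosquito disease"))
```

Note that the second query matches only "Malaria", because this sketch does no stemming ("mosquitoes" and "mosquito" are different tokens). A real offline index would also need to live on disk rather than in memory, which is where the build time and storage questions raised in this thread come in.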