Hi guys,
I have my own idea for a GSoC project that I'd like to share with you. It's not a perfect one, so please forgive any mistakes.
The project is related to the existing GSoC project "*Incremental Data Dumps*", but is in no way a replacement for it.
*Offline Wikipedia*
For a long time, a lot of offline solutions for Wikipedia have sprung up on the internet. All of these have been unofficial solutions, and have limitations. A major problem is the *increasing size of the data dumps*, and the problem of *updating the local content*.
Consider the situation in a place where internet is costly or unavailable. (For the purpose of discussion, let's consider a school in a third-world country.) Internet speeds are extremely slow, and accessing Wikipedia directly from the web is out of the question. Such a school would greatly benefit from an instance of Wikipedia on a local server. Up to here, the school can use any of the freely available offline Wikipedia solutions to make a local instance. The problem arises when the database in the local instance becomes obsolete. The client is then required to download an entire new dump (approx. 10 GB in size) and load it into the database. Another problem is that most third-party programs *do not allow network access*, and a new instance of the database (approx. 40 GB) is required on each installation. For instance, in a school with around 50 desktops, each desktop would require a 40 GB database. Plus, *updating* them becomes even more difficult.
So here's my *idea*: modify the existing MediaWiki software and add a few PHP/Python scripts which will automatically update the database and run in the background. (Details on how the update is done are described later.) Initially, the (modified) MediaWiki will take an XML dump or SQL dump (SQL dump preferred) as input and will create the local instance of Wikipedia. Later on, the updates will be added to the database automatically by the script.
The installation process is extremely easy: it just requires a server package like XAMPP and the MediaWiki bundle.
Process of updating:
There will be two methods of updating the server. Both will be implemented in the MediaWiki bundle. Method 2 requires the functionality of incremental data dumps, so it can be completed only after that functionality is available. Perhaps I can collaborate with the student selected for incremental data dumps.
Method 1 (online update): A list of all pages is made and published by Wikipedia. This can be in an XML format. The only information in the XML file will be the page IDs and the last-touched date. This file will be downloaded by the MediaWiki bundle, and the page IDs will be compared with the pages of the existing local database.
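A hypothetical shape for such a page list (element and attribute names are made up for illustration; no such published dump currently exists):

```xml
<pagelist generated="2013-04-26T20:00:00Z" wiki="enwiki">
  <page id="1234567" touched="20130407202126"/>
  <page id="1234568" touched="20130102080101"/>
  <!-- ... one entry per page ... -->
</pagelist>
```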
Case 1: A new page ID in the XML file denotes a new page added.
Case 2: A page which is present in the local database but not among the page IDs denotes a deleted page.
Case 3: A page in the local database with a 'last touched' different from the one in the XML file denotes an edited page.
In each case, the change is made in the local database, and if the new page data is required, it is obtained using the MediaWiki API. These offline instances of Wikipedia will only be used in cases where internet speeds are very low, so they *won't cause much load on the servers*.
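The comparison step can be sketched in a few lines (a minimal illustration; the function and variable names are hypothetical, and a real implementation would read the published XML file and the local MediaWiki tables):

```python
# Toy sketch of the Method 1 comparison step. remote/local are dicts
# mapping page_id -> last-touched timestamp.

def classify_changes(remote, local):
    """Return (new_ids, deleted_ids, edited_ids) per the three cases."""
    new = [pid for pid in remote if pid not in local]
    deleted = [pid for pid in local if pid not in remote]
    edited = [pid for pid in remote
              if pid in local and remote[pid] != local[pid]]
    return new, deleted, edited

remote = {1: "20130407202126", 2: "20130401000000", 4: "20130410090000"}
local  = {1: "20130407202126", 2: "20130320000000", 3: "20130101000000"}
new, deleted, edited = classify_changes(remote, local)
print(new, deleted, edited)   # [4] [3] [2]
```

Page 4 is new, page 3 was deleted, and page 2 was edited; only those pages would then need API fetches.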
Method 2 (offline update): (Requires the functionality of the existing project "Incremental Data Dumps".) In this case, the incremental data dumps are downloaded by the user (admin) and fed to the MediaWiki installation the same way the original dump is fed (as a normal file), and the corresponding changes are made by the bundle. Since I'm not aware of the XML format used in incremental updates, I cannot describe it now.
Advantages: An offline solution can be provided for regions where internet access is a scarce resource. This would greatly benefit developing nations, and would help in making the world's information more freely and openly available to everyone.
All comments are welcome!
PS, about me: I'm a second-year undergraduate student at the Indian Institute of Technology, Patna. I code for fun. Languages: C/C++, Python, PHP, etc. Hobbies: CUDA programming, robotics, etc.
I think this is a well-thought-out idea. I'm just going to add a few comments on Method 1:
* Wikimedia provides page.sql.gz dumps (e.g. http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-page.sql.gz). This table does have page_id and page_touched (the latter seems to correspond to your "last touched"). The file is hefty at 935 MB, because it has other columns, like page_title. However, I think with 11 million+ pages, you're probably not going to do much better than 100 MB (using 28 characters per entry, like "(1234567,'20130407202126'),", and a 30% zip ratio).
* Synchronizing latest versions will still be time-consuming. I'd guesstimate that there are something like 50k changed articles per month. I'm basing this number on http://stats.wikimedia.org/EN/TablesWikipediaEN.htm which lists 800 new articles per day. I then threw in another 800 unique page edits per day and multiplied by 30 to get a ballpark 50k. This corresponds to a monthly churn of 1%-2% of the entire article namespace (4.1 million), which I think is a conservative percentage.
So, assuming this number is somewhat accurate, 50,000 API calls would not be trivial -- especially for a user with limited internet connectivity. This is to say nothing of Wikimedia's servers, which would need to handle 50k calls per client at that time of month. In short, I think synchronizing that many pages would best be served by its own dump.
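For what it's worth, the arithmetic behind these estimates can be checked in a few lines (all inputs are the figures quoted in this thread, not measured values):

```python
# Back-of-envelope check of the size and churn estimates above.

PAGES = 11_000_000            # rough enwiki page count quoted above
BYTES_PER_ENTRY = 28          # e.g. "(1234567,'20130407202126'),"
ZIP_RATIO = 0.30              # assumed compression ratio

raw_mb = PAGES * BYTES_PER_ENTRY / 1_000_000
compressed_mb = raw_mb * ZIP_RATIO
print(f"index file: ~{raw_mb:.0f} MB raw, ~{compressed_mb:.0f} MB zipped")

NEW_PER_DAY = 800             # new articles/day (stats.wikimedia.org)
EDITED_PER_DAY = 800          # assumed unique edited pages/day
changed_per_month = (NEW_PER_DAY + EDITED_PER_DAY) * 30
churn = changed_per_month / 4_100_000   # article namespace size
print(f"~{changed_per_month:,} changed pages/month ({churn:.1%} churn)")
```

This lands at roughly 90 MB zipped and 48,000 changed pages per month, consistent with the ~100 MB and ~50k figures above.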
Also, there may be some months where this percentage is much higher. For example, when Wikipedia switched its links over to Wikidata, I assume that at least 50% of the pages were touched. Granted, this is not a common occurrence, but as bot activity rises (Wikidata properties for infoboxes?), this will complicate the sync accordingly.
Hope this helps and good luck with your project.
On Fri, Apr 26, 2013 at 4:27 PM, Kiran Mathew Koshy <kiranmathewkoshy@gmail.com> wrote:
-- Kiran Mathew Koshy
Electrical Engineering, IIT Patna

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 04/26/2013 04:27 PM, Kiran Mathew Koshy wrote:
Thanks for your ideas, Kiran! So, a few comments:
* In the future, please use a more descriptive email subject line. As you can see in http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/ there's a lot of mail on this list, especially mail about Google Summer of Code proposals. A subject line like "GSoC proposal: supplementing incremental data dumps with indexes" or something like that helps people decide to read it.
* You probably want to look at some statistics and research to make sure you are solving the right problem, and see how educators and school pupils actually interact with Wikipedia in low-connectivity environments. Check out the presentation "A Snapshot of Open Source in West Africa" http://opensourcebridge.org/sessions/884 -- some people burn DVDs regularly and drive them around to local schools to be copied onto lab computers, for instance. https://meta.wikimedia.org/wiki/Research:Data and the archives of the offline and data dumps discussion lists will be useful: https://lists.wikimedia.org/mailman/listinfo/offline-l and https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
* Thank you for aiming to work on improving Wikimedia access in places with bad net access. We care about this, a lot. I hope you're able to help out with this issue, whether it's under GSoC or not!
Thanks, Sumana
On 04/26/2013 01:27 PM, Kiran Mathew Koshy wrote:
There is not much time left and we would still need to find mentors for this proposal. Still, you are encouraged to follow the process and submit your proposal as described at
https://www.mediawiki.org/wiki/Mentorship_programs/Application_template
Good luck!
PS: same advice as we have given to other students: please be more specific in the subject of your emails to mailing lists.
Dear Kiran
Before commenting on your proposal, let me thank:
* Quim for having renamed this thread... I wouldn't have got a chance to read it otherwise.
* Gnosygnu and Sumana for their previous answers.
Your email points to three problems: (1) the size of the offline dumps, (2) server mode for the offline solution, (3) the need for incremental updates.
Regarding (1), I disagree. We have the ZIM format, which is open, has an extremely efficient standard implementation, provides high compression rates and fast random access: http://www.openzim.org
Regarding (2), Kiwix, which is a ZIM reader, already does it: you can either share Kiwix on a network disk or use Kiwix HTTP compatible daemon called kiwix-serve: http://www.kiwix.org/wiki/Kiwix-serve
Regarding (3), I agree. This is an old feature request in the openZIM project. It's both on the roadmap and in the bug tracker:
* http://www.openzim.org/wiki/Roadmap
* https://bugzilla.wikimedia.org/show_bug.cgi?id=47406
But I also think the solution you propose isn't adapted to the problem. Setting up MediaWiki is not easy, it's resource-intensive, and you don't need all that power (of the software stack) for the intended usage.
On the other hand, with ZIM you have a format which provides everything you need, runs on devices which cost only a few dozen USD, and we will make incremental updates trivial for the end user (it's just a matter of time ;).
So to fix that problem, here is my approach: we should implement two tools, which I call "zimdiff" and "zimpatch":
* zimdiff is a tool able to compute the difference between two ZIM files
* zimpatch is a tool able to patch a ZIM file with a ZIM diff file
The update process would be:
* Compute a ZIM diff file (done by the ZIM provider)
* Download and patch the "old" ZIM file with the ZIM diff file (done by the user)
We could implement two modes for zimpatch, "lazy" and "normal":
* lazy mode: simple merge of the files and rewriting of the index (fast, but needs a lot of mass storage)
* normal mode: recompute a new file (slow, but needs less mass storage)
Regarding the ZIM diff file format... the discussion is open, but it looks like we could simply reuse the ZIM format, and zimpatch would work like a "zimmerge" (which does not exist; the name is just for explanation).
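To make the diff/patch contract concrete, here is a toy sketch that models a ZIM file abstractly as a mapping from URL to content. The real ZIM layout (clusters, directory entries, indexes) is far more involved, and zimdiff/zimpatch do not exist yet, so everything below is illustrative:

```python
# Toy model: a "ZIM file" is just a dict url -> content. This only
# illustrates the zimdiff/zimpatch round-trip contract, not the format.

def zimdiff(old, new):
    """Compute a diff: added/changed entries plus a deletion list."""
    upserts = {u: c for u, c in new.items() if old.get(u) != c}
    deletes = [u for u in old if u not in new]
    return {"upserts": upserts, "deletes": deletes}

def zimpatch(old, diff):
    """Apply a diff ("normal" mode: build a fresh mapping rather than
    editing the old one in place)."""
    patched = {u: c for u, c in old.items() if u not in diff["deletes"]}
    patched.update(diff["upserts"])
    return patched

old = {"A/Foo": "v1", "A/Bar": "v1", "A/Baz": "v1"}
new = {"A/Foo": "v2", "A/Baz": "v1", "A/Qux": "v1"}
diff = zimdiff(old, new)
assert zimpatch(old, diff) == new   # round-trip property
```

A lazy mode would instead append the diff's entries to the old file and rewrite only the index, trading storage for speed.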
Everything could be done, IMO, in "only" a few hundred smart lines of C++. I would be really surprised if this needs more than 2,000 lines. But to do that, we need a pretty talented C++ developer -- maybe you?
If you or someone else is interested, we would probably be able to find a tutor.
Kind regards,
Emmanuel
PS: Wikimedia has an offline centric mailing list, let me add it in CC: https://lists.wikimedia.org/mailman/listinfo/offline-l
On 26/04/2013 22:27, Kiran Mathew Koshy wrote:
First of all, let me thank everyone who has commented on this thread. Sorry about not responding earlier; my exams are going on. You can certainly expect more responses from me once they are over.
On Tue, Apr 30, 2013 at 4:18 AM, Emmanuel Engelhart <emmanuel@engelhart.org> wrote:
Dear Kiran
Before commenting on your proposal, let me thank:
- Quim for having renamed this thread... I wouldn't have got a chance to read it otherwise.
- Gnosygnu and Sumana for their previous answers.
Your email points to three problems: (1) the size of the offline dumps, (2) server mode for the offline solution, (3) the need for incremental updates.
Regarding (1), I disagree. We have the ZIM format, which is open, has an extremely efficient standard implementation, provides high compression rates and fast random access: http://www.openzim.org
Regarding (2), Kiwix, which is a ZIM reader, already does it: you can either share Kiwix on a network disk or use Kiwix HTTP compatible daemon called kiwix-serve: http://www.kiwix.org/wiki/Kiwix-serve
Regarding (3), I agree. This is an old feature request in the openZIM project. It's both on the roadmap and in the bug tracker:
But I also think the solution you propose isn't adapted to the problem. Setting up MediaWiki is not easy, it's resource-intensive, and you don't need all that power (of the software stack) for the intended usage.
On the other hand, with ZIM you have a format which provides everything you need, runs on devices which cost only a few dozen USD, and we will make incremental updates trivial for the end user (it's just a matter of time ;).
I don't think power is much of a priority, but I agree the ZIM format would be easier, since it reads directly from the ZIM file.
So to fix that problem, here is my approach: we should implement two tools, which I call "zimdiff" and "zimpatch":
- zimdiff is a tool able to compute the difference between two ZIM files
- zimpatch is a tool able to patch a ZIM file with a ZIM diff file
The update process would be:
- Compute a ZIM diff file (done by the ZIM provider)
- Download and patch the "old" ZIM file with the ZIM diff file (done by the user)
We could implement two modes for zimpatch, "lazy" and "normal":
- lazy mode: simple merge of the files and rewriting of the index (fast, but needs a lot of mass storage)
- normal mode: recompute a new file (slow, but needs less mass storage)
Regarding the ZIM diff file format... the discussion is open, but it looks like we could simply reuse the ZIM format, and zimpatch would work like a "zimmerge" (which does not exist; the name is just for explanation).
Everything could be done, IMO, in "only" a few hundred smart lines of C++. I would be really surprised if this needs more than 2,000 lines. But to do that, we need a pretty talented C++ developer -- maybe you?
Yes, this is quite an easy task. I can do this. I can go through the ZIM format and the zimlib library in a few days.
Regarding *zimpatch*, I think it would be better to implement both modes (although I prefer the second one). The user can then select the one he wants, depending on his configuration. Lastly, we can add *zimdiff* as an automated task on the server; zimpatch and downloading the ZIM file can also be automated and added to Kiwix.
If there's time left, I can port the zimlib library to Python or PHP, so it becomes easier for people to hack on.
If you have any more suggestions, please comment. I'll submit the proposal in ~12 hours. (Again, exams.)
Thank you for this reply, Emmanuel. GSoC / OPW candidates learn a lot from emails like this! (and the rest of us too)
On 04/29/2013 03:48 PM, Emmanuel Engelhart wrote:
let me thank:
- Quim for having renamed this thread... I wouldn't have got a chance to read it otherwise.
Then please make sure "Offline Wikipedia" is in the subject of your replies. ;)
If you or someone else is interested, we would probably be able to find a tutor.
Please find one. Or even better, two co-mentors, as we are requesting at http://lists.wikimedia.org/pipermail/wikitech-l/2013-April/068873.html
There is not much time left.
PS: Wikimedia has an offline centric mailing list, let me add it in CC: https://lists.wikimedia.org/mailman/listinfo/offline-l
Cross-posting to the offline list since this is urgent.