We have reached a milestone in the project: the zimdiff and zimpatch tools
have been completed. Everyone is invited to test the code and post their
comments.
Bugzilla page: https://bugzilla.wikimedia.org/show_bug.cgi?id=47406
The links to the source code for both tools are given on the Bugzilla
page.
--
Kiran Mathew Koshy
Electrical Engineering,
IIT Patna,
Patna,
India.
v0.7.0 is a general rollup release. It also marks the end of the alpha
series of releases: I've received enough feedback to state that xowa is
stable and usable.
These are the major changes since the last announcement (v0.6.0):
* Sqlite database support (command-line only)
* Dynamic search suggestions
* Missing pages highlighted in red
* Import/Script redesigned to be more intuitive
* Toolbar added for editing pages
* it.wikisource.org supported
The files are available here:
https://sourceforge.net/projects/xowa/files/v0.7.0.1/
As always, any feedback is appreciated.
Thanks.
v0.6.0 is a general rollup release. These are the major changes since
the last announcement (v0.5.0):
* Import improvements, including direct bz2 reading and direct gz
writing; see Help:Import/Data dump format and Help:Options/Import
* Search syntax supports AND, OR, NOT (-) and quoted phrases; e.g.
http://en.wikipedia.org/wiki/Special:Search/fulltext=y&search=Earth+AND+….
See Help:Special/Search
* Version 2 categories can be set up through the UI; see
Help:Import/Script and Help:Core/Category/V2/Setup
* Improvements to the option pages
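Direct bz2 reading and gz writing mean the dump is decompressed as a stream and never unpacked to disk. The Python sketch below shows the idea with the standard bz2 and gzip modules. It is not xowa's code; the function name extract_titles and the line-based <title> matching are simplifying assumptions (a real importer would use a streaming XML parser).

```python
import bz2
import gzip
import html


def extract_titles(dump_bz2, out_gz) -> int:
    """Stream page titles from a bz2-compressed XML dump into a gzip file.

    Both arguments may be paths or binary file objects. Nothing is fully
    decompressed to disk: bz2 decodes on the fly while gzip compresses
    on the fly.
    """
    count = 0
    with bz2.open(dump_bz2, "rt", encoding="utf-8") as dump, \
            gzip.open(out_gz, "wt", encoding="utf-8") as out:
        for line in dump:
            line = line.strip()
            # Simplification: assumes each <title> element sits on its own
            # line, as in the Wikimedia XML dumps.
            if line.startswith("<title>") and line.endswith("</title>"):
                title = line[len("<title>"):-len("</title>")]
                out.write(html.unescape(title) + "\n")  # undo &amp; etc.
                count += 1
    return count
```

For example, extract_titles("enwiki-latest-pages-articles.xml.bz2", "titles.txt.gz") would read the dump and write its titles without an intermediate uncompressed file (file names are illustrative).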
The files are available here:
https://sourceforge.net/projects/xowa/files/v0.6.0/
As always, any feedback is appreciated.
Thanks.
Dear Ariel,
1) Profiling
WP-MIRROR 0.6 introduces a `--profile' command-line option, which
provides a detailed breakdown of where time is spent during a mirror
build. Unfortunately, WP-MIRROR 0.5 did not have this feature, so only
aggregate comparisons between the two versions are possible.
2) Performance studies
Much of the winter and spring was devoted to time trials. These are
all documented in the WP-MIRROR 0.6 Reference Manual, Appendix G. The
following sections may be of particular interest to you:
2.1) G.8 Experiments with InnoDB data compression
Currently, InnoDB offers two on-disk storage formats, Antelope and
Barracuda. The latter supports data compression. I performed experiments
on many of the largest wiki tables (e.g. categorylinks, image,
langlinks, pagelinks, templatelinks, text) to determine the space
savings and time penalty. For a summary of results, please take a look
at Table G.1 Database table size v. ROW_FORMAT and KEY_BLOCK_SIZE.
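For readers who want to repeat the G.8 experiments, compression is requested per table with ROW_FORMAT=COMPRESSED and a KEY_BLOCK_SIZE given in KB. The Python sketch below only generates the MySQL statements. The helper names are hypothetical, and the two `SET GLOBAL' prerequisites reflect the MySQL 5.5 / InnoDB 1.1 era, where compressed tables need the Barracuda file format and file-per-table tablespaces.

```python
# Compressed page sizes (KB) that InnoDB accepts for KEY_BLOCK_SIZE.
VALID_KEY_BLOCK_SIZES = (1, 2, 4, 8, 16)

# Prerequisites for ROW_FORMAT=COMPRESSED on MySQL 5.5 / InnoDB 1.1.
PREREQUISITES = [
    "SET GLOBAL innodb_file_per_table = 1;",
    "SET GLOBAL innodb_file_format = 'Barracuda';",
]


def compress_table_sql(table: str, key_block_size: int = 4) -> str:
    """Return an ALTER TABLE statement that rebuilds `table` compressed."""
    if key_block_size not in VALID_KEY_BLOCK_SIZES:
        raise ValueError(f"KEY_BLOCK_SIZE must be one of {VALID_KEY_BLOCK_SIZES}")
    return (f"ALTER TABLE `{table}` "
            f"ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE={key_block_size};")


def compression_script(tables, key_block_size: int = 4) -> list[str]:
    """Prerequisites followed by one ALTER TABLE per table."""
    return PREREQUISITES + [compress_table_sql(t, key_block_size) for t in tables]
```

With the default 16 KB InnoDB page, KEY_BLOCK_SIZE=16 typically yields little compression, because the compressed page is the same size as a normal one; smaller values such as 4 or 8 are the usual candidates for space savings.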
2.2) G.9 Experiments with commonswiki.image
For several reasons, I wanted to import commonswiki.image as part of
the mirror build process. That table is large and takes time to
import. However, once it is imported, WP-MIRROR 0.6 saves more time
elsewhere than the import costs (e.g. I do not have to scrape the XML
dumps for image file names).
Because of the size of commonswiki.image, I performed a lengthy series
of experiments to determine the fastest way of importing it. Many of
the best methods make use of features first offered in MySQL 5.5 and
InnoDB 1.1 (e.g. fast index creation). For a summary of results,
please take a look at Figure G.1 InnoDB Experiments Importing
commonswiki.image Table.
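Fast index creation lets InnoDB build a secondary index by scanning and sorting the rows already in the table rather than maintaining it row by row during the load. One common way to exploit it is to drop the secondary indexes, load the rows, and then add all indexes back in a single ALTER TABLE, so the table is scanned once. The Python sketch below generates such a plan; it illustrates the general technique, not WP-MIRROR's actual code, and the function name and index definitions are illustrative.

```python
def fast_index_plan(table: str, indexes: dict[str, str]) -> list[str]:
    """Return SQL for: drop secondary indexes, load rows, re-add indexes.

    `indexes` maps index name -> column list, e.g. {"img_size": "(img_size)"}.
    Adding the indexes after the load lets fast index creation build them
    from the loaded rows; grouping them in one ALTER TABLE avoids scanning
    the table once per index.
    """
    drop = ", ".join(f"DROP INDEX `{name}`" for name in indexes)
    add = ", ".join(f"ADD INDEX `{name}` {cols}" for name, cols in indexes.items())
    return [
        f"ALTER TABLE `{table}` {drop};",
        f"-- load the rows of `{table}` here (e.g. the dump's INSERT statements)",
        f"ALTER TABLE `{table}` {add};",
    ]
```

For example, fast_index_plan("image", {"img_size": "(img_size)", "img_timestamp": "(img_timestamp)"}) yields the drop, load and re-add steps for two of the image table's secondary indexes.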
3) Documentation
The above mentioned WP-MIRROR 0.6 Reference Manual may be found at
<http://www.nongnu.org/wp-mirror/manual/>. It is also included in
the DEB package.
4) Questions
4.1) mwxml2sql
While reading the code for `mwxml2sql.c', I noticed the `-n, --nodrop'
option, which sets KEY_BLOCK_SIZE=16 for the `text' table. From my
experiments (see Appendix G.8), I do not think that any compression
takes place with that setting. KEY_BLOCK_SIZE=4 might be a better choice.
4.2) INSERT IGNORE vs. REPLACE INTO
For the initial mirror build, INSERT INTO commands get the job done.
However, for updates to an existing mirror, I would prefer not to DROP
TABLE every time. For this reason, I rewrite the INSERT INTO commands
as REPLACE INTO commands. This rewrite works fine.
What I would like to know from you is this: for which tables is INSERT
IGNORE a better choice than REPLACE INTO? I can see that the
`revision' and `text' tables are candidates, because their rows are
never updated, only added. But what about the other tables? Any advice
would be appreciated.
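The rewrite described above can be sketched in Python as follows. The difference bears directly on the question: REPLACE INTO deletes any existing row with the same primary or unique key and inserts the new one, so changed rows are updated, while INSERT IGNORE keeps the existing row and skips the duplicate, which is cheaper for append-only tables (it also downgrades some other errors to warnings). The APPEND_ONLY set follows the text (`revision' and `text'); the function and set names are hypothetical.

```python
import re

# Tables whose rows are only ever added, never changed (per the text above).
APPEND_ONLY = {"revision", "text"}

_INSERT_INTO = re.compile(r"^\s*INSERT\s+INTO\b", re.IGNORECASE)


def rewrite_insert(statement: str, table: str) -> str:
    """Turn a dump's INSERT INTO into INSERT IGNORE INTO or REPLACE INTO.

    Append-only tables use INSERT IGNORE: rows already present are skipped.
    Other tables use REPLACE INTO: a row with the same key is deleted and
    re-inserted, so updated rows overwrite stale ones.
    """
    verb = "INSERT IGNORE INTO" if table in APPEND_ONLY else "REPLACE INTO"
    new, n = _INSERT_INTO.subn(verb, statement, count=1)
    if n != 1:
        raise ValueError("statement does not start with INSERT INTO")
    return new
```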
Sincerely Yours,
Kent
On 6/3/13, Ariel T. Glenn <ariel(a)wikimedia.org> wrote:
> On 03-06-2013, Mon, at 10:22 -0400, wp mirror wrote:
>> Dear list members,
>>
>> I am pleased to announce the release of WP-MIRROR 0.6.
>>
>> The main design objective was this: PERFORMANCE. WP-MIRROR 0.6 now
>> builds the `enwiki' (which is the most demanding case) with 80% less
>> time and 75% less memory than v0.5.
>>
> Sounds great! Can you give us some benchmarks? I'm particularly
> interested in length of time with the old and new versions of your
> package for the various stages of setting up a dump of current pages for
> the English language Wikipedia, on whatever hardware you are using for
> testing.
>
> Ariel
>
>
Dear list members,
I am pleased to announce the release of WP-MIRROR 0.6.
The main design objective was this: PERFORMANCE. WP-MIRROR 0.6 now
builds the `enwiki' (which is the most demanding case) with 80% less
time and 75% less memory than v0.5.
Feature: One new feature was added. WP-MIRROR 0.6 can now mirror
wikis from most other WMF projects (e.g. Wikibooks, Wiktionary).
Reliability: Downloads are now performed with `wget', which can
automatically resume interrupted transfers. This virtually eliminates
the problem of partial downloads.
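`wget' resumes an interrupted transfer (automatically when it retries within the same run, or on a later run with its `-c'/`--continue' option) by asking the server only for the missing bytes with an HTTP Range header. The Python sketch below illustrates that mechanism with the standard library; it is not how WP-MIRROR invokes `wget', and the function names are hypothetical.

```python
import os
import urllib.error
import urllib.request


def range_header(bytes_on_disk: int) -> dict[str, str]:
    """Ask only for the bytes we do not have yet (empty dict = whole file)."""
    return {"Range": f"bytes={bytes_on_disk}-"} if bytes_on_disk > 0 else {}


def resume_download(url: str, path: str, chunk: int = 1 << 16) -> None:
    have = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url, headers=range_header(have))
    try:
        with urllib.request.urlopen(request) as resp:
            # 206 Partial Content: the server honoured Range, so append.
            # 200 OK: the server ignored Range and sent everything, so overwrite.
            mode = "ab" if resp.status == 206 else "wb"
            with open(path, mode) as out:
                while block := resp.read(chunk):
                    out.write(block)
    except urllib.error.HTTPError as err:
        # 416 Range Not Satisfiable: typically the file is already complete.
        if err.code != 416:
            raise
```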
Images: WP-MIRROR 0.6 makes use of image dump tarballs found at
<http://ftpmirror.your.org/>. WP-MIRROR 0.6 then does a thorough job
of identifying image files missing from the tarballs, and downloads
them efficiently using HTTP/1.1 persistent connections.
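With HTTP/1.1 persistent connections, many image files are fetched over one TCP (and TLS) connection instead of a new connection per file. The Python sketch below uses http.client. It assumes images are fetched from upload.wikimedia.org using that server's usual layout, where a file lives under directories named after the first one and first two hex digits of the MD5 of its name (spaces replaced by underscores); WP-MIRROR's own code may differ, and the helper names are hypothetical.

```python
import hashlib
import http.client
from urllib.parse import quote


def commons_path(file_name: str) -> str:
    """Path of a Commons file on upload.wikimedia.org (hashed directory layout)."""
    name = file_name.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return f"/wikipedia/commons/{digest[0]}/{digest[:2]}/{quote(name)}"


def fetch_many(host: str, paths: list[str]) -> dict[str, bytes]:
    """GET every path over a single reused HTTPS connection."""
    conn = http.client.HTTPSConnection(host, timeout=30)
    found = {}
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            body = resp.read()  # drain fully before reusing the connection
            if resp.status == 200:
                found[path] = body
            # If the server answered "Connection: close", http.client reopens
            # the connection automatically on the next request.
    finally:
        conn.close()
    return found
```

For example, fetch_many("upload.wikimedia.org", [commons_path(n) for n in missing]) would download a hypothetical list `missing' of file names over one connection.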
Packaging: The DEB package for WP-MIRROR 0.6 should work
`out-of-the-box' with no user configuration for the following
distributions:
o Debian GNU/Linux 7.0 (wheezy)
o Ubuntu 12.10 (quantal)
Virtual Hosts: Browsing of mirrored wikis is done via virtual hosts
with names like <http://simple.wikipedia.site/> and
<http://simple.wiktionary.site/>. Simply take the URL that WMF
offers, and replace `.org' with `.site'.
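The `.org' to `.site' rule can be applied mechanically. The Python sketch below rewrites only the host name, so a `.org' that happens to appear in a page path or query is left alone; the function name is hypothetical, and the scheme is kept as given (the examples above use plain http).

```python
from urllib.parse import urlsplit, urlunsplit


def mirror_url(wmf_url: str) -> str:
    """Map a WMF wiki URL to the corresponding WP-MIRROR virtual host."""
    parts = urlsplit(wmf_url)
    if not parts.netloc.endswith(".org"):
        raise ValueError(f"not a .org host: {parts.netloc!r}")
    host = parts.netloc[: -len(".org")] + ".site"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

For example, mirror_url("http://simple.wiktionary.org/wiki/Example.org") gives "http://simple.wiktionary.site/wiki/Example.org"; a plain string replace would also have altered the page title.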
Project Home Page: <http://www.nongnu.org/wp-mirror/> has been updated.
Feedback is welcome.
Sincerely Yours,
Dr. Kent L. Miller
xowa is a new, open-source, offline wiki application. It imports directly
from the Wikimedia data dumps, and shows articles in an HTML browser
window. It can also download and display images.
v0.5.0 is a general rollup release. It is intended to be stable.
These are the changes since the last announcement (v0.4.0):
* Wikidata #property tag (Phase 2)
* Wikidata JSON structured data formatter {contributed by Schnark}
* Score extension for music transcription through LilyPond
* Improved Scribunto support for 2013-04 / 2013-05 English Wikipedia
* MediaWiki-like Categories (command line install only)
* JavaScript injection prevention
The files are here: https://sourceforge.net/projects/xowa/files/v0.5.0.1/
As always, any feedback is appreciated.
Hi,
A first version of Kiwix for Android was released a month ago. The app
was warmly welcomed, with around 2,000 total installations and 1,000
active ones.
It received an average rating of 4.25/5 from 25 reviews. Almost no bugs
were reported; people simply want more features.
A few hours ago we released a new version that fixes the only bug we
found and adds a few new features:
https://play.google.com/store/apps/details?id=org.kiwix.kiwixmobile
We need new Java developers to implement features like tabs, bookmarks
and navigation history. More details are in the bug tracker:
https://sourceforge.net/p/kiwix/feature-requests/search/?q=status%3Aopen+%2…
Beginners are welcome: getting started is almost trivial, as everything
is explained step by step in the COMPILE file:
https://sourceforge.net/p/kiwix/kiwix/ci/master/tree/
Regards
Emmanuel