What is the intended format of the dump files? The page makes it sound like it will have a binary format, which I'm not opposed to, but is definitely something you should decide on.
Also, I really like the idea of writing it in a low level language and then having bindings for something higher. However, unless you plan of having multiple language bindings (e.g., *both* C# and Python), you may want to pick a different route. For example, if you decide to only bind to Python, you can use something like Cython, which would allow you to write pseudo-Python that is still compiled to C. Of course, if you want multiple language bindings, this is likely no longer an option.
*-- * *Tyler Romeo* Stevens Institute of Technology, Class of 2016 Major in Computer Science www.whizkidztech.com | tylerromeo@gmail.com
On Mon, Jul 1, 2013 at 10:00 AM, Petr Onderka gsvick@gmail.com wrote:
For my GSoC project Incremental data dumps [1], I'm creating a new file format to replace Wikimedia's XML data dumps. A sketch of how I imagine the file format to look like is at http://www.mediawiki.org/wiki/User:Svick/Incremental_dumps/File_format.
What do you think? Does it make sense? Would it work for your use case? Any comments or suggestions are welcome.
Petr Onderka [[User:Svick]]
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l