Hi all,
I'm in the process of developing a media handling extension for MediaWiki that will allow users with WebGL-enabled browsers to manipulate 3D models of large biological molecules, like proteins and DNA. I'm new to MediaWiki development, and I've got some questions about how I should go forward with development of this extension if I want to ultimately get it into official Wikimedia MediaWiki deployments.
My initial goal is to put the kind of interactive model available at http://webglmol.sourceforge.jp/glmol/viewer.html into infoboxes like the one in http://en.wikipedia.org/wiki/FOXP2. The library enabling this interactivity is called GLmol -- it's licensed under LGPL and described at http://webglmol.sourceforge.jp/index-en.html. There is some more background discussion on the extension at http://en.wikipedia.org/wiki/Portal:Gene_Wiki/Discussion#Enabling_molecular_....
I have a prototype of the extension working on a local deployment of MediaWiki 1.18.1. I've tried to organize the extension's code roughly along the lines of http://www.mediawiki.org/wiki/Extension:OggHandler. The user workflow to get an interactive protein model into an article is to:
1) Upload a PDB file (e.g. http://www.rcsb.org/pdb/files/2A07.pdb) representing the protein structure through MediaWiki's standard file upload UI.
2) Add a wikilink to the resulting file, very similar to what's done with images. For example, [[File:2A07.pdb]].
If the user's browser has WebGL enabled, an interactive model of the macromolecule, similar to the one in the linked GLmol demo, is then loaded onto the page via an asynchronous request for the 3D model's atomic coordinate data. I've done work to decrease both the time needed to render the 3D model and the size of the 3D model data (well beyond what gzipping alone achieves), so my prototype loads faster than the linked demo.
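As a rough illustration of what trimming the model data could look like (this is not the prototype's actual code, and the function names are hypothetical), the Python sketch below keeps only the ATOM/HETATM records of a PDB file and emits the element and x/y/z coordinates as compact JSON. It assumes the standard fixed-column PDB layout: coordinates in columns 31-54 and the element symbol in columns 77-78.

```python
import json


def extract_coordinates(pdb_text):
    """Return [element, x, y, z] for each ATOM/HETATM record in a PDB file.

    Assumes the fixed-column PDB layout: x, y, z in columns 31-38, 39-46
    and 47-54, element symbol in columns 77-78 (1-indexed).
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line[:6] in ("ATOM  ", "HETATM"):
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            element = line[76:78].strip()  # empty string if the column is missing
            atoms.append([element, x, y, z])
    return atoms


def to_compact_json(atoms):
    """Serialize the atom list without any optional whitespace."""
    return json.dumps(atoms, separators=(",", ":"))
```

Everything other than the coordinate records (REMARK, SEQRES and so on) is dropped here, which is only acceptable if the viewer does not need that metadata.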
A main element of this extension -- which I haven't yet developed -- is how it will gracefully degrade for users without WebGL enabled. IE8 and IE9 don't support WebGL, and IE10 probably won't either. Safari 5.1.5 supports WebGL, but not by default. WebGL is also not supported on many smartphones.
One idea is to fall back to a 2D canvas representation of the model, perhaps like the 3D-to-2D examples at https://github.com/mrdoob/three.js/. I see several drawbacks to this. First, it would not be a fall-back for clients with JavaScript disabled. Second, the GLmol molecular viewer library doesn't currently support 2D canvas fall-back, and it would probably take substantial time and effort to add that feature. Third, there are browser plug-ins for IE that enable WebGL, e.g. http://iewebgl.com/.
Given that, my initial plan for handling browsers without WebGL enabled is to fall back to a static image of the corresponding protein/DNA structure. A few years ago I wrote a program that takes in a PDB file and outputs a high-quality static image of the corresponding structure. This resulted in PDBbot (http://commons.wikimedia.org/wiki/User:PDBbot, http://code.google.com/p/pdbbot/). That code could likely be repurposed in this media handling extension to generate a static image when a PDB file is uploaded. The PDBbot code is mostly Python 3, and it interacts with GIMP (via Scheme scripts) and PyMOL (http://en.wikipedia.org/wiki/PyMOL, freely licensed: http://pymol.svn.sourceforge.net/viewvc/pymol/trunk/pymol/LICENSE?revision=3... ).
Would requiring Python, GIMP and PyMOL to be installed on the server be workable for a WMF MediaWiki deployment? If not, there is a free web service developed for Wikipedia (via Gene Wiki) available from the European Bioinformatics Institute, which points to their pre-rendered static images for macromolecules. The static images could thus be retrieved from a remote server if it isn't feasible to generate them locally on the upload server. I see a couple of disadvantages to this approach, e.g. relying on a remote third-party web service, but I thought I'd put the idea out for consideration. If generating static images on the upload server isn't possible, would this be a workable alternative?
After I get an answer on the questions above, I can begin working on that next major part of the extension. This is a fairly blocking issue, so feedback would definitely be appreciated.
Beyond that, and assuming this extension seems viable so far, I've got some more questions:
1. Once I get the prototype more fully developed, what would be the best next step to presenting it and getting it code reviewed? Should I set up a demo on a random domain/third-party VPN, or maybe something like http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page? Or maybe the former would come before the latter?
2. PDB (.pdb) is a niche file type that has a non-standard MIME type of "chemical/x-pdb". See http://en.wikipedia.org/wiki/Protein_Data_Bank_%28file_format%29 for more. To upload files with this MIME type, in my local MediaWiki deployment I had to relax a constraint in the 'image' database table on what MIME types are allowed. If I recall correctly there was an enum that allowed only a small handful of MIME types to be uploaded. I also had to adjust some other configuration settings in Apache and MediaWiki so that .pdb files were properly handled. Would these things be doable in an official WMF deployment? If not, what are some possible workarounds?
3. If at all possible, I'd like the molecular models to be interactive by default, i.e. manipulable as soon as the page loads, without the WebGL-enabled user having to click some control to replace a static image with a model. The advantage is that the feature would be easier to discover and quicker to use. From asking around, the main potential disadvantage I've heard is that the model might take a long time to load. However, with the optimizations I've made to GLmol, I think the interactive models could load in roughly the same time that images take to load on pages. Does having model interactivity by default for WebGL-enabled users sound feasible?
Thanks in advance for any answers to these questions or general feedback.
Best,
Eric
http://en.wikipedia.org/wiki/User:Emw
On 21/04/12 17:07, emw wrote:
> Hi all,
> I'm in the process of developing a media handling extension for MediaWiki that will allow users with WebGL-enabled browsers to manipulate 3D models of large biological molecules, like proteins and DNA. I'm new to MediaWiki development, and I've got some questions about how I should go forward with development of this extension if I want to ultimately get it into official Wikimedia MediaWiki deployments.
> (...)
> Given that, my initial plan for handling browsers without WebGL enabled is to fall back to a static image of the corresponding protein/DNA structure.
Seems the appropriate thing to do.
> Would requiring Python, GIMP and PyMOL to be installed on the server be workable for a WMF MediaWiki deployment?
Not ideal, but probably workable. Still much better than relying on (and potentially DDoSing) a third party. If you could drop the GIMP requirement, that'd be even better (why is it needed?).
> 1. Once I get the prototype more fully developed, what would be the best next step to presenting it and getting it code reviewed? Should I set up a demo on a random domain/third-party VPN, or maybe something like http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page? Or maybe the former would come before the latter?
I'd go directly on labs.
> 2. PDB (.pdb) is a niche file type that has a non-standard MIME type of "chemical/x-pdb". See http://en.wikipedia.org/wiki/Protein_Data_Bank_%28file_format%29 for more.
We could detect the format. Don't worry about that.
> 3. If at all possible, I'd like to have the molecular models be interactive by default, (...) Does having model interactivity by default for WebGL-enabled users sound feasible?
Maybe, maybe not. Needs testing. I'd wait until the prototype exists before deciding on that default.
Thanks for your contributions!
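The suggestion above that the format could be detected from the file itself can be sketched as a simple content sniff. This is only a heuristic written for illustration, not how MediaWiki's MIME detection actually works. The function name and the `max_lines` cutoff are hypothetical, and the record-name list is a subset of the names defined by the PDB format.

```python
# Record names that open lines in a PDB file (columns 1-6). This is a
# subset of the names the format defines, chosen for illustration.
PDB_RECORD_NAMES = {
    "HEADER", "OBSLTE", "TITLE", "SPLIT", "CAVEAT", "COMPND", "SOURCE",
    "KEYWDS", "EXPDTA", "NUMMDL", "MDLTYP", "AUTHOR", "REVDAT", "SPRSDE",
    "JRNL", "REMARK", "DBREF", "SEQADV", "SEQRES", "MODRES", "HET",
    "HETNAM", "FORMUL", "HELIX", "SHEET", "SSBOND", "LINK", "SITE",
    "CRYST1", "ORIGX1", "ORIGX2", "ORIGX3", "SCALE1", "SCALE2", "SCALE3",
    "MODEL", "ATOM", "ANISOU", "TER", "HETATM", "ENDMDL", "CONECT",
    "MASTER", "END",
}


def looks_like_pdb(text, max_lines=20):
    """Guess whether text is a PDB file by checking its leading record names.

    Returns True only if each of the first max_lines non-blank lines
    starts with a known record name in columns 1-6.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()][:max_lines]
    if not lines:
        return False
    return all(ln[:6].strip() in PDB_RECORD_NAMES for ln in lines)
```

A check like this would let an upload be classified as chemical/x-pdb without trusting the file extension alone; a production check would likely want to be stricter, for example by also requiring at least one coordinate record.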
Thanks for the feedback Platonides.
>> Would requiring Python, GIMP and PyMOL to be installed on the server be workable for a WMF MediaWiki deployment?
> Not ideal, but probably workable. Still much better than relying on (and potentially DDoSing) a third party. If you could drop the GIMP requirement, that'd be even better (why is it needed?).
A small script that hooks into GIMP API methods is used to tidy up PNGs output by the PyMOL molecular visualization program. The PyMOL images are originally output with a lot of extraneous whitespace. Specifically, the script takes in a PNG and outputs an autocropped image with 50 pixels of whitespace around the subject -- e.g. http://en.wikipedia.org/wiki/File:Protein_FOXP2_PDB_2a07.png. The script: https://code.google.com/p/pdbbot/source/browse/trunk/crop-and-pad-pdb.scm.
ImageMagick seems like it might also have the ability to programmatically autocrop an image and add a certain padding around the subject. I'll look into that and substitute an ImageMagick script for the GIMP one if possible.
If anyone can think of a better option for that, please let me know.
- Eric
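The crop-and-pad step described above is simple enough to sketch without GIMP or ImageMagick. The Python below is an illustration only, not the PDBbot script: it works on a grayscale image represented as a list of rows of pixel values, and the function name and the `background` parameter are hypothetical. It removes rows and columns that contain only background, then adds a uniform border (50 pixels by default, matching the description).

```python
def autocrop_and_pad(pixels, background=255, padding=50):
    """Crop away all-background borders, then pad the subject uniformly.

    pixels is a list of equal-length rows of grayscale values.
    Returns a new list of rows; the input is not modified.
    """
    # Rows and columns that contain at least one non-background pixel.
    rows = [r for r, row in enumerate(pixels)
            if any(p != background for p in row)]
    if not rows:
        # Nothing but background: return an unchanged copy.
        return [list(row) for row in pixels]
    cols = [c for c in range(len(pixels[0]))
            if any(row[c] != background for row in pixels)]

    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    cropped = [row[left:right + 1] for row in pixels[top:bottom + 1]]

    width = (right - left + 1) + 2 * padding
    side = [background] * padding
    padded = [[background] * width for _ in range(padding)]
    padded.extend(side + list(row) + side for row in cropped)
    padded.extend([background] * width for _ in range(padding))
    return padded
```

With real PNGs the same logic would run on decoded pixel data, and an image library's or ImageMagick's built-in trim and border operations would most likely be a better fit than hand-rolled code.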
On 22/04/12 15:54, emw wrote:
> A small script that hooks into GIMP API methods is used to tidy up PNGs output by the PyMOL molecular visualization program. The PyMOL images are originally output with a lot of extraneous whitespace. Specifically, the script takes in a PNG and outputs an autocropped image with 50 pixels of whitespace around the subject -- e.g. http://en.wikipedia.org/wiki/File:Protein_FOXP2_PDB_2a07.png. The script: https://code.google.com/p/pdbbot/source/browse/trunk/crop-and-pad-pdb.scm.
That seems simple. In the worst case, it could be provided as a script and we could look into a different alternative for that.
> ImageMagick seems like it might also have the ability to programmatically autocrop an image and add a certain padding around the subject. I'll look into that and substitute an ImageMagick script for the GIMP one if possible.
Yes, ImageMagick seems like a better-suited solution.
On 04/21/2012 08:07 AM, emw wrote:
> Hi all,
> I'm in the process of developing a media handling extension for MediaWiki that will allow users with WebGL-enabled browsers to manipulate 3D models of large biological molecules, like proteins and DNA. I'm new to MediaWiki development, and I've got some questions about how I should go forward with development of this extension if I want to ultimately get it into official Wikimedia MediaWiki deployments.
Eric, thank you for this contribution, and, like, wow, cool! Let me point you to https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment , which answers several of your questions, I think.
Please keep sharing your progress here.
That page is precisely the kind of information I'm looking for, thanks! Per the instructions there, I'll talk with Howie Fung about the extension. And I'll update here with further questions or significant notes as they come along.
- Eric
On Sat, Apr 21, 2012 at 4:12 PM, Sumana Harihareswara <sumanah@wikimedia.org> wrote:
> On 04/21/2012 08:07 AM, emw wrote:
>> Hi all,
>> I'm in the process of developing a media handling extension for MediaWiki that will allow users with WebGL-enabled browsers to manipulate 3D models of large biological molecules, like proteins and DNA. I'm new to MediaWiki development, and I've got some questions about how I should go forward with development of this extension if I want to ultimately get it into official Wikimedia MediaWiki deployments.
> Eric, thank you for this contribution, and, like, wow, cool! Let me point you to https://www.mediawiki.org/wiki/Writing_an_extension_for_deployment , which answers several of your questions, I think.
> Please keep sharing your progress here.
> --
> Sumana Harihareswara
> Volunteer Development Coordinator
> Wikimedia Foundation
wikitech-l@lists.wikimedia.org