On Mar 27, 2013 1:31 AM, "Teresa Cho" tcho708@gmail.com wrote:
Hi everyone,
My name is Teresa (or terrrydactyl if you've seen me on IRC) and I've been interning at Wikimedia for the last few months through the Outreach Program for Women[1]. My project, Git2Pages[2], is an extension to pull snippets of code/text from a git repository. I've been working hard on learning PHP and the MediaWiki framework/development cycle. My internship is ending soon and I wanted to reach out to the community and ask for feedback.
Cool stuff!
Here's what the program currently does:
- User supplies (git) url, filename, branch, startline, endline using
the #snippet tag
- Git2Pages.body.php will validate the information and then pass on
the inputs into my library, GitRepository.php
- GitRepository will do a sparse checkout using that information; that is,
it will clone the repository but only keep the specified file in the working tree (this was implemented to save space)
- The repositories will be cloned into a folder that is a md5 hash of
the url + branch to make sure that the program isn't cloning a ton of copies of the same repository
Why hash it, and not just keep the url + branch encoded to some charset that is a valid path, saving rare yet hairy collisions?
- If the repository already exists, the file will be added to the
sparse-checkout file and the program will update the working tree
Will there be a re-checkout for a duplicate request? Will the cache of files ever be cleaned?
- Once the repo is cloned, the program will go and yank the lines that
the user requested and it'll return the text encased in a <pre> tag.
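The steps above can be sketched roughly as follows. This is a Python illustration rather than the extension's PHP, and the function names (`repo_dir_name`, `sparse_checkout_commands`, `extract_snippet`) are hypothetical, not taken from GitRepository.php. It assumes the hash is taken over the plain concatenation of url and branch, and that startline and endline are 1-based and inclusive. The git commands are only built as lists here, not executed.

```python
import hashlib
import html
from typing import List


def repo_dir_name(url: str, branch: str) -> str:
    """Directory name for a clone: md5 hex digest of url + branch (assumed plain concatenation)."""
    return hashlib.md5((url + branch).encode("utf-8")).hexdigest()


def sparse_checkout_commands(url: str, branch: str) -> List[List[str]]:
    """Git commands for a first-time sparse checkout, run inside the clone directory."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", url],
        ["git", "config", "core.sparseCheckout", "true"],
        # Before pulling, the requested filename is written to
        # .git/info/sparse-checkout so only that file lands in the working tree.
        ["git", "pull", "origin", branch],
    ]


def extract_snippet(text: str, startline: int, endline: int) -> str:
    """Return lines startline..endline (1-based, inclusive), escaped and wrapped in <pre>."""
    lines = text.splitlines()
    selected = lines[startline - 1:endline]
    return "<pre>" + html.escape("\n".join(selected)) + "</pre>"
```

One caveat with plain concatenation: two different (url, branch) pairs can produce the same string before hashing, so a separator between the two parts may be worth considering regardless of whether the name is hashed or encoded.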
This is my baseline program. It works (for me at least). I have a few ideas of what to work on next, but I would really like to know if I'm going in the right direction. Is this something you would use? How does my code look? Is the implementation up to the MediaWiki coding standard? You can find the progression of the code on gerrit[3].
Here are some ideas of what I might want to implement while still on the internship:
- Instead of a <pre> tag, encase it in a <syntaxhighlight lang> tag if
it's code, maybe add a flag for user to supply the language
- Keep a database of all the repositories that a wiki has (though not
sure how to handle deletions)
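For the first idea, a minimal sketch of how an optional language flag might switch the wrapper. It is in Python for illustration only; `wrap_snippet` and the allowed-character pattern for `lang` are assumptions, not part of the extension.

```python
import html
import re
from typing import Optional

# Assumed whitelist of characters for a language name; anything else falls back to <pre>.
LANG_PATTERN = re.compile(r"[A-Za-z0-9+#_-]+")


def wrap_snippet(snippet: str, lang: Optional[str] = None) -> str:
    """Wrap raw snippet text in <syntaxhighlight lang="..."> when a valid lang is given, else <pre>."""
    if lang and LANG_PATTERN.fullmatch(lang):
        # The snippet is passed through unescaped so the highlighter sees the raw code.
        return '<syntaxhighlight lang="%s">\n%s\n</syntaxhighlight>' % (lang, snippet)
    return "<pre>" + html.escape(snippet) + "</pre>"
```

A snippet that itself contains a literal `</syntaxhighlight>` would end the tag early, so that case would need separate handling.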
Here are some problems I might face:
- If I update the working tree each time a file from the same
repository is added, then the line numbers may not match the old file
- Should I be periodically updating the repositories, or perhaps keeping
multiple snapshots of the same repository?
- Cloning an entire repository and keeping only one file does not seem
ideal, but I've yet to find a better solution. The more repositories being used concurrently, the bigger an issue this might be
- I'm also worried about security implications of my program. Security
isn't my area of expertise, and I would definitely appreciate some input from people with a security background
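As a starting point for the security discussion, here is a rough sketch of the kind of checks the validation step might apply to the #snippet inputs before anything reaches git. It is Python for illustration; the function name, the allowed URL schemes, and the branch-name pattern are all assumptions, not what Git2Pages.body.php does, and this is not a complete security review.

```python
import posixpath
import re
from typing import List
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"https", "http", "git"}  # assumed allowlist; excludes file:// and local paths
BRANCH_PATTERN = re.compile(r"[A-Za-z0-9._/-]+")  # assumed conservative branch-name charset


def validate_snippet_args(url: str, filename: str, branch: str,
                          startline: int, endline: int) -> List[str]:
    """Return a list of problems found; an empty list means the inputs passed these checks."""
    errors = []
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        errors.append("url must be a remote http(s) or git URL")
    normalized = posixpath.normpath(filename)
    if (filename.startswith(("/", "-")) or normalized == ".."
            or normalized.startswith("../")):
        # Reject absolute paths, option-like names, and paths escaping the repository.
        errors.append("filename must be a relative path inside the repository")
    if branch.startswith("-") or not BRANCH_PATTERN.fullmatch(branch):
        # A leading "-" could be read by git as a command-line option.
        errors.append("branch contains unexpected characters")
    if startline < 1 or endline < startline:
        errors.append("line range must satisfy 1 <= startline <= endline")
    return errors
```

Passing arguments to git as a list rather than through a shell string would also avoid shell injection, independent of these checks.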
Thanks for taking the time to read this and thanks in advance for any feedback, bug reports, etc.
Have a great day,
Teresa
http://www.mediawiki.org/wiki/User:Chot
[1] https://www.mediawiki.org/wiki/Outreach_Program_for_Women
[2] http://www.mediawiki.org/wiki/Extension:Git2Pages
[3] https://gerrit.wikimedia.org/r/#/q/project:mediawiki/extensions/Git2Pages,n,...
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l