I co-maintain Codesearch with Kunal, and I have similar notes. I hope that, instead of duplicating the work, we can join forces to improve the developer productivity infrastructure.
Codesearch has been working fine for the past couple of years. A new frontend is being built, and I hope we can deploy it soon to provide a better user experience. I personally don't see value in re-implementing Codesearch, especially using non-open-source software.
As a longer-term solution, I hope (dream, even) that we could implement what Google has for automating refactoring. It's called LSC [1] (Large-Scale Changes), and we could even piggyback on the library upgrader tool to automate easy deprecation fixes so that developers can focus on the complex cases. It's disheartening to me to see the valuable time of our volunteer developers being spent on something that could be automated. (For example, see the sheer number of patches made for this deprecation: https://gerrit.wikimedia.org/r/q/bug:T286694). It doesn't have to parse PHP code and do complex magic at first. We can start with simple regex replacements and then add rector (a really nice library for doing refactors in PHP) and its equivalents in other languages.
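To make the "simple regex replacements" starting point concrete, here is a minimal sketch in Python. It assumes a made-up deprecation in which calls to oldHelper() should become newHelper(); both names, the DEPRECATIONS table, and apply_deprecation_fixes() are hypothetical and only for illustration, not part of Codesearch or the library upgrader tool.

```python
import re
from typing import Tuple

# Hypothetical deprecation table: each entry maps a pattern for the
# deprecated call to its replacement. oldHelper/newHelper are invented
# names, not real MediaWiki functions.
DEPRECATIONS = [
    (re.compile(r"\boldHelper\("), "newHelper("),
]


def apply_deprecation_fixes(source: str) -> Tuple[str, int]:
    """Return the rewritten source and the number of replacements made."""
    total = 0
    for pattern, replacement in DEPRECATIONS:
        # subn() returns the new string together with the substitution count.
        source, count = pattern.subn(replacement, source)
        total += count
    return source, total


php_snippet = "$a = oldHelper( $x );\n$b = myoldHelper( $y );\n"
fixed, changed = apply_deprecation_fixes(php_snippet)
print(fixed, changed)
```

The word boundary keeps myoldHelper() untouched, but a plain regex like this cannot tell a function from a method that happens to share its name. Cases like that are where a parser-based tool such as rector would have to take over.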
[1] For more information, see the book "Software Engineering at Google: Lessons Learned from Programming Over Time": https://www.goodreads.com/book/show/48816586-software-engineering-at-google
Best
On Sat, Feb 5, 2022 at 2:41 AM Kunal Mehta <legoktm@debian.org> wrote:
Hi,
On 2/4/22 08:58, Adam Baso wrote:
Thanks for sharing this. I was wondering, might it be possible to help bring this sort of functionality into Code Search ( https://codesearch.wmflabs.org )? I noticed the presentation of the search UI looked similar, but I see how the symbol resolution might be something useful for Code Search and upstream Hound.
Indeed, we've been discussing and exploring symbol-based search for a while now: https://phabricator.wikimedia.org/T183795. There are some pretty neat upstream projects that do this, like https://searchfox.org/ and zoekt, which is designed for Gerrit integration. I would also note that things are likely to change whenever we migrate to GitLab, which has its own search functionality built in (https://phabricator.wikimedia.org/T268196). My assumption is that GitLab will add symbol-based search eventually to compete with GitHub; hopefully that ends up in the CE version someday...
While I very much disagree with the opening proposition that "Working with Wikimedia code is time-consuming and risky", I think symbol search of the MediaWiki codebase would be incredibly powerful and unlock a new level of tooling, just like Codesearch did when it was first introduced, so I'm glad to see people looking into it! For example we could do stuff like https://phabricator.wikimedia.org/T186771 with it.
There were two main principles in building MediaWiki Codesearch: first, that everything be licensed as free software[1] (which Bryan covered as well), and second, that we try to use upstream as much as possible. Our modifications are injected by a proxy rather than by patching the upstream code...which has turned out to be incredibly stable over the years.
If you want to collaborate, all the code and setup is in Git and I can add you to the project, but I see little to no value in building proprietary tools or reinventing what other projects have done pretty well rather than building on top of them.
[1] https://mako.cc/writing/hill-free_tools.html
-- Legoktm
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/