I co-maintain Codesearch with Kunal and I have similar notes. Instead of duplicating the work, I hope we can join forces to improve our developer productivity infrastructure.

Codesearch has been working fine for the past couple of years. A new frontend is being built, and I hope we can deploy it soon to provide a better user experience. Personally, I don't see value in re-implementing Codesearch, especially with non-open-source software.

As a longer-term solution, I hope (or dream) that we could implement what Google has for automated refactoring. It's called LSC (Large Scale Changes) [1], and we could even piggyback on the library upgrader tool to automate easy deprecation fixes so developers can focus on the complex cases. It's disheartening to see the valuable time of our volunteer developers spent on something that could be automated (for example, see the sheer number of patches made for this deprecation: https://gerrit.wikimedia.org/r/q/bug:T286694). It doesn't have to parse PHP code and do complex magic at first. We can start with simple regex replacements and later add Rector (a really nice library for refactoring PHP) and its equivalents in other languages.
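To make the "start with simple regex replacements" idea concrete, here is a minimal Python sketch of a rule-based rewriter. The names `wfOldHelper`, `NewHelper::doThing` and `$wgOldSetting` are hypothetical placeholders, not real MediaWiki deprecations, and `apply_rules`/`fix_tree` are illustrative helpers, not part of any existing tool. The lookbehind skips method (`->`) and static (`::`) calls, but a regex cannot see scope or types, which is where a parser-based tool like Rector would take over.

```python
import re
from pathlib import Path

# Hypothetical rules for illustration only; not real MediaWiki deprecations.
# The lookbehind skips "$obj->wfOldHelper(" and "Foo::wfOldHelper(".
RULES = [
    (re.compile(r"(?<![\w$>:])wfOldHelper\s*\("), "NewHelper::doThing("),
    (re.compile(r"\$wgOldSetting\b"), "$wgNewSetting"),
]


def apply_rules(source: str) -> tuple[str, int]:
    """Apply every rule to `source`; return the new text and the number of edits."""
    total = 0
    for pattern, replacement in RULES:
        source, count = pattern.subn(replacement, source)
        total += count
    return source, total


def fix_tree(root: str) -> int:
    """Rewrite .php files under `root` in place; return how many files changed."""
    changed = 0
    for path in Path(root).rglob("*.php"):
        text = path.read_text(encoding="utf-8")
        new_text, count = apply_rules(text)
        if count:
            path.write_text(new_text, encoding="utf-8")
            changed += 1
    return changed
```

For example, `apply_rules("$x = wfOldHelper( $a );")` returns `("$x = NewHelper::doThing( $a );", 1)`. Each resulting diff would still go through normal code review, just without someone typing it by hand.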

[1] For more information, see the book "Software Engineering at Google: Lessons Learned from Programming Over Time": https://www.goodreads.com/book/show/48816586-software-engineering-at-google

Best

On Sat, Feb 5, 2022 at 2:41 AM Kunal Mehta <legoktm@debian.org> wrote:

Hi,

On 2/4/22 08:58, Adam Baso wrote:
> Thanks for sharing this. I was wondering, might it be possible to help
> bring this sort of functionality into Code Search
> (https://codesearch.wmflabs.org)? I
> noticed the presentation of the search UI looked similar, but I see how
> the symbol resolution might be something useful for Code Search and
> upstream Hound.

Indeed, we've been discussing and exploring symbol-based search for a
while now: <https://phabricator.wikimedia.org/T183795>. There are some
pretty neat upstream projects that do this like <https://searchfox.org/>
and zoekt, which is designed for Gerrit integration. I would also note
that things are likely to change whenever we migrate to GitLab, which
has its own search functionality built-in
(<https://phabricator.wikimedia.org/T268196>). My assumption is that
GitLab will eventually add symbol-based search to compete with GitHub;
hopefully that ends up in the CE version someday...

While I very much disagree with the opening proposition that "Working
with Wikimedia code is time-consuming and risky", I think symbol search
of the MediaWiki codebase would be incredibly powerful and unlock a new
level of tooling, just like Codesearch did when it was first introduced,
so I'm glad to see people looking into it! For example, we could do things
like <https://phabricator.wikimedia.org/T186771> with it.

There were two main principles in building MediaWiki Codesearch: first,
that everything be licensed as free software[1] (which Bryan covered as
well), and second, that we use upstream as much as possible. Our
modifications are injected by a proxy rather than by patching the
upstream code, which has turned out to be incredibly stable over the years.
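As a rough illustration of the "inject via a proxy" approach, here is a Python sketch. It is not the actual Codesearch code: the upstream address, the injected snippet, and the `inject_snippet`/`InjectingProxyHandler` names are all assumptions. The idea is that upstream HTML is rewritten on its way to the browser, so the upstream project itself never has to be patched.

```python
import re
import urllib.request
from http.server import BaseHTTPRequestHandler

# Assumption: address of the unmodified upstream service (e.g. a Hound instance).
UPSTREAM = "http://127.0.0.1:6080"

# Assumption: a local stylesheet and script carrying our customizations.
SNIPPET = (
    '<link rel="stylesheet" href="/_local/custom.css">'
    '<script src="/_local/custom.js"></script>'
)

_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_snippet(html: str, snippet: str = SNIPPET) -> str:
    """Insert `snippet` just before the last </body> tag (or append it if none)."""
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + snippet
    pos = matches[-1].start()
    return html[:pos] + snippet + html[pos:]


class InjectingProxyHandler(BaseHTTPRequestHandler):
    """Forward GET requests to UPSTREAM and rewrite only HTML responses."""

    def do_GET(self) -> None:
        # HTTP errors from upstream propagate as exceptions in this sketch.
        with urllib.request.urlopen(UPSTREAM + self.path) as resp:
            status = resp.status
            content_type = resp.headers.get("Content-Type", "")
            body = resp.read()
        if content_type.startswith("text/html"):
            # Assumes upstream serves UTF-8; other charsets would need decoding rules.
            html = body.decode("utf-8", errors="replace")
            body = inject_snippet(html).encode("utf-8")
        self.send_response(status)
        if content_type:
            self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, one could run `HTTPServer(("127.0.0.1", 8080), InjectingProxyHandler).serve_forever()` with `HTTPServer` imported from `http.server`. Keeping the customizations outside upstream means upgrading upstream is mostly a matter of pulling a new release.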

If you want to collaborate, all the code and setup is in Git and I can add
you to the project, but I see little to no value in building proprietary
tools or reinventing what other projects have done pretty well rather
than building on top of them.

[1] https://mako.cc/writing/hill-free_tools.html

-- Legoktm
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org


--
Amir (he/him)