Hi all,
I'm excited to see that Max has made a lot of great progress in adding Solr support to the GeoData extension so that we don't have to use MySQL for spatial search - https://gerrit.wikimedia.org/r/#/c/27610/
GeoData makes use of the Solarium PHP client, which is currently included as part of the extension. GeoData will be our second use of Solr, after the TranslationMemory extension, which is already deployed - https://www.mediawiki.org/wiki/Help:Extension:Translate/Translation_memories... the Wikidata team is working on using Solr in their extensions as well.
TranslationMemory also uses Solarium, a copy of which is also bundled with and loaded from the extension. For a loading and config example - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=bl...
I think Solr is the right direction for us to go in. Current efforts can pave the way for a complete refresh of WMF's article full text search as well as how our developers approach information retrieval. We just need to make sure that these efforts are unified, with commonality around the client api, configuration, indexing (preferably with updates asynchronously pushed to Solr in near real-time), and schema definition. This is important from an operational aspect as well, where it would be ideal to have a single distributed and redundant cluster.
It would be great to see the i18n, mobile tech, wikidata, and any other interested parties collaborate and agree on a path forward, with a quick sprint around common code that all can use.
-Asher
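For readers unfamiliar with Solr spatial search, the kind of request involved can be sketched roughly as follows. This is an illustration only, not code from the GeoData change: the core name "geodata", the field name "coord", and the helper build_geofilt_url are hypothetical, and the sketch assumes the field is indexed with a spatial type that Solr's geofilt filter supports. It is written in Python for brevity rather than the PHP the extensions use.

```python
from urllib.parse import urlencode


def build_geofilt_url(base_url, core, lat, lon, radius_km, sfield="coord", rows=10):
    """Build a Solr select URL that keeps only documents within
    radius_km kilometres of the point (lat, lon).

    The core and field names are placeholders, not GeoData's real ones.
    """
    params = {
        "q": "*:*",            # match everything, then filter spatially
        "fq": "{!geofilt}",    # Solr's circular distance filter
        "sfield": sfield,      # spatial field to filter on (assumed name)
        "pt": f"{lat},{lon}",  # centre point as "lat,lon"
        "d": radius_km,        # radius in kilometres
        "rows": rows,
        "wt": "json",
    }
    # With multiple cores, the core name is part of the request path.
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_geofilt_url("http://localhost:8983/solr", "geodata", 37.78, -122.4, 5))
```

Putting the core name in the path is also why per-core support in the client library matters once several extensions share one Solr cluster.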
Whee!
On 18.10.2012, 22:22 Asher wrote:
Hi all,
I'm excited to see that Max has made a lot of great progress in adding Solr support to the GeoData extension so that we don't have to use mysql for spatial search - https://gerrit.wikimedia.org/r/#/c/27610/
GeoData makes use of the Solarium PHP client, which is currently included as part of the extension. GeoData will be our second use of Solr, after the TranslationMemory extension, which is already deployed - https://www.mediawiki.org/wiki/Help:Extension:Translate/Translation_memories and the Wikidata team is working on using Solr in their extensions as well.
A little comment on my choice of client library: I initially tried to use http://php.net/solr but quickly discovered that it lacks many features, e.g. core support.
TranslationMemory also uses Solarium, a copy of which is also bundled with and loaded from the extension. For a loading and config example - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=bl...
I think Solr is the right direction for us to go in. Current efforts can pave the way for a complete refresh of WMF's article full text search as well as how our developers approach information retrieval.
We still need a Java developer to port our custom Lucene code to Solr in order to use Solr for wiki search.
We just need to make sure that these efforts are unified, with commonality around the client api, configuration, indexing (preferably with updates asynchronously pushed to Solr in near real-time), and schema definition. This is important from an operational aspect as well, where it would be ideal to have a single distributed and redundant cluster.
I've already discussed with Niklas the possibility of moving Solarium to a shared extension to keep things centralised. Guess we just need a repo set up to move forward.
It would be great to see the i18n, mobile tech, wikidata, and any other interested parties collaborate and agree on a path forward, with a quick sprint around common code that all can use.
+1000000
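As a rough illustration of the "updates asynchronously pushed to Solr in near real-time" idea discussed above, here is a minimal sketch, assuming a simple in-process queue of changed documents. It is not taken from any of the extensions in this thread; the function names, the field names, and the batch size are hypothetical. The only Solr-specific parts are the XML add message and its commitWithin attribute, which asks Solr to make the documents searchable within the given number of milliseconds.

```python
import queue
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms=5000):
    """Serialize a list of dicts into a Solr XML <add> message."""
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def drain_batch(pending, max_docs=100):
    """Take up to max_docs queued documents without blocking."""
    batch = []
    while len(batch) < max_docs:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put({"id": "1", "title": "Foo"})  # hypothetical fields
    print(build_add_message(drain_batch(pending), commit_within_ms=1000))
```

A background worker would POST each message to the core's update handler, so page saves never wait on Solr; how that worker is scheduled is left open here.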
Asher - great suggestion!
TranslationMemory also uses Solarium, a copy of which is also bundled with and loaded from the extension. For a loading and config example - https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-config.git;a=bl...
Niklas has been pretty satisfied with Solr's performance for TM. We are very interested in collaborating and working with you to make Solr more pervasive on our production infrastructure.
Cheers, Alolita
On Thu, Oct 18, 2012 at 11:46 AM, Max Semenik maxsem.wiki@gmail.com wrote:
-- Best regards, Max Semenik ([[User:MaxSem]])
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi guys,
as far as I know, there is also a project at the GESIS institute that couples Semantic MediaWiki and Solr to get cool faceted search. Simon Bachenberg (in Cc) will present this project soon at a conference.
http://semantic-mediawiki.org/wiki/SMWCon_Fall_2012/SolrStore
-----
Yury Katkov
On Thu, Oct 18, 2012 at 10:22 PM, Asher Feldman afeldman@wikimedia.org wrote:
On Thu, Oct 18, 2012 at 11:22:05AM -0700, Asher Feldman wrote:
I think Solr is the right direction for us to go in. Current efforts can pave the way for a complete refresh of WMF's article full text search as well as how our developers approach information retrieval. We just need to make sure that these efforts are unified, with commonality around the client api, configuration, indexing (preferably with updates asynchronously pushed to Solr in near real-time), and schema definition. This is important from an operational aspect as well, where it would be ideal to have a single distributed and redundant cluster.
I'm curious, has anyone evaluated ElasticSearch and whether it'd be more or less suitable for us than Solr? If so, I'd be very interested in the comparison results for our use cases.
Regards, Faidon
Faidon, FYI - the i18n eng team considered ElasticSearch but did not do a deep evaluation of it before selecting Solr.
-Alolita
On Thu, Oct 18, 2012 at 1:35 PM, Faidon Liambotis faidon@wikimedia.org wrote:
That is great to hear. Thanks for tying us together, Asher.
For Wikidata, we have not uploaded our Solr extension yet (mostly because we are waiting for the repository to be set up), but we will upload it soon once it is there. I would be especially interested in sharing schema and config snippets for the many languages we support, but as far as I can tell this is not so much a requirement for Max and maybe not even for TranslationMemory, not sure.
We also selected Solarium to connect to Solr. It is a bit worrisome that it is basically a one-person project, if I see this correctly, but the library seems small enough not to pose the risk of becoming too much of a maintenance legacy, I'd say -- especially compared to the alternatives.
Any preferences for Solr 3 vs. 4? It seems that 4 is the smarter choice, but we are having trouble getting 4 to run on labs. Solr 3 works pretty much out of the box, though.
Also, we should probably at some point consider how the different extensions and their dependencies should be handled. I'd prefer not to ship three different versions of Solarium with three extensions :)
Cheers, Denny
P.S.: Yury, regarding GESIS' Solr implementation, they have done some great work on using Solr as a store for the structured data in SMW. Funny thing is, they actually do not use Solr for the search itself! This work is somewhat relevant for Wikidata phase 3, but unfortunately quite irrelevant for TranslationMemory or GeoData.
2012/10/18 Asher Feldman afeldman@wikimedia.org:
On 19.10.2012, 1:56 Denny wrote:
That is great to hear. Thanks for tying us together, Asher.
For Wikidata, we have not uploaded our Solr extension yet (mostly because we are waiting for the repository to be set up), but we will upload it soon once it is there. I would be especially interested in sharing schema and config snippets for the many languages we support, but as far as I can tell this is not so much a requirement for Max and maybe not even for TranslationMemory, not sure.
Yes, GeoData doesn't use text search at all.
We also selected Solarium to connect to Solr. It is a bit worrisome that it is basically a one-person project, if I see this correctly, but the library seems small enough not to pose the risk of becoming too much of a maintenance legacy, I'd say -- especially compared to the alternatives.
Any preferences for Solr 3 vs. 4? It seems that 4 is the smarter choice, but we are having trouble getting 4 to run on labs. Solr 3 works pretty much out of the box, though.
At the moment, the only option for WMF is our custom-built 3.6.0, and when thinking about Solr 4 please remember that it's only 4.*0* :)
Another point: I hope everyone is happy with Jetty as a servlet container?
Also, we should probably at some point consider how the different extensions and their dependencies should be handled. I'd prefer not to ship three different versions of Solarium with three extensions :)
I've created a repo request at https://www.mediawiki.org/wiki/Git/Conversion/Extensions_queue#List