any open search engine for web project starting


carl hansen

19 Mar 2016

12:17 a.m.

https://about.commonsearch.org/

"We are building a nonprofit search engine for the Web"

Sounds a lot like Knowledge Engine, if there were such a thing. Any overlap with Wikimedia projects?


SarahSV

19 Mar

4:44 a.m.

On Fri, Mar 18, 2016 at 5:17 PM, carl hansen carlhansen1234@gmail.com wrote:

...


Thanks for the link, Carl. Erik and Lydia are advisors, so perhaps they could say a bit more about it.

https://about.commonsearch.org/people

Sarah

Erik Moeller

5:15 a.m.

2016-03-18 21:44 GMT-07:00 SarahSV sarahsv.wiki@gmail.com:

...


Sylvain has been working on this stuff for a while, blissfully ignorant of Wikimedia's discussions of search engines, rocketships and so on. He reached out to me shortly before the public announcement and we've talked a bit about governance, community & funding models. I've agreed to provide some continued advice along the way but have not otherwise been involved.

He recently posted on wikitech-l asking for suggestions on how Wikipedia/Wikidata could be integrated: https://lists.wikimedia.org/pipermail/wikitech-l/2016-March/084984.html

There's still a lot of heavy lifting to do before Common Search can become a viable project, even for narrowly defined purposes, but I think it's a very worthwhile effort. It is also, correctly in my view, based on the largest pre-existing open effort to index the web, the Common Crawl. This could lead to a mutually beneficial relationship between Common Search and Common Crawl. From a Wikimedia perspective, it might develop into an opportunity to jointly showcase some of the amazing stuff that Wikidata can already do.

Erik

Dan Garry

29 Mar

12:14 a.m.

A few of us from Discovery (myself, Tomasz Finc, Erik Bernhardson, and some others too) had the opportunity to meet Sylvain recently when he was in San Francisco. We chatted about touch points between Discovery and Common Search.

One important thing I personally learnt from chatting to Sylvain is that the challenges are *very* different for building an in-site search (like Discovery) and building a web search (like Common Search). Data that's critically important for one may be close to irrelevant for the other. Scaling issues for one may not even exist for the other. We did identify a few areas where Common Search may be creating datasets that would be useful for us in Discovery; Sylvain said he'd be in touch with us if that happens.

We're going to keep in touch and see if we can help each other out in the future.

Thanks, Dan

On 18 March 2016 at 22:15, Erik Moeller eloquence@gmail.com wrote:

...


Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines New messages to: Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation

Andreas Kolbe

12:48 a.m.

Dan,

I understand you are currently only working on internal search, representing stage 1 of this project. But does the long-term vision of the subsequent stages still include things like –

1. incorporation of non-Wikimedia sources in search results,
2. an open source knowledge engine like IBM's Watson (i.e. an answer engine based on structured data),
3. an open source search engine,
4. public curation of relevance (i.e. volunteer-based search results ranking)?

Items 1 and 4 are still mentioned in the Discovery FAQ[1] on MediaWiki; 1, 2 and 3 have been mentioned in recent on-wiki discussions by Jimmy Wales.[2] In fact, I see little in the Knowledge Engine grant agreement that is incompatible with the FAQ and those recent discussions.

...

From a fundraising point of view, I could fully understand why the WMF might consider it a desirable long-term goal to turn wikipedia.org into a high-traffic search and answer engine. I am not sure how successful such an attempt would be – internet users have a marked preference for one-stop shops, and it would take some really nifty features to entice users away from the established search and answer engines – but I do understand why the idea would be attractive.

Andreas

[1] https://www.mediawiki.org/wiki/Wikimedia_Discovery/FAQ

[2] https://en.wikipedia.org/wiki/User_talk:Jimbo_Wales

On Tue, Mar 29, 2016 at 12:14 AM, Dan Garry dgarry@wikimedia.org wrote:

...


Dan Garry

8:36 p.m.

On 28 March 2016 at 17:48, Andreas Kolbe jayen466@gmail.com wrote:

...

I understand you are currently only working on internal search, representing stage 1 of this project. But does the long-term vision of the subsequent stages still include things like –

incorporation of non-Wikimedia sources in search results,

Potentially. However, it's a vague idea, and we've got a long way to go before this could be seriously considered, so we're not actively working on it. Right now there is a lot of information even within Wikimedia sites that is surfaced poorly; surfacing such information is an important part of Discovery's narrative for FY2016-17 (July 2016 - June 2017), see https://www.mediawiki.org/wiki/Wikimedia_Discovery/FDC_Proposal. I intend for Discovery to tackle that problem first.

...

an open source knowledge engine like IBM's Watson (i.e. an answer engine based on structured data),

https://askplatyp.us/ does a pretty good job of this already, and it's backed by Wikidata. If you want to learn more, you can read the blog post on wikimedia.de https://blog.wikimedia.de/2015/02/23/platypus-a-speaking-interface-for-wikidata/ and check out the website of the creators https://projetpp.github.io/.

In the long term, I could see something like this being incorporated into search on our sites if it's good enough. Like the above, it's also a long way off, and we're not actively working on these efforts.

...

an open source search engine,

Clearly yes, because we're actively building a search engine for Wikipedia and our work is open source. If you actually mean "a general purpose web search engine", then this question is already in the FAQ https://www.mediawiki.org/wiki/Wikimedia_Discovery/FAQ#Are_you_building_a_search_engine_to_compete_with_Google.3F which you referenced, and the answer is no. I presently don't see how Discovery could offer something worthwhile to users here, especially with projects like Common Search already working on the problem.

...

public curation of relevance (i.e. volunteer-based search results ranking)?

Yes, if users are interested. This is an incredibly early idea that is not fully fleshed out; we don't know how we would achieve something like this right now. A naïve example of how we could do something like this is boosting the score of certain search results based on the presence of templates on the page. The reality would likely be something significantly more complex than this.

So, in short, many things are potentially on the table, but they're early ideas which are not actively being explored, and in exploring them we may decide not to do them. Sorry if that's not a definitive enough statement, but roadmaps are intentionally not set in stone, so that they can stay flexible and iterative.

Dan

-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation

Dan Garry

8:41 p.m.

On 29 March 2016 at 13:36, Dan Garry dgarry@wikimedia.org wrote:

...

Yes, if users are interested. This is an incredibly early idea that is not fully fleshed out; we don't know how we would achieve something like this right now. A naïve example of how we could do something like this is boosting the score of certain search results based on the presence of templates on the page. The reality would likely be something significantly more complex than this.

A detail I forgot to mention here is that this example is not hypothetical. I was referring to boost-templates https://www.mediawiki.org/wiki/Help:CirrusSearch#boost-templates:, which one can already use in CirrusSearch. This work predates the Discovery Department. So, in that sense, public curation of relevance already exists. My point was that building on this is not something we're working on right now.
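A minimal Python sketch of the general idea behind template-based boosting: each page's base relevance score is multiplied by a factor for every listed template the page transcludes, and the results are re-ranked. This is not how CirrusSearch actually implements boost-templates. The `Result` class, the `parse_boosts` and `apply_template_boosts` helpers, and the example pages and scores are all hypothetical, and the `Template:Name|NNN%` spec format is only assumed to resemble the syntax described on the CirrusSearch help page.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical search hit: page title, base relevance score, and the
    templates transcluded on the page."""
    title: str
    score: float
    templates: set[str] = field(default_factory=set)


def parse_boosts(spec: str) -> dict[str, float]:
    """Parse an assumed spec like 'Template:A|200% Template:B|50%' into
    {template name: multiplier}, e.g. 200% -> 2.0.
    Template names may contain spaces but not '|'."""
    pairs = re.findall(r"([^|]+?)\|(\d+(?:\.\d+)?)%", spec)
    return {name.strip(): float(pct) / 100.0 for name, pct in pairs}


def apply_template_boosts(results: list[Result],
                          boosts: dict[str, float]) -> list[Result]:
    """Return new Results whose scores are multiplied by the factor of every
    boosted template present on the page, sorted highest score first.
    The input list is left unchanged."""
    boosted = []
    for r in results:
        factor = 1.0
        for template, multiplier in boosts.items():
            if template in r.templates:
                factor *= multiplier
        boosted.append(Result(r.title, r.score * factor, r.templates))
    return sorted(boosted, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Hypothetical pages and base scores, for illustration only.
    hits = [
        Result("Platypus", 1.0, {"Template:Stub"}),
        Result("Echidna", 0.8, {"Template:Featured article"}),
        Result("Wombat", 0.9),
    ]
    boosts = parse_boosts("Template:Featured article|200% Template:Stub|50%")
    for hit in apply_template_boosts(hits, boosts):
        print(f"{hit.title}: {hit.score:.2f}")
    # Prints: Echidna: 1.60, Wombat: 0.90, Platypus: 0.50 (one per line)
```

Multiplying rather than adding keeps each boost proportional to the page's existing relevance, so a template can reorder comparably relevant results without lifting an unrelated page to the top. As noted above, a real ranking system would likely be significantly more complex.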

Dan

-- Dan Garry Lead Product Manager, Discovery Wikimedia Foundation


wikimedia-l@lists.wikimedia.org

6 comments

5 participants


participants (5)

Andreas Kolbe
carl hansen
Dan Garry
Erik Moeller
SarahSV