Dear Thomas,

On Sat, Oct 8, 2016 at 12:07 PM, Thomas Douillard <thomas.douillard@gmail.com> wrote:
Probably a silly question but ... did you all consider creating a datatype for molecule representation? This seems to be a very similar use case to mathematical formulas. Essentially we're not dealing with a raw string but with a representation of molecular formulas, with its own encoding ...

The InChI is actually not a structural representation, but a derived unique identifier.

What you propose would, however, apply to the SMILES. That one is generally of about the same size as the InChI, and there your solution sounds like a great idea!

Egon
 
Changing the limit seems to be a poor workaround compared to a dedicated datatype - nobody seems to have found a relevant use case and it seems to me that we're essentially abusing strings for storing blobs ...

2016-10-08 11:33 GMT+02:00 Egon Willighagen <egon.willighagen@gmail.com>:


On Sat, Oct 8, 2016 at 11:28 AM, Lydia Pintscher <lydia.pintscher@wikimedia.de> wrote:
On Sat, Oct 8, 2016 at 11:23 AM, Egon Willighagen
<egon.willighagen@gmail.com> wrote:
> Ah, those numbers are for https://www.wikidata.org/wiki/Property:P234 ...

External identifier then. Cool. And for string like in
https://www.wikidata.org/wiki/Property:P233? Sebastian's initial email 
says 1500 to 2000. Is this still a good number after this discussion?

Yes, that would cover more than 99.9% of all InChIs in PubChem. (See Sebastian's reply earlier in this thread.)

Egon

--
E.L. Willighagen
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
ORCID: 0000-0001-7542-0286
ImpactStory: https://impactstory.org/u/egonwillighagen

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata






