Hi Leila,
I can point you to two methods: CL-ESA and CL-CNG.
Cross-Language Explicit Semantic Analysis (CL-ESA): http://www.uni-weimar.de/medien/webis/publications/papers/stein_2008b.pdf
This model allows for language-independent comparison of texts without relying on parallel corpora or translation dictionaries for training. Instead, it exploits the cross-language links between Wikipedia articles to embed documents from two or more languages in a joint vector space, rendering them directly comparable, e.g., via cosine similarity. The more language links exist between two Wikipedia language editions, the higher the dimensionality of the joint vector space can be made, and the better a cross-language ranking will perform. At the document level, near-perfect recall on a ranking task is achieved at 100,000 dimensions (= articles linked across languages); see Table 2 of the paper. The model is easy to implement, but somewhat expensive to compute.
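In case it helps, here is a minimal sketch of the CL-ESA idea in Python. It assumes you have already fetched, per language, the texts of the same ordered set of concept articles (aligned via Wikipedia's language links); the plain term-frequency weighting and all function names are my simplifications, not the paper's exact setup (ESA is usually built on TF-IDF weights):

import math
from collections import Counter

def tf_vector(text):
    # Bag-of-words term frequencies; a stand-in for proper
    # tokenization and TF-IDF weighting.
    return Counter(text.lower().split())

def cosine(u, v):
    # Cosine similarity of two sparse vectors given as dicts.
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def esa_vector(text, concept_articles):
    # One dimension per Wikipedia concept article: the text's
    # similarity to that article. The article order must be the
    # same in every language (aligned via language links).
    doc = tf_vector(text)
    return [cosine(doc, tf_vector(a)) for a in concept_articles]

def cl_esa_similarity(text_a, concepts_a, text_b, concepts_b):
    # Compare two texts from different languages in the joint
    # concept space.
    va = esa_vector(text_a, concepts_a)
    vb = esa_vector(text_b, concepts_b)
    dot = sum(x * y for x, y in zip(va, vb))
    na = math.sqrt(sum(x * x for x in va))
    nb = math.sqrt(sum(x * x for x in vb))
    return dot / (na * nb) if na and nb else 0.0

The cost mentioned above shows up in esa_vector: every document is compared against all (up to 100,000) concept articles, which is what makes the model expensive to compute.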
Cross-Language Character N-Gram model (CL-CNG): In subsequent experiments, we compared the model with two alternatives: one that is trained on a parallel corpus, and another that simply exploits the lexical overlap of character N-grams between pairs of documents from different languages: http://www.uni-weimar.de/medien/webis/publications/papers/stein_2011b.pdf
As it turns out, CL-C3G (i.e., N=3) is extremely effective, too, on language pairs that share an alphabet and where lexical overlap can be expected, e.g., because the languages have a common ancestor. So it works very well for German-Dutch, but less so for English-Russian; in the latter case, CL-ESA still works. The CL-CNG model is even easier to implement and scales very well. Depending on the language pairs you are investigating, this model may help a great deal.
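For comparison, a sketch of CL-C3G along the same lines; treating the character n-gram profiles as vectors and comparing them by cosine similarity is one common choice, and the minimal preprocessing here (lowercasing only) is my simplification:

import math
from collections import Counter

def char_ngrams(text, n=3):
    # Character n-gram counts; spaces are kept so that word
    # boundaries contribute to the profile.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cl_cng_similarity(text_a, text_b, n=3):
    # Cosine similarity of character n-gram profiles; n=3 gives CL-C3G.
    a, b = char_ngrams(text_a, n), char_ngrams(text_b, n)
    dot = sum(a[g] * b.get(g, 0) for g in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# German-Dutch shares many trigrams ("uni", "ver", "sit", ...),
# so this scores well above zero; an English-Russian pair would
# score near zero since the alphabets barely overlap.
print(cl_cng_similarity("Die Universität ist alt.", "De universiteit is oud."))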
Perhaps these models will be of use when building a cross-language alignment tool.
Best, Martin