Detecting lexical entailment plays a fundamental role in a variety of natural language processing tasks and is key to language understanding. Unsupervised methods still play an important role due to the lack of coverage of lexical databases in some domains and languages. Most of the previous approaches were either based on statistical hypothesis of specific entailment relations or tried to encode word relations in low-dimensional vector embeddings. This thesis builds upon one of the few approaches which intrinsically model entailment in a vector space. We then further generalize this model by introducing an alternative, distributional representations for words which harnesses tools from optimal transport to define distance or entailment measures between such representations. We evaluated the models on hypernymy detection where our distributional estimate significantly improves over the underlying model and even outperforms state-of-the-art on some datasets.