Unmanned Aerial Vehicles are becoming increasingly popular for a broad variety of tasks ranging from aerial imagery to object delivery. As the range of areas in which drones can be used efficiently expands, the risk of collision with other flying objects increases. Avoiding such collisions would be a relatively easy task if all aircraft in the neighboring airspace could communicate with each other and share their location information. However, it is often the case that either location information is unavailable (e.g. when flying in GPS-denied environments) or communication is not possible (e.g. because of different communication channels or a non-cooperative flight scenario). To ensure flight safety in such situations, drones need a way to autonomously detect other objects that intrude into the neighboring airspace. Vision-based collision avoidance is of particular interest, as cameras generally consume less power and are more lightweight than active sensor alternatives such as radars and lasers. We have therefore developed a set of increasingly sophisticated algorithms to provide drones with a visual collision avoidance capability. First, we present a novel method for detecting flying objects such as drones and planes that occupy a small part of the camera field of view, possibly move in front of complex backgrounds, and are filmed by a moving camera. Solving this problem requires combining motion and appearance information, as neither of the two alone is capable of providing sufficiently reliable detections. We therefore propose a machine learning technique that operates on spatio-temporal cubes of image intensities in which individual patches are aligned using an object-centric, regression-based motion stabilization algorithm. Second, in order to reduce the need to collect a large training dataset and to manually annotate it, we introduce a way to generate realistic synthetic images.
Given only a small set of real examples and a coarse 3D model of the object, synthetic data can be generated in arbitrary quantities and then used to supplement real examples for training a detector. The key idea of our method is that the synthetically generated images need to be as close as possible to the real ones, not in terms of image quality, but in terms of the features used by the machine learning algorithm. Third, although the aforementioned approach yields a substantial increase in performance when using AdaBoost and DPM detectors, it does not generalize well to Convolutional Neural Networks (CNNs), which have become the state of the art. This happens because, as we add more and more synthetic data, the CNNs begin to overfit to the synthetic images at the expense of the real ones. We therefore propose a novel deep domain adaptation technique that makes it possible to combine real and synthetic images efficiently without overfitting to either of the two. Most adaptation techniques aim at learning features that are invariant to the possible differences between images coming from different sources (real and synthetic). In contrast to those methods, we propose modeling this difference with a special two-stream architecture. We evaluate our approach on three different datasets and show its effectiveness for various classification and regression tasks.
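To illustrate the spatio-temporal cubes of image intensities mentioned in the first contribution, the following is a minimal sketch and not the implementation used in this work. It assumes grayscale frames stored as a NumPy array of shape (T, H, W). The function name `extract_st_cube` and its parameters are hypothetical, and the motion stabilization step is deliberately omitted: the same window is simply cropped from every frame.

```python
import numpy as np


def extract_st_cube(frames, center, size):
    """Crop a (T, size, size) spatio-temporal cube around `center`.

    Hypothetical helper for illustration only. `frames` is assumed to be
    a (T, H, W) array of grayscale frames, `center` a (row, col) pixel
    position, and `size` the side length of the square patch.
    No motion stabilization is performed here.
    """
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("frames must have shape (T, H, W)")

    row, col = center
    half = size // 2
    top, left = row - half, col - half
    bottom, right = top + size, left + size

    _, height, width = frames.shape
    if top < 0 or left < 0 or bottom > height or right > width:
        raise ValueError("cube extends beyond the image borders")

    # The same spatial window is taken from every frame and stacked in time.
    return frames[:, top:bottom, left:right].copy()


# Toy input: 5 synthetic frames of 20 x 20 pixels.
video = np.arange(5 * 20 * 20, dtype=np.float32).reshape(5, 20, 20)
cube = extract_st_cube(video, center=(10, 10), size=8)
print(cube.shape)
```

In the setting described above, each patch would additionally be re-aligned on the object before stacking, so that the object stays centered throughout the cube. A cube of this kind is what a classifier combining motion and appearance cues would take as input.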