Various unsupervised learning algorithms, including GMM, Birch, Mean-Shift, K-means, and DBSCAN, were used to cluster the image and depth sensor data. However, the clusters could not be fitted to the individual objects, because the data are too noisy and the components are too small and too close to each other. Therefore, in this project the capability of inferring small assembly objects is demonstrated by combining unsupervised machine learning with computer vision algorithms. It is shown that vision techniques can assign a contour around each object. Next, the object type is inferred by fitting a Gaussian model to each detected contour. Then, the effect of noise on the results is reduced considerably by means of image registration. Finally, after reducing the noise, it was possible to infer which pairs of objects had been merged together. Nonetheless, the hyper-parameters of the final proposed algorithm need to be tuned for any new scenario. These hyper-parameters include the threshold for removing the contours detected for the holes inside the hollowed gear, the voxel size for down-sampling, the ratio of the maximum distance between point clouds used for noise reduction, and the number of points for the KNN search used in the FPFH feature calculation.
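The step of inferring the object type from a Gaussian model can be sketched as follows. This is a minimal NumPy illustration, not the project's actual pipeline: the two synthetic point sets stand in for points sampled inside two detected contours, and the elongation threshold is a made-up illustrative value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for points inside two detected contours:
# an elongated "shaft"-like blob and a round "gear"-like blob.
shaft = rng.normal(0.0, 1.0, size=(500, 2)) * np.array([4.0, 0.5])
gear = rng.normal(0.0, 1.0, size=(500, 2)) * np.array([1.5, 1.5])

def gaussian_shape_features(points):
    """Fit a single Gaussian (mean + covariance) to the points of one
    contour and summarize its shape with the covariance eigenvalues."""
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # major, minor axis variance
    elongation = eigvals[0] / eigvals[1]              # ~1.0 = circular, >>1 = elongated
    return mean, eigvals, elongation

_, _, e_shaft = gaussian_shape_features(shaft)
_, _, e_gear = gaussian_shape_features(gear)

# A crude type rule on elongation (threshold is illustrative only).
label = lambda e: "elongated part" if e > 4.0 else "round part"
print(label(e_shaft), label(e_gear))
```

In a real system the per-contour Gaussian parameters (area, eccentricity, orientation) would feed a proper classifier rather than a single hand-set threshold.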
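Two of the listed hyper-parameters, the voxel size for down-sampling and the maximum-distance ratio for noise rejection, can be illustrated with a small NumPy sketch. This reimplements the idea from scratch under stated assumptions (uniform voxel averaging, brute-force nearest neighbours); a production pipeline would typically use a point-cloud library such as Open3D instead.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Average all points that fall into the same voxel (one point per voxel)."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    np.add.at(sums, inverse, points)
    counts = np.bincount(inverse).reshape(-1, 1)
    return sums / counts

def reject_far_points(source, target, max_dist_ratio):
    """Drop source points whose nearest target point is farther than
    max_dist_ratio times the source cloud's bounding-box diagonal."""
    diag = np.linalg.norm(source.max(axis=0) - source.min(axis=0))
    threshold = max_dist_ratio * diag
    # Brute-force nearest-neighbour distances (fine for small clouds).
    d = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=2).min(axis=1)
    return source[d <= threshold]

rng = np.random.default_rng(1)
cloud = rng.uniform(0.0, 1.0, size=(2000, 3))
down = voxel_downsample(cloud, voxel_size=0.2)  # 5x5x5 grid -> at most 125 points
print(down.shape)
```

Both parameters trade detail for robustness: a larger voxel size smooths away small components, and a tighter distance ratio discards more of the noisy overlap between registered clouds.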
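The hole-removal threshold can be pictured as a simple area test on the detected contours: contours whose enclosed area falls below the threshold are treated as holes of the hollowed gear rather than separate objects. The sketch below uses the shoelace formula on hypothetical circular contours; the threshold value and contour shapes are illustrative assumptions, not values from the project.

```python
import numpy as np

def polygon_area(contour):
    """Shoelace area of a closed contour given as an (N, 2) vertex array."""
    x, y = contour[:, 0], contour[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

# Hypothetical contours: the gear outline versus a small hole inside it.
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
outline = np.c_[10.0 * np.cos(theta), 10.0 * np.sin(theta)]
hole = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]

AREA_THRESHOLD = 20.0  # illustrative value; tuned per scenario, as noted above
contours = [outline, hole]
kept = [c for c in contours if polygon_area(c) >= AREA_THRESHOLD]
print(len(kept))  # only the gear outline survives
```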