We will explore dynamic perception following the visually guided grasping of several objects by a human-like autonomous robot. This competency serves for object categorization. Physical interaction with the hand-held object gives the neural network of the robot the rich, coherent and multi-modal sensory input. Multi-layered self-organizing maps are designed and examined in static and dynamic conditions. The results of the tests in the former condition show its capability of robust categorization against noise. The network also shows better performance than a single-layered map does. In the latter condition we focus on shaking behavior by moving only the forearm of the robot. In some combinations of grasping style and shaking radius the network is capable of categorizing two objects robustly. The results show that the network capability to achieve the task largely depends on how to grasp and how to move the objects. These results together with a preliminary simulation are promising toward the self-organization of a high degree of autonomous dynamic object categorization.