An integrated framework is proposed in which local perception and close manipulation skills are combined with a high-level behavioral interface based on a "smart object" paradigm, so that virtual agents can perform tasks autonomously. In our model, virtual "smart objects" encapsulate information about their possible interactions with agents, including sub-tasks defined by scripts that the agent can perform. Information provided by low-level sensing mechanisms (based on a simulated retina) is used to construct a set of local perceptual features, with which possible target objects are categorized at run time. Once an object is identified, its associated smart object representation can be retrieved, and a predefined interaction may be selected if the agent's current mission, defined in a global plan script, requires it. A challenging problem addressed here is the construction (abstraction) of a mechanism that links individual perceptions to actions and that can exhibit some human-like behavior because perception relies on a simulated retina. As a practical result, virtual agents are capable of acting with more autonomy, which enhances their performance.
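The perception-to-action chain summarized above (perceptual features, run-time categorization, retrieval of the smart object, selection of an interaction required by the plan) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the framework itself: the `SmartObject` class, the two-dimensional feature vectors standing in for retina-derived features, the nearest-prototype rule used for categorization, and the `max_distance` threshold.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class SmartObject:
    """Hypothetical smart object: a feature prototype plus scripted interactions."""
    name: str
    prototype: Tuple[float, ...]  # assumed stand-in for retina-derived features
    interactions: Dict[str, List[str]] = field(default_factory=dict)


def categorize(features: Sequence[float],
               registry: Sequence[SmartObject],
               max_distance: float = 0.5) -> Optional[SmartObject]:
    """Return the smart object whose prototype is nearest to the perceived
    features, or None if nothing is close enough (assumed nearest-prototype rule)."""
    best = min(registry, key=lambda obj: dist(features, obj.prototype), default=None)
    if best is None or dist(features, best.prototype) > max_distance:
        return None
    return best


def select_interaction(features: Sequence[float],
                       registry: Sequence[SmartObject],
                       plan: Sequence[str]):
    """Identify the perceived object, then pick the first interaction that the
    global plan requires and that the object actually offers."""
    obj = categorize(features, registry)
    if obj is None:
        return None
    for step in plan:
        if step in obj.interactions:
            return obj.name, step, obj.interactions[step]
    return None


# Illustrative registry; object names and sub-task scripts are made up.
REGISTRY = [
    SmartObject("door", (0.9, 0.1), {"open": ["walk_to_handle", "grasp_handle", "pull"]}),
    SmartObject("cup", (0.2, 0.8), {"grasp": ["reach", "close_hand", "lift"]}),
]

if __name__ == "__main__":
    print(select_interaction((0.85, 0.15), REGISTRY, plan=["open"]))
```

In this sketch an interaction is returned only when the perceived object is recognized and the plan asks for something that object supports, which mirrors the condition that a predefined interaction is selected only if the current mission requires it.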