The slowing down of Moore's law and the emergence of new technologies puts an increasing pressure on the field of EDA. There is a constant need to improve optimization algorithms. However, finding and implementing such algorithms is a difficult task, especially with the novel logic primitives and potentially unconventional requirements of emerging technologies. In this paper, we cast logic optimization as a deterministic Markov decision process (MDP). We then take advantage of recent advances in deep reinforcement learning to build a system that learns how to navigate this process. Our design has a number of desirable properties. It is autonomous because it learns automatically and does not require human intervention. It generalizes to large functions after training on small examples. Additionally, it intrinsically supports both single-and multioutput functions, without the need to handle special cases. Finally, it is generic because the same algorithm can be used to achieve different optimization objectives, e. g., size and depth.