This paper presents a method for monitoring activities at a ticket vending machine in a video-surveillance context. Rather than relying on the output of a tracking module, which is prone to errors, events are recognized directly from image measurements, so no tracking is required. A layered statistical approach is proposed: in the first layer, several sub-events are defined and detected using a discriminative approach; the second layer takes the output of the first and models the temporal relationships of the high-level event with a Hidden Markov Model (HMM). Results are assessed on 3.5 hours of real video footage recorded at a Turin metro station.
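To make the two-layer structure concrete, the sketch below chains a per-frame sub-event detector into an HMM scorer. It is illustrative only and makes several assumptions not stated in the abstract. The first layer is stood in for by a hypothetical linear classifier (`detect_sub_event`) over made-up per-frame features. The three sub-event labels (0 = idle, 1 = person at machine, 2 = hand near slot), the three hidden states (approach, operate, leave), and all probabilities are invented for the example. The second layer scores a sub-event sequence with the scaled forward algorithm, which is one possible way of using the HMM and not necessarily the inference used in the paper.

```python
import math
from typing import List, Sequence

# Layer 1 (hypothetical stand-in): a linear discriminative classifier that maps
# a per-frame feature vector to the index of the highest-scoring sub-event label.
def detect_sub_event(features: Sequence[float],
                     weights: List[List[float]],
                     biases: List[float]) -> int:
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

# Layer 2: scaled forward algorithm returning log P(obs | HMM).
# start[i]: initial state probs, trans[j][i]: P(state i | state j),
# emit[i][k]: P(sub-event k | state i).
def hmm_log_likelihood(obs: Sequence[int], start: List[float],
                       trans: List[List[float]], emit: List[List[float]]) -> float:
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * trans[j][i] for j in range(n)) * emit[i][obs[t]]
                     for i in range(n)]
        scale = sum(alpha)          # P(obs[t] | obs[:t])
        if scale == 0.0:
            return float("-inf")    # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # renormalize to avoid underflow
        log_lik += math.log(scale)
    return log_lik

# Hypothetical left-to-right "ticket purchase" model: approach -> operate -> leave.
START = [1.0, 0.0, 0.0]
TRANS = [[0.7, 0.3, 0.0],
         [0.0, 0.8, 0.2],
         [0.0, 0.0, 1.0]]
EMIT = [[0.2, 0.7, 0.1],    # approach: mostly "person at machine"
        [0.05, 0.35, 0.6],  # operate: mostly "hand near slot"
        [0.6, 0.3, 0.1]]    # leave: mostly "idle"

# Hypothetical layer-1 parameters over features [presence, hand_activity].
W = [[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0]]
B = [0.5, 0.0, -0.5]

if __name__ == "__main__":
    frames = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1], [0, 0]]
    obs = [detect_sub_event(f, W, B) for f in frames]
    print(obs, hmm_log_likelihood(obs, START, TRANS, EMIT))
```

In such a setup, the log-likelihood under the event model would typically be compared against a background or competing model (or thresholded) to decide whether the high-level event occurred; the left-to-right transition matrix encodes the expected temporal ordering of sub-events.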