The implementation of the AES encryption core by Moradi et al. at Eurocrypt 2011 is one of the smallest in terms of gate area. The circuit takes around 2400 gates and operates on an 8 bit datapath. However this is an encryption only core and unable to cater to block cipher modes like CBC and ELmD that require access to both the AES encryption and decryption modules. In this paper we look to investigate whether the basic circuit of Moradi et al. can be tweaked to provide dual functionality of encryption and decryption (ENC/DEC) while keeping the hardware overhead as low as possible. We report two constructions of the AES circuit. The first is an 8-bit serialized implementation that provides the functionality of both encryption and decryption and occupies around 2605 GE with a latency of 226 cycles. This is a substantial improvement over the next smallest AES ENC/DEC circuit (Grain of Sand) by Feldhofer et al. which takes around 3400 gates but has a latency of over 1000 cycles for both the encryption and decryption cycles. In the second part, we optimize the above architecture to provide the dual encryption/decryption functionality in only 2227 GE and latency of 246/326 cycles for the encryption and decryption operations respectively. We take advantage of clock gating techniques to achieve Shiftrow and Inverse Shiftrow operations in 3 cycles instead of 1. This helps us replace many of the scan flip-flops in the design with ordinary flip-flops.Furthermore we take advantage of the fact that the Inverse Mixcolumn matrix in AES is the cube of the Forward Mixcolumn matrix. Thus by executing the Forward Mixcolumn operation three times over the state, one can achieve the functionality of Inverse Mixcolumn. This saves some more gate area as one is no longer required to have a combined implementation of the Forward and Inverse Mixcolumn circuit.