When it comes to performance, embedded systems share many problems with their higher-end counterparts. The growing gap between peak processor frequency and memory access speed, known as the memory wall, is one such problem. Driven in part by low energy consumption and low cost requirements, embedded systems are often customized to a single application, or to a very small set of applications. In addition, time-to-market requirements and the increasing complexity of embedded systems drive the need for fully or partially automated design tools and lead to the extensive use of caches and cache hierarchies. The introduction of multi-processor-based embedded platforms has accelerated this trend; as the design space for embedded systems has grown, it has become unclear whether automatic processor customization tools can cope with this increased complexity. The recent introduction of new techniques for automatic customization, such as Instruction Set Extension (ISE) identification algorithms enhanced with Architecturally Visible Storage (AVS) memories, has also created new challenges. AVS memories are distinct from the cache hierarchy and rely on Direct Memory Access (DMA) transfers to communicate with main memory. In an embedded system containing hardware-managed caches, these extra AVS memories, in combination with their corresponding DMA transfers, cause coherence and consistency problems. Although the problems of coherence and consistency are well known in multi-processor systems, conventional solutions may be expensive in terms of area and power consumption, rendering them unacceptable for use in embedded systems. This thesis presents two low-cost coherence mechanisms that solve these two problems. The first mechanism addresses embedded systems that already contain a hardware coherence protocol, as many high-end embedded multi-processor systems do. Traditionally, DMA transactions are transparent to the hardware coherence protocol.
By making these DMA transactions visible to the hardware coherence protocol, coherence can be guaranteed between the AVS memories and the data cache(s); only minor changes to the DMA engine are required. Moreover, by forcing the processor pipeline to stall while a DMA transfer is active, memory consistency can be guaranteed. This mechanism provides significant speedup compared to execution on a non-ISE-enhanced system; however, due to the increase in bus traffic, this speedup comes at the expense of increased energy consumption. Coherent DMA and Speculative DMA are both implementations of this mechanism. Single-processor systems do not contain hardware coherence protocols, and would therefore benefit from a solution to the coherence and consistency problems that costs less than a hardware coherence protocol. By tightly coupling the AVS memories to the hardware cache, coherence and consistency can be guaranteed for the complete system. This coupling requires only minor changes to the hardware cache's hit detection circuitry and state machine, without affecting its critical path; it is therefore inherently inexpensive. This mechanism provides significant speedups and reduces energy consumption compared to execution on non-ISE-enhanced systems. Furthermore, the tight coupling enables direct communication between the AVS memories and the data cache, making this mechanism independent of the processor-to-memory distance. Virtual Ways and Way Stealing are both implementations of this mechanism. Besides enforcing coherence and consistency, the ability to integrate the architectural changes into an automated design flow is important. This thesis shows the influence of Coherent DMA, Speculative DMA, Virtual Ways, and Way Stealing on the ISE-identification algorithm.
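To make the tight-coupling idea concrete, the following is a minimal, illustrative sketch in C of hit detection extended with one "stolen" way, in the spirit of Way Stealing. It is a toy software model, not the thesis's implementation; the structure sizes and names (`avs_base_tag`, `stolen`) are hypothetical. The point it illustrates is that the extra AVS check is just one more tag comparison, which in hardware runs in parallel with the ordinary compares and so leaves the hit-detection critical path unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WAYS 2
#define NUM_SETS 4   /* toy sizes, for illustration only */

/* Per-way tag store entry with a valid bit. */
typedef struct {
    uint32_t tag;
    bool     valid;
} way_t;

typedef struct {
    way_t    ways[NUM_SETS][NUM_WAYS];
    /* One way per set may be "stolen" as AVS storage; when stolen,
       its ordinary tag compare is replaced by an AVS-region match. */
    bool     stolen[NUM_SETS];
    uint32_t avs_base_tag;   /* hypothetical tag of the AVS region */
} cache_t;

/* Hit detection: the usual parallel tag compares, plus at most one
   extra compare for the stolen way. */
bool cache_hit(const cache_t *c, uint32_t set, uint32_t tag)
{
    for (int w = 0; w < NUM_WAYS; w++) {
        if (c->stolen[set] && w == NUM_WAYS - 1) {
            /* Stolen way: hits when the address falls in the AVS region. */
            if (tag == c->avs_base_tag)
                return true;
        } else if (c->ways[set][w].valid && c->ways[set][w].tag == tag) {
            return true;
        }
    }
    return false;
}
```

Because the stolen way answers lookups through the same hit logic as the ordinary ways, AVS data and cached data can never hold conflicting copies of the same address, which is what makes coherence and consistency fall out of the coupling itself.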
It identifies the architectural requirements, and the costs of enforcing coherence and consistency, that must be taken into account when applying these mechanisms in an automated flow, without formulating new algorithms.
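For concreteness, the two ingredients of the first mechanism — making DMA writes visible to the cache and stalling loads while a transfer is in flight — can be sketched as follows. This is an illustrative toy model (a direct-mapped cache with hypothetical names such as `dma_active`), not the Coherent DMA or Speculative DMA implementation described in the thesis.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINES 8   /* toy direct-mapped cache, for illustration only */

typedef struct {
    uint32_t tag;
    bool     valid;
} line_t;

typedef struct {
    line_t line[LINES];
    bool   dma_active;   /* set while a DMA transfer is in flight */
} system_t;

/* Coherence: each DMA write into the AVS region is also presented to
   the cache, invalidating any stale cached copy of that address. */
void dma_write_line(system_t *s, uint32_t addr)
{
    uint32_t idx = addr % LINES;
    uint32_t tag = addr / LINES;
    if (s->line[idx].valid && s->line[idx].tag == tag)
        s->line[idx].valid = false;   /* snoop-invalidate stale copy */
    /* ...the DMA engine then writes the data to the AVS memory... */
}

/* Consistency: the pipeline must stall memory operations while a DMA
   transfer is active. */
bool load_may_issue(const system_t *s)
{
    return !s->dma_active;
}
```

The invalidate-on-DMA-write step is what the small change to the DMA engine buys: the coherence protocol sees the transfer and no stale line can survive it, at the price of the extra bus traffic noted above.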