Hello. I'm wondering if anyone has found a way to implement a fast memcpy using load multiple/store multiple (LDM/STM) on an FPGA memory-mapped as a GPMC device. I've implemented a fast memcpy on ARMv7 using load/store multiple like this:
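@ r0 = destination, r1 = source, r4 = number of 16-byte blocks
loop:
ldmia r1!, { r5-r8 } @ load four registers from [r1], post-increment r1
stmia r0!, { r5-r8 } @ store four registers to [r0], post-increment r0
loop_test:
cmp r4, #0 @ any blocks left?
subne r4, r4, #1 @ if so, decrement the counter (flags unchanged)
bne loop @ and copy the next block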
But for this to work, the source and destination accesses must be 32 bits wide.
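For reference, here is a rough C model of what the ldmia/stmia loop does. It is a sketch under assumptions: the loop is entered at `loop_test` (so exactly `blocks` 16-byte blocks are copied), `dst`/`src`/`blocks` stand in for r0/r1/r4, and the name `copy_words_x4` is hypothetical. Each block is four 32-bit word reads followed by four 32-bit word writes, which is why the target has to accept 32-bit accesses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the ldmia/stmia copy loop.
 * dst ~ r0, src ~ r1, blocks ~ r4, t0..t3 ~ r5-r8.
 * Assumes the loop is entered at loop_test, so `blocks` 16-byte
 * blocks are copied in total. */
static void copy_words_x4(uint32_t *dst, const uint32_t *src, size_t blocks)
{
    /* LDM/STM require word-aligned addresses; check that here. */
    assert(((uintptr_t)dst & 3u) == 0 && ((uintptr_t)src & 3u) == 0);

    while (blocks != 0) {        /* cmp r4, #0 / bne loop */
        blocks--;                /* subne r4, r4, #1 */

        uint32_t t0 = src[0];    /* ldmia r1!, { r5-r8 }: four 32-bit reads */
        uint32_t t1 = src[1];
        uint32_t t2 = src[2];
        uint32_t t3 = src[3];
        src += 4;

        dst[0] = t0;             /* stmia r0!, { r5-r8 }: four 32-bit writes */
        dst[1] = t1;
        dst[2] = t2;
        dst[3] = t3;
        dst += 4;
    }
}
```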
Does anyone know of a way to implement load/store multiple on ARM using 16-bit accesses?
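LDM/STM only transfer 32-bit words, so there is no 16-bit form of them. One fallback, assuming the FPGA must only ever see 16-bit accesses, is an unrolled loop of halfword loads and stores through `volatile` pointers. On ARM a compiler will typically emit LDRH/STRH for these, and `volatile` stops it from merging them into wider accesses. The function name `copy_halfwords_x4` and the unroll factor of four are illustrative choices, not a known GPMC-specific technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copy `n` halfwords using only 16-bit accesses.
 * Four halfwords are read into locals and then written back, mirroring
 * the four-register grouping of the ldmia/stmia loop. */
static void copy_halfwords_x4(volatile uint16_t *dst,
                              const volatile uint16_t *src, size_t n)
{
    /* Halfword accesses need 2-byte-aligned addresses. */
    assert(((uintptr_t)dst & 1u) == 0 && ((uintptr_t)src & 1u) == 0);

    while (n >= 4) {
        uint16_t h0 = src[0];   /* four 16-bit reads */
        uint16_t h1 = src[1];
        uint16_t h2 = src[2];
        uint16_t h3 = src[3];
        dst[0] = h0;            /* four 16-bit writes */
        dst[1] = h1;
        dst[2] = h2;
        dst[3] = h3;
        src += 4;
        dst += 4;
        n -= 4;
    }
    while (n != 0) {            /* remaining 0..3 halfwords */
        *dst++ = *src++;
        n--;
    }
}
```

Unlike the ldmia/stmia version, this issues one load or store instruction per halfword, so it trades instruction count for a guaranteed 16-bit access width.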