The same algorithm, using the same data, is running on both the C6455 and the DM642, but the C6455 takes 3 times longer than the DM642.
C6455: CPU clk = 1 GHz (confirmed); memory used: on-chip IRAM (L2); DSP/BIOS used, 1000 microseconds/Int
DM642: CPU clk = 600 MHz; memory used: off-chip SDRAM; DSP/BIOS used, 1000 microseconds/Int
In theory, the C6455 should be faster than the DM642. I don't know why the result is the opposite.
PS: My project is based on the dsk6455 example 'dsp_app.prj'.