Why is my simulation different to my target in regard to "address endianness"? #251
Description
This is probably more of a "support" question. :S
Background
I am trying to learn how I can stream data from SDRAM to something else, probably running at a lower clock rate.
Right now I am loading some test data into memory with boot.json, together with my slightly modified bare-metal demo.bin. The modification adds a small function that dumps the contents of memory so that I can verify it looks as I expect. The code snippet looks as follows:
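#include <stdint.h>
#include <stdio.h>

/* Dump 124 words of test data starting at 0x41000000.
 * Each 32-bit word holds I in bits 15:0 and Q in bits 31:16. */
static void dump_data(void)
{
    const uint32_t base_addr = 0x41000000;
    volatile const uint32_t *data = (volatile const uint32_t *)(uintptr_t)base_addr;
    uint32_t i;

    printf("\nAddress Raw Word I Q\n");
    for (i = 0; i < 124; i++) {
        uint32_t word = data[i];
        printf("0x%08x ", (unsigned int)(base_addr + i * 4)); /* byte address of this word */
        printf("0x%08x ", (unsigned int)word);
        printf("%4u %4u \n", (unsigned int)(word & 0xffff), (unsigned int)(word >> 16));
    }
    printf("\nDone dumping.\n");
}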
And I get the data out as I expect:
Address Raw Word I Q
0x41000000 0x020003ff 1023 512
0x41000004 0x02da03ce 974 730
0x41000008 0x038a0346 838 906
0x4100000c 0x03ef027f 639 1007
0x41000010 0x03f601a0 416 1014
0x41000014 0x039d00d4 212 925
0x41000018 0x02f60040 64 758
0x4100001c 0x02200002 2 544
0x41000020 0x01440025 37 324
0x41000024 0x008b00a2 162 139
0x41000028 0x001a0162 354 26
0x4100002c 0x00050240 576 5
...
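For anyone who wants to cross-check the columns on the host, here is a minimal Python sketch of the same I/Q split that dump_data() performs. The helper name split_iq is only for illustration and is not part of my firmware; the raw words are copied from the first rows of the dump above.

```python
# Raw 32-bit words copied from the first rows of the dump, lowest address first.
RAW_WORDS = [0x020003FF, 0x02DA03CE, 0x038A0346, 0x03EF027F]
BASE_ADDR = 0x41000000


def split_iq(word):
    """Return (I, Q): I is bits 15:0 and Q is bits 31:16, as in dump_data()."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


for offset, word in enumerate(RAW_WORDS):
    i_val, q_val = split_iq(word)
    # Same column layout as the firmware dump: address, raw word, I, Q.
    print(f"0x{BASE_ADDR + 4 * offset:08x} 0x{word:08x} {i_val:4d} {q_val:4d}")
```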
I can clearly see that data in the correct order in my simulation, but not when I run on hardware. The hardware is a custom board built around a "premade" module with an Artix-7 and the RAM, specifically the Enclustra Mars AX3.
I have found that one can use the stream stuff to easily insert a CDC (ClockDomainCrossing) buffer. This appears to work perfectly in simulation. Note that I get the same output from my modified demo.bin in sim and on target, which suggests that the data is put in memory the same way, leaving how LiteDRAM accesses the memory as the only difference.
Problem
Originally I found this discrepancy between the sim and the target, until it dawned on me that the sine wave just seems to have chunks of samples reversed.
I get 4x32 bits from the DMA, but the lowest address comes last. Compare the dump above with what comes directly out of the DMA, which I have wired with self.output_sig2.eq(dma.source.data).
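To make the symptom concrete, here is a small Python sketch of what I mean by "chunks of samples reversed". It is plain integer arithmetic, not LiteDRAM code. It assumes, purely for illustration, that one 128-bit DMA word carries four 32-bit samples with the lowest address in the least significant lane; the function names and the lsb_first flag are hypothetical.

```python
# First four 32-bit words of the dump, lowest address first.
WORDS = [0x020003FF, 0x02DA03CE, 0x038A0346, 0x03EF027F]


def pack_lsb_first(words, lane_bits=32):
    """Pack words into one wide integer with words[0] in the least significant lane."""
    wide = 0
    for lane, word in enumerate(words):
        wide |= word << (lane * lane_bits)
    return wide


def unpack(wide, lanes=4, lane_bits=32, lsb_first=True):
    """Split a wide integer into lanes, starting from the LSB or from the MSB."""
    mask = (1 << lane_bits) - 1
    out = [(wide >> (lane * lane_bits)) & mask for lane in range(lanes)]
    return out if lsb_first else out[::-1]


wide = pack_lsb_first(WORDS)
# Reading lanes from the LSB gives the address order of the dump.
print([hex(w) for w in unpack(wide, lsb_first=True)])
# Reading lanes from the MSB gives each group of four samples reversed.
print([hex(w) for w in unpack(wide, lsb_first=False)])
```

The second print matches what I see on the target: within every group of four, the word from the lowest address comes out last.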
This is my code: https://github.com/nickoe/litex-boards/blob/f3090247db0a7d26291c39860eede3a3aa46ca64/litex_boards/targets/mars_ax3_custom.py#L133-L152
I first tried to use my own FSM for the LiteDRAMDMAReader, and I now think I understand that pretty well, but I also tried the DMAReader from litevideo, which appears to give me the same results. That leaves me with a difference in the hardware somehow.
This is my platform io definition:
https://github.com/nickoe/litex-boards/blob/f3090247db0a7d26291c39860eede3a3aa46ca64/litex_boards/platforms/mars_ax3.py#L45-L83
I also added my own new SDRAM timing definitions, and the memory check in the BIOS passes:
https://github.com/nickoe/litex-boards/blob/f3090247db0a7d26291c39860eede3a3aa46ca64/litex_boards/targets/mars_ax3.py#L88-L108
I hope someone can enlighten me as to why this happens. My expectation was to get the data out in the order I am addressing it, not in reverse order.