Friday, July 31, 2009

SHOWSTOPPER

I have minor good news: both the network over USB and FTP server work fine.

And here is the bad news: there is a data corruption problem when writing to the miniSD in Linux. Some users had reported filesystem corruption, but they seemed to be a minority and I suspected it could be a compatibility problem with a certain type or brand of card.

This is a fscking showstopper and is now #1 on the priority list. The only workaround as of now is to NOT WRITE AT ALL to the miniSD. I know this means states cannot be saved (brightness, volume, game states, etc.) since right now there is no safe place to store them.

I will dedicate all efforts to identifying the problem and fixing it as soon as possible, but this is likely to take some time since the error is neither deterministic nor predictable.

If you want to check it yourself, run the following commands on your A320:

cd /boot
dd if=/dev/urandom bs=1M count=100 | tee test1.bin | md5sum
md5sum test1.bin
cp test1.bin test2.bin
md5sum test2.bin

The three calculated MD5 sums should be equal: the first is calculated from the pseudorandom data as it comes out of /dev/urandom, the second from the same data stored in file test1.bin, and the third from file test2.bin, which is a copy of test1.bin.
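If you would rather not compare the sums by eye, the same three steps can be wrapped in a small function. This is only a sketch: the name write_test and its two arguments (target directory and size in MB) are my own invention for illustration, and it assumes a POSIX shell with dd, tee, md5sum and cut available.

```shell
#!/bin/sh
# Sketch: automate the three-checksum comparison described above.
# write_test DIR [MEGABYTES]  (hypothetical helper, not part of any release)
write_test() {
    dir=$1
    mb=${2:-100}    # default to the 100 MB used in the post

    # Sum 1: the data as it leaves /dev/urandom, before it is stored
    sum_stream=$(dd if=/dev/urandom bs=1M count="$mb" 2>/dev/null \
        | tee "$dir/test1.bin" | md5sum | cut -d' ' -f1)

    # Sum 2: the same data read back from test1.bin
    sum_file1=$(md5sum "$dir/test1.bin" | cut -d' ' -f1)

    # Sum 3: a copy of test1.bin
    cp "$dir/test1.bin" "$dir/test2.bin"
    sum_file2=$(md5sum "$dir/test2.bin" | cut -d' ' -f1)

    echo "stream:    $sum_stream"
    echo "test1.bin: $sum_file1"
    echo "test2.bin: $sum_file2"

    if [ "$sum_stream" = "$sum_file1" ] && [ "$sum_file1" = "$sum_file2" ]; then
        echo "OK: all three sums match"
        return 0
    fi
    echo "CORRUPTION: sums differ"
    return 1
}

# On the A320 this would be called as:  write_test /boot 100
```

A single clean run proves little, since the error is not deterministic; repeating the call a few times gives a better picture.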

The corruption seems to happen on writing. Reading is OK, which you can test yourself by calculating the MD5 sum of a large file many times: every run will yield the same value, whereas whenever you copy the data to a new file the MD5 sum will change.
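The read check can be scripted the same way. Again a sketch only: read_test and its arguments (file name and number of runs) are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Sketch: checksum the same file several times and report any change.
# read_test FILE [RUNS]  (hypothetical helper)
read_test() {
    file=$1
    runs=${2:-5}

    first=$(md5sum "$file" | cut -d' ' -f1)
    i=1
    while [ "$i" -lt "$runs" ]; do
        next=$(md5sum "$file" | cut -d' ' -f1)
        if [ "$next" != "$first" ]; then
            echo "read $((i + 1)) differs: $next != $first"
            return 1
        fi
        i=$((i + 1))
    done
    echo "all $runs reads gave $first"
}

# Example:  read_test /boot/test1.bin 10
```

Use a large file for this. Repeated reads of a small file can be served from the kernel's page cache instead of the card, in which case the test says nothing about the card itself.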

NOTE that this is not filesystem related, i.e. it happens in both the FAT and ext2/ext3 releases.

24 comments:

  1. Ah - this would explain why I'm unable to save any data since upgrading to the FAT32 kernel.

    The EXT3 filesystem caused no problems previously.

    At least I know I'm not going mad now!

    For reference, my memory card is a Bytestor 4GIG Mini SDHC Class 4.

  2. @yt

    The problem is with low level sector writing to the card, so it is independent of the file system used.

    That said, you probably didn't notice anything with ext2/ext3 because its internal structure makes it more resilient.

  3. I seem to have more problems with my 8GB micro SD in mini SD converter than my 256MB mini SD - don't know if this helps you in narrowing down the problem.

  4. I'm able to save states on everything, including Snes9x and ScummVM. Seems like SD cards are the problem, not the system. I have a 2 GB card.

  5. I haven't had a problem so far with my Transcend Micro-SD 8gb card with an adapter of the same kind, Class 6.

  6. I have tried two different cards. With my Kingston 8 GB microSDHC I get all kinds of problems: files corrupted, deleted, etc. My SanDisk 2 GB microSD is more reliable, but after a while the card goes read-only (there is no switch on the card) and I have to reformat it.

  7. I get corrupted saves and configs sometimes too on my hama 2GB microSD, but most of the time everything works.

  8. Don't be fooled if you have never experienced any problems. If savegames is all that you're writing to your miniSD you may have just been lucky.

    If you really want to know if you're affected by this problem, execute the commands described in the post. That will move around about 200MB of data. In my case that guarantees some corruption AT LEAST OF THE FILES INVOLVED.

  9. Don't know if this helps, but sometimes when I connect the Dingoo to my Linux system it says the card is read-only. Other times it allows me to write to the card without any problems.

  10. Booboo,

    I am still waiting for my Dingoo to arrive, but have been following your blog closely. I did a Google search for FAT corruption on microSD cards on Linux systems, and found a post on the gumstix mailing list listing a similar problem related to a bug on Marvell Controllers.

    I know it is not the same hardware, but could give you a hint to solve the problem. The link is http://bit.ly/1aFrO1. Wish you luck!

  11. @Andrew: If the Linux kernel encounters corruption in the file system, it makes the mount read-only to prevent further damage. There should be a log message about it, check with "dmesg".

    You can force the file system back to read-write using "mount / -o rw,remount" but doing so is dangerous of course. Until this bug is fixed, it's best to make sure you have no valuable files on the card unless you also have a copy elsewhere.

  12. Has anyone done the commands yet with a perfect result? I don't know how to do them, so I can't check.

  13. Don't worry booboo I'm sure you'll work everything out ;)

  14. I have run the tests and I got all three md5sum results as being the same. It takes about 10 minutes or so to run:

    # dd if=/dev/urandom bs=1M count=100 | tee test1.bin | md5sum
    a3c7345f0d226169e1e0d3449c6379b1 -
    # md5sum test1.bin
    a3c7345f0d226169e1e0d3449c6379b1 test1.bin
    # cp test1.bin test2.bin
    # md5sum test2.bin
    a3c7345f0d226169e1e0d3449c6379b1 test2.bin

  15. I think I just had my first corruption. I was playing around with ScummVM, and now I'm getting segmentation faults. It sucks, but I can still read the MiniSD, so I'll just back it all up and reformat.

  16. After further investigation, my Monkey Island 2 folder engulfed the ScummVM folder... like, I opened up ScummVM and there were the files for Monkey Island 2. Also some data for Rise of the Triad was corrupted too.

  17. Is it possible to do all the writing of save states and so on to the built-in memory in the Dingoo, and use the SD card solely for storage of ROMs and media?

  18. I had exactly the same problem on the Neo FreeRunner smartphone. It was solved by buying a new, fast and expensive SanDisk card.
    The second solution was a kernel patch which slowed down the access speed for the card. (I think googling for OpenMoko Neo FreeRunner kernel patches will help to solve this problem.)

  19. It can be anything. Slowing down the MMC clock may help, as may turning off DMA in the MMC code. I wonder if there is any pattern in the corruption (random bytes here and there or whole blocks) and what data is in the wrong places (random data, data that belongs to the file but at a bad offset, zeroes, ...).

  20. @fanoush

    I already tried disabling DMA (using PIO). The system crashes badly (not even a kernel panic).

    Also tried a slower clock to no avail.

    The corrupted data looks like random data, though I must test further. Also the corruption always starts and ends at page boundaries and is a bit less than 256KB in size.

    No wonder it corrupts the file system. Just imagine that the 256K burst of corruption includes an update of the FAT tables.

  21. Booboo, now that you say it happens at page boundaries with a fixed max size: I remember the Rockbox port for the Sansa AMS had a similar problem. They found out that they can't write over a certain boundary (bank switch or whatever), but have to split up writes between those boundaries. Maybe the SD driver currently writes in "bursts" of 256KB each, and when a write happens right before a page boundary, the data that is supposed to go after the boundary in the second page is put at the beginning of the first page.

  22. Now that I read your comment again, I see that what I said makes no sense. The pages are much smaller than 256KB.

  23. Hello,
    does this problem affect other files on the SD card? I mean files outside the local directory.
    I have ROMs and videos in other directories in the root of the SD card. Will they become corrupted sooner or later?

  24. @sverox: Yes, because even if the corruption only appears where the writes are supposed to be, at some point the system will have to write to the "file allocation table". If that write is corrupted, then a lot of files (if not all) will be damaged at once. But it could also be (I have no idea) that corrupted writes land at the wrong position, in which case every single write (even one as small as changing the "last access" time of a file) can corrupt the complete file system.

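For anyone who wants to look at the pattern discussed in comments 19 and 20 (where the corruption starts and ends, and whether it begins on a page boundary), the two files from the test in the post can be compared byte by byte with cmp -l. What follows is a rough sketch under stated assumptions: diff_span is a made-up helper name, and the 4096-byte default page size is a guess, since the comments do not say which page size is meant.

```shell
#!/bin/sh
# Sketch: report the span of differing bytes between two files.
# diff_span FILE_A FILE_B [PAGE_SIZE]  (hypothetical helper)
# PAGE_SIZE defaults to 4096, which is an assumption, not a value from the post.
diff_span() {
    page=${3:-4096}
    # cmp -l prints one line per differing byte: 1-based offset, then both values
    cmp -l "$1" "$2" 2>/dev/null | awk -v page="$page" '
        NR == 1 { first = $1 - 1 }       # convert to a 0-based offset
        { last = $1 - 1; n++ }
        END {
            if (n == 0) { print "identical"; exit }
            printf "first=%d last=%d differing_bytes=%d\n", first, last, n
            printf "span_bytes=%d\n", last - first + 1
            printf "start_page_aligned=%s\n", (first % page == 0) ? "yes" : "no"
        }'
}

# Example:  diff_span /boot/test1.bin /boot/test2.bin
```

Treat the offsets as approximate. If the corrupted region really is random data, a few of its bytes will match the original by chance, so the first or last reported offset can sit slightly inside the true boundary, and the output only describes a single contiguous region properly.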