I'm building a big-box-o-storage using a SuperMicro 3U chassis and mainboard running a Xeon E5-2640 v4 CPU, 64 GB RAM and a 16-slot SuperMicro/LSI SAS2 6 Gb/s backplane driven by an LSI 9207-8i HBA in IT mode. I have 16 3 TB disks in the chassis.
I can create a 16-disk ZFS pool without problems. When I put a heavy write load on it, everything goes wrong. Say I copy a 50 GB file to the ZFS pool to stress-test it while watching the pool's status with zpool status every 3 seconds. Within 20 seconds of starting the file copy, all 16 disks start showing massive write errors, and in less than a minute most disks are faulted (the red LEDs in the drive caddies even light up).
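To make the "watch zpool status every 3 seconds" step reproducible, here is a minimal Python sketch that polls the pool and prints only the devices whose WRITE counter is non-zero. It assumes the usual `NAME STATE READ WRITE CKSUM` table layout of `zpool status`; the pool name `tank` and the function names `parse_write_errors` and `watch` are hypothetical, not something from my actual setup.

```python
import subprocess
import time


def parse_write_errors(status_text):
    """Return {device_name: WRITE column as a string} from `zpool status` text.

    Counters are kept as strings because zpool may abbreviate large
    values (for example with a K suffix).
    """
    errors = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        # The config table starts right after this header row.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if in_table:
            if len(fields) < 5:
                # A blank line or a short section label ends the table.
                in_table = False
                continue
            errors[fields[0]] = fields[3]
    return errors


def watch(pool, interval=3.0):
    """Poll `zpool status <pool>` forever and print non-zero WRITE counters."""
    while True:
        out = subprocess.run(
            ["zpool", "status", pool],
            capture_output=True, text=True, check=True,
        ).stdout
        bad = {dev: w for dev, w in parse_write_errors(out).items() if w != "0"}
        print(time.strftime("%H:%M:%S"), bad if bad else "no write errors")
        time.sleep(interval)
```

Calling `watch("tank")` in one terminal while the copy runs in another gives a timestamped record of when each disk starts accumulating write errors, which is easier to paste into a post than a scrolling status screen.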
At first, I thought there might be something wrong with the ZFS implementation in Ubuntu (don't want to hurt anyone's feelings here), so I destroyed the ZFS pool, wiped all 16 disks and created the same 16-disk array as RAID6 with Linux soft-RAID (mdadm). I started the same copy test and the array fell to pieces again. Massive write errors on all the disks.
I had firmware 20.00.07.00 (the last incarnation of P20 that was released) on the card because the box ran TrueNAS 12 before and P20 is required for TrueNAS. P20 does have a bad reputation though (VMware certified P19 as the highest version for vSAN 6.x for a good reason, as vSAN would drop disks with P20), so, using sas2flash.efi, I wiped the controller and put P19 on it, hoping that would fix the problem. It didn't. Heavy writes still make both ZFS and Linux MD arrays crash completely, with all 16 disks suffering from large numbers of write errors within 1 minute. Disks go offline, are marked as faulty, etc. After a reboot and a zpool clear, all is good again and stays good as long as I don't put a big write load on it.
I have a lot of experience with Linux MD and with ZFS and have tried many, many things, all with the same result. Firmware P20 or P19 makes no difference. I'm sticking with P19 though, just to be on the safe side.
The crazy thing is that this chassis ran XPEnology (a hacked Synology DSM) and TrueNAS without any problems at all. Solid as a rock and great performance. I then wiped all the disks and installed Ubuntu Server 20.04, but I can't get the array to run stably.
I'm at a total loss. I had another LSI 9207-8i card lying around, so I swapped the cards, but the problem remains. I also swapped the cable between the backplane and the HBA. Nothing worked.
My gut feeling says it must be a firmware/driver issue, but I have read many success stories about this card and firmware combo.
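Since the suspicion is a firmware/driver mismatch, it helps to state the exact versions the running kernel reports. The following is a rough sketch, assuming the HBA is bound to the `mpt3sas` driver and that the driver exposes `/sys/module/mpt3sas/version` and a per-host `version_fw` attribute under `/sys/class/scsi_host/`; those paths are assumptions that should be checked on the actual system, and `hba_versions` is a hypothetical helper name.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def hba_versions(sysfs_root="/sys"):
    """Collect the driver version and per-host firmware versions.

    Assumed paths (not verified against every kernel):
      <root>/module/mpt3sas/version
      <root>/class/scsi_host/hostN/version_fw
    """
    root = Path(sysfs_root)
    info = {
        "driver": read_attr(root / "module" / "mpt3sas" / "version"),
        "hosts": {},
    }
    hosts_dir = root / "class" / "scsi_host"
    if hosts_dir.is_dir():
        for host in sorted(hosts_dir.glob("host*")):
            fw = read_attr(host / "version_fw")
            # Hosts without this attribute belong to other drivers; skip them.
            if fw is not None:
                info["hosts"][host.name] = fw
    return info
```

Printing `hba_versions()` on the Ubuntu install shows which driver and firmware versions are actually paired there; comparing that against what TrueNAS reported on the same hardware would need the equivalent information from that system.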