Baobab was unavailable from Sunday 14th of April until Monday 15th of April at 16h45.
Here is what happened. On Sunday morning, we were notified that Baobab was experiencing a storage failure.
I tried a quick fix by restarting the faulty service, but unfortunately this was not possible: because of errors, the root filesystem of the Baobab master had been mounted read-only.
On Monday morning, we found that the problem came from the root storage of the Baobab main server. Fortunately, we had a spare RAID controller and were able to replace the faulty one.
Baobab then ran for a couple of hours… and crashed again with the same symptoms :(
We double-checked the logs, the status, etc., and everything seemed fine… but of course it wasn't.
We then took each HDD out of the faulty storage and put it in another server to check whether we could read data from it. We could, for every one of them :( We then tried again with more data and, bingo, this time one of the disks produced I/O errors! We replaced it, and since 15th of April at around 16h45 Baobab has been running again without issue.