Hi all. As you may have noticed, WF was down for over 24h.
First of all, please accept my sincere apology. This is probably the longest WF outage since I took over the site for WF 4.0. Luckily, WF is backed up often, well, and with multiple layers of redundancy, so not much was lost (see below), but still, this is a major failure and embarrassment on my part, so I deeply apologize.
What happened?
As you may know, WF runs on DMCA-ignored hosting, to shield us from legal issues. Even though we do not host any pirated content on WF directly, we still talk about and link to pirated content sometimes, so I run WF off of a datacenter/hosting provider where I can be assured that this will never affect WF.
Unfortunately, this means the selection of hosting providers I have access to is limited, and they aren't as hardened/reliable/foolproof as some mega commercial hosting providers. It's just a balance that has to be struck due to the nature of WF.
Of course, these types of situations could occur even with the most professional, commercial hosting providers too, but probably would have been solved faster. In this case, the hosting provider only has a relatively small team, and it took a while to investigate the issue and rectify it. That's partly why I take so much care with backups and redundancy.
Anyway, what happened is that one of the managed database clusters (which unfortunately housed WF's database) crashed, and because of the crash, hundreds of tables got corrupted. Even though the hardware was rebooted, the database was unrecoverable.
The support team from our hosting provider has since moved us onto better hardware and more reliable infrastructure, and we have restored a backup from just before the outage.
I don't expect this type of issue to be likely in the future.
Missing posts
Since we restored the last 100% stable, safe, secure backup, there are about 20 posts (and maybe a few PMs and profile posts) that are missing. These posts exist in a separate, redundant backup I took; however, that backup was taken dangerously close to the issue that caused the outage, so it would be risky and foolish to roll forward to that backup for just 20 or so posts.
@Diaa
@FireBorn
@Ziran
@FraterFraxinus
@pf3ix85fe Pandora
@Lion
@Mohammed Salah
@jin2494
@the_snake_charmer
@agloval
The above users may have some posts missing. I didn't look into the table that stores DMs out of respect for users' privacy, but there may be a few users with missing DMs as well, so please have a look if you sent a DM in the hour or so before the outage.
All things considered, very little was lost, and I am actually quite thankful to the team of engineers from the hosting company who helped rectify this situation. Getting through this with fewer than 20 missing posts (some even from midway through the outage, in brief moments of accessibility) is tantamount to surviving a car crash with a scratch and a bruise.
However, the 24+ hours of downtime was really brutal. Please forgive me for that.