Application server failure
Incident Report for Scrapbox
Postmortem

I apologize that the server was down for about 50 minutes in total over two days.

  • Nov 25, 13:40 - 13:50 JST
  • Nov 25, 15:00 - 15:20 JST
  • Nov 26, 10:50 - 11:10 JST

The failure that occurred

  • Read requests from the application server to MongoDB increased rapidly.
  • External requests to the application server did not increase.
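A symptom like this (database reads growing while external traffic stays flat) can be caught by watching the number of reads per external request. The following is only a minimal sketch, not something we ran in production: the `TrafficSample` shape, the `isReadAmplified` function, the baseline ratio, and the factor of 5 are all hypothetical values chosen for illustration.

```typescript
// Hypothetical counters collected over one monitoring interval.
interface TrafficSample {
  externalRequests: number; // requests received by the application server
  dbReads: number; // read requests sent from the application server to MongoDB
}

// Returns true when reads per external request exceed the baseline ratio
// by more than `factor`. Both the baseline and the factor are assumptions
// that would have to be tuned against real traffic.
function isReadAmplified(
  sample: TrafficSample,
  baselineReadsPerRequest: number,
  factor: number = 5,
): boolean {
  // Avoid dividing by zero when there is no external traffic.
  const requests = Math.max(sample.externalRequests, 1);
  const readsPerRequest = sample.dbReads / requests;
  return readsPerRequest > baselineReadsPerRequest * factor;
}

// Illustrative numbers only, not measurements from the incident.
const normal: TrafficSample = { externalRequests: 1000, dbReads: 3000 };
const incident: TrafficSample = { externalRequests: 1000, dbReads: 60000 };

console.log(isReadAmplified(normal, 3)); // false
console.log(isReadAmplified(incident, 3)); // true
```

Comparing a ratio rather than the raw read count keeps an ordinary traffic peak from being flagged, since in that case external requests and reads rise together.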

Cause

The cause is probably the combination of Node.js 13.10 and newrelic npm 6.1.0. After these two were updated at the same time, the failure occurred three times in two days.
We suspect that the way newrelic collects MongoDB metrics is related to the increase in read requests to MongoDB, but this has not been confirmed.

The failure does not occur in the local development environment. It occurs only in the production environment, which handles many requests.

Fix

We reverted Node.js and the newrelic npm package to their previous versions.

Since then, no failure has occurred.

Posted Dec 17, 2019 - 07:10 JST

Resolved
Read requests to the DB increased due to library bugs.
Posted Nov 26, 2019 - 10:50 JST