Motivation

On a production server, several things should be monitored automatically, both from inside and from outside, with automatic alarms that actually reach someone who feels responsible.

Furthermore, some maintenance tasks should be performed regularly to avoid ill-timed surprises.

Local monitoring recommendations

We suggest monitoring:

If you suspect overload, try very hard to distinguish between the several aspects of distributed computation, and work through the whole list of possible bottlenecks, down to network usage and disk I/O.
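As a rough illustration of telling bottlenecks apart, the following minimal Python sketch takes one local snapshot and reports which resources look saturated. It uses only the standard library, so it covers just load average and disk space; network usage and disk I/O would need further data sources. The function names and both thresholds are assumptions made for this example, not recommended values.

```python
import os
import shutil

# Hypothetical thresholds (assumptions); tune them for the actual machine.
LOAD_PER_CPU_LIMIT = 1.0   # 1-minute load average divided by CPU count
DISK_USED_LIMIT = 0.90     # fraction of the file system in use


def suspects(load1, cpus, disk_used_fraction):
    """Return the names of resources that look saturated."""
    found = []
    if load1 / cpus > LOAD_PER_CPU_LIMIT:
        # On Linux a high load average can also mean processes blocked on I/O.
        found.append("load")
    if disk_used_fraction > DISK_USED_LIMIT:
        found.append("disk space")
    return found


def snapshot(path="/"):
    """Collect the current values from the local machine (Unix only)."""
    load1, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    usage = shutil.disk_usage(path)
    return load1, cpus, usage.used / usage.total


if __name__ == "__main__":
    print(suspects(*snapshot()))
```

An empty list means that neither of the two checks fired; it does not rule out other bottlenecks.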

Remote monitoring recommendations

We suggest monitoring:

Maintenance

Somebody should watch the watchers.

Every now and then, check:

Files to clean up

Depending on the storage usage strategy, it might be a good idea to delete or move away files that are more or less obsolete.

These include: