Motivation

On a production server, several things should be monitored automatically, both from inside and from outside, with alarms that actually reach someone who feels responsible.

Furthermore, some maintenance tasks should be performed regularly to avoid ill-timed surprises.

Local monitoring recommendations

We suggest monitoring:

Free hard disk space
Free physical and virtual RAM (but note: virtual RAM is a reserve for peak load, not a real resource)
CPU load (not only the computational usage but also the overall load, including I/O and context switches; on Linux, consider monitoring /proc/loadavg)

If you suspect overload, try very hard to distinguish between the various aspects of distributed computation, and work through the whole list of possible bottlenecks, down to network usage and disk I/O.
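
The local checks listed above could be sketched roughly as follows for a Linux host. This is a minimal illustration, not a finished monitoring agent: the threshold values and all function names are assumptions made for this example, and the MemAvailable field in /proc/meminfo is only present on reasonably recent kernels.

```python
import os
import shutil

# Hypothetical thresholds; tune them for the actual server.
MIN_FREE_DISK_FRACTION = 0.10      # warn below 10 % free disk space
MIN_AVAILABLE_RAM_FRACTION = 0.10  # warn below 10 % available physical RAM
MAX_LOAD_PER_CPU = 1.5             # warn above this 1-minute load per CPU


def free_disk_fraction(path="/"):
    """Return the fraction of free space on the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def parse_loadavg(text):
    """Parse /proc/loadavg content into the 1, 5 and 15 minute averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def evaluate_local(disk_fraction, ram_fraction, load_1min, cpu_count):
    """Compare measured values with the thresholds; return warning texts."""
    warnings = []
    if disk_fraction < MIN_FREE_DISK_FRACTION:
        warnings.append(f"low disk space: {disk_fraction:.0%} free")
    if ram_fraction < MIN_AVAILABLE_RAM_FRACTION:
        warnings.append(f"low memory: {ram_fraction:.0%} available")
    if load_1min / cpu_count > MAX_LOAD_PER_CPU:
        warnings.append(f"high load: {load_1min:.2f} on {cpu_count} CPUs")
    return warnings


def collect_local_warnings(path="/"):
    """Read the live values (Linux only) and evaluate them."""
    with open("/proc/loadavg") as f:
        load_1min, _, _ = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    ram_fraction = mem["MemAvailable"] / mem["MemTotal"]
    return evaluate_local(
        free_disk_fraction(path), ram_fraction, load_1min, os.cpu_count() or 1
    )
```

A scheduled job could call collect_local_warnings() and pass any returned texts to whatever alarm channel is in use. Keeping the parsing separate from the file access makes the checks easy to test with sample text.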

Remote monitoring recommendations

We suggest monitoring:

Basic network connectivity (ping with timing)
Application connectivity (HTTP(S) requests, checking the response time and some minimal piece of content)
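
The two remote checks above might look roughly like this when run from a machine outside the server. It is a sketch under stated assumptions: a real ICMP ping usually needs elevated privileges, so the example times a TCP connection instead, which is a substitute and not the same thing as ping. The function names, the default limits and the expected text are all hypothetical.

```python
import http.client
import socket
import time
import urllib.request


def tcp_connect_seconds(host, port, timeout=5.0):
    """Return the time needed to open a TCP connection, or None on failure.

    Used here as a stand-in for ping, which needs raw sockets.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def evaluate_response(status, body, elapsed, expected_text, max_seconds):
    """Judge one HTTP response; return (ok, reason)."""
    if status != 200:
        return False, f"unexpected status {status}"
    if expected_text not in body:
        return False, "expected content missing"
    if elapsed > max_seconds:
        return False, f"too slow: {elapsed:.2f} s"
    return True, "ok"


def check_http(url, expected_text, max_seconds=2.0, timeout=10.0):
    """Fetch url, then check status, content and response time."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException) as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False, f"request failed: {exc}"
    elapsed = time.monotonic() - start
    return evaluate_response(status, body, elapsed, expected_text, max_seconds)
```

A call such as check_http("https://example.com/", "Example Domain") uses a placeholder URL and text; in practice both would point at a page of the monitored application whose content is known and stable.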

Maintenance

Somebody should watch the watchers.

Every now and then check:

Is the monitoring still running? If necessary, stop or interrupt something on purpose, at a time when you won't ruin someone's day.
Would alarms reach anyone? If necessary, send test messages.
Is there activity at all? Idle servers may be idle because the clients can't connect.
If necessary, clean up files (see below).
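
One way to partly automate the first question is a heartbeat: the monitoring touches a file on every run, and an independent job checks that the file is fresh. The sketch below assumes such a heartbeat file exists; the file, the function names and the five-minute limit are hypothetical and not part of any existing setup.

```python
import os
import time


def heartbeat_age_seconds(path, now=None):
    """Return seconds since path was last modified, or None if it is missing."""
    now = time.time() if now is None else now
    try:
        return now - os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def monitoring_is_alive(path, max_age_seconds=300, now=None):
    """True if the heartbeat file exists and was touched recently enough."""
    age = heartbeat_age_seconds(path, now)
    return age is not None and age <= max_age_seconds
```

This only shows that the monitoring process still runs; whether its alarms reach a person still has to be tested by hand.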

Files to clean up

Depending on the storage usage strategy, it may be a good idea to delete or move away files that are more or less obsolete.

These include:

Log files, e.g. osgi.log in osgi-runner's log\ subdirectory (on Linux, this file is placed in /var/log/osgi-runner).
Temp files left behind by some apps, e.g. in MDA's tempDir.
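
A cleanup along these lines could be sketched as follows. The age limit, the file pattern and the function names are assumptions for illustration. The sketch defaults to a dry run that only lists candidates, because deleting a log file that an application still has open is rarely what you want.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_old_files(directory, max_age_days, pattern="*", now=None):
    """Return files in directory matching pattern that were last modified
    more than max_age_days ago (subdirectories are not searched)."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def clean_up(directory, max_age_days, pattern="*", dry_run=True):
    """Delete old files; with dry_run=True only report what would be deleted."""
    old_files = find_old_files(directory, max_age_days, pattern)
    if not dry_run:
        for path in old_files:
            path.unlink()
    return old_files
```

For example, clean_up("/var/log/osgi-runner", 30, "*.log") would list log files untouched for 30 days; the pattern and the limit are placeholders, and dry_run=False should only be passed once the listed files have been reviewed.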