Preventive maintenance checklist

  1. daily checks:
    1. use df to check disk space of all partitions.
    2. every partition must have more than 40% of its disk space free
    3. check which DNS servers are being used
    4. check the status of DNS service
    5. check DNS resolution: try to resolve a few domains which are not
      cached, such as oracle.com, starcomsoftware.com, sendmail.org,
      linuxhomenetworking.com, etc.
    6. check if DNS synchronization is running properly
    7. run newaliases and check for any errors/warnings on the console
    8. check that the following services are running, using the netstat and
      telnet utilities: sendmail, sendmail-rx, amavis, mfilter,
      milter-greylist, clamav, freshclam, fetchmail
    9. check the status of these services from the Internet: POP3, IMAP4,
      SMTP, HTTP, SSH
    10. send a few test mails from an outside domain and check that they are
      delivered
    11. send a few test mails internally and check that they are delivered
    12. check the size of the mail queues of sendmail and sendmail-rx
    13. check if any of the public IP addresses is blacklisted
      (at spamhaus.org, spamcop.net). If one is, read the removal procedure
      documented on the respective website and check whether they send
      mail to a specific email address. If they do, that address must be
      valid on the server and you must be able to access its mailbox
    14. UUCP: check the UUCP status in the uucp logs and with the uustat
      command; check the size of the UUCP queue and the cron of uucp
    15. check if the httpd daemon is running on the Merce master server
    16. check if the mysql service is running on the Merce master server
    17. the mysql service should not run on Merce slave servers (unless
      specified)
    18. check that the OpenVPN daemon is running and is in the startup sequence
    19. check that the squid service is running on all proxy servers
    20. check that the merce-webguard daemon is listening on port 1235
    21. check the load average of the server; if it stays above 1 for
      at least 5-10 minutes, identify whether it is due to CPU utilization
      or due to heavy disk I/O
    22. use the dmesg command and grep for disk errors, segmentation
      faults of processes, etc.
    23. check the files in /opt/merce/update/outgoing
    24. NTP check -- check the date and time of the server, including the
      time zone
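
The disk-space and load-average items of the daily checks lend themselves to a small script. What follows is only a sketch in Python, assuming the thresholds stated in the list (more than 40% free, load average above 1). The function names are hypothetical, and the 5-minute load average is used as a rough stand-in for "more than 1 for at least 5-10 minutes".

```python
import os
import shutil

MIN_FREE_PERCENT = 40.0  # threshold taken from the daily checklist
MAX_LOAD_AVERAGE = 1.0   # threshold taken from the daily checklist


def free_percent(total_bytes, free_bytes):
    """Return free space as a percentage of total capacity."""
    if total_bytes <= 0:
        return 0.0
    return 100.0 * free_bytes / total_bytes


def partitions_low_on_space(mount_points, min_free=MIN_FREE_PERCENT,
                            usage_of=shutil.disk_usage):
    """Return (mount_point, free_percent) for partitions that do not have
    more than min_free percent of their space free."""
    failing = []
    for mount_point in mount_points:
        usage = usage_of(mount_point)  # has .total, .used and .free in bytes
        percent = free_percent(usage.total, usage.free)
        if percent <= min_free:
            failing.append((mount_point, round(percent, 1)))
    return failing


def load_is_high(load_averages, limit=MAX_LOAD_AVERAGE):
    """True when the 5-minute load average exceeds the limit.

    load_averages is the (1 min, 5 min, 15 min) tuple from os.getloadavg().
    """
    _one_minute, five_minutes, _fifteen_minutes = load_averages
    return five_minutes > limit


def daily_report(mount_points):
    """Collect human-readable warnings for the two checks."""
    warnings = [f"LOW DISK: {mount} has only {percent}% free"
                for mount, percent in partitions_low_on_space(mount_points)]
    if load_is_high(os.getloadavg()):
        warnings.append("HIGH LOAD: check CPU utilization and disk I/O")
    return warnings
```

Calling `daily_report(["/", "/var"])` would return a list of warnings; the mount points here are placeholders and should be replaced with the partitions of the actual server. A high-load warning still needs the manual follow-up from the checklist to tell CPU load from disk I/O.
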
  2. weekly checks:
    1. freshclam should be configured with proxy user _vscan_ and its
      password, along with proxy server service FQDN and port in
      freshclam.conf
    2. check in mail.log whether freshclam is being updated properly, or if
      a new version of clamav is available
    3. mails for postmaster, abuse, webmaster and root should be delivered to
      some account, and mails from these accounts must be blocked by mfilter
    4. from the mailgate, try to relay a few test mails to
      pacific.merceworld.com or atlantic.merceworld.com for the
      starcom.logs@merceworld.com user account, to check outgoing mail flow
    5. check if a smart host is set. If yes, check that it is reachable and
      accepts mails from this server for any domain
    6. mail server must not be an open relay
    7. check if the /etc/mail/valid-ids files were updated in the last 10
      minutes
    8. check if mfilter is configured to archive mails
    9. check if samba service is running
    10. check that file server data is backed up to a separate physical
      server using the xbackup script
    11. check if aclctl script is configured to run in cron -- check cron log
    12. virus filtering: check batch mode virus scanning on file server
    13. check if IMAP and POP services are running properly
    14. check that the /var/lib/imap and /var/spool/imap areas are configured
      in xbackup to back up mails to a separate physical server
    15. ping each server to check its connectivity
    16. check a few important POP mailboxes for specific clients, such as
      Modi
      • check if the pop1gpi, iccworli etc. mailboxes have been able to
        download mails in the last fifteen minutes
    17. xbackup -- for mail and data
    18. dbbackup -- Merce database -- TODO should it also be backed up to
      separate physical server?
    19. incremental backup
    20. check the last logins of users. Where possible, logins to all Merce
      servers must come only from our office public IP addresses
      (61.17.160.122 and 59.161.6.25). A login from a remote server other
      than these IP addresses is a possible intrusion.
    21. check the log file /var/adm/log/local3.log for Merce Qfiles errors
    22. check /var/adm/log/local6.log for Merce webguard errors
    23. cleanup:
      • check if cleantmp is configured in cron; see the logs in local3.log
      • check that the clean-reported-events script is running in cron on the
        Merce master; see the logs in local3.log
      • check the number of files/dirs in /var/amavis/tmp/ and
        /var/amavis/virusmails/; these must be cleaned up periodically by
        the cleantmp scripts of Merce
      • check the qf, df, Tf, tf files in the /var/spool/mqueue and
        /var/spool/mqueue-rx areas; entries that have a df file but no qf
        file should be counted and reported to the Merce team if there are
        more than 1000
    24. check cron logs for errors - to see if there is some script which was
      not run due to invalid permissions/configuration
    25. check permissions of:
      • /etc/merce/Siteconfig.sh
      • /etc/merce/Siteconfig.pm
      • /etc/merce/Netconfig.sh
      • /etc/merce/Netconfig.pm
    26. Merce UI:
      • Reports:
        • check mail reports
        • check web access reports
      • Global directories: LDAP
        • check that the LDAP database is accessible and is updated when
          users are edited
      • Disk usage: disk usage reports must have been updated in the last
        24 hours.
      • Firewall dashboards: reports must have been updated in the last
        30 minutes.
      • Merce Insight graphs should be generated
    27. other checks for misconfigurations:
      • there should be no entries of @reboot for adding routes
      • there should be no .old, .tmp, .temp, .bac, .bak, .backup etc. files
        in the configuration directories /etc/mail, /etc, /etc/samba and
        /etc/httpd. All changes must be checked into RCS.
    28. printers: check the status of all printers to verify that they are
      properly configured
    29. check RBL sites: all RBL sites used in merce-rx.cf must be working
      - escalate if any one of them is not working
    30. check the list of services listening on the network ports using
      netstat. Only required services must be running, and any
      unknown/doubtful service must be escalated
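
The cleanup item about df files without a matching qf file can be counted mechanically. The sketch below is one possible approach in Python, not an existing Merce tool. It assumes the usual sendmail naming in which a queue entry's control and data files are called qf<ID> and df<ID> and sit directly in the queue directory; the function names are hypothetical. The checklist does not say whether the limit of 1000 applies per queue or in total, so the sketch checks each directory separately.

```python
import os

REPORT_LIMIT = 1000  # from the checklist: report if more than 1000
QUEUE_DIRS = ["/var/spool/mqueue", "/var/spool/mqueue-rx"]  # from the checklist


def orphaned_queue_ids(file_names):
    """Return queue IDs that have a df file but no matching qf file."""
    control_ids = {name[2:] for name in file_names if name.startswith("qf")}
    data_ids = {name[2:] for name in file_names if name.startswith("df")}
    return sorted(data_ids - control_ids)


def needs_report(orphans, limit=REPORT_LIMIT):
    """True when the number of orphaned entries exceeds the limit."""
    return len(orphans) > limit


def check_queue(queue_dir):
    """Return (orphan_count, must_report) for one queue directory."""
    orphans = orphaned_queue_ids(os.listdir(queue_dir))
    return len(orphans), needs_report(orphans)
```

Running `check_queue(path)` for each entry in `QUEUE_DIRS` gives the count to report. Reading the queue directories normally requires root privileges.
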
  3. monthly checks:
    1. check if there is any gethostbyaddr warning in mail.log; set a
      reverse pointer entry for all network interfaces
    2. check the reverse lookup entry from a different server on the
      Internet (from the pacific or atlantic server)
    3. the reverse lookup and forward lookup entries must match
    4. no mail should be delivered to mailboxes under /var/spool/mail/; all
      system mails and non-human mails should be delivered to the system
      mailbox of IMAP created by Merce
    5. System mailbox and Merce mailbac should have all mails refiled under
      sub folder yyyy-mm/dd using 'refilemails' script of Merce
    6. in the access DB file of sendmail, relaying should be allowed only
      from internal IP addresses of the organization network
    7. servers which are not running as an SMTP server should be configured to
      run sendmail, which will send root mails to the system account of the
      organization. Check the mailq
    8. check the expiry period of the keys
    9. iptables must be configured to deny all requests by default
    10. allow only the required servers to go via the gateway to the Internet
      • e.g. allow only the proxy server to send requests to TCP ports 80
        and 443 on the Internet
      • allow the DNS server to send port 53 requests to the outside
    11. the interfaces of the gateway must not be set in the trusted
      network.
    12. password-based login must be disabled on all gatekeepers, mailgates
      and firewalls; only key-based login should be allowed. Also set
      complex, hard-to-guess passwords for these servers
    13. check the DNS service or domain name expiry in the database of the
      DNS registrar and service provider.
    14. check if the log rotation is working properly
    15. check if the logs of sendmail and squid in the Merce area are rotated
      and uploaded to the Merce database under /opt/merce/var/log/
    16. VMware ESX/i server checks
      • log in to the VI3 client interface and look for any strange errors
        on the logs tab
      • check that VMware tools are installed on all virtual machines
      • check the NTP service on VMware host server
      • NTP must not be running on VMware guest OS. It must be configured to
        synchronize time with VMware host OS
      • check the licence status of VMware ESX server and also reachability
        of licence server.
    17. check if filesend is enabled in cron
    18. check validity of Merce licence
    19. run a script to check that login as users merce and mercexfr works on
      all Merce servers -- there must be NO fingerprint change problem and
      no confirmation required to add a fingerprint to known_hosts
    20. check all required services are added in startup and services not
      required are disabled from startup
      • cups, samba, portmap, squid, sendmail, sendmail-rx, amavisd, clamd,
        freshclam, mfilter, milter-greylist, apcupsd, nfs, ntpd
    21. all routes must be added in /etc/sysconfig/static-routes file to ensure
      they will be loaded at next startup
    22. every mounted partition should be listed in the /etc/fstab file
    23. UPS:
      • check battery level and average run time of UPS batteries
      • check battery water level
    24. check HTTP certificate expiry
    25. check the temperature of server room
    26. desktops should not be allowed any direct Internet access.
    27. servers should be allowed to access only the few services required
      for what they are serving.
    28. Windows checks:
      • check CPU usage, memory utilization, available disk space
      • check if Windows disk needs to be defragmented.
      • check validity and availability of Terminal Services licences
      • check validity of Windows server licence
      • check the logs in Windows Event Viewer for any fatal errors etc.
      • check if roaming profiles of the users are being saved on Merce File
        server
      • check remote desktop connectivity from desktops
    29. check if any update is available for Thunderbird, Firefox,
      OpenOffice, etc.
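
The monthly requirement that reverse and forward lookup entries agree can be verified with a short script. The following is a minimal sketch in Python using the standard socket module; the function name and its injectable lookup parameters are hypothetical. It queries whatever resolver the local machine uses, so to follow the checklist it would have to be run from a different server on the Internet.

```python
import socket


def reverse_matches_forward(ip_address,
                            reverse_lookup=socket.gethostbyaddr,
                            forward_lookup=socket.gethostbyname_ex):
    """True when the reverse (PTR) name of ip_address resolves back to it."""
    try:
        host_name, _aliases, _addresses = reverse_lookup(ip_address)
        _name, _aliases, forward_addresses = forward_lookup(host_name)
    except OSError:
        # socket.herror and socket.gaierror are subclasses of OSError,
        # so a missing PTR or A record is treated as a mismatch.
        return False
    return ip_address in forward_addresses
```

Calling `reverse_matches_forward(address)` for the address of each public network interface gives a yes/no answer per interface. The lookup functions are passed as parameters only so that the logic can be exercised without real DNS queries.
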
  4. quarterly checks:
    1. file system checks every three months
  5. half yearly checks:
    1. update ssh keys on all client servers