Live backups




There are two main servers hosting ESP websites: esp.mit.edu and diogenes.learningu.org. We use Slony-I database replication to keep a "live backup" or "hot spare" of each site on the other server. For each site, the primary server is the "master" and the secondary server is the "slave."

The backups were set up on May 15, 2010. They are "live" with the following restriction:

  • Only one of the servers should be used at a time for each site, because:
  1. the replication system allows writing only to the "master" node
  2. each server maintains its own memory cache (memcached)

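Backup Servers

Each server below hosts the backup copies of the sites listed under it. Each entry gives the site's primary domain, its backup domain, and the directory of the backup copy on that server.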

On esp.mit.edu

  • splashchicago.learningu.org, splashchicago-backup.learningu.org (in /home/price/backup-sites/chicago)
  • stanfordesp.org, stanfordesp-backup.learningu.org (in /home/price/backup-sites/stanford)
  • dukesplash.learningu.org, dukesplash-backup.learningu.org (in /home/price/backup-sites/duke)
  • queens.learningu.org, queens-backup.learningu.org (in /home/price/backup-sites/queens)
  • nusplash.learningu.org, nusplash-backup.learningu.org (in /home/price/backup-sites/northwestern)

On diogenes.learningu.org

  • esp.mit.edu, mitesp-backup.learningu.org (in /lu/sites/mit)

Configuration

  • Each site has its own replication "cluster", which is managed from the host of the primary site (so there are 5 on diogenes and 1 on esp.mit.edu). The cluster configurations are stored in /lu/replication on diogenes and in /home/price/replication on esp.mit.edu.
  • Each cluster configuration directory contains scripts named init.sh, stop.sh and start.sh, which control replication. There's also a convenience script, reset.sh, that runs the others in sequence.
  • I opened the PostgreSQL databases to external connections and added 'trust' lines in pg_hba.conf on each machine so that they can talk to each other. This is insecure; in the future I'll see if I can switch to 'md5' without breaking anything.
  • Code is not synchronized between the servers, but I assume the Git repository is good enough. (The LU server maintains a copy of the Git repository that is updated every hour.)
  • Media files are shared using WebDAV. The Apache configuration on the primary server turns DAV on for the media directories, which are then mounted using "mount -t davfs" on the secondary server. These "shares" are publicly readable and require Apache authentication for write access.
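
To illustrate how a convenience script like reset.sh could chain the per-cluster control scripts, here is a minimal sketch. It is not the actual reset.sh: the function name reset_cluster is hypothetical, and the order stop.sh, init.sh, start.sh is an assumption (the real script may run them differently).

```shell
#!/bin/sh
# Hypothetical sketch of a reset.sh-style wrapper.
# ASSUMPTION: the order is stop.sh -> init.sh -> start.sh; the real
# reset.sh may use a different order or extra steps.

# Run the three control scripts found in one cluster configuration
# directory, in sequence, and stop at the first one that fails.
reset_cluster() {
    cluster_dir=$1
    for step in stop.sh init.sh start.sh; do
        echo "running $step"
        sh "$cluster_dir/$step" || {
            echo "$step failed; aborting" >&2
            return 1
        }
    done
}
```

It would be called with one cluster's configuration directory, e.g. reset_cluster /lu/replication/CLUSTER, where CLUSTER is a placeholder for that cluster's directory name. Stopping at the first failure avoids starting replication on top of a failed init.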

How To

  • Switch over a site:
  1. too
  • Resync replication after substantial database downtime:
  1. tee
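
For the switch-over case, Slony-I's usual controlled-switchover sequence (lock set, move set, wait for event) could be scripted roughly as follows. This is a sketch only, not the procedure used on these servers: the function name switchover_script is hypothetical, and the cluster name, the node numbers (1 = current master, 2 = current slave), the set id 1, and the conninfo strings are all placeholders that would need to match the cluster's real configuration.

```shell
#!/bin/sh
# Hypothetical sketch: print a slonik script for a controlled switchover.
# ASSUMPTIONS: node 1 is the current origin ("master"), node 2 is the
# subscriber ("slave"), and the replication set has id 1.

switchover_script() {
    cluster=$1    # Slony-I cluster name (placeholder)
    old_conn=$2   # conninfo string for the current master (node 1)
    new_conn=$3   # conninfo string for the current slave (node 2)
    cat <<EOF
cluster name = $cluster;
node 1 admin conninfo = '$old_conn';
node 2 admin conninfo = '$new_conn';
lock set (id = 1, origin = 1);
move set (id = 1, old origin = 1, new origin = 2);
wait for event (origin = 1, confirmed = 2, wait on = 1);
EOF
}
```

The function only prints the script, so it can be reviewed first and then piped to slonik, which reads its commands from standard input: switchover_script CLUSTER 'CONNINFO1' 'CONNINFO2' | slonik. Because each server keeps its own memcached, the site should be served from only the new master afterwards.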