Tuesday, February 15, 2011

Backup script for Large Mysql database running for Wordpress with Hyperdb

0 comments
I had to take backup for a Mysql 5.1 Database that was running for Wordpress application with Hyperdb plugin. The application had almost 93000 blogs and as each blog consisted of 8 tables, there were 786000 tables in the database. All of the tables were created using MyISAM Storage Engine. The number of blogs were increasing day by day. 

The structure of the DB as follows :-

SHOW DATABASES
imported_data
rdb
rdb0
rdb1
....
rdb93 

Here,

imported_data - used to store data that were imported from legacy site.

rdb - used to store all common tables for wordpress and blog 1.

rdbN - used to store data for blogs. Blogs were partitioned in different databases using the formula blog_id/1000. That is, blog 0 was stored at db0 and blog 92134 was stored at db92.

At first, I tried with mysqldump program and it took 22 hours to complete the whole database backup. I was in a fix. Then I go with the following steps.


Modified at mysqldump.c

....
static char *get_actual_table_name(const char *old_table_name, MEM_ROOT *root)
{
....
DBUG_PRINT("Rumman", ("SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_NAME = \'%s\'",old_table_name));
my_snprintf(query, sizeof(query), "SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_NAME = \'%s\'",old_table_name
              );
 ...
}
...

Compiled mysql at /root/dbbackup/mysql_r

Backup Scripts 

I used t he following two scripts for digitalsports backup:

1. init_backup.sh : This script is used to initiate backup for all the blogs ( at present 93000). It creates backup slots and then execute the backup_script.sh according to the slots in parallel. Some important variables in this scripts are -

SCRIPT - denotes where the backup_script.sh exists
LOGDIR - directory where all the log files should create
PARALLEL - how many processes we want to start simultaneously for backup 
 2. backup_script.sh  - this script is used call the modified mysqldump. It takes two parameters $START_BLOG_ID and $END_BLOG_ID. Within a for loop, it takes backup for the blogs and create separate sql file for each blog with the name such as blog_.sql.

MYSQLDUMP_ds -  denotes where the modified mysqldump binary is located

PRODUCTION_TIME - denotes backup processes is stopped after this time

Both these scripts are added below.

Backup process works as follows;

  • Take the maximum blog id at first
  • Make slots for backup scripts using $PARALLEL and $STEP
  • Make $BACKUP_DIR and other sub-directories. For each day there is a sub-directory with rdb_<%m%d%y>
  • Call backup_script.sh and start  mysqldump for each blog
  • if stop.tmp is found in the $BACKUP_DIR, then backup processes will be stopped
  • if server time pass the $PRODUCTION_TIME, then backup processes will be stopped
  • To resume backup, we need to execute the init_backup.sh again and it will remove stop.tmp and start backup reading progress_NUMBER files from $BACKUP_DIR


-------------------------

 init_backup.sh  

-------------------------

#!/bin/bash

SCRIPT=/root/dbbackup/backup_scripts/backup_script.sh
LOGDIR=/BACKUP1/log
MIN_BLOG_ID=50
LAST_BLOG_ID=$MIN_BLOG_ID
PARALLEL=4

DBHOST=0.0.0.0
DB=rdb
DBUSER=ruser
DBUSERPWD=********
DBPORT=3306

DT=$(date +%m%d%y)
BACKUP_DIR=/BACKUP1/rdb_"$DT"
LOGDIR=$BACKUP_DIR/log_dir
if [ ! -d $BACKUP_DIR ]; then
   echo "Making backup directory at $BACKUP_DIR with sub-directories"
   mkdir -p $BACKUP_DIR
   mysql -h $DBHOST -P $DBPORT -u $DBUSER --password=$DBUSERPWD -e "show databases like 'rdb%'" -N -s > $BACKUP_DIR/dblist.lst
   for v_dbname in `cat $BACKUP_DIR/dblist.lst`
   do
      BACKUP_DIR=/BACKUP1/rdb_"$DT"/"$v_dbname"
      mkdir -p $BACKUP_DIR
   done  #for v_dbname in `cat $BACKUP_DIR/dblist.lst`
   BACKUP_DIR=/BACKUP1/rdb_"$DT"/r_imported_data
   mkdir -p $BACKUP_DIR
   BACKUP_DIR=/BACKUP1/rdb_"$DT"
   echo "done"
   echo "Making log directory"
   LOGDIR=$BACKUP_DIR/log_dir
   mkdir -p $LOGDIR/error
   echo "done"
else
  rm -f $BACKUP_DIR/stop.tmp
  echo "Resuming Backup"
fi


if [ ! -f "$BACKUP_DIR"/MAX_BLOG ];
then
  echo "MAX_BLOG file not found"
    MAX_BLOG_ID=$(mysql -h $DBHOST -P $DBPORT -u $DBUSER --password=$DBUSERPWD -e "select max(blog_id) from rdb.wp_blogs" -N -s )
    echo  "$MAX_BLOG_ID" > $BACKUP_DIR/MAX_BLOG
else
  echo "MAX_BLOG file found"
  MAX_BLOG_ID=$(cat "$BACKUP_DIR"/MAX_BLOG)
fi    # if [ -f $BACKUP_DIR/MAX_BLOG ];
echo "MAX BLOG = " $MAX_BLOG_ID

let STEP=$MAX_BLOG_ID/$PARALLEL

echo "$SCRIPT -1 0 $BACKUP_DIR  >> $LOGDIR/script_rdb_and_r_imported_data.log.$(date +%m%d%y) &"
$SCRIPT -1 0 $BACKUP_DIR  >> $LOGDIR/script_rdb_and_r_imported_data.log.$(date +%m%d%y) &

while [ $MIN_BLOG_ID -le $MAX_BLOG_ID ]
do                                                                


  let LAST_BLOG_ID="$MIN_BLOG_ID+$STEP"

  if [ "$LAST_BLOG_ID" -gt "$MAX_BLOG_ID" ];
  then
    let LAST_BLOG_ID="$MAX_BLOG_ID"
  fi

  if [ -f $BACKUP_DIR/progress_"$LAST_BLOG_ID" ];
  then
    MIN_BLOG_ID=$(cat $BACKUP_DIR/progress_"$LAST_BLOG_ID")
    echo $MIN_BLOG_ID
  fi #if [ -f $BACKUP_DIR/progress_$MIN_BLOG_ID_$LAST_BLOG_ID ];

  echo "$SCRIPT $MIN_BLOG_ID $LAST_BLOG_ID $BACKUP_DIR >> $LOGDIR/script_"$LAST_BLOG_ID".log.$(date +%m%d%y)  &"
  $SCRIPT $MIN_BLOG_ID $LAST_BLOG_ID $BACKUP_DIR >> $LOGDIR/script_"$LAST_BLOG_ID".log.$(date +%m%d%y)  &
  let MIN_BLOG_ID="$LAST_BLOG_ID+1"
  
done #end while [ $LAST_BLOG_ID -lt $MAX_BLOG_ID]  

  
--------------------------

backup_script.sh

--------------------------


#!/bin/bash

DBHOST=0.0.0.0
DB=rdb_old
DBUSER=ruser
DBUSERPWD=******
DBPORT=3306
PRODUCTION_TIME=090000

MYSQLDUMP_r=/root/dbbackup/mysql_r/bin/mysqldump

if [ -z $1 ]; then
   echo "!!!Start Blog ID for backup cannot be blank!!!"
   exit
elif [ $1 -ne "-1" ] && [ -z $2 ]; then
   echo "!!!End Blog ID for backup cannot be blank!!!"
   exit
fi

START_BLOG_ID=$1
END_BLOG_ID=$2
PARENT_BACKUP_DIR=$3
LOGDIR=$PARENT_BACKUP_DIR/log_dir
OUTFILE=/tmp/blogs_"$START_BLOG_ID"_"$END_BLOG_ID".out
DT=$(date +%m%d%y)
if [ $START_BLOG_ID -eq "-1" ];
then
  BACKUP_DIR=$PARENT_BACKUP_DIR/rdb
  if [ ! -f $BACKUP_DIR/rdb.sql.$(date +%m%d%y) ];
  then
    echo "rdb backup started on the $(date +%m%d%y_%H%M) ..."
    DB=rdb
    $MYSQLDUMP_r $DB -h $DBHOST -u $DBUSER -P $DBPORT --password=$DBUSERPWD  --log_error=$LOGDIR/error/rdb.sql.err        > $BACKUP_DIR/rdb.sql.$(date +%m%d%y)
      mysqldump_success=$?
      if [ $mysqldump_success -eq '0' ];
      then
         rm -f $LOGDIR/error/rdb.sql.err
      else
         cat $LOGDIR/error/rdb.sql.err
      fi   #if [ $mysqldump_success -eq '0' ];

      echo "rDB backup completed on the $(date +%m%d%y_%H%M)"
      touch $LOGDIR/backup_compled_rdb_$(date +%m%d%y_%H%M).tmp
      if [ -f /BACKUP1/rdb_"$DT"/stop.tmp ];
        then
              echo "Stopped"
              touch $LOGDIR/stopped_after_rdb
              exit 1
        fi #if [ -f /BACKUP1/rdb_"$DT"/stop.tmp ];
  fi #if [ ! -f $BACKUP_DIR/rdb.sql.$(date +%m%d%y) ];

  BACKUP_DIR=/BACKUP1/rdb_"$DT"/r_imported_data
  if [ ! -f $BACKUP_DIR/r_imported_data.sql.$(date +%m%d%y)  ];
  then
    echo "r_imported_databackup started on the $(date +%m%d%y_%H%M) ..."
      $MYSQLDUMP_r r_imported_data -h $DBHOST -u $DBUSER -P $DBPORT --password=$DBUSERPWD --log_error=$LOGDIR/error/r_imported_data.sql.err        > $BACKUP_DIR/r_imported_data.sql.$(date +%m%d%y)
      mysqldump_success=$?
      if [ $mysqldump_success -eq '0' ];
      then
         rm -f $LOGDIR/error/r_imported_data.sql.err
      else
       cat $LOGDIR/error/r_imported_data.sql.err
      fi
      echo "r_imported_data backup completed on the $(date +%m%d%y_%H%M)"
      touch $LOGDIR/backup_compled_r_imported_data_$(date +%m%d%y_%H%M).tmp
      exit
  fi    #if [ ! -f $BACKUP_DIR/r_imported_data.sql.$(date +%m%d%y)  ];

fi  
#Per Blog backup
rm -f $OUTFILE
mysql $DB -h $DBHOST -u $DBUSER -P $DBPORT --password=$DBUSERPWD -e "select blog_id  from wp_blogs where blog_id between $START_BLOG_ID and $END_BLOG_ID " -s -N > $OUTFILE
LAST_BLOG_HYPERDB=$(mysql -h $DBHOST -P $DBPORT -u $DBUSER --password=$DBUSERPWD -e "select config_value from r_imported_data.r_config where config_key='last_blog_clustered'" -N -s )
echo $LAST_BLOG_HYPERDB

if [ -s $OUTFILE ]; then
    echo "... Backup started for blog ID $START_BLOG_ID to $END_BLOG_ID on the $(date +%m%d%y_%H%M) ..."
        for i in `cat $OUTFILE`
        do
          
           #If time passed after $PRODUCTION_TIME then stop
           if [ $(date +%H%M%S) -gt $PRODUCTION_TIME ];
           then
              echo "Stoppped as time passed $PRODUCTION_TIME"
              touch $PARENT_BACKUP_DIR/stop.tmp
           fi #if [ $(date +%H%M%S) -gt $PRODUCTION_TIME  ];
          
            #Check for stop
                if [ -f $PARENT_BACKUP_DIR/stop.tmp ];
                then
                  echo "Stopped"
                  touch $LOGDIR/stopped_blog_"$i"
                  exit 1
                fi
                #Find database name
                if [ $i -le $LAST_BLOG_HYPERDB ];
                then
                    let DBNUM="$i"/1000
                    DB=rdb"$DBNUM"
                else
                  DB=rdb_old
                fi  
      
                BACKUP_DIR=$PARENT_BACKUP_DIR/"$DB"
                echo $BACKUP_DIR
                echo $DB
                echo "Backup Blog  $i ..."
                $MYSQLDUMP_r $DB -h $DBHOST -u $DBUSER -P $DBPORT --password=$DBUSERPWD  \
                wp_"$i"_commentmeta            \
                wp_"$i"_comments               \
                wp_"$i"_links                  \
                wp_"$i"_options                \
                wp_"$i"_postmeta               \
                wp_"$i"_posts                  \
                wp_"$i"_term_relationships     \
                wp_"$i"_term_taxonomy          \
                wp_"$i"_terms  --skip-lock-tables --log-error=$LOGDIR/error/blog_"$i".sql.err > $BACKUP_DIR/blog_"$i".sql.$(date +%m%d%y)
                mysqldump_success=$?
                if [ $mysqldump_success -eq '0' ];
            then
               rm -f $LOGDIR/error/blog_"$i".sql.err
               echo "done"
            else
               cat $LOGDIR/error/blog_"$i".sql.err
               echo "!!!Failed!!!"
            fi
            echo "$i" > $PARENT_BACKUP_DIR/progress_"$END_BLOG_ID"
        done
    echo "... Backup end for blog ID  $START_BLOG_ID to $END_BLOG_ID on the $(date +%m%d%y_%H%M) ..."
    touch $LOGDIR/backup_compled_"$START_BLOG_ID"_"$END_BLOG_ID"_$(date +%m%d%y_%H%M).tmp
fi   

Leave a Reply

Labels