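Friday, July 6, 2007
Notes on reinstalling a compute node
1. Configure tftp
The boot configuration files are in /tftpboot/pxelinux.cfg on nkstar1.
Each compute node has one file there, named after the node's IP address converted to hexadecimal. For example, node371 is assigned the DHCP address
172.16.1.117, whose four bytes in hexadecimal are
AC 10 01 75
so the installer boot configuration is copied to the file AC100175: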
cp stage3.boot AC100175
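The hexadecimal file name can be derived from the dotted IP address with a few lines of shell. This is a minimal sketch; the function name ip_to_pxe_name is made up for illustration and is not part of the tftp or PXELINUX tooling.

```shell
ip_to_pxe_name() {
    # Split the dotted quad on "." and print each octet as two uppercase hex digits.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}

ip_to_pxe_name 172.16.1.117   # prints AC100175
```

Computing the name this way avoids hand-conversion mistakes when many nodes have to be reinstalled.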
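2. Install the compute node
Reset the node to be installed (node371 here). It then boots the installer over the network. Once the installer has started, restore the original tftp configuration file (this assumes a copy was saved as AC100175.bak beforehand):
cp AC100175.bak AC100175
This way the node will not be installed again after the installation finishes.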
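3. Post-installation configuration
Install the new kernel:
Go to /home/admin/src/kernels/2.4.23aa2nks3 and run the install script; reboot the machine when it finishes.
Install the gm driver by running the postintall script in /home/admin/src/kernels/2.4.23aa2nks3.
Configure autofs:
scp /home/admin/src/kernels/2.4.23aa2nks3/auto.* node371:/etc/
and restart autofs on the compute node:
/etc/init.d/autofs restart
Install ganglia:
In /home/admin/ganglia-3.0.2/ganglia-3.0.2, run
make install
cp gmond/gmond.init /etc/init.d/gmond
A gmond.conf can be copied from another node into /etc/; edit its location = "12,19,0" field afterwards.
Set up ld.so.conf. It can be copied from another node; its contents are: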
/usr/kerberos/lib
/usr/X11R6/lib
/usr/lib/sane
/usr/lib/qt-3.1/lib
/usr/lib/mysql
/usr/lib/qt2/lib
/opt/xcat/gm/lib
/opt/intel_cc_80/lib
/opt/intel_fc_80/lib
/opt/intel/mkl70/lib/32
/opt/xcat/i686/lib
Then run ldconfig.
The error "error while loading shared libraries: libgm.so.0" is in many cases caused by /opt/xcat/gm/lib missing from this list.
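Before chasing a libgm.so.0 error, it can help to confirm that the directory is really listed. The following is a small sketch, assuming the list lives in /etc/ld.so.conf with one directory per line; has_lib_dir is a hypothetical helper name.

```shell
has_lib_dir() {
    # $1 = ld.so.conf-style file, $2 = directory that should appear on a line by itself
    grep -Fxq -- "$2" "$1"
}

# Example: report whether the gm library directory is configured on this node.
conf=/etc/ld.so.conf
if [ -r "$conf" ] && has_lib_dir "$conf" /opt/xcat/gm/lib; then
    echo "/opt/xcat/gm/lib is listed in $conf"
else
    echo "/opt/xcat/gm/lib is not listed in $conf; add it and rerun ldconfig"
fi
```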
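Configure LSF or pbs. The installation source installs pbs by default.
If LSF is needed, first stop pbs_mom on the node:
/etc/init.d/pbs_mom stop
then copy the startup script over from nkstar1 (run on nkstar1):
scp /etc/init.d/lsf node371:/etc/init.d/
and start it on the node:
/etc/init.d/lsf start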
Saturday, November 12, 2005
An introduction to Portable Batch System
1. Introduction
2. User commands
3. Submitting MPI parallel jobs
4. System configuration
PORTABLE BATCH SYSTEM (PBS) MINI-HOWTO
Introduction
The Portable Batch System (PBS) is available as Open Source software from http://www.OpenPbs.org/. A commercial version can be bought from http://www.PBSPro.com/. The PBSPro vendor also offers support for OpenPBS, at a decent price for academic institutions.
There exists a very useful collection of user-contributed software/patches for Open PBS at http://www-unix.mcs.anl.gov/openpbs/.
This HowTo document outlines all the steps required to compile and install the Portable Batch System (PBS) versions 2.1, 2.2 and 2.3. Most likely the steps will be the same for the PBSPro software.
The latest version of PBS is available from http://www.OpenPbs.org/. The PBS documentation available at the Web-site should be handy for in-depth discussion of the points covered in this HowTo.
We also discuss how to create a PBS script for parallel or serial jobs. The cleanup in an epilogue script may be required for parallel jobs.
Accounting Reports may be generated from PBS' accounting files. We provide a simple tool pbsacct that processes and formats the accounting into a useful report. Download the latest version of pbsacct from the ftp://ftp.fysik.dtu.dk/pub/PBS/ directory.
This page is at http://www.fysik.dtu.dk/CAMP/pbs.html.
Feedback to this document was kindly provided by:
Tim Mattson, timothy.g.mattson (at) intel.com.
HowTo steps
The following steps are what we use to install PBS from scratch on our systems. Please send corrections and additions to Ole.H.Nielsen (at) fysik.dtu.dk.
Ensure that tcl8.0 and tk8.0 are installed on the system. Look into the PBS docs to find out about these packages. The homepage is at http://www.scriptics.com/products/tcltk/. Get Linux RPMs from your favorite distribution, or build it yourself on other UNIXes.
If you installed the PBS binary RPMs on Linux, skip to step 4.
Configure PBS for your choice of spool-directory and the central server machine (named "zeise" in our examples):
./configure --set-server-home=/var/spool/PBS --set-default-server=zeise
On Compaq Tru64 UNIX make sure that you use the Compaq C-compiler instead of the GNU gcc by doing "setenv CC cc". You should add these flags to the above configure command: --set-cflags="-g3 -O2". It is also important that the path /var/spool/PBS does not include any soft-links, such as /var -> /usr/var, since this triggers a bug in the PBS code.
If you compiled PBS for a different architecture before, make sure to clean up before running configure:
gmake distclean
Run a GNU-compatible make in order to build PBS.
On AIX 4.1.5 edit src/tools/Makefile to add a library: LIBS = -lld
On Compaq Tru64 UNIX use the native Compaq C-compiler:
gmake CC=cc
The default CFLAGS are "-g -O2", but the Compaq compiler requires "-g3 -O2" for optimization. Set this with:
./configure (flags) --set-cflags="-g3 -O2"
After the make has completed, install the PBS files as the root superuser:
gmake install
Create the "nodes" file in the central server's (zeise) directory /var/spool/PBS/server_priv containing hostnames, see the PBS 2.2 Admin Guide p.8 (Sec. 2.2 "Installation Overview" point 8.). Substitute the spool-directory name /var/spool/PBS by your own choice (the Linux RPM uses /var/spool/pbs). Check the file /var/spool/PBS/pbs_environment and ensure that important environment variables (such as the TZ timezone variable) have been included by the installation process. Add any required variables in this file.
Initialize the PBS server daemon and scheduler:
/usr/local/sbin/pbs_server -t create
/usr/local/sbin/pbs_sched
The "-t create" should only be executed once, at the time of installation !!
The pbs_server and pbs_sched should be started at boot time: On Linux this is done automatically by /etc/rc.d/init.d/pbs. Otherwise use your UNIX's standard method (e.g. /etc/rc.local) to run the following commands at boot time:
/usr/local/sbin/pbs_server -a true
/usr/local/sbin/pbs_sched
The "-a true" sets the scheduling attribute to True, so that jobs may start running.
Create queues using the "qmgr" command; see the manual pages for "pbs_server_attributes" and "pbs_queue_attributes". List the server configuration with the print server command. The output can be used as input to qmgr, so this is also a way to make a backup of your server setup. You may save the output of qmgr (for example, the setup listed below) in a file, removing the first 2 lines, which are not valid commands. Pipe this file into qmgr like this: cat file | qmgr and everything is configured in a couple of seconds!
Our current configuration is:
# qmgr
Max open servers: 4
Qmgr: print server
#
# Create queues and set their attributes.
#
#
# Create and define queue verylong
#
create queue verylong
set queue verylong queue_type = Execution
set queue verylong Priority = 40
set queue verylong max_running = 10
set queue verylong resources_max.cput = 72:00:00
set queue verylong resources_min.cput = 12:00:01
set queue verylong resources_default.cput = 72:00:00
set queue verylong enabled = True
set queue verylong started = True
#
# Create and define queue long
#
create queue long
set queue long queue_type = Execution
set queue long Priority = 60
set queue long max_running = 10
set queue long resources_max.cput = 12:00:00
set queue long resources_min.cput = 02:00:01
set queue long resources_default.cput = 12:00:00
set queue long enabled = True
set queue long started = True
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 80
set queue medium max_running = 10
set queue medium resources_max.cput = 02:00:00
set queue medium resources_min.cput = 00:20:01
set queue medium resources_default.cput = 02:00:00
set queue medium enabled = True
set queue medium started = True
#
# Create and define queue small
#
create queue small
set queue small queue_type = Execution
set queue small Priority = 100
set queue small max_running = 10
set queue small resources_max.cput = 00:20:00
set queue small resources_default.cput = 00:20:00
set queue small enabled = True
set queue small started = True
#
# Create and define queue default
#
create queue default
set queue default queue_type = Route
set queue default max_running = 10
set queue default route_destinations = small
set queue default route_destinations += medium
set queue default route_destinations += long
set queue default route_destinations += verylong
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server max_user_run = 6
set server acl_host_enable = True
set server acl_hosts = *.fysik.dtu.dk
set server acl_hosts += *.alpha.fysik.dtu.dk
set server default_queue = default
set server log_events = 63
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.cput = 01:00:00
set server resources_default.neednodes = 1
set server resources_default.nodect = 1
set server resources_default.nodes = 1
set server scheduler_iteration = 60
set server default_node = 1#shared
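The four execution queues above differ only in name, priority and CPU-time limits, so their qmgr commands can be generated rather than typed. This is a sketch only; print_queue is a hypothetical helper, and the fixed max_running value of 10 is copied from the listing above.

```shell
print_queue() {
    # $1 = queue name, $2 = priority, $3 = maximum cput, $4 = minimum cput (may be empty)
    q=$1
    echo "create queue $q"
    echo "set queue $q queue_type = Execution"
    echo "set queue $q Priority = $2"
    echo "set queue $q max_running = 10"
    echo "set queue $q resources_max.cput = $3"
    if [ -n "$4" ]; then
        echo "set queue $q resources_min.cput = $4"
    fi
    echo "set queue $q resources_default.cput = $3"
    echo "set queue $q enabled = True"
    echo "set queue $q started = True"
}

# Print the commands for two of the queues; nothing is sent to a PBS server here.
print_queue small 100 00:20:00 ""
print_queue medium 80 02:00:00 00:20:01
```

After reviewing the printed commands, the output could be piped into qmgr in the same way as a saved configuration file.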
Install the PBS software on the client nodes, repeating steps 1-3 above.
Configure the PBS nodes so that they know the server: Check that the file /var/spool/PBS/server_name contains the name of the PBS server (zeise in this example), and edit it if appropriate. Also make sure that this hostname resolves correctly (with or without the domain-name), otherwise the pbs_server may refuse connections from the qmgr command.
Create the file /var/spool/PBS/mom_priv/config on all PBS nodes (server and clients) with the contents:
# The central server must be listed:
$clienthost zeise
where the correct servername must replace "zeise". You may add other relevant lines as recommended in the manual, for example for restricting access and for logging:
$logevent 0x1ff
$restricted *.your.domain.name
(list the domain names that you want to give access).
For maintenance of the configuration file, we use rdist to duplicate /var/spool/PBS/mom_priv/config from the server to all PBS nodes.
Start the MOM mini-servers on both the server and the client nodes:
/usr/local/sbin/pbs_mom
or "/etc/rc.d/init.d/pbs start" on Linux. Make sure that MOM is started at boot time. See discussion under point 5.
On Compaq Tru64 UNIX 4.0E+F there may be a problem with starting pbs_mom too soon. Some network problem makes pbs_mom report errors in an infinite loop, which fills up the logfiles' filesystem within a short time ! Several people told me that they don't have this problem, so it's not understood at present.
The following section is only relevant if you have this problem on Tru64 UNIX.
On Tru64 UNIX start pbs_mom from the last entry in /etc/inittab:
# Portable Batch System batch execution mini-server
pbsmom::once:/etc/rc.pbs > /dev/console 2>&1
The file /etc/rc.pbs delays the startup of pbs_mom:
#!/bin/sh
#
# Portable Batch System (PBS) startup
#
# On Digital UNIX, pbs_mom fills up the mom_logs directory
# within minutes after reboot. Try to sleep at startup
# in order to avoid this.
PBSDIR=/usr/local/sbin
if [ -x ${PBSDIR}/pbs_mom ]; then
    echo PBS startup.
    # Sleep for a while
    sleep 120
    ${PBSDIR}/pbs_mom # MOM
    echo Done.
else
    echo Could not execute PBS commands !
fi
Queues defined above do not work until you start them:
qstart default small medium long verylong
qenable default small medium long verylong
This needs to be done only once and for all, at the time when you install PBS.
Make sure that the PBS server has all nodes correctly defined. Use the pbsnodes -a command to list all nodes.
Add nodes using the qmgr command:
# qmgr
Max open servers: 4
Qmgr: create node node99 properties=ev67
where the node-name is node99 with the properties=ev67. Alternatively, you may simply list the nodes in the file /var/spool/PBS/server_priv/nodes:
server:ts ev67
node99 ev67
The :ts indicates a time-shared node; nodes without :ts are cluster nodes where batch jobs may execute. The second column lists the properties that you associate with the node. Restart the pbs_server after manually editing the nodes file.
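As a quick sanity check of a hand-edited nodes file, the cluster nodes (those without :ts) can be counted with awk. This is a minimal sketch; count_cluster_nodes is a hypothetical helper, and skipping blank lines and lines starting with # is an assumption about how the file is written.

```shell
count_cluster_nodes() {
    # $1 = path to a nodes file; counts entries whose host name does not end in ":ts"
    awk 'NF && $1 !~ /^#/ && $1 !~ /:ts$/ { n++ } END { print n + 0 }' "$1"
}

# Example with the two-line nodes file shown above, written to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
server:ts ev67
node99 ev67
EOF
count_cluster_nodes "$sample"   # prints 1
rm -f "$sample"
```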
After you first set up your system, to get the jobs to actually run you need to set the server scheduling attribute to true. This will normally be done for you at boot time (see point 5 in this file), but for this first time, you will need to do this by hand using the qmgr command:
# qmgr
Max open servers: 4
Qmgr: set server scheduling=true
--------------------------------------------------------------------------------
Batch job scripts
Your PBS batch system ought to be fully functional at this point so that you can submit batch jobs using the qsub command. For debugging purposes, PBS offers you an "interactive batch job" by using the command qsub -I.
As an example, you may use the following PBS batch script as a template for creating your own batch scripts. The present script runs an MPI parallel job on the available processors:
#!/bin/sh
### Job name
#PBS -N test
### Declare job non-rerunnable
#PBS -r n
### Output files
#PBS -e test.err
#PBS -o test.log
### Mail to user
#PBS -m ae
### Queue name (small, medium, long, verylong)
#PBS -q long
### Number of nodes (node property ev67 wanted)
#PBS -l nodes=8:ev67
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
echo This job runs on the following processors:
echo `cat $PBS_NODEFILE`
# Define number of processors
NPROCS=`wc -l < $PBS_NODEFILE`
echo This job has allocated $NPROCS nodes
# Run the parallel MPI executable "a.out"
mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS a.out
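The script above counts the lines of $PBS_NODEFILE. The node file may list the same host more than once (for instance when several processors per node are requested), so the line count and the number of distinct hosts can differ. The sketch below shows both counts; the helper names count_lines and count_hosts are made up for illustration.

```shell
count_lines() {
    # Number of entries in the node file (one per allocated processor slot).
    awk 'END { print NR }' "$1"
}

count_hosts() {
    # Number of distinct host names in the node file.
    sort -u "$1" | awk 'END { print NR }'
}

# Inside a PBS job, PBS_NODEFILE is set by PBS; outside a job this prints a notice instead.
nodefile=${PBS_NODEFILE:-}
if [ -n "$nodefile" ] && [ -r "$nodefile" ]; then
    echo "Processor slots: $(count_lines "$nodefile")"
    echo "Distinct hosts:  $(count_hosts "$nodefile")"
else
    echo "PBS_NODEFILE is not set; not running inside a PBS job"
fi
```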
If you specify #PBS -l nodes=1 in the script, you will be running a non-parallel (or serial) batch job:
#!/bin/sh
### Job name
#PBS -N test
### Declare job non-rerunnable
#PBS -r n
### Output files
#PBS -e test.err
#PBS -o test.log
### Mail to user
#PBS -m ae
### Queue name (small, medium, long, verylong)
#PBS -q long
### Number of nodes (node property ev6 wanted)
#PBS -l nodes=1:ev6
# This job's working directory
echo Working directory is $PBS_O_WORKDIR
cd $PBS_O_WORKDIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run your executable
a.out
--------------------------------------------------------------------------------
Clean-up after parallel jobs
If a parallel job dies prematurely for any reason, PBS will clean up user processes on the master-node only. We (and others) have found that often MPI slave-processes are lingering on all of the slave-nodes waiting for communication from the (dead) master-process.
At present the only generally applicable way to clean up user processes on the nodes allocated to a PBS job is to use the PBS epilogue capability (see the PBS documentation). The epilogue is executed on the job's master-node, only.
An epilogue script /var/spool/PBS/mom_priv/epilogue should be created on every node, containing for example this:
#!/bin/sh
echo '--------------------------------------'
echo Running PBS epilogue script
# Set key variables
USER=$2
NODEFILE=/var/spool/PBS/aux/$1
echo
echo Killing processes of user $USER on the batch nodes
for node in `cat $NODEFILE`
do
    echo Doing node $node
    su $USER -c "ssh -a -k -n -x $node skill -v -9 -u $USER"
done
echo Done.
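Before enabling such an epilogue on a production cluster, it may be worth checking which commands it would issue. The following dry-run sketch only prints the commands, one per distinct host, instead of executing them; cleanup_commands is a hypothetical helper, and deduplicating the node list is an addition that the script above does not do.

```shell
cleanup_commands() {
    # $1 = user name, $2 = node file; prints one ssh command per distinct host
    sort -u "$2" | while read -r node; do
        [ -n "$node" ] || continue
        echo "ssh -a -k -n -x $node skill -v -9 -u $1"
    done
}

# Example with a temporary node file in which one host appears twice.
sample=$(mktemp)
printf '%s\n' node1 node1 node2 > "$sample"
cleanup_commands someuser "$sample"
rm -f "$sample"
```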
PBS/SCore (Open PBS/SCore) Administrator's Guide
1. Introduction
PBS/SCore must be configured independently from the SCore configuration. This document describes how to configure PBS/SCore. For further details, please refer to the Administrator Guide attached to PBS(/SCore). In this document, it is assumed that SCore is already installed and configured.
If the SCore system has been installed by using EIT, all the configuration procedures described in this document have already been carried out automatically by EIT.
2. Host Configuration
There are 4 host types in PBS/SCore. An administrator must decide which host is which type.
PBS server host
One of the hosts must be a PBS server host on which PBS server (pbs_server) and PBS scheduler (pbs_sched) are running.
It is possible that msgbserv and/or scoreboard server programs of SCore can also run on the PBS server host.
Compute host
Parallel jobs run on compute hosts. On each compute host, the pbs_mom server must be running to monitor host resources and invoke user parallel jobs. To run SCore jobs, compute hosts must be registered in the scorehosts.db file, and the scoreboard server must be up and running on some host.
User hosts
To submit parallel jobs from user hosts, PBS commands, such as qsub and qdel, must be installed on the user hosts.
Host to invoke SCore jobs
An SCore administrator may specify a host from which SCore jobs are invoked via the scout command. This host should not be a compute host. As on a compute host, pbs_mom must be started on this host too. If no such host is designated, one of the compute hosts allocated to a submitted job is used.
In this document, it is assumed that the above hosts are configured as follows.
PBS server host
server.pccluster.org
Compute hosts
comp0.pccluster.org
comp1.pccluster.org
comp2.pccluster.org
comp3.pccluster.org
User hosts
server.pccluster.org
Host to invoke SCore jobs
server.pccluster.org
3. File setup on all hosts
Every (possible) host must be configured as in the following.
/opt/score/etc/pbs_server_name
Default PBS server host name must be specified in the /opt/score/etc/pbs_server_name file.
server.pccluster.org
4. File setup on PBS server host
/var/scored/pbs/ directory
Execute the following command to setup the /var/scored/pbs/. This operation is not needed if SCore system has been installed by using the binary rpms or EIT.
# cd /opt/score/install
# ./setup -pbs_server
/var/scored/pbs/server_priv/nodes
All compute hosts must be listed in the /var/scored/pbs/server_priv/nodes file. If scorehosts.db file is already created and scoreboard server is up and running, then the scbd2pbs command creates the nodes file via the scoreboard database.
% /opt/score/sbin/scbd2pbs pcc > /var/scored/pbs/server_priv/nodes
In this example, all hosts belonging to the pcc host group are listed and written to the nodes file. The following is an example of the created nodes file.
comp0.pccluster.org pcc score
comp1.pccluster.org pcc score
comp2.pccluster.org pcc score
comp3.pccluster.org pcc score
Each line consists of a hostname and the property names of the host. The host group name, "pcc" in this example, is used as a PBS property name. In the scorehosts.db file, if a host record has an attribute named "pbs", then its associated value(s) are added to the line as property name(s).
For more detail on the nodes file, please refer to Section 3.3.2. Declaring Nodes of PBS Administrator Guide
5. File setup on compute hosts
/var/scored/pbs/ directory
Type the following command to setup the /var/scored/pbs/. This operation is not needed if SCore system has been installed by using the binary rpms or EIT.
# cd /opt/score/install
# ./setup -pbs_comp
/var/scored/pbs/mom_priv/config
In the /var/scored/pbs/mom_priv/config file, PBS server hostname must be included as in the following.
$logevent 0x1ff
$clienthost server.pccluster.org
For more detail on the config file, please refer to Section 9.2. pbs_mom of PBS Administrator Guide
6. File setup on user hosts
No file setup is required on user hosts.
7. File setup on the host to submit SCore jobs
/var/scored/pbs/ directory
Type the following command to setup the /var/scored/pbs/. This operation is not needed if SCore system has been installed by using the binary rpms or EIT.
# cd /opt/score/install/
# ./setup -pbs_comp
/var/scored/pbs/mom_priv/config
In the /var/scored/pbs/mom_priv/config file, PBS server hostname and all compute hostnames must be listed.
$logevent 0x1ff
$clienthost server.pccluster.org
$clienthost comp0.pccluster.org
$clienthost comp1.pccluster.org
$clienthost comp2.pccluster.org
$clienthost comp3.pccluster.org
For more detail on the config file, please refer to Section 9.2. pbs_mom of PBS Administrator Guide
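With more than a handful of compute hosts, the $clienthost list is easier to generate than to type. This is a minimal sketch; mom_config is a hypothetical helper, and its output would still have to be written to /var/scored/pbs/mom_priv/config by hand or by a deployment tool.

```shell
mom_config() {
    # Arguments: PBS server host first, then any compute hosts.
    # Prints a mom_priv/config with the logging mask and one $clienthost line per host.
    printf '%s\n' '$logevent 0x1ff'
    for host in "$@"; do
        printf '$clienthost %s\n' "$host"
    done
}

# Example: the configuration for the host that invokes SCore jobs in this document.
mom_config server.pccluster.org comp0.pccluster.org comp1.pccluster.org \
    comp2.pccluster.org comp3.pccluster.org
```

Called with only the server name, the same function yields the shorter file used on the compute hosts.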
8. Initiating PBS Servers
pbs_mom
Initiate the pbs_mom program on all compute hosts. If there is a host to submit SCore jobs, then the pbs_mom program must be initiated on that host too.
Red Hat or Turbo Linux:
% su
# /etc/rc.d/init.d/pbs_mom start
SuSE Linux:
% su
# /etc/init.d/pbs_mom start
pbs_sched
Initiate PBS scheduler on PBS server host.
Red Hat or Turbo Linux:
% su
# /etc/rc.d/init.d/pbs_sched start
SuSE Linux:
% su
# /etc/init.d/pbs_sched start
pbs_server
Initiate the pbs_server program on the server host. If there is a host to submit SCore jobs, add the -m option followed by the name of the host. If this is the first time the PBS server is started, add the -t option to set up the database.
For the first time,
% su
# /opt/score/sbin/pbs_server -t create -m server.pccluster.org
Otherwise,
Red Hat or Turbo Linux:
% su
# /etc/rc.d/init.d/pbs_server start
SuSE Linux:
% su
# /etc/init.d/pbs_server start -m
Confirmation
To confirm if the PBS server processes are running, invoke the following command.
% su
# /opt/score/bin/qmgr -c 'p n @active'
If they are running properly, then the following output is obtained.
#
# Create nodes and set their properties.
#
#
# Create and define comp0.pccluster.org
#
# create node comp0.pccluster.org # unsuppored operation
set node comp0.pccluster.org state = free
set node comp0.pccluster.org properties = any
set node comp0.pccluster.org properties += score
set node comp0.pccluster.org properties += score-pcc
set node comp0.pccluster.org ntype = cluster
#
# Create and define comp1.pccluster.org
#
# create node comp1.pccluster.org # unsuppored operation
set node comp1.pccluster.org state = free
set node comp1.pccluster.org properties = any
set node comp1.pccluster.org properties += score
set node comp1.pccluster.org properties += score-pcc
set node comp1.pccluster.org ntype = cluster
#
# Create and define comp2.pccluster.org
#
# create node comp2.pccluster.org # unsuppored operation
set node comp2.pccluster.org state = free
set node comp2.pccluster.org properties = any
set node comp2.pccluster.org properties += score
set node comp2.pccluster.org properties += score-pcc
set node comp2.pccluster.org ntype = cluster
#
# Create and define comp3.pccluster.org
#
# create node comp3.pccluster.org # unsuppored operation
set node comp3.pccluster.org state = free
set node comp3.pccluster.org properties = any
set node comp3.pccluster.org properties += score
set node comp3.pccluster.org properties += score-pcc
set node comp3.pccluster.org ntype = cluster
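On a larger cluster this listing gets long, and a short filter can reduce it to the names of the nodes reported as free. The sketch below reads text in the format shown above from standard input; free_nodes is a hypothetical helper, and it relies only on the "set node <name> state = free" lines.

```shell
free_nodes() {
    # Reads qmgr node listing on stdin; prints the name of every node whose state is free.
    awk '$1 == "set" && $2 == "node" && $4 == "state" && $6 == "free" { print $3 }'
}

# Example with two lines in the format shown above.
printf '%s\n' \
    'set node comp0.pccluster.org state = free' \
    'set node comp0.pccluster.org ntype = cluster' | free_nodes
```

In practice the input would come from the qmgr command shown earlier, piped into free_nodes.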
Creation of default queue
Create a default queue. In the following example, the created queue is named "default".
# qmgr -c "create queue default queue_type=execution"
# qmgr -c "set queue default enabled=true"
# qmgr -c "set queue default started=true"
# qmgr -c "set server default_queue=default"
For more detail on queue configuration, please refer to Section 3.5.2. Queue Configuration of PBS Administrator Guide
Start of scheduling
Finally, activate PBS scheduling.
# qmgr -c "set server scheduling=true"
If PBS/SCore is installed from a binary package, then the C based scheduler is used. For more detail on the C based scheduler, please refer to Section 9.9. C Based Scheduler of PBS Administrator Guide
9. The difference between PBS and PBS/SCore
score property
In PBS/SCore, the property name score is treated as a special name that distinguishes SCore jobs from non-SCore jobs.
Host to invoke SCore jobs
Any SCore job must be invoked via the scout and scrun programs. The scout program creates a parallel execution environment in a cluster, and the scrun program invokes the user's parallel program in that environment. The scout program can be invoked from outside of a cluster.
In contrast, the parallel program invocation scheme of PBS is different. PBS assumes that the user's parallel program is first invoked on one compute host, and that the program itself then spawns remote processes on the other compute hosts.
The "-m" option is added to pbs_server to specify the host to invoke SCore jobs. As described above, the pbs_mom daemon process must be running on this host. However, this host cannot also be designated as a compute host. By default, one of the compute hosts is allocated dynamically as this host.
pbs_server -m <hostname>[: