Get started at National Supercomputer Centre (NSC)

A pilot MX 3D structure determination project - now superseded by PReSTO!!

Dear fellow protein crystallographers: the pilot HPC MX installation will no longer be updated; please use the PReSTO installation instead. The pilot documentation will remain on this page for some time.

History

In 2013 we started a pilot project aiming at installing protein crystallography software in an HPC environment, to investigate the performance of such a setup with respect to remote graphics, speed of calculation and the potential to run supercomputer-adapted software in a suitable environment. For 2017 we share 10 000 core hours/month, each user has 20 GB of disk space in /home/x_user, and the entire project has 2000 GB under /proj/xray/users/x_user. Below you find how to join the pilot and brief instructions on how to run MX software in an HPC environment. The pilot was presented in a poster at the joint SBnet/SFBBM meeting in Tällberg 2016. This pilot project is a standard HPC installation; however, an improved installation called PReSTO is now available at NSC Triolith as well. The PReSTO installation was made with Easybuild with the purpose of being shared across Swedish HPC centers.

How to join PReSTO for MX

Swedish structural biologists are welcome to join and use both the pilot and the PReSTO installation; however, the pilot allocation is limited to MX calculations only, since 10 000 core hours/month is not enough to support molecular dynamics (MD) for all interested parties. If you want to perform MD simulations and other compute-demanding calculations alongside MX, please apply for your own SNAC medium project, i.e. 20 000 core hours per month (https://supr.snic.se/round/2017medium/), and you are then free to spend your own compute time across MX and MD as you wish.

  1. Register yourself in SNIC User and Project Repository (SUPR)
  2. Login to SUPR and "Request membership in project" for pilot called: SNIC 2017/1-199
  3. Accept the user agreement and Request the login account
  4. Download thinlinc-client for any Windows, Linux or Mac computer from Cendio and use thinlinc to access NSC Triolith

Once you are registered and can log in to NSC Triolith using ThinLinc, the PReSTO installation is recommended; however, this page describing the MX pilot installation will remain for some time. With Easybuild it is possible to send environment variables to the compute nodes of an HPC computer, so for instance pipedream from GlobalPhasing can be run in PReSTO but not in the pilot.

MX module commands

Modules can be loaded from .bashrc, from .modules, or with "module load xxx" in the terminal window. At NSC, using .modules for frequently used modules is preferred over .bashrc; examples of a .modules file and a .bash_profile file are shown below, and modules relevant to MX can be commented out with (#) in .bashrc when not needed. When a terminal window is opened many modules are global and available in the directory specified by $MODULEPATH, while project specific modules become available after "module load proj/xray", which changes the value of $MODULEPATH. Project specific modules are used for frequently changing software.
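
As a minimal terminal sketch (using only the proj/xray module named above), the commands below show how loading the project module changes which modules are visible:

module avail              # list the globally available modules
module load proj/xray     # changes $MODULEPATH so the project specific modules become visible
module avail              # now also lists the project specific modules
module list               # show the modules currently loaded in this shell
module unload proj/xray   # remove the project module again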

For frequently used MX software use:
A) .modules for global modules
###########  .modules example start  ###########
$ cat .modules
fasta/v36
usf/March2011

$
Please note the empty line at the end of the .modules file, created by pressing "Carriage return" after the last entry.
###########  .modules example end   ###########

B) .bash_profile for project specific modules
###########  .bash_profile example start  ###########
$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
module load proj/xray
module load ccp4/7.0-module
module load phenix/1.11.1-2575
module load xds/2016-12-07
module load xds-viewer/0.6
module load xdsapp/2.1
########### .bash_profile example end  ###########

Global MX modules at NSC updated 2017-01-13
module add fasta/v36
module load usf/March2011

Phasing modules
module load hkl2map/0.4.c-beta
module load snb/2.3

Work with graphics
module add pymol/1.6.0.0
module load vasco/1.0.2

Storage module
module load irods/3.3

Pilot project specific modules
module load proj/xray
 - changes your $MODULEPATH so that the project specific modules below can be found
module load phenix/1.11.1-2575 - includes rosetta 3.7 (2016.32.58837 bundle)
module load ccp4/7.0-module
module load arcimboldo_lite/nov2016
module load sharp/2.8.7
module load shelx/2016-02-01
module load hkl2map/0.4 
module load buster/20160324
module load autoPROC/20161207
module load xdsme/20161024
module load cns/1.3
module load morda/20160531
module load xds/2016-12-07
module load H5ToXds/1.1.0
module load albula/3.2.0-2
module load xdsgui/2016-12-23
module load xdsstat/2010
module load adxv/1.9.12
module load chimera/1.11 (start with "vglrun chimera")
module load O/14.1 (start with "ono")

Basic HPC commands

Sharing HPC resources such as computing time and disk space requires some special commands, listed below and combined in the short session sketch that follows.

interactive -N1 -t 1:00:00 -A snic2017-1-XXX
 - Reserve a 1 hour interactive compute node on project XXX
exit
 - Exit the interactive node terminal and save quota
interactive -N1 -t 0:59:00 --reservation=devel -A snic2017-1-XXX
 - Reserve a 59 minute development node on project XXX
set ccp4i TEMPORARY to $SNIC_TMP
 - Otherwise MOLREP might use all tmp space of the compute nodes
snicquota
 - Show the disk space available at /home and /proj/xray
squeue -u x_user
 - Check the JOBIDs of my running jobs
scancel JOBID
 - Cancel a running job using its JOBID
jobsh -j JOBID n148
 - Access compute node n148 using the JOBID
#SBATCH -A snic2017-1-XXX
 - Charge sbatch script compute time to project XXX
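
To show how these commands fit together, here is a minimal sketch of an interactive session (the project id snic2017-1-XXX and the username x_user are the placeholders used above):

interactive -N1 -t 1:00:00 -A snic2017-1-XXX   # reserve one compute node for 1 hour
module load proj/xray                          # load the project modules on the node
# ... run your interactive MX calculations here ...
exit                                           # leave the node as soon as you are done to save quota
squeue -u x_user                               # back on the login node: check that no jobs are left running
scancel JOBID                                  # cancel any job you no longer need (JOBID from squeue)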


BUSTER and SHARP

Buster at compute nodes

At NSC Triolith buster uses 4 processors by default, i.e. use #SBATCH -n 4 in buster.script, or, if running with "-nthreads 8", use #SBATCH -n 8 in buster.script. Running "sbatch buster.script" is efficient and saves compute time compared to requesting an interactive node (16 processors) and then running a standard buster job on the command line, which uses only 4 out of 16 processors and wastes the compute time of the remaining 12.

From the login node simply perform:
sbatch buster.script

where buster.script is:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 8
#SBATCH --mail-type=ALL
module load proj/xray
module load buster/20160324
refine -p model.pdb \
-m data.mtz \
-l chiral.dat \
-l grade-AME.cif \
-Gelly tweak4.gelly \
-nthreads 8 \
-autoncs \
-TLS \
AutomaticFormfactorCorrection=yes \
StopOnGellySanityCheckError=no \
-d run1 > run1.log

This will submit your job to the NSC compute nodes and no time or resources are wasted.

For help with the refine command, please visit the Buster wiki or perform:

module load proj/xray
module load buster/20160324
refine -hhh

to read the help files for the refine command.

SHARP at compute nodes

The command-line variant of SHARP is suitable for use with the sbatch command at NSC Triolith. Below is an example of how to submit a three-wavelength zinc MAD phasing job to the NSC Triolith compute nodes:

sbatch sharp.script

where sharp.script is:

#!/bin/bash
#SBATCH -t 8:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load ccp4/7.0-module
module load shelx/2016-02-01
module load sharp/2.8.7
run_autoSHARP.sh \
-seq sequence.pir -ha "Zn" \
-nsit 20 \
-wvl 1.28334 peak -mtz pk.mtz \
-wvl 1.27838 infl -mtz ip.mtz \
-wvl 0.91841 hrem -mtz rm.mtz \
-id MADjob1 | tee MADjob1.lis

The purist should note that this example lacks measured zinc scattering factor (f' and f") values at the three wavelengths; additional examples for SAD/MAD/MIRAS etc. are found at the GlobalPhasing homepage. The script above allocates an entire compute node for 8 hours, and since shelxd and SHARP are parallel programs, and arp/warp and buccaneer are partially parallel, the fast execution time is suitable for synchrotron usage.


Parallel PHASER MR

When trying many different search models and ensembles for molecular replacement, running the PHASER software in parallel mode is a time saver. PHASER is available both in CCP4 and in PHENIX, and below are two ways to generate an sbatch script for parallel execution on the compute nodes, followed by two GUI-based alternatives.

Example 1, sbatch script for CCP4 Phaser

1. Edit phaser.script below: an ensemble of two aligned search models in MR_AUTO mode and 16 processors

#!/bin/bash
#SBATCH -t 00:30:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load ccp4/7.0-module
phaser << eof
MODE MR_AUTO
HKLIN data.mtz
JOBS 16
LABIN F = F_XDSdataset SIGF = SIGF_XDSdataset
ENSEMBLE ens1_1 PDBFILE search_model1.pdb IDENTITY 0.99 &
    PDBFILE search_model2.pdb IDENTITY 0.99
COMPOSITION PROTEIN MW 48334 NUM 2
SEARCH ENSEMBLE ens1_1
ROOT ./ens1_1
eof

2. Execute phaser.script as:

sbatch phaser.script


Example 2, PHENIX Phaser MRage

1. Start and edit the "MRage-automated pipeline" wizard; since "submit queue job" is not available there, save a parameter file called mr_pipeline_1.eff

2. Edit MRage.script as:

#!/bin/bash
#SBATCH -t 01:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load phenix/1.11.1-2575
phenix.mrage mr_pipeline_1.eff

3. Then finally execute

sbatch MRage.script

which will submit your job to a compute node. In this particular example an entire node is allocated for 1 hour. More information regarding sbatch scripting can be found in the NSC Triolith documentation.


Example 3, Phaser using ccp4i2 GUI at interactive node

  1. Get X hours of compute time on an interactive node with "interactive -N1 -t X:00:00"
  2. module load proj/xray
  3. module load ccp4/7.0-module
  4. ccp4i2
  5. Read in appropriate data.mtz and sequence.seq and model.pdb file and start one of the Phaser modules of ccp4i2
  6. In the Phaser Input module enter "Keywords" and change "Number of parallel threads" from 1 to 16
  7. Run Phaser

Example 4. Submit Parallel phaser job to compute node from Phenix GUI

  1. Start Phenix-GUI
  2. module load proj/xray
  3. module load phenix/1.11.1-2575
  4. Edit phenix preferences - 1 hour with 16 processors is enough for most MR jobs
  5. Edit "Phaser MR (simple one component interface)" and use "Run" - "Submit queue job"

arpWarp

arpWarp is part of many autobuild pipelines such as autoSHARP; some of its steps are parallel (1), making it suitable for command-line sbatch scripting.

  1. Automated macromolecular model building for X-ray crystallography using ARP/wARP version 7.
    Langer G, Cohen S, Lamzin V, Perrakis A
    Nat Protoc 2008 ;3(7):1171-9

Example: arpWarp.script for rebuild of MR solution

#!/bin/bash
#SBATCH -t 4:00:00            # example wall time, adjust to your job
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load ccp4/7.0-module
auto_tracing.sh \
datafile ./data.mtz \
fp F_XDSdataset \
sigfp SIGF_XDSdataset \
modelin ./MR_solution.pdb \
seqin ./model_sequence.pir

Then submit arpWarp.script to compute nodes by:

sbatch arpWarp.script

More parameters for auto_tracing.sh can be added to arpWarp.script according to the authors' instructions.

Phenix and Rosetta

Phenix and Rosetta are two large software packages in rapid development. Currently phenix version 1.11.1-2575 (26 Oct 2016) and rosetta 3.7 (2016.32.58837) are available to NSC users. The PHENIX software package comes with a SLURM scheduler in the graphical user interface (GUI) that enables submission of jobs to the compute nodes from the phenix GUI. To enable SLURM scheduling from the Phenix GUI:

1. Edit processes in phenix preferences

2. Press the Run symbol (gear wheel) and select "submit queue job"

In the Autobuild wizard, "Use queuing system to distribute tasks" can be selected under "Other options"; however, this is NOT required since 15 processors are enough for Autobuild. Many phenix GUI wizards do not enable "submit queue job", for instance "MRage-automated pipeline" and "Rosetta refinement (alpha)". In these cases you can save a parameter file and run a simple sbatch script to submit the job to the Triolith queue.


MR rosetta rebuild of MR solution

Molecular replacement is best performed at NSC using the parallel phaser software MRage; however, MR rosetta rebuilding of MR solutions can save a lot of time and effort.

Below we use the sbatch option, although "submit queue job" is another possibility for starting MR rosetta from the phenix GUI.

Example: MR rosetta rebuild of an MR solution using the sbatch command
1. open a terminal
2. Start phenix GUI to edit parameter file
module load proj/xray
module load phenix/1.11.1-2575
phenix
3. save MR rosetta parameter file from phenix GUI (mr_rosetta_1.eff)
4. "diff mr_rosetta_1.eff mr_rosetta_2.eff" is useful when comparing parameter files
5. Edit script for MR rosetta called mr_rosetta.script
#!/bin/bash
#SBATCH -t 96:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load phenix/1.11.1-2575
phenix.mr_rosetta mr_rosetta_1.eff
6. sbatch mr_rosetta.script
and the MR rosetta job now runs on the compute nodes

Skip the MR step (use MRage instead) by ticking "Model is already placed" and save the parameter file for the sbatch command. The fragment files aat000_03_05.200_v1_3/aat000_09_05.200_v1_3 are generated by submitting your protein sequence to the Robetta server, as described on the phenix homepage.

Example script for phenix rosetta refine

1. Enable alpha-test programs and features

2. Open "Rosetta refinement (alpha)" wizard and note that "submit queue job" is not available. Instead save a parameter file called rosetta_refine_1.eff.

3. Edit rosetta_refine.script as:

#!/bin/bash
#SBATCH -t 96:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL

module load proj/xray
module load phenix/1.11.1-2575
phenix.rosetta_refine rosetta_refine_1.eff

4. Execute "sbatch rosetta_refine.script" to send your job to the compute node


Arcimboldo_lite

Arcimboldo_lite (1) is "ab initio" phasing software for high resolution native datasets, with 4eto used for benchmarking (2).

  1. ARCIMBOLDO_LITE: single-workstation implementation and use.
    Sammito M, Millán C, Frieske D, Rodríguez-Freire E, Borges R, Usón I
    Acta Crystallogr. D Biol. Crystallogr. 2015 Sep;71(Pt 9):1921-30
  2. Macromolecular ab initio phasing enforcing secondary and tertiary structure.
    Millán C, Sammito M, Usón I
    IUCrJ 2015 Jan;2(Pt 1):95-105

The 4eto data files can be downloaded and the job executed via "sbatch arcimboldo.script":

#!/bin/bash
#SBATCH -t 6:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load ccp4/7.0-module
module load arcimboldo_lite/nov2016
job=4eto
ARCIMBOLDO_LITE ${job}.bor > ${job}.log

To run your own job, please modify arcimboldo.script above and the 4eto.bor file according to the ARCIMBOLDO manual.

################# 4eto.bor - start ##################

[CONNECTION]:
distribute_computing: multiprocessing

[GENERAL]
working_directory = /proj/xray/users/x_user/test_directory
mtz_path = ./4eto_2.mtz
hkl_path = ./4eto_4.hkl

[ARCIMBOLDO]
shelxe_line = -m30 -v0 -a3 -t20 -q -s0.55
helix_length = 14
sigf_label = SIGF
molecular_weight = 22000
name_job = 4eto_def
f_label = F
fragment_to_search = 2
number_of_component = 1
identity = 0.2

[LOCAL]
# Third party software paths
path_local_phaser: /proj/xray/software/ccp4_v7.0/destination/ccp4-7.0/bin/phaser
path_local_shelxe: /software/apps/shelx/expires-2017-01-01/bdist/shelxe

################# 4eto.bor - end ##################

DIALS and XDS at NSC Triolith

xia2 for autoprocessing of X-ray diffraction data by XDS and DIALS is developed by Diamond Light Source and CCP4 and runs in the background when visiting a Diamond beamline. 

xia2 is suitable for NSC Triolith using "sbatch xia2.script", here exemplified by the XDS -3dii option (upper script) and the -dials option (lower script):

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load xds/2016-12-07
module load ccp4/7.0-module
xia2 pipeline=3dii failover=true \
image=/proj/xray/users/x_marmo/TEST/data/x001/TEST-x001_1_0001.cbf:1:900 \
multiprocessing.mode=parallel \
multiprocessing.njob=1 \
multiprocessing.nproc=16 \
trust_beam_centre=False read_all_image_headers=False

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load xds/2016-12-07
module load ccp4/7.0-module
xia2 pipeline=dials failover=true \
image=/proj/xray/users/x_marmo/TEST/data/x001/TEST-x001_1_0001.cbf:1:900 \
multiprocessing.mode=parallel \
multiprocessing.njob=1 \
multiprocessing.nproc=16 \
trust_beam_centre=False read_all_image_headers=False

For more xia2 guidance visit the xia2 homepage and xia2 multi crystal tutorial

XDS autoprocessing is also possible with xdsme (XDS Made Easy) and with autoPROC from GlobalPhasing; fully automated scripts are presented below:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load xdsme/20161024
xdsme --brute datafiles_*.cbf

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load autoPROC/20161207
process \
-Id id1,/proj/xray/users/x_marmo/targets/PROTA/data/x001,PROTA-x001_1_####.cbf,1,1800 \
-noANO -B \
-nthreads 8 \
-d proc_1 > proc_1.log

Finally, XDS can be run from a graphical user interface on an interactive node (or development node) via XDSGUI or XDSAPP.

Eiger data processing

The new Eiger pixel detectors from Dectris require either the H5ToXds software or the dectris-neggia-centos6.so library when running XDS. H5ToXds is available as a module; users may also register with Dectris, download dectris-neggia-centos6.so, make it executable (chmod +x), and add

LIB= /home/x_user/bin/dectris-neggia-centos6.so

to their XDS.INP files when running XDS/XDSGUI/XDSAPP/XDSme/Xia2 etc.  Below we present examples for Eiger data processing at NSC Triolith using the EIGER_16M_Nov2015.tar.bz2 test data available after login to https://www.dectris.com/datasets.html.
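
As a sketch of this plugin setup (the download location ~/Downloads is only an example), the library can be placed and made executable like this:

mkdir -p /home/x_user/bin                                   # directory matching the LIB= line above
cp ~/Downloads/dectris-neggia-centos6.so /home/x_user/bin/  # copy the downloaded library
chmod +x /home/x_user/bin/dectris-neggia-centos6.so         # make it executable
# then add the line  LIB= /home/x_user/bin/dectris-neggia-centos6.so  to XDS.INP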

XDS with Eiger data

To get an XDS.INP file for any of the XDS-dependent programs and GUIs, download generate_XDS.INP from Kay Diederichs' homepage and give the full path to the master.h5 file:

generate_XDS.INP /proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_master.h5

Now we can use H5ToXds from Dectris to run XDS via "sbatch xds.script":

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load H5ToXds/1.1.0
module load xds/2016-12-07
xds_par

or we can add

LIB= /home/x_user/bin/dectris-neggia-centos6.so

to XDS.INP and use the same xds.script with sbatch as above; dectris-neggia-centos6.so will then be used instead of H5ToXds. Best performance for this test data is obtained with the neggia library and

MAXIMUM_NUMBER_OF_JOBS=4

added to XDS.INP. Prior to the release of the dectris-neggia-centos6.so library in March 2017, other options were available, such as using the hdf2mini-cbf software from GlobalPhasing; however this is now an obsolete way of handling HDF5 data. More guidance and benchmarks on processing Eiger data with XDS are provided by Kay Diederichs.
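
As a small sketch, the two keywords can be appended to an XDS.INP generated by generate_XDS.INP like this (the library path is the example path used above):

cat >> XDS.INP << 'EOF'
LIB= /home/x_user/bin/dectris-neggia-centos6.so
MAXIMUM_NUMBER_OF_JOBS=4
EOF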

autoPROC with Eiger data

autoPROC is an HDF5-compatible data processing pipeline that processes Eiger data with XDS, scales it with aimless, and analyses data anisotropy with staraniso.

The autoPROC module contains the XDS/CCP4 and ADXV/gnuplot/grace/cmake software that autoPROC needs. To run autoPROC with Eiger data use:

sbatch pro1.script

where pro1.script is:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load autoPROC/20161207
process \
-h5 insu6_1_master.h5 \
-d pro1 > pro1.log 

For more autoPROC keywords run "process -h"

autoPROC results are available in the summary.html file.

XDSAPP with Eiger data

XDSAPP can read .cbf files extracted from HDF5 format using hdf2mini-cbf; however, XDS.INP needs to be created explicitly with generate_XDS.INP (available in the xdsapp/2.1 module) and saved as EXP.INP, as described in the release notes. To use XDSAPP with HDF5 files please:

1. Convert HDF5 into .cbf files using hdf2mini-cbf from autoPROC
module load proj/xray
module load autoPROC/20161207
hdf2mini-cbf -m insu6_1_master.h5 -o insu6_1_

2. Request an interactive node
interactive -N1 -t 1:00:00

3. Read modules for xdsapp
module load proj/xray
module load ccp4/7.0-module
module load phenix/1.11.1-2575
module load xds/2016-12-07
module load xds-viewer/0.6
module load xdsapp/2.1

Please note xdsstat is available in the xdsapp/2.1 module

4. Create XDS.INP with generate_XDS.INP and full path to master.h5 
generate_XDS.INP /proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_master.h5

5. Edit XDS.INP and finally save it as EXP.INP

6. Use XDSAPP
xdsapp

XDSGUI - with converted .cbf files for Eiger data

XDSGUI can use Eiger data either in HDF5 format with the support of H5ToXds (slower) or with .cbf files converted from HDF5 by hdf2mini-cbf (faster).

To run xdsgui with converted .cbf files please:

1. Convert HDF5 into .cbf files using hdf2mini-cbf from autoPROC
module load proj/xray
module load autoPROC/20161207
hdf2mini-cbf -m insu6_1_master.h5 -o insu6_1_

2. Request an interactive node
interactive -N1 -t 1:00:00

3. Read xdsgui modules
module load proj/xray
module load ccp4/7.0-module
module load xds/2016-12-07
module load xdsgui/2016-12-23

where the ccp4/7.0-module environment is required by xdsstat/pointless

4. Start XDSGUI
xdsgui

5. load insu6_1_master.h5 and press generate_XDS.INP

6. Edit XDS.INP in xdsgui
from:
NAME_TEMPLATE_OF_DATA_FRAMES=/proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_master.h?
to:
NAME_TEMPLATE_OF_DATA_FRAMES=/proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_??????.cbf

7. Run XDS inside XDSGUI

XDSGUI runs faster with
MAXIMUM_NUMBER_OF_JOBS=4
MAXIMUM_NUMBER_OF_PROCESSORS=4
added to XDS.INP.

XDSGUI - with support of H5ToXds for Eiger data

1. Request an interactive node
interactive -N1 -t 1:00:00

2. Read xdsgui modules
module load proj/xray
module load ccp4/7.0-module
module load xds/2016-12-07
module load H5ToXds/1.1.0
module load xdsgui/2016-12-23

where:
the ccp4/7.0-module environment is required by xdsstat/pointless
H5ToXds/1.1.0 is needed to read the HDF5 data

3. Start XDSGUI
xdsgui

4. load insu6_1_master.h5 and press generate_XDS.INP
Ignore the error message "Cannot access input file"

5. Edit XDS.INP in xdsgui
from:
NAME_TEMPLATE_OF_DATA_FRAMES=/proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_master.h?
to:
NAME_TEMPLATE_OF_DATA_FRAMES=/proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_??????.h5

6. Run XDS inside XDSGUI
Now H5ToXds will run together with xds_par making XDS run slower than with converted .cbf files.

XDSme - with H5ToXds and Albula support for Eiger data

XDSme stands for "XDS made easy" and requires H5ToXds and Albula from Dectris to run with Eiger HDF5 data. XDSme with Eiger HDF5 data also requires a Python environment with numpy (Numerical Python), which is not available on the Triolith compute nodes by default (Dec 2016); after "module load python/2.7.12-anaconda-4.2.0" numpy is available.

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load python/2.7.12-anaconda-4.2.0
module load proj/xray
module load ccp4/7.0-module
module load H5ToXds/1.1.0
module load albula/3.2.0-2
module load xdsme/20161024
xdsme /proj/xray/users/x_marmo/test_suite_NSC/eiger/2015_11_10/insu6_1_??????.h5

xia2 for DIALS and XDS with Eiger data

xia2 is an expert system for X-ray data processing using DIALS or XDS. xia2 is part of the CCP4 package, which also includes DIALS, although in slightly older versions than the latest separate xia2 and DIALS downloads. xia2 in XDS mode uses H5ToXds, while xia2 in DIALS mode does not.

xia2 can be scripted in DIALS mode for Eiger data, and started by "sbatch dials.script", where dials.script is:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load xds/2016-12-07
module load ccp4/7.0-module
xia2 pipeline=dials failover=true \
image=/proj/xray/users/x_marmo/test_suite_NSC/eiger/empty/2015_11_10/insu6_1_master.h5 \
multiprocessing.mode=parallel \
multiprocessing.njob=1 \
multiprocessing.nproc=16 \
trust_beam_centre=False read_all_image_headers=False

xia2 can be scripted in XDS mode for Eiger data, and started by "sbatch 3dii.script", where 3dii.script is:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -N 1
#SBATCH --mail-type=ALL
module load proj/xray
module load xds/2016-12-07
module load ccp4/7.0-module
module load H5ToXds/1.1.0
xia2 pipeline=3dii failover=true \
image=/proj/xray/users/x_marmo/test_suite_NSC/eiger/empty/2015_11_10/insu6_1_master.h5 \
multiprocessing.mode=parallel \
multiprocessing.njob=1 \
multiprocessing.nproc=16 \
trust_beam_centre=False read_all_image_headers=False

xia2 can also be run in DIALS mode or XDS mode from ccp4i2. Grab an interactive node by:
interactive -N1 -t 1:00:00
module load proj/xray
module load ccp4/7.0-module
module load H5ToXds/1.1.0
ccp4i2

In ccp4i2, under "Integrate X-ray images", there is a new template for xia2 with DIALS called "Automated integration of images with DIALS using xia2" (preferred for DIALS), and there is also an older template called "Automated integration of images with DIALS - XIA2" that can run xia2 in both DIALS and XDS mode and is to be updated soon. A nasty BUG in the new xia2 DIALS template is present in ccp4 v7.0.027 (Jan 2017); it requires the user to uncheck "Keep all data", which is ticked by default when the new xia2 DIALS template is opened.

The old xia2 template named "Automated integration of images with DIALS - XIA2" can be used for xia2 runs in XDS mode; please remember to have H5ToXds available in your PATH, as checked in the sketch below.
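
As a quick check before launching ccp4i2 for an XDS-mode run, the two commands below (using the H5ToXds module listed earlier) verify that H5ToXds is found in PATH:

module load H5ToXds/1.1.0
which H5ToXds     # should print the full path to the H5ToXds executable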

Single processor software, CNS and MoRDa

Running single processor software at NSC Triolith does not save any wall time per job; however, many refinement protocols in CNS and molecular replacement attempts in MoRDa can be fired off quickly.

CNS - Crystallography and NMR System

Crystallography and NMR System comes with an online GUI for editing input files. At present the konqueror browser at NSC Triolith does not allow re-opening of saved CNS input files such as generate.inp or refine.inp; hopefully this issue will be solved soon. CNS sbatch scripts should allocate a single processor, i.e. "#SBATCH -n 1", as exemplified below:

#!/bin/bash
#SBATCH -t 1:00:00
#SBATCH -n 1
#SBATCH --mail-type=ALL
module load proj/xray
module load cns/1.3
cns_solve < refine.inp > refine.log

MoRDa - automatic molecular replacement

MoRDa is new software for automatic molecular replacement; "morda -h" lists some command-line options. A MoRDa sbatch script should allocate a single processor, i.e. "#SBATCH -n 1", as exemplified below:

#!/bin/bash
#SBATCH -t 10:00:00
#SBATCH -n 1
#SBATCH --mail-type=ALL
module load proj/xray
module load ccp4/7.0-module
module load morda/20160531
morda -s target.seq  -f data.mtz -alt

Software conflicts


In the .bashrc example above, the majority of the module commands are commented out (#) to avoid conflicts between software packages. For instance, if you want to run the CCP4 autoprocessing software xia2 but have previously loaded the phenix module, xia2 will simply not run, for reasons not yet understood. Sometimes, however, correct functionality requires both the ccp4 and phenix modules to be loaded, and XDSAPP is such an example. The VASCo module has its own Python virtual environment that may conflict with other Python-dependent modules such as XDSAPP and PHENIX.
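
One way to avoid such conflicts is to start each run from a clean module environment, for example:

module list                   # see which modules are currently loaded
module purge                  # unload everything
module load proj/xray         # reload only what the next program needs
module load ccp4/7.0-module   # e.g. for an xia2 run, load ccp4 but not phenix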


NSC diskspace

Every user has 20 GB of disk space in /home/x_user and the entire project has 2000 GB under /proj/xray/users/x_user. Data transfer to Windows machines is most easily done with WinSCP; for Linux machines use scp and rsync as exemplified below.

rsync -rvplt ./data x_user@triolith-thinlinc.nsc.liu.se:/home/x_user
 - the directory named "data" is transferred to the /home/x_user directory at Triolith

scp x_user@triolith-thinlinc.nsc.liu.se:/home/x_user/data/xdsapp/XDS_ASCII.HKL ./
 - the single file XDS_ASCII.HKL is transferred from Triolith to the current directory


iRODS is unstable

The enterprise edition of iRODS does not work as intended and data transfers are frequently interrupted. At the initial stage of this project the community edition of iRODS was used with stable performance. The current enterprise edition of iRODS is not recommended.

Members of the pilot project can join the SweStore iRODS project to share another 10 TB of long-term data storage. Data placed in iRODS is safe but cannot be used for running calculations at NSC or elsewhere. Data files in iRODS are organised as a database, which is useful for very fast searches. Data is manipulated in iRODS using icommands. Using icommands is similar to using sftp, i.e. standard Linux commands, except for the database-specific ones; some are exemplified below, followed by a short combined sketch:

iinit --ttl 72
 - Perform Yubikey login to iRODS for 72 hours (maximum time)
imkdir /snicZone/proj/psf/Diamond_150125
 - Create a new "common" iRODS directory
ipwd or ils
 - Like the standard pwd and ls commands
icd Diamond_140125
 - Change iRODS directory
irsync -rv --link mx8492-36 i:mx8492-36
 - Transfer the mx8492-36 directory to iRODS (alternative 1)
iput -rv mx8492-36 mx8492-36
 - Transfer the mx8492-36 directory to iRODS (alternative 2)
iquest "select sum(DATA_SIZE),count(DATA_NAME),RESC_NAME where COLL_NAME like '/snicZone/proj/psf/Diamond_140125/mx8492-36%'"
 - Check the amount of data transferred (like du -sh)
ilocate XDS_ASCII.HKL
 - Find all XDS_ASCII.HKL files in your iRODS zone
imkdir /snicZone/proj/psf/Anna
 - Create a personal directory Anna for user s_anna
ils -A /snicZone/proj/psf/Anna
 - Check ownership and access rights of the Anna directory
ichmod -r own s_anna /snicZone/proj/psf/Anna
 - Make s_anna an owner of Anna
ichmod -r null s_admin /snicZone/proj/psf/Anna
 - Remove user s_admin access to Anna
ichmod -r null psf /snicZone/proj/psf/Anna
 - Remove psf group access to Anna
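
A short combined sketch of archiving one data collection, using only the icommands listed above (paths and directory names are the examples from the list):

iinit --ttl 72                              # authenticate (Yubikey) for 72 hours
imkdir /snicZone/proj/psf/Diamond_150125    # create the target collection
icd /snicZone/proj/psf/Diamond_150125       # move into it
iput -rv mx8492-36 mx8492-36                # upload the local directory mx8492-36
ils -l                                      # verify that the data arrived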


VASCo at NSC

1. module load vasco/1.0.2
2. Place a file.pdb in a fresh directory and execute:
vasco -in_dir ./ -filename file

The vasco installation has its own Python virtualenv, so do not run other Python programs (phenix/xdsapp) in the same shell.

Links

Bioinformatics (Computational Biology)
Protein Crystallography
Structural Biology