Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Moderator: robpearc

rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Hello,

I have installed C-I-Tasser and I-Tasser on my school's HPC cluster. The installation works for a good number of the proteins I have submitted. However, certain proteins consistently fail to yield any predicted models at all, while others output only one or three models even though I set the flag to output five.

What I have noticed is that the failed proteins are usually a little larger, for example UniProt IDs P50416 with 773 residues and Q96EB6 with 773 residues. There are some exceptions, though, for example P41784 with 80 residues.

What I find odd is that both C-I-Tasser and I-Tasser fail when running these proteins. I am submitting sbatch scripts to the SLURM scheduler that look like the following:

C-I-Tasser example submission:

Code: Select all

#!/bin/bash

#SBATCH --nodes=5
#SBATCH --cpus-per-task=28
#SBATCH -J PosCIT_2
#SBATCH --mem=0
#SBATCH -D /home/rpearson/research/PosValSet/CIT

cd /home/rpearson/research/PosValSet/CIT

srun --ntasks=1 --cpus-per-task=28 bash -c "mkdir P10636 ; scp ./fastas/P10636.fasta ./P10636 ; echo Running P10636 ; cd /home/rpearson/research/PosValSet/CIT/P10636 ; mv P10636.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P10636.fasta -datadir /home/rpearson/research/PosValSet/CIT/P10636 -outdir /home/rpearson/research/PosValSet/CIT/P10636 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit true;" &

srun --ntasks=1 --cpus-per-task=28 bash -c "mkdir P52751 ; scp ./fastas/P52751.fasta ./P52751 ; echo Running P52751 ; cd /home/rpearson/research/PosValSet/CIT/P52751 ; mv P52751.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P52751.fasta -datadir /home/rpearson/research/PosValSet/CIT/P52751 -outdir /home/rpearson/research/PosValSet/CIT/P52751 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit true;" &

srun --ntasks=1 --cpus-per-task=28 bash -c "mkdir P50416 ; scp ./fastas/P50416.fasta ./P50416 ; echo Running P50416 ; cd /home/rpearson/research/PosValSet/CIT/P50416 ; mv P50416.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P50416.fasta -datadir /home/rpearson/research/PosValSet/CIT/P50416 -outdir /home/rpearson/research/PosValSet/CIT/P50416 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit true;" &

srun --ntasks=1 --cpus-per-task=28 bash -c "mkdir P41784 ; scp ./fastas/P41784.fasta ./P41784 ; echo Running P41784 ; cd /home/rpearson/research/PosValSet/CIT/P41784 ; mv P41784.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P41784.fasta -datadir /home/rpearson/research/PosValSet/CIT/P41784 -outdir /home/rpearson/research/PosValSet/CIT/P41784 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit true;" &

srun --ntasks=1 --cpus-per-task=28 bash -c "mkdir O14641 ; scp ./fastas/O14641.fasta ./O14641 ; echo Running O14641 ; cd /home/rpearson/research/PosValSet/CIT/O14641 ; mv O14641.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname O14641.fasta -datadir /home/rpearson/research/PosValSet/CIT/O14641 -outdir /home/rpearson/research/PosValSet/CIT/O14641 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit true;" &
wait
conda deactivate

I-Tasser example submission:

Code: Select all

#!/bin/bash

#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=28
#SBATCH -J PosIT_4
#SBATCH --mem=125000
#SBATCH -D /home/rpearson/research/PosValSet/IT

cd /home/rpearson/research/PosValSet/IT

srun --nodes=1 --ntasks=1 --cpus-per-task=28 --mem=125000 bash -c "mkdir P10636 ; scp ./fastas/P10636.fasta ./P10636 ; echo Running P10636 ; cd /home/rpearson/research/PosValSet/IT/P10636 ; mv P10636.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P10636.fasta -datadir /home/rpearson/research/PosValSet/IT/P10636 -outdir /home/rpearson/research/PosValSet/IT/P10636 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit false;" &


srun --nodes=1 --ntasks=1 --cpus-per-task=28 --mem=125000 bash -c "mkdir P41784 ; scp ./fastas/P41784.fasta ./P41784 ; echo Running P41784 ; cd /home/rpearson/research/PosValSet/IT/P41784 ; mv P41784.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P41784.fasta -datadir /home/rpearson/research/PosValSet/IT/P41784 -outdir /home/rpearson/research/PosValSet/IT/P41784 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit false;" &


srun --nodes=1 --ntasks=1 --cpus-per-task=28 --mem=125000 bash -c "mkdir O14641 ; scp ./fastas/O14641.fasta ./O14641 ; echo Running O14641 ; cd /home/rpearson/research/PosValSet/IT/O14641 ; mv O14641.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname O14641.fasta -datadir /home/rpearson/research/PosValSet/IT/O14641 -outdir /home/rpearson/research/PosValSet/IT/O14641 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit false;" &


srun --nodes=1 --ntasks=1 --cpus-per-task=28 --mem=125000 bash -c "mkdir P50416 ; scp ./fastas/P50416.fasta ./P50416 ; echo Running P50416 ; cd /home/rpearson/research/PosValSet/IT/P50416 ; mv P50416.fasta seq.fasta ; perl /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/I-TASSERmod/runI-TASSER.pl -pkgdir /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0 -libdir /home/rpearson/Structure_Prediction_Tools/CIT_Lib -seqname P50416.fasta -datadir /home/rpearson/research/PosValSet/IT/P50416 -outdir /home/rpearson/research/PosValSet/IT/P50416 -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit false;" &

wait
conda deactivate

I am not sure where to begin troubleshooting this issue. Any help would be greatly appreciated!
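
The near-identical srun lines in the scripts above could be generated from a list of UniProt IDs. The following is only a sketch: the flags and paths are copied from the C-I-Tasser script in this post, `build_cmd` and `BASE` are hypothetical names, `cp` replaces `scp` on the assumption that the FASTA files sit on the same filesystem, and `mkdir -p` is used so that an existing directory does not produce an error.

```shell
#!/bin/bash
# Sketch: build the per-protein command once instead of repeating it by hand.
# PKG and LIB are the paths from the post; BASE and build_cmd are hypothetical names.
PKG=/home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0
LIB=/home/rpearson/Structure_Prediction_Tools/CIT_Lib
BASE=/home/rpearson/research/PosValSet/CIT

build_cmd() {
    local id="$1"
    local cit="${2:-true}"
    local dir="$BASE/$id"
    # Prepare the run directory and copy the sequence in as seq.fasta.
    printf '%s' "mkdir -p $dir && cp $BASE/fastas/$id.fasta $dir/seq.fasta && cd $dir && "
    # Same runI-TASSER.pl flags as in the post; only the ID and -cit vary.
    printf '%s\n' "perl $PKG/I-TASSERmod/runI-TASSER.pl -pkgdir $PKG -libdir $LIB -seqname $id.fasta -datadir $dir -outdir $dir -runstyle parallel -homoflag benchmark -idcut 0.3 -light true -nmodel 5 -hours 5 -LBS false -EC false -GO false -java_home /usr -cit $cit"
}

# Inside the sbatch script the loop would look like this (commented out so the
# sketch can be sourced without a SLURM allocation):
# for id in P10636 P52751 P50416 P41784 O14641; do
#     srun --ntasks=1 --cpus-per-task=28 bash -c "$(build_cmd "$id" true)" &
# done
# wait
```

Passing `false` as the second argument gives the I-Tasser variant of the command.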
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Here is an example stdout from a C-I-Tasser run:

Code: Select all

(base) [rpearson@spartan01 CIT]$ cat P41784_CIT-569556.out
Making P41784 directory.
mkdir: cannot create directory ‘P41784’: File exists
Running P41784

Your setting for running I-TASSER is:
-pkgdir    = /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0
-libdir    = /home/rpearson/Structure_Prediction_Tools/CIT_Lib
-java_home = /usr
-python2   = /opt/intel/intelpython2/bin/python
-python3   = ~/.conda/envs/Quark_and_Itasser_Python3/bin/python3
-seqname   = P41784.fasta
-datadir   = /home/rpearson/research/PosValSet/CIT/P41784
-outdir    = /home/rpearson/research/PosValSet/CIT/P41784
-runstyle  = parallel
-homoflag  = benchmark
-idcut     = 0.3
-cit       = true
-ntemp     = 20
-nmodel    = 5
-light     = true
-hours     = 5
-LBS       = false
-EC        = false
-GO        = false

1. make seq.txt and rmsinp
Your protein contains 80 residues:
> P41784.fasta
MATPWSGYLDDVSAKFDTGVDNLQTQVTEALDKLAAKPSDPALLAAYQSKLSEYNLYRNA
QSNTVKVFKDIDAAIIQNFR
2.0 run DeepMSA
2.1 run Psi-blast
2.2 Secondary structure prediction was done before.
2.3 Predict solvent accessibility...
2.4 run pairmod
pair exist
3.0 do contact prediction
/home/rpearson/research/PosValSet/CIT/P41784/restriplet.dat completes
/home/rpearson/research/PosValSet/CIT/P41784/tripletres.dat completes
/home/rpearson/research/PosValSet/CIT/P41784/respre.dat completes
/home/rpearson/research/PosValSet/CIT/P41784/resplm.dat completes
/home/rpearson/research/PosValSet/CIT/P41784/deepplm.dat completes
3.1 do threading
/home/rpearson/research/PosValSet/CIT/P41784/init.CEthreader exists
/home/rpearson/research/PosValSet/CIT/P41784/init.mCEthreader exists
/home/rpearson/research/PosValSet/CIT/P41784/init.eCEthreader exists
/home/rpearson/research/PosValSet/CIT/P41784/init.PPAS exists
/home/rpearson/research/PosValSet/CIT/P41784/init.dPPAS exists
/home/rpearson/research/PosValSet/CIT/P41784/init.dPPAS2 exists
/home/rpearson/research/PosValSet/CIT/P41784/init.Env-PPAS exists
/home/rpearson/research/PosValSet/CIT/P41784/init.MUSTER exists
/home/rpearson/research/PosValSet/CIT/P41784/init.wPPAS exists
/home/rpearson/research/PosValSet/CIT/P41784/init.wdPPAS exists
/home/rpearson/research/PosValSet/CIT/P41784/init.wMUSTER exists
target type = very
exclude homologous templates...
3.2 make restraints
4.1 run simulation
Congradulations! All your input files are correct. You can run TASSER simulations now!

run 4 parallel simulations
run the first simulation job CITP41784.fastasim_2A
4.2 check finished simulations
5.1 do clustering
5.2 build full-atomic model
6 Estimate global accuracy, local accuracy of models and B-factor
P41784 complete.

Everything seems to be fine here. However, the files I have (directory listing attached as P41784_dir_ls.PNG) are missing two of the models I was expecting, model4.pdb and model5.pdb. Why might this be?
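
A quick way to see how many final models each run produced is to count the model files per directory. This is a minimal sketch that assumes the final models are named model1.pdb, model2.pdb, and so on, as the file names in this post suggest; `count_models` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: report how many final models a run directory contains.
# The modelN.pdb naming follows the file names mentioned in the post.
count_models() {
    local dir="$1"
    local n=0
    local f
    for f in "$dir"/model[0-9]*.pdb; do
        # The glob stays literal when nothing matches, so test for existence.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical usage over several run directories:
# for d in /home/rpearson/research/PosValSet/CIT/*/; do
#     echo "$(basename "$d"): $(count_models "$d") model(s)"
# done
```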
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Here is an example of an I-Tasser output I am getting.

Code: Select all

(base) [rpearson@spartan01 IT]$ cat P10636_IT-569564.out
Making P10636 directory.
mkdir: cannot create directory ‘P10636’: File exists
Running P10636

Your setting for running I-TASSER is:
-pkgdir    = /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0
-libdir    = /home/rpearson/Structure_Prediction_Tools/CIT_Lib
-java_home = /usr
-python2   = /opt/intel/intelpython2/bin/python
-python3   = ~/.conda/envs/Quark_and_Itasser_Python3/bin/python3
-seqname   = P10636.fasta
-datadir   = /home/rpearson/research/PosValSet/IT/P10636
-outdir    = /home/rpearson/research/PosValSet/IT/P10636
-runstyle  = parallel
-homoflag  = benchmark
-idcut     = 0.3
-cit       = true
-ntemp     = 20
-nmodel    = 5
-light     = true
-hours     = 5
-LBS       = false
-EC        = false
-GO        = false

1. make seq.txt and rmsinp
Your protein contains 758 residues:
> P10636.fasta
MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPG
SETSDAKSTPTAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAG
HVTQEPESGKVVQEGFLREPGPPGLSHQLMSGMPGAPLLPEGPREATRQPSGTGPEDTEG
GRHAPELLKHQLLGDLHQEGPPLKGAGGKERPGSKEEVDEDRDVDESSPQDSPPSKASPA
QDGRPPQTAAREATSIPGFPAEGAIPLPVDFLSKVSTEIPASEPDGPSVGRAKGQDAPLE
FTFHVEITPNVQKEQAHSEEHLGRAAFPGAPGEGPEARGPSLGEDTKEADLPEPSEKQPA
AAPRGKPVSRVPQLKARMVSKSKDGTGSDDKKAKTSTRSSAKTLKNRPCLSPKHPTPGSS
DPLIQPSSPAVCPEPPSSPKYVSSVTSRTGSSGAKEMKLKGADGKTKIATPRGAAPPGQK
GQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRSRTPSLPTPPTREP
KKVAVVRTPPKSPSSAKSRLQTAPVPMPDLKNVKSKIGSTENLKHQPGGGKVQIINKKLD
LSNVQSKCGSKDNIKHVPGGGSVQIVYKPVDLSKVTSKCGSLGNIHHKPGGGQVEVKSEK
LDFKDRVQSKIGSLDNITHVPGGGNKKIETHKLTFRENAKAKTDHGAEIVYKSPVVSGDT
SPRHLSNVSSTGSIDMVDSPQLATLADEVSASLAKQGL
2.0 run DeepMSA
2.1 run Psi-blast
2.2 Secondary structure prediction was done before.
2.3 Predict solvent accessibility...
2.4 run pairmod
pair exist
3.0 do contact prediction
/home/rpearson/research/PosValSet/IT/P10636/restriplet.dat completes
/home/rpearson/research/PosValSet/IT/P10636/tripletres.dat completes
/home/rpearson/research/PosValSet/IT/P10636/respre.dat completes
/home/rpearson/research/PosValSet/IT/P10636/resplm.dat completes
/home/rpearson/research/PosValSet/IT/P10636/deepplm.dat completes
3.1 do threading
start parallel threading CEthreader
CITCEthreader_P10636.fasta is running, skip
start parallel threading mCEthreader
CITmCEthreader_P10636.fasta is running, skip
start parallel threading eCEthreader
CITeCEthreader_P10636.fasta is running, skip
/home/rpearson/research/PosValSet/IT/P10636/init.PPAS exists
/home/rpearson/research/PosValSet/IT/P10636/init.dPPAS exists
/home/rpearson/research/PosValSet/IT/P10636/init.dPPAS2 exists
/home/rpearson/research/PosValSet/IT/P10636/init.Env-PPAS exists
/home/rpearson/research/PosValSet/IT/P10636/init.MUSTER exists
/home/rpearson/research/PosValSet/IT/P10636/init.wPPAS exists
/home/rpearson/research/PosValSet/IT/P10636/init.wdPPAS exists
/home/rpearson/research/PosValSet/IT/P10636/init.wMUSTER exists
only 8 threading programs have output, please check threading programs
P10636 complete.

Some of my other runs also print the message "only 8 threading programs have output, please check threading programs".

Any ideas?

Thanks!!
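
The count in that message matches the log: eight init.* files are reported as existing and the three CEthreader variants are skipped. A small check like the one below can list which threading outputs are missing for a target. It is a sketch only: the eleven program names are taken from the C-I-Tasser log earlier in this thread, and `missing_threaders` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the threading programs whose init.<name> file is missing or empty.
# The names below are the eleven that appear in the C-I-Tasser log in this thread.
THREADERS="CEthreader mCEthreader eCEthreader PPAS dPPAS dPPAS2 Env-PPAS MUSTER wPPAS wdPPAS wMUSTER"

missing_threaders() {
    local dir="$1"
    local name
    for name in $THREADERS; do
        # -s is true only for files that exist and are non-empty.
        [ -s "$dir/init.$name" ] || echo "$name"
    done
}

# Hypothetical usage:
# missing_threaders /home/rpearson/research/PosValSet/IT/P10636
```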
jlspzw
Posts: 247
Joined: Tue May 04, 2021 5:04 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by jlspzw »

Dear user,

It is possible for I-TASSER and C-I-TASSER to return fewer than five models. Getting fewer than five (for example, only one) normally means that all decoys from the simulation have a similar structure and fall into a single cluster, which usually indicates that your model is of high quality.

You can find the explanation here:
https://zhanggroup.org/C-I-TASSER/help.html
or read the text below:
##########################
What are the 'top final models from C-I-TASSER'?
For each target, C-I-TASSER simulations generate tens of thousands of conformations (called decoys). To select the final models, C-I-TASSER uses the SPICKER program to cluster all the decoys based on pair-wise structure similarity, and report up to five models which correspond to the five largest structure clusters. In Monte Carlo theory, the largest clusters correspond to the states of the largest partition function (or lowest free energy) and therefore have the highest confidence. The confidence of each model is quantitatively measured by C-score (see below). Since the top 5 models are ranked by the cluster size, it is possible that the lower-rank models have a higher C-score. Although the first model has a higher C-score and a better quality in most cases, it is not unusual that the lower-rank models have a better quality than the higher-rank models. If the C-I-TASSER simulations converge, it is possible to have less than 5 clusters generated. This is usually an indication that the models are high quality because of the converged simulations.
##########################

As for the threading problem, the message

ITeCEthreader_P10636.fasta is running, skip

suggests that threading jobs are still running for this target. Could you check whether a background process is still active, or whether you submitted the threading job on other compute nodes?

Best
IT team
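
One way to check for leftover threading jobs is to filter the scheduler queue by job name. The sketch below assumes the job names follow the `CIT<program>_<target>` pattern seen in the logs in this thread; `threading_jobs_for` is a hypothetical helper that reads "JOBID NAME STATE" lines on standard input.

```shell
#!/bin/bash
# Sketch: pick out queued or running threading jobs for one target from
# squeue-style "JOBID NAME STATE" lines read on standard input.
# The CIT<program>_<target> name pattern is inferred from the logs in this thread.
threading_jobs_for() {
    local target="$1"
    awk -v t="$target" '$2 ~ /^CIT/ && index($2, "_" t) > 0 { print $1, $2, $3 }'
}

# On the cluster, the input would come from something like:
# squeue -u "$USER" -h -o "%i %j %T" | threading_jobs_for P10636.fasta
```

If this prints nothing while the log still says a job "is running, skip", the pipeline may be relying on a stale marker rather than a live job, but that is a guess and would need to be confirmed against the scripts.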
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Thanks! I decided to re-run some of the queries to see if that helped. For some, it did. For others, re-running did not solve the problem.

For the proteins that still fail to produce any models, I am noticing a trend.

Here is the example output:

Code: Select all

Your setting for running I-TASSER is:
-pkgdir    = /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0
-libdir    = /home/rpearson/Structure_Prediction_Tools/CIT_Lib
-java_home = /usr
-python2   = /opt/intel/intelpython2/bin/python
-python3   = ~/.conda/envs/Quark_and_Itasser_Python3/bin/python3
-seqname   = P50416.fasta
-datadir   = /home/rpearson/research/PosValSet/IT/P50416
-outdir    = /home/rpearson/research/PosValSet/IT/P50416
-runstyle  = parallel
-homoflag  = benchmark
-idcut     = 0.3
-cit       = true
-ntemp     = 20
-nmodel    = 5
-light     = true
-hours     = 5
-LBS       = false
-EC        = false
-GO        = false

1. make seq.txt and rmsinp
Your protein contains 773 residues:
> P50416.fasta
MAEAHQAVAFQFTVTPDGIDLRLSHEALRQIYLSGLHSWKKKFIRFKNGIITGVYPASPS
SWLIVVVGVMTTMYAKIDPSLGIIAKINRTLETANCMSSQTKNVVSGVLFGTGLWVALIV
TMRYSLKVLLSYHGWMFTEHGKMSRATKIWMGMVKIFSGRKPMLYSFQTSLPRLPVPAVK
DTVNRYLQSVRPLMKEEDFKRMTALAQDFAVGLGPRLQWYLKLKSWWATNYVSDWWEEYI
YLRGRGPLMVNSNYYAMDLLYILPTHIQAARAGNAIHAILLYRRKLDREEIKPIRLLGST
IPLCSAQWERMFNTSRIPGEETDTIQHMRDSKHIVVYHRGRYFKVWLYHDGRLLKPREME
QQMQRILDNTSEPQPGEARLAALTAGDRVPWARCRQAYFGRGKNKQSLDAVEKAAFFVTL
DETEEGYRSEDPDTSMDSYAKSLLHGRCYDRWFDKSFTFVVFKNGKMGLNAEHSWADAPI
VAHLWEYVMSIDSLQLGYAEDGHCKGDINPNIPYPTRLQWDIPGECQEVIETSLNTANLL
ANDVDFHSFPFVAFGKGIIKKCRTSPDAFVQLALQLAHYKDMGKFCLTYEASMTRLFREG
RTETVRSCTTESCDFVRAMVDPAQTVEQRLKLFKLASEKHQHMYRLAMTGSGIDRHLFCL
YVVSKYLAVESPFLKEVLSEPWRLSTSQTPQQQVELFDLENNPEYVSSGGGFGPVADDGY
GVSYILVGENLINFHISSKFSCPETDSHRFGRHLKEAMTDIITLFGLSSNSKK
2.0 run DeepMSA
2.1 run Psi-blast
2.2 Secondary structure prediction was done before.
2.3 Predict solvent accessibility...
2.4 run pairmod
pair exist
3.0 do contact prediction
/home/rpearson/research/PosValSet/IT/P50416/restriplet.dat completes
/home/rpearson/research/PosValSet/IT/P50416/tripletres.dat completes
/home/rpearson/research/PosValSet/IT/P50416/respre.dat completes
/home/rpearson/research/PosValSet/IT/P50416/resplm.dat completes
/home/rpearson/research/PosValSet/IT/P50416/deepplm.dat completes
3.1 do threading
start parallel threading CEthreader
CITCEthreader_P50416.fasta is running, skip
start parallel threading mCEthreader
CITmCEthreader_P50416.fasta is running, skip
start parallel threading eCEthreader
CITeCEthreader_P50416.fasta is running, skip
/home/rpearson/research/PosValSet/IT/P50416/init.PPAS exists
/home/rpearson/research/PosValSet/IT/P50416/init.dPPAS exists
/home/rpearson/research/PosValSet/IT/P50416/init.dPPAS2 exists
/home/rpearson/research/PosValSet/IT/P50416/init.Env-PPAS exists
/home/rpearson/research/PosValSet/IT/P50416/init.MUSTER exists
/home/rpearson/research/PosValSet/IT/P50416/init.wPPAS exists
/home/rpearson/research/PosValSet/IT/P50416/init.wdPPAS exists
/home/rpearson/research/PosValSet/IT/P50416/init.wMUSTER exists
only 8 threading programs have output, please check threading programs

So I checked the threading program outputs, and the first thing I noticed was the following (out_CITCEthreader_P50416.out):

Code: Select all

hostname: node23.cluster
starting time: Mon Apr  3 17:51:50 PDT 2023
pwd: /tmp/rpearson/CITP50416.fasta
run psipred...
do initial threading by HHD...
$ cp initial.a3m /tmp/HM6dh3rmwV/nPqeX3ZykH.in.a3m
Filtering alignment to diversity 7 ...
$ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/bin/hhfilter -v 1 -neff 7 -i /tmp/HM6dh3rmwV/nPqeX3ZykH.in.a3m -o /tmp/HM6dh3rmwV/nPqeX3ZykH.in.a3m
hhlib=/home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA
$ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/scripts/reformat.pl -v 1 -r -noss a3m psi /tmp/HM6dh3rmwV/nPqeX3ZykH.in.a3m /tmp/HM6dh3rmwV/nPqeX3ZykH.in.psi
Predicting secondary structure with PSIPRED ... $ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/bin/blastpgp -b 1 -j 1 -h 0.001 -d /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/data/do_not_delete -i /tmp/HM6dh3rmwV/nPqeX3ZykH.sq -B /tmp/HM6dh3rmwV/nPqeX3ZykH.in.psi -C /tmp/HM6dh3rmwV/nPqeX3ZykH.chk 1> /tmp/HM6dh3rmwV/nPqeX3ZykH.blalog 2> /tmp/HM6dh3rmwV/nPqeX3ZykH.blalog
$ echo nPqeX3ZykH.chk > /tmp/HM6dh3rmwV/nPqeX3ZykH.pn

$ echo nPqeX3ZykH.sq  > /tmp/HM6dh3rmwV/nPqeX3ZykH.sn

$ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/bin/makemat -P /tmp/HM6dh3rmwV/nPqeX3ZykH
$ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/bin/psipred /tmp/HM6dh3rmwV/nPqeX3ZykH.mtx /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/data/weights.dat /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/data/weights.dat2 /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/data/weights.dat3 > /tmp/HM6dh3rmwV/nPqeX3ZykH.ss
$ /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/bin/psipass2 /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA/data/weights_p2.dat 1 0.98 1.09 /tmp/HM6dh3rmwV/nPqeX3ZykH.ss2 /tmp/HM6dh3rmwV/nPqeX3ZykH.ss > /tmp/HM6dh3rmwV/nPqeX3ZykH.horiz
done
hhlib=/home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/DeepMSA
initial.a3m is in A2M, A3M or FASTA format
Read initial.a3m with 2069 sequences
Alignment in initial.a3m contains 773 match states
1267 out of 2067 sequences passed filter (up to 91% position-dependent max pairwise sequence identity)
Effective number of sequences exp(entropy) = 7.0
Writing HMM to initial.hhm
Done
.................................................. 1000 HMMs searched
.................................................. 2000 HMMs searched
.................................................. 3000 HMMs searched
.................................................. 4000 HMMs searched
.................................................. 5000 HMMs searched
.................................................. 6000 HMMs searched
.................................................. 7000 HMMs searched
.................................................. 8000 HMMs searched
.................................................. 9000 HMMs searched
.................................................. 10000 HMMs searched
.................................................. 11000 HMMs searched
.................................................. 12000 HMMs searched
.................................................. 13000 HMMs searched
.................................................. 14000 HMMs searched
.................................................. 15000 HMMs searched
.................................................. 16000 HMMs searched
.................................................. 17000 HMMs searched
.................................................. 18000 HMMs searched
.................................................. 19000 HMMs searched
.................................................. 20000 HMMs searched
.................................................. 21000 HMMs searched
.................................................. 22000 HMMs searched
.................................................. 23000 HMMs searched
.................................................. 24000 HMMs searched
.................................................. 25000 HMMs searched
.................................................. 26000 HMMs searched
.................................................. 27000 HMMs searched
.................................................. 28000 HMMs searched
.................................................. 29000 HMMs searched
.................................................. 30000 HMMs searched
.................................................. 31000 HMMs searched
.................................................. 32000 HMMs searched
.................................................. 33000 HMMs searched
.................................................. 34000 HMMs searched
.................................................. 35000 HMMs searched
.................................................. 36000 HMMs searched
.................................................. 37000 HMMs searched
.................................................. 38000 HMMs searched
.................................................. 39000 HMMs searched
.................................................. 40000 HMMs searched
.................................................. 41000 HMMs searched
.................................................. 42000 HMMs searched
.................................................. 43000 HMMs searched
.................................................. 44000 HMMs searched
.................................................. 45000 HMMs searched
.................................................. 46000 HMMs searched
.................................................. 47000 HMMs searched
.................................................. 48000 HMMs searched
.................................................. 49000 HMMs searched
.................................................. 50000 HMMs searched
.................................................. 51000 HMMs searched
.................................................. 52000 HMMs searched
.................................................. 53000 HMMs searched
.................................................. 54000 HMMs searched
.................................................. 55000 HMMs searched
.................................................. 56000 HMMs searched
.................................................. 57000 HMMs searched
.................................................. 58000 HMMs searched
.................................................. 59000 HMMs searched
.................................................. 60000 HMMs searched
.................................................. 61000 HMMs searched
.................................................. 62000 HMMs searched
.................................................. 63000 HMMs searched
.................................................. 64000 HMMs searched
.................................................. 65000 HMMs searched
.................................................. 66000 HMMs searched
.................................................. 67000 HMMs searched
.................................................. 68000 HMMs searched
.................................................. 69000 HMMs searched
.................................................. 70000 HMMs searched
.................................................. 71000 HMMs searched
.................................................. 72000 HMMs searched
.................................................. 73000 HMMs searched
.................................................. 74000 HMMs searched
.................................................. 75000 HMMs searched
.................................................. 76000 HMMs searched
.................................................. 77000 HMMs searched
.................................................. 78000 HMMs searched
.................................................. 79000 HMMs searched
.................................................. 80000 HMMs searched
.................................................. 81000 HMMs searched
.................................................. 82000 HMMs searched
.................................................. 83000 HMMs searched
.................................................. 84000 HMMs searched
..........................Realigning 20000 query-template alignments with maximum accuracy (MAC) algorithm ...
.................................................. 1000 HMMs aligned
.................................................. 2000 HMMs aligned
.................................................. 3000 HMMs aligned
.................................................. 4000 HMMs aligned
.................................................. 5000 HMMs aligned
.................................................. 6000 HMMs aligned
.................................................. 7000 HMMs aligned
.................................................. 8000 HMMs aligned
.................................................. 9000 HMMs aligned
.................................................. 10000 HMMs aligned
.................................................. 11000 HMMs aligned
.................................................. 12000 HMMs aligned
.................................................. 13000 HMMs aligned
.................................................. 14000 HMMs aligned
.................................................. 15000 HMMs aligned
.................................................. 16000 HMMs aligned
.................................................. 17000 HMMs aligned
.................................................. 18000 HMMs aligned
.................................................. 19000 HMMs aligned
.................................................run CEthreader...
cuda is ready? : False
                                                 [10%]no contacts
re-align and re-rank the top 300 templates...
No Query File or Database File!

I get the same "No Query File or Database File!" message in the following files:
out_CITeCEthreader_P50416.fasta
out_CITmCEthreader_P50416.fasta
and a few more.

Here is the content of err_CITCEthreader_P50416.fasta:

Code: Select all

WARNING: Ignoring unknown option -mapt ...

WARNING: Ignoring unknown option 0 ...
sh: line 1:  9577 Killed                  ~/.conda/envs/Quark_and_Itasser_Python3/bin/python3 /home/rpearson/Structure_Prediction_Tools/C-I-TASSER-1.0/contact/ResPre/respre.py /tmp/rpearson/CITCEthreader_P50416.fasta//deepmsa_protein.aln /tmp/rpearson/CITCEthreader_P50416.fasta//deepmsa.contact
Illegal division by zero at /var/spool/slurmd/job569000/slurm_script line 418.
slurmstepd: error: Detected 1 oom-kill event(s) in step 569000.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

Any ideas or suggestions?

Thanks!!
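
The error file above points at memory: respre.py was killed and slurmstepd reports an oom-kill event, so the later messages may simply be downstream of the missing contact file. To find which threading jobs hit a resource limit across many targets, the err_* files can be scanned for such signatures. This is a sketch; the patterns are the literal messages quoted in this thread, and `scan_errors` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: classify err_* files in a run directory by known failure signatures.
# The patterns are the literal messages quoted in this thread.
scan_errors() {
    local dir="$1"
    local f
    for f in "$dir"/err_*; do
        [ -f "$f" ] || continue
        if grep -q -e 'oom-kill' -e 'Killed' "$f"; then
            echo "$(basename "$f"): out of memory"
        fi
        if grep -q 'DUE TO TIME LIMIT' "$f"; then
            echo "$(basename "$f"): time limit"
        fi
    done
}

# Hypothetical usage:
# scan_errors /home/rpearson/research/PosValSet/IT/P50416
```

If job accounting is enabled on the cluster, `sacct -j <jobid> --format=JobID,State,Elapsed,MaxRSS,ReqMem` should show how much memory a finished job actually used compared with what it requested.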
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Oh, I also have a few more general questions.

Question 1)
What is the difference between "real" and "benchmark"?

Question 2)
I am trying to create an ensemble of protein conformations with both C-I-Tasser and I-Tasser, so getting zero or one model does not help much. Is there a setting I could change to increase my chances of getting the five models I specified? Maybe running the simulations for longer than 5 hours?

Question 3)
What are the default settings on the C-I-Tasser and I-Tasser servers? I would like to replicate them as closely as possible. I am running locally because I must run many proteins and cannot wait for them all to be run on the servers.
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Here is an update:

I went into runI-TASSER.pl and, since I am using the parallel run style, increased the wall time from 72:00:00 to 144:00:00 and the memory from 10000mb to 100000mb. I made the same changes in the runCEThreader.pl script. I have yet to see how this affects the runs. My hypothesis is that some of my larger proteins (> 593 residues) are running out of time or memory, and that raising these limits might allow them to finish.

On another note, I was looking through the init.* files for one of my proteins, O14641. These files look almost like PDB files. Nearly all of them list 736 residues, which is the number of residues in O14641. However, init.wPPAS shows structures with far fewer residues than 736 (about 370 in most cases). I then took a look at "err_CITwdPPAS_O14641.fasta" and saw the following:

Code: Select all

(base) [rpearson@spartan01 O14641]$ cat err_CITwdPPAS_O14641.fasta
slurmstepd: error: couldn't chdir to `/tmp/rpearson/CITO14641.fasta': No such file or directory: going to /tmp instead
Exception in thread "main" java.io.FileNotFoundException: /home/rpearson/Structure_Prediction_Tools/CIT_Lib/DEP/7a4aA3.dep (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at java.io.FileInputStream.<init>(FileInputStream.java:93)
        at java.io.FileReader.<init>(FileReader.java:58)
        at c.a(c.java)
        at c.main(c.java)
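
The stack trace names a library file that Java could not open. To find every such file across all error logs of a run, the paths can be pulled out of the FileNotFoundException lines. This is a minimal sketch that relies only on the message format shown above; `missing_lib_files` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: collect the library files that Java reported as missing.
# Reads the given log files and prints each missing path once.
missing_lib_files() {
    sed -n 's/.*java\.io\.FileNotFoundException: \(.*\) (No such file or directory).*/\1/p' "$@" | sort -u
}

# Hypothetical usage inside a run directory:
# missing_lib_files err_CIT*
```

An empty result means no such exception was logged in those files; it does not prove the library is complete.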
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

I just re-downloaded the DEP directory from the following website:
https://seq2fun.dcmb.med.umich.edu//C-I ... nload.html

I then extracted the files using the following command:
tar -xvjf DEP.tar.bz2

I confirmed that the file I was looking for was there and copied it into the DEP directory inside CIT_Lib, which I have set up as my library.

I will re-run the proteins and report back. This run has three changes: the missing file is now in place, and I increased both the wall time and the memory for the threading.

Fingers crossed,

Rich
rpearson_7
Posts: 17
Joined: Wed Nov 10, 2021 8:05 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by rpearson_7 »

Got some data back...

Everything failed during the threading portion. I have it set to run in parallel and I received the following error message:
(base) [rpearson@spartan01 P0DMV8]$ cat err_CITMUSTER_P0DMV8.fasta
slurmstepd: error: couldn't chdir to `/tmp/rpearson/CITP0DMV8.fasta': No such file or directory: going to /tmp instead
slurmstepd: error: *** JOB 571208 ON node14 CANCELLED AT 2023-04-14T02:32:43 DUE TO TIME LIMIT ***

I decided to revert the wall-time change in both of the files I modified, setting it back to 72:00:00, which is the program default. I also decreased the memory from 100000mb to 20000mb. This is double the default (10000mb), and I hope it fixes the issue I seem to be having in predicting model structures for larger proteins.

I will re-run and report back.
jlspzw
Posts: 247
Joined: Tue May 04, 2021 5:04 pm

Re: Locally Installed C-I-Tasser and I-Tasser fails for certain proteins

Post by jlspzw »

Dear user,

Thank you for checking. We are glad to hear you solved the problem.

Best
IT Team