###############################################################################
#    INSTALLATION AND IMPLEMENTATION OF NeBcon (version 1.0, 2017/02/26)      #
#    This program belongs to Zhang Lab at University of Michigan	      #
#    If you have any questions, please contact "hebaoji@itp.ac.cn" or         #
#    "mortuza@umich.edu"						      #
###############################################################################

DISCLAIMER: NeBcon uses several scripting languages (e.g. awk, sed, cat). 
            Therefore, the package can be run ONLY in "LINUX" environment. Also, 
            you need to download and install perl and JAVA to run NeBcon.


1. What is NeBcon?

   NeBcon (Neural-network and Bayes-classifier based contact prediction) is an 
   algorithm for sequence-based protein contact prediction, built on multiple 
   contact prediction programs, which are machine-learinng, co-evolution and 
   meta-server based. It first uses the naive Bayes classifier to calculate the 
   posterior probability of multiple contact predictors. Neural Network is then 
   used to train the actual contact maps against the secondary structure, solvent 
   accessibility, Shannon entropy of multiple sequence alignments,in combination 
   with the posterior probability scores calculated from the predictors. 
   The benchmark result shows significant advantage of contact prediction over
   individual contact programs. The contact programs that are used in NeBcon are:

	  (a) SVMSEQ
	  (b) PSICOV
	  (c) CCMpred
	  (d) Freecontact
	  (e) STRUCTCH
	  (f) MetaPSICOV

2. Download required databases:
   
   a) RUN the "download_db.sh" bash script in the "./NeBconpackage" directory using 
      the following command to download and extract uniprot20, uniref90 and nr.

      sh download_db.sh

3. Guidelines to run NeBcon:

   a) Create an input directory for a specific protein (e.g. 1i3cA) and place the 
      fasta format of the protein sequence with name "seq.txt" in the input 
      directory. 

      (Example: Say, for the protein 1i3cA, an input folder, named 1i3cA, is 
       created at "/home/user/NeBconpackage/test/"
       That means, the "seq.txt" file of the protein 1i3cA should be placed for 
       this example is at: /home/user/NeBconpackage/test/1i3cA/ )

   b) run "nebcon.pl" perl script in the ./NeBconpackage directory using following 
      command:

   ./nebcon.pl -seqname <sequence_name> -datadir <data_dir> -runstyle <run_style>

   Here, for "-seqname", "-datadir" and "-runstyle" flags, provide following arguments
   respectively.

   <sequence_name> :Provide name of the sequence (1i3cA,1hxrA,etc.)
   <data_dir>	   :Provide path where the sequence file (seq.txt) is placed
   <run_style>	   :Provide the run style (either "serial" or "parallel") of the jobs. 
                    If it is "serial",the script will run jobs sequentially. If your 
                    system supports running parallel jobs in different nodes using 
                    PBS/torque job scheduling system, you may put "parallel"
   
    Example:
   -----------
   ./nebcon.pl -seqname 1i3cA -datadir /home/user/NeBconpackage/test/1i3cA -runstyle serial

4. Output files: 

    a) NeBcon program first generates output contact map files "XXX.dat" for the 
       six predictors, where XXX refers to the name of the predictor. Then, 
       the program generates final output files for NeBcon as shown in b.

    b) NeBcon output files:
     -- nnbayes.dat (for Carbon alpha contact map)
     -- nnbayesb.dat (for Carbon beta contact map)
     -- protein-step1 (Predicted beta fragments)
     -- protein-step2 (Provides the aligned score) 
     -- protein-step3 (Provides the reliable beta strands and score)

    c) Additional contact output files for QUARK:
      -- NeBcon program also generates contact map files "XXX.dat.quark" for 
         the predictors, including NeBcon. Here, XXX refers to the name of the
         predictors. These contact maps can be used in QUARK to predict 3D 
         structure of the proteins.

    d) The program also generates following files as byproducts that are used to 
       predict contact maps:
      -- protein.aln (Aligned homologous sequences from PSI-BLAST search)
      -- protein.solv (Predicted solvent accessibility by PSSpred)
      -- seq.dat.ss (Predicted secondary structure by psipred)
      -- protein.colstats (Aligned sequence statistics)


5. Example cases are provided in the folder "./NeBconpackage/test", where all the input and 
   output files for protein "1i3cA" and "1hxrA" are available.

6. The webserver of the program is available at: 
   http://zhanglab.dcmb.med.umich.edu/NeBcon/

7. If you use NeBcon, please cite:

   Baoji He, S.M. Mortuza, Yanting Wang, Hongbin Shen, Yang Zhang. NeBcon: Protein contact 
   map prediction using neural network training coupled with naïve Bayes classifiers. 
   submitted, 2017.

8. Notes:
   The original version of NeBcon in the paper used two additional programs (Betacon and 
   SVMcon), but this release only contains 6 programs due to the license restriction on 
   program release. However, our benchmark results show that the versions with and without 
   the programs after retraining perform comparably.