PSIPRED is a simple and accurate secondary structure prediction method that combines two feed-forward neural networks, which analyze the output obtained from PSI-BLAST (Position-Specific Iterated BLAST). Under a very stringent cross-validation procedure, PSIPRED 3.2 achieves an average Q3 score of 81.6%. Predictions produced by PSIPRED were also submitted to CASP4 and assessed at the CASP4 meeting held in Asilomar in December 2000. There, PSIPRED 2.0 achieved an average Q3 score of 80.6% across all 40 submitted target domains that had no obvious sequence similarity to structures already in the PDB, which ranked PSIPRED first among the 20 evaluated methods.
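For readers unfamiliar with the Q3 score quoted above: it is the percentage of residues whose predicted three-state label (helix, strand, or coil) matches the observed one. The following is a minimal sketch of that calculation. The function name `q3_score` and the two example strings are illustrative assumptions, not part of PSIPRED, and the strings are made-up data rather than real predictions.

```python
def q3_score(predicted, observed):
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Made-up example: 16 residues, 2 of which disagree.
    predicted = "CCHHHHHHCCEEEECC"
    observed = "CCHHHHHCCCEEEEEC"
    print(q3_score(predicted, observed))  # 87.5
```

A published Q3 figure such as the ones above is an average of this per-chain value over a benchmark set.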
When building machine learning models on amino acid sequences, features must first be extracted from those sequences, and richer features usually yield more accurate predictions. Properties such as protein secondary structure and water solubility can therefore be predicted from the raw amino acid sequence and used to enrich the feature set. PSIPRED is a simple and accurate protein secondary structure prediction tool that combines two feed-forward neural networks to analyze PSI-BLAST results. The analyses available through PSIPRED include, but are not limited to:
✓ Transmembrane topology prediction
✓ Transmembrane helix contact prediction
✓ Fold recognition
✓ Protein disorder prediction
✓ Domain boundary prediction
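To make the feature-enrichment idea concrete, here is a minimal sketch of how a predicted secondary structure string could be turned into numeric features for a downstream model. It assumes the prediction is already available as a string of H (helix), E (strand), and C (coil) characters. The function names `ss_one_hot` and `ss_composition` are hypothetical helpers, not part of PSIPRED.

```python
SS_STATES = "HEC"  # helix, strand, coil


def ss_one_hot(ss_string):
    """Encode each residue's predicted state as a 3-element one-hot list."""
    features = []
    for state in ss_string:
        if state not in SS_STATES:
            raise ValueError(f"unexpected secondary structure state: {state!r}")
        features.append([1 if state == s else 0 for s in SS_STATES])
    return features


def ss_composition(ss_string):
    """Return the fraction of residues in each state (a sequence-level feature)."""
    if not ss_string:
        raise ValueError("secondary structure string must be non-empty")
    return {s: ss_string.count(s) / len(ss_string) for s in SS_STATES}


if __name__ == "__main__":
    prediction = "CHHHHEEC"  # made-up prediction for illustration
    print(ss_one_hot(prediction)[:2])   # [[0, 0, 1], [1, 0, 0]]
    print(ss_composition(prediction))   # {'H': 0.5, 'E': 0.25, 'C': 0.25}
```

The per-residue encoding suits models that work position by position, while the composition vector gives a fixed-length summary for whole-sequence models.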
Download the installation package
PSIPRED: http://bioinfadmin.cs.ucl.ac.uk/downloads/psipred/old_versions/psipred3.5.tar.gz
BLAST: ftp://ftp.ncbi.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz
SWISSPROT protein sequence: ftp://ftp.ncbi.nih.gov/blast/db/FASTA/swissprot.gz
On the choice between legacy BLAST and BLAST+: following the recommendation on the official website, legacy BLAST is used here rather than BLAST+. The main reason given is that BLAST+ loses precision when handling the scoring matrices for protein sequences, so legacy BLAST was chosen for this installation. The install script is as follows:
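# Unpack PSIPRED and build it from source
tar -zxvf psipred3.5.tar.gz
cd psipred/src
make
make install
# Point the environment at the BLAST and PSIPRED install directories (adjust to your own paths)
export BLAST_HOME=/home/yizhou/blast-2.2.26
export PSIBLAST=/home/yizhou/psipred
# Add the BLAST and PSIPRED binaries to the search path
export PATH=$PATH:$BLAST_HOME/bin:$PSIBLAST/bin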
The prediction algorithm is split into three stages: generating a sequence profile, predicting an initial secondary structure, and filtering the predicted structure. PSIPRED first normalizes the sequence profile generated by PSI-BLAST. A neural network then predicts the initial secondary structure: for each amino acid in the sequence, the network is fed a window of 15 residues, together with an extra flag per window position indicating whether the window runs past the N or C terminus of the chain. This gives an input layer of 315 units, divided into 15 groups of 21 units (20 profile values plus the terminus flag). The network has one hidden layer of 75 units and 3 output units, one for each secondary structure state: helix, sheet, and coil.
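The input layout described above can be sketched as follows. This is an illustration of the windowing arithmetic only, not PSIPRED's actual code. It assumes each residue is represented by 20 already-normalized profile values, and that window positions beyond either terminus are filled with zeros and a flag value of 1. The name `window_inputs` and the toy profile are hypothetical.

```python
WINDOW = 15                        # residues per window
PROFILE_WIDTH = 20                 # assumed: one normalized profile value per amino acid type
UNITS_PER_POS = PROFILE_WIDTH + 1  # plus one flag marking "past the chain terminus"


def window_inputs(profile, center):
    """Build the flat input vector for the residue at index `center`.

    `profile` is a list with one row per residue; each row holds
    PROFILE_WIDTH normalized values.
    """
    half = WINDOW // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            vector.extend(profile[pos])
            vector.append(0.0)  # position lies inside the chain
        else:
            vector.extend([0.0] * PROFILE_WIDTH)
            vector.append(1.0)  # position falls past the N or C terminus
    return vector


if __name__ == "__main__":
    # Toy profile: 30 residues with placeholder values, not real PSI-BLAST output.
    toy_profile = [[0.5] * PROFILE_WIDTH for _ in range(30)]
    first = window_inputs(toy_profile, 0)
    print(len(first))                      # 315
    print(sum(first[PROFILE_WIDTH::UNITS_PER_POS]))  # 7.0 (seven positions precede the N terminus)
```

For the first residue, seven of the fifteen window positions lie before the start of the chain, so seven terminus flags are set. For a residue in the middle of a long chain, none are.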