P3CMQA

P3CMQA - Profile based 3 dimensional Convolutional neural network for protein structure Model Quality Assessment

How to use

  1. Submit Job

    First, enter your email address and upload the PDB or mmCIF file of the structural model on the Submit job page.

    You can submit up to 10 files at a time.

    For the sake of computation time, the number of residues in the model structure is limited to 2000.

    The fasta file is optional. However, because we use sequence-based features such as sequence profiles, the results will be more reliable if you provide it.

    Furthermore, we skip preprocessing for fasta sequences that have already been processed once, so if you are submitting multiple PDB/mmCIF files for one target, submitting a fasta file will reduce the runtime.

    Note that the fasta file must contain exactly one sequence.

    Also note that if you submit a fasta file, all the PDB/mmCIF files must be predicted structures for the same target.

    If you do not provide a fasta file, we extract the sequence from the PDB/mmCIF file and generate features from that sequence.
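    The submission rules above can be checked locally before uploading. The following is a minimal sketch, not the server's code: the function names are hypothetical, only PDB (not mmCIF) input is handled, and deriving the sequence from ATOM records is an assumption about how such a sequence could be obtained.

```python
# Hypothetical pre-submission checks; not part of the P3CMQA server.
MAX_RESIDUES = 2000  # residue limit stated in the text above

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb(pdb_text):
    """Build a one-letter sequence from the ATOM records of a PDB file."""
    residues = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()
        # Chain ID plus residue number and insertion code identify a residue.
        key = (line[21], line[22:27])
        if key not in seen:
            seen.add(key)
            # Unknown residue names are written as "X" (an assumption).
            residues.append(THREE_TO_ONE.get(resname, "X"))
    return "".join(residues)


def count_fasta_records(fasta_text):
    """Count the sequences in a fasta file by counting '>' header lines."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def check_submission(pdb_text, fasta_text=None):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    n_residues = len(sequence_from_pdb(pdb_text))
    if n_residues > MAX_RESIDUES:
        problems.append(f"model has {n_residues} residues (limit {MAX_RESIDUES})")
    if fasta_text is not None and count_fasta_records(fasta_text) != 1:
        problems.append("fasta file must contain exactly one sequence")
    return problems
```

    Calling check_submission(open('sample_1.pdb').read()) returns an empty list when the model is within the residue limit.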


  2. Check the results of the prediction

    When the prediction is complete, you will receive an email with the URL of the prediction result.

    Note that the result URL expires after 5 days.

    When you access the URL, you will see the following page.

    (You can see this page at Result Example.)

    Each element on the page has the following meaning:

    1. Job information

      Job information shows detailed information about the job.

      Job Title indicates the name of the submitted file, and Model length indicates the number of residues in the submitted model structure.

      The global score is the score for the whole model structure and ranges from 0 to 1. The closer the score is to 1, the better the structural model.

      You can download the prediction results in three different formats from the blue download buttons.

      • Normal text format

        We output the prediction results in the following text format.

        # Model name : sample_1.pdb
        # Model Quality Score : 0.4236
        Resid	Resname	Score
        1	SER	0.04631
        2	ASN	0.06995
        3	ALA	0.07282
        :
        

        The first line shows the name of the model structure and the second line shows the score of the whole model structure.

        The third line is the column header, and the subsequent lines give the residue number, residue name, and predicted score for each residue.

        You can load this file as a tab-separated table in Python with pandas in the following way.

        import pandas as pd
        # header=2: the third line holds the column names; the two "#" lines above it are skipped
        pd.read_csv('sample_1.txt', sep='\t', header=2)
        >      Resid Resname    Score
          0        1     SER  0.04631
          1        2     ASN  0.06995
          2        3     ALA  0.07282
          ..     ...     ...      ...
        
      • CASP format

        We also support CASP format. See CASP format page for details.

      • PDB format

        You can download a PDB file in which the predicted score for each residue is stored in the B-factor field.

        If you color the downloaded PDB file by B-factor in molecular visualization software, you can see the protein structure colored by the predicted score.
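        The per-residue scores can also be read back from the downloaded PDB file without a viewer. The sketch below assumes the file follows the standard fixed-column PDB layout (B-factor in columns 61-66) and that all atoms of a residue carry the same score; the function name is hypothetical.

```python
def scores_from_bfactor(pdb_text):
    """Return {(chain_id, residue_number): score} from the B-factor column.

    Assumes standard fixed-width ATOM records and that every atom of a
    residue carries the same value, so the first atom seen is used.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]
        residue_number = int(line[22:26])
        b_factor = float(line[60:66])
        # setdefault keeps the value from the first atom of each residue.
        scores.setdefault((chain_id, residue_number), b_factor)
    return scores
```

        The standard B-factor field holds two decimal places, so values read this way may be rounded compared with the normal text format.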


    2. 3D view of the model

      This area shows the three-dimensional structure of the model, colored by the predicted score for each residue.

      The structural model is colored in rainbow colors, with red areas representing low prediction scores and blue areas representing high prediction scores.

      You can rotate and scale the model structure and hover over it to find out the residue number.

      We use NGL viewer (AS Rose, et al., Bioinformatics, 2018. doi:10.1093/bioinformatics/bty419) to visualize proteins.


    3. Local Score

      This area shows the predicted scores for each residue as a bar chart.

      The coloring is the same as that of the structural model.

      You can check the detailed score for each residue by hovering the cursor over the graph.

      For model structures with more than 1500 residues, local scores cannot be displayed due to a problem with the library, so please download the results and check them.
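      When the bar chart is unavailable, the downloaded normal text format can be inspected directly. The following sketch parses that format using only the standard library and lists the lowest-scoring residues; the function names are hypothetical and the file layout is assumed to match the sample shown under Normal text format.

```python
def parse_p3cmqa_text(text):
    """Parse the normal text format into (model_name, global_score, rows).

    rows is a list of (residue_number, residue_name, score) tuples.
    """
    model_name = None
    global_score = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "# Model Quality Score : 0.4236".
            key, _, value = line.lstrip("# ").partition(":")
            key = key.strip()
            if key == "Model name":
                model_name = value.strip()
            elif key == "Model Quality Score":
                global_score = float(value)
            continue
        fields = line.split()
        # Skip the column header and anything that is not a 3-column row.
        if len(fields) != 3 or fields[0] == "Resid":
            continue
        rows.append((int(fields[0]), fields[1], float(fields[2])))
    return model_name, global_score, rows


def lowest_scoring(rows, n=10):
    """Return the n rows with the lowest predicted score."""
    return sorted(rows, key=lambda row: row[2])[:n]
```

      For example, lowest_scoring(parse_p3cmqa_text(open('sample_1.txt').read())[2]) lists the ten residues with the lowest local scores.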

Runtime

We show the approximate runtime for one structural model.

Our process consists mainly of two parts: preprocessing and prediction.

Preprocessing is skipped in the case of a previously processed sequence.

The runtime required for each process is shown in the following figures.

Most of the runtime is spent on preprocessing; the prediction itself takes about 20 seconds.

Therefore, for sequences that have already been preprocessed, the predictions can be performed quickly.

Privacy policy

  1. WHAT INFORMATION DO WE COLLECT?

    We do not collect any user information.

  2. WILL YOUR DATA BE SHARED WITH ANYONE?

    We will send you the URL of the prediction results via email. Only people who know this URL can see the prediction results. The URL is designed to be unguessable and is only available for 5 days.

  3. HOW LONG DO WE KEEP YOUR DATA?

    The input data and the prediction result data will be automatically deleted from the server 5 days after the completion of the prediction.

  4. HOW CAN YOU CONTACT US ABOUT THIS NOTICE?

    If you have questions or comments about this notice, you may email us at p3cmqa@cb.cs.titech.ac.jp