Coordinate File Description (PDB Format)

General Information

The following describes the minimum coordinate specification in PDB format that is required by the RCSB validation and deposition software. The PDB record formats for coordinate data are reproduced here for your convenience; however, the validation check and deposition may only require the specification of a few of these records. In many cases, a file with the following format is all that is required.


Coordinate Record Descriptions

CRYST1 | ORIGXn | SCALEn | MTRIXn | TVECT | MODEL
ATOM | ANISOU | TER | HETATM | ENDMDL

Record:

CRYST1

Contains: unit cell parameters, space group, and Z value
Notes:
  • If the structure was not determined by crystallographic means, CRYST1 simply defines a unit cube
    (a = b = c = 1.0, alpha = beta = gamma = 90
    degrees, space group = P 1, and Z = 1).
  • The Hermann-Mauguin space group symbol is given without parentheses,
    e.g., P 21 21 2, and using the full symbol, e.g., C 1 2 1 instead of C 2.
  • The screw axis is described as a two-digit number.
  • For a rhombohedral space group in the hexagonal setting, the lattice type symbol used is H.
  • The Z value is the number of polymeric chains in a unit cell. In the case of heteropolymers,
    Z is the number of occurrences of the most populous chain.
  • In the case of a polycrystalline fiber diffraction study,
    CRYST1 and SCALE contain the normal unit cell data.
  • The unit cell parameters are used to calculate SCALE.
  • COLUMNS       DATA TYPE      CONTENTS
    --------------------------------------------------------------------------------
     1 -  6       Record name    "CRYST1"
    
     7 - 15       Real(9.3)      a (Angstroms)
    
    16 - 24       Real(9.3)      b (Angstroms)     
    
    25 - 33       Real(9.3)      c (Angstroms)     
    
    34 - 40       Real(7.2)      alpha (degrees)   
    
    41 - 47       Real(7.2)      beta (degrees)    
    
    48 - 54       Real(7.2)      gamma (degrees)   
    
    56 - 66       LString        Space group       
    
    67 - 70       Integer        Z value           
    
    
    Example:
    
             1         2         3         4         5         6         7 
    1234567890123456789012345678901234567890123456789012345678901234567890 
    CRYST1  117.000   15.000   39.000  90.00  90.00  90.00 P 21 21 21    8 
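
As an illustration only, the column table above could be read in Python roughly as follows. The function name parse_cryst1 is hypothetical and is not part of the RCSB software; the sketch assumes the record follows the fixed columns exactly.

```python
def parse_cryst1(line):
    """Slice a CRYST1 record using the column ranges in the table above.

    Table columns are 1-based and inclusive, so columns 7 - 15 become
    the Python slice [6:15].
    """
    if not line.startswith("CRYST1"):
        raise ValueError("not a CRYST1 record")
    line = line.ljust(70)                     # pad so short records still slice cleanly
    z_field = line[66:70].strip()
    return {
        "a": float(line[6:15]),
        "b": float(line[15:24]),
        "c": float(line[24:33]),
        "alpha": float(line[33:40]),
        "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "space_group": line[55:66].strip(),
        "z": int(z_field) if z_field else None,   # assumption: a blank Z is tolerated
    }


record = "CRYST1  117.000   15.000   39.000  90.00  90.00  90.00 P 21 21 21    8"
cell = parse_cryst1(record)
print(cell["a"], cell["space_group"], cell["z"])
```

Slicing by position rather than splitting on whitespace matters here because the space group symbol itself contains spaces.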
    

    Record:

    ORIGXn

    Contains: the transformation from the orthogonal coordinates contained
    in the database entry to the submitted coordinates
    Notes: If the original submitted coordinates are Xsub, Ysub, Zsub and the orthogonal Angstroms coordinates
    contained in the data entry are X, Y, Z, then:
    Xsub = O11X + O12Y + O13Z + T1
    Ysub = O21X + O22Y + O23Z + T2
    Zsub = O31X + O32Y + O33Z + T3
    COLUMNS       DATA TYPE       CONTENTS
    --------------------------------------------------------------------------------
     1 -  6       Record name     "ORIGXn" (n=1, 2, or 3)
    
    11 - 20       Real(10.6)      o[n][1]  
    
    21 - 30       Real(10.6)      o[n][2]  
    
    31 - 40       Real(10.6)      o[n][3]  
    
    46 - 55       Real(10.5)      t[n]     
    
    Example: 
    
             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    ORIGX1      0.963457  0.136613  0.230424       16.61000               
    ORIGX2     -0.158977  0.983924  0.081383       13.72000               
    ORIGX3     -0.215598 -0.115048  0.969683       37.65000               
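
A minimal sketch of the transformation above, assuming the three ORIGXn records are available as strings. The helper names parse_origx and to_submitted are hypothetical.

```python
def parse_origx(lines):
    """Return (matrix, translation) from the ORIGX1, ORIGX2, ORIGX3 records."""
    matrix, translation = [None] * 3, [None] * 3
    for line in lines:
        if line[0:5] != "ORIGX":
            continue
        n = int(line[5]) - 1                      # row index taken from ORIGXn
        matrix[n] = [float(line[10:20]), float(line[20:30]), float(line[30:40])]
        translation[n] = float(line[45:55])
    if any(row is None for row in matrix):
        raise ValueError("ORIGX1, ORIGX2 and ORIGX3 are all required")
    return matrix, translation


def to_submitted(xyz, matrix, translation):
    """Apply Xsub = O*X + T to one orthogonal coordinate triple."""
    return tuple(
        sum(matrix[i][j] * xyz[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


records = [
    "ORIGX1      0.963457  0.136613  0.230424       16.61000",
    "ORIGX2     -0.158977  0.983924  0.081383       13.72000",
    "ORIGX3     -0.215598 -0.115048  0.969683       37.65000",
]
O, T = parse_origx(records)
print(to_submitted((0.0, 0.0, 0.0), O, T))   # the origin maps onto the translation T
```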
    

    Record:

    SCALEn

    Contains: the transformation from the orthogonal coordinates contained in the entry to fractional
    crystallographic coordinates
    Notes:
  • If the orthogonal Angstroms coordinates are X, Y, Z, and the fractional
    cell coordinates are xfrac, yfrac, zfrac, then:
    xfrac = S11X + S12Y + S13Z + U1
    yfrac = S21X + S22Y + S23Z + U2
    zfrac = S31X + S32Y + S33Z + U3
  • For NMR and fiber diffraction submissions, SCALE is given as an identity
    matrix with no translation.
  • COLUMNS       DATA TYPE      CONTENTS                  
    --------------------------------------------------------------------------------
     1 -  6       Record name    "SCALEn" (n=1, 2, or 3)
    
    11 - 20       Real(10.6)     s[n][1]                         
    
    21 - 30       Real(10.6)     s[n][2]                       
    
    31 - 40       Real(10.6)     s[n][3]                         
    
    46 - 55       Real(10.5)     u[n]                      
    
    Example:
    
             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    SCALE1      0.019231  0.000000  0.000000        0.00000               
    SCALE2      0.000000  0.017065  0.000000        0.00000               
    SCALE3      0.000000  0.000000  0.016155        0.00000               
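
The CRYST1 notes state that the unit cell parameters are used to calculate SCALE. The sketch below shows one way this could be done, assuming the common convention in which a lies along X and b lies in the XY plane; that convention is an assumption, not something this document specifies. The cell lengths 52.0, 58.6 and 61.9 are hypothetical, chosen only because their reciprocals reproduce the diagonal of the example above. The function names are likewise hypothetical.

```python
import math


def scale_matrix(a, b, c, alpha, beta, gamma):
    """Fractionalization matrix for a cell with a along X and b in the XY plane."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # v is the cell volume divided by a*b*c
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [1.0 / a, -cg / (a * sg), (ca * cg - cb) / (a * v * sg)],
        [0.0, 1.0 / (b * sg), (cb * cg - ca) / (b * v * sg)],
        [0.0, 0.0, sg / (c * v)],
    ]


def format_scale(matrix):
    """Write SCALE1, SCALE2, SCALE3 lines with a zero translation u[n]."""
    lines = []
    for n, row in enumerate(matrix, start=1):
        # Adding 0.0 turns a possible -0.0 left by rounding into 0.0.
        s = [round(x, 6) + 0.0 for x in row]
        lines.append(
            "SCALE%d    %10.6f%10.6f%10.6f     %10.5f" % (n, s[0], s[1], s[2], 0.0)
        )
    return lines


# Hypothetical orthorhombic cell (all angles 90 degrees).
scale_lines = format_scale(scale_matrix(52.0, 58.6, 61.9, 90.0, 90.0, 90.0))
for line in scale_lines:
    print(line)
```

For a cell with 90 degree angles the matrix reduces to the reciprocals of the cell lengths on the diagonal, which is the pattern seen in the example records.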
    
    

    Record:

    MTRIXn

    Contains: the transformations expressing non-crystallographic symmetry
    Notes: The MTRIX transformations operate on the coordinates in the entry to yield
    equivalent representations of the molecule in the same coordinate frame. One
    trio of MTRIX records with a constant serial number is given for each
    non-crystallographic symmetry operation defined.
    COLUMNS       DATA TYPE      CONTENTS                          
    --------------------------------------------------------------------------------
     1 -  6       Record name    "MTRIXn" (n=1, 2, or 3)
    
     8 - 10       Integer        Serial number             
    
    11 - 20       Real(10.6)     m[n][1]     
    
    21 - 30       Real(10.6)     m[n][2]     
    
    31 - 40       Real(10.6)     m[n][3]     
    
    46 - 55       Real(10.5)     v[n]        
    
    60            Integer        1 if coordinates for the related molecule are present;
                                 otherwise, blank.
    
    Example: 
    
             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    MTRIX1   1 -1.000000  0.000000 -0.000000        0.00001    1          
    MTRIX2   1 -0.000000  1.000000  0.000000        0.00002    1          
    MTRIX3   1  0.000000 -0.000000 -1.000000        0.00002    1          
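
As a sketch, the trio of MTRIX records sharing one serial number could be gathered and applied to a coordinate as shown below. The names parse_mtrix and apply_op, and the dictionary layout, are hypothetical.

```python
def parse_mtrix(lines):
    """Group MTRIXn records by serial number.

    Returns {serial: {"matrix": 3x3 list, "vector": [v1, v2, v3], "given": bool}}.
    """
    ops = {}
    for line in lines:
        if line[0:5] != "MTRIX":
            continue
        n = int(line[5]) - 1                      # row index taken from MTRIXn
        serial = int(line[7:10])
        op = ops.setdefault(
            serial, {"matrix": [None] * 3, "vector": [None] * 3, "given": False}
        )
        op["matrix"][n] = [float(line[10:20]), float(line[20:30]), float(line[30:40])]
        op["vector"][n] = float(line[45:55])
        # Column 60 holds "1" when coordinates for the related molecule are present.
        op["given"] = op["given"] or line[59:60] == "1"
    return ops


def apply_op(xyz, op):
    """Apply one non-crystallographic symmetry operation to a coordinate triple."""
    m, v = op["matrix"], op["vector"]
    return tuple(sum(m[i][j] * xyz[j] for j in range(3)) + v[i] for i in range(3))


records = [
    "MTRIX1   1 -1.000000  0.000000 -0.000000        0.00001    1",
    "MTRIX2   1 -0.000000  1.000000  0.000000        0.00002    1",
    "MTRIX3   1  0.000000 -0.000000 -1.000000        0.00002    1",
]
ops = parse_mtrix(records)
print(apply_op((1.0, 2.0, 3.0), ops[1]))
```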
    

    Record:

    TVECT

    Contains: the translation vector for structures which have infinite covalent connections
    Notes: For structures not comprised of discrete molecules (e.g., infinite
    polysaccharide chains), the entry contains a fragment which can be built into
    the full structure by the simple translation vectors of TVECT records.
    COLUMNS       DATA TYPE      CONTENTS
    --------------------------------------------------------------------------------
     1 -  6       Record name    "TVECT "                                    
    
     8 - 10       Integer        Serial number
    
    11 - 20       Real(10.5)     t[1]
    
    21 - 30       Real(10.5)     t[2]
                                     
    
    31 - 40       Real(10.5)     t[3]
                                     
    
    41 - 70       String         Text comment                        
    
    
    Example: 
    
             1         2         3         4         5         6         7
    1234567890123456789012345678901234567890123456789012345678901234567890
    TVECT    1   0.00000   0.00000  28.30000
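
The idea of building the full structure from a fragment can be sketched as follows. The one-atom fragment and the helper names are hypothetical, and the sketch assumes a single TVECT record applied repeatedly.

```python
def parse_tvect(line):
    """Return (serial, (t1, t2, t3), comment) from one TVECT record."""
    if line[0:6].strip() != "TVECT":
        raise ValueError("not a TVECT record")
    line = line.ljust(70)
    vector = (float(line[10:20]), float(line[20:30]), float(line[30:40]))
    return int(line[7:10]), vector, line[40:70].strip()


def repeat_fragment(coords, vector, copies):
    """Build `copies` images of the fragment, image k shifted by k * vector."""
    return [
        [(x + k * vector[0], y + k * vector[1], z + k * vector[2]) for x, y, z in coords]
        for k in range(copies)
    ]


serial, t, comment = parse_tvect("TVECT    1   0.00000   0.00000  28.30000")
fragment = [(1.0, 2.0, 3.0)]                  # hypothetical one-atom fragment
images = repeat_fragment(fragment, t, 3)
for image in images:
    print(image)
```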
    

    Record:

    MODEL

    Contains: the model serial number when a single coordinate entry contains multiple structures
    Notes:
  • Models are numbered sequentially beginning with 1.
  • If an entry contains more than 99,999 total atoms,
    then it must be divided among multiple models.
  • Each MODEL must have a corresponding ENDMDL record.
  • In the case of an NMR entry, the EXPDTA record states the number of model structures
    that are present in the individual entry.
  • COLUMNS       DATA TYPE      CONTENTS                            
    ----------------------------------------------------------------------
     1 -  6       Record name    "MODEL "                                            
    
    11 - 14       Integer        Model serial number
    
    
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    MODEL        1
    ATOM      1  N   ALA     1      11.104   6.134  -6.504  1.00  0.00           N
    ATOM      2  CA  ALA     1      11.639   6.071  -5.147  1.00  0.00           C
    ...
    ...
    ATOM    293 1HG  GLU    18     -14.861  -4.847   0.361  1.00  0.00           H
    ATOM    294 2HG  GLU    18     -13.518  -3.769   0.084  1.00  0.00           H
    TER     295      GLU    18                                           
    ENDMDL                                                              
    MODEL        2
    ATOM      1  N   ALA     1      11.304   6.234  -6.104  1.00  0.00           N
    ATOM      2  CA  ALA     1      11.239   6.371  -5.247  1.00  0.00           C
    ...
    ...
    ATOM    293 1HG  GLU    18     -14.752  -4.948   0.461  1.00  0.00           H
    ATOM    294 2HG  GLU    18     -13.630  -3.769   0.160  1.00  0.00           H
    TER     295      GLU    18                                           
    ENDMDL                                                              
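
A reader of a multi-model entry typically needs the coordinate records grouped by model serial number. The sketch below shows one possible grouping; the function name split_models is hypothetical, and the entry is a shortened, made-up variant of the example above.

```python
def split_models(lines):
    """Return {model serial: [coordinate records]} for a MODEL/ENDMDL entry."""
    models, current = {}, None
    for line in lines:
        record = line[0:6].strip()
        if record == "MODEL":
            current = int(line[10:14])            # columns 11 - 14
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif current is not None and record in ("ATOM", "HETATM", "ANISOU", "TER"):
            models[current].append(line)
    return models


entry = [
    "MODEL        1",
    "ATOM      1  N   ALA     1      11.104   6.134  -6.504  1.00  0.00           N",
    "TER       2      ALA     1",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   ALA     1      11.304   6.234  -6.104  1.00  0.00           N",
    "TER       2      ALA     1",
    "ENDMDL",
]
models = split_models(entry)
print({serial: len(records) for serial, records in models.items()})
```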
    
    

    Record:

    ATOM

    Contains: the atomic coordinates for standard residues and the occupancy and temperature factor for each atom
    Notes:
  • ATOM records for proteins are listed from amino to carboxyl terminus.
  • Nucleic acid residues are listed from the 5' to the 3' terminus.
  • No ordering is specified for polysaccharides.
  • The list of ATOM records in a chain is terminated by a TER record.
  • If an atom is provided in more than one position, then a non-blank alternate
    location indicator must be used. Within a residue, all atoms of a given
    conformation are assigned the same alternate position indicator.
  • Additional atoms (a modifying group) attached to side chains of standard residues are represented as a HET group
    which is assigned its own residue name. The chainID, sequence
    number, and insertion code assigned to the HET group are those of the
    standard residue to which it is attached.
  • In some entries, the occupancy and temperature factor fields may be used for other quantities.
  • The segment identifier is a string of up to four (4) alphanumeric characters, left-justified, and may
    include a space, e.g., CH86, A 1, NASE.
  • COLUMNS        DATA TYPE       CONTENTS                            
    --------------------------------------------------------------------------------
     1 -  6        Record name     "ATOM  "                                            
    
     7 - 11        Integer         Atom serial number.                   
    
    13 - 16        Atom            Atom name.                            
    
    17             Character       Alternate location indicator.         
    
    18 - 20        Residue name    Residue name.                         
    
    22             Character       Chain identifier.                     
    
    23 - 26        Integer         Residue sequence number.              
    
    27             AChar           Code for insertion of residues.       
    
    31 - 38        Real(8.3)       Orthogonal coordinates for X in Angstroms.                       
    
    39 - 46        Real(8.3)       Orthogonal coordinates for Y in Angstroms.                            
    
    47 - 54        Real(8.3)       Orthogonal coordinates for Z in Angstroms.                            
    
    55 - 60        Real(6.2)       Occupancy.                            
    
    61 - 66        Real(6.2)       Temperature factor (Default = 0.0).                   
    
    73 - 76        LString(4)      Segment identifier, left-justified.   
    
    77 - 78        LString(2)      Element symbol, right-justified.      
    
    79 - 80        LString(2)      Charge on the atom.       
    
    
    
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    ATOM    145  N   VAL A  25      32.433  16.336  57.540  1.00 11.92      A1   N
    ATOM    146  CA  VAL A  25      31.132  16.439  58.160  1.00 11.85      A1   C
    ATOM    147  C   VAL A  25      30.447  15.105  58.363  1.00 12.34      A1   C
    ATOM    148  O   VAL A  25      29.520  15.059  59.174  1.00 15.65      A1   O
    ATOM    149  CB AVAL A  25      30.385  17.437  57.230  0.28 13.88      A1   C
    ATOM    150  CB BVAL A  25      30.166  17.399  57.373  0.72 15.41      A1   C
    ATOM    151  CG1AVAL A  25      28.870  17.401  57.336  0.28 12.64      A1   C
    ATOM    152  CG1BVAL A  25      30.805  18.788  57.449  0.72 15.11      A1   C
    ATOM    153  CG2AVAL A  25      30.835  18.826  57.661  0.28 13.58      A1   C
    ATOM    154  CG2BVAL A  25      29.909  16.996  55.922  0.72 13.25      A1   C
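
The ATOM column table could be applied in Python along the following lines. This is a sketch under the assumption that records follow the fixed columns exactly; parse_atom and its key names are hypothetical.

```python
def parse_atom(line):
    """Slice one ATOM record using the column table above."""
    if line[0:6] != "ATOM  ":
        raise ValueError("not an ATOM record")
    line = line.ljust(80)                     # pad so trailing fields may be absent
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "alt_loc": line[16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21].strip(),
        "res_seq": int(line[22:26]),
        "i_code": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "temp_factor": float(line[60:66]),
        "seg_id": line[72:76].rstrip(),       # left-justified, may contain a space
        "element": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


records = [
    "ATOM    149  CB AVAL A  25      30.385  17.437  57.230  0.28 13.88      A1   C",
    "ATOM    150  CB BVAL A  25      30.166  17.399  57.373  0.72 15.41      A1   C",
]
atoms = [parse_atom(r) for r in records]
total = sum(atom["occupancy"] for atom in atoms)
print([atom["alt_loc"] for atom in atoms], round(total, 2))
```

The two records describe the same CB atom in alternate locations A and B, so their occupancies are expected to add up to the full occupancy of the site.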
    

    Record:

    ANISOU

    Contains: the anisotropic temperature factors
    Notes:
  • Columns 7 - 27 and 73 - 80 are identical to the corresponding ATOM/HETATM record.
  • The anisotropic temperature factors (columns 29 - 70) are scaled by a factor of 10**4
    (Angstroms**2) and are presented as integers.
  • The anisotropic temperature factors are stored in the same coordinate frame as the
    atomic coordinate records.
  • ANISOU values are listed only if they have been provided by the depositor.
  • COLUMNS        DATA TYPE       CONTENTS                  
    ----------------------------------------------------------------------
     1 -  6        Record name     "ANISOU"                                  
    
     7 - 11        Integer         Atom serial number
    
    13 - 16        Atom            Atom name
    
    17             Character       Alternate location indicator
    
    18 - 20        Residue name    Residue name
    
    22             Character       Chain identifier
    
    23 - 26        Integer         Residue sequence number
    
    27             AChar           Insertion code             
    
    29 - 35        Integer         u[1][1] 
    
    36 - 42        Integer         u[2][2] 
    
    43 - 49        Integer         u[3][3] 
    
    50 - 56        Integer         u[1][2] 
    
    57 - 63        Integer         u[1][3] 
    
    64 - 70        Integer         u[2][3] 
    
    73 - 76        LString(4)      Segment identifier, left-justified
    
    77 - 78        LString(2)      Element symbol, right-justified
    
    79 - 80        LString(2)      Charge on the atom       
    
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    ATOM    107  N   GLY    13      12.681  37.302 -25.211 1.000 15.56           N
    ANISOU  107  N   GLY    13     2406   1892   1614    198    519   -328       N
    ATOM    108  CA  GLY    13      11.982  37.996 -26.241 1.000 16.92           C
    ANISOU  108  CA  GLY    13     2748   2004   1679    -21    155   -419       C
    ATOM    109  C   GLY    13      11.678  39.447 -26.008 1.000 15.73           C
    ANISOU  109  C   GLY    13     2555   1955   1468     87    357   -109       C
    ATOM    110  O   GLY    13      11.444  40.201 -26.971 1.000 20.93           O
    ANISOU  110  O   GLY    13     3837   2505   1611    164   -121    189       O
    ATOM    111  N   ASN    14      11.608  39.863 -24.755 1.000 13.68           N
    ANISOU  111  N   ASN    14     2059   1674   1462     27    244    -96       N
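
The scaling by 10**4 can be undone as in the sketch below. The function names are hypothetical. The second helper uses the standard relation B = 8 * pi**2 * U to derive an equivalent isotropic value from the diagonal terms; that step is added here for illustration and is not part of the record definition.

```python
import math


def parse_anisou(line):
    """Return the six unique U[i][j] terms of an ANISOU record in Angstroms**2."""
    if line[0:6] != "ANISOU":
        raise ValueError("not an ANISOU record")
    names = ("u11", "u22", "u33", "u12", "u13", "u23")
    starts = (28, 35, 42, 49, 56, 63)         # 0-based starts of columns 29, 36, ...
    # Stored integers are U * 10**4, so divide to recover Angstroms**2.
    return {name: int(line[s:s + 7]) / 1.0e4 for name, s in zip(names, starts)}


def equivalent_b(u):
    """Equivalent isotropic B = 8 * pi**2 * mean(U11, U22, U33)."""
    return 8.0 * math.pi ** 2 * (u["u11"] + u["u22"] + u["u33"]) / 3.0


u = parse_anisou(
    "ANISOU  107  N   GLY    13     2406   1892   1614    198    519   -328       N"
)
print(u["u11"], round(equivalent_b(u), 2))
```

For atom 107 the equivalent value comes out close to the temperature factor of 15.56 on the matching ATOM record, which is a useful consistency check between the two records.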
    

    Record:

    TER

    Contains: a marker for the end of a list of ATOM/HETATM records for a chain
    Notes:
  • The TER records occur in the coordinate section of the entry, and indicate
    the last residue presented for each polypeptide and/or nucleic acid chain for
    which there are coordinates. For proteins, the residue defined on the TER
    record is the carboxy-terminal residue; for nucleic acids it is the
    3'-terminal residue.
  • For a cyclic molecule, the choice of termini is arbitrary.
  • Terminal oxygen atoms are presented as OXT for proteins, and as O5T or O3T for nucleic acids.
  • The TER record has the same residue name, chain identifier, sequence number
    and insertion code as the terminal residue. The serial number of the TER
    record is one number greater than the serial number of the ATOM/HETATM
    preceding the TER.
  • For chains with gaps due to disorder, it is recommended that the C-terminus
    atoms be labeled O and OXT.
  • The residue name appearing on the TER record must be the same as the residue name
    of the immediately preceding ATOM or non-water HETATM record.
  • COLUMNS         DATA TYPE         CONTENTS
    --------------------------------------------------------------------------------
     1 -  6         Record name       "TER   "                                 
    
     7 - 11         Integer           Serial number
    
    18 - 20         Residue name      Residue name               
    
    22              Character         Chain identifier           
    
    23 - 26         Integer           Residue sequence number    
    
    27              AChar             Insertion code     
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    ATOM   4150  H   ALA A 431       8.674  16.036  12.858  1.00  0.00           H
    TER    4151      ALA A 431
    
    ATOM   1403  O   PRO P  22      12.701  33.564  15.827  1.09 18.03           O
    ATOM   1404  CB  PRO P  22      13.512  32.617  18.642  1.09  9.32           C
    ATOM   1405  CG  PRO P  22      12.828  33.382  19.740  1.09 12.23           C
    ATOM   1406  CD  PRO P  22      12.324  34.603  18.985  1.09 11.47           C
    HETATM 1407  CA  BLE P   1      14.625  32.240  14.151  1.09 16.76           C
    HETATM 1408  CB  BLE P   1      15.610  33.091  13.297  1.09 16.56           C
    HETATM 1409  CG  BLE P   1      15.558  34.629  13.373  1.09 14.27           C
    HETATM 1410  CD1 BLE P   1      16.601  35.208  12.440  1.09 14.75           C
    HETATM 1411  CD2 BLE P   1      14.209  35.160  12.930  1.09 15.60           C
    HETATM 1412  N   BLE P   1      14.777  32.703  15.531  1.09 14.79           N
    HETATM 1413  B   BLE P   1      14.921  30.655  14.194  1.09 15.56           B
    HETATM 1414  O1  BLE P   1      14.852  30.178  12.832  1.09 16.10           O
    HETATM 1415  O2  BLE P   1      13.775  30.147  14.862  1.09 20.95           O
    TER    1416      BLE P   1                                            
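
The consistency rules in the notes above (serial number one greater than the preceding record, same residue fields) could be checked roughly as follows. The function name check_ter and the wording of the messages are hypothetical.

```python
def check_ter(previous, ter):
    """Compare a TER record with the ATOM/HETATM record just before it."""
    previous, ter = previous.ljust(27), ter.ljust(27)
    problems = []
    if int(ter[6:11]) != int(previous[6:11]) + 1:
        problems.append("serial number is not one greater than the preceding record")
    # Columns 18 - 27: residue name, chain identifier, sequence number, insertion code.
    if ter[17:27] != previous[17:27]:
        problems.append("residue fields differ from the preceding record")
    return problems


atom = "ATOM   4150  H   ALA A 431       8.674  16.036  12.858  1.00  0.00           H"
good_ter = "TER    4151      ALA A 431"
bad_ter = "TER    4153      GLY A 431"      # hypothetical faulty record
print(check_ter(atom, good_ter))
print(check_ter(atom, bad_ter))
```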
    
    

    Record:

    HETATM

    Contains: the atomic coordinate records for atoms within "non-standard"
    groups. These records are used for water molecules and atoms presented in HET
    groups.
    Notes:
  • Insertion codes, segment id, and element naming are fully described in the
    ATOM section of this document.
  • Disordered solvents may be represented by the residue name DIS.
  • No ordering is specified for polysaccharides.
  • HETATM records must have corresponding HET, HETNAM, FORMUL
    and CONECT records, except for waters.
  • COLUMNS        DATA TYPE       CONTENTS                         
    --------------------------------------------------------------------------------
     1 -  6        Record name     "HETATM"                                          
    
     7 - 11        Integer         Atom serial number.                
    
    13 - 16        Atom            Atom name                         
    
    17             Character       Alternate location indicator      
    
    18 - 20        Residue name    Residue name                      
    
    22             Character       Chain identifier                  
    
    23 - 26        Integer         Residue sequence number           
    
    27             AChar           Code for insertion of residues    
    
    31 - 38        Real(8.3)       Orthogonal coordinates for X      
    
    39 - 46        Real(8.3)       Orthogonal coordinates for Y      
    
    47 - 54        Real(8.3)       Orthogonal coordinates for Z      
    
    55 - 60        Real(6.2)       Occupancy                         
    
    61 - 66        Real(6.2)       Temperature factor                
    
    73 - 76        LString(4)      Segment identifier, left-justified                    
    
    77 - 78        LString(2)      Element symbol, right-justified  
    
    79 - 80        LString(2)      Charge on the atom    
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    HETATM 1357 MG    MG   168       4.669  34.118  19.123  1.00  3.16          MG2+
    HETATM 3835 FE   HEM     1      17.140   3.115  15.066  1.00 14.14          FE3+
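
The element and charge fields at the end of the record could be read as sketched below. The sketch assumes the charge is written as a single digit followed by a sign, as in the two examples above; the function names are hypothetical.

```python
def parse_charge(field):
    """Turn a charge field such as "2+" or "1-" into a signed integer, or None if blank."""
    field = field.strip()
    if not field:
        return None
    # Assumption: one digit followed by "+" or "-", as in the examples above.
    magnitude, sign = int(field[0]), field[1]
    return -magnitude if sign == "-" else magnitude


def element_and_charge(line):
    """Read columns 77 - 78 (element) and 79 - 80 (charge) of an ATOM/HETATM record."""
    line = line.ljust(80)
    return line[76:78].strip(), parse_charge(line[78:80])


records = [
    "HETATM 1357 MG    MG   168       4.669  34.118  19.123  1.00  3.16          MG2+",
    "HETATM 3835 FE   HEM     1      17.140   3.115  15.066  1.00 14.14          FE3+",
]
for record in records:
    print(element_and_charge(record))
```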
    
    

    Record:

    ENDMDL

    Contains: these records are paired with MODEL records to group individual structures found in a coordinate entry
    Notes:
  • MODEL/ENDMDL records are used only when more than one structure
    is presented in the entry, or if there are more than 99,999 atoms.
  • Every MODEL record has an associated ENDMDL record.
  • COLUMNS         DATA TYPE        CONTENTS
    ------------------------------------------------------------------
     1 -  6         Record name      "ENDMDL"                         
    
    Example: 
    
             1         2         3         4         5         6         7         8
    12345678901234567890123456789012345678901234567890123456789012345678901234567890
    ...
    ...
    ATOM  14550 1HG  GLU   122     -14.364  14.787 -14.258  1.00  0.00           H
    ATOM  14551 2HG  GLU   122     -13.794  13.738 -12.961  1.00  0.00           H
    TER   14552      GLU   122                                             
    ENDMDL                                                                 
    MODEL        9                                                         
    ATOM  14553  N   SER     1     -28.280   1.567  12.004  1.00  0.00           N
    ATOM  14554  CA  SER     1     -27.749   0.392  11.256  1.00  0.00           C
    ...
    ...
    ATOM  16369 1HG  GLU   122      -3.757  18.546  -8.439  1.00  0.00           H
    ATOM  16370 2HG  GLU   122      -3.066  17.166  -7.584  1.00  0.00           H
    TER   16371      GLU   122                                             
    ENDMDL                                                                 
    MODEL       10                                                         
    ATOM  16372  N   SER     1     -22.285   7.041  10.003  1.00  0.00           N
    ATOM  16373  CA  SER     1     -23.026   6.872   8.720  1.00  0.00           C
    ...
    ...
    ATOM  18188 1HG  GLU   122      -1.467  18.282 -17.144  1.00  0.00           H
    ATOM  18189 2HG  GLU   122      -2.711  18.067 -15.913  1.00  0.00           H
    TER   18190      GLU   122                                             
    ENDMDL
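
The pairing and numbering rules for MODEL and ENDMDL could be verified over a whole entry with a check like the one below. The function name and message texts are hypothetical, and the sketch assumes the complete entry is supplied, so that numbering starts at 1 (an excerpt such as the one above would be flagged).

```python
def check_model_pairing(lines):
    """Return a list of problems with MODEL/ENDMDL pairing and numbering."""
    problems, open_model, expected = [], None, 1
    for number, line in enumerate(lines, start=1):
        record = line[0:6].strip()
        if record == "MODEL":
            if open_model is not None:
                problems.append("line %d: MODEL %d has no ENDMDL" % (number, open_model))
            open_model = int(line[10:14])
            if open_model != expected:
                problems.append("line %d: expected MODEL %d" % (number, expected))
            expected = open_model + 1
        elif record == "ENDMDL":
            if open_model is None:
                problems.append("line %d: ENDMDL without MODEL" % number)
            open_model = None
    if open_model is not None:
        problems.append("MODEL %d has no ENDMDL" % open_model)
    return problems


good = ["MODEL        1", "ENDMDL", "MODEL        2", "ENDMDL"]
bad = ["MODEL        1", "MODEL        3", "ENDMDL"]
print(check_model_pairing(good))
print(check_model_pairing(bad))
```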