PDB data structures and I/O
PDB entry top-level class
pdb_entry module
Top-level module for PDB structure entries.
The specifications used in this class are derived from the Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description, Version 3.3.
-
class old_pdb.pdb_entry.Entry
Top-level class for PDB structure entry.
-
annotate_link(record) → old_pdb.secondary.Link
Annotate LINK to indicate whether the named atoms are elements.
Creates two new Boolean attributes in record: is_element1 and is_element2.
- Parameters
record (secondary.Link) – record to annotate
- Returns
annotated record
-
property author
AUTHOR record (annotation.Author).
-
property caveat
CAVEAT record (annotation.Caveat).
-
check_master()
Check the contents against internal bookkeeping records.
- Raises
AssertionError – if checks fail
-
property cis_peptide
List of CISPEP records (secondary.CisPeptide).
-
property compound
COMPND record (annotation.Compound).
-
property connect
List of CONECT records (bookkeeping.Connection).
-
property database_reference
List of DBREF records (primary.DatabaseReference).
-
property disulfide_bond
List of SSBOND records (secondary.DisulfideBond).
-
property experimental_data
EXPDTA record (annotation.ExperimentalData).
-
find_atom_by_name(chain_id, residue_id, atom_name, model_num=1) → old_pdb.coordinates.Atom
Find a specific atom by name.
- Parameters
chain_id (str) – chain ID to find
residue_id (int) – residue ID to find
atom_name (str) – name of atom to find
model_num (int) – model number to use
- Returns
ATOM or HETATM object
-
find_residue(chain_id, residue_id, model_num=1) → list
Find a specific residue.
- Parameters
chain_id (str) – chain ID to find
residue_id (int) – residue ID to find
model_num (int) – model number to use
- Returns
list of coordinates.Atom-like objects
-
property frac_transform
List of SCALEn records (crystallography.FractionalTransform).
-
property header
HEADER record (annotation.Header).
-
property helix
List of HELIX records (secondary.Helix).
-
property heterogen
List of HET records (heterogen.Heterogen).
-
property heterogen_formula
FORMUL record (heterogen.Formula).
-
property heterogen_name
List of HETNAM records (heterogen.HeterogenName).
-
property heterogen_synonym
HETSYN record (heterogen.HeterogenSynonym).
-
property journal
JRNL record (annotation.Journal).
-
property keyword
KEYWDS record (annotation.Keywords).
-
property link
List of LINK records (secondary.Link).
-
property master
MASTER record (bookkeeping.Master).
-
property model
List of MODEL records (coordinates.Model).
-
property model_type
MDLTYP record (annotation.ModelType).
-
property modified_residue
List of MODRES records (primary.ModifiedResidue).
-
property noncrystal_transform
List of MTRIXn records (crystallography.NoncrystalTransform).
-
num_atoms(heavy_only=True) → int
Number of ATOM and HETATM entries in all chains in entry.
- Parameters
heavy_only (bool) – exclude hydrogen atoms from count
-
property num_model
NUMMDL record (annotation.NumModels).
-
num_residues(count_hetatm=False) → int
Number of residues in entry.
- Parameters
count_hetatm (bool) – include heterogen residues in count
-
num_transforms() → int
Return the number of optional transform records in entry.
- Returns
number of ORIGXn + SCALEn + MTRIXn records
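The value reported by num_transforms can be cross-checked against raw PDB text. The sketch below uses a hypothetical stdlib-only helper that is not part of old_pdb. It assumes each ORIGXn, SCALEn, or MTRIXn record occupies its own line, with the record name in columns 1-6.

```python
def count_transform_records(lines):
    """Count ORIGXn, SCALEn and MTRIXn records (n = 1, 2 or 3).

    Hypothetical helper, not part of old_pdb.
    """
    stems = {"ORIGX", "SCALE", "MTRIX"}
    return sum(
        1
        for line in lines
        # Record name (columns 1-6) is a 5-letter stem followed by n.
        if line[0:5] in stems and line[5:6] in {"1", "2", "3"}
    )


example = [
    "ORIGX1      1.000000  0.000000  0.000000        0.00000",
    "SCALE1      0.010000  0.000000  0.000000        0.00000",
    "MTRIX1   1  1.000000  0.000000  0.000000        0.00000    1",
    "ATOM      1  N   MET A   1      10.000  10.000  10.000  1.00  0.00           N",
]
print(count_transform_records(example))  # 3
```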
-
property obsolete
OBSLTE record (annotation.Obsolete).
-
property original_transform
List of ORIGXn records (crystallography.OriginalTransform).
-
property remark
List of REMARK records (annotation.Remark).
-
property revision_data
REVDAT record (annotation.RevisionData).
-
property sequence_difference
List of SEQADV records (primary.SequenceDifferences).
-
property sequence_residue
List of SEQRES records (primary.SequenceResidues).
-
property setter
AUTHOR record (annotation.Author).
-
property sheet
List of SHEET records (secondary.Sheet).
-
property source
SOURCE record (annotation.Source).
-
property split
SPLIT record (annotation.Split).
-
property supersedes
SPRSDE record (annotation.Supersedes).
-
property title
TITLE record (annotation.Title).
-
property unit_cell
CRYST1 record (crystallography.UnitCell).
PDB records
annotation module
Classes for PDB records that provide annotation information.
-
class old_pdb.annotation.Author
AUTHOR field
The AUTHOR record contains the names of the people responsible for the contents of the entry.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “AUTHOR” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
11-79 | List | author_list | List of the author names, separated by commas.
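Here is a minimal sketch of how the column table maps onto Python slicing: 1-based columns M-N become the 0-based slice line[M-1:N]. The helper parse_author_list is hypothetical and not part of old_pdb. It assumes continued AUTHOR records appear in file order and break after a comma, so the pieces can be joined without inserting a separator.

```python
def parse_author_list(lines):
    """Join AUTHOR records (in file order) and split the names on commas.

    Hypothetical helper, not part of old_pdb.
    """
    text = ""
    for line in lines:
        if line[0:6] != "AUTHOR":
            continue
        # Columns 11-79 hold the author list; continuation records
        # (columns 9-10 = 2, 3, ...) simply extend it.
        text += line[10:79].strip()
    return [name.strip() for name in text.split(",") if name.strip()]


records = [
    "AUTHOR    J.SMITH,A.JONES,",
    "AUTHOR   2 B.BROWN",
]
print(parse_author_list(records))  # ['J.SMITH', 'A.JONES', 'B.BROWN']
```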
-
class old_pdb.annotation.Caveat
CAVEAT field
CAVEAT warns of severe errors in an entry. Use caution when using an entry containing this record.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “CAVEAT” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
12-15 | IDcode | id_code | PDB ID code of this entry.
20-79 | String | comment | Free text giving the reason for the CAVEAT.
-
class old_pdb.annotation.Compound
COMPND field
The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.
For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “COMPND” |
8-10 | Continuation | continuation | Allows concatenation of multiple records.
11-80 | Specification list | compound | Description of the molecular components.
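A short parser can illustrate the token: value layout of the COMPND specification list. This sketch is not old_pdb's implementation, and parse_specification_list is a hypothetical name. It makes three assumptions: continuation records have already been concatenated, pairs are separated by semicolons, and each MOL_ID token starts a new molecule. The same approach would apply to the SOURCE record.

```python
def parse_specification_list(text):
    """Split a concatenated specification list into one dict per molecule.

    Hypothetical helper, not part of old_pdb. Assumes 'TOKEN: value' pairs
    separated by ';' and values that contain no ';'.
    """
    molecules = []
    for pair in text.split(";"):
        if ":" not in pair:
            continue  # skip the empty piece after a trailing ';'
        token, value = (part.strip() for part in pair.split(":", 1))
        # Start a new molecule at each MOL_ID (or if none was seen yet).
        if token == "MOL_ID" or not molecules:
            molecules.append({})
        molecules[-1][token] = value
    return molecules


spec = ("MOL_ID: 1; MOLECULE: HEMOGLOBIN ALPHA; CHAIN: A, C; "
        "MOL_ID: 2; MOLECULE: HEMOGLOBIN BETA; CHAIN: B, D;")
for molecule in parse_specification_list(spec):
    print(molecule)
```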
-
class old_pdb.annotation.ExperimentalData
EXPDTA field
The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:
ELECTRON DIFFRACTION
FIBER DIFFRACTION
FLUORESCENCE TRANSFER
NEUTRON DIFFRACTION
NMR
THEORETICAL MODEL
X-RAY DIFFRACTION
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “EXPDTA” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
11-79 | SList | technique | The experimental technique(s) with optional comment describing the sample or experiment.
-
class old_pdb.annotation.Header
HEADER field
The HEADER record uniquely identifies a PDB entry through the id_code field. This record also provides a classification for the entry. Finally, it contains the date the coordinates were deposited at the PDB.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “HEADER” |
11-50 | String(40) | classification | Classifies the molecule(s).
51-59 | Date | dep_date | Deposition date. This is the date the coordinates were received at the PDB.
63-66 | IDcode | id_code | This identifier is unique within the PDB.
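The sketch below slices a HEADER record by the columns above and converts the DD-MMM-YY deposition date with datetime.strptime. The function parse_header is hypothetical and not part of old_pdb. Month abbreviations are matched case-insensitively against the current locale's names, so the sketch assumes the default C (English) locale. %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.

```python
from datetime import datetime


def parse_header(line):
    """Slice a HEADER record; 1-based columns M-N are line[M-1:N].

    Hypothetical helper, not part of old_pdb.
    """
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        # Columns 51-59 in DD-MMM-YY format, e.g. 07-MAR-84.
        "dep_date": datetime.strptime(line[50:59], "%d-%b-%y").date(),
        "id_code": line[62:66],  # columns 63-66
    }


line = "HEADER    " + "OXYGEN TRANSPORT".ljust(40) + "07-MAR-84   1HHO"
print(parse_header(line))
```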
-
class old_pdb.annotation.Journal
JRNL field
The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “JRNL” |
13-79 | LString | text | See details in the PDB specification.
-
class old_pdb.annotation.Keywords
KEYWDS field
The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “KEYWDS” |
9-10 | Continuation | continuation | Allows concatenation of records if necessary.
11-79 | List | keywords | Comma-separated list of keywords relevant to the entry.
-
class old_pdb.annotation.ModelType
MDLTYP field
The MDLTYP record contains additional annotation pertinent to the coordinates presented in the entry.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “MDLTYP” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
11-80 | SList | comment | Free text providing additional structural annotation.
-
class old_pdb.annotation.NumModels
NUMMDL field
The NUMMDL record indicates the total number of models in a PDB entry.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “NUMMDL” |
11-14 | Integer | model_number | Number of models.
-
class old_pdb.annotation.Obsolete
OBSLTE field
This record acts as a flag in an entry which has been withdrawn from the PDB’s full release. It indicates which, if any, new entries have replaced the withdrawn entry.
The format allows for the case of multiple new entries replacing one existing entry.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “OBSLTE” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
12-20 | Date | replace_date | Date that this entry was replaced.
22-25 | IDcode | id_code | ID code of this entry.
32-35 | IDcode | replace_id_codes[0] | ID of entry replacing this one.
37-40 | IDcode | replace_id_codes[1] | ID of entry replacing this one.
42-45 | IDcode | replace_id_codes[2] | ID of entry replacing this one.
47-50 | IDcode | replace_id_codes[3] | ID of entry replacing this one.
52-55 | IDcode | replace_id_codes[4] | ID of entry replacing this one.
57-60 | IDcode | replace_id_codes[5] | ID of entry replacing this one.
62-65 | IDcode | replace_id_codes[6] | ID of entry replacing this one.
67-70 | IDcode | replace_id_codes[7] | ID of entry replacing this one.
72-75 | IDcode | replace_id_codes[8] | ID of entry replacing this one.
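The nine replace_id_codes fields are spaced five columns apart, so a loop can read them. The helper below is a hypothetical sketch, not part of old_pdb. It returns only the non-blank codes from a single OBSLTE line and does not merge codes across continuation records.

```python
def replacement_id_codes(line):
    """Return the non-blank replace_id_codes of one OBSLTE record.

    Hypothetical helper, not part of old_pdb.
    """
    # Nine 4-character IDcode fields at 1-based columns 32-35, 37-40, ...,
    # 72-75, i.e. 0-based slice starts 31, 36, ..., 71.
    codes = [line[start:start + 4].strip() for start in range(31, 72, 5)]
    return [code for code in codes if code]


line = "OBSLTE".ljust(11) + "31-JAN-94 1MBP      2MBP 3MBP"
print(replacement_id_codes(line))  # ['2MBP', '3MBP']
```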
-
class old_pdb.annotation.Remark
REMARK field
REMARK records present experimental details, annotations, comments, and information not included in other records. In a number of cases, REMARKs are used to expand the contents of other record types. A new level of structure is being used for some REMARK records. This is expected to facilitate searching and will assist in the conversion to a relational database.
-
parse_line(line)
Initialize by parsing line.
COLUMNS | TYPE | FIELD | DEFINITION
8-10 | int | remark_num | Remark number. It is not an error for remark n to exist in an entry when remark n-1 does not.
12-79 | str | remark_text | Left as white space in the first line of each new remark.
- Parameters
line (str) – PDB-format line containing the REMARK record
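Remark n may exist without remark n-1, so a dictionary keyed by remark_num is a natural container. This sketch uses a hypothetical helper, group_remarks, which is not part of old_pdb. It keeps each remark's text lines in file order and skips the blank first line of each remark.

```python
from collections import defaultdict


def group_remarks(lines):
    """Collect REMARK text by remark number.

    Hypothetical helper, not part of old_pdb.
    """
    remarks = defaultdict(list)
    for line in lines:
        if line[0:6] != "REMARK":
            continue
        texts = remarks[int(line[7:10])]  # columns 8-10; creates the key
        text = line[11:79].rstrip()       # columns 12-79
        if text:  # the first line of each remark is blank
            texts.append(text)
    return dict(remarks)


lines = [
    "REMARK   2",
    "REMARK   2 RESOLUTION.    1.74 ANGSTROMS.",
    "REMARK   4",
    "REMARK   4 1ABC COMPLIES WITH FORMAT V. 3.30",
]
print(group_remarks(lines))
```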
-
class old_pdb.annotation.Revision
Class to store contents of a single REVDAT modification.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “REVDAT” |
8-10 | Integer | modification_num | Modification number.
11-12 | Continuation | continuation | Allows concatenation of multiple records.
14-22 | Date | modification_date | Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27 | IDcode | modification_id | ID code of this entry. This is not repeated on continuation lines.
32 | Integer | modification_type | An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45 | LString(6) | record | Modification detail.
47-52 | LString(6) | record | Modification detail.
54-59 | LString(6) | record | Modification detail.
61-66 | LString(6) | record | Modification detail.
-
class old_pdb.annotation.RevisionData
REVDAT field
REVDAT records contain a history of the modifications made to an entry since its release.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “REVDAT” |
8-10 | Integer | modNum | Modification number.
11-12 | Continuation | continuation | Allows concatenation of multiple records.
14-22 | Date | modDate | Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27 | IDcode | modId | ID code of this entry. This is not repeated on continuation lines.
32 | Integer | modType | An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45 | LString(6) | record | Modification detail.
47-52 | LString(6) | record | Modification detail.
54-59 | LString(6) | record | Modification detail.
61-66 | LString(6) | record | Modification detail.
-
class old_pdb.annotation.Site
SITE class
The SITE records supply the identification of groups comprising important sites in the macromolecule.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SITE  ” |
8-10 | Integer | seq_num | Sequence number.
12-14 | LString(3) | site_id | Site name.
16-17 | Integer | num_res | Number of residues that compose the site.
19-21 | Residue name | res_name1 | Residue name for first residue that creates the site.
23 | Character | chain_id1 | Chain identifier for first residue of the site.
24-27 | Integer | seq1 | Residue sequence number for first residue of the site.
28 | AChar | ins_code1 | Insertion code for first residue of the site.
30-32 | Residue name | res_name2 | Residue name for second residue that creates the site.
34 | Character | chain_id2 | Chain identifier for second residue of the site.
35-38 | Integer | seq2 | Residue sequence number for second residue of the site.
39 | AChar | ins_code2 | Insertion code for second residue of the site.
41-43 | Residue name | res_name3 | Residue name for third residue that creates the site.
45 | Character | chain_id3 | Chain identifier for third residue of the site.
46-49 | Integer | seq3 | Residue sequence number for third residue of the site.
50 | AChar | ins_code3 | Insertion code for third residue of the site.
52-54 | Residue name | res_name4 | Residue name for fourth residue that creates the site.
56 | Character | chain_id4 | Chain identifier for fourth residue of the site.
57-60 | Integer | seq4 | Residue sequence number for fourth residue of the site.
61 | AChar | ins_code4 | Insertion code for fourth residue of the site.
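Each of the four residues in a SITE record occupies an 11-column block, so a fixed stride can read the fields. This is a hypothetical sketch, not part of old_pdb, and it stops at the first blank residue name:

```python
def site_residues(line):
    """Return up to four (res_name, chain_id, seq, ins_code) tuples.

    Hypothetical helper, not part of old_pdb.
    """
    residues = []
    # Blocks start at 1-based columns 19, 30, 41 and 52 (0-based 18, 29, 40, 51).
    for start in (18, 29, 40, 51):
        res_name = line[start:start + 3].strip()
        if not res_name:
            break  # fewer than four residues on this line
        chain_id = line[start + 4:start + 5]
        seq = int(line[start + 5:start + 9])
        ins_code = line[start + 9:start + 10].strip()
        residues.append((res_name, chain_id, seq, ins_code))
    return residues


line = "SITE     1 AC1  2 " + "HIS A  93 " + " HIS A  96 "
print(site_residues(line))  # [('HIS', 'A', 93, ''), ('HIS', 'A', 96, '')]
```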
-
class old_pdb.annotation.Source
SOURCE field
The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SOURCE” |
8-10 | Continuation | continuation | Allows concatenation of multiple records.
11-79 | Specification List | source | Identifies the source of the macromolecule in a token: value format.
-
class old_pdb.annotation.Split
SPLIT field
The SPLIT record is used in instances where a specific entry composes part of a large macromolecular complex. It will identify the PDB entries that are required to reconstitute a complete complex.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SPLIT ” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
12-15 | IDcode | id_codes[0] | ID code of related entry.
17-20 | IDcode | id_codes[1] | ID code of related entry.
22-25 | IDcode | id_codes[2] | ID code of related entry.
27-30 | IDcode | id_codes[3] | ID code of related entry.
32-35 | IDcode | id_codes[4] | ID code of related entry.
37-40 | IDcode | id_codes[5] | ID code of related entry.
42-45 | IDcode | id_codes[6] | ID code of related entry.
47-50 | IDcode | id_codes[7] | ID code of related entry.
52-55 | IDcode | id_codes[8] | ID code of related entry.
57-60 | IDcode | id_codes[9] | ID code of related entry.
62-65 | IDcode | id_codes[10] | ID code of related entry.
67-70 | IDcode | id_codes[11] | ID code of related entry.
72-75 | IDcode | id_codes[12] | ID code of related entry.
77-80 | IDcode | id_codes[13] | ID code of related entry.
-
class old_pdb.annotation.Supersedes
SPRSDE field
The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and withdrawn from the PDB release set. One entry may replace many. It is PDB policy that only the principal investigator of a structure has the authority to withdraw it.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SPRSDE” |
9-10 | Continuation | continuation | Allows for multiple ID codes.
12-20 | Date | super_date | Date entry superseded the listed entries. This field is not copied on continuations.
22-25 | IDcode | id_code | ID code of this entry. This field is not copied on continuations.
32-35 | IDcode | super_id_codes | ID code of superseded entry.
37-40 | IDcode | super_id_codes | ID code of superseded entry.
42-45 | IDcode | super_id_codes | ID code of superseded entry.
47-50 | IDcode | super_id_codes | ID code of superseded entry.
52-55 | IDcode | super_id_codes | ID code of superseded entry.
57-60 | IDcode | super_id_codes | ID code of superseded entry.
62-65 | IDcode | super_id_codes | ID code of superseded entry.
67-70 | IDcode | super_id_codes | ID code of superseded entry.
72-75 | IDcode | super_id_codes | ID code of superseded entry.
-
class old_pdb.annotation.Title
TITLE field
The TITLE record contains a title for the experiment or analysis that is represented in the entry. It should identify an entry in the PDB in the same way that a title identifies a paper.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “TITLE ” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
11-80 | String | title | Title of the experiment.
primary module
Classes for PDB records that provide primary structure information.
-
class old_pdb.primary.DatabaseReference
DBREF record.
The DBREF record provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “DBREF ” |
8-11 | IDcode | id_code | ID code of this entry.
13 | Character | chain_id | Chain identifier.
15-18 | Integer | seq_begin | Initial sequence number of PDB sequence segment.
19 | AChar | ins_begin | Initial insertion code of PDB sequence segment.
21-24 | Integer | seq_end | Ending sequence number of PDB sequence segment.
25 | AChar | ins_end | Ending insertion code of PDB sequence segment.
27-32 | LString | database | Sequence database name.
34-41 | LString | database_accession | Sequence database accession code.
43-54 | LString | database_id_code | Sequence database ID code.
56-60 | Integer | database_seq_begin | Initial sequence number of database segment.
61 | AChar | database_ins_begin | Insertion code of initial residue of the segment, if PDB is the reference.
63-67 | Integer | database_seq_end | Ending sequence number of the database segment.
68 | AChar | database_ins_end | Insertion code of the ending residue of the segment, if PDB is the reference.
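The sketch below reads a single-line DBREF record by the column table above, using the 0-based slice line[M-1:N] for 1-based columns M-N. parse_dbref is hypothetical, and its dictionary keys simply reuse the field names from the table. It assumes a full-width line and does not handle the DBREF1/DBREF2 two-line form.

```python
def parse_dbref(line):
    """Slice a standard one-line DBREF record.

    Hypothetical helper, not part of old_pdb; assumes a full-width line.
    """
    return {
        "id_code": line[7:11],                      # columns 8-11
        "chain_id": line[12],                       # column 13
        "seq_begin": int(line[14:18]),              # columns 15-18
        "ins_begin": line[18].strip(),              # column 19
        "seq_end": int(line[20:24]),                # columns 21-24
        "ins_end": line[24].strip(),                # column 25
        "database": line[26:32].strip(),            # columns 27-32
        "database_accession": line[33:41].strip(),  # columns 34-41
        "database_id_code": line[42:54].strip(),    # columns 43-54
        "database_seq_begin": int(line[55:60]),     # columns 56-60
        "database_ins_begin": line[60].strip(),     # column 61
        "database_seq_end": int(line[62:67]),       # columns 63-67
        "database_ins_end": line[67:68].strip(),    # column 68
    }


line = "DBREF  1HHO A    1   141  UNP    P69905   HBA_HUMAN        1    141"
print(parse_dbref(line))
```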
-
class old_pdb.primary.DatabaseReference1
Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.
This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “DBREF1” |
8-11 | IDcode | id_code | ID code of this entry.
13 | Character | chain_id | Chain identifier.
15-18 | Integer | seq_begin | Initial sequence number of the PDB sequence segment, right justified.
19 | AChar | ins_begin | Initial insertion code of the PDB sequence segment.
21-24 | Integer | seq_end | Ending sequence number of the PDB sequence segment, right justified.
25 | AChar | ins_end | Ending insertion code of the PDB sequence segment.
27-32 | LString | database | Sequence database name.
48-67 | LString | db_id_code | Sequence database identification code, left justified.
-
class old_pdb.primary.DatabaseReference2
Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.
This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “DBREF2” |
8-11 | IDcode | id_code | ID code of this entry.
13 | Character | chain_id | Chain identifier.
19-40 | LString | db_accession | Sequence database accession code, left justified.
46-55 | Integer | seq_begin | Initial sequence number of the database segment, right justified.
58-67 | Integer | seq_end | Ending sequence number of the database segment, right justified.
-
class old_pdb.primary.ModifiedResidue
MODRES field
The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. Included is a mapping between residue names given in a PDB entry and standard residues.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “MODRES” |
8-11 | IDcode | id_code | ID code of this entry.
13-15 | Residue name | res_name | Residue name used in this entry.
17 | Character | chain_id | Chain identifier.
19-22 | Integer | seq_num | Sequence number.
23 | AChar | ins_code | Insertion code.
25-27 | Residue name | standard_res | Standard residue name.
30-70 | String | comment | Description of the residue modification.
-
class old_pdb.primary.SequenceDifferences
SEQADV field
The SEQADV record identifies conflicts between sequence information in the ATOM records of the PDB entry and the sequence database entry given on DBREF. Please note that these records were designed to identify differences and not errors. No assumption is made as to which database contains the correct data. PDB may include REMARK records in the entry that reflect the depositor’s view of which database has the correct sequence.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SEQADV” |
8-11 | IDcode | id_code | ID code of this entry.
13-15 | Residue name | res_name | Name of the PDB residue in conflict.
17 | Character | chain_id | PDB chain identifier.
19-22 | Integer | seq_num | PDB sequence number.
23 | AChar | ins_code | PDB insertion code.
25-28 | LString | database | Sequence database name.
30-38 | LString | db_id_code | Sequence database accession number.
40-42 | Residue name | db_res | Sequence database residue name.
44-48 | Integer | db_seq | Sequence database sequence number.
50-70 | LString | conflict | Conflict comment.
-
class old_pdb.primary.SequenceResidues
SEQRES field
SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SEQRES” |
8-10 | Integer | serNum | Serial number of the SEQRES record for the current chain. Starts at 1 and increments by one each line. Reset to 1 for each chain.
12 | Character | chainID | Chain identifier. This may be any single legal character, including a blank which is used if there is only one chain.
14-17 | Integer | numRes | Number of residues in the chain. This value is repeated on every record.
20-22 | Residue name | resName | Residue name.
24-26 | Residue name | resName | Residue name.
28-30 | Residue name | resName | Residue name.
32-34 | Residue name | resName | Residue name.
36-38 | Residue name | resName | Residue name.
40-42 | Residue name | resName | Residue name.
44-46 | Residue name | resName | Residue name.
48-50 | Residue name | resName | Residue name.
52-54 | Residue name | resName | Residue name.
56-58 | Residue name | resName | Residue name.
60-62 | Residue name | resName | Residue name.
64-66 | Residue name | resName | Residue name.
68-70 | Residue name | resName | Residue name.
-
property residues
Dictionary of residues indexed by chain id.
- Returns
dictionary with chain IDs as keys and lists of residue names as values.
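The residues property is described as a dictionary keyed by chain ID. Below is a stdlib-only sketch that builds such a dictionary directly from SEQRES lines. residues_by_chain is hypothetical and is not old_pdb's implementation. It assumes the SEQRES records for each chain appear in serial-number order.

```python
from collections import defaultdict


def residues_by_chain(lines):
    """Build {chain_id: [residue names]} from SEQRES records.

    Hypothetical helper, not part of old_pdb.
    """
    residues = defaultdict(list)
    for line in lines:
        if line[0:6] != "SEQRES":
            continue
        chain_id = line[11]  # column 12
        # Up to 13 names at columns 20-22, 24-26, ..., 68-70.
        for start in range(19, 70, 4):
            name = line[start:start + 3].strip()
            if name:
                residues[chain_id].append(name)
    return dict(residues)


lines = [
    "SEQRES   1 A    3  VAL LEU SER",
    "SEQRES   1 B    2  MET HIS",
]
print(residues_by_chain(lines))  # {'A': ['VAL', 'LEU', 'SER'], 'B': ['MET', 'HIS']}
```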
heterogen module
Classes for PDB records that provide heterogen information.
-
class old_pdb.heterogen.Formula
FORMUL field
The FORMUL record presents the chemical formula and charge of a non-standard group.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “FORMUL” |
9-10 | Integer | compNum | Component number.
13-15 | LString(3) | hetID | Het identifier.
17-18 | Integer | continuation | Continuation number.
19 | Character | asterisk | “*” for water.
20-70 | String | text | Chemical formula.
-
property components
Formulae for components.
- Returns
dictionary with component numbers as keys and values that consist of tuples of the hetatom ID and the formula text.
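The sketch below builds a components-style dictionary from raw FORMUL lines. formula_components is a hypothetical helper, not part of old_pdb. It drops the water asterisk in column 19 and joins continuation lines for the same component with a single space; both choices are assumptions.

```python
def formula_components(lines):
    """Map component number -> (het_id, formula text) from FORMUL records.

    Hypothetical helper, not part of old_pdb.
    """
    components = {}
    for line in lines:
        if line[0:6] != "FORMUL":
            continue
        comp_num = int(line[8:10])     # columns 9-10
        het_id = line[12:15].strip()   # columns 13-15
        text = line[19:70].strip()     # columns 20-70 (column 19 is the asterisk)
        if comp_num in components:
            # Continuation line: append to the text collected so far.
            text = components[comp_num][1] + " " + text
        components[comp_num] = (het_id, text)
    return components


lines = [
    "FORMUL   2  HEM    2(C34 H32 FE N4 O4)",
    "FORMUL   3  HOH   *180(H2 O)",
]
print(formula_components(lines))
# {2: ('HEM', '2(C34 H32 FE N4 O4)'), 3: ('HOH', '180(H2 O)')}
```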
-
class old_pdb.heterogen.Heterogen
HET field
HET records are used to describe non-standard residues, such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied. Groups are considered HET if they are:
not one of the standard amino acids, and
not one of the nucleic acids (C, G, A, T, U, and I), and
not one of the modified versions of nucleic acids (+C, +G, +A, +T, +U, and +I), and
not an unknown amino acid or nucleic acid where UNK is used to indicate the unknown residue name.
Het records also describe heterogens for which the chemical identity is unknown, in which case the group is assigned the hetatm_id UNK.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “HET   ” |
8-10 | LString(3) | het_id | Identifier, right-justified.
13 | Character | chain_id | Chain identifier.
14-17 | Integer | seq_num | Sequence number.
18 | AChar | ins_code | Insertion code.
21-25 | Integer | num_het_atoms | Number of HETATM records for the group present in the entry.
31-70 | String | text | Text describing Het group.
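The four HET criteria in the HET description amount to a set-membership test on the residue name. The sketch below is hypothetical, not part of old_pdb, and takes "standard amino acids" to mean the 20 common ones. Unknown heterogens may also be labelled UNK, so a residue name alone cannot separate the two cases. This sketch follows the fourth criterion and treats UNK as non-HET.

```python
# Hypothetical helper, not part of old_pdb.
STANDARD_AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
NUCLEIC_ACIDS = {"C", "G", "A", "T", "U", "I"}
MODIFIED_NUCLEIC_ACIDS = {"+" + base for base in NUCLEIC_ACIDS}


def is_het_group(res_name):
    """Return True if res_name falls outside every non-HET category."""
    name = res_name.strip()
    return not (
        name in STANDARD_AMINO_ACIDS
        or name in NUCLEIC_ACIDS
        or name in MODIFIED_NUCLEIC_ACIDS
        or name == "UNK"  # unknown amino acid or nucleic acid
    )


print([is_het_group(n) for n in ("HEM", "ALA", "  A", " +G", "UNK")])
# [True, False, False, False, False]
```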
-
class old_pdb.heterogen.HeterogenName
HETNAM field
This record gives the chemical name of the compound with the given hetatm_id.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “HETNAM” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
12-14 | LString(3) | het_id | Het identifier, right-justified.
16-70 | String | text | Chemical name.
-
class old_pdb.heterogen.HeterogenSynonym
HETSYN field
This record provides synonyms, if any, for the compound in the corresponding (i.e., same hetatm_id) HETNAM record. This is to allow greater flexibility in searching for HET groups.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “HETSYN” |
9-10 | Continuation | continuation | Allows concatenation of multiple records.
12-14 | LString(3) | het_id | Het identifier, right-justified.
16-70 | SList | synonyms | List of synonyms.
secondary module
Classes for records with secondary structure and connectivity information.
-
class old_pdb.secondary.CisPeptide
CISPEP field
CISPEP records specify the prolines and other peptides found to be in the cis conformation. This record replaces the use of footnote records to list cis peptides.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “CISPEP” |
8-10 | Integer | ser_num | Record serial number.
12-14 | LString(3) | pep1 | Residue name.
16 | Character | chain_id1 | Chain identifier.
18-21 | Integer | seq_num1 | Residue sequence number.
22 | AChar | icode1 | Insertion code.
26-28 | LString(3) | pep2 | Residue name.
30 | Character | chain_id2 | Chain identifier.
32-35 | Integer | seq_num2 | Residue sequence number.
36 | AChar | icode2 | Insertion code.
44-46 | Integer | mod_num | Identifies the specific model.
54-59 | Real(6.2) | measure | Angle measurement in degrees.
-
class old_pdb.secondary.DisulfideBond
SSBOND field
The SSBOND record identifies each disulfide bond in protein and polypeptide structures by identifying the two residues involved in the bond.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SSBOND” |
8-10 | Integer | ser_num | Serial number.
12-14 | LString(3) | “CYS” | Residue name.
16 | Character | chain_id1 | Chain identifier.
18-21 | Integer | seq_num1 | Residue sequence number.
22 | AChar | icode1 | Insertion code.
26-28 | LString(3) | “CYS” | Residue name.
30 | Character | chain_id2 | Chain identifier.
32-35 | Integer | seq_num2 | Residue sequence number.
36 | AChar | icode2 | Insertion code.
60-65 | SymOP | sym1 | Symmetry operator for residue 1.
67-72 | SymOP | sym2 | Symmetry operator for residue 2.
74-78 | Real(5.2) | length | Disulfide bond distance.
-
class old_pdb.secondary.Helix
HELIX field
HELIX records are used to identify the position of helices in the molecule. Helices are both named and numbered. The residues where the helix begins and ends are noted, as well as the total length.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “HELIX ” |
8-10 | Integer | serNum | Serial number of the helix. Starts at 1 and increases incrementally.
12-14 | LString(3) | helix_id | Helix identifier. In addition to a serial number, each helix is given an alphanumeric helix identifier.
16-18 | Residue name | init_res_name | Name of the initial residue.
20 | Character | init_chain_id | Chain identifier for the chain containing this helix.
22-25 | Integer | init_seq_num | Sequence number of the initial residue.
26 | AChar | init_i_code | Insertion code of the initial residue.
28-30 | Residue name | end_res_name | Name of the terminal residue of the helix.
32 | Character | end_chain_id | Chain identifier for the chain containing this helix.
34-37 | Integer | end_seq_num | Sequence number of the terminal residue.
38 | AChar | end_i_code | Insertion code of the terminal residue.
39-40 | Integer | helix_class | Helix class (see the helix class codes in the PDB specification).
41-70 | String | comment | Comment about this helix.
72-76 | Integer | length | Length of this helix.
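The helix_class field holds one of the class codes defined in the HELIX section of the PDB format specification. The sketch below tabulates those ten codes and decodes columns 39-40. helix_class_name is a hypothetical helper, not part of old_pdb.

```python
# Helix class codes for columns 39-40, per the HELIX section of the
# PDB format specification.
HELIX_CLASSES = {
    1: "Right-handed alpha (default)",
    2: "Right-handed omega",
    3: "Right-handed pi",
    4: "Right-handed gamma",
    5: "Right-handed 3-10",
    6: "Left-handed alpha",
    7: "Left-handed omega",
    8: "Left-handed gamma",
    9: "2-7 ribbon/helix",
    10: "Polyproline",
}


def helix_class_name(line):
    """Decode helix_class (columns 39-40) of a HELIX record.

    Hypothetical helper, not part of old_pdb.
    """
    field = line[38:40].strip()
    if not field.isdigit():
        return "unspecified"
    return HELIX_CLASSES.get(int(field), "unknown")


line = "HELIX    1  HA GLY A   86  GLY A   94  1"
print(helix_class_name(line))  # Right-handed alpha (default)
```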
-
class old_pdb.secondary.Link
LINK field
The LINK records specify connectivity between residues that is not implied by the primary structure. Connectivity is expressed in terms of the atom names. This record supplements information given in CONECT records and is provided here for convenience in searching.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “LINK  ” |
13-16 | Atom | name1 | Atom name.
17 | Character | alt_loc1 | Alternate location indicator.
18-20 | Residue name | res_name1 | Residue name.
22 | Character | chain_id | Chain identifier.
23-26 | Integer | res_seq1 | Residue sequence number.
27 | AChar | ins_code1 | Insertion code.
43-46 | Atom | name2 | Atom name.
47 | Character | alt_loc2 | Alternate location indicator.
48-50 | Residue name | res_name2 | Residue name.
52 | Character | chain_id | Chain identifier.
53-56 | Integer | res_seq2 | Residue sequence number.
57 | AChar | ins_code2 | Insertion code.
60-65 | SymOP | sym1 | Symmetry operator for atom 1.
67-72 | SymOP | sym2 | Symmetry operator for atom 2.
74-78 | Real(5.2) | Length | Link distance.
-
class old_pdb.secondary.Sheet
SHEET field
SHEET records are used to identify the position of sheets in the molecule. Sheets are both named and numbered. The residues where the sheet begins and ends are noted.
COLUMNS | DATA TYPE | FIELD | DEFINITION
1-6 | Record name | “SHEET ” |
8-10 | Integer | strand | Strand number which starts at 1 for each strand within a sheet and increases by one.
12-14 | LString(3) | sheet_id | Sheet identifier.
15-16 | Integer | num_strands | Number of strands in sheet.
18-20 | Residue name | init_res_name | Name of initial residue.
22 | Character | init_chain_id | Chain identifier of initial residue in strand.
23-26 | Integer | init_seq_num | Sequence number of initial residue in strand.
27 | AChar | init_ins_code | Insertion code of initial residue in strand.
29-31 | Residue name | end_res_name | Name of terminal residue.
33 | Character | end_chain_id | Chain identifier of terminal residue.
34-37 | Integer | end_seq_num | Sequence number of terminal residue.
38 | AChar | end_ins_code | Insertion code of terminal residue.
39-40 | Integer | sense | Sense of strand with respect to previous strand in the sheet. 0 if first strand, 1 if parallel, and -1 if anti-parallel.
42-45 | Atom | cur_atom | Registration. Atom name in current strand.
46-48 | Residue name | cur_res_name | Registration. Residue name in current strand.
50 | Character | cur_chain_id | Registration. Chain identifier in current strand.
51-54 | Integer | cur_res_seq | Registration. Residue sequence number in current strand.
55 | AChar | cur_ins_code | Registration. Insertion code in current strand.
57-60 | Atom | prev_atom | Registration. Atom name in previous strand.
61-63 | Residue name | prev_res_name | Registration. Residue name in previous strand.
65 | Character | prev_chain_id | Registration. Chain identifier in previous strand.
66-69 | Integer | prev_res_seq | Registration. Residue sequence number in previous strand.
70 | AChar | prev_ins_code | Registration. Insertion code in previous strand.
crystallography module
Classes for records with crystallographic information.
-
class old_pdb.crystallography.FractionalTransform(n)
SCALEn base class
The SCALEn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates as contained in the entry to fractional crystallographic coordinates. Non-standard coordinate systems should be explained in the remarks.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “SCALEn”       n = 1, 2, or 3
11-20    Real(10.6)    sn1            Sn1
21-30    Real(10.6)    sn2            Sn2
31-40    Real(10.6)    sn3            Sn3
46-55    Real(10.5)    unif           Un
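Taken together, the three SCALEn records form a 3x3 matrix S and a translation vector U, and each fractional coordinate is frac_n = Sn1·x + Sn2·y + Sn3·z + Un. A minimal sketch, assuming the raw SCALE1, SCALE2 and SCALE3 lines are supplied in that order; `parse_scale` and `to_fractional` are hypothetical helpers, not the `old_pdb` API.

```python
from typing import List, Sequence, Tuple


def parse_scale(line: str) -> Tuple[List[float], float]:
    """Split one SCALEn line into its matrix row (Sn1, Sn2, Sn3) and translation Un."""
    row = [float(line[10:20]), float(line[20:30]), float(line[30:40])]
    return row, float(line[45:55])


def to_fractional(xyz: Sequence[float], scale_lines: Sequence[str]) -> List[float]:
    """Map an orthogonal coordinate (Angstroms) to fractional coordinates.

    Each SCALEn line contributes one output component:
    frac_n = Sn1 * x + Sn2 * y + Sn3 * z + Un
    """
    fractional = []
    for line in scale_lines:  # expected order: SCALE1, SCALE2, SCALE3
        row, u = parse_scale(line)
        fractional.append(sum(s * c for s, c in zip(row, xyz)) + u)
    return fractional
```

For a cubic cell with a = 10 Angstroms, S is 0.1 times the identity and U is zero, so the point (5, 2.5, 10) maps to (0.5, 0.25, 1.0).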
class old_pdb.crystallography.NoncrystalTransform(n)
MTRIXn base class.
The MTRIXn (n = 1, 2, or 3) records present transformations expressing non-crystallographic symmetry.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “MTRIXn”       n = 1, 2, or 3
8-10     Integer       serial         Serial number.
11-20    Real(10.6)    mn1            Mn1
21-30    Real(10.6)    mn2            Mn2
31-40    Real(10.6)    mn3            Mn3
46-55    Real(10.5)    vn             Vn
60       Integer       i_given        1 if coordinates for the representations which are approximately related by the transformations of the molecule are contained in the entry. Otherwise, blank.
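An MTRIXn operator is applied the same way as any affine transform, x'_n = Mn1·x + Mn2·y + Mn3·z + Vn, and the i_given flag tells a reader whether the related copies still need to be generated. A minimal sketch under those assumptions; `parse_mtrix` and `ncs_copy` are hypothetical helpers, not part of `old_pdb`.

```python
from typing import List, Sequence


def parse_mtrix(line: str) -> dict:
    """Read one MTRIXn line by column position (hypothetical helper)."""
    return {
        "serial": int(line[7:10]),
        "row": [float(line[10:20]), float(line[20:30]), float(line[30:40])],
        "v": float(line[45:55]),
        # Column 60 holds "1" when the related copies are already in the entry
        "i_given": line[59:60] == "1",
    }


def ncs_copy(xyz: Sequence[float], mtrix_lines: Sequence[str]) -> List[float]:
    """Apply one MTRIX1..MTRIX3 operator: x'_n = Mn1*x + Mn2*y + Mn3*z + Vn.

    Returns an empty list when i_given is set, since those coordinates
    are already present in the entry and need not be generated.
    """
    records = [parse_mtrix(line) for line in mtrix_lines]
    if any(rec["i_given"] for rec in records):
        return []
    return [sum(m * c for m, c in zip(rec["row"], xyz)) + rec["v"] for rec in records]
```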
class old_pdb.crystallography.OriginalTransform(n)
ORIGXn class.
The ORIGXn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates contained in the entry to the submitted coordinates.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “ORIGXn”       n = 1, 2, or 3
11-20    Real(10.6)    on1            On1
21-30    Real(10.6)    on2            On2
31-40    Real(10.6)    on3            On3
46-55    Real(10.5)    tn             Tn
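The three ORIGXn records can be collected into a 3x4 matrix [On1 On2 On3 | Tn]. If that matrix is the identity rotation with zero translation, the entry coordinates and the submitted coordinates coincide. The sketch below checks exactly that. `parse_origx` and `is_identity_origx` are hypothetical helpers, and the tolerance value is an arbitrary assumption.

```python
from typing import List, Sequence


def parse_origx(lines: Sequence[str]) -> List[List[float]]:
    """Build the 3x4 matrix [On1 On2 On3 | Tn] from ORIGX1..ORIGX3 lines."""
    matrix = []
    for line in lines:
        matrix.append([
            float(line[10:20]),  # On1, columns 11-20
            float(line[20:30]),  # On2, columns 21-30
            float(line[30:40]),  # On3, columns 31-40
            float(line[45:55]),  # Tn, columns 46-55
        ])
    return matrix


def is_identity_origx(matrix: List[List[float]], tol: float = 1e-6) -> bool:
    """True if ORIGXn maps entry coordinates onto themselves (unit rotation, zero shift)."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            expected = 1.0 if i == j else 0.0  # j == 3 is the translation Tn
            if abs(value - expected) > tol:
                return False
    return True
```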
class old_pdb.crystallography.UnitCell
CRYST1 class.
The CRYST1 record presents the unit cell parameters, space group, and Z value. If the structure was not determined by crystallographic means, CRYST1 simply defines a unit cube.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “CRYST1”
7-15     Real(9.3)     a              a (Angstroms).
16-24    Real(9.3)     b              b (Angstroms).
25-33    Real(9.3)     c              c (Angstroms).
34-40    Real(7.2)     alpha          alpha (degrees).
41-47    Real(7.2)     beta           beta (degrees).
48-54    Real(7.2)     gamma          gamma (degrees).
56-66    LString       sGroup         Space group.
67-70    Integer       z              Z value.
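A common use of the CRYST1 values is the unit-cell volume, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2·cosα·cosβ·cosγ). A minimal sketch, assuming a raw CRYST1 line as input; `parse_cryst1` and `cell_volume` are hypothetical helpers, not methods of `UnitCell`.

```python
import math


def parse_cryst1(line: str) -> dict:
    """Read CRYST1 fields by column position (hypothetical helper)."""
    return {
        "a": float(line[6:15]),
        "b": float(line[15:24]),
        "c": float(line[24:33]),
        "alpha": float(line[33:40]),
        "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "s_group": line[55:66].strip(),
        "z": int(line[66:70]),
    }


def cell_volume(a: float, b: float, c: float,
                alpha: float, beta: float, gamma: float) -> float:
    """Unit-cell volume in cubic Angstroms from lengths (Angstroms) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)
```

For an orthogonal cell the cosine terms vanish and the formula reduces to abc. For a hexagonal cell (γ = 120°) the square-root factor becomes sqrt(0.75).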
coordinates module
Classes for records with coordinate information.
class old_pdb.coordinates.Atom
ATOM class.
The ATOM records present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “ATOM  ”
7-11     Integer       serial         Atom serial number.
13-16    Atom          name           Atom name.
17       Character     alt_loc        Alternate location indicator.
18-20    Residue name  res_name       Residue name.
22       Character     chain_id       Chain identifier.
23-26    Integer       res_seq        Residue sequence number.
27       AChar         ins_code       Code for insertion of residues.
31-38    Real(8.3)     x              Orthogonal coordinates for X in Angstroms.
39-46    Real(8.3)     y              Orthogonal coordinates for Y in Angstroms.
47-54    Real(8.3)     z              Orthogonal coordinates for Z in Angstroms.
55-60    Real(6.2)     occupancy      Occupancy.
61-66    Real(6.2)     temp_factor    Temperature factor.
77-78    LString(2)    element        Element symbol, right-justified.
79-80    LString(2)    charge         Charge on the atom.
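The column table above maps directly onto Python slices. A minimal sketch, assuming one raw ATOM or HETATM line as input. `AtomRecord` and `parse_atom_line` are hypothetical names chosen for illustration, not the `old_pdb.coordinates.Atom` implementation. Short lines are padded to 80 columns so that the optional element and charge fields slice safely.

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Hypothetical container mirroring the ATOM/HETATM fields in the table."""
    record: str
    serial: int
    name: str
    alt_loc: str
    res_name: str
    chain_id: str
    res_seq: int
    ins_code: str
    x: float
    y: float
    z: float
    occupancy: float
    temp_factor: float
    element: str
    charge: str


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM or HETATM line; PDB columns are 1-based, Python slices 0-based."""
    line = line.rstrip("\n").ljust(80)  # pad so the optional trailing columns slice safely
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        alt_loc=line[16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21].strip(),
        res_seq=int(line[22:26]),
        ins_code=line[26].strip(),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        temp_factor=float(line[60:66]),
        element=line[76:78].strip(),
        charge=line[78:80].strip(),
    )
```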
class old_pdb.coordinates.ChainTerminus
TER class.
The TER record indicates the end of a list of ATOM/HETATM records for a chain.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “TER   ”
7-11     Integer       serial         Serial number.
18-20    Residue name  res_name       Residue name.
22       Character     chain_id       Chain identifier.
23-26    Integer       res_seq        Residue sequence number.
27       AChar         ins_code       Insertion code.
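Because TER closes each chain's run of ATOM/HETATM records, a reader can use it as a segment delimiter. A minimal sketch, assuming the lines of a single model. `split_at_ter` is a hypothetical helper, and any atoms after the last TER (typically non-polymer HETATM groups such as ligands and water) are returned as a final segment.

```python
from typing import List, Sequence


def split_at_ter(lines: Sequence[str]) -> List[List[str]]:
    """Group ATOM/HETATM lines into segments that end at each TER record."""
    segments: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        record = line[0:6].strip()
        if record in ("ATOM", "HETATM"):
            current.append(line)
        elif record == "TER":
            if current:  # ignore TER records with no preceding atoms
                segments.append(current)
            current = []
    if current:  # atoms after the last TER form a trailing segment
        segments.append(current)
    return segments
```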
class old_pdb.coordinates.HeterogenAtom
HETATM class.
The HETATM records present the atomic coordinate records for atoms within “non-standard” groups. These records are used for water molecules and atoms presented in HET groups.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “HETATM”
7-11     Integer       serial         Atom serial number.
13-16    Atom          name           Atom name.
17       Character     alt_loc        Alternate location indicator.
18-20    Residue name  res_name       Residue name.
22       Character     chain_id       Chain identifier.
23-26    Integer       res_seq        Residue sequence number.
27       AChar         ins_code       Code for insertion of residues.
31-38    Real(8.3)     x              Orthogonal coordinates for X in Angstroms.
39-46    Real(8.3)     y              Orthogonal coordinates for Y in Angstroms.
47-54    Real(8.3)     z              Orthogonal coordinates for Z in Angstroms.
55-60    Real(6.2)     occupancy      Occupancy.
61-66    Real(6.2)     temp_factor    Temperature factor.
77-78    LString(2)    element        Element symbol, right-justified.
79-80    LString(2)    charge         Charge on the atom.
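HETATM records share the ATOM column layout, so a common task, listing the non-water groups in an entry, needs only the residue-identifying columns. A minimal sketch, assuming waters carry the standard residue name HOH; `ligand_residues` and `WATER_NAMES` are hypothetical names, not part of `old_pdb`.

```python
from typing import List, Sequence, Tuple

# Assumption: water molecules carry the standard residue name HOH
WATER_NAMES = {"HOH"}


def ligand_residues(lines: Sequence[str]) -> List[Tuple[str, int, str, str]]:
    """List distinct non-water HETATM groups as (chain_id, res_seq, ins_code, res_name)."""
    groups: List[Tuple[str, int, str, str]] = []
    for line in lines:
        if not line.startswith("HETATM"):
            continue
        padded = line.ljust(80)  # guard against truncated lines
        res_name = padded[17:20].strip()
        if res_name in WATER_NAMES:
            continue
        key = (padded[21].strip(), int(padded[22:26]), padded[26].strip(), res_name)
        if key not in groups:  # keep first-seen order, drop per-atom repeats
            groups.append(key)
    return groups
```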
class old_pdb.coordinates.Model
MODEL class.
The MODEL record specifies the model serial number when multiple structures are presented in a single coordinate entry, as is often the case with structures determined by NMR.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “MODEL ”
11-14    Integer       serial         Model serial number.
num_atoms(heavy_only) → int
Number of ATOM and HETATM records in all chains of the model.
Parameters
heavy_only (bool) – exclude hydrogen atoms from the count
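The counting behind a per-model atom total can be sketched from the raw lines. MODEL records switch the current model, and the element columns (77-78) identify hydrogens. This is a sketch under stated assumptions, not how `Model.num_atoms` is implemented: `atoms_per_model` is hypothetical, lines before any MODEL record are counted as model 1, and deuterium ("D") is treated as hydrogen.

```python
from collections import Counter
from typing import Dict, Sequence

# Assumption: deuterium ("D") counts as hydrogen for a heavy-atom count
HYDROGEN_SYMBOLS = {"H", "D"}


def atoms_per_model(lines: Sequence[str], heavy_only: bool = False) -> Dict[int, int]:
    """Count ATOM/HETATM records per MODEL serial number."""
    counts = Counter()
    model = 1  # single-model entries have no MODEL record (assumed model 1)
    for line in lines:
        record = line[0:6].strip()
        if record == "MODEL":
            model = int(line[10:14])  # model serial number, columns 11-14
        elif record in ("ATOM", "HETATM"):
            element = line[76:78].strip().upper()
            if heavy_only and element in HYDROGEN_SYMBOLS:
                continue
            counts[model] += 1
    return dict(counts)
```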
class old_pdb.coordinates.TemperatureFactor
ANISOU class.
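The ANISOU records present the anisotropic temperature factors. Each U(i,j) value in columns 29-70 is scaled by a factor of 10^4 (in square Angstroms) and stored as an integer.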
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “ANISOU”
7-11     Integer       serial         Atom serial number.
13-16    Atom          name           Atom name.
17       Character     alt_loc        Alternate location indicator.
18-20    Residue name  res_name       Residue name.
22       Character     chain_id       Chain identifier.
23-26    Integer       res_seq        Residue sequence number.
27       AChar         ins_code       Insertion code.
29-35    Integer       u00            U(1,1)
36-42    Integer       u11            U(2,2)
43-49    Integer       u22            U(3,3)
50-56    Integer       u01            U(1,2)
57-63    Integer       u02            U(1,3)
64-70    Integer       u12            U(2,3)
77-78    LString(2)    element        Element symbol, right-justified.
79-80    LString(2)    charge         Charge on the atom.
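Because the six U(i,j) integers are stored scaled by 10^4, a reader has to divide them back down before using them. The sketch below rebuilds the symmetric tensor in square Angstroms and derives the equivalent isotropic B factor, B = 8π²·U_eq with U_eq = (U11 + U22 + U33)/3. It assumes the tensor is expressed in the entry's orthogonal frame. `parse_anisou` and `equivalent_b` are hypothetical helpers, not part of `old_pdb`.

```python
import math
from typing import List


def parse_anisou(line: str) -> List[List[float]]:
    """Return the symmetric U tensor (Angstroms**2) from one ANISOU line.

    Columns 29-70 hold six 7-character integers in the order
    U(1,1), U(2,2), U(3,3), U(1,2), U(1,3), U(2,3), each scaled by 10**4.
    """
    raw = [int(line[start:start + 7]) for start in range(28, 70, 7)]
    u11, u22, u33, u12, u13, u23 = (value / 1e4 for value in raw)
    return [[u11, u12, u13],
            [u12, u22, u23],
            [u13, u23, u33]]


def equivalent_b(u: List[List[float]]) -> float:
    """Equivalent isotropic B (Angstroms**2): 8 * pi**2 * trace(U) / 3."""
    u_eq = (u[0][0] + u[1][1] + u[2][2]) / 3.0
    return 8.0 * math.pi ** 2 * u_eq
```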
bookkeeping module
Classes for records with connectivity and bookkeeping information.
class old_pdb.bookkeeping.Connection
CONECT class.
The CONECT records specify connectivity between atoms for which coordinates are supplied. The connectivity is described using the atom serial number as found in the entry. CONECT records are mandatory for HET groups (excluding water) and for other bonds that involve atoms in standard residues but are not specified in the standard residue connectivity table (see Appendix 4 of the Contents Guide for the list of standard residues). These records are generated by the PDB.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “CONECT”
7-11     Integer       serial         Atom serial number.
12-16    Integer       serial         Serial number of bonded atom.
17-21    Integer       serial         Serial number of bonded atom.
22-26    Integer       serial         Serial number of bonded atom.
27-31    Integer       serial         Serial number of bonded atom.
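Each CONECT line names one atom and up to four bonded partners, with unused slots left blank. A minimal sketch that turns a set of CONECT lines into an undirected adjacency map keyed by atom serial number; `bond_graph` is a hypothetical helper, not part of `old_pdb`.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set


def bond_graph(lines: Sequence[str]) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from CONECT records (hypothetical helper)."""
    graph: Dict[int, Set[int]] = defaultdict(set)
    for line in lines:
        if not line.startswith("CONECT"):
            continue
        atom = int(line[6:11])  # columns 7-11
        # Bonded-atom serials sit in columns 12-16, 17-21, 22-26 and 27-31
        for start in (11, 16, 21, 26):
            field = line[start:start + 5].strip()
            if not field:  # unused slots are blank
                continue
            partner = int(field)
            graph[atom].add(partner)
            graph[partner].add(atom)  # store each bond in both directions
    return dict(graph)
```

Storing bonds in both directions means a bond listed on only one of its two atoms' CONECT lines is still found from either end.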
class old_pdb.bookkeeping.Master
MASTER class.
The MASTER record is a control record for bookkeeping. It lists the number of lines in the coordinate entry or file for selected record types.
COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “MASTER”
11-15    Integer       num_remark     Number of REMARK records.
16-20    Integer       “0”
21-25    Integer       num_het        Number of HET records.
26-30    Integer       num_helix      Number of HELIX records.
31-35    Integer       num_sheet      Number of SHEET records.
36-40    Integer       num_turn       Deprecated.
41-45    Integer       num_site       Number of SITE records.
46-50    Integer       num_xform      Number of coordinate transformation records (ORIGX + SCALE + MTRIX).
51-55    Integer       num_coord      Number of atomic coordinate records (ATOM + HETATM).
56-60    Integer       num_ter        Number of TER records.
61-65    Integer       num_conect     Number of CONECT records.
66-70    Integer       num_seq        Number of SEQRES records.
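The MASTER fields can be cross-checked by recounting record names in the entry. The sketch below is a plain per-line count keyed on columns 1-6. It is an assumption about how such a check could work, not the logic of `Entry.check_master`, and edge cases such as multi-model entries should be checked against the Contents Guide. `master_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Sequence


def master_counts(lines: Sequence[str]) -> Dict[str, int]:
    """Recount the record types summarized by MASTER (hypothetical helper).

    The literal "0" field and the deprecated num_turn field are not computed.
    """
    names = Counter(line[0:6].strip() for line in lines)
    transforms = sum(
        names[f"{prefix}{n}"] for prefix in ("ORIGX", "SCALE", "MTRIX") for n in (1, 2, 3)
    )
    return {
        "num_remark": names["REMARK"],
        "num_het": names["HET"],
        "num_helix": names["HELIX"],
        "num_sheet": names["SHEET"],
        "num_site": names["SITE"],
        "num_xform": transforms,
        "num_coord": names["ATOM"] + names["HETATM"],
        "num_ter": names["TER"],
        "num_conect": names["CONECT"],
        "num_seq": names["SEQRES"],
    }
```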