PDB data structures and I/O¶

PDB entry top-level class¶

`pdb_entry` module¶

Top-level module for PDB structure entries.

The specifications used in this class are derived from the Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description, Version 3.3.

class old_pdb.pdb_entry.Entry[source]¶

Top-level class for PDB structure entry.

annotate_link(record) → old_pdb.secondary.Link [source]¶

Annotate LINK to indicate whether the named atoms are elements.

Creates two new Boolean attributes in record: is_element1 and is_element2.

Parameters: record (secondary.Link) – record to annotate
Returns: annotated record

property author¶: annotation.Author AUTHOR record.

property caveat¶: annotation.Caveat CAVEAT record.

check_master()[source]¶

Check the contents against internal bookkeeping records.

Raises: AssertionError – if checks fail

property cis_peptide¶: List of secondary.CisPeptide CISPEP records.

property compound¶: annotation.Compound COMPND record.

property connect¶: List of bookkeeping.Connection CONECT records.

property database_reference¶: List of primary.DatabaseReference DBREF records.

property disulfide_bond¶: List of secondary.DisulfideBond SSBOND records.

property experimental_data¶: annotation.ExperimentalData EXPDTA record.

find_atom_by_name(chain_id, residue_id, atom_name, model_num=1) → old_pdb.coordinates.Atom [source]¶

Find a specific atom by name.

Parameters

chain_id (str) – chain ID to find
residue_id (int) – residue ID to find
atom_name (str) – name of atom to find
model_num (int) – model number to use

Returns

ATOM or HETATM object

find_residue(chain_id, residue_id, model_num=1) → list[source]¶

Find a specific residue.

Parameters

chain_id (str) – chain ID to find
residue_id (int) – residue ID to find
model_num (int) – model number to use

Returns

list of coordinates.Atom-like objects

property frac_transform¶: List of crystallography.FractionalTransform SCALEn records.

property header¶: annotation.Header HEADER record.

property helix¶: List of secondary.Helix HELIX records.

property heterogen¶: List of heterogen.Heterogen HET records.

property heterogen_formula¶: heterogen.Formula FORMUL record.

property heterogen_name¶: List of heterogen.HeterogenName HETNAM records.

property heterogen_synonym¶: heterogen.HeterogenSynonym HETSYN record.

property journal¶: annotation.Journal JRNL record.

property keyword¶: annotation.Keywords KEYWDS record.

property link¶: List of secondary.Link LINK records.

property master¶: bookkeeping.Master MASTER record.

property model¶: List of coordinates.Model MODEL records.

property model_type¶: annotation.ModelType MDLTYP record.

property modified_residue¶: List of primary.ModifiedResidue MODRES records.

property noncrystal_transform¶: List of crystallography.NoncrystalTransform MTRIXn records.

num_atoms(heavy_only=True) → int[source]¶

Number of ATOM and HETATM entries in all chains in entry.

Parameters: heavy_only (bool) – exclude hydrogen atoms from count

num_chains() → int[source]¶: Number of chains in entry.

property num_model¶: annotation.NumModels NUMMDL record.

num_residues(count_hetatm=False) → int[source]¶

Number of residues in entry.

Parameters: count_hetatm (bool) – include heterogen residues in count

num_ter() → int[source]¶: Number of TER records in entry.

num_transforms() → int[source]¶

Return the number of optional transform records in entry.

Returns: number of ORIGXn + SCALEn + MTRIXn
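The count described above can be sketched on raw lines without the `Entry` class. This is a minimal standalone illustration, not the `num_transforms` implementation; the names `TRANSFORM_PREFIXES` and `count_transform_lines` are hypothetical.

```python
# Record-name prefixes of the optional transform records (ORIGXn, SCALEn, MTRIXn).
TRANSFORM_PREFIXES = ("ORIGX", "SCALE", "MTRIX")


def count_transform_lines(lines):
    """Count lines whose record name is ORIGXn, SCALEn, or MTRIXn (n = 1, 2, 3)."""
    return sum(
        1
        for line in lines
        if line[:5] in TRANSFORM_PREFIXES and line[5:6] in ("1", "2", "3")
    )


sample_lines = [
    "ORIGX1      1.000000  0.000000  0.000000        0.00000",
    "ORIGX2      0.000000  1.000000  0.000000        0.00000",
    "ORIGX3      0.000000  0.000000  1.000000        0.00000",
    "SCALE1      0.024414  0.000000  0.000000        0.00000",
    "SCALE2      0.000000  0.053619  0.000000        0.00000",
    "SCALE3      0.000000  0.000000  0.044409        0.00000",
    "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N",
]
transform_count = count_transform_lines(sample_lines)
```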

property obsolete¶: annotation.Obsolete OBSLTE record.

property original_transform¶: List of crystallography.OriginalTransform ORIGXn records.

parse_file(file_)[source]¶

Parse a PDB file.

Parameters: file_ (file) – file open for reading

parse_line(line)[source]¶

Parse a line of a PDB file.

Parameters: line (str) – line of PDB file
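Every PDB line carries its record name in columns 1-6, so a line-by-line parser can route each line on that field. The sketch below only groups lines by record name; it is an assumption about how such routing might look, not the `Entry.parse_line` implementation, and `record_name` and `group_lines_by_record` are hypothetical helpers.

```python
from collections import defaultdict


def record_name(line):
    """Return the record name stored in columns 1-6, without padding."""
    return line[:6].strip()


def group_lines_by_record(lines):
    """Group raw PDB lines by record name, preserving file order."""
    groups = defaultdict(list)
    for line in lines:
        name = record_name(line)
        if name:  # skip blank lines
            groups[name].append(line.rstrip("\n"))
    return dict(groups)


sample = [
    "HEADER    PLANT PROTEIN\n",
    "TITLE     EXAMPLE TITLE\n",
    "REMARK   2\n",
    "REMARK   2 RESOLUTION.    1.50 ANGSTROMS.\n",
    "END\n",
]
groups = group_lines_by_record(sample)
```

A real parser would hand each group (or each line) to the class for that record type instead of storing the raw text.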

property remark¶: List of annotation.Remark REMARK records.

property revision_data¶: annotation.RevisionData REVDAT record.

property sequence_difference¶: List of primary.SequenceDifferences SEQADV records.

property sequence_residue¶: List of primary.SequenceResidues SEQRES records.

property setter¶: annotation.Author AUTHOR record.

property sheet¶: List of secondary.Sheet SHEET records.

property source¶: annotation.Source SOURCE record.

property split¶: annotation.Split SPLIT record.

property supersedes¶: annotation.Supersedes SPRSDE record.

property title¶: annotation.Title TITLE record.

property unit_cell¶: crystallography.UnitCell CRYST1 record.

PDB records¶

`annotation` module¶

Classes for PDB records that provide annotation information.

class old_pdb.annotation.Author[source]¶

AUTHOR field

The AUTHOR record contains the names of the people responsible for the contents of the entry.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“AUTHOR”
9-10	Continuation	continuation	Allows concatenation of multiple records.
11-79	List	author_list	List of the author names, separated by commas.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
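The column table above maps directly onto string slices, and continuation lines simply extend the same list. The following is a minimal standalone sketch, not the `Author.parse_line` implementation; `field` and `parse_author_lines` are hypothetical names, and the sketch assumes that no author name is split across two lines.

```python
def field(line, start, end):
    """Return 1-based inclusive columns start..end, padding short lines."""
    return line.ljust(80)[start - 1:end]


def parse_author_lines(lines):
    """Collect author names from one or more AUTHOR lines."""
    authors = []
    for line in lines:
        if field(line, 1, 6) != "AUTHOR":
            continue
        # Columns 11-79 hold the comma-separated author list.
        for name in field(line, 11, 79).split(","):
            name = name.strip()
            if name:
                authors.append(name)
    return authors


author_lines = [
    "AUTHOR".ljust(10) + "M.B.BERRY,B.MEADOR,",
    "AUTHOR".ljust(8) + " 2" + "G.N.PHILLIPS JR.",  # continuation in columns 9-10
]
authors = parse_author_lines(author_lines)
```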

class old_pdb.annotation.Caveat[source]¶

CAVEAT field

CAVEAT warns of severe errors in an entry. Use caution when using an entry containing this record.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“CAVEAT”
9-10	Continuation	continuation	Allows concatenation of multiple records.
12-15	IDcode	id_code	PDB ID code of this entry.
20-79	String	comment	Free text giving the reason for the CAVEAT.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Compound[source]¶

COMPND field

The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.

For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“COMPND”
8-10	Continuation	continuation	Allows concatenation of multiple records.
11-80	Specification list	compound	Description of the molecular components.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.ExperimentalData[source]¶

EXPDTA field

The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:

ELECTRON DIFFRACTION
FIBER DIFFRACTION
FLUORESCENCE TRANSFER
NEUTRON DIFFRACTION
NMR
THEORETICAL MODEL
X-RAY DIFFRACTION

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“EXPDTA”
9-10	Continuation	continuation	Allows concatenation of multiple records.
11-79	SList	technique	The experimental technique(s) with optional comment describing the sample or experiment.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Header[source]¶

HEADER field

The HEADER record uniquely identifies a PDB entry through the id_code field. This record also provides a classification for the entry. Finally, it contains the date the coordinates were deposited at the PDB.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HEADER”
11-50	String(40)	classification	Classifies the molecule(s).
51-59	Date	dep_date	Deposition date. This is the date the coordinates were received at the PDB.
63-66	IDcode	id_code	This identifier is unique within the PDB.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
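As an illustration of the HEADER layout, the sketch below slices the three fields and converts the deposition date. It is a standalone example rather than the `Header.parse_line` implementation; `parse_header` is a hypothetical name, and the sample line is built in code so that the columns are exact.

```python
from datetime import datetime


def parse_header(line):
    """Extract classification, deposition date, and ID code from a HEADER line."""
    line = line.ljust(80)
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        # Columns 51-59, DD-MMM-YY; Python maps two-digit years 69-99 to 19xx.
        "dep_date": datetime.strptime(line[50:59], "%d-%b-%y").date(),
        "id_code": line[62:66],  # columns 63-66
    }


header_line = (
    "HEADER".ljust(10)            # columns 1-10
    + "PLANT PROTEIN".ljust(40)   # columns 11-50
    + "30-APR-81"                 # columns 51-59
    + "   "                       # columns 60-62
    + "1CRN"                      # columns 63-66
)
header = parse_header(header_line)
```

The two-digit year is inherently ambiguous; the pivot used here is Python's `strptime` default and may not suit every entry.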

class old_pdb.annotation.Journal[source]¶

JRNL field

The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“JRNL”
13-79	LString	text	See details in PDB specification.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Keywords[source]¶

KEYWDS field

The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“KEYWDS”
9-10	Continuation	continuation	Allows concatenation of records if necessary.
11-79	List	keywords	Comma-separated list of keywords relevant to the entry.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.ModelType[source]¶

MDLTYP field.

The MDLTYP record contains additional annotation pertinent to the coordinates presented in the entry.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“MDLTYP”
9-10	Continuation	continuation	Allows concatenation of multiple records.
11-80	SList	comment	Free text providing additional structural annotation.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.NumModels[source]¶

NUMMDL field

The NUMMDL record indicates the total number of models in a PDB entry.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“NUMMDL”
11-14	Integer	model_number	Number of models.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Obsolete[source]¶

OBSLTE field

This record acts as a flag in an entry which has been withdrawn from the PDB’s full release. It indicates which, if any, new entries have replaced the withdrawn entry.

The format allows for the case of multiple new entries replacing one existing entry.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“OBSLTE”
9-10	Continuation	continuation	Allows concatenation of multiple records.
12-20	Date	replace_date	Date that this entry was replaced.
22-25	IDcode	id_code	ID code of this entry.
32-35	IDcode	replace_id_codes[0]	ID of entry replacing this one.
37-40	IDcode	replace_id_codes[1]	ID of entry replacing this one.
42-45	IDcode	replace_id_codes[2]	ID of entry replacing this one.
47-50	IDcode	replace_id_codes[3]	ID of entry replacing this one.
52-55	IDcode	replace_id_codes[4]	ID of entry replacing this one.
57-60	IDcode	replace_id_codes[5]	ID of entry replacing this one.
62-65	IDcode	replace_id_codes[6]	ID of entry replacing this one.
67-70	IDcode	replace_id_codes[7]	ID of entry replacing this one.
72-75	IDcode	replace_id_codes[8]	ID of entry replacing this one.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Remark[source]¶

REMARK field

REMARK records present experimental details, annotations, comments, and information not included in other records. In a number of cases, REMARKs are used to expand the contents of other record types. A new level of structure is being used for some REMARK records. This is expected to facilitate searching and will assist in the conversion to a relational database.

parse_line(line)[source]¶

Initialize by parsing line.

COLUMNS	TYPE	FIELD	DEFINITION
8-10	int	remark_num	Remark number. It is not an error for remark n to exist in an entry when remark n-1 does not.
12-79	str	remark_text	Left as white space in first line of each new remark.

Parameters: line (str) – line to parse

class old_pdb.annotation.Revision[source]¶

Class to store contents of a single REVDAT modification.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“REVDAT”
8-10	Integer	modification_num	Modification number.
11-12	Continuation	continuation	Allows concatenation of multiple records.
14-22	Date	modification_date	Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27	IDCode	modification_id	ID code of this entry. This is not repeated on continuation lines.
32	Integer	modification_type	An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45	LString(6)	record	Modification detail.
47-52	LString(6)	record	Modification detail.
54-59	LString(6)	record	Modification detail.
61-66	LString(6)	record	Modification detail.

parse_line(line)[source]¶

Parse PDB-format line for specific revision.

Parameters: line (str) – line to parse.

class old_pdb.annotation.RevisionData[source]¶

REVDAT field

REVDAT records contain a history of the modifications made to an entry since its release.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“REVDAT”
8-10	Integer	modNum	Modification number.
11-12	Continuation	continuation	Allows concatenation of multiple records.
14-22	Date	modDate	Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27	IDCode	modId	ID code of this entry. This is not repeated on continuation lines.
32	Integer	modType	An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45	LString(6)	record	Modification detail.
47-52	LString(6)	record	Modification detail.
54-59	LString(6)	record	Modification detail.
61-66	LString(6)	record	Modification detail.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

property revisions¶

Get revisions.

Returns: dictionary with modification numbers as keys and Revision objects as values
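The grouping behind a dictionary keyed by modification number can be sketched as follows. This standalone example stores plain dictionaries rather than `Revision` objects and is not the library's implementation; `make_revdat` and `parse_revdat` are hypothetical helpers, and the sample values are illustrative.

```python
def make_revdat(mod_num, cont, date, id_code, mod_type, records):
    """Build a REVDAT line with fields in the documented columns."""
    head = f"REVDAT {mod_num:>3}{cont:>2} {date:<9} {id_code:<4}    {mod_type:<1}"
    return head + " " * 7 + " ".join(f"{r:<6}" for r in records)


def parse_revdat(lines):
    """Group REVDAT lines by modification number (columns 8-10)."""
    revisions = {}
    for line in lines:
        line = line.ljust(80)
        if line[:6] != "REVDAT":
            continue
        mod_num = int(line[7:10])
        rev = revisions.setdefault(
            mod_num, {"date": None, "id": None, "type": None, "records": []}
        )
        # Date, ID code, and type appear only on the first line of a revision.
        if line[13:22].strip():
            rev["date"] = line[13:22]
            rev["id"] = line[23:27]
            rev["type"] = int(line[31])
        # Four six-character detail slots start at columns 40, 47, 54, and 61.
        for start in (40, 47, 54, 61):
            detail = line[start - 1:start + 5].strip()
            if detail:
                rev["records"].append(detail)
    return revisions


revdat_lines = [
    make_revdat(2, "", "15-OCT-99", "1ABC", 1, ["ATOM", "REMARK"]),
    make_revdat(2, 2, "", "", "", ["HELIX"]),  # continuation line
    make_revdat(1, "", "01-JAN-98", "1ABC", 0, []),
]
revisions = parse_revdat(revdat_lines)
```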

class old_pdb.annotation.Site[source]¶

SITE class

The SITE records supply the identification of groups comprising important sites in the macromolecule.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SITE “
8-10	Integer	seq_num	Sequence number.
12-14	LString(3)	site_id	Site name.
16-17	Integer	num_res	Number of residues that compose the site.
19-21	Residue name	res_name1	Residue name for first residue that creates the site.
23	Character	chain_id1	Chain identifier for first residue of site.
24-27	Integer	seq1	Residue sequence number for first residue of the site.
28	AChar	ins_code1	Insertion code for first residue of the site.
30-32	Residue name	res_name2	Residue name for second residue that creates the site.
34	Character	chain_id2	Chain identifier for second residue of the site.
35-38	Integer	seq2	Residue sequence number for second residue of the site.
39	AChar	ins_code2	Insertion code for second residue of the site.
41-43	Residue name	res_name3	Residue name for third residue that creates the site.
45	Character	chain_id3	Chain identifier for third residue of the site.
46-49	Integer	seq3	Residue sequence number for third residue of the site.
50	AChar	ins_code3	Insertion code for third residue of the site.
52-54	Residue name	res_name4	Residue name for fourth residue that creates the site.
56	Character	chain_id4	Chain identifier for fourth residue of the site.
57-60	Integer	seq4	Residue sequence number for fourth residue of the site.
61	AChar	ins_code4	Insertion code for fourth residue of the site.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
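The four residue groups in the SITE table repeat every 11 columns, so they can be read in a loop instead of field by field. A minimal standalone sketch, assuming blank groups mark unused slots; `make_site` and `parse_site` are hypothetical helpers and not part of the `Site` class.

```python
def make_site(seq_num, site_id, num_res, residues):
    """Build a SITE line; residues is a list of (name, chain, seq, ins_code)."""
    head = f"SITE   {seq_num:>3} {site_id:<3} {num_res:>2} "
    groups = "".join(
        f"{name:>3} {chain}{seq:>4}{ins:1} " for name, chain, seq, ins in residues
    )
    return head + groups


def parse_site(line):
    """Read the site header and up to four residue groups from a SITE line."""
    line = line.ljust(80)
    residues = []
    for k in range(4):
        offset = 11 * k  # each residue group is 11 columns wide
        res_name = line[18 + offset:21 + offset].strip()  # columns 19-21 (+offset)
        if not res_name:
            continue
        residues.append((
            res_name,
            line[22 + offset],                    # chain ID, column 23 (+offset)
            int(line[23 + offset:27 + offset]),   # sequence number, columns 24-27
            line[27 + offset].strip(),            # insertion code, column 28
        ))
    return {
        "seq_num": int(line[7:10]),
        "site_id": line[11:14].strip(),
        "num_res": int(line[15:17]),
        "residues": residues,
    }


site_line = make_site(
    1, "AC1", 3, [("HIS", "A", 94, ""), ("HIS", "A", 96, ""), ("HIS", "A", 119, "")]
)
site = parse_site(site_line)
```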

class old_pdb.annotation.Source[source]¶

SOURCE field

The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SOURCE”
8-10	Continuation	continuation	Allows concatenation of multiple records.
11-79	Specification List	source	Identifies the source of the macromolecule in a token: value format.

parse_line(line)[source]¶

Parse a PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Split[source]¶

SPLIT field

The SPLIT record is used in instances where a specific entry composes part of a large macromolecular complex. It will identify the PDB entries that are required to reconstitute a complete complex.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SPLIT “
9-10	Continuation	continuation	Allows concatenation of multiple records.
12-15	IDcode	id_codes[0]	ID code of related entry.
17-20	IDcode	id_codes[1]	ID code of related entry.
22-25	IDcode	id_codes[2]	ID code of related entry.
27-30	IDcode	id_codes[3]	ID code of related entry.
32-35	IDcode	id_codes[4]	ID code of related entry.
37-40	IDcode	id_codes[5]	ID code of related entry.
42-45	IDcode	id_codes[6]	ID code of related entry.
47-50	IDcode	id_codes[7]	ID code of related entry.
52-55	IDcode	id_codes[8]	ID code of related entry.
57-60	IDcode	id_codes[9]	ID code of related entry.
62-65	IDcode	id_codes[10]	ID code of related entry.
67-70	IDcode	id_codes[11]	ID code of related entry.
72-75	IDcode	id_codes[12]	ID code of related entry.
77-80	IDcode	id_codes[13]	ID code of related entry.

parse_line(line)[source]¶

Parse input line.

Parameters: line (str) – PDB-format line to parse

class old_pdb.annotation.Supersedes[source]¶

SPRSDE field

The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and withdrawn from the PDB release set. One entry may replace many. It is PDB policy that only the principal investigator of a structure has the authority to withdraw it.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SPRSDE”
9-10	Continuation	continuation	Allows for multiple ID codes.
12-20	Date	super_date	Date entry superseded the listed entries. This field is not copied on continuations.
22-25	IDcode	id_code	ID code of this entry. This field is not copied on continuations.
32-35	IDcode	super_id_codes	ID code of superseded entry.
37-40	IDcode	super_id_codes	ID code of superseded entry.
42-45	IDcode	super_id_codes	ID code of superseded entry.
47-50	IDcode	super_id_codes	ID code of superseded entry.
52-55	IDcode	super_id_codes	ID code of superseded entry.
57-60	IDcode	super_id_codes	ID code of superseded entry.
62-65	IDcode	super_id_codes	ID code of superseded entry.
67-70	IDcode	super_id_codes	ID code of superseded entry.
72-75	IDcode	super_id_codes	ID code of superseded entry.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.annotation.Title[source]¶

TITLE field

The TITLE record contains a title for the experiment or analysis that is represented in the entry. It should identify an entry in the PDB in the same way that a title identifies a paper.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“TITLE “
9-10	Continuation	continuation	Allows concatenation of multiple records.
11-80	String	title	Title of the experiment.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

`primary` module¶

Classes for PDB records that provide primary structure information.

class old_pdb.primary.DatabaseReference[source]¶

DBREF record.

The DBREF record provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“DBREF “
8-11	IDcode	id_code	ID code of this entry.
13	Character	chain_id	Chain identifier.
15-18	Integer	seq_begin	Initial sequence number of PDB sequence segment.
19	AChar	ins_begin	Initial insertion code of PDB sequence segment.
21-24	Integer	seq_end	Ending sequence number of PDB sequence segment.
25	AChar	ins_end	Ending insertion code of PDB sequence segment.
27-32	LString	database	Sequence database name.
34-41	LString	database_accession	Sequence database accession code.
43-54	LString	database_id_code	Sequence database id code.
56-60	Integer	database_seq_begin	Initial sequence number of database segment.
61	AChar	database_ins_begin	Insertion code of initial residue segment, if PDB is reference.
63-67	Integer	database_seq_end	Ending sequence number of segment.
68	AChar	database_ins_end	Insertion code of the segment end, if PDB is reference.

parse_line(line)[source]¶

Parse DBREF line.

Parameters: line (str) – line to parse
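With this many fields, one option is to keep the column table as data and drive a generic slicer from it. The sketch below takes that table-driven approach; it is an illustration and not how `DatabaseReference.parse_line` is known to work. `DBREF_COLUMNS` and `parse_fixed_width` are hypothetical names, and the sample values are made up.

```python
# (field name, first column, last column, converter), columns 1-based inclusive.
DBREF_COLUMNS = [
    ("id_code", 8, 11, str),
    ("chain_id", 13, 13, str),
    ("seq_begin", 15, 18, int),
    ("ins_begin", 19, 19, str),
    ("seq_end", 21, 24, int),
    ("ins_end", 25, 25, str),
    ("database", 27, 32, str),
    ("database_accession", 34, 41, str),
    ("database_id_code", 43, 54, str),
    ("database_seq_begin", 56, 60, int),
    ("database_ins_begin", 61, 61, str),
    ("database_seq_end", 63, 67, int),
    ("database_ins_end", 68, 68, str),
]


def parse_fixed_width(line, columns):
    """Slice a fixed-width line according to a column table."""
    line = line.ljust(80)
    record = {}
    for name, start, end, kind in columns:
        text = line[start - 1:end].strip()
        # Blank numeric fields become None; blank text fields stay empty.
        record[name] = kind(text) if (text or kind is str) else None
    return record


# Illustrative line built field by field so every value sits in its columns.
dbref_line = (
    f"DBREF  {'1ABC':<4} {'A'} {1:>4}{' '} {100:>4}{' '} "
    f"{'UNP':<6} {'P12345':<8} {'ABC_HUMAN':<12} {1:>5}{' '} {100:>5}{' '}"
)
dbref = parse_fixed_width(dbref_line, DBREF_COLUMNS)
```

The same `parse_fixed_width` helper would work for any other record whose table is written out in the same tuple form.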

class old_pdb.primary.DatabaseReference1[source]¶

Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“DBREF1”
8-11	IDcode	id_code	ID code of this entry.
13	Character	chain_id	Chain identifier.
15-18	Integer	seq_begin	Initial sequence number of the PDB sequence segment, right justified.
19	AChar	ins_begin	Initial insertion code of the PDB sequence segment.
21-24	Integer	seq_end	Ending sequence number of the PDB sequence segment, right justified.
25	AChar	ins_end	Ending insertion code of the PDB sequence segment.
27-32	LString	database	Sequence database name.
48-67	LString	db_id_code	Sequence database identification code, left justified.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.primary.DatabaseReference2[source]¶

Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“DBREF2”
8-11	IDcode	id_code	ID code of this entry.
13	Character	chain_id	Chain identifier.
19-40	LString	db_accession	Sequence database accession code left justified.
46-55	Integer	seq_begin	Initial sequence number of the Database segment, right justified.
58-67	Integer	seq_end	Ending sequence number of the Database segment, right justified.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.primary.ModifiedResidue[source]¶

MODRES field

The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. Included are a mapping between residue names given in a PDB entry and standard residues.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“MODRES”
8-11	IDcode	id_code	ID code of this entry.
13-15	Residue name	res_name	Residue name used in this entry
17	Character	chain_id	Chain identifier.
19-22	Integer	seq_num	Sequence number.
23	AChar	ins_code	Insertion code.
25-27	Residue name	standard_res	Standard residue name.
30-70	String	comment	Description of the residue modification.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.primary.SequenceDifferences[source]¶

SEQADV field

The SEQADV record identifies conflicts between sequence information in the ATOM records of the PDB entry and the sequence database entry given on DBREF. Please note that these records were designed to identify differences and not errors. No assumption is made as to which database contains the correct data. PDB may include REMARK records in the entry that reflect the depositor’s view of which database has the correct sequence.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SEQADV”
8-11	IDcode	id_code	ID code of this entry.
13-15	Residue name	res_name	Name of the PDB residue in conflict.
17	Character	chain_id	PDB chain identifier.
19-22	Integer	seq_num	PDB sequence number.
23	AChar	ins_code	PDB insertion code.
25-28	LString	database	Sequence database name.
30-38	LString	db_id_code	Sequence database accession number.
40-42	Residue name	db_res	Sequence database residue name.
44-48	Integer	db_seq	Sequence database sequence number.
50-70	LString	conflict	Conflict comment.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.primary.SequenceResidues[source]¶

SEQRES field

SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SEQRES”
8-10	Integer	serNum	Serial number of the SEQRES record for the current chain. Starts at 1 and increments by one each line. Reset to 1 for each chain.
12	Character	chainID	Chain identifier. This may be any single legal character, including a blank, which is used if there is only one chain.
14-17	Integer	numRes	Number of residues in the chain. This value is repeated on every record.
20-22	Residue name	resName	Residue name.
24-26	Residue name	resName	Residue name.
28-30	Residue name	resName	Residue name.
32-34	Residue name	resName	Residue name.
36-38	Residue name	resName	Residue name.
40-42	Residue name	resName	Residue name.
44-46	Residue name	resName	Residue name.
48-50	Residue name	resName	Residue name.
52-54	Residue name	resName	Residue name.
56-58	Residue name	resName	Residue name.
60-62	Residue name	resName	Residue name.
64-66	Residue name	resName	Residue name.
68-70	Residue name	resName	Residue name.

num_chains() → int[source]¶: Number of chains in sequence.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

property residues¶

Dictionary of residues indexed by chain id.

Returns: dictionary with chain IDs as keys and lists of residue names as values.
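A dictionary of this shape can be assembled from raw SEQRES lines by reading the chain ID in column 12 and the 13 residue slots that start at column 20. This is a standalone sketch, not the `SequenceResidues` implementation; `make_seqres` and `seqres_residues` are hypothetical helpers.

```python
def make_seqres(ser_num, chain_id, num_res, names):
    """Build a SEQRES line with up to 13 residue names in the documented columns."""
    head = f"SEQRES {ser_num:>3} {chain_id} {num_res:>4}  "
    return head + " ".join(f"{name:>3}" for name in names)


def seqres_residues(lines):
    """Return a dict mapping chain ID to the ordered list of residue names."""
    residues = {}
    for line in lines:
        line = line.ljust(80)
        if line[:6] != "SEQRES":
            continue
        chain = residues.setdefault(line[11], [])  # chain ID, column 12
        # Residue slots are three columns wide, every four columns from column 20.
        for start in range(19, 70, 4):
            name = line[start:start + 3].strip()
            if name:
                chain.append(name)
    return residues


seqres_lines = [
    make_seqres(1, "A", 3, ["MET", "ALA", "GLY"]),
    make_seqres(1, "B", 2, ["DA", "DT"]),
]
chains = seqres_residues(seqres_lines)
chain_count = len(chains)
```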

`heterogen` module¶

Classes for PDB records that provide heterogen information.

class old_pdb.heterogen.Formula[source]¶

FORMUL field

The FORMUL record presents the chemical formula and charge of a non-standard group.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“FORMUL”
9-10	Integer	compNum	Component number.
13-15	LString(3)	hetID	Het identifier.
17-18	Integer	continuation	Continuation number.
19	Character	asterisk	“*” for water.
20-70	String	text	Chemical formula.

property components¶

Formulae for components.

Returns: dictionary with component numbers as keys and values that consist of tuples of the hetatom ID and the formula text.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.heterogen.Heterogen[source]¶

HET field

HET records are used to describe non-standard residues, such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied. Groups are considered HET if they are:

not one of the standard amino acids, and
not one of the nucleic acids (C, G, A, T, U, and I), and
not one of the modified versions of nucleic acids (+C, +G, +A, +T, +U, and +I), and
not an unknown amino acid or nucleic acid where UNK is used to indicate the unknown residue name.

HET records also describe heterogens for which the chemical identity is unknown, in which case the group is assigned the hetatm_id UNK.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HET “
8-10	LString(3)	het_id	Identifier, right-justified.
13	Character	chain_id	Chain identifier.
14-17	Integer	seq_num	Sequence number.
18	AChar	ins_code	Insertion code.
21-25	Integer	num_het_atoms	Number of HETATM records for the group present in the entry.
31-70	String	text	Text describing Het group.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.heterogen.HeterogenName[source]¶

HETNAM field

This record gives the chemical name of the compound with the given hetatm_id.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HETNAM”
9-10	Continuation	continuation	Allows concatenation of multiple records.
12-14	LString(3)	het_id	Het identifier, right-justified.
16-70	String	text	Chemical name.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.heterogen.HeterogenSynonym[source]¶

HETSYN field

This record provides synonyms, if any, for the compound in the corresponding (i.e., same hetatm_id) HETNAM record. This is to allow greater flexibility in searching for HET groups.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HETSYN”
9-10	Continuation	continuation	Allows concatenation of multiple records.
12-14	LString(3)	het_id	Het identifier, right-justified.
16-70	SList	synonyms	List of synonyms.

parse_line(line)[source]¶

Parse line of PDB file.

Parameters: line (str) – PDB file line to parse

`secondary` module¶

Classes for records with secondary structure and connectivity information.

class old_pdb.secondary.CisPeptide[source]¶

CISPEP field

CISPEP records specify the prolines and other peptides found to be in the cis conformation. This record replaces the use of footnote records to list cis peptides.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“CISPEP”
8-10	Integer	ser_num	Record serial number.
12-14	LString(3)	pep1	Residue name.
16	Character	chain_id1	Chain identifier.
18-21	Integer	seq_num1	Residue sequence number.
22	AChar	icode1	Insertion code.
26-28	LString(3)	pep2	Residue name.
30	Character	chain_id2	Chain identifier.
32-35	Integer	seq_num2	Residue sequence number.
36	AChar	icode2	Insertion code.
44-46	Integer	mod_num	Identifies the specific model.
54-59	Real(6.2)	measure	Angle measurement in degrees.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.secondary.DisulfideBond[source]¶

SSBOND field

The SSBOND record identifies each disulfide bond in protein and polypeptide structures by identifying the two residues involved in the bond.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SSBOND”
8-10	Integer	ser_num	Serial number.
12-14	LString(3)	“CYS”	Residue name.
16	Character	chain_id1	Chain identifier.
18-21	Integer	seq_num1	Residue sequence number.
22	AChar	icode1	Insertion code.
26-28	LString(3)	“CYS”	Residue name.
30	Character	chain_id2	Chain identifier.
32-35	Integer	seq_num2	Residue sequence number.
36	AChar	icode2	Insertion code.
60-65	SymOP	sym1	Symmetry operator for residue 1.
67-72	SymOP	sym2	Symmetry operator for residue 2.
74-78	Real(5.2)	length	Disulfide bond distance.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.secondary.Helix[source]¶

HELIX field

HELIX records are used to identify the position of helices in the molecule. Helices are both named and numbered. The residues where the helix begins and ends are noted, as well as the total length.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HELIX “
8-10	Integer	serNum	Serial number of the helix. Starts at 1 and increases incrementally.
12-14	LString(3)	helix_id	Helix identifier. In addition to a serial number, each helix is given an alphanumeric helix identifier.
16-18	Residue name	init_res_name	Name of the initial residue.
20	Character	init_chain_id	Chain identifier for the chain containing this helix.
22-25	Integer	init_seq_num	Sequence number of the initial residue.
26	AChar	init_i_code	Insertion code of the initial residue.
28-30	Residue name	end_res_name	Name of the terminal residue of the helix.
32	Character	end_chain_id	Chain identifier for the chain containing this helix.
34-37	Integer	end_seq_num	Sequence number of the terminal residue.
38	AChar	end_i_code	Insertion code of the terminal residue.
39-40	Integer	helix_class	Helix class (see below).
41-70	String	comment	Comment about this helix.
72-76	Integer	length	Length of this helix.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.secondary.Link[source]¶

LINK field

The LINK records specify connectivity between residues that is not implied by the primary structure. Connectivity is expressed in terms of the atom names. This record supplements information given in CONECT records and is provided here for convenience in searching.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“LINK ”
13-16	Atom	name1	Atom name.
17	Character	alt_loc1	Alternate location indicator.
18-20	Residue name	res_name1	Residue name.
22	Character	chain_id	Chain identifier.
23-26	Integer	res_seq1	Residue sequence number.
27	AChar	ins_code1	Insertion code.
43-46	Atom	name2	Atom name.
47	Character	alt_loc2	Alternate location indicator.
48-50	Residue name	res_name2	Residue name.
52	Character	chain_id	Chain identifier.
53-56	Integer	res_seq2	Residue sequence number.
57	AChar	ins_code2	Insertion code.
60-65	SymOP	sym1	Symmetry operator for atom 1.
67-72	SymOP	sym2	Symmetry operator for atom 2.
74-78	Real(5.2)	Length	Link distance.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.secondary.Sheet[source]¶

SHEET field

SHEET records are used to identify the position of sheets in the molecule. Sheets are both named and numbered. The residues where the sheet begins and ends are noted.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SHEET ”
8-10	Integer	strand	Strand number which starts at 1 for each strand within a sheet and increases by one.
12-14	LString(3)	sheet_id	Sheet identifier.
15-16	Integer	num_strands	Number of strands in sheet.
18-20	Residue name	init_res_name	Name of initial residue.
22	Character	init_chain_id	Chain identifier of initial residue in strand.
23-26	Integer	init_seq_num	Sequence number of initial residue in strand.
27	AChar	init_ins_code	Insertion code of initial residue in strand.
29-31	Residue name	end_res_name	Name of terminal residue.
33	Character	end_chain_id	Chain identifier of terminal residue.
34-37	Integer	end_seq_num	Sequence number of terminal residue.
38	AChar	end_ins_code	Insertion code of terminal residue.
39-40	Integer	sense	Sense of strand with respect to previous strand in the sheet. 0 if first strand, 1 if parallel, and -1 if anti-parallel.
42-45	Atom	cur_atom	Registration. Atom name in current strand.
46-48	Residue name	cur_res_name	Registration. Residue name in current strand.
50	Character	cur_chain_id	Registration. Chain identifier in current strand.
51-54	Integer	cur_res_seq	Registration. Residue sequence number in current strand.
55	AChar	cur_ins_code	Registration. Insertion code in current strand.
57-60	Atom	prev_atom	Registration. Atom name in previous strand.
61-63	Residue name	prev_res_name	Registration. Residue name in previous strand.
65	Character	prev_chain_id	Registration. Chain identifier in previous strand.
66-69	Integer	prev_res_seq	Registration. Residue sequence number in previous strand.
70	AChar	prev_ins_code	Registration. Insertion code in previous strand.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
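
The `sense` field encodes how each strand pairs with the previous one. As a rough illustration, the sketch below classifies a whole sheet from the `sense` values of its strands. The helper `classify_sheet` and its return labels are hypothetical and are not part of `old_pdb`.

```python
SENSE_LABELS = {0: "first strand", 1: "parallel", -1: "anti-parallel"}


def classify_sheet(senses):
    """Classify a sheet from its strands' sense values, given in strand order."""
    for sense in senses:
        if sense not in SENSE_LABELS:
            raise ValueError(f"unexpected sense value: {sense!r}")
    # The first strand has sense 0 and carries no pairing information.
    pairings = {sense for sense in senses if sense != 0}
    if not pairings:
        return "single strand"
    if pairings == {1}:
        return "parallel"
    if pairings == {-1}:
        return "anti-parallel"
    return "mixed"


anti = classify_sheet([0, -1, -1])
mixed = classify_sheet([0, 1, -1])
```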

`crystallography` module¶

Classes for records with crystallographic information.

class old_pdb.crystallography.FractionalTransform(n)[source]¶

SCALEn base class

The SCALEn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates as contained in the entry to fractional crystallographic coordinates. Non-standard coordinate systems should be explained in the remarks.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“SCALEn”	n=1, 2, or 3
11-20	Real(10.6)	sn1	Sn1
21-30	Real(10.6)	sn2	Sn2
31-40	Real(10.6)	sn3	Sn3
46-55	Real(10.5)	unif	Un

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
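
Taken together, the three SCALEn rows form a 3x3 matrix S and a translation vector U, and the fractional coordinates are S·x + U. The sketch below applies that transformation with plain tuples; `orthogonal_to_fractional` is a hypothetical helper, and the example values assume an orthorhombic cell (a=10, b=20, c=40) in the standard orientation, for which S is diagonal with entries 1/a, 1/b and 1/c.

```python
def orthogonal_to_fractional(scale_rows, xyz):
    """Apply SCALE1..SCALE3 rows to an orthogonal coordinate.

    scale_rows: three (sn1, sn2, sn3, un) tuples, for n = 1, 2, 3.
    xyz: orthogonal (x, y, z) in Angstroms.
    """
    x, y, z = xyz
    return tuple(s1 * x + s2 * y + s3 * z + u for s1, s2, s3, u in scale_rows)


# Assumed example: orthorhombic cell with a=10, b=20, c=40 Angstroms.
scale = [
    (0.1, 0.0, 0.0, 0.0),
    (0.0, 0.05, 0.0, 0.0),
    (0.0, 0.0, 0.025, 0.0),
]
frac = orthogonal_to_fractional(scale, (5.0, 5.0, 10.0))
```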

class old_pdb.crystallography.NoncrystalTransform(n)[source]¶

MTRIXn base class

The MTRIXn (n = 1, 2, or 3) records present transformations expressing non-crystallographic symmetry.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“MTRIXn”	n=1, 2, or 3
8-10	Integer	serial	Serial number.
11-20	Real(10.6)	mn1	Mn1
21-30	Real(10.6)	mn2	Mn2
31-40	Real(10.6)	mn3	Mn3
46-55	Real(10.5)	vn	Vn
60	Integer	i_given	1 if coordinates for the representations which are approximately related by the transformations of the molecule are contained in the entry. Otherwise, blank.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
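
Because each operator is spread over three records that share a `serial`, a consumer typically groups the rows before applying them. The sketch below shows one way to do that under stated assumptions: the record tuples, `group_mtrix` and `apply_operator` are hypothetical and do not reflect how `old_pdb` stores these records.

```python
from collections import defaultdict


def group_mtrix(records):
    """Group rows by serial number.

    records: iterable of (n, serial, mn1, mn2, mn3, vn) tuples.
    Returns {serial: [row1, row2, row3]}, each row being (mn1, mn2, mn3, vn).
    """
    grouped = defaultdict(dict)
    for n, serial, m1, m2, m3, v in records:
        grouped[serial][n] = (m1, m2, m3, v)
    return {serial: [rows[n] for n in (1, 2, 3)] for serial, rows in grouped.items()}


def apply_operator(rows, xyz):
    """Return M.x + V for one grouped operator."""
    x, y, z = xyz
    return tuple(m1 * x + m2 * y + m3 * z + v for m1, m2, m3, v in rows)


# Operator 1 is the identity; operator 2 is a two-fold rotation about z
# followed by a 10 Angstrom shift along x (made-up values).
records = [
    (1, 1, 1.0, 0.0, 0.0, 0.0),
    (2, 1, 0.0, 1.0, 0.0, 0.0),
    (3, 1, 0.0, 0.0, 1.0, 0.0),
    (1, 2, -1.0, 0.0, 0.0, 10.0),
    (2, 2, 0.0, -1.0, 0.0, 0.0),
    (3, 2, 0.0, 0.0, 1.0, 0.0),
]
operators = group_mtrix(records)
copy = apply_operator(operators[2], (1.0, 2.0, 3.0))
```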

class old_pdb.crystallography.OriginalTransform(n)[source]¶

ORIGXn class

The ORIGXn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates contained in the entry to the submitted coordinates.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“ORIGXn”	n=1, 2, or 3
11-20	Real(10.6)	on1	On1
21-30	Real(10.6)	on2	On2
31-40	Real(10.6)	on3	On3
46-55	Real(10.5)	tn	Tn

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.crystallography.UnitCell[source]¶

CRYST1 class

The CRYST1 record presents the unit cell parameters, space group, and Z value. If the structure was not determined by crystallographic means, CRYST1 simply defines a unit cube.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“CRYST1”
7-15	Real(9.3)	a	a (Angstroms).
16-24	Real(9.3)	b	b (Angstroms).
25-33	Real(9.3)	c	c (Angstroms).
34-40	Real(7.2)	alpha	alpha (degrees).
41-47	Real(7.2)	beta	beta (degrees).
48-54	Real(7.2)	gamma	gamma (degrees).
56-66	LString	sGroup	Space group.
67-70	Integer	z	Z value.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
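
To make the column layout concrete, the sketch below slices a CRYST1 line and computes the cell volume from the six cell parameters using the standard triclinic formula. `parse_cryst1` and `cell_volume` are hypothetical helpers, not `old_pdb` functions, and the sketch assumes every numeric field is present.

```python
import math


def parse_cryst1(line):
    """Slice a CRYST1 line using the 1-indexed columns from the table above."""
    line = line.rstrip("\n").ljust(70)
    return {
        "a": float(line[6:15]),
        "b": float(line[15:24]),
        "c": float(line[24:33]),
        "alpha": float(line[33:40]),
        "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "sGroup": line[55:66].strip(),
        "z": int(line[66:70]),
    }


def cell_volume(cell):
    """Unit cell volume in cubic Angstroms (general triclinic formula)."""
    ca, cb, cg = (
        math.cos(math.radians(cell[key])) for key in ("alpha", "beta", "gamma")
    )
    root = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return cell["a"] * cell["b"] * cell["c"] * root


sample = (
    "CRYST1"
    + f"{52.0:9.3f}{58.6:9.3f}{61.9:9.3f}"
    + f"{90.0:7.2f}{90.0:7.2f}{90.0:7.2f}"
    + " " + "P 21 21 21".ljust(11) + f"{8:4d}"
)
cell = parse_cryst1(sample)
volume = cell_volume(cell)
```

For this orthorhombic example the volume reduces to the product a·b·c.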

`coordinates` module¶

Classes for records with coordinate information.

class old_pdb.coordinates.Atom[source]¶

ATOM class

The ATOM records present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“ATOM ”
7-11	Integer	serial	Atom serial number.
13-16	Atom	name	Atom name.
17	Character	alt_loc	Alternate location indicator.
18-20	Residue name	res_name	Residue name.
22	Character	chain_id	Chain identifier.
23-26	Integer	res_seq	Residue sequence number.
27	AChar	ins_code	Code for insertion of residues.
31-38	Real(8.3)	x	Orthogonal coordinates for X in Angstroms.
39-46	Real(8.3)	y	Orthogonal coordinates for Y in Angstroms.
47-54	Real(8.3)	z	Orthogonal coordinates for Z in Angstroms.
55-60	Real(6.2)	occupancy	Occupancy.
61-66	Real(6.2)	temp_factor	Temperature factor.
77-78	LString(2)	element	Element symbol, right-justified.
79-80	LString(2)	charge	Charge on the atom.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
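
One way to turn the table above into a parser is to keep the column ranges in a lookup table and slice each field in a loop. The following sketch does this independently of `old_pdb`; `ATOM_FIELDS` and `parse_atom_line` are hypothetical names, and blank numeric fields are returned as `None` by assumption.

```python
# Column ranges (1-indexed, inclusive) copied from the ATOM table above.
ATOM_FIELDS = {
    "serial": (7, 11, int),
    "name": (13, 16, str),
    "alt_loc": (17, 17, str),
    "res_name": (18, 20, str),
    "chain_id": (22, 22, str),
    "res_seq": (23, 26, int),
    "ins_code": (27, 27, str),
    "x": (31, 38, float),
    "y": (39, 46, float),
    "z": (47, 54, float),
    "occupancy": (55, 60, float),
    "temp_factor": (61, 66, float),
    "element": (77, 78, str),
    "charge": (79, 80, str),
}


def parse_atom_line(line):
    """Slice an ATOM or HETATM line into a dict of typed fields."""
    line = line.rstrip("\n").ljust(80)
    record = {}
    for field, (start, end, convert) in ATOM_FIELDS.items():
        text = line[start - 1:end].strip()
        if convert is str:
            record[field] = text
        else:
            record[field] = convert(text) if text else None
    return record


# Assemble a sample line field by field so the columns are explicit.
sample = (
    "ATOM  " + f"{1:5d}" + " " + " N  " + " " + "MET" + " " + "A"
    + f"{1:4d}" + " " + "   "
    + f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}"
    + " " * 10 + " N" + "  "
)
atom = parse_atom_line(sample)
```

The same table works for HETATM lines because the two record types share their column layout.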

class old_pdb.coordinates.ChainTerminus[source]¶

TER class

The TER record indicates the end of a list of ATOM/HETATM records for a chain.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“TER ”
7-11	Integer	serial	Serial number.
18-20	Residue name	res_name	Residue name.
22	Character	chain_id	Chain identifier.
23-26	Integer	res_seq	Residue sequence number.
27	AChar	ins_code	Insertion code.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.coordinates.HeterogenAtom[source]¶

HETATM class

The HETATM records present the atomic coordinate records for atoms within “non-standard” groups. These records are used for water molecules and atoms presented in HET groups.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“HETATM”
7-11	Integer	serial	Atom serial number.
13-16	Atom	name	Atom name.
17	Character	alt_loc	Alternate location indicator.
18-20	Residue name	res_name	Residue name.
22	Character	chain_id	Chain identifier.
23-26	Integer	res_seq	Residue sequence number.
27	AChar	ins_code	Code for insertion of residues.
31-38	Real(8.3)	x	Orthogonal coordinates for X.
39-46	Real(8.3)	y	Orthogonal coordinates for Y.
47-54	Real(8.3)	z	Orthogonal coordinates for Z.
55-60	Real(6.2)	occupancy	Occupancy.
61-66	Real(6.2)	temp_factor	Temperature factor.
77-78	LString(2)	element	Element symbol; right-justified.
79-80	LString(2)	charge	Charge on the atom.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.coordinates.Model[source]¶

MODEL class.

The MODEL record specifies the model serial number when multiple structures are presented in a single coordinate entry, as is often the case with structures determined by NMR.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“MODEL ”
11-14	Integer	serial	Model serial number.

property all_atoms¶

Get all atoms in model.

Returns: list of Atom-like objects

property atoms¶

Get ATOM atoms in model.

Returns: list of Atom-like objects

property het_atoms¶

Get HETATM atoms in model.

Returns: list of Atom-like objects

num_atoms(heavy_only) → int[source]¶

Number of ATOM and HETATM entries in all chains in model.

Parameters: heavy_only (bool) – exclude hydrogen atoms from count
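
The effect of `heavy_only` can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's code: `count_atoms` and `FakeAtom` are hypothetical, and the sketch treats only the element symbol `H` as hydrogen, which may differ from the criterion `old_pdb` uses.

```python
from collections import namedtuple

# Hypothetical stand-in for Atom-like objects; only `element` is used.
FakeAtom = namedtuple("FakeAtom", ["name", "element"])


def count_atoms(atoms, heavy_only=False):
    """Count atom records, optionally skipping hydrogens."""
    if not heavy_only:
        return sum(1 for _ in atoms)
    return sum(1 for atom in atoms if atom.element.strip().upper() != "H")


atoms = [FakeAtom("N", " N"), FakeAtom("CA", " C"), FakeAtom("H", " H")]
total = count_atoms(atoms)
heavy = count_atoms(atoms, heavy_only=True)
```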

num_chains() → int[source]¶: Count number of chains in model.

num_residues(count_hetatm) → int[source]¶

Number of residues in entry.

Parameters: count_hetatm (bool) – include heterogen residues in count

num_ter() → int[source]¶: Count number of termini in entry.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse

class old_pdb.coordinates.TemperatureFactor[source]¶

ANISOU class

The ANISOU records present the anisotropic temperature factors.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“ANISOU”
7-11	Integer	serial	Atom serial number.
13-16	Atom	name	Atom name.
17	Character	alt_loc	Alternate location indicator.
18-20	Residue name	res_name	Residue name.
22	Character	chain_id	Chain identifier.
23-26	Integer	res_seq	Residue sequence number.
27	AChar	ins_code	Insertion code.
29-35	Integer	u00	U(1,1)
36-42	Integer	u11	U(2,2)
43-49	Integer	u22	U(3,3)
50-56	Integer	u01	U(1,2)
57-63	Integer	u02	U(1,3)
64-70	Integer	u12	U(2,3)
77-78	LString(2)	element	Element symbol, right-justified.
79-80	LString(2)	charge	Charge on the atom.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
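
The six integer fields are the unique elements of a symmetric 3x3 tensor. In the PDB format they are stored scaled by 10^4 (in square Angstroms), a detail taken from the format specification rather than from the table above. The sketch below rebuilds the tensor and derives the equivalent isotropic B-factor, B_eq = 8π²·trace(U)/3. Both helper names are hypothetical and the input values are example data.

```python
import math


def anisou_to_matrix(u00, u11, u22, u01, u02, u12, scale=1.0e-4):
    """Build the symmetric U tensor (square Angstroms) from ANISOU integers."""
    return [
        [u00 * scale, u01 * scale, u02 * scale],
        [u01 * scale, u11 * scale, u12 * scale],
        [u02 * scale, u12 * scale, u22 * scale],
    ]


def equivalent_b(matrix):
    """Equivalent isotropic B-factor from the trace of U."""
    u_eq = (matrix[0][0] + matrix[1][1] + matrix[2][2]) / 3.0
    return 8.0 * math.pi ** 2 * u_eq


u = anisou_to_matrix(2406, 1892, 1614, 198, 519, -328)
b_eq = equivalent_b(u)
```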

`bookkeeping` module¶

Classes for records with connectivity and bookkeeping information.

class old_pdb.bookkeeping.Connection[source]¶

CONECT class

The CONECT records specify connectivity between atoms for which coordinates are supplied. The connectivity is described using the atom serial number as found in the entry. CONECT records are mandatory for HET groups (excluding water) and for other bonds not specified in the standard residue connectivity table which involve atoms in standard residues (see Appendix 4 for the list of standard residues). These records are generated by the PDB.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“CONECT”
7-11	Integer	serial	Atom serial number.
12-16	Integer	serial	Serial number of bonded atom.
17-21	Integer	serial	Serial number of bonded atom.
22-26	Integer	serial	Serial number of bonded atom.
27-31	Integer	serial	Serial number of bonded atom.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
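
A CONECT line holds one atom serial followed by up to four bonded serials, and unused slots are left blank. The sketch below reads a line that way; `parse_conect` is a hypothetical helper written for illustration and is not the `old_pdb` parser.

```python
def parse_conect(line):
    """Return (serial, [bonded serials]) from a CONECT line."""
    line = line.rstrip("\n")
    serial = int(line[6:11])  # columns 7-11
    bonded = []
    for start in (11, 16, 21, 26):  # columns 12-16, 17-21, 22-26, 27-31
        text = line[start:start + 5].strip()
        if text:  # short lines and blank slots are skipped
            bonded.append(int(text))
    return serial, bonded


sample = "CONECT" + f"{1179:5d}{746:5d}{1184:5d}{1195:5d}"
serial, bonded = parse_conect(sample)
```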

class old_pdb.bookkeeping.Master[source]¶

MASTER class

The MASTER record is a control record for bookkeeping. It lists the number of lines in the coordinate entry or file for selected record types.

COLUMNS	DATA TYPE	FIELD	DEFINITION
1-6	Record name	“MASTER”
11-15	Integer	num_remark	Number of REMARK records.
16-20	Integer	“0”
21-25	Integer	num_het	Number of HET records.
26-30	Integer	num_helix	Number of HELIX records.
31-35	Integer	num_sheet	Number of SHEET records.
36-40	Integer	num_turn	Deprecated.
41-45	Integer	num_site	Number of SITE records.
46-50	Integer	num_xform	Number of coordinate transform records (ORIGX+SCALE+MTRIX).
51-55	Integer	num_coord	Number of atomic coordinate records (ATOM+HETATM).
56-60	Integer	num_ter	Number of TER records.
61-65	Integer	num_conect	Number of CONECT records.
66-70	Integer	num_seq	Number of SEQRES records.

parse_line(line)[source]¶

Parse PDB-format line.

Parameters: line (str) – line to parse
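
Since MASTER stores record counts, it can be compared against counts gathered while reading the rest of the entry. The sketch below shows that bookkeeping check in isolation. `parse_master` and `find_mismatches` are hypothetical helpers, and unlike `Entry.check_master()`, which is documented to raise `AssertionError`, this sketch returns the mismatches so a caller can inspect them.

```python
# Column ranges (1-indexed, inclusive) copied from the MASTER table above.
MASTER_FIELDS = [
    ("num_remark", 11, 15),
    ("num_het", 21, 25),
    ("num_helix", 26, 30),
    ("num_sheet", 31, 35),
    ("num_turn", 36, 40),
    ("num_site", 41, 45),
    ("num_xform", 46, 50),
    ("num_coord", 51, 55),
    ("num_ter", 56, 60),
    ("num_conect", 61, 65),
    ("num_seq", 66, 70),
]


def parse_master(line):
    """Slice a MASTER line into a dict of integer counts."""
    line = line.rstrip("\n")
    return {name: int(line[start - 1:end]) for name, start, end in MASTER_FIELDS}


def find_mismatches(master, observed):
    """Return {field: (expected, observed)} for counts that disagree."""
    return {
        name: (master[name], count)
        for name, count in observed.items()
        if master[name] != count
    }


# Twelve 5-wide integers starting at column 11; the second is the fixed "0".
values = [40, 0, 0, 0, 0, 0, 0, 6, 2930, 2, 0, 29]
sample = "MASTER" + " " * 4 + "".join(f"{value:5d}" for value in values)
master = parse_master(sample)
mismatches = find_mismatches(master, {"num_coord": 2930, "num_ter": 3})
```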
