PDB data structures and I/O

PDB entry top-level class

pdb_entry module

Top-level module for PDB structure entries.

The specifications used in this class are derived from the Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description, Version 3.3.

class old_pdb.pdb_entry.Entry[source]

Top-level class for PDB structure entry.

Annotate LINK to indicate whether the named atoms are elements.

Creates two new Boolean attributes in record: is_element1 and is_element2.

Parameters

record (secondary.Link) – record to annotate

Returns

annotated record

property author

annotation.Author AUTHOR record.

property caveat

annotation.Caveat CAVEAT record.

check_master()[source]

Check the contents against internal bookkeeping records.

Raises

AssertionError – if checks fail

property cis_peptide

List of secondary.CisPeptide CISPEP records.

property compound

annotation.Compound COMPND record.

property connect

List of bookkeeping.Connection CONECT records.

property database_reference

List of primary.DatabaseReference DBREF records.

property disulfide_bond

List of secondary.DisulfideBond SSBOND records.

property experimental_data

annotation.ExperimentalData EXPDTA record.

find_atom_by_name(chain_id, residue_id, atom_name, model_num=1) → old_pdb.coordinates.Atom[source]

Find a specific atom by name.

Parameters
  • chain_id (str) – chain ID to find

  • residue_id (int) – residue ID to find

  • atom_name (str) – name of atom to find

  • model_num (int) – model number to use

Returns

ATOM or HETATM object

find_residue(chain_id, residue_id, model_num=1) → list[source]

Find a specific residue.

Parameters
  • chain_id (str) – chain ID to find

  • residue_id (int) – residue ID to find

  • model_num (int) – model number to use

Returns

list of coordinates.Atom-like objects

property frac_transform

List of crystallography.FractionalTransform SCALEn records.

property header

annotation.Header HEADER record.

property helix

List of secondary.Helix HELIX records.

property heterogen

List of heterogen.Heterogen HET records.

property heterogen_formula

heterogen.Formula FORMUL record.

property heterogen_name

List of heterogen.HeterogenName HETNAM records.

property heterogen_synonym

heterogen.HeterogenSynonym HETSYN record.

property journal

annotation.Journal JRNL record.

property keyword

annotation.Keywords KEYWDS record.

List of secondary.Link LINK records.

property master

bookkeeping.Master MASTER record.

property model

List of coordinates.Model MODEL records.

property model_type

annotation.ModelType MDLTYP record.

property modified_residue

List of primary.ModifiedResidue MODRES records.

property noncrystal_transform

List of crystallography.NoncrystalTransform MTRIXn records.

num_atoms(heavy_only=True) → int[source]

Number of ATOM and HETATM entries in all chains in entry.

Parameters

heavy_only (bool) – exclude hydrogen atoms from count

num_chains() → int[source]

Number of chains in entry.

property num_model

annotation.NumModels NUMMDL record.

num_residues(count_hetatm=False) → int[source]

Number of residues in entry.

Parameters

count_hetatm (bool) – include heterogen residues in count

num_ter() → int[source]

Number of TER records in entry.

num_transforms() → int[source]

Return the number of optional transform records in entry.

Returns

number of ORIGXn + SCALEn + MTRIXn
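
As a rough illustration of the count described here, the following sketch tallies transform records by record name alone. It does not use old_pdb; the function name and the sample stubs are hypothetical, and it assumes every transform record starts with one of the names ORIGX1-3, SCALE1-3, or MTRIX1-3.

```python
# Assumption: transform records are identified by their six-character
# record names (ORIGX1-3, SCALE1-3, MTRIX1-3) at the start of each line.
TRANSFORM_PREFIXES = ("ORIGX", "SCALE", "MTRIX")


def count_transform_records(lines):
    """Count ORIGXn, SCALEn, and MTRIXn lines (n = 1, 2, or 3)."""
    count = 0
    for line in lines:
        if line[0:5] in TRANSFORM_PREFIXES and line[5:6] in ("1", "2", "3"):
            count += 1
    return count


# Made-up record stubs; only the record names matter for this count.
sample_lines = ["ORIGX1", "ORIGX2", "ORIGX3", "SCALE1", "SCALE2", "SCALE3", "ATOM  "]
transform_count = count_transform_records(sample_lines)
```

Whether the library counts lines or parsed record objects is not stated here, so treat this only as a way to cross-check a file by hand.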

property obsolete

annotation.Obsolete OBSLTE record.

property original_transform

List of crystallography.OriginalTransform ORIGXn records.

parse_file(file_)[source]

Parse a PDB file.

Parameters

file_ (file) – file open for reading.

parse_line(line)[source]

Parse a line of a PDB file.

Parameters

line (str) – line of PDB file

property remark

List of annotation.Remark REMARK records.

property revision_data

annotation.RevisionData REVDAT record.

property sequence_difference

List of primary.SequenceDifferences SEQADV records.

property sequence_residue

List of primary.SequenceResidues SEQRES records.

property setter

annotation.Author AUTHOR record.

property sheet

List of secondary.Sheet SHEET records.

property source

annotation.Source SOURCE record.

property split

annotation.Split SPLIT record.

property supersedes

annotation.Supersedes SPRSDE record.

property title

annotation.Title TITLE record.

property unit_cell

crystallography.UnitCell CRYST1 record.

PDB records

annotation module

Classes for PDB records that provide annotation information.

class old_pdb.annotation.Author[source]

AUTHOR field

The AUTHOR record contains the names of the people responsible for the contents of the entry.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “AUTHOR”
9-10     Continuation  continuation  Allows concatenation of multiple records.
11-79    List          author_list   List of the author names, separated by commas.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse
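
The continuation field lets one logical AUTHOR record span several lines. A minimal sketch of how such lines might be joined, using the column ranges above (1-based, inclusive) translated to Python slices. The function name and the author names are made up, and the sketch assumes lines arrive in file order.

```python
def join_author_records(lines):
    """Concatenate the author_list field (columns 11-79) of AUTHOR lines."""
    text = ""
    for line in lines:
        if line[0:6] != "AUTHOR":
            continue
        # Columns 11-79 are Python indices 10 through 78.
        text += line[10:79].strip()
    # The field is a comma-separated list of names.
    return [name.strip() for name in text.split(",") if name.strip()]


# Made-up sample: the second line carries continuation number 2 in columns 9-10.
sample_lines = [
    "AUTHOR".ljust(10) + "A.SMITH,B.JONES,",
    "AUTHOR".ljust(8) + " 2" + " C.DOE",
]
authors = join_author_records(sample_lines)
```

The same pattern should carry over to the other records in this module that declare a continuation field, with the slice adjusted to each table.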

class old_pdb.annotation.Caveat[source]

CAVEAT field

CAVEAT warns of severe errors in an entry. Use caution when using an entry containing this record.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “CAVEAT”
9-10     Continuation  continuation  Allows concatenation of multiple records.
12-15    IDcode        id_code       PDB ID code of this entry.
20-79    String        comment       Free text giving the reason for the CAVEAT.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Compound[source]

COMPND field

The COMPND record describes the macromolecular contents of an entry. Each macromolecule found in the entry is described by a set of token: value pairs, and is referred to as a COMPND record component. Since the concept of a molecule is difficult to specify exactly, PDB staff may exercise editorial judgment in consultation with depositors in assigning these names.

For each macromolecular component, the molecule name, synonyms, number assigned by the Enzyme Commission (EC), and other relevant details are specified.

COLUMNS  DATA TYPE           FIELD         DEFINITION
1-6      Record name         “COMPND”
8-10     Continuation        continuation  Allows concatenation of multiple records.
11-80    Specification list  compound      Description of the molecular components.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.ExperimentalData[source]

EXPDTA field

The EXPDTA record identifies the experimental technique used. This may refer to the type of radiation and sample, or include the spectroscopic or modeling technique. Permitted values include:

  • ELECTRON DIFFRACTION

  • FIBER DIFFRACTION

  • FLUORESCENCE TRANSFER

  • NEUTRON DIFFRACTION

  • NMR

  • THEORETICAL MODEL

  • X-RAY DIFFRACTION

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “EXPDTA”
9-10     Continuation  continuation  Allows concatenation of multiple records.
11-79    SList         technique     The experimental technique(s) with optional comment describing the sample or experiment.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Header[source]

HEADER field

The HEADER record uniquely identifies a PDB entry through the id_code field. This record also provides a classification for the entry. Finally, it contains the date the coordinates were deposited at the PDB.

COLUMNS  DATA TYPE    FIELD           DEFINITION
1-6      Record name  “HEADER”
11-50    String(40)   classification  Classifies the molecule(s).
51-59    Date         dep_date        Deposition date. This is the date the coordinates were received at the PDB.
63-66    IDcode       id_code         This identifier is unique within the PDB.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse
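
The column ranges in these tables are 1-based and inclusive, so columns M-N correspond to the Python slice [M-1:N]. The sketch below applies that rule to the HEADER layout. It is an independent illustration, not the library's parse_line; the function name, classification, date, and ID code are invented.

```python
def parse_header(line):
    """Slice a HEADER line using the 1-based, inclusive columns listed above."""
    line = line.ljust(80)  # pad so short lines can still be sliced safely
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "dep_date": line[50:59].strip(),        # columns 51-59
        "id_code": line[62:66].strip(),         # columns 63-66
    }


# Made-up sample built field by field so each value lands in its columns.
sample = "HEADER".ljust(10) + "HYDROLASE".ljust(40) + "01-JAN-00" + "   " + "1ABC"
header = parse_header(sample)
```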

class old_pdb.annotation.Journal[source]

JRNL field

The JRNL record contains the primary literature citation that describes the experiment which resulted in the deposited coordinate set. There is at most one JRNL reference per entry. If there is no primary reference, then there is no JRNL reference. Other references are given in REMARK 1.

COLUMNS  DATA TYPE    FIELD   DEFINITION
1-6      Record name  “JRNL”
13-79    LString      text    See details in PDB specification.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Keywords[source]

KEYWDS field

The KEYWDS record contains a set of terms relevant to the entry. Terms in the KEYWDS record provide a simple means of categorizing entries and may be used to generate index files. This record addresses some of the limitations found in the classification field of the HEADER record. It provides the opportunity to add further annotation to the entry in a concise and computer-searchable fashion.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “KEYWDS”
9-10     Continuation  continuation  Allows concatenation of records if necessary.
11-79    List          keywords      Comma-separated list of keywords relevant to the entry.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.ModelType[source]

MDLTYP field.

The MDLTYP record contains additional annotation pertinent to the coordinates presented in the entry.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “MDLTYP”
9-10     Continuation  continuation  Allows concatenation of multiple records.
11-80    SList         comment       Free Text providing additional structural annotation.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.NumModels[source]

NUMMDL field

The NUMMDL record indicates total number of models in a PDB entry.

COLUMNS  DATA TYPE    FIELD         DEFINITION
1-6      Record name  “NUMMDL”
11-14    Integer      model_number  Number of models.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Obsolete[source]

OBSLTE field

This record acts as a flag in an entry which has been withdrawn from the PDB’s full release. It indicates which, if any, new entries have replaced the withdrawn entry.

The format allows for the case of multiple new entries replacing one existing entry.

COLUMNS  DATA TYPE     FIELD                DEFINITION
1-6      Record name   “OBSLTE”
9-10     Continuation  continuation         Allows concatenation of multiple records
12-20    Date          replace_date         Date that this entry was replaced.
22-25    IDcode        id_code              ID code of this entry.
32-35    IDcode        replace_id_codes[0]  ID of entry replacing this one.
37-40    IDcode        replace_id_codes[1]  ID of entry replacing this one.
42-45    IDcode        replace_id_codes[2]  ID of entry replacing this one.
47-50    IDcode        replace_id_codes[3]  ID of entry replacing this one.
52-55    IDcode        replace_id_codes[4]  ID of entry replacing this one.
57-60    IDcode        replace_id_codes[5]  ID of entry replacing this one.
62-65    IDcode        replace_id_codes[6]  ID of entry replacing this one.
67-70    IDcode        replace_id_codes[7]  ID of entry replacing this one.
72-75    IDcode        replace_id_codes[8]  ID of entry replacing this one.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse
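
The replacement ID codes occupy nine four-character slots spaced five columns apart, starting at column 32. A short sketch of reading them with a strided loop follows. The function name and the ID codes are hypothetical, and this is not the library's own parser.

```python
def parse_obslte(line):
    """Read the date, ID code, and up to nine replacement IDs from an OBSLTE line."""
    line = line.ljust(80)
    # Slots start at columns 32, 37, ..., 72 (indices 31, 36, ..., 71).
    slots = [line[start:start + 4].strip() for start in range(31, 75, 5)]
    return {
        "replace_date": line[11:20].strip(),  # columns 12-20
        "id_code": line[21:25].strip(),       # columns 22-25
        "replace_id_codes": [code for code in slots if code],
    }


# Made-up sample with two replacement entries.
sample = "OBSLTE".ljust(11) + "01-JAN-00" + " " + "1ABC" + " " * 6 + "2DEF" + " " + "3GHI"
obsolete = parse_obslte(sample)
```

Empty slots are dropped here, which is a design choice of the sketch; a parser that needs positional indices could keep them.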

class old_pdb.annotation.Remark[source]

REMARK field

REMARK records present experimental details, annotations, comments, and information not included in other records. In a number of cases, REMARKs are used to expand the contents of other record types. A new level of structure is being used for some REMARK records. This is expected to facilitate searching and will assist in the conversion to a relational database.

parse_line(line)[source]

Initialize by parsing line.

COLUMNS  TYPE  FIELD        DEFINITION
8-10     int   remark_num   Remark number. It is not an error for remark n to exist in an entry when remark n-1 does not.
12-79    str   remark_text  Left as white space in first line of each new remark.

Parameters

line (str) – line with PDB class
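
Because remark numbers need not be contiguous, a dictionary keyed by remark number is one natural way to group the text. The sketch below is an assumption about how a caller might organize REMARK lines, not a description of how Entry.remark stores them; the function name and sample text are invented.

```python
def group_remarks(lines):
    """Group REMARK text (columns 12-79) by remark number (columns 8-10)."""
    remarks = {}
    for line in lines:
        if line[0:6] != "REMARK":
            continue
        line = line.ljust(80)
        remark_num = int(line[7:10])
        remarks.setdefault(remark_num, []).append(line[11:79].rstrip())
    return remarks


# Made-up sample: remark 3 is absent, which the format allows.
sample_lines = [
    "REMARK   2",
    "REMARK   2 RESOLUTION. 2.00 ANGSTROMS.",
    "REMARK   4",
    "REMARK   4 EXAMPLE TEXT.",
]
remarks = group_remarks(sample_lines)
```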

class old_pdb.annotation.Revision[source]

Class to store contents of a single REVDAT modification.

COLUMNS  DATA TYPE     FIELD              DEFINITION
1-6      Record name   “REVDAT”
8-10     Integer       modification_num   Modification number.
11-12    Continuation  continuation       Allows concatenation of multiple records.
14-22    Date          modification_date  Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27    IDCode        modification_id    ID code of this entry. This is not repeated on continuation lines.
32       Integer       modification_type  An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45    LString(6)    record             Modification detail.
47-52    LString(6)    record             Modification detail.
54-59    LString(6)    record             Modification detail.
61-66    LString(6)    record             Modification detail.

parse_line(line)[source]

Parse PDB-format line for specific revision.

Parameters

line (str) – line to parse.

class old_pdb.annotation.RevisionData[source]

REVDAT field

REVDAT records contain a history of the modifications made to an entry since its release.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “REVDAT”
8-10     Integer       modNum        Modification number.
11-12    Continuation  continuation  Allows concatenation of multiple records.
14-22    Date          modDate       Date of modification (or release for new entries) in DD-MMM-YY format. This is not repeated on continued lines.
24-27    IDCode        modId         ID code of this entry. This is not repeated on continuation lines.
32       Integer       modType       An integer identifying the type of modification. For all revisions, the modification type is listed as 1.
40-45    LString(6)    record        Modification detail.
47-52    LString(6)    record        Modification detail.
54-59    LString(6)    record        Modification detail.
61-66    LString(6)    record        Modification detail.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

property revisions

Get revisions.

Returns

dictionary with modification numbers as keys and Revision objects as values
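
To make the shape of this mapping concrete, here is a standalone sketch that groups REVDAT lines by modification number. It uses plain dictionaries where the library uses Revision objects, and the helper revdat_line exists only to build correctly aligned sample lines; both function names and all sample values are hypothetical.

```python
def revdat_line(mod_num, date, id_code, mod_type, records, continuation=""):
    """Hypothetical helper: lay out a REVDAT line in its fixed columns."""
    return ("REVDAT " + "%3d" % mod_num + continuation.rjust(2) + " "
            + date.ljust(9) + " " + id_code.ljust(4) + "    " + mod_type.ljust(1)
            + " " * 7 + " ".join(name.ljust(6) for name in records))


def collect_revisions(lines):
    """Map modification number to the fields gathered from its REVDAT lines."""
    revisions = {}
    for line in lines:
        if line[0:6] != "REVDAT":
            continue
        line = line.ljust(80)
        mod_num = int(line[7:10])  # columns 8-10
        revision = revisions.setdefault(
            mod_num, {"date": "", "id_code": "", "type": None, "records": []}
        )
        if not line[10:12].strip():
            # First line of a revision: date, ID code, and type are present.
            revision["date"] = line[13:22].strip()
            revision["id_code"] = line[23:27].strip()
            if line[31].strip():
                revision["type"] = int(line[31])
        for start in (39, 46, 53, 60):  # columns 40, 47, 54, 61
            name = line[start:start + 6].strip()
            if name:
                revision["records"].append(name)
    return revisions


sample_lines = [
    revdat_line(3, "01-MAR-01", "1ABC", "1", ["REMARK", "HEADER"]),
    revdat_line(3, "", "", "", ["TITLE"], continuation="2"),
    revdat_line(2, "01-FEB-01", "1ABC", "1", ["ATOM"]),
]
revisions = collect_revisions(sample_lines)
```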

class old_pdb.annotation.Site[source]

SITE class

The SITE records supply the identification of groups comprising important sites in the macromolecule.

COLUMNS  DATA TYPE     FIELD      DEFINITION
1-6      Record name   “SITE ”
8-10     Integer       seq_num    Sequence number.
12-14    LString(3)    site_id    Site name.
16-17    Integer       num_res    Number of residues that compose the site.
19-21    Residue name  res_name1  Residue name for first residue that creates the site.
23       Character     chain_id1  Chain identifier for first residue of site.
24-27    Integer       seq1       Residue sequence number for first residue of the site.
28       AChar         ins_code1  Insertion code for first residue of the site.
30-32    Residue name  res_name2  Residue name for second residue that creates the site.
34       Character     chain_id2  Chain identifier for second residue of the site.
35-38    Integer       seq2       Residue sequence number for second residue of the site.
39       AChar         ins_code2  Insertion code for second residue of the site.
41-43    Residue name  res_name3  Residue name for third residue that creates the site.
45       Character     chain_id3  Chain identifier for third residue of the site.
46-49    Integer       seq3       Residue sequence number for third residue of the site.
50       AChar         ins_code3  Insertion code for third residue of the site.
52-54    Residue name  res_name4  Residue name for fourth residue that creates the site.
56       Character     chain_id4  Chain identifier for fourth residue of the site.
57-60    Integer       seq4       Residue sequence number for fourth residue of the site.
61       AChar         ins_code4  Insertion code for fourth residue of the site.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Source[source]

SOURCE field

The SOURCE record specifies the biological and/or chemical source of each biological molecule in the entry. Sources are described by both the common name and the scientific name, e.g., genus and species. Strain and/or cell-line for immortalized cells are given when they help to uniquely identify the biological entity studied.

COLUMNS  DATA TYPE           FIELD         DEFINITION
1-6      Record name         “SOURCE”
8-10     Continuation        continuation  Allows concatenation of multiple records.
11-79    Specification List  source        Identifies the source of the macromolecule in a token: value format.

parse_line(line)[source]

Parse a PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Split[source]

SPLIT field

The SPLIT record is used in instances where a specific entry composes part of a large macromolecular complex. It will identify the PDB entries that are required to reconstitute a complete complex.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “SPLIT ”
9-10     Continuation  continuation  Allows concatenation of multiple records.
12-15    IDcode        id_codes[0]   ID code of related entry.
17-20    IDcode        id_codes[1]   ID code of related entry.
22-25    IDcode        id_codes[2]   ID code of related entry.
27-30    IDcode        id_codes[3]   ID code of related entry.
32-35    IDcode        id_codes[4]   ID code of related entry.
37-40    IDcode        id_codes[5]   ID code of related entry.
42-45    IDcode        id_codes[6]   ID code of related entry.
47-50    IDcode        id_codes[7]   ID code of related entry.
52-55    IDcode        id_codes[8]   ID code of related entry.
57-60    IDcode        id_codes[9]   ID code of related entry.
62-65    IDcode        id_codes[10]  ID code of related entry.
67-70    IDcode        id_codes[11]  ID code of related entry.
72-75    IDcode        id_codes[12]  ID code of related entry.
77-80    IDcode        id_codes[13]  ID code of related entry.

parse_line(line)[source]

Parse input line.

Parameters

line (str) – PDB-format line to parse

class old_pdb.annotation.Supersedes[source]

SPRSDE field

The SPRSDE records contain a list of the ID codes of entries that were made obsolete by the given coordinate entry and withdrawn from the PDB release set. One entry may replace many. It is PDB policy that only the principal investigator of a structure has the authority to withdraw it.

COLUMNS  DATA TYPE     FIELD           DEFINITION
1-6      Record name   “SPRSDE”
9-10     Continuation  continuation    Allows for multiple ID codes.
12-20    Date          super_date      Date entry superseded the listed entries. This field is not copied on continuations.
22-25    IDcode        id_code         ID code of this entry. This field is not copied on continuations.
32-35    IDcode        super_id_codes  ID code of superseded entry.
37-40    IDcode        super_id_codes  ID code of superseded entry.
42-45    IDcode        super_id_codes  ID code of superseded entry.
47-50    IDcode        super_id_codes  ID code of superseded entry.
52-55    IDcode        super_id_codes  ID code of superseded entry.
57-60    IDcode        super_id_codes  ID code of superseded entry.
62-65    IDcode        super_id_codes  ID code of superseded entry.
67-70    IDcode        super_id_codes  ID code of superseded entry.
72-75    IDcode        super_id_codes  ID code of superseded entry.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.annotation.Title[source]

TITLE field

The TITLE record contains a title for the experiment or analysis that is represented in the entry. It should identify an entry in the PDB in the same way that a title identifies a paper.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “TITLE ”
9-10     Continuation  continuation  Allows concatenation of multiple records.
11-80    String        title         Title of the experiment.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

primary module

Classes for PDB records that provide primary structure information.

class old_pdb.primary.DatabaseReference[source]

DBREF record.

The DBREF record provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

COLUMNS  DATA TYPE    FIELD               DEFINITION
1-6      Record name  “DBREF ”
8-11     IDcode       id_code             ID code of this entry.
13       Character    chain_id            Chain identifier.
15-18    Integer      seq_begin           Initial sequence number of PDB sequence segment.
19       AChar        ins_begin           Initial insertion code of PDB sequence segment.
21-24    Integer      seq_end             Ending sequence number of PDB sequence segment.
25       AChar        ins_end             Ending insertion code of PDB sequence segment.
27-32    LString      database            Sequence database name.
34-41    LString      database_accession  Sequence database accession code.
43-54    LString      database_id_code    Sequence database id code.
56-60    Integer      database_seq_begin  Initial sequence number of database segment.
61       AChar        database_ins_begin  Insertion code of initial residue segment, if PDB is reference.
63-67    Integer      database_seq_end    Ending sequence number of segment.
68       AChar        database_ins_end    Insertion code of the segment end, if PDB is reference.

parse_line(line)[source]

Parse DBREF line.

Parameters

line (str) – line to parse

class old_pdb.primary.DatabaseReference1[source]

Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).

COLUMNS  DATA TYPE    FIELD       DEFINITION
1-6      Record name  “DBREF1”
8-11     IDcode       id_code     ID code of this entry.
13       Character    chain_id    Chain identifier.
15-18    Integer      seq_begin   Initial sequence number of the PDB sequence segment, right justified.
19       AChar        ins_begin   Initial insertion code of the PDB sequence segment.
21-24    Integer      seq_end     Ending sequence number of the PDB sequence segment, right justified.
25       AChar        ins_end     Ending insertion code of the PDB sequence segment.
27-32    LString      database    Sequence database name.
48-67    LString      db_id_code  Sequence database identification code, left justified.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line with PDB class

class old_pdb.primary.DatabaseReference2[source]

Provides cross-reference links between PDB sequences (what appears in SEQRES record) and a corresponding database sequence.

This updated two-line format is used when the accession code or sequence numbering does not fit the space allotted in the standard DBREF format. This includes some GenBank sequence numbering (greater than 5 characters) and UNIMES accession numbers (greater than 12 characters).

COLUMNS  DATA TYPE    FIELD         DEFINITION
1-6      Record name  “DBREF2”
8-11     IDcode       id_code       ID code of this entry.
13       Character    chain_id      Chain identifier.
19-40    LString      db_accession  Sequence database accession code left justified.
46-55    Integer      seq_begin     Initial sequence number of the Database segment, right justified.
58-67    Integer      seq_end       Ending sequence number of the Database segment, right justified.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.primary.ModifiedResidue[source]

MODRES field

The MODRES record provides descriptions of modifications (e.g., chemical or post-translational) to protein and nucleic acid residues. Included are a mapping between residue names given in a PDB entry and standard residues.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “MODRES”
8-11     IDcode        id_code       ID code of this entry.
13-15    Residue name  res_name      Residue name used in this entry
17       Character     chain_id      Chain identifier.
19-22    Integer       seq_num       Sequence number.
23       AChar         ins_code      Insertion code.
25-27    Residue name  standard_res  Standard residue name.
30-70    String        comment       Description of the residue modification.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.primary.SequenceDifferences[source]

SEQADV field

The SEQADV record identifies conflicts between sequence information in the ATOM records of the PDB entry and the sequence database entry given on DBREF. Please note that these records were designed to identify differences and not errors. No assumption is made as to which database contains the correct data. PDB may include REMARK records in the entry that reflect the depositor’s view of which database has the correct sequence.

COLUMNS  DATA TYPE     FIELD       DEFINITION
1-6      Record name   “SEQADV”
8-11     IDcode        id_code     ID code of this entry.
13-15    Residue name  res_name    Name of the PDB residue in conflict.
17       Character     chain_id    PDB chain identifier.
19-22    Integer       seq_num     PDB sequence number.
23       AChar         ins_code    PDB insertion code.
25-28    LString       database
30-38    LString       db_id_code  Sequence database accession number.
40-42    Residue name  db_res      Sequence database residue name.
44-48    Integer       db_seq      Sequence database sequence number.
50-70    LString       conflict    Conflict comment.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.primary.SequenceResidues[source]

SEQRES field

SEQRES records contain the amino acid or nucleic acid sequence of residues in each chain of the macromolecule that was studied.

COLUMNS  DATA TYPE     FIELD    DEFINITION
1-6      Record name   “SEQRES”
8-10     Integer       serNum   Serial number of the SEQRES record for the current chain. Starts at 1 and increments by one each line. Reset to 1 for each chain.
12       Character     chainID  Chain identifier. This may be any single legal character, including a blank, which is used if there is only one chain.
14-17    Integer       numRes   Number of residues in the chain. This value is repeated on every record.
20-22    Residue name  resName  Residue name.
24-26    Residue name  resName  Residue name.
28-30    Residue name  resName  Residue name.
32-34    Residue name  resName  Residue name.
36-38    Residue name  resName  Residue name.
40-42    Residue name  resName  Residue name.
44-46    Residue name  resName  Residue name.
48-50    Residue name  resName  Residue name.
52-54    Residue name  resName  Residue name.
56-58    Residue name  resName  Residue name.
60-62    Residue name  resName  Residue name.
64-66    Residue name  resName  Residue name.
68-70    Residue name  resName  Residue name.

num_chains() → int[source]

Number of chains in sequence.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

property residues

Dictionary of residues indexed by chain id.

Returns

dictionary with chain IDs as keys and lists of residue names as values.
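
The mapping described here can be illustrated without the library. The sketch below accumulates residue names per chain from SEQRES lines, reading the thirteen three-character slots that start at column 20. Function names and sample sequences are hypothetical, and seqres_line exists only to produce aligned sample input.

```python
def seqres_line(ser_num, chain_id, num_res, names):
    """Hypothetical helper: lay out a SEQRES line in its fixed columns."""
    return "SEQRES %3d %s %4d  %s" % (ser_num, chain_id, num_res, " ".join(names))


def collect_residues(lines):
    """Map chain ID (column 12) to the residue names listed for that chain."""
    residues = {}
    for line in lines:
        if line[0:6] != "SEQRES":
            continue
        line = line.ljust(80)
        chain_id = line[11]
        # Slots start at columns 20, 24, ..., 68 (indices 19, 23, ..., 67).
        names = [line[start:start + 3].strip() for start in range(19, 70, 4)]
        residues.setdefault(chain_id, []).extend(name for name in names if name)
    return residues


sample_lines = [
    seqres_line(1, "A", 3, ["MET", "ALA", "GLY"]),
    seqres_line(1, "B", 2, ["SER", "THR"]),
]
residues = collect_residues(sample_lines)
chain_count = len(residues)  # analogous in spirit to num_chains()
```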

heterogen module

Classes for PDB records that provide heterogen information.

class old_pdb.heterogen.Formula[source]

FORMUL field

The FORMUL record presents the chemical formula and charge of a non-standard group.

COLUMNS  DATA TYPE    FIELD         DEFINITION
1-6      Record name  “FORMUL”
9-10     Integer      compNum       Component number.
13-15    LString(3)   hetID         Het identifier.
17-18    Integer      continuation  Continuation number.
19       Character    asterisk      “*” for water.
20-70    String       text          Chemical formula.

property components

Formulae for components.

Returns

dictionary with component numbers as keys and values that consist of tuples of the hetatom ID and the formula text.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse
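
The components mapping described above pairs each component number with a (het ID, formula text) tuple. A rough standalone sketch of building such a mapping from FORMUL lines follows. It is an assumption about the general approach rather than the library's code, and the function names, component numbers, and formulas are made up for illustration.

```python
def formul_line(comp_num, het_id, text, continuation="", asterisk=""):
    """Hypothetical helper: lay out a FORMUL line in its fixed columns."""
    return "FORMUL  %2d  %-3s %2s%1s%s" % (comp_num, het_id, continuation, asterisk, text)


def collect_components(lines):
    """Map component number to a (het ID, formula text) tuple."""
    components = {}
    for line in lines:
        if line[0:6] != "FORMUL":
            continue
        line = line.ljust(80)
        comp_num = int(line[8:10])     # columns 9-10
        het_id = line[12:15].strip()   # columns 13-15
        text = line[19:70].strip()     # columns 20-70
        # Append to any text already gathered from earlier lines.
        _, previous = components.get(comp_num, (het_id, ""))
        components[comp_num] = (het_id, previous + text)
    return components


sample_lines = [
    formul_line(1, "NAG", "C8 H15 N O6"),
    formul_line(2, "HOH", "3(H2 O)", asterisk="*"),
]
components = collect_components(sample_lines)
```

The water asterisk in column 19 is ignored in this sketch; a fuller parser might record it as a flag.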

class old_pdb.heterogen.Heterogen[source]

HET field

HET records are used to describe non-standard residues, such as prosthetic groups, inhibitors, solvent molecules, and ions for which coordinates are supplied. Groups are considered HET if they are:

  • not one of the standard amino acids, and

  • not one of the nucleic acids (C, G, A, T, U, and I), and

  • not one of the modified versions of nucleic acids (+C, +G, +A, +T, +U, and +I), and

  • not an unknown amino acid or nucleic acid where UNK is used to indicate the unknown residue name.

Het records also describe heterogens for which the chemical identity is unknown, in which case the group is assigned the hetatm_id UNK.

COLUMNS  DATA TYPE    FIELD          DEFINITION
1-6      Record name  “HET ”
8-10     LString(3)   het_id         Identifier, right-justified.
13       Character    chain_id       Chain identifier.
14-17    Integer      seq_num        Sequence number.
18       AChar        ins_code       Insertion code.
21-25    Integer      num_het_atoms  Number of HETATM records for the group present in the entry.
31-70    String       text           Text describing Het group.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.heterogen.HeterogenName[source]

HETNAM field

This record gives the chemical name of the compound with the given hetatm_id.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “HETNAM”
9-10     Continuation  continuation  Allows concatenation of multiple records.
12-14    LString(3)    het_id        Het identifier, right-justified.
16-70    String        text          Chemical name.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.heterogen.HeterogenSynonym[source]

HETSYN field

This record provides synonyms, if any, for the compound in the corresponding (i.e., same hetatm_id) HETNAM record. This is to allow greater flexibility in searching for HET groups.

COLUMNS  DATA TYPE     FIELD         DEFINITION
1-6      Record name   “HETSYN”
9-10     Continuation  continuation  Allows concatenation of multiple records.
12-14    LString(3)    het_id        Het identifier, right-justified.
16-70    SList         synonyms      List of synonyms.

parse_line(line)[source]

Parse line of PDB file.

Parameters

line (str) – PDB file line to parse

secondary module

Classes for records with secondary structure and connectivity information.

class old_pdb.secondary.CisPeptide[source]

CISPEP field

CISPEP records specify the prolines and other peptides found to be in the cis conformation. This record replaces the use of footnote records to list cis peptides.

COLUMNS  DATA TYPE    FIELD      DEFINITION
1-6      Record name  “CISPEP”
8-10     Integer      ser_num    Record serial number.
12-14    LString(3)   pep1       Residue name.
16       Character    chain_id1  Chain identifier.
18-21    Integer      seq_num1   Residue sequence number.
22       AChar        icode1     Insertion code.
26-28    LString(3)   pep2       Residue name.
30       Character    chain_id2  Chain identifier.
32-35    Integer      seq_num2   Residue sequence number.
36       AChar        icode2     Insertion code.
44-46    Integer      mod_num    Identifies the specific model.
54-59    Real(6.2)    measure    Angle measurement in degrees.

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.secondary.DisulfideBond[source]

SSBOND field

The SSBOND record identifies each disulfide bond in protein and polypeptide structures by identifying the two residues involved in the bond.

COLUMNS  DATA TYPE    FIELD      DEFINITION
1-6      Record name  “SSBOND”
8-10     Integer      ser_num    Serial number.
12-14    LString(3)   “CYS”      Residue name.
16       Character    chain_id1  Chain identifier.
18-21    Integer      seq_num1   Residue sequence number.
22       AChar        icode1     Insertion code.
26-28    LString(3)   “CYS”      Residue name.
30       Character    chain_id2  Chain identifier.
32-35    Integer      seq_num2   Residue sequence number.
36       AChar        icode2     Insertion code.
60-65    SymOP        sym1       Symmetry operator for residue 1.
67-72    SymOP        sym2       Symmetry operator for residue 2.
74-78    Real(5.2)    length     Disulfide bond distance

parse_line(line)[source]

Parse PDB-format line.

Parameters

line (str) – line to parse
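
Records with numeric fields need type conversion after slicing. The sketch below parses one SSBOND line into integers, strings, and a float according to the columns above. It does not call the library; the function name, residue numbers, symmetry operators, and distance are sample values chosen for illustration.

```python
def parse_ssbond(line):
    """Slice an SSBOND line and convert the numeric fields."""
    line = line.ljust(80)
    return {
        "ser_num": int(line[7:10]),      # columns 8-10
        "chain_id1": line[15],           # column 16
        "seq_num1": int(line[17:21]),    # columns 18-21
        "icode1": line[21].strip(),      # column 22
        "chain_id2": line[29],           # column 30
        "seq_num2": int(line[31:35]),    # columns 32-35
        "icode2": line[35].strip(),      # column 36
        "sym1": line[59:65].strip(),     # columns 60-65
        "sym2": line[66:72].strip(),     # columns 67-72
        "length": float(line[73:78]),    # columns 74-78, Real(5.2)
    }


# Sample line assembled with format widths so each field lands in its columns.
sample = ("SSBOND %3d CYS %s %4d%s   CYS %s %4d%s" % (1, "A", 6, " ", "A", 127, " ")
          + " " * 23 + "%6s %6s %5.2f" % ("1555", "1555", 2.03))
bond = parse_ssbond(sample)
```

A production parser would also need to handle blank numeric fields, which int() and float() reject.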

class old_pdb.secondary.Helix[source]

HELIX field

HELIX records are used to identify the position of helices in the molecule. Helices are both named and numbered. The residues where the helix begins and ends are noted, as well as the total length.

COLUMNS  DATA TYPE     FIELD          DEFINITION
1-6      Record name   “HELIX ”
8-10     Integer       serNum         Serial number of the helix. This starts at 1 and increases incrementally.
12-14    LString(3)    helix_id       Helix identifier. In addition to a serial number, each helix is given an alphanumeric helix identifier.
16-18    Residue name  init_res_name  Name of the initial residue.
20       Character     init_chain_id  Chain identifier for the chain containing this helix.
22-25    Integer       init_seq_num   Sequence number of the initial residue.
26       AChar         init_i_code    Insertion code of the initial residue.
28-30    Residue name  end_res_name   Name of the terminal residue of the helix.
32       Character     end_chain_id   Chain identifier for the chain containing this helix.
34-37    Integer       end_seq_num    Sequence number of the terminal residue.
38       AChar         end_i_code     Insertion code of the terminal residue.
39-40    Integer       helix_class    Helix class (see below).
41-70    String        comment        Comment about this helix.
72-76    Integer       length         Length of this helix.

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
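The helix_class field is an integer code. The sketch below decodes it using the helix classes listed in the PDB format description (version 3.3) and slices a few other HELIX fields by column. `HELIX_CLASSES` and `parse_helix` are illustrative names, not part of old_pdb, and the dictionary keys are chosen for this example only.

```python
# Helix class codes as listed in the PDB format description, version 3.3.
HELIX_CLASSES = {
    1: "Right-handed alpha (default)",
    2: "Right-handed omega",
    3: "Right-handed pi",
    4: "Right-handed gamma",
    5: "Right-handed 3-10",
    6: "Left-handed alpha",
    7: "Left-handed omega",
    8: "Left-handed gamma",
    9: "2-7 ribbon/helix",
    10: "Polyproline",
}


def parse_helix(line):
    """Slice selected HELIX fields (columns are 1-based and inclusive)."""
    if line[0:6] != "HELIX ":
        raise ValueError("not a HELIX record")
    helix_class = int(line[38:40])          # columns 39-40
    return {
        "serial": int(line[7:10]),          # columns 8-10
        "helix_id": line[11:14].strip(),    # columns 12-14
        "init_seq_num": int(line[21:25]),   # columns 22-25
        "end_seq_num": int(line[33:37]),    # columns 34-37
        "helix_class": helix_class,
        "class_name": HELIX_CLASSES.get(helix_class, "unknown"),
        "comment": line[40:70].strip(),     # columns 41-70
        "length": int(line[71:76]),         # columns 72-76
    }


# Sample line assembled field by field so every value lands in its column.
sample = (
    "HELIX " + " " + f"{1:>3}" + " " + f"{'H1':>3}" + " ILE " + "A" + " "
    + f"{7:>4}" + " " + " PRO " + "A" + " " + f"{19:>4}" + " "
    + f"{1:>2}" + " " * 30 + " " + f"{13:>5}"
)
helix = parse_helix(sample)
print(helix["helix_id"], helix["class_name"], helix["length"])
```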

class old_pdb.secondary.Link

LINK record

The LINK records specify connectivity between residues that is not implied by the primary structure. Connectivity is expressed in terms of the atom names. This record supplements information given in CONECT records and is provided here for convenience in searching.

=======  ============  =========  ========================================
COLUMNS  DATA TYPE     FIELD      DEFINITION
=======  ============  =========  ========================================
1-6      Record name   "LINK  "
13-16    Atom          name1      Atom name.
17       Character     alt_loc1   Alternate location indicator.
18-20    Residue name  res_name1  Residue name.
22       Character     chain_id1  Chain identifier.
23-26    Integer       res_seq1   Residue sequence number.
27       AChar         ins_code1  Insertion code.
43-46    Atom          name2      Atom name.
47       Character     alt_loc2   Alternate location indicator.
48-50    Residue name  res_name2  Residue name.
52       Character     chain_id2  Chain identifier.
53-56    Integer       res_seq2   Residue sequence number.
57       AChar         ins_code2  Insertion code.
60-65    SymOP         sym1       Symmetry operator for atom 1.
67-72    SymOP         sym2       Symmetry operator for atom 2.
74-78    Real(5.2)     Length     Link distance.
=======  ============  =========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.secondary.Sheet

SHEET record

SHEET records are used to identify the position of sheets in the molecule. Sheets are both named and numbered. The residues where the sheet begins and ends are noted.

=======  ============  =============  ========================================
COLUMNS  DATA TYPE     FIELD          DEFINITION
=======  ============  =============  ========================================
1-6      Record name   "SHEET "
8-10     Integer       strand         Strand number, which starts at 1 for each strand within a sheet and increases by one.
12-14    LString(3)    sheet_id       Sheet identifier.
15-16    Integer       num_strands    Number of strands in sheet.
18-20    Residue name  init_res_name  Name of initial residue.
22       Character     init_chain_id  Chain identifier of initial residue in strand.
23-26    Integer       init_seq_num   Sequence number of initial residue in strand.
27       AChar         init_ins_code  Insertion code of initial residue in strand.
29-31    Residue name  end_res_name   Name of terminal residue.
33       Character     end_chain_id   Chain identifier of terminal residue.
34-37    Integer       end_seq_num    Sequence number of terminal residue.
38       AChar         end_ins_code   Insertion code of terminal residue.
39-40    Integer       sense          Sense of strand with respect to previous strand in the sheet. 0 if first strand, 1 if parallel, and -1 if anti-parallel.
42-45    Atom          cur_atom       Registration. Atom name in current strand.
46-48    Residue name  cur_res_name   Registration. Residue name in current strand.
50       Character     cur_chain_id   Registration. Chain identifier in current strand.
51-54    Integer       cur_res_seq    Registration. Residue sequence number in current strand.
55       AChar         cur_ins_code   Registration. Insertion code in current strand.
57-60    Atom          prev_atom      Registration. Atom name in previous strand.
61-63    Residue name  prev_res_name  Registration. Residue name in previous strand.
65       Character     prev_chain_id  Registration. Chain identifier in previous strand.
66-69    Integer       prev_res_seq   Registration. Residue sequence number in previous strand.
70       AChar         prev_ins_code  Registration. Insertion code in previous strand.
=======  ============  =============  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
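As a rough illustration of the strand bookkeeping fields, the sketch below formats and then re-parses the first 40 columns of two SHEET lines and decodes the sense field. `format_sheet`, `parse_sheet`, and `SENSE` are hypothetical helpers written for this example; they are not part of old_pdb, and the registration fields (columns 42-70) are left out.

```python
SENSE = {0: "first strand", 1: "parallel", -1: "anti-parallel"}


def format_sheet(strand, sheet_id, num_strands, init, end, sense):
    """Lay out columns 1-40 of a SHEET line; registration fields are omitted."""
    (res1, chain1, seq1), (res2, chain2, seq2) = init, end
    return (
        f"SHEET  {strand:>3} {sheet_id:>3}{num_strands:>2} "
        f"{res1:>3} {chain1}{seq1:>4}  {res2:>3} {chain2}{seq2:>4} {sense:>2}"
    )


def parse_sheet(line):
    """Slice the strand bookkeeping fields (1-based, inclusive columns)."""
    if line[0:6] != "SHEET ":
        raise ValueError("not a SHEET record")
    sense = int(line[38:40])                 # columns 39-40
    return {
        "strand": int(line[7:10]),           # columns 8-10
        "sheet_id": line[11:14].strip(),     # columns 12-14
        "num_strands": int(line[14:16]),     # columns 15-16
        "init_seq_num": int(line[22:26]),    # columns 23-26
        "end_seq_num": int(line[33:37]),     # columns 34-37
        "sense": sense,
        "sense_label": SENSE[sense],
    }


lines = [
    format_sheet(1, "A", 2, ("THR", "A", 2), ("ILE", "A", 8), 0),
    format_sheet(2, "A", 2, ("LYS", "A", 40), ("VAL", "A", 46), -1),
]
strands = [parse_sheet(line) for line in lines]
for s in strands:
    print(s["sheet_id"], s["strand"], s["sense_label"])
```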

crystallography module

Classes for records with crystallographic information.

class old_pdb.crystallography.FractionalTransform(n)

SCALEn base class

The SCALEn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates as contained in the entry to fractional crystallographic coordinates. Non-standard coordinate systems should be explained in the remarks.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "SCALEn"  n=1, 2, or 3
11-20    Real(10.6)    sn1       Sn1
21-30    Real(10.6)    sn2       Sn2
31-40    Real(10.6)    sn3       Sn3
46-55    Real(10.5)    unif      Un
=======  ============  ========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
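Taken together, the three SCALEn records form a 3x3 matrix S and a translation vector U, and a fractional coordinate is obtained as S·x + U. The sketch below applies that transformation with plain Python lists. `parse_scale` and `to_fractional` are hypothetical helpers, not the `FractionalTransform` API, and the sample records describe an assumed orthorhombic cell of 50 x 60 x 80 Angstroms.

```python
def parse_scale(line):
    """Return (row index n, [sn1, sn2, sn3], un) from one SCALEn line."""
    if line[0:5] != "SCALE" or line[5] not in "123":
        raise ValueError("not a SCALEn record")
    n = int(line[5])
    row = [float(line[10:20]), float(line[20:30]), float(line[30:40])]
    return n, row, float(line[45:55])    # columns 11-40 and 46-55


def to_fractional(lines, xyz):
    """Apply frac[i] = sum_j S[i][j] * xyz[j] + U[i] using three SCALEn lines."""
    rows = {}
    for line in lines:
        n, row, u = parse_scale(line)
        rows[n] = (row, u)
    return [
        sum(s * c for s, c in zip(rows[n][0], xyz)) + rows[n][1]
        for n in (1, 2, 3)
    ]


# Assumed orthorhombic cell: S is diagonal with entries 1/a, 1/b, 1/c.
cell_edges = (50.0, 60.0, 80.0)
lines = []
for n, edge in enumerate(cell_edges, start=1):
    row = [0.0, 0.0, 0.0]
    row[n - 1] = 1.0 / edge
    lines.append(
        f"SCALE{n}    {row[0]:10.6f}{row[1]:10.6f}{row[2]:10.6f}     {0.0:10.5f}"
    )

frac = to_fractional(lines, (25.0, 30.0, 40.0))
print(frac)
```

Because the matrix entries are stored with six decimal places, the result matches the exact fractional coordinate only to within rounding of the record.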

class old_pdb.crystallography.NoncrystalTransform(n)

MTRIXn base class

The MTRIXn (n = 1, 2, or 3) records present transformations expressing non-crystallographic symmetry.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "MTRIXn"  n=1, 2, or 3
8-10     Integer       serial    Serial number.
11-20    Real(10.6)    mn1       Mn1
21-30    Real(10.6)    mn2       Mn2
31-40    Real(10.6)    mn3       Mn3
46-55    Real(10.5)    vn        Vn
60       Integer       i_given   1 if the entry contains coordinates for the representations that are approximately related by the transformations of the molecule. Otherwise, blank.
=======  ============  ========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse

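class old_pdb.crystallography.OriginalTransform(n)

ORIGXn class

The ORIGXn (n = 1, 2, or 3) records present the transformation from the orthogonal coordinates contained in the entry to the submitted coordinates.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "ORIGXn"  n=1, 2, or 3
11-20    Real(10.6)    on1       On1
21-30    Real(10.6)    on2       On2
31-40    Real(10.6)    on3       On3
46-55    Real(10.5)    tn        Tn
=======  ============  ========  ========================================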

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.crystallography.UnitCell

CRYST1 class

The CRYST1 record presents the unit cell parameters, space group, and Z value. If the structure was not determined by crystallographic means, CRYST1 simply defines a unit cube.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "CRYST1"
7-15     Real(9.3)     a         a (Angstroms).
16-24    Real(9.3)     b         b (Angstroms).
25-33    Real(9.3)     c         c (Angstroms).
34-40    Real(7.2)     alpha     alpha (degrees).
41-47    Real(7.2)     beta      beta (degrees).
48-54    Real(7.2)     gamma     gamma (degrees).
56-66    LString       sGroup    Space group.
67-70    Integer       z         Z value.
=======  ============  ========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
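One common use of the CRYST1 fields is computing the unit cell volume. The sketch below slices the record by column and applies the general triclinic volume formula, V = abc·sqrt(1 - cos²α - cos²β - cos²γ + 2·cosα·cosβ·cosγ). `parse_cryst1` and `cell_volume` are hypothetical helpers for illustration and are not part of the `UnitCell` class.

```python
import math


def parse_cryst1(line):
    """Slice a CRYST1 line into unit cell parameters (1-based columns)."""
    if line[0:6] != "CRYST1":
        raise ValueError("not a CRYST1 record")
    return {
        "a": float(line[6:15]),           # columns 7-15
        "b": float(line[15:24]),          # columns 16-24
        "c": float(line[24:33]),          # columns 25-33
        "alpha": float(line[33:40]),      # columns 34-40
        "beta": float(line[40:47]),       # columns 41-47
        "gamma": float(line[47:54]),      # columns 48-54
        "s_group": line[55:66].strip(),   # columns 56-66
        "z": int(line[66:70]),            # columns 67-70
    }


def cell_volume(cell):
    """Volume of a general (triclinic) cell in cubic Angstroms."""
    ca, cb, cg = (
        math.cos(math.radians(cell[key])) for key in ("alpha", "beta", "gamma")
    )
    root = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return cell["a"] * cell["b"] * cell["c"] * root


# Sample record built with explicit field widths.
sample = (
    f"CRYST1{52.0:9.3f}{58.6:9.3f}{61.9:9.3f}"
    f"{90.0:7.2f}{90.0:7.2f}{90.0:7.2f} {'P 21 21 21':<11}{8:>4}"
)
cell = parse_cryst1(sample)
volume = cell_volume(cell)
print(cell["s_group"], round(volume, 1))
```

For a cell with all angles at 90 degrees the square-root term is 1, so the volume reduces to a·b·c.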

coordinates module

Classes for records with coordinate information.

class old_pdb.coordinates.Atom

ATOM class

The ATOM records present the atomic coordinates for standard residues. They also present the occupancy and temperature factor for each atom. Heterogen coordinates use the HETATM record type. The element symbol is always present on each ATOM record; segment identifier and charge are optional.

=======  ============  ===========  ========================================
COLUMNS  DATA TYPE     FIELD        DEFINITION
=======  ============  ===========  ========================================
1-6      Record name   "ATOM  "
7-11     Integer       serial       Atom serial number.
13-16    Atom          name         Atom name.
17       Character     alt_loc      Alternate location indicator.
18-20    Residue name  res_name     Residue name.
22       Character     chain_id     Chain identifier.
23-26    Integer       res_seq      Residue sequence number.
27       AChar         ins_code     Code for insertion of residues.
31-38    Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39-46    Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47-54    Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55-60    Real(6.2)     occupancy    Occupancy.
61-66    Real(6.2)     temp_factor  Temperature factor.
77-78    LString(2)    element      Element symbol, right-justified.
79-80    LString(2)    charge       Charge on the atom.
=======  ============  ===========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
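The ATOM layout is the one most often parsed by hand, so a minimal column-slicing sketch may help. `format_atom` and `parse_coordinate_line` are hypothetical helpers, not the `Atom` class itself. Because the ATOM and HETATM tables in this reference list the same columns, the parser accepts both record names.

```python
def format_atom(serial, name, res_name, chain_id, res_seq, xyz, element,
                occupancy=1.0, temp_factor=0.0, record="ATOM"):
    """Lay out an 80-column coordinate record with explicit field widths."""
    x, y, z = xyz
    return (
        f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain_id}{res_seq:>4}"
        f"    {x:8.3f}{y:8.3f}{z:8.3f}{occupancy:6.2f}{temp_factor:6.2f}"
        f"{'':10}{element:>2}{'':2}"
    )


def parse_coordinate_line(line):
    """Slice an ATOM or HETATM line (columns are 1-based and inclusive)."""
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM or HETATM record")
    return {
        "record": record,
        "serial": int(line[6:11]),             # columns 7-11
        "name": line[12:16].strip(),           # columns 13-16
        "alt_loc": line[16].strip(),           # column 17
        "res_name": line[17:20].strip(),       # columns 18-20
        "chain_id": line[21],                  # column 22
        "res_seq": int(line[22:26]),           # columns 23-26
        "ins_code": line[26].strip(),          # column 27
        "x": float(line[30:38]),               # columns 31-38
        "y": float(line[38:46]),               # columns 39-46
        "z": float(line[46:54]),               # columns 47-54
        "occupancy": float(line[54:60]),       # columns 55-60
        "temp_factor": float(line[60:66]),     # columns 61-66
        "element": line[76:78].strip(),        # columns 77-78
        "charge": line[78:80].strip(),         # columns 79-80
    }


sample = format_atom(2, " CA ", "ALA", "A", 1, (11.104, 6.134, -6.504), "C",
                     temp_factor=20.5)
atom = parse_coordinate_line(sample)
print(atom["name"], atom["x"], atom["y"], atom["z"])
```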

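class old_pdb.coordinates.ChainTerminus

TER class

The TER record indicates the end of a list of ATOM/HETATM records for a chain.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "TER   "
7-11     Integer       serial    Serial number.
18-20    Residue name  res_name  Residue name.
22       Character     chain_id  Chain identifier.
23-26    Integer       res_seq   Residue sequence number.
27       AChar         ins_code  Insertion code.
=======  ============  ========  ========================================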

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse

class old_pdb.coordinates.HeterogenAtom

HETATM class

The HETATM records present the atomic coordinate records for atoms within "non-standard" groups. These records are used for water molecules and atoms presented in HET groups.

=======  ============  ===========  ========================================
COLUMNS  DATA TYPE     FIELD        DEFINITION
=======  ============  ===========  ========================================
1-6      Record name   "HETATM"
7-11     Integer       serial       Atom serial number.
13-16    Atom          name         Atom name.
17       Character     alt_loc      Alternate location indicator.
18-20    Residue name  res_name     Residue name.
22       Character     chain_id     Chain identifier.
23-26    Integer       res_seq      Residue sequence number.
27       AChar         ins_code     Code for insertion of residues.
31-38    Real(8.3)     x            Orthogonal coordinates for X.
39-46    Real(8.3)     y            Orthogonal coordinates for Y.
47-54    Real(8.3)     z            Orthogonal coordinates for Z.
55-60    Real(6.2)     occupancy    Occupancy.
61-66    Real(6.2)     temp_factor  Temperature factor.
77-78    LString(2)    element      Element symbol; right-justified.
79-80    LString(2)    charge       Charge on the atom.
=======  ============  ===========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse

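class old_pdb.coordinates.Model

MODEL class

The MODEL record specifies the model serial number when multiple structures are presented in a single coordinate entry, as is often the case with structures determined by NMR.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "MODEL "
11-14    Integer       serial    Model serial number.
=======  ============  ========  ========================================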

property all_atoms

Get all atoms in model.

Returns

list of Atom-like objects

property atoms

Get ATOM atoms in model.

Returns

list of Atom-like objects

property het_atoms

Get HETATM atoms in model.

Returns

list of Atom-like objects

num_atoms(heavy_only) → int

Number of ATOM and HETATM entries in all chains in model.

Parameters

heavy_only (bool) – exclude hydrogen atoms from count

num_chains() → int

Count number of chains in model.

num_residues(count_hetatm) → int

Number of residues in model.

Parameters

count_hetatm (bool) – include heterogen residues in count

num_ter() → int

Count number of termini in model.

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
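To show the kind of per-model count that `num_atoms(heavy_only)` describes, the sketch below tallies ATOM and HETATM records by MODEL serial number, optionally skipping hydrogens. `count_atoms_by_model` and `coord_line` are hypothetical helpers that work on raw lines, not on `Model` objects. The sketch assumes that hydrogens are identified by the element field in columns 77-78 and that coordinates outside any MODEL block belong to model 1.

```python
def coord_line(record, serial, element):
    """Pad a coordinate record so the element symbol sits in columns 77-78."""
    return f"{record:<6}{serial:>5}".ljust(76) + f"{element:>2}"


def count_atoms_by_model(lines, heavy_only=False):
    """Count ATOM and HETATM records per MODEL serial (columns 11-14)."""
    counts = {}
    current = 1  # assumption: records before any MODEL line belong to model 1
    for line in lines:
        record = line[0:6]
        if record == "MODEL ":
            current = int(line[10:14])
        elif record in ("ATOM  ", "HETATM"):
            element = line[76:78].strip().upper()
            if heavy_only and element == "H":
                continue
            counts[current] = counts.get(current, 0) + 1
    return counts


lines = [
    f"MODEL     {1:>4}",
    coord_line("ATOM", 1, "C"),
    coord_line("ATOM", 2, "H"),
    coord_line("HETATM", 3, "O"),
    "ENDMDL",
    f"MODEL     {2:>4}",
    coord_line("ATOM", 1, "C"),
    coord_line("ATOM", 2, "H"),
    "ENDMDL",
]
all_counts = count_atoms_by_model(lines)
heavy_counts = count_atoms_by_model(lines, heavy_only=True)
print(all_counts, heavy_counts)
```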

class old_pdb.coordinates.TemperatureFactor

ANISOU class

The ANISOU records present the anisotropic temperature factors.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "ANISOU"
7-11     Integer       serial    Atom serial number.
13-16    Atom          name      Atom name.
17       Character     alt_loc   Alternate location indicator.
18-20    Residue name  res_name  Residue name.
22       Character     chain_id  Chain identifier.
23-26    Integer       res_seq   Residue sequence number.
27       AChar         ins_code  Insertion code.
29-35    Integer       u00       U(1,1)
36-42    Integer       u11       U(2,2)
43-49    Integer       u22       U(3,3)
50-56    Integer       u01       U(1,2)
57-63    Integer       u02       U(1,3)
64-70    Integer       u12       U(2,3)
77-78    LString(2)    element   Element symbol, right-justified.
79-80    LString(2)    charge    Charge on the atom.
=======  ============  ========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
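The six U fields are stored as integers because, per the PDB format description, the anisotropic temperature factors are scaled by 10**4 (in square Angstroms). The sketch below reads them and rebuilds the symmetric 3x3 tensor. `parse_anisou` is a hypothetical helper, and whether `TemperatureFactor` itself applies the 10**-4 scaling is not stated in this reference.

```python
def parse_anisou(line):
    """Return the symmetric 3x3 displacement tensor in square Angstroms."""
    if line[0:6] != "ANISOU":
        raise ValueError("not an ANISOU record")
    # Six 7-column integers starting at column 29, in the order
    # U(1,1), U(2,2), U(3,3), U(1,2), U(1,3), U(2,3).
    raw = [int(line[start:start + 7]) for start in range(28, 70, 7)]
    u11, u22, u33, u12, u13, u23 = (value * 1e-4 for value in raw)
    return [
        [u11, u12, u13],
        [u12, u22, u23],
        [u13, u23, u33],
    ]


# Sample record built with explicit field widths.
values = (2406, 1892, 1614, 198, 519, -328)
name, res_name, chain_id = " N  ", "GLY", "A"
sample = (
    f"ANISOU{107:>5} {name:<4} {res_name:>3} {chain_id}{13:>4}  "
    + "".join(f"{value:>7}" for value in values)
)
tensor = parse_anisou(sample)
for row in tensor:
    print(row)
```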

bookkeeping module

Classes for records with connectivity and bookkeeping information.

class old_pdb.bookkeeping.Connection

CONECT class

The CONECT records specify connectivity between atoms for which coordinates are supplied. The connectivity is described using the atom serial number as found in the entry. CONECT records are mandatory for HET groups (excluding water) and for other bonds not specified in the standard residue connectivity table which involve atoms in standard residues (see Appendix 4 of the PDB format description for the list of standard residues). These records are generated by the PDB.

=======  ============  ========  ========================================
COLUMNS  DATA TYPE     FIELD     DEFINITION
=======  ============  ========  ========================================
1-6      Record name   "CONECT"
7-11     Integer       serial    Atom serial number.
12-16    Integer       serial    Serial number of bonded atom.
17-21    Integer       serial    Serial number of bonded atom.
22-26    Integer       serial    Serial number of bonded atom.
27-31    Integer       serial    Serial number of bonded atom.
=======  ============  ========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
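A CONECT record holds at most four bonded serial numbers, so an atom with more bonds is spread over several records. The sketch below reads the 5-column fields and merges records into an adjacency mapping. `parse_conect` and `build_adjacency` are hypothetical helpers, not the `Connection` API; blank bonded-atom fields are simply skipped.

```python
def parse_conect(line):
    """Return (serial, [bonded serials]) from one CONECT line."""
    if line[0:6] != "CONECT":
        raise ValueError("not a CONECT record")
    serial = int(line[6:11])                  # columns 7-11
    bonded = []
    for start in range(11, 31, 5):            # columns 12-16 through 27-31
        field = line[start:start + 5].strip()
        if field:                             # short or blank fields are skipped
            bonded.append(int(field))
    return serial, bonded


def build_adjacency(lines):
    """Merge all CONECT records into {serial: [bonded serials]}."""
    adjacency = {}
    for line in lines:
        if line[0:6] != "CONECT":
            continue
        serial, bonded = parse_conect(line)
        adjacency.setdefault(serial, []).extend(bonded)
    return adjacency


lines = [
    "CONECT" + f"{1179:>5}{746:>5}{1184:>5}{1195:>5}{1203:>5}",
    "CONECT" + f"{1179:>5}{1211:>5}{1222:>5}",
    "CONECT" + f"{1021:>5}{544:>5}",
]
adjacency = build_adjacency(lines)
print(adjacency)
```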

class old_pdb.bookkeeping.Master

MASTER class

The MASTER record is a control record for bookkeeping. It lists the number of lines in the coordinate entry or file for selected record types.

=======  ============  ==========  ========================================
COLUMNS  DATA TYPE     FIELD       DEFINITION
=======  ============  ==========  ========================================
1-6      Record name   "MASTER"
11-15    Integer       num_remark  Number of REMARK records.
16-20    Integer       "0"
21-25    Integer       num_het     Number of HET records.
26-30    Integer       num_helix   Number of HELIX records.
31-35    Integer       num_sheet   Number of SHEET records.
36-40    Integer       num_turn    Deprecated.
41-45    Integer       num_site    Number of SITE records.
46-50    Integer       num_xform   Number of coordinate transform records (ORIGX+SCALE+MTRIX).
51-55    Integer       num_coord   Number of atomic coordinate records (ATOM+HETATM).
56-60    Integer       num_ter     Number of TER records.
61-65    Integer       num_conect  Number of CONECT records.
66-70    Integer       num_seq     Number of SEQRES records.
=======  ============  ==========  ========================================

parse_line(line)

Parse PDB-format line.

Parameters

line (str) – line to parse
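Because MASTER stores record counts, it can be used to sanity-check an entry. The sketch below parses the twelve integer fields and compares three of them with the records actually present. It is only an illustration of the idea: `parse_master`, `count_records`, and `find_mismatches` are hypothetical helpers, `zero` is a made-up key for the fixed "0" field, and the checks performed by `Entry.check_master()` may differ.

```python
MASTER_FIELDS = (
    "num_remark", "zero", "num_het", "num_helix", "num_sheet", "num_turn",
    "num_site", "num_xform", "num_coord", "num_ter", "num_conect", "num_seq",
)


def parse_master(line):
    """Read the twelve 5-column integers that start at column 11."""
    if line[0:6] != "MASTER":
        raise ValueError("not a MASTER record")
    values = [int(line[start:start + 5]) for start in range(10, 70, 5)]
    return dict(zip(MASTER_FIELDS, values))


def count_records(lines, *names):
    """Count lines whose record name (columns 1-6) is one of names."""
    return sum(1 for line in lines if line[0:6].strip() in names)


def find_mismatches(lines):
    """Return {field: (declared, observed)} for counts that disagree."""
    master = parse_master(next(l for l in lines if l.startswith("MASTER")))
    observed = {
        "num_coord": count_records(lines, "ATOM", "HETATM"),
        "num_ter": count_records(lines, "TER"),
        "num_conect": count_records(lines, "CONECT"),
    }
    return {
        key: (master[key], seen)
        for key, seen in observed.items()
        if master[key] != seen
    }


declared = (0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 1, 0)
entry = [
    "ATOM      1",
    "ATOM      2",
    "TER       3",
    "HETATM    4",
    "CONECT    4    2",
    "MASTER" + " " * 4 + "".join(f"{n:>5}" for n in declared),
]
mismatches = find_mismatches(entry)
print(mismatches)
```

Returning the mismatches instead of raising lets a caller decide whether a disagreement is fatal; a stricter variant could assert that the result is empty.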