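Bio.Align.stockholm module

Bio.Align support for alignment files in the Stockholm file format.

You are expected to use this module via the Bio.Align functions.

For example, consider this alignment from PFAM for the HAT helix motif: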

# STOCKHOLM 1.0
#=GF ID   HAT
#=GF AC   PF02184.18
#=GF DE   HAT (Half-A-TPR) repeat
#=GF AU   SMART;
#=GF SE   Alignment kindly provided by SMART
#=GF GA   21.00 21.00;
#=GF TC   21.00 21.00;
#=GF NC   20.90 20.90;
#=GF BM   hmmbuild HMM.ann SEED.ann
#=GF SM   hmmsearch -Z 57096847 -E 1000 --cpu 4 HMM pfamseq
#=GF TP   Repeat
#=GF CL   CL0020
#=GF RN   [1]
#=GF RM   9478129
#=GF RT   The HAT helix, a repetitive motif implicated in RNA processing.
#=GF RA   Preker PJ, Keller W;
#=GF RL   Trends Biochem Sci 1998;23:15-16.
#=GF DR   INTERPRO; IPR003107;
#=GF DR   SMART; HAT;
#=GF DR   SO; 0001068; polypeptide_repeat;
#=GF CC   The HAT (Half A TPR) repeat is found in several RNA processing
#=GF CC   proteins [1].
#=GF SQ   3
#=GS CRN_DROME/191-222     AC P17886.2
#=GS CLF1_SCHPO/185-216    AC P87312.1
#=GS CLF1_SCHPO/185-216    DR PDB; 3JB9 R; 185-216;
#=GS O16376_CAEEL/201-233  AC O16376.2
CRN_DROME/191-222                KEIDRAREIYERFVYVH.PDVKNWIKFARFEES
CLF1_SCHPO/185-216               HENERARGIYERFVVVH.PEVTNWLRWARFEEE
#=GR CLF1_SCHPO/185-216    SS    --HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH
O16376_CAEEL/201-233             KEIDRARSVYQRFLHVHGINVQNWIKYAKFEER
#=GC SS_cons                     --HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH
#=GC seq_cons                    KEIDRARuIYERFVaVH.P-VpNWIKaARFEEc
//

Parsing this file using Bio.Align stores the alignment, its annotations, and the sequences together with their annotations.

>>> from Bio.Align import stockholm
>>> alignments = stockholm.AlignmentIterator("Stockholm/example.sth")
>>> alignment = next(alignments)
>>> alignment.shape
(3, 33)
>>> alignment[0]
'KEIDRAREIYERFVYVH-PDVKNWIKFARFEES'
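
The same parse can also go through the top-level Bio.Align functions, which is the route recommended at the top of this page. The sketch below assumes a Biopython release in which Align.parse accepts the format name "stockholm"; the in-memory text is a trimmed copy of the HAT example and is used only to keep the snippet self-contained.

```python
from io import StringIO

from Bio import Align

# Trimmed copy of the HAT example: header, two #=GF lines,
# the three sequence rows, and the record terminator.
STOCKHOLM_TEXT = """\
# STOCKHOLM 1.0
#=GF ID   HAT
#=GF AC   PF02184.18
CRN_DROME/191-222                KEIDRAREIYERFVYVH.PDVKNWIKFARFEES
CLF1_SCHPO/185-216               HENERARGIYERFVVVH.PEVTNWLRWARFEEE
O16376_CAEEL/201-233             KEIDRARSVYQRFLHVHGINVQNWIKYAKFEER
//
"""

# Align.parse yields one Alignment object per record found in the stream.
alignments = list(Align.parse(StringIO(STOCKHOLM_TEXT), "stockholm"))
alignment = alignments[0]

print(len(alignments))
print(alignment.shape)
print(alignment.annotations["accession"])
```

For a file on disk, a path can be passed in place of the StringIO object.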

Alignment metadata are stored in alignment.annotations.

>>> alignment.annotations["accession"]
'PF02184.18'
>>> alignment.annotations["references"][0]["title"]
'The HAT helix, a repetitive motif implicated in RNA processing.'

Annotations of alignment columns are stored in alignment.column_annotations.

>>> alignment.column_annotations["consensus secondary structure"]
'--HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH'

Sequences and their annotations are stored in alignment.sequences.

>>> alignment.sequences[0].id
'CRN_DROME/191-222'
>>> alignment.sequences[0].seq
Seq('KEIDRAREIYERFVYVHPDVKNWIKFARFEES')
>>> alignment.sequences[1].letter_annotations["secondary structure"]
'--HHHHHHHHHHHHHHS--HHHHHHHHHHHHH'
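
Note that the stored sequence and its letter annotation are one character shorter than the aligned row, because the gap column is dropped. The standard-library sketch below illustrates that relationship; strip_gap_columns is a hypothetical helper written for this example, not a Biopython function, and it assumes that both "-" and "." mark gaps.

```python
def strip_gap_columns(aligned_row: str, annotation: str, gaps: str = "-.") -> tuple[str, str]:
    """Drop gap positions from an aligned row and from its per-letter annotation."""
    if len(aligned_row) != len(annotation):
        raise ValueError("row and annotation must have the same length")
    # Keep only the positions where the row holds a residue rather than a gap.
    kept = [(res, ann) for res, ann in zip(aligned_row, annotation) if res not in gaps]
    sequence = "".join(res for res, _ in kept)
    letters = "".join(ann for _, ann in kept)
    return sequence, letters


row = "HENERARGIYERFVVVH.PEVTNWLRWARFEEE"  # CLF1_SCHPO/185-216 as aligned
ss = "--HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH"   # its #=GR SS line

sequence, letters = strip_gap_columns(row, ss)
print(sequence)
print(letters)
```

Only the column holding the "." in the row is removed, so the printed annotation should correspond to the letter_annotations value shown above.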

Slicing specific columns of an alignment also slices any per-column annotations.

>>> alignment.column_annotations["consensus secondary structure"]
'--HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH'
>>> part_alignment = alignment[:,10:20]
>>> part_alignment.column_annotations["consensus secondary structure"]
'HHHHHHS.--'
class Bio.Align.stockholm.AlignmentIterator(source)

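Bases: AlignmentIterator

Alignment iterator for alignment files in the Stockholm format.

The file may contain multiple concatenated alignments, which are loaded and returned incrementally.

Alignment metadata (lines starting with #=GF) are stored in the dictionary alignment.annotations. Column annotations (lines starting with #=GC) are stored in the dictionary alignment.column_annotations. Sequence names are stored in record.id. Sequence record metadata (lines starting with #=GS) are stored in the dictionary record.annotations. Sequence letter annotations (lines starting with #=GR) are stored in the dictionary record.letter_annotations.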
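
To make the four markup types concrete, here is a minimal standard-library sketch that sorts the lines of one record by prefix. It is an illustration only and not Biopython's parser: sort_stockholm_lines and the layout of the returned dictionary are assumptions made for this example, and the two-letter feature codes are left untranslated.

```python
from collections import defaultdict


def sort_stockholm_lines(text: str) -> dict:
    """Group the lines of one Stockholm record by the markup they carry."""
    record = {
        "GF": defaultdict(list),                       # per-file: feature -> values
        "GC": {},                                      # per-column: feature -> string
        "GS": defaultdict(lambda: defaultdict(list)),  # per-sequence: name -> feature -> values
        "GR": defaultdict(dict),                       # per-residue: name -> feature -> string
        "rows": {},                                    # sequence name -> aligned row
    }
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "# STOCKHOLM 1.0" or line == "//":
            continue
        if line.startswith("#=GF "):
            feature, value = line[5:].split(None, 1)
            record["GF"][feature].append(value)
        elif line.startswith("#=GC "):
            feature, value = line[5:].split(None, 1)
            record["GC"][feature] = value
        elif line.startswith("#=GS "):
            name, feature, value = line[5:].split(None, 2)
            record["GS"][name][feature].append(value)
        elif line.startswith("#=GR "):
            name, feature, value = line[5:].split(None, 2)
            record["GR"][name][feature] = value
        elif not line.startswith("#"):
            name, row = line.split(None, 1)
            record["rows"][name] = row
    return record


sample = """\
# STOCKHOLM 1.0
#=GF AC   PF02184.18
#=GS CLF1_SCHPO/185-216    AC P87312.1
CLF1_SCHPO/185-216               HENERARGIYERFVVVH.PEVTNWLRWARFEEE
#=GR CLF1_SCHPO/185-216    SS    --HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH
#=GC SS_cons                     --HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH
//
"""

parsed = sort_stockholm_lines(sample)
print(parsed["GF"]["AC"])
print(parsed["GS"]["CLF1_SCHPO/185-216"]["AC"])
```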

Wrap-around (interleaved) alignments are not supported: each sequence must be on a single line.
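
One possible pre-check for this restriction, sketched with the standard library only: interleaved Stockholm files repeat each sequence name once per block, so a name that occurs on more than one sequence line of a record suggests a wrapped alignment. repeated_sequence_names is a hypothetical helper, and it assumes the text holds a single record.

```python
from collections import Counter


def repeated_sequence_names(text: str) -> list[str]:
    """Return names that appear on more than one sequence line of a record."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, markup and comment lines, and the record terminator.
        if not line or line.startswith("#") or line == "//":
            continue
        counts[line.split(None, 1)[0]] += 1
    return [name for name, n in counts.items() if n > 1]


single_line = """\
# STOCKHOLM 1.0
seq1  ACGTACGT
seq2  ACGAACGT
//
"""

wrapped = """\
# STOCKHOLM 1.0
seq1  ACGT
seq2  ACGA

seq1  ACGT
seq2  ACGT
//
"""

print(repeated_sequence_names(single_line))
print(repeated_sequence_names(wrapped))
```

A non-empty result means the record would have to be rewritten with one line per sequence before it is parsed.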

For more information on the file format, see http://sonnhammer.sbc.su.se/Stockholm.html and https://en.wikipedia.org/wiki/Stockholm_format

fmt: str | None = 'Stockholm'
gf_mapping = {'**': '**', 'AC': 'accession', 'AU': 'author', 'BM': 'build method', 'CB': 'calibration method', 'CC': 'comment', 'CL': 'clan', 'DE': 'definition', 'GA': 'gathering method', 'ID': 'identifier', 'NC': 'noise cutoff', 'PI': 'previous identifier', 'SE': 'source of seed', 'SM': 'search method', 'SS': 'source of structure', 'TC': 'trusted cutoff', 'TP': 'type', 'WK': 'wikipedia'}
gr_mapping = {'AS': 'active site', 'CSA': 'Catalytic Site Atlas', 'IN': 'intron', 'LI': 'ligand binding', 'PP': 'posterior probability', 'SA': 'surface accessibility', 'SS': 'secondary structure', 'TM': 'transmembrane', 'pAS': 'active site - Pfam predicted', 'sAS': 'active site - from SwissProt'}
gc_mapping = {'2L3J_B_SS': '2L3J B SS', 'AS_cons': 'consensus active site', 'CORE': 'CORE', 'CSA_cons': 'consensus Catalytic Site Atlas', 'IN_cons': 'consensus intron', 'LI_cons': 'consensus ligand binding', 'MM': 'model mask', 'PK': 'PK', 'PK_SS': 'PK SS', 'PP_cons': 'consensus posterior probability', 'RF': 'reference coordinate annotation', 'RNA_elements': 'RNA elements', 'RNA_ligand_AdoCbl': 'RNA ligand AdoCbl', 'RNA_ligand_AqCbl': 'RNA ligand AqCbl', 'RNA_ligand_FMN': 'RNA ligand FMN', 'RNA_ligand_Guanidinium': 'RNA ligand Guanidinium', 'RNA_ligand_SAM': 'RNA ligand SAM', 'RNA_ligand_THF_1': 'RNA ligand THF 1', 'RNA_ligand_THF_2': 'RNA ligand THF 2', 'RNA_ligand_TPP': 'RNA ligand TPP', 'RNA_ligand_preQ1': 'RNA ligand preQ1', 'RNA_motif_k_turn': 'RNA motif k turn', 'RNA_structural_element': 'RNA structural element', 'RNA_structural_elements': 'RNA structural elements', 'Repeat_unit': 'Repeat unit', 'SA_cons': 'consensus surface accessibility', 'SS_cons': 'consensus secondary structure', 'TM_cons': 'consensus transmembrane', 'cons': 'cons', 'pAS_cons': 'consensus active site - Pfam predicted', 'sAS_cons': 'consensus active site - from SwissProt', 'scorecons': 'consensus score', 'scorecons_70': 'consensus score 70', 'scorecons_80': 'consensus score 80', 'scorecons_90': 'consensus score 90', 'seq_cons': 'consensus sequence'}
gs_mapping = {'AC': 'accession', 'LO': 'look', 'OC': 'organism classification', 'OS': 'organism'}
__abstractmethods__ = frozenset({})
key = 'IN'
keyword = 'cons'
value = 'intron'
class Bio.Align.stockholm.AlignmentWriter(target)

Bases: AlignmentWriter

Alignment file writer for the Stockholm file format.

gf_mapping = {'**': '**', 'accession': 'AC', 'author': 'AU', 'build method': 'BM', 'calibration method': 'CB', 'clan': 'CL', 'comment': 'CC', 'definition': 'DE', 'gathering method': 'GA', 'identifier': 'ID', 'noise cutoff': 'NC', 'previous identifier': 'PI', 'search method': 'SM', 'source of seed': 'SE', 'source of structure': 'SS', 'trusted cutoff': 'TC', 'type': 'TP', 'wikipedia': 'WK'}
gs_mapping = {'accession': 'AC', 'look': 'LO', 'organism': 'OS', 'organism classification': 'OC'}
gr_mapping = {'Catalytic Site Atlas': 'CSA', 'active site': 'AS', 'active site - Pfam predicted': 'pAS', 'active site - from SwissProt': 'sAS', 'intron': 'IN', 'ligand binding': 'LI', 'posterior probability': 'PP', 'secondary structure': 'SS', 'surface accessibility': 'SA', 'transmembrane': 'TM'}
gc_mapping = {'2L3J B SS': '2L3J_B_SS', 'CORE': 'CORE', 'PK': 'PK', 'PK SS': 'PK_SS', 'RNA elements': 'RNA_elements', 'RNA ligand AdoCbl': 'RNA_ligand_AdoCbl', 'RNA ligand AqCbl': 'RNA_ligand_AqCbl', 'RNA ligand FMN': 'RNA_ligand_FMN', 'RNA ligand Guanidinium': 'RNA_ligand_Guanidinium', 'RNA ligand SAM': 'RNA_ligand_SAM', 'RNA ligand THF 1': 'RNA_ligand_THF_1', 'RNA ligand THF 2': 'RNA_ligand_THF_2', 'RNA ligand TPP': 'RNA_ligand_TPP', 'RNA ligand preQ1': 'RNA_ligand_preQ1', 'RNA motif k turn': 'RNA_motif_k_turn', 'RNA structural element': 'RNA_structural_element', 'RNA structural elements': 'RNA_structural_elements', 'Repeat unit': 'Repeat_unit', 'cons': 'cons', 'consensus Catalytic Site Atlas': 'CSA_cons', 'consensus active site': 'AS_cons', 'consensus active site - Pfam predicted': 'pAS_cons', 'consensus active site - from SwissProt': 'sAS_cons', 'consensus intron': 'IN_cons', 'consensus ligand binding': 'LI_cons', 'consensus posterior probability': 'PP_cons', 'consensus score': 'scorecons', 'consensus score 70': 'scorecons_70', 'consensus score 80': 'scorecons_80', 'consensus score 90': 'scorecons_90', 'consensus secondary structure': 'SS_cons', 'consensus sequence': 'seq_cons', 'consensus surface accessibility': 'SA_cons', 'consensus transmembrane': 'TM_cons', 'model mask': 'MM', 'reference coordinate annotation': 'RF'}
fmt: str | None = 'Stockholm'
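
The writer mappings listed here appear to be the key/value inverses of the corresponding AlignmentIterator mappings. The sketch below demonstrates that relationship on a copy of the gs_mapping entries; it copies the values by hand rather than importing them, and the variable names are chosen for this example.

```python
# Entries copied from AlignmentIterator.gs_mapping (file code -> annotation key).
read_gs_mapping = {
    "AC": "accession",
    "LO": "look",
    "OC": "organism classification",
    "OS": "organism",
}

# Swapping keys and values gives the lookup direction needed when writing
# (annotation key -> file code).
write_gs_mapping = {name: code for code, name in read_gs_mapping.items()}

# The inversion only works if no two codes share an annotation key.
assert len(write_gs_mapping) == len(read_gs_mapping)
print(write_gs_mapping["organism"])
```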
format_alignment(alignment)

Return a string with a single alignment in the Stockholm format.
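
In practice the writer is usually reached indirectly. The sketch below assumes a Biopython release in which Align.read accepts the format name "stockholm" and Alignment.format("stockholm") delegates to this writer; the input text is a trimmed copy of the HAT example, and the exact layout of the output is not asserted here.

```python
from io import StringIO

from Bio import Align

# Trimmed copy of the HAT example with file-level and column-level markup.
STOCKHOLM_TEXT = """\
# STOCKHOLM 1.0
#=GF ID   HAT
#=GF AC   PF02184.18
CRN_DROME/191-222                KEIDRAREIYERFVYVH.PDVKNWIKFARFEES
CLF1_SCHPO/185-216               HENERARGIYERFVVVH.PEVTNWLRWARFEEE
O16376_CAEEL/201-233             KEIDRARSVYQRFLHVHGINVQNWIKYAKFEER
#=GC SS_cons                     --HHHHHHHHHHHHHHS.--HHHHHHHHHHHHH
//
"""

# Align.read expects exactly one alignment in the stream.
alignment = Align.read(StringIO(STOCKHOLM_TEXT), "stockholm")

# Serialize the alignment back to Stockholm text.
text = alignment.format("stockholm")
print(text)
```

To write several alignments to a file, Align.write(alignments, target, "stockholm") is the corresponding top-level function.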

__abstractmethods__ = frozenset({})