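Bio.UniGene package
Module contents
Parse UniGene flat-file format files, such as the Hs.data file.
Here is an overview of the flat-file format that this parser handles:
Line types/qualifiers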
- ID: UniGene cluster ID
- TITLE: Title for the cluster
- GENE: Gene symbol
- CYTOBAND: Cytological band
- EXPRESS: Tissues of origin for ESTs in cluster
- RESTR_EXPR: Single tissue or development stage contributes more than half the total EST frequency for this gene.
- GNM_TERMINUS: Genomic confirmation of presence of a 3' terminus; T if a non-templated polyA tail is found among a cluster's sequences; else I if templated As are found in genomic sequence or S if a canonical polyA signal is found on the genomic sequence
- GENE_ID: Entrez gene identifier associated with at least one sequence in this cluster; to be used instead of LocusLink.
- LOCUSLINK: LocusLink identifier associated with at least one sequence in this cluster; deprecated in favor of GENE_ID
- HOMOL: Homology
- CHROMOSOME: Chromosome. For plants, CHROMOSOME refers to mapping on the Arabidopsis genome.
- STS: STS
  - ACC= GenBank/EMBL/DDBJ accession number of STS [optional field]
  - UNISTS= identifier in NCBI's UNISTS database
- TXMAP: Transcript map interval
  - MARKER= Marker found on at least one sequence in this cluster
  - RHPANEL= Radiation Hybrid panel used to place marker
- PROTSIM: Protein Similarity data for the sequence with highest-scoring protein similarity in this cluster
  - ORG= Organism
  - PROTGI= Sequence GI of protein
  - PROTID= Sequence ID of protein
  - PCT= Percent alignment
  - ALN= length of aligned region (aa)
- SCOUNT: Number of sequences in the cluster
- SEQUENCE: Sequence
  - ACC= GenBank/EMBL/DDBJ accession number of sequence
  - NID= Unique nucleotide sequence identifier (gi)
  - PID= Unique protein sequence identifier (used for non-ESTs)
  - CLONE= Clone identifier (used for ESTs only)
  - END= End (5'/3') of clone insert read (used for ESTs only)
  - LID= Library ID; see Hs.lib.info for library name and tissue
  - MGC= 5' CDS-completeness indicator; if present, the clone associated with this sequence is believed CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST, otherwise the value is an indicator of the reliability of the test indicating CDS completeness; higher values indicate more reliable CDS-completeness predictions.
  - SEQTYPE= Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
  - TRACE= The Trace ID of the EST sequence, as provided by NCBI Trace Archive
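To make the layout above concrete, here is a minimal sketch of how such a file can be split into records and (line type, value) pairs. It assumes that each line starts with its line type followed by whitespace and the value, and that each record ends with a `//` line; the helper names `split_line` and `iter_records` and the sample text are hypothetical and are not part of `Bio.UniGene`.

```python
from typing import Dict, Iterator, List, Tuple

# Made-up sample (not real UniGene data): two records, each closed by "//".
SAMPLE = """\
ID          Hs.1
TITLE       example title one
GENE        GENE1
SCOUNT      1
//
ID          Hs.2
TITLE       example title two
GENE        GENE2
SCOUNT      1
//
"""


def split_line(line: str) -> Tuple[str, str]:
    """Split a line into (line type, value) at the first run of whitespace."""
    parts = line.split(None, 1)
    tag = parts[0]
    value = parts[1].strip() if len(parts) > 1 else ""
    return tag, value


def iter_records(text: str) -> Iterator[List[Tuple[str, str]]]:
    """Yield each record as a list of (line type, value) pairs; "//" ends a record."""
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("//"):
            yield current
            current = []
        else:
            current.append(split_line(line))
    if current:  # tolerate a missing final "//"
        yield current


for pairs in iter_records(SAMPLE):
    # A plain dict keeps only the last value of a repeated line type (e.g. SEQUENCE);
    # that is acceptable here because this sample has no repeated line types.
    fields: Dict[str, str] = dict(pairs)
    print(fields["ID"], fields["GENE"])
```

Keeping each record as a list of pairs, rather than a dict, preserves repeated line types such as SEQUENCE and PROTSIM, which a real record needs.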
- class Bio.UniGene.SequenceLine(text=None)
Bases: object
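Store the information for one SEQUENCE line from a UniGene file.
Initialize with the text part of the SEQUENCE line, or with nothing.
- Attributes and descriptions (accessed in lower case)
ACC= GenBank/EMBL/DDBJ accession number of sequence
NID= Unique nucleotide sequence identifier (gi)
PID= Unique protein sequence identifier (used for non-ESTs)
CLONE= Clone identifier (used for ESTs only)
END= End (5'/3') of clone insert read (used for ESTs only)
LID= Library ID; see Hs.lib.info for library name and tissue
MGC= 5' CDS-completeness indicator; if present, the clone associated with this sequence is believed to be CDS-complete. A value greater than 511 is the gi of the CDS-complete mRNA matched by the EST; otherwise the value indicates the reliability of the test for CDS completeness, with higher values indicating more reliable CDS-completeness predictions.
SEQTYPE= Description of the nucleotide sequence. Possible values are mRNA, EST and HTC.
TRACE= The Trace ID of the EST sequence, as provided by the NCBI Trace Archive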
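The "accessed in lower case" convention and the MGC rule can be illustrated with a small sketch. It assumes the SEQUENCE text part is a `;`-separated list of `KEY=value` items and that MGC holds an integer; `parse_qualifiers` and `describe_mgc` are hypothetical helpers, not Biopython's implementation of `SequenceLine`.

```python
from typing import Dict


def parse_qualifiers(text: str) -> Dict[str, str]:
    """Turn 'ACC=...; NID=...' into a dict with lower-case keys (acc, nid, ...)."""
    result: Dict[str, str] = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without '='
            result[key.strip().lower()] = value.strip()
    return result


def describe_mgc(mgc: str) -> str:
    """Interpret MGC=: above 511 it is a gi, otherwise a reliability score (assumed integer)."""
    value = int(mgc)
    if value > 511:
        return f"CDS-complete mRNA gi {value}"
    return f"reliability indicator {value}"


# Made-up SEQUENCE text part (everything after the SEQUENCE line type).
seq = parse_qualifiers(
    "ACC=AA000001.1; NID=g1000001; CLONE=IMAGE:1; END=5'; LID=1; MGC=12345; SEQTYPE=EST"
)
print(seq["acc"], seq["seqtype"])  # AA000001.1 EST
print(describe_mgc(seq["mgc"]))    # CDS-complete mRNA gi 12345
```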
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene SequenceLine object as a string.
- class Bio.UniGene.ProtsimLine(text=None)
Bases: object
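Store the information for one PROTSIM line from a UniGene file.
Initialize with the text part of the PROTSIM line, or with nothing.
Attributes and descriptions (accessed in lower case)
ORG= Organism
PROTGI= Sequence GI of protein
PROTID= Sequence ID of protein
PCT= Percent alignment
ALN= Length of aligned region (aa)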
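The PROTSIM values are all read from text, but PCT and ALN are numeric in meaning. The sketch below shows one way to expose them with numeric types. It assumes a `;`-separated `KEY=value` text part, and the `Protsim` dataclass and `parse_protsim` function are hypothetical, not Biopython's `ProtsimLine` (which is not shown here to convert types).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Protsim:
    """Hypothetical typed view of one PROTSIM entry (not Biopython's ProtsimLine)."""
    org: str
    protgi: str
    protid: str
    pct: float  # percent alignment
    aln: int    # length of aligned region, in amino acids


def parse_protsim(text: str) -> Protsim:
    fields: Dict[str, str] = {}
    for part in text.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty fragments, e.g. after a trailing ';'
            fields[key.lower()] = value.strip()
    return Protsim(
        org=fields["org"],
        protgi=fields["protgi"],
        protid=fields["protid"],
        pct=float(fields["pct"]),
        aln=int(fields["aln"]),
    )


# All values below are made-up placeholders.
hit = parse_protsim("ORG=10090; PROTGI=1000001; PROTID=NP_000001.1; PCT=87.5; ALN=290")
print(hit.pct, hit.aln)  # 87.5 290
```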
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene ProtsimLine object as a string.
- class Bio.UniGene.STSLine(text=None)
Bases: object
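Store the information for one STS line from a UniGene file.
Initialize with the text part of the STS line, or with nothing.
Attributes and descriptions (accessed in lower case)
ACC= GenBank/EMBL/DDBJ accession number of STS [optional field]
UNISTS= Identifier in NCBI's UNISTS database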
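Because ACC= is optional on STS lines, code reading these entries should not assume it is present. This sketch defaults missing fields to `None`. The separator between items is not stated above, so the hypothetical `parse_sts` helper accepts either `;` or whitespace as an assumption.

```python
import re
from typing import Dict, Optional


def parse_sts(text: str) -> Dict[str, Optional[str]]:
    """Parse an STS text part; ACC= is optional, so it defaults to None."""
    fields: Dict[str, Optional[str]] = {"acc": None, "unists": None}
    # Assumption: items are separated by ';' and/or whitespace.
    for token in re.split(r"[;\s]+", text.strip()):
        key, sep, value = token.partition("=")
        if sep:
            fields[key.lower()] = value
    return fields


print(parse_sts("ACC=G00001 UNISTS=100001"))  # {'acc': 'G00001', 'unists': '100001'}
print(parse_sts("UNISTS=100002"))             # {'acc': None, 'unists': '100002'}
```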
- __init__(text=None)
Initialize the class.
- __repr__()
Return the UniGene STSLine object as a string.
- class Bio.UniGene.Record
Bases: object
Store a UniGene record.
Here is what is stored:
    self.ID           = ''  # ID line
    self.species      = ''  # Hs, Bt, etc.
    self.title        = ''  # TITLE line
    self.symbol       = ''  # GENE line
    self.cytoband     = ''  # CYTOBAND line
    self.express      = []  # EXPRESS line, parsed on ';'
                            # Will be an array of strings
    self.restr_expr   = ''  # RESTR_EXPR line
    self.gnm_terminus = ''  # GNM_TERMINUS line
    self.gene_id      = ''  # GENE_ID line
    self.locuslink    = ''  # LOCUSLINK line
    self.homol        = ''  # HOMOL line
    self.chromosome   = ''  # CHROMOSOME line
    self.protsim      = []  # PROTSIM entries, array of Protsims
                            # Type ProtsimLine
    self.sequence     = []  # SEQUENCE entries, array of Sequence entries
                            # Type SequenceLine
    self.sts          = []  # STS entries, array of STS entries
                            # Type STSLine
    self.txmap        = []  # TXMAP entries, array of TXMap entries
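The attribute list above maps each line type onto a field, with EXPRESS split on `;` and repeated line types collected into lists. The following sketch mirrors a subset of those fields in a hypothetical `UniGeneRecordSketch` dataclass; it is not Biopython's `Record`. It also assumes, for illustration only, that `species` is the prefix of the cluster ID before the first dot (for example "Hs" in "Hs.1"), and it keeps SEQUENCE entries as raw strings instead of `SequenceLine` objects.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UniGeneRecordSketch:
    """Hypothetical stand-in mirroring a subset of the Record attributes."""
    ID: str = ""
    species: str = ""
    title: str = ""
    symbol: str = ""
    express: List[str] = field(default_factory=list)
    sequence: List[str] = field(default_factory=list)  # raw SEQUENCE text parts


def build_record(lines: List[Tuple[str, str]]) -> UniGeneRecordSketch:
    rec = UniGeneRecordSketch()
    for tag, value in lines:
        if tag == "ID":
            rec.ID = value
            rec.species = value.split(".")[0]  # assumption: species is the ID prefix
        elif tag == "TITLE":
            rec.title = value
        elif tag == "GENE":
            rec.symbol = value  # the GENE line is stored as `symbol`
        elif tag == "EXPRESS":
            # EXPRESS is parsed on ';' into a list of strings
            rec.express = [t.strip() for t in value.split(";") if t.strip()]
        elif tag == "SEQUENCE":
            rec.sequence.append(value)  # one entry per SEQUENCE line
    return rec


# Made-up input pairs.
rec = build_record([
    ("ID", "Hs.1"),
    ("GENE", "GENE1"),
    ("EXPRESS", "liver; lung ; kidney"),
    ("SEQUENCE", "ACC=AA000001.1; NID=g1000001"),
    ("SEQUENCE", "ACC=AA000002.1; NID=g1000002"),
])
print(rec.species, rec.express, len(rec.sequence))  # Hs ['liver', 'lung', 'kidney'] 2
```

The list fields use `field(default_factory=list)` so that each record gets its own list rather than sharing one mutable default.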
- __init__()
Initialize the class.
- __repr__()
Represent the UniGene Record object as a string for debugging.
- Bio.UniGene.parse(handle)
Read and load UniGene records, for files containing multiple records.
- Bio.UniGene.read(handle)
Read and load a UniGene record, for files containing exactly one record.
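The split between `parse` and `read` follows a common pattern: one function iterates over every record in a handle, and the other insists on exactly one. The sketch below illustrates that contract with hypothetical `parse_records` and `read_record` functions over a file-like handle. These functions return plain dicts, not `Record` objects, and the exact error messages are assumptions rather than Biopython's.

```python
import io
from typing import Dict, Iterator, TextIO


def parse_records(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Hypothetical parse(): lazily yield one dict per '//'-terminated record."""
    record: Dict[str, str] = {}
    for line in handle:
        if line.startswith("//"):
            yield record
            record = {}
        elif line.strip():
            tag, _, value = line.strip().partition(" ")
            record[tag] = value.strip()
    if record:  # tolerate a missing final "//"
        yield record


def read_record(handle: TextIO) -> Dict[str, str]:
    """Hypothetical read(): require exactly one record in the handle."""
    records = parse_records(handle)
    try:
        first = next(records)
    except StopIteration:
        raise ValueError("No UniGene record found") from None
    if next(records, None) is not None:
        raise ValueError("More than one UniGene record found")
    return first


multi = io.StringIO("ID          Hs.1\n//\nID          Hs.2\n//\n")
print([r["ID"] for r in parse_records(multi)])  # ['Hs.1', 'Hs.2']

single = io.StringIO("ID          Hs.1\n//\n")
print(read_record(single)["ID"])  # Hs.1
```

With the documented API, the calls take a handle in the same way, for example `Bio.UniGene.parse(handle)` for a multi-record file such as Hs.data and `Bio.UniGene.read(handle)` when the file holds a single record.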