Bio.motifs.jaspar.db module
Provides read access to a JASPAR5-formatted database.
This module requires MySQLdb to be installed.
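One way to check this prerequisite before importing the module is to ask Python whether the MySQLdb package can be located. The sketch below is illustrative only: the helper name `mysqldb_available` is hypothetical and not part of Biopython. MySQLdb is usually provided by the mysqlclient distribution on PyPI.

```python
import importlib.util


def mysqldb_available():
    """Return True if the MySQLdb package can be located (hypothetical helper)."""
    return importlib.util.find_spec("MySQLdb") is not None


if mysqldb_available():
    print("MySQLdb found.")
else:
    print("MySQLdb not found; installing the mysqlclient package may help.")
```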
Example (substitute your own database credentials as appropriate):
from Bio.motifs.jaspar.db import JASPAR5
JASPAR_DB_HOST = "hostname.example.org"
JASPAR_DB_NAME = "JASPAR2018"
JASPAR_DB_USER = "guest"
JASPAR_DB_PASS = "guest"
jdb = JASPAR5(
    host=JASPAR_DB_HOST,
    name=JASPAR_DB_NAME,
    user=JASPAR_DB_USER,
    password=JASPAR_DB_PASS
)
ets1 = jdb.fetch_motif_by_id('MA0098')
print(ets1)
TF name ETS1
Matrix ID MA0098.3
Collection CORE
TF class ['Tryptophan cluster factors']
TF family ['Ets-related factors']
Species 9606
Taxonomic group vertebrates
Accession ['P14921']
Data type used HT-SELEX
Medline 20517297
PAZAR ID TF0000070
Comments Data is from Taipale HTSELEX DBD (2013)
Matrix:
         0       1       2       3       4       5       6       7       8       9
A: 2683.00  180.00  425.00    0.00    0.00 2683.00 2683.00 1102.00   89.00  803.00
C:  210.00 2683.00 2683.00   21.00    0.00    0.00    9.00   21.00  712.00  401.00
G:  640.00  297.00    7.00 2683.00 2683.00    0.00   31.00 1580.00  124.00 1083.00
T:  241.00   22.00    0.00    0.00   12.00    0.00  909.00   12.00 1970.00  396.00
motifs = jdb.fetch_motifs(
    collection='CORE',
    tax_group=['vertebrates', 'insects'],
    tf_class='Homeo domain factors',
    tf_family=['TALE-type homeo domain factors', 'POU domain factors'],
    min_ic=12
)
for motif in motifs:
    pass  # do something with the motif
- class Bio.motifs.jaspar.db.JASPAR5(host=None, name=None, user=None, password=None)
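Bases: object
Class representing a JASPAR5 database. Its methods are loosely based on the Perl TFBS::DB::JASPAR5 module.
Note: we only implement reading of JASPAR motifs from the database. Unlike the Perl module, we do not currently attempt to implement any methods for storing JASPAR motifs or creating a new database.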
- __init__(host=None, name=None, user=None, password=None)
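Construct a JASPAR5 instance and connect to the specified database.
- Parameters
host - host name of the JASPAR database server
name - name of the JASPAR database
user - user name for connecting to the JASPAR database
password - JASPAR database password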
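Rather than hard-coding credentials, the four constructor arguments could be collected from the environment. The following is a minimal sketch under stated assumptions: the helper `jaspar_credentials_from_env` and the environment variable names are hypothetical (chosen to echo the constants in the module example) and are not read by Biopython itself.

```python
import os


def jaspar_credentials_from_env(environ=os.environ):
    """Collect JASPAR5 constructor arguments from environment variables.

    The variable names (JASPAR_DB_HOST etc.) are an assumption made for
    this sketch; missing variables yield None, the constructor's default.
    """
    return {
        "host": environ.get("JASPAR_DB_HOST"),
        "name": environ.get("JASPAR_DB_NAME"),
        "user": environ.get("JASPAR_DB_USER"),
        "password": environ.get("JASPAR_DB_PASS"),
    }


# A plain dict stands in for os.environ so the sketch runs anywhere.
creds = jaspar_credentials_from_env(
    {
        "JASPAR_DB_HOST": "hostname.example.org",
        "JASPAR_DB_NAME": "JASPAR2018",
        "JASPAR_DB_USER": "guest",
        "JASPAR_DB_PASS": "guest",
    }
)
print(sorted(creds))
# With a reachable database, the connection could then be opened with:
#     from Bio.motifs.jaspar.db import JASPAR5
#     jdb = JASPAR5(**creds)
```

The dictionary keys match the keyword arguments listed above, so it can be unpacked directly into the constructor.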
- __str__()
Return a string representation of the JASPAR5 database connection.
- fetch_motif_by_id(id)
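Fetch a single JASPAR motif from the database by its JASPAR matrix ID.
Example ID: 'MA0001.1'.
- Parameters
- id - JASPAR matrix ID. This may be a fully specified ID including the version number (e.g. MA0049.2) or just the base ID (e.g. MA0049). If only a base ID is provided, the latest version is returned.
- Returns
A Bio.motifs.jaspar.Motif object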
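To make the two accepted ID forms concrete, here is a small sketch that separates a matrix ID into its base ID and optional version number. The helper `split_matrix_id` is hypothetical and written only for illustration; it is not claimed to be how the module parses IDs internally.

```python
def split_matrix_id(matrix_id):
    """Split a JASPAR matrix ID into (base_id, version).

    Hypothetical helper for illustration; version is None when only
    a base ID is given.
    """
    base_id, sep, version = matrix_id.partition(".")
    return base_id, (int(version) if sep else None)


print(split_matrix_id("MA0049.2"))  # ('MA0049', 2)
print(split_matrix_id("MA0049"))    # ('MA0049', None)
```

A version of None corresponds to the case where the latest version of the motif would be returned.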
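Note: the Perl TFBS module lets you specify the type of matrix to return (PFM, PWM, ICM), but matrices are always stored in JASPAR format as PFMs, so that option does not really belong here. Once a PFM has been fetched, the motif's pwm and pssm attributes (properties in current Biopython releases) give the normalized and log-odds matrices.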
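For readers unfamiliar with the terms, the arithmetic behind a normalized matrix and a log-odds matrix can be sketched in plain Python for a single column. This is an illustration only, assuming a uniform background of 0.25 per base and no pseudocounts by default; the function names are hypothetical, and Biopython's own calculation may use different pseudocount and background settings.

```python
import math


def normalize_column(counts, pseudocount=0.0):
    """Turn one PFM column (base -> count) into probabilities."""
    total = sum(counts.values()) + pseudocount * len(counts)
    return {base: (n + pseudocount) / total for base, n in counts.items()}


def log_odds_column(probs, background=0.25):
    """log2(probability / background) per base; -inf for zero probability."""
    return {
        base: math.log2(p / background) if p > 0 else float("-inf")
        for base, p in probs.items()
    }


# Position 0 of the ETS1 matrix (MA0098.3) shown in the module example.
column = {"A": 2683.0, "C": 210.0, "G": 640.0, "T": 241.0}
probs = normalize_column(column)
scores = log_odds_column(probs)
for base in "ACGT":
    print(f"{base}: p={probs[base]:.3f} log-odds={scores[base]:+.2f}")
```

Bases that are more frequent than the background get a positive score and rarer ones a negative score; a nonzero pseudocount avoids the -inf case for zero counts.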
- fetch_motifs_by_name(name)
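Fetch a list of JASPAR motifs from the JASPAR database by the given TF name(s).
- Parameters
name - a single name or a list of names
- Returns
A list of Bio.motifs.jaspar.Motif objects
Note: names are not guaranteed to be unique; there may be more than one motif with the same name. A list of motifs is therefore returned even if name specifies a single name. This method simply calls self.fetch_motifs(collection=None, tf_name=name).
This behaviour differs from the get_Matrix_by_name() method of the TFBS Perl module, which always returns a single matrix; when several matrices share the same name, it issues a warning message and returns the first matrix retrieved.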
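Code that expects a single motif per name has to choose one from the returned list explicitly. The sketch below mimics the Perl behaviour described above (warn, then take the first). The helper `first_motif_by_name` is hypothetical, and stand-in objects are used in place of a real database query; their matrix IDs are placeholders, not real JASPAR entries.

```python
import warnings
from types import SimpleNamespace


def first_motif_by_name(motifs, name):
    """Return the first motif from a list, warning if the name is ambiguous.

    Hypothetical helper; returns None when the list is empty.
    """
    if not motifs:
        return None
    if len(motifs) > 1:
        warnings.warn(f"{len(motifs)} motifs are named {name!r}; using the first")
    return motifs[0]


# Stand-ins for the list that jdb.fetch_motifs_by_name("EXAMPLE") would return.
found = [
    SimpleNamespace(matrix_id="MA9991.1", name="EXAMPLE"),
    SimpleNamespace(matrix_id="MA9992.1", name="EXAMPLE"),
]
motif = first_motif_by_name(found, "EXAMPLE")
print(motif.matrix_id)
```

Whether taking the first match is appropriate depends on the analysis; inspecting every returned motif is the safer default.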
- fetch_motifs(collection=JASPAR_DFLT_COLLECTION, tf_name=None, tf_class=None, tf_family=None, matrix_id=None, tax_group=None, species=None, pazar_id=None, data_type=None, medline=None, min_ic=0, min_length=0, min_sites=0, all=False, all_versions=False)
Fetch a jaspar.Record (list) of motifs using the given selection criteria.
- Parameters
Except where obvious, all selection criteria arguments may be specified as a single value or a list of values. Motifs must meet ALL the specified selection criteria to be returned, with the precedence exceptions noted below.
all - Takes precedence over all other selection criteria. Every motif is returned. If 'all_versions' is also specified, all versions of every motif are returned; otherwise just the latest version of every motif is returned.
matrix_id - Takes precedence over all other selection criteria except 'all'. Only motifs with the given JASPAR matrix ID(s) are returned. A matrix ID may be specified as just a base ID or as a full JASPAR ID including the version number. If only a base ID is provided for specific motif(s), then just the latest version of those motif(s) is returned unless 'all_versions' is also specified.
collection - Only motifs from the specified JASPAR collection(s) are returned. NOTE - if not specified, the collection defaults to CORE for all other selection criteria except 'all' and 'matrix_id'. To apply the other selection criteria across all JASPAR collections, explicitly set collection=None.
tf_name - Only motifs with the given name(s) are returned.
tf_class - Only motifs of the given TF class(es) are returned.
tf_family - Only motifs from the given TF families are returned.
tax_group - Only motifs belonging to the given taxonomic supergroups are returned (e.g. 'vertebrates', 'insects', 'nematodes' etc.).
species - Only motifs derived from the given species are returned. Species are specified as taxonomy IDs.
data_type - Only motifs generated with the given data type (e.g. 'ChIP-seq', 'PBM', 'SELEX' etc.) are returned. NOTE - must match exactly as stored in the database.
pazar_id - Only motifs with the given PAZAR TF ID are returned.
medline - Only motifs with the given Medline (PubMed) IDs are returned.
min_ic - Only motifs whose profile matrices have at least this information content (specificity) are returned.
min_length - Only motifs whose profiles are of at least this length are returned.
min_sites - Only motifs compiled from at least this many binding sites are returned.
all_versions - Unless specified, just the latest version of the motifs determined by the other selection criteria is returned. Otherwise all versions of the selected motifs are returned.
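The precedence rules above can be easier to follow as code. The sketch below works out which criteria would take effect for a given call; it mirrors the documented rules only and is not the module's actual query-building logic. The names `effective_criteria`, `as_list`, `DEFAULT_COLLECTION` and `SCALAR_KEYS` are hypothetical.

```python
DEFAULT_COLLECTION = "CORE"  # stands in for JASPAR_DFLT_COLLECTION
SCALAR_KEYS = {"min_ic", "min_length", "min_sites"}  # thresholds, never lists


def as_list(value):
    """Wrap a single value in a list; copy lists and tuples into a list."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def effective_criteria(all=False, matrix_id=None, collection=DEFAULT_COLLECTION,
                       all_versions=False, **others):
    """Return the criteria that would apply under the documented precedence."""
    if all:
        # 'all' overrides every other criterion; only 'all_versions' matters.
        return {"all": True, "all_versions": all_versions}
    if matrix_id is not None:
        # 'matrix_id' overrides the rest, including the default collection.
        return {"matrix_id": as_list(matrix_id), "all_versions": all_versions}
    applied = {
        key: (val if key in SCALAR_KEYS else as_list(val))
        for key, val in others.items()
        if val is not None
    }
    if collection is not None:
        applied["collection"] = as_list(collection)
    applied["all_versions"] = all_versions
    return applied


print(effective_criteria(tf_class="Homeo domain factors", min_ic=12))
print(effective_criteria(matrix_id="MA0098", tf_name="ETS1"))
print(effective_criteria(all=True, collection="CORE"))
```

In the first call the collection falls back to CORE; in the second, tf_name is ignored because matrix_id takes precedence; in the third, only all and all_versions remain.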
- Returns
A Bio.motifs.jaspar.Record (list) of motifs.