User guide¶
Inputs¶
Below is the list of all input files and parameters. See the Usage section for the inputs of each function.
count table file
count_table_input
matrix NxM; N = genes, M = time points
CSV file format
examples: Table 1 or “data” directory on GitHub
| | Time 1 | Time 2 | Time 3 |
|---|---|---|---|
| Gene 1 | 255 | 1,596 | 80 |
| Gene 2 | 112 | 63 | 0 |
| Gene 3 | 56 | 3,582 | 27 |
| Gene 4 | 559 | 865 | 91 |
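As a rough illustration of the expected layout, the sketch below parses a small count table with Python's standard `csv` module. The comma delimiter, the empty top-left header cell, the absence of thousands separators in the raw file, and the helper name `read_count_table` are assumptions made for this example; they are not taken from Augusta itself.

```python
import csv
import io

# Hypothetical count table: rows are genes, columns are time points.
# Delimiter and header layout are assumptions, not Augusta's specification.
COUNT_TABLE_CSV = """\
,Time 1,Time 2,Time 3
Gene 1,255,1596,80
Gene 2,112,63,0
Gene 3,56,3582,27
Gene 4,559,865,91
"""


def read_count_table(handle):
    """Return (time_points, {gene: [counts]}) from an N x M count table."""
    reader = csv.reader(handle)
    header = next(reader)
    time_points = header[1:]  # the first cell heads the gene-name column
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene = row[0]
        values = [float(v) for v in row[1:]]
        if len(values) != len(time_points):
            raise ValueError(f"{gene}: expected {len(time_points)} values")
        counts[gene] = values
    return time_points, counts


time_points, counts = read_count_table(io.StringIO(COUNT_TABLE_CSV))
print(len(counts), "genes x", len(time_points), "time points")
```

A check like this can catch ragged rows or non-numeric cells before the file is passed as count_table_input.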
normalization type parameter
normalization_type
optional, default: None
options: RPKM, CPM, TPM
Note that the count table must be normalized for the GRN to be inferred correctly!
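To make one of the normalization options concrete, here is a minimal sketch of CPM (counts per million): each count is divided by the total count of its time point and multiplied by one million. The function name `counts_per_million` and the dictionary layout are hypothetical, and this is not Augusta's own implementation. RPKM and TPM additionally require gene lengths, so they are not covered here.

```python
def counts_per_million(counts):
    """Scale each time point (column) so that its counts sum to one million.

    counts maps a gene name to a list of raw counts, one per time point.
    """
    if not counts:
        return {}
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Total number of counts in each time point (library size).
    totals = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [
            counts[g][j] / totals[j] * 1e6 if totals[j] else 0.0
            for j in range(n_samples)
        ]
        for g in genes
    }


raw = {
    "Gene 1": [255, 1596, 80],
    "Gene 2": [112, 63, 0],
    "Gene 3": [56, 3582, 27],
    "Gene 4": [559, 865, 91],
}
cpm = counts_per_million(raw)
```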
GenBank file
genbank_file_input
optional (except for the refineGRN function), but several steps are skipped if it is not provided
Gene names in the count table must match locus tags (i.e. “locus_tag”) or names (i.e. “name”) in the GenBank file!
required format: GenBank full (i.e. containing the nucleotide sequence in the ORIGIN section)
examples: “data” directory on GitHub or Examples
promoter length parameter
promoter_length
optional, default: 1000 bp
The length of the sequence in which a TFBM (transcription factor binding motif) is searched for. For example, promoter_length=1000 means that the 1000 bp upstream of a gene are searched.
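The sketch below shows one way to read "the sequence upstream of a gene": for a gene on the forward strand it is the region before the start coordinate, and for a gene on the reverse strand it is the reverse complement of the region after the end coordinate. The function `upstream_region`, the 0-based end-exclusive coordinates, and the treatment of the genome as linear are all assumptions of this example; how Augusta handles circular genomes or neighbouring genes is not described here.

```python
def upstream_region(genome, gene_start, gene_end, strand, promoter_length=1000):
    """Return up to promoter_length bases upstream of a gene.

    Coordinates are 0-based and end-exclusive; strand is +1 or -1.
    The region is truncated at the sequence boundary (linear genome assumed).
    """
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    if strand == 1:
        begin = max(0, gene_start - promoter_length)
        return genome[begin:gene_start]
    if strand == -1:
        stop = min(len(genome), gene_end + promoter_length)
        # Upstream of a reverse-strand gene lies after its end coordinate;
        # report it in the gene's own orientation (reverse complement).
        return genome[gene_end:stop].translate(complement)[::-1]
    raise ValueError("strand must be +1 or -1")


genome = "AACCGGTTAC"
forward = upstream_region(genome, 4, 6, 1, promoter_length=3)
reverse = upstream_region(genome, 4, 6, -1, promoter_length=3)
```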
maximum time for a TFBM search parameter
motifs_max_time
optional, default: 180 s
Maximum time in seconds to search for the TFBM of an individual TF using MEME Suite. The time recommended by MEME Suite is 180 seconds, but the search may take longer for large genomes. If the search is terminated due to a timeout, a message is displayed and the parameter should be increased.
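The general pattern behind such a time limit can be sketched with the `timeout` argument of `subprocess.run` from the Python standard library. This only illustrates the behaviour described above; how Augusta actually invokes MEME Suite is not shown in this guide, and the helper `run_with_time_limit` and the stand-in command are hypothetical.

```python
import subprocess
import sys


def run_with_time_limit(command, max_time):
    """Run command; return True if it finished, False if it hit the limit."""
    try:
        subprocess.run(command, timeout=max_time, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        print(f"Search stopped after {max_time} s; consider a larger limit.")
        return False
    return True


# Stand-in for a long-running motif search: a process that sleeps for 5 s.
slow_command = [sys.executable, "-c", "import time; time.sleep(5)"]
finished = run_with_time_limit(slow_command, max_time=1)
```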
Gene Regulatory Network file
GRN_input
Adjacency matrix NxN; N = number of genes; TF = transcription factor (regulator); TG = target (regulated) gene
CSV file format
examples: Table 2 or “data” directory on GitHub
| | TG 1 | TG 2 | TG 3 |
|---|---|---|---|
| TF 1 | 0 | 1 | -1 |
| TF 2 | 1 | 0 | 0 |
| TF 3 | 1 | -1 | 0 |
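An adjacency matrix of this shape can be turned into a list of signed edges, which is often easier to inspect. The sketch below assumes a comma-delimited file with TFs in rows and TGs in columns, and it assumes that 1 marks a positive interaction, -1 a negative one, and 0 no interaction. The helper `read_grn_edges` is hypothetical and not part of Augusta.

```python
import csv
import io

# Hypothetical GRN file: rows are regulators (TF), columns are targets (TG).
GRN_CSV = """\
,TG 1,TG 2,TG 3
TF 1,0,1,-1
TF 2,1,0,0
TF 3,1,-1,0
"""


def read_grn_edges(handle):
    """Return a list of (regulator, target, sign) for every non-zero cell."""
    reader = csv.reader(handle)
    targets = next(reader)[1:]  # the first header cell heads the TF column
    edges = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        regulator = row[0]
        for target, value in zip(targets, row[1:]):
            sign = int(value)
            if sign != 0:
                edges.append((regulator, target, sign))
    return edges


edges = read_grn_edges(io.StringIO(GRN_CSV))
for regulator, target, sign in edges:
    print(regulator, "->", target, "(positive)" if sign > 0 else "(negative)")
```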
Add database information parameter
add_dbs_info
optional, default: None
options to add DBs info: ‘yes’, ‘Yes’, 1; any other value skips the Cell Collective DB search
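Read literally, the rule above is a membership test against three accepted values. The one-line sketch below mirrors that reading; the function `dbs_search_enabled` is hypothetical, and Augusta's internal check may differ in detail. Note that in Python `True == 1`, so this particular sketch would also accept `True`.

```python
def dbs_search_enabled(add_dbs_info=None):
    """Return True only for the documented enabling values: 'yes', 'Yes', 1."""
    return add_dbs_info in ("yes", "Yes", 1)


for value in ("yes", "Yes", 1, "YES", "no", 0, None):
    print(repr(value), "->", dbs_search_enabled(value))
```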
Usage¶
Below is the list of Augusta’s functions along with their inputs. See Examples for further description and tutorials.
Import Augusta:
> python3
>>> import Augusta
GRN and BN inference using RNA-Seq¶
RNASeq_to_BN is the main function for inferring both networks (GRN and BN) using an RNA-Seq dataset as input.
Usage:
>>> Augusta.RNASeq_to_BN(count_table_input, promoter_length, genbank_file_input, normalization_type, motifs_max_time)
Note: count_table_input is the only required input; the remaining ones are optional. If no GenBank file is provided, the GRN is inferred only by computing mutual information, and further steps such as count table normalization, GRN validation (TFBM and DBs search), and the Cell Collective DB search are skipped.
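For readers unfamiliar with mutual information, the sketch below computes it (in bits) for two discrete sequences of equal length, for example two expression profiles that have already been discretized into levels. It is a textbook estimate, not Augusta's implementation: the function name `mutual_information` is hypothetical, and how Augusta discretizes expression values or thresholds the scores is not described in this guide.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (bits) between two equal-length discrete sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * log2(c * n / (count_x[a] * count_y[b]))
    return mi


# Discretized toy profiles over four time points (0 = low, 1 = high).
gene_a = [0, 0, 1, 1]
gene_b = [0, 0, 1, 1]  # identical to gene_a
gene_c = [0, 1, 0, 1]  # statistically independent of gene_a

mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)
```

Identical profiles share the most information, while independent ones share none, which is why the score can serve as evidence for a regulatory link.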
GRN inference using RNA-Seq¶
RNASeq_to_GRN is the function for inferring only a Gene Regulatory Network (GRN) using an RNA-Seq dataset as input.
Usage:
>>> Augusta.RNASeq_to_GRN(count_table_input, promoter_length, genbank_file_input, normalization_type, motifs_max_time)
Note: count_table_input is the only required input; the remaining ones are optional. If no GenBank file is provided, the GRN is inferred only by computing mutual information, and further steps such as count table normalization and GRN validation (TFBM and DBs search) are skipped.
BN inference using GRN¶
GRN_to_BN is the function for inferring a Boolean Network (BN) using a Gene Regulatory Network (GRN) file as input.
Usage:
>>> Augusta.GRN_to_BN(GRN_input, promoter_length, genbank_file_input, add_dbs_info)
Note: GRN_input is the only required input; the remaining ones are optional. If the GenBank file is not provided and/or add_dbs_info is not set, the GRN is only converted to a BN and the Cell Collective (CC) DB is not searched.
GRN refinement¶
refineGRN is the function for refining an already inferred Gene Regulatory Network (GRN).
Usage:
>>> Augusta.refineGRN(GRN_input, genbank_file_input, count_table_input, promoter_length, motifs_max_time)
Note: GRN_input, genbank_file_input, and count_table_input are required inputs; the remaining ones are optional.
Outputs¶
All output files are stored in a generated “output” directory. During the motif search, the temporary file “temporary_coreg_seq.fasta” is generated; it is deleted at the end of the verification process.
Gene Regulatory Network
adjacency matrix in CSV file format
rows: TFs (transcription factors / regulators), cols: TGs (target / regulated genes)
“GRN.csv”
Boolean Network
SBML-qual file format
“BN.sbml”
Note: The GRN is first converted to the temporary file “BN.txt”. If memory is sufficient, “BN.txt” is then converted to “BN.sbml”. Otherwise, “BN.txt” is the final output.
motifs
all TFBMs discovered in the genome, each assigned to its transcription factor
Stockholm file format
“discovered_motifs.sto”
gene interactions
all interactions searched across databases stored as “DBs_interactions_list.csv”
uncertain interactions (i.e. the same gene pair has both positive and negative interaction types in different DBs) stored as “DBs_interactions_uncertain.csv”. The edge type with more references is incorporated into the network. If the number of references is equal, no information is taken from the databases.
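The tie-breaking rule for uncertain interactions can be written down in a few lines. This is a sketch of the rule as stated above, assuming the references have already been counted per interaction type; the function `resolve_interaction` and its return convention (1, -1, or None) are hypothetical.

```python
def resolve_interaction(positive_refs, negative_refs):
    """Pick the edge type for a gene pair reported with both signs.

    Returns 1 (positive), -1 (negative), or None when the reference
    counts are equal and no information is taken from the databases.
    """
    if positive_refs > negative_refs:
        return 1
    if negative_refs > positive_refs:
        return -1
    return None


print(resolve_interaction(3, 1))  # more positive references
print(resolve_interaction(1, 2))  # more negative references
print(resolve_interaction(2, 2))  # tie
```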