User guide

Inputs

Below is the list of all input files and parameters. See the Usage section for the inputs of each function.

  • count table file

  • count_table_input

  • matrix NxM; N = genes, M = time points

  • CSV file format

  • examples: Table 1 or “data” directory on GitHub

Table 1: Count table example

          Time 1    Time 2    Time 3
Gene 1    255       1,596     80
Gene 2    112       63        0
Gene 3    56        3,582     27
Gene 4    559       865       91
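As a minimal sketch of this layout, the snippet below writes the values from Table 1 to a CSV file and checks that it reads back as an N x M matrix. The file name count_table.csv, the empty top-left header cell, and the use of plain integers without thousands separators are assumptions made for illustration; the “data” directory on GitHub shows the exact format Augusta expects.

```python
import csv
import os
import tempfile

# Values from Table 1; rows are genes, columns are time points.
header = ["", "Time 1", "Time 2", "Time 3"]
rows = [
    ["Gene 1", 255, 1596, 80],
    ["Gene 2", 112, 63, 0],
    ["Gene 3", 56, 3582, 27],
    ["Gene 4", 559, 865, 91],
]

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "count_table.csv")
with open(path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(header)
    writer.writerows(rows)

# Read the file back and check that every gene has one count per time point.
with open(path, newline="") as handle:
    reader = csv.reader(handle)
    time_points = next(reader)[1:]
    counts = {row[0]: [int(value) for value in row[1:]] for row in reader}

n_genes, n_times = len(counts), len(time_points)
assert all(len(values) == n_times for values in counts.values())
print(f"{n_genes} genes x {n_times} time points")
```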

  • normalization type parameter

  • normalization_type

  • optional, default: None

  • options: RPKM, CPM, TPM

  • Please note that the count table must be in a normalized form in order to infer GRN correctly!

  • GenBank file

  • genbank_file_input

  • optional (required only by the refineGRN function), but several steps are skipped if it is not provided

  • Gene names in the count table must match locus tags (i.e. “locus_tag”) or names (i.e. “name”) in the GenBank file!

  • required format: GenBank full (i.e. containing nucleotide sequence in the ORIGIN section)

  • examples: “data” directory on GitHub or the Examples section

  • promoter length parameter

  • promoter_length

  • optional, default: 1000 bp

  • The length of the sequence in which a TFBM (transcription factor binding motif) is searched. For example, promoter_length=1000 means that the 1000 bp upstream of a gene are taken.

  • maximum time for a TFBM search parameter

  • motifs_max_time

  • optional, default: 180 s

  • Maximum time in seconds for the MEME Suite search of a TFBM for an individual TF. The time recommended by MEME Suite is 180 seconds, but the search may take longer for large genomes. If the search is terminated due to a timeout, a message is displayed and the value should be increased.

  • Gene Regulatory Network file

  • GRN_input

  • Adjacency matrix NxN; N = number of genes; TF = transcription factor (regulator); TG = target (regulated) gene

  • CSV file format

  • examples: Table 2 or “data” directory on GitHub

Table 2: GRN example

        TG 1    TG 2    TG 3
TF 1    0       1       -1
TF 2    1       0       0
TF 3    1       -1      0
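To illustrate how such a matrix can be read, the sketch below turns the values of Table 2 into a list of signed edges. It assumes that rows are TFs and columns are TGs, and that 1 and -1 mark positive and negative regulation while 0 means no interaction. The sign interpretation is an assumption, and the function name adjacency_to_edges is hypothetical.

```python
import csv
import io

# Table 2 as CSV text; rows are TFs (regulators), columns are TGs (targets).
grn_csv = """,TG 1,TG 2,TG 3
TF 1,0,1,-1
TF 2,1,0,0
TF 3,1,-1,0
"""


def adjacency_to_edges(text):
    """Return (TF, TG, sign) tuples for every non-zero matrix entry."""
    reader = csv.reader(io.StringIO(text))
    targets = next(reader)[1:]  # header row without the empty corner cell
    edges = []
    for row in reader:
        regulator, values = row[0], row[1:]
        for target, value in zip(targets, values):
            sign = int(value)
            if sign != 0:
                edges.append((regulator, target, sign))
    return edges


edges = adjacency_to_edges(grn_csv)
for regulator, target, sign in edges:
    print(regulator, "->", target, "(+)" if sign > 0 else "(-)")
```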

  • Add database information parameter

  • add_dbs_info

  • optional, default: None

  • options to add DBs info: ‘yes’, ‘Yes’, 1; any other value results in skipping the Cell Collective DB search
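For orientation on the normalization_type options listed among the inputs, the following sketch shows the standard definition of CPM (counts per million): each count is divided by the total count of its column (here, a time point) and multiplied by one million. It uses the values from Table 1, is not taken from Augusta's source code, and the function name counts_per_million is hypothetical. RPKM and TPM additionally require gene lengths and are not shown.

```python
def counts_per_million(table):
    """Scale each column of a genes x time points table to counts per million."""
    n_columns = len(next(iter(table.values())))
    # Total count of every column (library size per time point).
    totals = [sum(row[i] for row in table.values()) for i in range(n_columns)]
    return {
        gene: [
            1_000_000 * count / total if total else 0.0
            for count, total in zip(row, totals)
        ]
        for gene, row in table.items()
    }


# Raw counts from Table 1.
counts = {
    "Gene 1": [255, 1596, 80],
    "Gene 2": [112, 63, 0],
    "Gene 3": [56, 3582, 27],
    "Gene 4": [559, 865, 91],
}

cpm = counts_per_million(counts)
for gene, values in cpm.items():
    print(gene, [round(value, 1) for value in values])
```

After scaling, every column sums to one million, which makes time points with different sequencing depths comparable.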

Usage

Below is the list of Augusta’s functions along with their inputs. See Examples for further description and tutorials.

Import Augusta:

> python3
>>> import Augusta

GRN and BN inference using RNA-Seq

RNASeq_to_BN is the main function for inferring both networks (GRN and BN) using an RNA-Seq dataset as input.

Usage:

>>> Augusta.RNASeq_to_BN(count_table_input, promoter_length, genbank_file_input, normalization_type, motifs_max_time)

Note: count_table_input is the only required input; the remaining ones are optional. If no GenBank file is provided, the GRN is inferred only by computing mutual information, and the further steps, such as count table normalization, GRN validation (TFBM and DBs search), and the Cell Collective DB search, are skipped.
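A call with explicit values might look like the sketch below. The file names are hypothetical placeholders, and passing the arguments by keyword is an assumption based on the parameter names in the usage line above. The import is guarded and the call is skipped when the input file is missing, so the snippet can also be run where Augusta is not installed.

```python
import os

# Hypothetical input files; replace them with your own paths.
settings = {
    "count_table_input": "count_table.csv",
    "promoter_length": 1000,        # documented default, in bp
    "genbank_file_input": "genome.gb",
    "normalization_type": "TPM",    # one of RPKM, CPM, TPM
    "motifs_max_time": 180,         # documented default, in seconds
}

try:
    import Augusta
except ImportError:
    Augusta = None  # Augusta is not installed in this environment

if Augusta is not None and os.path.exists(settings["count_table_input"]):
    # Assumes the function accepts these names as keyword arguments.
    Augusta.RNASeq_to_BN(**settings)
else:
    print("Inference not run; prepared settings:", sorted(settings))
```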

GRN inference using RNA-Seq

RNASeq_to_GRN is the function for inferring only a Gene Regulatory Network using an RNA-Seq dataset as input.

Usage:

>>> Augusta.RNASeq_to_GRN(count_table_input, promoter_length, genbank_file_input, normalization_type, motifs_max_time)

Note: count_table_input is the only required input; the remaining ones are optional. If no GenBank file is provided, the GRN is inferred only by computing mutual information, and the further steps, such as count table normalization and GRN validation (TFBM and DBs search), are skipped.

BN inference using GRN

GRN_to_BN is the function for inferring a Boolean Network (BN) using a Gene Regulatory Network (GRN) file as input.

Usage:

>>> Augusta.GRN_to_BN(GRN_input, promoter_length, genbank_file_input, add_dbs_info)

Note: GRN_input is the only required input; the remaining ones are optional. If the GenBank file is not provided and/or add_dbs_info is not set, the GRN is only converted to a BN and the Cell Collective (CC) DB is not searched.

GRN refinement

refineGRN is the function for refining an already inferred Gene Regulatory Network (GRN).

Usage:

>>> Augusta.refineGRN(GRN_input, genbank_file_input, count_table_input, promoter_length, motifs_max_time)

Note: GRN_input, genbank_file_input, and count_table_input are required inputs; the remaining ones are optional.

Outputs

All output files are stored in the generated “output” directory. During the motif search, the temporary file “temporary_coreg_seq.fasta” is generated; it is deleted at the end of the verification process.

  • Gene Regulatory Network

  • adjacency matrix in CSV file format

  • rows: TFs (transcription factors / regulators), cols: TGs (target / regulated genes)

  • “GRN.csv”

  • Boolean Network

  • SBML-qual file format

  • “BN.sbml”

  • Note: The GRN is first converted to the temporary file “BN.txt”. If memory is sufficient, “BN.txt” is then converted to “BN.sbml”. Otherwise, “BN.txt” is the final output.

  • motifs

  • all TFBM discovered in the genome assigned to their transcription factor

  • Stockholm file format

  • “discovered_motifs.sto”

  • gene interactions

  • all interactions searched across databases stored as “DBs_interactions_list.csv”

  • uncertain interactions (i.e. the same gene pair has both positive and negative interaction types in different DBs) stored as “DBs_interactions_uncertain.csv”. The edge type with more references is incorporated into the network. If the number of references is equal, no information is taken from the databases.
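The rule for uncertain interactions can be sketched as follows. This is an illustration of the rule as described in the list above, not Augusta's own code; the function name resolve_interaction and the input format (a list of +1/-1 signs, one per database reference) are assumptions.

```python
def resolve_interaction(reference_signs):
    """Pick the interaction type supported by more references.

    reference_signs holds +1 (positive) or -1 (negative), one entry per
    database reference for a single gene pair. Returns +1, -1, or None
    when both types have the same number of references.
    """
    positive = sum(1 for sign in reference_signs if sign > 0)
    negative = sum(1 for sign in reference_signs if sign < 0)
    if positive > negative:
        return 1
    if negative > positive:
        return -1
    return None  # tie: no information is taken from the databases


print(resolve_interaction([1, 1, -1]))  # more positive references
print(resolve_interaction([-1, 1]))     # equal number of references
```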