TENOR (Transcriptome ENcyclopedia Of Rice) in RAP-DB

We have published the TENOR (Transcriptome ENcyclopedia Of Rice) database, which provides gene expression profiles and nucleotide-level transcriptional activity across the rice genome based on RNA-Seq data obtained under 140 environmental stress and plant hormone treatment conditions (Kawahara Y. et al. 2016). In the six years since its release, numerous RNA-Seq datasets have accumulated in public databases. To provide more comprehensive transcriptome information on the rice genome, we collected publicly available RNA-Seq data, curated the meta-information (sampling, experimental, and sequencing conditions) of each sample, and analyzed all samples with a standardized analysis pipeline. TENOR currently covers 565 different experimental conditions and tissues: gene expression profiles are shown in the "Expression (TENOR)" section of each transcript annotation page (e.g. Os01t0911700-01), and transcriptional activities are shown in JBrowse. The meta-information of the samples and the details of the analysis pipeline are given below.

Reference data

Genome sequences [FASTA]
- IRGSP-1.0 genome (including organelle and unanchored contig sequences)
Gene annotation for StringTie in GTF (as of 11 Nov 2021)
- RAP-DB representative genes [GTF]
- RAP-DB predicted genes [GTF]

Analysis tools

Java (JDK 1.8.0_191)
Trimmomatic (v0.39)
HISAT2 (v2.2.1)
StringTie (v2.1.6)
SAMtools (v1.13)
BAMscale (v1.0)

Commands and parameters used in the workflow

Preprocessing of Illumina paired-end reads

    $ java -jar trimmomatic-0.39.jar PE \
    -phred33 read.r1.fastq.gz read.r2.fastq.gz \
    read.pe.r1.fastq.gz read.se.r1.fastq.gz read.pe.r2.fastq.gz read.se.r2.fastq.gz \
    ILLUMINACLIP:adapters.fa:2:30:10 LEADING:15 TRAILING:15 SLIDINGWINDOW:10:15 MINLEN:30
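The trimming steps in the command above run in the order written. The sketch below is a simplified Python model of LEADING:15, TRAILING:15, SLIDINGWINDOW:10:15 and MINLEN:30, plus the paired-end routing that explains the four output files. It is not Trimmomatic's code: adapter clipping (ILLUMINACLIP) is omitted, and SLIDINGWINDOW is assumed to cut at the start of the first window whose mean Phred+33 quality falls below 15, which is more conservative than Trimmomatic's actual implementation. The function names `trim_read` and `trim_pair` are hypothetical.

```python
"""Simplified model of the Trimmomatic steps used above (not Trimmomatic itself).

Assumptions: Phred+33 quality strings; ILLUMINACLIP is omitted; SLIDINGWINDOW
cuts at the start of the first window whose mean quality is below the threshold.
"""


def phred33(qual):
    # Phred+33 encoding: quality score = ASCII code - 33
    return [ord(c) - 33 for c in qual]


def trim_read(seq, qual, leading=15, trailing=15, window=10, required=15, minlen=30):
    """Apply LEADING, TRAILING, SLIDINGWINDOW and MINLEN in that order.

    Returns the trimmed (seq, qual) tuple, or None if the read is dropped.
    """
    q = phred33(qual)
    start, end = 0, len(q)
    # LEADING: remove 5' bases while their quality is below `leading`
    while start < end and q[start] < leading:
        start += 1
    # TRAILING: remove 3' bases while their quality is below `trailing`
    while end > start and q[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: scan 5'->3' and cut at the first window with a low mean
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < required:
            end = i
            break
    # MINLEN: drop reads that are now shorter than `minlen`
    if end - start < minlen:
        return None
    return seq[start:end], qual[start:end]


def trim_pair(r1, r2, **params):
    """Route a read pair like Trimmomatic PE: if both mates survive they go to the
    paired (pe) outputs; a surviving mate whose partner was dropped goes to the
    unpaired (se) output for its read number."""
    t1 = trim_read(*r1, **params)
    t2 = trim_read(*r2, **params)
    if t1 and t2:
        return {"pe.r1": t1, "pe.r2": t2}
    out = {}
    if t1:
        out["se.r1"] = t1
    if t2:
        out["se.r2"] = t2
    return out
```

For example, a 60-base read whose qualities drop from Q40 to Q2 after position 40 is cut where the window mean first falls below 15, and a mate that becomes shorter than 30 bases is discarded while its partner is written to the unpaired file.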

Building the HISAT2 genome index

    # concatenate GTF files of RAP-DB Rep. and Pred. genes
    $ cat IRGSP-1.0_representative_transcript_exon_2021-11-11.gtf \
    IRGSP-1.0_predicted_transcript_exon_2021-11-11.gtf \
    > all_transcripts_exon.gtf

    # make splice site data
    $ python extract_splice_sites.py all_transcripts_exon.gtf > ss.tab

    # make exon position data
    $ python extract_exons.py all_transcripts_exon.gtf > exon.tab

    # make index for hisat2
    $ hisat2-build --ss ss.tab --exon exon.tab \
    IRGSP-1.0_genome_M_C_unanchored.fa IRGSP-1.0_genome_M_C_unanchored
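The splice-site file passed to `hisat2-build --ss` lists the introns implied by the concatenated annotation. The sketch below illustrates what such a table contains; it is not the extract_splice_sites.py script. It assumes the layout chrom, left, right, strand with 0-based coordinates, where left is the last base of the upstream exon and right is the first base of the downstream exon, and it deduplicates junctions shared by representative and predicted transcripts. The function name `splice_sites` is hypothetical.

```python
"""Illustrative splice-site extraction from GTF exon records (a sketch only).

Assumed output row: (chrom, left, right, strand), 0-based, where left is the
last base of the upstream exon and right is the first base of the next exon.
"""
import re
from collections import defaultdict


def splice_sites(gtf_lines):
    # transcript_id -> list of (chrom, strand, start0, end0) exon records
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = re.search(r'transcript_id "([^"]+)"', f[8])
        if m is None:
            continue
        # GTF coordinates are 1-based inclusive; convert to 0-based inclusive
        exons[m.group(1)].append((f[0], f[6], int(f[3]) - 1, int(f[4]) - 1))

    sites = set()  # a set removes junctions shared by several transcripts
    for recs in exons.values():
        recs.sort(key=lambda r: r[2])  # order exons along the genome
        for (chrom, strand, _, up_end), (_, _, down_start, _) in zip(recs, recs[1:]):
            sites.add((chrom, up_end, down_start, strand))
    return sorted(sites)
```

Two transcripts that share an intron between exon ends at 200 and exon starts at 301 (1-based GTF) therefore yield a single row, `chr01 199 300 +`.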

Alignment of Illumina reads to the reference genome

    $ hisat2 -x IRGSP-1.0_genome_M_C_unanchored \
    --summary-file rnaseq_summary.stats --min-intronlen 20 --max-intronlen 10000 \
    --dta --new-summary -1 read.pe.r1.fastq.gz -2 read.pe.r2.fastq.gz -S alignment.sam
    $ samtools sort -o alignment.sort.bam alignment.sam
    $ samtools index alignment.sort.bam
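The options `--min-intronlen 20 --max-intronlen 10000` bound the intron lengths HISAT2 will consider. In the resulting SAM/BAM records, a spliced gap appears as an `N` (skipped reference) operation in the CIGAR string, so the bounds can be checked after alignment. The helpers below are a hypothetical sketch, assuming both bounds are inclusive; they are not part of HISAT2 or SAMtools.

```python
import re

# CIGAR operations defined by the SAM specification
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_lengths(cigar):
    """Return the lengths of all N (skipped reference / intron) operations."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N"]


def introns_within(cigar, min_len=20, max_len=10000):
    """True if every intron lies in [min_len, max_len]; unspliced reads pass."""
    return all(min_len <= n <= max_len for n in intron_lengths(cigar))
```

For example, `50M500N50M` contains one 500-nt intron and passes, whereas `50M15000N50M` exceeds the 10,000-nt upper bound.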

Calculating gene abundance (TPM)

    $ stringtie alignment.sort.bam -e -B \
    -G all_transcripts_exon.gtf \
    -o sample/sample.gtf -A rnaseq_abundance_sample.stats
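TPM (transcripts per million) normalizes for both transcript length and sequencing depth: each transcript's read count is divided by its length, and the resulting rates are rescaled so that they sum to one million within a sample. The sketch below uses this textbook definition with plain transcript lengths; it is not StringTie's code, since StringTie derives TPM from its own coverage estimates (and some tools use effective rather than plain lengths). The function name `tpm` is hypothetical.

```python
def tpm(counts, lengths):
    """Textbook TPM: length-normalize read counts, then scale to sum to 1e6."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    rates = [c / l for c, l in zip(counts, lengths)]  # reads per base
    total = sum(rates)
    return [r / total * 1e6 for r in rates]
```

Because of the final rescaling, TPM values are comparable as proportions within a sample: two transcripts with equal counts but lengths of 1,000 and 2,000 nt receive TPMs in a 2:1 ratio.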

Transcript-level expression values for each sample were obtained by extracting the TPM values from the GTF file (sample.gtf) output by StringTie.
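In StringTie's output GTF, each `transcript` record carries the estimated abundance as a `TPM "..."` attribute (alongside `cov` and `FPKM`). The sketch below shows one way to collect those values into a transcript-to-TPM mapping; the function name `tpm_by_transcript` is hypothetical, and this is not necessarily the script used to build TENOR.

```python
import re

TID_RE = re.compile(r'transcript_id "([^"]+)"')
TPM_RE = re.compile(r'\bTPM "([^"]+)"')  # \b keeps the match on the TPM attribute


def tpm_by_transcript(gtf_lines):
    """Map transcript_id -> TPM using the 'transcript' records of a StringTie GTF."""
    tpm = {}
    for line in gtf_lines:
        if line.startswith("#"):  # StringTie writes header comments
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "transcript":  # exon lines carry no TPM
            continue
        tid = TID_RE.search(f[8])
        val = TPM_RE.search(f[8])
        if tid and val:
            tpm[tid.group(1)] = float(val.group(1))
    return tpm
```

Typical use is `with open("sample/sample.gtf") as fh: values = tpm_by_transcript(fh)`, after which the per-sample dictionaries can be merged into an expression matrix.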

Making BigWig data for JBrowse

    # for standard RNA-Seq data
    $ BAMscale scale --operation rna --bam ./alignment.sort.bam

    # for strand-specific RNA-Seq data
    $ BAMscale scale --operation strandrna --bam ./alignment.sort.bam
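Strand-specific coverage tracks require knowing which genomic strand each read's transcript came from. The sketch below does not reproduce BAMscale's own logic; it only illustrates, assuming a dUTP-style "fr-firststrand" library (read 1 is antisense to the transcript), how that strand can be inferred from the SAM FLAG bits for read reverse-complemented (0x10), first in pair (0x40) and second in pair (0x80). The function name `transcript_strand` and the protocol labels are hypothetical.

```python
# SAM FLAG bits, as defined in the SAM specification
PAIRED = 0x1
REVERSE = 0x10
READ2 = 0x80


def transcript_strand(flag, protocol="fr-firststrand"):
    """Infer the originating transcript's strand ('+' or '-') for one read."""
    reverse = bool(flag & REVERSE)
    if flag & PAIRED and flag & READ2:
        # mate 2 is sequenced from the opposite end, so flip its orientation
        reverse = not reverse
    if protocol == "fr-firststrand":  # e.g. dUTP: read 1 is antisense
        return "+" if reverse else "-"
    if protocol == "fr-secondstrand":  # read 1 is sense
        return "-" if reverse else "+"
    raise ValueError(f"unknown protocol: {protocol}")
```

Under this assumption, a properly paired read 1 mapped to the reverse strand (FLAG 83) and its mate (FLAG 163) are both assigned to a plus-strand transcript.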

Meta-information of RNA-Seq samples in TENOR of RAP-DB