Using FFPE RNA-Seq with 12 marker genes to evaluate genotoxic and non-genotoxic rat hepatocarcinogens

Introduction Various challenges have been overcome with regard to applying ‘omics technologies for chemical risk assessments. Previously we published results detailing targeted mRNA sequencing (RNA-Seq) on a next generation sequencer using intact RNA derived from freshly frozen rat liver tissues. We successfully discriminated genotoxic hepatocarcinogens (GTHCs) from non-genotoxic hepatocarcinogens (NGTHCs) using 11 selected marker genes. Based on this, we next attempted to use formalin-fixed paraffin-embedded (FFPE) pathology specimens for RNA-Seq analyses. Findings In this study we performed FFPE RNA-Seq to compare a typical GTHC, 2-acetylaminofluorene (AAF) to genotoxicity equivocal p-cresidine (CRE). CRE is used as a synthetic chemical intermediate, and this compound is classified as an IARC 2B carcinogen and is mutagenic in S. typhimurium, which is non-genotoxic to rat livers as assessed by single strand DNA damage analysis. RNA-Seq was used to examine liver FFPE samples obtained from groups of five 10-week-old male F344 rats that were fed with chemicals (AAF: 0.025% and CRE: 1% in food) for 4 weeks or from controls that were fed a basal diet. We extracted RNAs from FFPE samples and RNA-Seq was performed on a MiniSeq (Illumina) using the TruSeq custom RNA panel. AAF induced remarkable differences in the expression of eight genes (Aen, Bax, Btg2, Ccng1, Gdf15, Mbd1, Phlda3 and Tubb4b) from that in the control group, while CRE only induced expression changes in Gdf15, as shown using Tukey’s test. Gene expression profiles for nine genes (Aen, Bax, Btg2, Ccng1, Cdkn1a, Gdf15, Mbd1, Phlda3, and Plk2) differed. between samples treated with AAF and CRE. Finally, principal component analysis (PCA) of 12 genes (Aen, Bax, Btg2, Ccnf, Ccng1, Cdkn1a, Gdf15, Lrp1, Mbd1, Phlda3, Plk2, and Tubb4b) using our previous Open TG-GATE data plus FFPE-AAF and FFPE-CRE successfully differentiated FFPE-AAF, as GTHC, from FFPE-CRE, as NGHTC. Conclusion Our results suggest that FFPE RNA-Seq and PCA are useful for evaluating typical rat GTHCs and NGTHCs.


Introduction
Various 'omics technologies, including genomics, proteomics, and metabolomics, have been used to evaluate chemical risk assessment. However, in addition to the existing knowledge gaps with regard to linking specific molecular changes to apical outcomes, current methodological uncertainties in interpreting and assessing data limit the application of 'omics technologies in regulatory toxicology [1].
Targeted mRNA sequencing (RNA-Seq) has become an important tool for examining the role of the transcriptome in biological processes [2]; however, few studies have examined the feasibility of chemical risk assessment using RNA-Seq. Previously, we published RNA-Seq data generated from a next generation sequencer using intact RNA derived from freshly frozen rat liver tissues [3]. We successfully discriminated genotoxic hepatocarcinogens (GTHCs, N-nitrosodiethylamine and 3,3′-dimethylbennzidine·HCl) from a non-genotoxic hepatocarcinogen [NGTHC, di(2-ethylhexyl)phthalate] and a different intermediate hepatocarcinogen (1,4-dioxane) [3] using 11 selected marker genes that have been described previously [4]. Recently, Hester et al. described a successful RNA-Seq analysis of formalin-fixed paraffin-embedded (FFPE) tissue [5]. In the present study, we attempted to use FFPE pathology specimens for RNA-Seq. Currently, few papers have been published regarding FFPE RNA-Seq (non-targeted) in the rat liver [6].
The general aims of the present study were to evaluate GTHC and NGTHC via the analysis of selected gene expression patterns within the liver as analyzed using FFPE RNA-Seq and PCA, to determine the usefulness of FFPE RNA-Seq for this analysis, and to compare the typical GTHC, 2-acetylaminofluorene (AAF)-induced gene expression profile to expression profiles that were induced by p-cresidine (CRE), which is equivocal for genotoxicity and carcinogenic in rat liver. CRE is used as a synthetic chemical intermediate, and this compound is an IARC 2B carcinogen (rat urinary bladder carcinomas, hepatocellular carcinomas). CRE is mutagenic in S. typhimurium, and possesses a weak in vitro ability to affect sister chromatin exchange and to induce chromosome aberrations [7]; however, it is non-genotoxic in rat livers as assessed by single strand DNA damage assays [8].
Previously we proposed the use of 12 mouse marker genes (Aen, Bax, Btg2, Ccnf, Ccng1, Cdkn1a, Gdf15, Lrp1, Mbd1, Phlda3, Plk2 and Tubb4b) to discriminate eight mouse GTHCs from four NGTHCs using qPCR [9]. Then, we successfully evaluated these 12 mouse marker genes using publicly available rat toxicogenomic data from the Open Japanese Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System (Open TG-GATEs; https://toxico.nibiohn.go.jp) to discriminate genotoxic from non-genotoxic hepatocarcinogens [4]. The Open TG-GATEs was developed by the Japanese Toxicogenomics Project consortium, analyzed by DNA microarray and opened to the public in 2015 [10]. We compared five rat typical genotoxic hepatocarcinogens (GTHCs) to seven typical non-genotoxic hepatocarcinogens (NGTH Cs) and 11 non-genotoxic non-hepatocarcinogens (NGTN HCs; see Methods, Chemicals) at 24 h and 29 days using three doses, and this yielded 124 data points that could be assessed using Open TG-GATEs. Principal component analysis (PCA) of the 12 mouse marker genes successfully separated GTHCs from NGTHCs and NGTNHCs independently of species differences. In the third study, we successfully applied the 12 marker genes to the rat RNA-Seq study with the exception of Gdf15, as this gene possessed a low read number in some samples [3]. In the present study, we again applied the 12 marker genes.
As described previously [9], nine (Aen, Bax, Btg2, Ccng1, Cdkn1a, Gdf15, Mbd1, Phlda3 and Plk2) of 12 marker genes are members of genes families that are related to the intrinsic apoptotic signaling pathway that is activated by the p53 mediator in response to DNA damage. Ccnf may be related to DNA repair and DNA damage [11]. Lrp1 may modulates cancer progression [12], and Tubb4b may be related to human cancer [13].
As described previously [4], we defined the typical rat GTHC as positive according to the standard Ames test and as positive according to in vivo liver tests such as the micronucleus test, the transgenic mutation assay, the comet assay and the UDS test. GTHC were also carcinogenic in rat livers. We defined the typical rat NGTHC as negative according to the standard Ames test and as negative according to in vivo liver tests. These compounds were also carcinogenic in rat livers.
In the present study we successfully extended our RNA-Seq study for use on FFPE samples to discriminate GTHC (AAF) and NGTHC (CRE) via 12 selected marker gene expression patterns in the liver as analyzed using FFPE RNA-Seq and PCA.

Animal treatment
Fifteen male F344 rats were obtained at 5 weeks of age from Charles River Japan (Yokohama, Japan) and used after 1 week of acclimatization. They were maintained in polycarbonate cages with wood chips as bedding in an air-conditioned room [12-h light (5 a.m. to 5. p.m.), 12-h dark; 23 ± 2°C; 55 ± 5% humidity] and were provided food (Oriental Yeast Co., Tokyo, Japan) and water ad libitum. The experimental design was approved by the Animal Care and Utilization Committee of the National Institute of Health Sciences, Japan and the animals were cared for in accordance with institutional guidelines and the Guideline for Proper Conduct of Animal Experiments (Science Council of Japan, June 1, 2006). AAF and CRE were purchased from Tokyo Chemical Industry (Tokyo, Japan) and Sigma-Aldrich (St Louis, Missouri, USA), respectively.
Experimental groups (exp) of five male, 6-week-old F344 rats were given chemicals in their food for 4 weeks; AAF: 0.025% and CRE: 1% in food. These doses were similar doses for long-term carcinogenesis studies. Rats in the control group (cont) were given water and a basal diet. After the treatment, the left lobe of the liver was dissected and fixed in 10% formalin for a week, routinely processed to paraffin-embedded blocks, which were used for another study [14], and stored at room temperature for 3 years before use.

RNA isolation and RNA-Seq
Total RNA was extracted from a liver sample of a FFPE tissue section (10 μm thick with a size range of approximately 200mm 2 ) using Maxwell Ⓡ RSC RNA FFPE Kit (Promega Japan, Tokyo, Japan) according to the manufacturer's instructions. RNA quality was analyzed using an Agilent 2100 Bioanalyzer (Agilent Technologies Japan, Ltd., Tokyo, Japan). RNA-Seq was conducted using Tru-Seq Targeted RNA Expression Library Prep kits in Min-iSeq (Illumina K.K., Tokyo, Japan) according to the manufacturer's instruction. Approximately 200 ng total RNA from each sample was used for RNA-Seq. The read numbers of each of the 12 genes and Dazap2 and Ube2d3 of 15 samples (five each from control, AAF and CRE groups) were analyzed in a single NGS run by adding the oligonucleotide barcode using Local Run Manager Software (Illumina) (Additional file 1). The symbols, gene names and ID numbers of the 12 genes analyzed and Dazap2 and Ube2d3 genes, candidate normalized gene are summarized in Table 1. We selected Dazap2 and Ube2d3 as candidate normalized genes from public Open TG-GATEs data. (see Discussion in more detail.)

Statistical analysis
For statistical analysis, we performed a logarithmic (log 2 ) transformation of the raw data to stabilize the variance. Statistical significance for each gene against the control group and between AAF and CRE groups was assessed with the Tukey test using "Pharmaco Basic" (Kazuhiko Matsumoto edited, Scientist-press, Tokyo). Differentiation of the gene expression profiles associated with typical GTHCs from those with typical NGTHCs and NGTNHCs and CRE was achieved using the unsupervised learning algorithm PCA [15]. PCA was performed using the PCA program in "R project for Statistical Computing" (https://www.r-project.org/). PCA was conducted with 12 genes (Aen, Bax, Btg2, Ccnf, Ccng1, Cdkn1a, Gdf15, Lrp1, Mbd1, Phlda3, Plk2, and Tubb4b) on FFPE-AAF and FFPE-CRE and our previous Open TG-GATE data (GTHCs, NGTHCs and NGTNHCs) [5].

Changes in gene expression in 12 marker genes induced by AAF and CRE
The read numbers of each of the 12 genes and Dazap2 and Ube2d3 genes in 15 samples as assessed using a MiniSeq are provided in Additional file 1. Log 2 -transformed results are presented in Additional file 2. The ratio (exp/cont) log 2 that was calculated after Dazap2normalization is shown in Additional file 3. Figure 1 presents the bar plots of log 2 -transformed RNA-Seq results (ratio of experimental group/control group in Additional file 3) for the12 genes and the statistical significance of these data were calculated using the Tukey test.
The results demonstrate that significant changes in gene expression in AAF compared to that of the control group were present in eight genes (Aen, Bax, Btg2, Ccng1, Gdf15, Mbd1, Phlda3 and Tubb4b) as determined using the Tukey test; however, CRE induced significant changes only in Gdf15 expression. The gene expression profiles of the nine genes (Aen, Bax, Btg2, Ccng1, Cdkn1a, Gdf15, Mbd1, Phlda3, and Plk2) were different after treatment with AAF or CRE.   Fig. 1 Changes in gene expression in 12 marker genes in the AAF and CRE groups. The mean of each control group was calculated as 0 (log 2 ) and ratio (exp/cont) log 2 was presented. The results of individual rats of control group (C), AAF and CRE were presented. The significance was analyzed using the Tukey test; *, P < 0.05; **, P < 0.01(each experimental group against the control group). The Tukey test: #, P < 0.05, ##, P < 0.01 (AAF against CRE) NGTNHCs (AA, ASP, CAF, CPA, CPP, DEX, DIA, IND,  PBZ, THE and TOL) at 24 h and 29 days in the Open TG-GATEs data according to PCA (Fig. 2). GTHCs were separated from NGTHCs and NGTNHHCs using PC1, where the approximate border line between the two groups was − 0.397 (dashed line in Fig. 2).

Discussion
High-throughput experiments can be performed using RNA-Seq, as a considerable number of samples and a considerable number of genes can be analyzed simultaneously. RNA-Seq results are highly reliable due to the ability of this method to allow for confirmation of the cDNA sequence. Unlike RNA-Seq, each experiment must be conducted for each sample using standard DNA microarray and qPCR technology, and only relative changes in fluorescent intensity can be measured. Ning et al. also suggested that rapid identification of genetic variants, somatic mutations, gene expression profiles, and epigenetic alterations with single-base resolution can be achieved using RNA-Seq [16]. It was reported that FFPE samples were highly similar to frozen samples with regard to sequencing quality metrics, differentially expressed genes profiles, and doseresponse parameters [6]. For the total RNA derived from FFPE samples in the present study, the 260 nm/230 nm ratio was acceptable (approximately 2.0); however, the RNA was short and exhibited a main peak that was below 200 bp. Given that the Illumina short-read sequencer was used and read length for RNA-Seq identification was only 51 bp, the obtained RNA size was sufficient to obtain results. In the present study, we obtained acceptable read number data from FFPE samples compared to that from the previously used freshly frozen samples [3]. We recommend the use of FFPE samples in laboratories, while further methodological development may be required for older lower-quality FFPE samples. Although animal welfare should be considered when performing chemical risk assessment for humans, a 28day toxicity test in rodents remains essential. Using freshly frozen or FFPE samples from 28 days test for RNA-Seq may prove useful.
The changes in gene expressions in 12 marker genes induced by AAF in FFPE samples that were analyzed by FFPE RNA-Seq in the present study were similar to those in freshly frozen samples that were analyzed by DNA microarray (Open TG-GATEs) [4], and the results for eight of these genes (Aen, Bax, Btg2, Ccng1, Gdf15, Mbd1, Phlda3 and Tubb4b) were identical. The results indicating.
increases and decreases in the expression of three of these genes (Ccnf, Cdkn1a and Lrp1) were roughly similar. For the PCA data of FFPE-AAF, the average data of five rats in the present study (0.025% in food; approximate 12.5 mg/kg bw/day) were PC1: − 6.26 and PC2: − 2.25. The PCA data of the Open TG-GATEs of the low dose (30 mg/kg bw/day by gavage, average of three rats) at 29 days were PC1: − 5.23 and PC2: 0.17 (Additional file 5). Numerically, the PC1 score of both experiments were similar, and the PC2 scores both experiments were somewhat different. This Difference may be due to the observed differences in Cdkn1a and Plk2.
As approximately 90% of human carcinogens are genotoxic carcinogens [15], they remain serious threat to human health. We are developing an in vivo short-term genotoxic carcinogen screening method using gene expression profiles and PCA that is based on the toxic modes of action of various chemical compounds. As a next step, all genotoxic and non-genotoxic carcinogens and non-carcinogens could ideally be screened using gene expression profiles via an in vivo short-term assay.
PCA is an unsupervised learning algorithm. Ringnér wrote that "PCA is often incorporated into genomewide expression studies." He explained that "samples can then be plotted, making it possible to visually assess similarities and differences between samples and determine whether samples can be grouped" [17]. In the present study, we demonstrated that PCA analysis of our previous Open TG-GATEs study [4] was useful for predicting the genotoxicity and hepatocarcinogenicity of new chemicals. Here, we have successfully analyzed our FFPE-AAF and FFPE-CRE data against our previous Open TG-GATEs data. The PC1 results for AAF at 24 h and 29 days from Open TG-GATEs data (Additional file 5) were − 5.50 to − 3.79, and the PC1 results for FFPE-AAF (Additional file 5) were − 8.20 to − 4.57. The PC1 results for FFPE-AAF were consistently plotted in the expected areas. Differences in Cdkn1a and Plk2 in the present FFPE data compared to that from our previous Open TG-GATEs data [4] may be responsible for the lower PC2 numerical values obtained in the present study. The PC1 result for DEN24hL (Open TG-GATEs data) was exceptionally close to those of the NGTHCs and NGTNHCs group, and this may be due to the use of a single lower dose of 10 mg/kg bw, as the LD50 of DEN is 220 mg/kg bw by oral administration.
FFPE-CRE was plotted by PCA in the NGTHCs group from Open TG-GATEs data in the present study. Little is known regarding the in vivo genotoxicity of CRE in rat livers. Although CRE is mutagenic in S. typhimurium and weakly positive in in vitro sister chromatin exchange and chromosome aberrations [7], the present results and the results of the in vivo rat liver single strand DNA damage test [8] suggest that CRE is not genotoxic in rat livers.
Additionally, we can calculate the approximate value of PC1 as written in the Discussion and Appendix A3 [4]. The first principal component (Y1) is given by the linear combination of the variable X1, X2, −--, Xp. Y1 = a11X1 + a12X2 + … + a1pXp where a1p is the eigenvector, that can be calculated with the PCA program in R, and Xp is the canonicalized logarithmic (log2)-transformed gene ratio (exp/cont), [(χ-μ)/σ].
When we calculate all 24-h and 29-day data from the Open TG-GATEs data using R, a11, −--, a1p of PC1 are a ( Users can calculate their PC1 (Y1) score by introducing their xp into the following equation: read numbers was assured according to a wide range of RNA-Seq experiments. In a previous study [3], we analyzed 12 marker genes; however, the results for Gdf15 could not be used because the read number of Gapdh occupied a major portion of the read numbers and the read number of Gdf15's was too low in some samples. In the present study we endeavored to determine a new normalization gene candidate that exhibited lower expression than that of Gapdh. We chose Dazap2 and Ube2d3 that were selected from the Open TG-GATEs data as the most stable genes (Additional file 7) (DNA microarray data, [https://toxico.nibiohn.go.jp]). These genes did not exhibit significant increases or decreases and their fluorescent expression intensity was approximately 1/3 that of Gapdh in the Open TG-GATEs data. We used Dazap2 as a normalized gene in the present study; however, it was not necessarily an ideal normalized gene in the present study, as the read number for Dazap2 was 7.53 to12.4% of the total read number in the control group (lower than expected), 7.71 to 14.1% in the CRE group and 2.49 to 5.30% in the AAF group due to the observation that marker genes such as Aen, Bax, Btg2, Ccng1 and Phlda3 were remarkably increased in the AAF group. The read number for Ube2d3 was lower than that of Dazap2, and this gene was not suitable as a normalized gene. We will continue to investigate more suitable normalization genes for use in our RNA-Seq analysis.

Conclusion
The present results showed that FFPE RNA-Seq of rat liver provides reliable genes expression data comparable with RNA-Seq with fresh frozen tissue. It represents a useful tool for discovery and validation of biomarkers.