Mutation assay using single-molecule real-time (SMRT™) sequencing technology
Genes and Environment volume 37, Article number: 15 (2015)
We present here a simple, phenotype-independent mutation assay using a PacBio RSII DNA sequencer employing single-molecule real-time (SMRT) sequencing technology. Salmonella typhimurium YG7108 was treated with the alkylating agent N-ethyl-N-nitrosourea (ENU) and grown through several generations to fix the induced mutations. The DNA was then extracted and the mutations were analyzed using the SMRT DNA sequencer.
The ENU-induced base-substitution frequency was 15.4 per Megabase pair, which is highly consistent with our previous results based on colony isolation and next-generation sequencing. The induced mutation spectrum (95% G:C → A:T, 5% A:T → G:C) is also consistent with the known ENU signature. The base-substitution frequency of the control was calculated to be less than 0.12 per Megabase pair. A current limitation of the approach is the high frequency of artifactual insertion and deletion mutations it detects.
Ultra-low frequency base-substitution mutations can be detected directly by using the SMRT DNA sequencer, and this technology provides a phenotype-independent mutation assay.
Mutation assays capable of detecting somatic mutations at very low frequencies are important in the areas of environmental mutagenesis, carcinogenesis, epidemiology, and regulatory science. They are especially important in the context of safety evaluation of newly developed drugs or industrial chemicals. Although many mutation assays have been developed, most rely on some kind of phenotypic selection, which involves time-consuming procedures and is potentially biased. We previously reported a phenotype-free mutation assay using next-generation DNA sequencing. In that study, we treated a Salmonella typhimurium strain with a mutagen to induce and fix mutations, followed by colony isolation and whole-genome sequencing of the colonies. The induced mutations were successfully detected in silico using bioinformatics software. That strategy is summarized in Fig. 1 and named the ‘Colony-NGS method’. Although the approach is simple and reliable, difficulties remain when it is applied to mammalian cells, because 1) the colony-isolation step is much more technically challenging for mammalian cells than for bacterial cells, and 2) the mammalian genome is diploid and hundreds of times larger than the bacterial genome, which limits deep coverage in sequencing. Furthermore, the Colony-NGS method is not applicable to bio-monitoring of somatic mutations in tissues of experimental animals or clinical specimens from patients because colonies cannot be isolated from those sources.
Recently, ‘Duplex Sequencing’ methodologies, which enable detection of a single mutation among >1 × 10⁷ nucleotides by using a general next-generation DNA sequencing (NGS) technology, have been developed [2,3]. This is a very promising strategy for application to bio-monitoring of somatic mutations. In this paper, however, we demonstrate an alternative approach using single-molecule real-time sequencing.
The PacBio RS II DNA sequencer (Pacific Biosciences, Inc.) is a recent innovation based on a single-molecule real-time (SMRT) technology. Since it is able to read the sequence of a single DNA molecule, it can in principle detect the mutations present in the molecule just by sequencing it accurately, as summarized in Fig. 1 (named the ‘SMRT method’). A significant advantage of this strategy is that the colony isolation step is unnecessary, so that the approach should be applicable to any cell line and to specimens from experimental animals, patients and environmental animals.
However, a drawback of this technology is the accuracy of the sequencing data it generates. At present, the error rate in raw reads of the PacBio sequencer is exceedingly high (~15%). To help overcome this problem, the ‘SMRTbell™ template’, in which single-stranded DNA loops are ligated to both ends of a double-stranded DNA, is used to direct sequencing of the same DNA molecule repeatedly. The multiple sub-reads from the same single circular DNA template are combined into a consensus read, i.e., a circular consensus sequence (CCS) read, and the greater the number of repeat reads, the more accurate the sequencing result. In this study, we validated that we can detect ultra-low frequency mutations by using the SMRT method with the CCS strategy.
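Why repeated reading helps can be illustrated with a deliberately simplified model. The sketch below is not the consensus algorithm used by the PacBio software. It assumes, purely for illustration, that each pass miscalls a given base independently with the same probability and that the consensus is wrong only when more than half of the passes are wrong. Real raw-read errors are not independent substitutions, so the numbers it gives should be read as qualitative only. The function name `consensus_error` is hypothetical.

```python
from math import comb

def consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that more than half of `passes` independent reads
    miscall a base (toy majority-vote model, an assumption of this sketch)."""
    threshold = passes // 2 + 1  # smallest number of wrong passes that forms a majority
    return sum(
        comb(passes, k) * per_pass_error**k * (1 - per_pass_error) ** (passes - k)
        for k in range(threshold, passes + 1)
    )

# Compare a single pass with ten passes at the ~15% raw error rate quoted above.
for n in (1, 5, 10):
    print(n, consensus_error(0.15, n))
```

Under these assumptions the modelled error falls steeply as the number of passes grows, which is the qualitative behaviour the CCS strategy relies on.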
Materials and methods
ENU (CAS No. 759-73-9) and dimethyl sulfoxide (DMSO; CAS No. 67-68-5) were purchased from Wako (Osaka, Japan). The test strain Salmonella typhimurium YG7108, hisG46 rfa ΔuvrB bio ada ST::kan r ogt ST::cat r, which is highly sensitive to alkylating agents, was used in this study.
Mutagen exposure and mutation fixation
The exposure method followed the Ames test 20-min pre-incubation procedure. The YG7108 strain was cultured overnight at 37 °C in nutrient broth (No.2, OXOID) containing 25 μg/mL kanamycin and 10 μg/mL chloramphenicol. Phosphate buffer (0.5 mL), DMSO or 2.5 mg/mL ENU (0.1 mL) and the overnight culture (0.1 mL) were mixed in a tube in that order and incubated for 20 min at 37 °C with gentle shaking at 100 rpm. A 1-μL portion was added to 10 mL of LB medium and cultured at 37 °C for 13 h to fix mutations, after which DNA was extracted. The rest of the mixture was poured onto a minimal agar plate in 2 mL of 0.6 % soft agar and incubated for two days at 37 °C, after which the revertant colonies were counted.
Preparation of SMRTbell™ templates and sequencing
The genomic DNA samples (5 μg each) were sheared to 50-1000 bp (average 280 bp) fragments by using a Covaris Shearing Device, and used to construct a PacBio DNA library using a SMRTbell Template Prep Kit 1.0 following the manufacturer’s guidelines (http://www.pacb.com/samplenet/PC_250bp_Amplicon_Library_Preparation_and_Sequencing.pdf). Each sample was sequenced on the PacBio RS platform on a single SMRT Cell with C2-P4 chemistry. Base calling and CCS read generation were carried out using PacBio’s instrument control and SMRT Analysis software.
In silico mutation detection
Mutation detection was carried out by using CLC Genomics Workbench software (ver 7). The fastq files of raw data and CCS were imported into the software. The CCS fastq files were mapped to reference Salmonella genome sequences: NC_003197 (S. typhimurium str. LT2 chromosome, complete genome, 4,857,432 bp), and CP003387 (S. typhimurium str. 798 plasmid p798_93, complete sequence, 93,877 bp). The point mutations were detected using the Basic Variant Detection command (first screening). The essential parameters of the Basic Variant Detection were: ploidy = 1, minimum coverage = 1, minimum count = 1, minimum frequency (%) = 0.1, neighborhood radius = 5, minimum central quality = 40, minimum neighborhood quality = 40. The mutated reads were searched in the CCS fastq files and their corresponding raw reads were extracted from the raw-fastq files. The extracted raw reads were combined in a new fastq file and mapped to the Salmonella reference sequence again. The raw reads were manually checked and mutation calls were counted with the help of the viewer function of the CLC Genomics Workbench software.
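The step of pulling the raw reads that belong to each flagged CCS read can be sketched in a few lines. This is an illustrative script, not the procedure the authors ran in CLC Genomics Workbench. It assumes that read IDs follow a `<movie>/<hole>/<suffix>` pattern, so that a CCS read and its sub-reads share the first two fields; that naming convention, and the function names `parse_fastq`, `molecule_key` and `extract_raw_reads`, are assumptions of this sketch.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (read ID, sequence, quality string)

def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (id, sequence, quality) from plain 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:].split()[0], seq, qual

def molecule_key(read_id: str) -> str:
    """Assumed naming '<movie>/<hole>/<suffix>': the key is '<movie>/<hole>'."""
    return "/".join(read_id.split("/")[:2])

def extract_raw_reads(raw_lines: Iterable[str], ccs_ids: Iterable[str]) -> List[Record]:
    """Keep only raw reads that come from the same molecule as a flagged CCS read."""
    wanted = {molecule_key(read_id) for read_id in ccs_ids}
    return [rec for rec in parse_fastq(raw_lines) if molecule_key(rec[0]) in wanted]
```

In practice `raw_lines` would be an open file handle on the raw fastq file, and the returned records would be written to a new fastq file for remapping.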
The test strain Salmonella typhimurium YG7108, which is highly sensitive to alkylating agents, was treated with ENU (Fig. 2a) or its solvent DMSO, followed by dilution and growth overnight in LB medium to fix mutations. Genomic DNA was extracted from the overnight culture. SMRTbell templates were prepared from the DNA samples, with an average insert size of 280 bp. Note that no PCR amplification step was carried out during preparation of the SMRTbell templates, which is essential to minimize the occurrence of artifactual mutations. The templates were subjected to the sequencing reaction on the PacBio RS II platform, and fastq files were generated from the raw data (containing all the sequence information of the multiple sub-reads) and the CCS data (containing only the consensus sequence). The thresholds for the CCS reads were a pass number (the number of times the same molecule was repeatedly read) of 10 and 99% accuracy.
The CCS-fastq files were imported into CLC Genomics Workbench software (ver. 7). In total, 8.09 and 8.56 Mbp of sequence data were obtained from the control and ENU-treated samples, respectively. The CCS reads were mapped to the reference sequence of Salmonella typhimurium and the point mutations were detected in silico. Improbably large numbers of insertions and deletions were called in both the control (405 insertions and 424 deletions) and ENU-treated (367 insertions and 1276 deletions) samples (Table 1). We had previously analyzed mutations induced in the same bacterial strain with the same exposure protocol by isolating colonies and carrying out whole-genome sequencing. In that previous study, we analyzed the entire genome of each of 4 clones (4.9 Mbp of Salmonella genome × 4 clones = 19.6 Mbp search region), but did not detect any insertions or deletions in either the control or ENU-treated samples (unpublished observations). Thus we concluded that the insertions and deletions called in the present study are not reliable and are most probably artifacts. In the case of base substitutions, however, 19 and 160 mutations were called in the control and ENU-treated samples, respectively (Table 1). While these frequencies are consistent with the results of our previous study, they are still higher than the estimated values. Thus we decided to proceed with a confirmation step for the base substitutions.
Next, we obtained sequence IDs of the CCS reads in which the base substitutions were called at the first screening. Then we searched the sequence IDs in the raw fastq files and extracted the corresponding information of the sequence IDs, and made new fastq files which contained the raw repeated sequence data of the molecules in which the base substitution was possibly present. The newly edited fastq files were mapped to the same Salmonella reference sequence. Typical examples of mapped raw reads are shown in Fig. 2b. In the sequencing reaction using the SMRTbell template, the plus and minus strands of a double-stranded DNA molecule are read alternately, thus almost equivalent numbers of forward and reverse reads were obtained. In cases of real mutations, the same base substitutions will be called in both the forward and reverse reads. In cases where different base substitutions were called between the forward and reverse reads, these must be templates bearing a mismatch. In cases where a specific base was clearly called for on one strand but a variety of bases was called for the opposite strand, this may indicate the existence of persistent DNA damage.
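The decision rules described above can be written down as a small classifier. In the study this check was done manually in the viewer, so the code below is only a sketch of the logic. It assumes that base calls from both strands are already expressed relative to the reference strand after mapping, and it uses an arbitrary 80% agreement threshold to decide whether a strand "clearly" calls a base; that threshold and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence

def clear_call(bases: Sequence[str], min_fraction: float = 0.8) -> Optional[str]:
    """Return the dominant base if enough reads agree, else None (mixed calls)."""
    if not bases:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None

def classify_site(ref: str, forward: Sequence[str], reverse: Sequence[str],
                  min_fraction: float = 0.8) -> str:
    """Classify one candidate position from its forward and reverse sub-read calls."""
    if not forward or not reverse:
        return "unclear"  # one strand was not read at this position
    fwd = clear_call(forward, min_fraction)
    rev = clear_call(reverse, min_fraction)
    if fwd is None and rev is None:
        return "unclear"
    if fwd is None or rev is None:
        return "damage"    # one strand clear, the other a variety of bases
    if fwd != rev:
        return "mismatch"  # the two strands clearly disagree
    return "mutation" if fwd != ref else "reference"

print(classify_site("G", ["A"] * 5, ["A"] * 5))  # same substitution on both strands
```

A site is reported as a mutation only when both strands clearly agree on a non-reference base, which mirrors the criterion used for the recount.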
After carefully checking the raw data, the base substitution mutations called in Table 1 were recounted; the results are shown in Tables 2, 3 and 4. After recalculation, the numbers of ‘real’ base substitution mutations were 0 and 132 in the control and ENU-treated samples, respectively (Table 4). The rest were likely due to mismatches, DNA damage, SNPs that the strain originally possessed, calls at the edges of mapped reads that did not have sufficient coverage, and so on.
We compared the mutation data obtained by this method (SMRT method) with our previous result from colony isolation and whole-genome sequencing (Colony-NGS method). In the ENU-treated samples, the mutation frequencies estimated by the SMRT method (15.4/Mbp) and the Colony-NGS method (12.7/Mbp) were very similar and not significantly different by the binomial test (Fig. 3a). The mutation spectrum obtained by the SMRT method showed that 95% were G:C → A:T transitions and 5% were A:T → G:C transitions (Table 3 and Fig. 3c). This mutation spectrum is consistent with the ENU signature shown in a previous report and with our previous data obtained by the Colony-NGS method (unpublished observations). As for the control (DMSO-treated) samples, no mutation was observed with either the SMRT or the Colony-NGS method; thus the mutation frequency was calculated as less than 0.12 per Mbp (1 mutation/8.09 Mbp) and less than 0.05 per Mbp (1 mutation/19.6 Mbp), respectively (Fig. 3a).
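The frequencies quoted above follow directly from the counts and the amount of sequence searched. The short sketch below reproduces that arithmetic using only numbers stated in the text; the helper name `per_mbp` is hypothetical, and the "one mutation" convention for the control upper bounds is the one used in the text.

```python
def per_mbp(count: int, sequenced_mbp: float) -> float:
    """Events per megabase pair of sequence searched."""
    return count / sequenced_mbp

# Confirmed base substitutions in the ENU-treated sample (132 in 8.56 Mbp).
enu_smrt = per_mbp(132, 8.56)

# No mutation was observed in the controls, so the bound is taken as
# one mutation per amount of sequence searched.
control_smrt_limit = per_mbp(1, 8.09)
control_colony_limit = per_mbp(1, 19.6)

print(f"{enu_smrt:.1f} {control_smrt_limit:.2f} {control_colony_limit:.2f}")
# prints: 15.4 0.12 0.05
```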
In this paper, we successfully detected ultra-low frequency base substitution mutations by using a single-molecule real-time sequencer with the SMRTbell strategy. In principle, this strategy is applicable to any DNA sample, such as those from bacteria, cell lines, tissues of experimental animals, and specimens from patients, and it enables us to quantify the mutation frequency and the mutation signature of such DNA samples.
The significant merit of using the SMRTbell strategy is that we can sequence both the plus and minus strands of a double-stranded DNA, and thus we are able to distinguish ‘real mutations’ from ‘mismatches’ or ‘DNA damage’. Intriguingly, we could detect not only fixed mutations but also mismatches in the Salmonella DNA. In the current procedure, half of the total mismatches are expected to be detected. From our data, the occurrence of the mismatches in the Salmonella genome was roughly estimated as 8-10. However, to quantify mismatches absolutely, a new bioinformatics tool should be developed. We also detected 4 possible instances of ‘DNA damage’, only in the ENU-treated sample (Table 4). In Table 3, the raw reads judged as ‘Damage’ seem to have lower coverage than those judged as ‘mutation’ or ‘mismatch’. This would reflect the presence of DNA damage in the SMRTbell templates. Note that the current procedure is not designed for detection of DNA damage, and thus the detected number would be far less than the real number of damaged sites.
The background mutation frequency of the SMRT method in this study was less than 0.12 per Mbp, which was comparable to the background level of ‘Duplex Sequencing’ methodologies [2,3]. The background level would depend on the thresholds for the number of passes and the accuracy of the CCS. The threshold values used in this study were the strictest values available in the current version of PacBio’s instrument control and SMRT Analysis software. The real mutation frequency of the control sample was estimated by combining the Colony-NGS and Ames assay results. In the Ames assay using the same exposure procedure, the mutation frequency of the control sample was 1/685 of that of the ENU-treated sample (Fig. 3b); thus the mutation frequency of the control sample was estimated as 12.7/685 = 0.02 per Mbp. Therefore, more sequencing data (at least 50 Mbp) are required to detect mutations in the control sample.
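The estimate of the control frequency and of the sequencing depth it implies can be checked with a few lines of arithmetic. The sketch below uses only the values given in the text (12.7 per Mbp and the 685-fold ratio) and interprets "required" as the amount of sequence in which one mutation is expected on average; that interpretation and the function names are assumptions of this sketch.

```python
def estimate_background(induced_per_mbp: float, fold_ratio: float) -> float:
    """Control frequency inferred from the induced frequency and the Ames ratio."""
    return induced_per_mbp / fold_ratio

def mbp_for_one_expected_mutation(freq_per_mbp: float) -> float:
    """Amount of sequence (Mbp) in which one mutation is expected on average."""
    return 1.0 / freq_per_mbp

background = estimate_background(12.7, 685)          # about 0.0185 per Mbp
rounded_background = round(background, 2)            # 0.02 per Mbp, as in the text
needed_mbp = mbp_for_one_expected_mutation(rounded_background)

print(f"{background:.4f} {rounded_background} {needed_mbp:.0f}")
```

With the rounded value of 0.02 per Mbp this gives 50 Mbp; with the unrounded estimate the figure is slightly higher, about 54 Mbp.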
As for insertion- and deletion-type mutations, this strategy cannot be used at present because of the very high background level of indels. More deletions may have been observed in the ENU-treated sample because residual DNA damage influenced the sequencing reaction. Ongoing improvements to the hardware and software of the SMRT sequencer and to the bioinformatics of mutation detection will likely overcome this problem in the near future.
Ultra-low frequency base-substitution mutations can be detected directly by using the SMRT DNA sequencer, and this technology provides a useful phenotype-independent mutation assay.
Availability of supporting data
The sequence data used in this study are available from the ‘DDBJ Sequence Read Archive’ under the following accession numbers.
BioSample: SAMD00029313 (data of DMSO-treated sample), SAMD00029314 (data of ENU-treated sample).
Matsuda T, Takamune M, Matsuda Y, Yamada M. A pilot study for the mutation assay using a high-throughput DNA sequencer. Genes and Environ. 2013;35:53–6.
Schmitt MW, Kennedy SR, Salk JJ, Fox EJ, Hiatt JB, Loeb LA. Detection of ultra-rare mutations by next-generation sequencing. Proc Natl Acad Sci U S A. 2012;109:14508–13.
Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, et al. Detecting ultralow-frequency mutations by duplex sequencing. Nat Protoc. 2014;9:2586–606.
Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133–8.
Matsuda T. Anticipated mutation assay using single-molecule real-time (SMRT™) sequencing technology. Genes and Environ. 2010;32:21–4.
Travers KJ, Chin CS, Rank DR, Eid JS, Turner SW. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res. 2010;38:e159.
Jiao X, Zheng X, Ma L, Kutty G, Gogineni E, Sun Q, et al. A Benchmark Study on Error Assessment and Quality Control of CCS Reads Derived from the PacBio RS. J Data Mining Genomics Proteomics. 2013;4.
Yamada M, Matsui K, Sofuni T, Nohmi T. New tester strains of Salmonella typhimurium lacking O6-methylguanine DNA methyltransferases and highly sensitive to mutagenic alkylating agents. Mutat Res. 1997;381:15–24.
Maron DM, Ames BN. Revised methods for the Salmonella mutagenicity test. Mutat Res. 1983;113:173–215.
Richardson KK, Richardson FC, Crosby RM, Swenberg JA, Skopek TR. DNA base changes and alkylation following in vivo exposure of Escherichia coli to N-methyl-N-nitrosourea or N-ethyl-N-nitrosourea. Proc Natl Acad Sci U S A. 1987;84:344–8.
We thank Ken Osaki, Kenichi Dedachi and Nishiho Makino of TOMY DIGITAL BIOLOGY CO., LTD for discussions about data analysis, and Makiko Takamune for her technical assistance. This research was supported by KAKENHI (2321006) and Health and Labour Sciences Research Grants for Research on Development of New Drugs (KHB1209 and H26-souyaku-ippan-008).
The authors declare that they have no competing interests.
TM conceived and designed the experiments and wrote the manuscript. SM carried out statistical evaluation. MY designed and carried out the experiments. All authors read and approved the final manuscript.
Cite this article
Matsuda, T., Matsuda, S. & Yamada, M. Mutation assay using single-molecule real-time (SMRT™) sequencing technology. Genes and Environ 37, 15 (2015). https://doi.org/10.1186/s41021-015-0017-5
- PacBio RSII DNA sequencer
- Single-molecule real-time (SMRT) sequencing technology
- Mutation assay