Genetic mystery: 50,000 DNA "knots" found in the human genome
Published 2024-09-28 23:00, updated 2024-09-29 00:00
DNA is famously known for its double helix structure, but the human genome also contains more than 50,000 unusual knot-like DNA structures known as i-motifs, according to a team of experts at the Garvan Institute of Medical Research.
In a study published in the EMBO Journal, the scientists presented the first comprehensive map of these unique DNA structures, shedding light on their potential roles in gene regulation, particularly in relation to diseases such as cancer.
Abundance and distribution of i-motifs
“DNA i-motif structures are formed in the nuclei of human cells and are believed to provide critical genomic regulation,” noted the study authors.
“While the existence, abundance, and distribution of i-motif structures in human cells has been demonstrated and studied by immunofluorescent staining, and more recently NMR and CUT&Tag, the abundance and distribution of such structures in human genomic DNA have remained unclear.”
Pinpointing the mysterious DNA structures
Back in 2018, Garvan scientists achieved a breakthrough by being the first to directly visualize i-motifs within living human cells, as they detailed in the journal Nature Chemistry. The team developed a novel antibody tool specifically designed to recognize and bind to i-motifs.
The current research builds on this earlier work, utilizing the same antibody to pinpoint the locations of i-motifs throughout the entire human genome.
“In this study, we mapped more than 50,000 i-motif sites in the human genome that occur in all three of the cell types we examined,” said senior author Daniel Christ, head of the Antibody Therapeutics Lab and Director of the Centre for Targeted Therapy at Garvan.
“That’s a remarkably high number for a DNA structure whose existence in cells was once considered controversial. Our findings confirm that i-motifs are not just laboratory curiosities but widespread – and likely to play key roles in genomic function.”
I-motifs are not randomly distributed
I-motifs are distinctive DNA structures that differ from the well-known double helix. They form when stretches of cytosine bases on the same DNA strand pair with each other, creating a four-stranded, twisted configuration that protrudes from the double helix.
The research revealed that i-motifs are not randomly distributed. The experts discovered that they are concentrated in key functional regions of the genome, including areas that control gene activity.
The role of i-motifs in gene activity
The experts determined that the formation of these distinctive DNA structures depends on the cell cycle and on pH. “Furthermore, we provide evidence that i-motif structures are formed in regulatory regions of the human genome, including promoters and telomeric regions,” noted the study authors.
Study lead author Cristian David Peña Martinez, a research officer in the Antibody Therapeutics Lab, explained that i-motifs are associated with genes that are highly active during specific times in the cell cycle, which suggests they play a dynamic role in regulating gene activity.
“We also found that i-motifs form in the promoter region of oncogenes, such as the MYC oncogene, which encodes one of cancer’s most notorious ‘undruggable’ targets. This presents an exciting opportunity to target disease-linked genes through the i-motif structure,” he added.
Implications for hard-to-treat cancers
According to study co-author Sarah Kummerfeld, an assistant professor and chief scientific officer at Garvan, the widespread presence of i-motifs near these “holy grail” sequences involved in hard-to-treat cancers opens up new possibilities for diagnostic and therapeutic approaches.
“It might be possible to design drugs that target i-motifs to influence gene expression, which could expand current treatment options,” said Professor Kummerfeld.
Significance of the study
Professor Christ pointed out that the successful mapping of i-motifs was made possible by Garvan’s world-leading expertise in antibody development and genomics.
“This study is an example of how fundamental research and technological innovation can come together to make paradigm-shifting discoveries,” said Professor Christ.
“Our study provides foundational knowledge and resources relating to the location and distribution of i-motifs in human genomic DNA, representing potential targets for future diagnostic and therapeutic strategies,” the authors concluded.
i-Motif DNA: structural features and significance to cell biology
Abstract
The i-motif represents a paradigmatic example of the wide structural versatility of nucleic acids. In remarkable contrast to duplex DNA, i-motifs are four-stranded DNA structures held together by hemi-protonated and intercalated cytosine base pairs (C:C+). First observed 25 years ago and long considered by many to be a mere structural oddity, i-motifs have attracted dramatically growing interest in recent years, and their biological role is now widely discussed. In this review we focus on structural aspects of i-motif formation, the factors leading to its stabilization and recent studies describing the possible role of i-motifs in fundamental biological processes.
INTRODUCTION
In addition to the B-form DNA double helix (1), DNA can adopt a number of alternative non-B DNA structures (2,3), including G-quadruplexes (G4s) (4) and intercalated motif (i-motif) structures (5) (Figure 1). The in vivo existence of these structures in human cells remained uncertain until their recent visualization using antibody fragments that recognize them in a structure-specific manner (6,7). These findings provided key evidence that i-motif structures may be formed in regulatory regions of the human genome, and support the notion that G4 and i-motif structures may play complementary roles in the regulation of gene expression. In this review we focus on structural aspects of i-motif formation and recent studies that describe the possible role of i-motifs in cell biology. We will not extensively review the current uses of i-motif structures in nanotechnological applications, but we will provide a brief summary and refer the reader to excellent reviews on this subject (8–10).
DNA i-MOTIF: STRUCTURAL FEATURES
The first DNA i-motif was characterized by Gehring et al. for the hexamer sequence d(TCCCCC), which forms an intercalated quadruple-helical tetramolecular structure under acidic conditions (5). It consists of two parallel-stranded duplexes intercalated in an antiparallel orientation and held together by hemi-protonated cytosine-cytosine+ (C:C+) base pairs (Figure 1A and B) (5,11). Since this report, a number of i-motif structures have been determined by crystallographic and NMR methods (5,11). As in the case of G4 structures, i-motifs may fold in an intermolecular fashion from the association of two (dimers) or four (tetramers) separate DNA strands, or form an intramolecular structure (monomer) through the spatial arrangement of four different C-tracts within the same strand.
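To make the intramolecular case concrete, the sketch below scans a C-rich strand for four consecutive C-tracts, the minimal requirement described above for a monomeric i-motif. It is a hypothetical illustration, not a method from this review: the minimum tract length (3) and the loop bounds (1 to 7 nt) are assumptions loosely modeled on the pattern rules often used for G-quadruplex searches, and the function names are invented for this example. Matching the pattern says nothing about whether a sequence actually folds.

```python
import re

def c_tracts(seq: str, min_len: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) spans of maximal runs of at least min_len cytosines."""
    return [m.span() for m in re.finditer(rf"C{{{min_len},}}", seq.upper())]

def has_four_c_tracts(seq: str, min_len: int = 3, max_loop: int = 7) -> bool:
    """Hypothetical check: four consecutive C-tracts joined by loops of 1..max_loop nt."""
    spans = c_tracts(seq, min_len)
    for i in range(len(spans) - 3):
        window = spans[i:i + 4]
        # Loop length = gap between the end of one tract and the start of the next
        loops = [window[j + 1][0] - window[j][1] for j in range(3)]
        if all(1 <= loop <= max_loop for loop in loops):
            return True
    return False

telomeric = "CCCTAA" * 3 + "CCCT"    # d(CCCTAA)3CCCT, the human telomeric C-rich strand
print(c_tracts(telomeric))            # four CCC tracts separated by TAA loops
print(has_four_c_tracts(telomeric))   # candidate for intramolecular folding
print(has_four_c_tracts("TCCCCC"))    # single tract: d(TCCCCC) folds intermolecularly
```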
i-Motifs have other distinctive structural features. The distance between consecutive base pairs is 3.1 Å, and the right-handed helical twist angle is ∼12–20°, both significantly smaller than in B-DNA (3.4 Å and 36°) (12). The intercalation of base pairs from two parallel duplexes leads to a structure with two major (wide) grooves and two minor (narrow) grooves (Figure 1C). The two minor grooves are extremely narrow (3.1 Å versus ∼5.7 Å for B-DNA), giving rise to a number of short inter-strand distances along the sugar-phosphate backbones (12). These parameters result in destabilizing interactions due to close phosphate-phosphate distances, which are partially compensated by favorable inter-strand sugar-sugar contacts. Since the minor grooves are formed between antiparallel strands, sugar-sugar contacts occur in two alternative types: face-to-face (ff) and back-to-back (bb) steps. In the ff steps, the ring oxygens (O4′) face one another (5′-side), whereas in the bb steps, the C3′-C4′ edges of the sugar moieties are oriented close to each other (3′-side, Figure 1C).
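The rise and twist values above can be turned into a quick geometric comparison. This sketch assumes an ideal helix with a uniform rise and twist, which real i-motifs only approximate: it divides 360° by the twist to get base pairs per turn and multiplies that by the rise to get the helical pitch.

```python
def bp_per_turn(twist_deg: float) -> float:
    """Base pairs needed for one full 360-degree turn, assuming uniform twist."""
    return 360.0 / twist_deg

def pitch_angstrom(rise_angstrom: float, twist_deg: float) -> float:
    """Helical pitch = rise per base pair x base pairs per turn."""
    return rise_angstrom * bp_per_turn(twist_deg)

helices = {
    "B-DNA (3.4 Å, 36°)": (3.4, 36.0),
    "i-motif (3.1 Å, 12°)": (3.1, 12.0),
    "i-motif (3.1 Å, 20°)": (3.1, 20.0),
}
for label, (rise, twist) in helices.items():
    print(f"{label}: {bp_per_turn(twist):.0f} bp per turn, "
          f"pitch ~{pitch_angstrom(rise, twist):.0f} Å")
```

With a twist of 12–20°, the i-motif stack needs roughly 18 to 30 base pairs per full turn, compared with 10 for B-DNA, which is one way to picture how slowly the structure winds.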
Depending on the spatial arrangement of C:C+ base pairs, i-motif structures can be classified into two intercalation topologies, known as 3′E and 5′E (Figure 1D) (13). When the outermost C:C+ base pair is at the 3′-end, the structure is known as 3′E, while in the 5′E topology the terminal C:C+ base pair is at the 5′-end (14). In the absence of additional interactions and for a given number of C:C+ base pairs, the 3′E topology is more stable than the 5′E topology owing to the extended sugar-sugar contacts along the narrow grooves (15).
Because of the requirement for hemi-protonated base pairs, it was thought that i-motif structures could only fold at acidic pH; however, several recent studies have shown that i-motif structures can form at neutral pH, depending on the sequence and environmental conditions (16,17). For example, i-motif structures have been observed at neutral pH and low temperatures under molecular crowding conditions (18,19), under negative superhelicity (20), in the presence of silver (21) or copper(I) cations (22), and inside silica nano-channels (23). Chemical modifications such as 2′-deoxy-2′-fluoro-arabinocytidine (2′F-araC, known as 2′F-ANA) also induce the formation of DNA i-motif structures under neutral conditions (24,25).
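The acid requirement follows from cytosine protonation chemistry. As a deliberately naive sketch, assume each cytosine protonates independently according to the Henderson–Hasselbalch equation, using the nucleoside pKa values quoted later in this review (about 4.4 for deoxycytidine and 3.9 for 2′F-araC), and ask how often a pair of cytosines carries exactly one proton. The model ignores cooperative folding and the pKa shifts that occur in a folded i-motif, so it only illustrates why acidic pH was long assumed to be necessary; it does not predict pHT.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of cytosines protonated at N3."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

def naive_hemiprotonated(pH: float, pKa: float) -> float:
    """Probability that exactly one of two independent cytosines is protonated."""
    f = fraction_protonated(pH, pKa)
    return 2.0 * f * (1.0 - f)

for pH in (4.4, 5.5, 6.5, 7.4):
    dc = naive_hemiprotonated(pH, 4.4)     # deoxycytidine nucleoside pKa
    fara = naive_hemiprotonated(pH, 3.9)   # 2'F-araC nucleoside pKa
    print(f"pH {pH}: dC {dc:.4f}   2'F-araC {fara:.4f}")
```

In this picture hemi-protonation peaks at pH = pKa (probability 0.5) and becomes rare at pH 7.4, and the lower pKa of 2′F-araC would, if anything, make matters worse. That 2′F-araC nonetheless raises pHT points to the structural and electrostatic contributions discussed in the section on sugar modifications.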
FACTORS AFFECTING THE STABILITY OF i-MOTIF STRUCTURES
Like other nucleic acid structures, i-motif stability depends on many factors, including sequence, temperature and ionic strength. Unlike B-DNA or G-quadruplexes, in which π-stacking interactions between sequential nucleobases play an essential role in stability, the intercalative geometry between consecutive base pairs in i-motif structures gives rise to very little overlap between the six-membered aromatic pyrimidine bases (Figure 1B and C). Although C:C+ base pairs involve the favorable stacking of exocyclic carbonyl and amino groups in an antiparallel fashion, theoretical calculations showed that this and other favorable stacking interactions between consecutive C:C+ base pairs barely compensate for the electrostatic repulsion between their charged imino groups (15). Of particular relevance for understanding the factors affecting i-motif stability are the many recently reported studies on chemically modified i-motif structures, which we discuss in detail below.
The C:C+ base pair
The hemi-protonated C:C+ base pairs are the key interactions for i-motif stability. The three hydrogen bonds of the C:C+ base pair confer high stability. Computational studies indicate that the base-pairing energy (BPE) of the C:C+ base pair is 169.7 kJ/mol, higher than the BPEs of the canonical Watson-Crick G·C (96.6 kJ/mol) and neutral C·C (68.0 kJ/mol) pairs (26). The central hydrogen bond in a hemi-protonated C:C+ (N3···H···N3) base pair has been described as a double-well potential in which the proton delocalizes/oscillates between the two wells (27). Leroy et al. estimated the proton-transfer rate to be 8 × 10⁴ s⁻¹ (28). The NMR structural study of an intramolecular telomeric i-motif (PDB code: 1ELN) showed that the C:C+ base pairs are planar and that the N3–N3 distance is around 2.6–2.8 Å (29). Most importantly, protonation at the N3 position produces a positively charged base pair. NMR spectroscopy and computational analyses performed by Lieblein et al. suggested that the N3···H+···N3 bonds possess an asymmetric double-well potential and that the proton in one C:C+ base pair tends to adopt the position that maximizes its distance from the proton of the neighboring C:C+ base pair (27).
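The energetic comparison and the proton-transfer rate can be put in more intuitive terms with a little arithmetic. The sketch uses only the values quoted above; reading the rate as a mean time between transfers assumes simple first-order kinetics (lifetime = 1/k), which is an interpretive assumption rather than a claim from the cited work.

```python
# Base-pairing energies quoted in the text, in kJ/mol
BPE = {"C:C+": 169.7, "G·C": 96.6, "C·C (neutral)": 68.0}
k_transfer = 8e4  # proton-transfer rate in s^-1 (Leroy et al.)

for pair in ("G·C", "C·C (neutral)"):
    ratio = BPE["C:C+"] / BPE[pair]
    print(f"C:C+ is {ratio:.2f} times {pair}")

# For a first-order process, the mean lifetime is the inverse of the rate constant
lifetime_us = 1e6 / k_transfer
print(f"mean time between proton transfers: {lifetime_us:.1f} microseconds")
```

In these calculations the C:C+ pair is therefore roughly 1.8 times as strong as a G·C pair, and the proton moves between wells on a timescale of about 12.5 µs.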
The effect of chemical modifications in the C:C+ base pairs has been investigated in different contexts. Wadkins et al. indicated that a cytosine modification might have different effects on i-motif stability depending on the environmental conditions (30). For instance, substitution of cytosine by its halogenated analogues (Figure 2A), such as 5-fluoro, 5-bromo and 5-iodo derivatives, increases the stability of i-motifs under acidic conditions (31). Furthermore, cytosine methylation at position 5 leads to an increase in the pH of the mid-transition (i.e. pHT) and in the Tm of i-motif structures (ΔpHT = +0.11 for two sequences containing two 5-methylcytosine substitutions), while hydroxymethylation leads to a decrease in pHT (ΔpHT = –0.2) and Tm (Figure 2A) (32).
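Because pHT is the midpoint of the pH-induced folding transition, it can be read off a titration curve. The sketch below is a hypothetical illustration of that idea: it generates synthetic folded-fraction data from an assumed sigmoid (the steepness parameter `n` and both function names are assumptions) and recovers the midpoint by linear interpolation at a folded fraction of 0.5. Published analyses typically fit a sigmoidal model to spectroscopic titration data instead, and none of the numbers here are measurements; the second curve simply applies the +0.11 shift quoted above for 5-methylcytosine.

```python
def folded_fraction(pH: float, pHT: float, n: float = 3.0) -> float:
    """Assumed sigmoid: the folded fraction falls from 1 to 0 as pH rises past pHT."""
    return 1.0 / (1.0 + 10.0 ** (n * (pH - pHT)))

def estimate_pHT(pH_values, folded):
    """Linearly interpolate the pH at which the folded fraction crosses 0.5."""
    points = sorted(zip(pH_values, folded))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 != y1 and (y0 - 0.5) * (y1 - 0.5) <= 0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("folded fraction never crosses 0.5")

# Synthetic titrations (not measurements) on a pH 5.0-7.0 grid in 0.2 steps
pH_grid = [5.0 + 0.2 * i for i in range(11)]
unmodified = estimate_pHT(pH_grid, [folded_fraction(p, 6.10) for p in pH_grid])
methylated = estimate_pHT(pH_grid, [folded_fraction(p, 6.21) for p in pH_grid])
print(f"pHT ~ {unmodified:.2f} vs {methylated:.2f}, shift {methylated - unmodified:+.2f}")
```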
Recently, Waller's group investigated the effect of 2′-deoxyriboguanylurea (GuaUre-dR) (Figure 2A) on human telomeric i-motif formation (33). GuaUre-dR is a breakdown product of decitabine, a cytidine analogue that acts as an epimutagen and a chemotherapeutic agent. Although this analogue can form a base pair with cytosine without protonation, the modified telomeric sequences exhibited a lower pHT (5.8) than the unmodified i-motifs (pHT = 6.1) (33). Mir et al. studied the effect of pseudoiso-deoxycytidine (psC) (Figure 2A) on the stability of head-to-head and head-to-tail dimeric i-motif structures (34). An increase in the stability of the studied i-motif structures was observed when the neutral psC:C base pair was located at the end of a C:C+ stack. However, protonated base pairs (psC:C+ or psC:psC+) are required when the psC modifications are located in central positions of the i-motif structures. The results of these two studies underline the importance of electrostatic interactions for i-motif stability, with the presence of positive charges in the core of the structures being a key factor.
Significant i-motif stabilization has been reported recently by introducing phenoxazine 2′-deoxynucleotides (i-clamps) containing a C8-aminopropynol tether (Figure 2A). These nucleotides are able to form base pairs with protonated cytosines and, simultaneously, interact favorably with the phosphate backbone of the opposite strand. As in the case of psC substitutions, the effect depends on the position of the substitution, with more stabilization observed when they are located at the 5′-end of the C-stacks (35).
Very recently, natural base lesions were introduced into the TAA loops and the core cytosines of the human telomeric C-rich strand d(CCCTAA)3CCCT (36). This study reveals that i-motifs containing apurinic sites (apA) or 8-oxoadenine (8oxoA) in place of adenine, or 5-hydroxymethyluracil (5hmU) (Figure 2A) in place of thymine, exhibit thermal stabilities that depend on the position of these substitutions in the sequence. Thus, compared to the unmodified structure, ΔTm values for i-motifs with apA substitutions range between −1.7 and +5.0°C, those with 8oxoA vary from −1.4 to +3.5°C, and ΔTm for those containing 5hmU is +0.4°C on average. In contrast, the presence of uracil instead of cytosine (due to enzymatic or spontaneous deamination of cytosine) substantially reduces i-motif thermal stability. The extent of this destabilization depends on the position of the deaminated cytosines. Whereas the loss of outer cytosines decreases melting temperatures by 11.6–17.0°C, the destabilization produced by substituting inner cytosines is even more pronounced, owing to the loss of a C:C+ pair intercalated between two inversely oriented pairs (36).
Sugar and phosphate backbone modifications
Sugar modifications
The i-motif folding architecture is destabilized by the close inter-strand phosphorus-phosphorus distances (5.9 Å) along the narrow groove. This destabilizing factor is compensated by a favorable inter-strand interaction network between the deoxyribose sugar moieties along the minor groove (12). Additional stabilizing interactions along the narrow groove result from C-H1′···O4′ interactions within each pair of antiparallel strands (Figure 1C). Moreover, studies have shown an intra-nucleoside hydrogen bond between O4′ and H6, and O4′ bonding simultaneously with H1′ and H4′ (12).
Chemical modifications of the sugar-phosphate backbone have proven very useful for assessing these interactions and modulating i-motif stability. These include RNA cytidines (37,38), 2′F-RNA (39), locked (LNA) (40) and unlocked (UNA) nucleic acids (41), 2′-arabinonucleic acids (ANA) (42), acyclic threoninol nucleic acids (aTNA) (43), and, more recently, 2′-fluoro-arabinonucleic acids (2′F-ANA) (24,25,44) (Figure 2B).
In DNA i-motif structures, the glycosidic angles adopt the anti conformation and the deoxyribose sugars mainly adopt C3′-endo (North) puckering. These properties led researchers to investigate the ability of RNA, whose ribose sugar favors the C3′-endo conformation, to form i-motifs, as well as that of RNA-like modifications favoring the North conformation, such as 2′F-RNA, LNA and 2′-OMe. Reports have shown that oligoribonucleotides form less stable i-motif structures than their oligodeoxynucleotide counterparts. Initial studies reported a ΔTm of 29°C between the i-motif structures formed by an 18-mer DNA sequence (d(CCCTCCCTTTTCCCTCCC), Tm = 54°C) and the corresponding 18-mer RNA sequence (r(CCCUCCCUUUUCCCUCCC), Tm = 25°C) (38). Since the uracil-substituted DNA sequence exhibited a melting temperature similar to that of its thymine counterpart (56°C), the lower stability of the RNA i-motif was ascribed to the presence of the ribose 2′-OH groups (38).
Collin and Gehring studied the effect of replacing one or more DNA residues in dTCCCCC (Tm = 49°C at pH 4.6) with RNA residues (CR) (37). This study suggested that the destabilization caused by one RNA residue is almost equal to that caused by the loss of a C:C+ base pair, since dTCRCCCC and dTCCRCCC exhibited a melting temperature of 41°C, compared to 39°C for dTCCCC. When two hydroxyl groups are juxtaposed in a back-to-back step, as in dTCCRCRCC, the distance between them is 0.35 nm, compared to 0.65 nm in face-to-face steps, leading to a less stable i-motif structure with a Tm of 29.5°C (37).
The r(UC5) sequence led to the formation of two i-motif structures adopting different intercalation topologies. The intercalation topology of the major conformation has one fewer 2′-OH/2′-OH repulsive contact than the minor conformation. The free energy of the RNA i-motif per C:C+ base pair (−4 kJ/mol) is almost half that of the DNA i-motif (45). Despite its modest stability, the existence of RNA i-motifs in cells has been suggested. In a very recent study showing i-motif foci by immunostaining with a structure-specific antibody (iMab) (6), a number of i-motif foci were still observed in the nuclei of HeLa cells after treatment with DNase. The authors suggest that the remaining foci might be due to DNA i-motifs that are somehow inaccessible to DNase, or to the detection of RNA i-motifs, which are not affected by DNase degradation. In addition, the authors observed a moderate but significant decrease in i-motif foci when treating the cells with RNase. These observations suggest that RNA i-motifs may exist in the cell and may have a biological role.
Like RNA, 2′F-RNA (Figure 2B) adopts a C3′-endo sugar pucker, and fluorine is sterically comparable to a hydroxyl group. Therefore, the main difference between 2′F-RNA and RNA lies in fluorine's lower hydrogen-bonding capability compared to the 2′-OH group. The introduction of a single 2′-fluorine (Cf) enhances the thermal stability of the d(TCCCfCC) i-motif (48.8°C) with respect to the d(TCCCCC) i-motif (44.8°C) (39). Two consecutive modifications, d(TCCfCfCC), lead to a 0.6°C reduction compared to the unmodified strand. In contrast, ribose modifications in the same positions lead to a destabilization of 20°C. Given the similar size of fluorine and hydroxyl groups, the destabilization observed with RNA substitutions is not due to steric clashes, but is more likely due to solvation of the hydroxyl groups, compared with the limited solvent accessibility of the fluorinated minor groove.
LNA is another RNA mimic, locked in the C3′-endo conformation by a 2′-OCH2-4′-C methylene bridge (Figure 2B). LNA was also introduced into the hexamer model sequence dTC5, which forms a stable intermolecular i-motif structure at pH 4.0 (40). Some of the partially modified sequences exhibit similar (d(TLLCCC), d(TLCLCC) and d(TCLCLC)) or higher (d(TCCLLC)) stability compared to the unmodified i-motif, while the fully modified strand does not form an i-motif. The stability of LNA-modified i-motifs depends on the position of LNA in the sequence and is due to the extended hydrogen-bonding network at the back-to-back steps involving the 2′-OCH2-4′-C bridge. Therefore, when LNA is introduced in certain positions, the 2′-OCH2-4′-C bridge creates additional hydrogen bonds that compensate for the unfavorable van der Waals contacts in the minor groove.
Interestingly, substituting the 2′-OH in RNA with its 2′-arabinose epimer leads to stable i-motif structures, since the OH group is placed in the wide major groove (42). The different stability observed between riboses and arabinoses confirms the critical significance of sugar–sugar contacts in the minor groove for the stabilization of i-motifs. We recently demonstrated that incorporating 2′F-araC (Figure 2B) modifications in i-motif structures leads to significant stabilization over a wide pH range. 2′F-araC is one of very few chemical modifications that stabilize i-motif structures under neutral conditions not only at the ends of the C-tracts, but also in central positions (24). Furthermore, the 2′F-araC modification stabilized intermolecular centromeric and intramolecular telomeric i-motifs in all the positions tested (24). Although the nucleoside 2′F-araC exhibits a lower pKa than deoxycytidine (3.9 versus 4.4, respectively), the pHT of the modified i-motif structures was remarkably higher (+0.7 for centromeric sequences and +0.8 for telomeric sequences), allowing these structures to be observed at neutral pH. NMR structural determination revealed that the 2′F-araC residues adopt a C2′-endo sugar pucker, instead of the C3′-endo conformation usually found in unmodified structures, with the fluorine atom oriented toward the major groove. Therefore, 2′F-araC modifications do not perturb the hydrogen-bonding network that stabilizes the structure, but instead introduce additional electrostatic interactions that are absent in the unmodified structure. These results make it possible to use i-motif structures in several applications, most importantly in biological assays that require physiological temperature and pH conditions. Following these observations, Aviñó et al. investigated the effect of (2′S)-2′-deoxy-2′-C-methyl-cytidine units (CMeUp) (Figure 2B) on telomeric i-motif structures (46). This modification adopts mainly a C3′-endo sugar pucker with the methyl group at C2′ in the ‘arabino’ (or β) orientation. CMeUp was tolerated in i-motif structures; however, the stabilization was less pronounced than with 2′F-araC. This result confirms the role of favorable electrostatic interactions, induced by the electronegativity of the fluorine atom, in the enhanced stability provided by 2′F-araC.
With the purpose of finding a modification or combination of modifications that would lead to even higher thermal stability at physiological pH, along with higher pHT values than the 2′F-araC modification, our groups investigated the effects of 5-Me-2′F-araC (Figure 2C). This nucleoside combines two i-motif-stabilizing modifications, a 5-methylcytosine nucleobase and a 2′-fluoroarabinose sugar, in the same nucleotide (44). Interestingly, 5-Me-2′F-araC was found to exhibit a stabilization effect similar to that of 2′F-araC. However, an i-motif with 2′F-araC:5-Me-dC base pairs exhibited a pHT value of 7.17, compared to 6.53 for a structure containing 2′F-araC:2′F-araC and 5-Me-dC:5-Me-dC base pairs. This suggests that it is possible to tune the pH and thermal stability of i-motif structures by selecting the position and type of modifications, and highlights the significance of the nature of the base pairs for i-motif stability.
In conclusion, most sugar modifications destabilize i-motif structures, regardless of whether or not they favor the C3′-endo sugar conformation. Substituents oriented toward the compact minor groove cause steric clashes that destabilize the structure. However, chemical modifications that preserve sugar-sugar contacts across the minor groove, such as arabinose sugars, are well tolerated. Of particular relevance is the effect of 2′F-araC substitutions, which lead to stable structures at pH 7.
Phosphate modifications
The arrangement of the sugar-phosphate backbone in i-motif folding gives rise to unusually short distances between adjacent phosphates. In an attempt to suppress the repulsion between the negatively charged phosphate backbones, several backbone modifications have been investigated. Mergny and Lacroix compared phosphorothioate and methylphosphonate linkages with the natural phosphodiester backbone (47). Their studies show that only phosphodiester and phosphorothioate backbones allow i-motif formation. They hypothesized that even though the methylphosphonate backbone is neutral, the bulkiness of the methyl group prevents i-motif formation. Additionally, the chirality of the methylphosphonate linkage (presence of both Rp and Sp stereoisomers) might have influenced i-motif formation and stability. The incorporation of phosphorothioates in several DNA C-rich sequences leads to the formation of stable i-motif structures at neutral pH, which are only a few degrees less stable than the unmodified structures (47). Moreover, the chirality of the phosphorothioate group influences i-motif stability; for instance, the Rp stereochemistry leads to greater stabilization than the Sp stereochemistry (ΔTm = 11°C) (48). Another backbone modification that was investigated involved replacing the negatively charged sugar-phosphate backbone with a neutral polyamide backbone, i.e. peptide nucleic acid (PNA). Balasubramanian et al. studied the effect of PNA on the model hexanucleotide sequence p(TCCCCC) using nano-electrospray ionization mass spectrometry (Nano-ESI-MS) and H/D exchange (49). PNA was shown to form stable i-motif structures; however, i-motif folding occurs over a narrower pH range (4.1–4.5) than for its DNA counterpart (4.5–6.5) (49). i-Motif formation from a 1:1 mixture of PNA and DNA strands was then studied by Modi et al. (50). FRET studies revealed that the structure is stabilized by the intercalation of two parallel DNA-PNA heteroduplexes, with the DNA strands occupying one of the minor grooves. The hybrid PNA-DNA i-motifs exhibit intermediate stability (pH 4.2–5.7) compared to the more stable DNA and the less stable PNA i-motifs. This intermediate stability can be attributed to lower electrostatic repulsion compared to the net negatively charged DNA and net positively charged PNA i-motif structures. Another interesting ‘backbone’ modification, investigated by Robidoux et al., involves branched oligonucleotides, in which dC-rich strands are forced to become parallel by joining them to the vicinal 2′-5′ and 3′-5′ phosphodiester linkages of a branching riboadenosine linker. This study showed that branched oligonucleotides can associate into stable i-motif structures, with certain constructs exhibiting a Tm around 25°C at pH 7.0 (51).
C-tract length
In general, under the same experimental conditions, an i-motif structure with a higher number of C:C+ base pairs is more stable (52). Very recently, Waller's group and Burrows’ group investigated the effect of C-tract length on the folding of intramolecular i-motif structures under physiological conditions (16,17).
Waller's group studied several sequences containing at least four C-tracts of different lengths. Their results reveal that, in general, pHT increases as the number of cytosines per tract increases. For instance, pHT increases from 6.1 for C2(T3C2)3 to 6.7 for C3(T3C3)3, 7.1 for C4(T3C4)3 and 7.2 for C5(T3C5)3. A similar trend was observed for thermal stability (Tm): C3(T3C3)3 exhibited a Tm of 7°C, while the Tm values of C4(T3C4)3 and C5(T3C5)3 were 15.8°C and 26.2°C, respectively. Increasing the C-tract length beyond five further enhances stability; however, it also leads to two melting transitions and to increased hysteresis between the folding and unfolding transitions, suggesting the formation of two distinct species at pH 7.4. The sequence with the longest C-tracts tested was C10(T3C10)3, which exhibits a pHT of 7.3. Although in this study only sequences with tracts of four or more cytosines folded into i-motifs at neutral pH, more recent studies show that sequences containing shorter C-tracts can also form stable i-motifs at neutral pH (53,54).
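The shorthand used for this series compresses a repetitive sequence: Cn(T3Cn)3 denotes a run of n cytosines followed by three repeats of a TTT loop plus another run of n cytosines. A small helper, written only to illustrate the notation (the function name is hypothetical), expands it and pairs a few of the constructs with the pHT values quoted above.

```python
def expand_c_series(n_c: int, loop: str = "TTT", repeats: int = 3) -> str:
    """Expand Cn(T3Cn)3-style shorthand: C-run, then `repeats` x (loop + C-run)."""
    c_run = "C" * n_c
    return c_run + (loop + c_run) * repeats

# pHT values quoted in the text for members of the Waller-group series
reported_pHT = {2: 6.1, 3: 6.7, 4: 7.1, 10: 7.3}

for n_c, pHT in reported_pHT.items():
    seq = expand_c_series(n_c)
    print(f"C{n_c}(T3C{n_c})3: {len(seq)} nt, {seq.count('C')} C, pHT {pHT}")
```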
On the other hand, the Burrows group investigated the formation and stability of i-motif structures from several dCn homo-oligonucleotides (n = 10–30) (17). Using different pH-dependent methods, they observed an interesting trend between chain length and stability. The highest thermal stabilities and pHT values (Tm > 37°C and pHT > 7.2) were observed for dCn strands of 15, 19, 23 and 27 cytosines (i.e. 4n – 1). This led to the identification of 4n – 1 as the ‘sweet spot’ for i-motif folding in deoxycytidine homopolymers. When T nucleotides were introduced to control the length of the i-motif core and to form loops of varying lengths, the most stable structures consisted of an even number of C:C+ base pairs in the core and three loops of only one nucleotide, further confirming the results obtained for the dCn homo-oligonucleotides. However, these results are not directly applicable to other heteronucleotidic sequences, since loop-to-loop interactions play an important role in the stability of the structures, as reviewed in the next section.
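The ‘4n – 1’ rule can be rationalized with simple bookkeeping, under the assumption stated in the text that the most stable folds have three single-nucleotide loops and an even number of C:C+ pairs. Removing three loop cytosines from dCn leaves n − 3 cytosines to pair, giving (n − 3)/2 C:C+ pairs; that number is an even integer only when n − 3 is divisible by 4, i.e. when n is one less than a multiple of 4. The sketch enumerates the tested lengths (10–30) under that assumption; the function name is illustrative.

```python
def core_pairs(n: int, loops: int = 3, loop_len: int = 1):
    """C:C+ pairs left in the core of dCn after excluding `loops` loops of `loop_len` nt.

    Returns None if the remaining cytosines cannot all be paired.
    """
    core = n - loops * loop_len
    if core <= 0 or core % 2:
        return None
    return core // 2

# Lengths whose core would hold an even number of C:C+ pairs
even_core = [n for n in range(10, 31)
             if core_pairs(n) is not None and core_pairs(n) % 2 == 0]
print(even_core)
print([core_pairs(n) for n in even_core])
```

The bookkeeping also admits n = 11 (four pairs), which is not among the lengths reported as most stable, so it captures only the core/loop constraint and not the full stability trend.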
Connecting loops and capping interactions
Several systematic studies investigating the number and nature of the nucleobases in i-motif loops have been undertaken recently in an attempt to understand how interactions between loop nucleobases affect i-motif stability (55–57). Capping residues at the end of the i-motif core, in addition to the length and nature of the connecting loops, are very important factors for i-motif stability (58). Based on the length of the connecting loops, Brooks et al. divided intramolecular i-motif structures into two classes. i-Motif structures possessing short loops (loop 1 (2 nt): loop 2 (3–4 nt): loop 3 (2 nt), i.e. 2:3–4:2) were classified as ‘class I’, while ‘class II’ i-motifs possess longer loops (6–8:2–5:6–7) (59). In general, very short loops, such as single nucleotides, favor the formation of mono- and bimolecular i-motifs, whereas longer loops lead preferentially to the formation of intramolecular i-motif structures (60). Since longer loops might allow for extra stabilizing interactions, it has been proposed that class II i-motifs are more stable. However, a number of recent studies have shown that class I i-motifs are stable at neutral or nearly neutral pH (53,61).
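The two classes are defined purely by loop lengths, so the classification can be expressed as a lookup. The sketch below encodes the ranges quoted for Brooks et al. (loop 1:loop 2:loop 3) and returns None for combinations that fall in neither class; treating the range boundaries as inclusive is an assumption, and the names are hypothetical.

```python
# (min, max) loop lengths in nt for loop 1, loop 2 and loop 3
LOOP_CLASSES = {
    "class I": ((2, 2), (3, 4), (2, 2)),
    "class II": ((6, 8), (2, 5), (6, 7)),
}

def loop_class(loops):
    """Return the class whose ranges contain all three loop lengths, or None."""
    if len(loops) != 3:
        raise ValueError("expected three loop lengths")
    for name, ranges in LOOP_CLASSES.items():
        if all(lo <= length <= hi for length, (lo, hi) in zip(loops, ranges)):
            return name
    return None

print(loop_class((2, 3, 2)))   # class I
print(loop_class((6, 5, 7)))   # class II
print(loop_class((3, 3, 3)))   # None: outside both ranges
```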
Thymidines are common capping residues in i-motifs since they can form T:T base pairs (Figure 3A) that are isomorphous to C:C+ base pairs and extend the i-motif core (61–63). In fact, T:T base pairs are not only good capping base pairs, but are even tolerated in the middle of the C:C+ base pair stack (64). Hoogsteen and reverse Watson-Crick A:T base pairs (Figure 3A) in the loops of i-motifs have been found in the crystallographic structures of oligonucleotides containing a single repeat of the human telomeric [d(TAACCC)] (65) and centromeric sequences (63). A:A (Figure 3A) and G:G base pairs have also been observed in several i-motif structures (66–68). This suggests that it is not loop size per se, but the precise sequence of the loop and the resulting interactions between its bases, that is important for stabilization.
One of the capping interactions with the most significant effect on i-motif stability is the formation of minor groove tetrads at the ends of the C:C+ tracts. The first example of this kind of interaction was found by Gallego et al. in the structure of the human B-box centromeric sequence, d(TCCCGTTTCCA) (66). This sequence forms a stable head-to-head dimeric i-motif that features a G:T:G:T tetrad between two lateral loops. The two G:T pairs are located on the minor groove side, as previously found in other structures (69). Tetrads of the same family have been observed with Watson-Crick G:C base pairs (70) and with a combination of G:C and G:T base pairs (69,71). Stabilization of i-motif structures through minor groove tetrads has been observed in other dimeric i-motif structures (72) and, more recently, in monomeric i-motifs that can form in tandem repeats (53) (Figure 3B). In the latter case, the stabilization produced by two minor groove tetrads is dramatic, with transition pH (pHT) values close to 8.0.
In cases where a loop connecting two C-tracts is long enough, the formation of other secondary structures, like stem-loop hairpins, is possible. This situation has been observed in a C-rich sequence located near the promoter region of the n-MYC gene (73), although its three-dimensional structure remains to be elucidated.
Impact of ionic strength, molecular crowding and superhelicity
Unlike G4 structures, where the nature of the cation leads to significant differences in stability and folding topology, i-motifs are affected not by the nature of the cation but by the ionic strength of the solution. Mergny et al. showed that increasing the NaCl concentration from 0 to 100 mM at a pH close to the pKa of cytosine destabilized the i-motif structure. Interestingly, higher NaCl concentrations (300 mM) did not cause further destabilization (60). The same trend of decreasing i-motif stability with increasing ionic strength was observed for sequences in the promoter of the n-MYC gene (74).
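Because the text ties i-motif stability to ionic strength rather than to cation identity, it may help to recall how ionic strength is computed, I = ½ Σ cᵢzᵢ². The sketch below assumes fully dissociated salts and ignores buffer contributions; the function name `ionic_strength` is illustrative only.

```python
def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum(c_i * z_i**2), in mol/L.

    ions: iterable of (molar_concentration, charge) pairs, one per ionic species.
    Assumes complete dissociation; buffer species are ignored.
    """
    return 0.5 * sum(c * z ** 2 for c, z in ions)


nacl_100 = [(0.100, +1), (0.100, -1)]  # 100 mM NaCl -> Na+ and Cl-
nacl_300 = [(0.300, +1), (0.300, -1)]  # 300 mM NaCl
kcl_100 = [(0.100, +1), (0.100, -1)]   # 100 mM KCl -> K+ and Cl-

for name, ions in [("100 mM NaCl", nacl_100), ("300 mM NaCl", nacl_300), ("100 mM KCl", kcl_100)]:
    print(f"{name}: I = {ionic_strength(ions):.2f} M")
```

Equal concentrations of NaCl and KCl give the same ionic strength, which is the quantity the text identifies as relevant for i-motifs; for G4s, the cation identity matters in addition.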
Molecular crowding agents such as high-molecular-weight polyethylene glycols (PEGs) have been widely used to mimic the crowded environment that nucleic acids experience inside a cell. Crowding conditions preferentially stabilize both i-motif and G4 structures over duplexes and single-stranded DNA (75). For instance, in a 1:1 mixture of G- and C-rich sequences, molecular crowding shifts the equilibrium towards G4 and i-motif structures and prevents Watson-Crick duplex formation (76). Dielectric constant effects, such as a shift in the pKa of cytosine by more than 2 units (e.g. from 4.8 to 7.0), and the formation of non-specific PEG/DNA complexes appear to contribute only insignificantly to i-motif stabilization (19,75).
Another factor associated with i-motif stability is negative superhelicity, which favors unwinding of the DNA double helix into its component single strands. This unwinding relieves negative superhelical stress and facilitates the formation of non-canonical secondary structures in the unwound regions (20). To mimic the negative supercoiling induced upstream of a transcription site, Sun and Hurley placed the natural and mutated C- and G-rich sequences of the NHE III1 of the c-MYC oncogene promoter in a supercoiled plasmid (20). Using chemical and enzymatic footprinting, they showed that i-motif and G4 formation is facilitated by negative superhelicity under physiological conditions. In contrast, the mutated strands were locally unwound but unable to fold into stable i-motif and G4 structures.
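Negative superhelicity is commonly quantified by the superhelical density σ = (Lk − Lk₀)/Lk₀, where Lk₀ is the linking number of the relaxed molecule. The sketch below assumes a helical repeat of 10.5 bp per turn for B-DNA (a value that varies with conditions) and uses a hypothetical plasmid size and linking number.

```python
def superhelical_density(n_bp: int, linking_number: float, bp_per_turn: float = 10.5) -> float:
    """sigma = (Lk - Lk0) / Lk0, with Lk0 = n_bp / bp_per_turn for relaxed B-DNA."""
    lk0 = n_bp / bp_per_turn  # linking number of the relaxed molecule
    return (linking_number - lk0) / lk0


# Hypothetical 3,000-bp plasmid with a linking number of 268 (Lk0 is about 285.7).
sigma = superhelical_density(3000, 268)
print(f"sigma = {sigma:.3f}")  # negative value: the plasmid is underwound
```

Because Lk = Tw + Wr is fixed in a closed circular molecule, locally unwinding a C/G-rich region lowers the twist and absorbs part of the negative supercoiling, which is the relaxation described above.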
BIOLOGICAL RELEVANCE OF i-MOTIF STRUCTURES
Location of i-motif forming sequences across the genome
The prevalence of G-quadruplex forming sequences along the genome is now well established. In principle, the strand complementary to any G-quadruplex forming sequence could form an i-motif. Waller's group therefore used the search algorithm Quadparser (16), originally designed to find G4-forming sequences, to estimate the prevalence of i-motif-forming sequences within the human genome. They searched for sequences with four C-tracts of five cytosines separated by loops of 1 to 19 nucleotides (16). Applying these search criteria, they identified 5,125 sequences across the genome with the potential to fold into i-motif structures. Of these, 637 (i.e. ∼12.4%) were located in gene promoter regions. Further examination of the gene ontology terms of the genes regulated by those promoters showed that potential i-motif formation was concentrated in promoters of genes involved in skeletal system development, sequence-specific DNA binding, DNA-templated transcription and positive regulation of transcription from RNA polymerase II. In contrast, no sequences fulfilling the search criteria were found in genes involved in the immune response, G-protein coupled receptor activity or olfactory receptor activity (16).
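A search of the kind described above can be sketched with a regular expression. This is not Quadparser's code: the pattern is our assumption of the stated criteria (four tracts of five C separated by 1 to 19-nt loops), it reports non-overlapping hits only, and it scans both strands by also searching the reverse complement, so a G-rich run on the given strand is reported as a C-rich candidate on the minus strand. The names `IMOTIF_RE` and `find_imotif_candidates` are hypothetical.

```python
import re

# Assumed pattern: four tracts of five C, loops of 1-19 nt of any base.
IMOTIF_RE = re.compile(r"(?:C{5}[ACGT]{1,19}){3}C{5}")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_imotif_candidates(seq: str):
    """Return (strand, start, end) hits in forward-strand coordinates (0-based, end-exclusive)."""
    seq = seq.upper()
    hits = [("+", m.start(), m.end()) for m in IMOTIF_RE.finditer(seq)]
    rc = reverse_complement(seq)
    n = len(seq)
    # A C-rich hit on the reverse complement is a G-rich run on the forward strand.
    hits += [("-", n - m.end(), n - m.start()) for m in IMOTIF_RE.finditer(rc)]
    return sorted(hits, key=lambda h: h[1])


c_rich = "CCCCCTTTCCCCCTTTCCCCCTTTCCCCC"
print(find_imotif_candidates("AAAA" + c_rich + "AAAA"))
```

A genome-wide scan would additionally need overlapping-match handling, masking of ambiguous bases (N) and promoter annotation to reproduce the statistics quoted above.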
Using bioinformatics analysis, Burrows’ group investigated the presence of dCn tracts across the human genome (17). These studies identified 769 dCn sequences with n between 15 and 81 nucleotides. Besides promoter regions, these C-rich sequences are also present in introns and in 5′- and 3′-UTRs, whereas fewer dCn tracts were observed in coding and intergenic regions. These two studies highlight that the sequences with the highest potential to form i-motif structures are not randomly located; instead, they are particularly enriched in the promoters of certain genes, which suggests that they may have a role in regulatory mechanisms of gene expression. However, these studies are not comprehensive, since several sequences, for instance the minimal i-motif structures previously discussed (53), form stable i-motif structures but do not match the search criteria used. A better understanding of the sequence requirements for stable i-motif formation is therefore necessary to map i-motif occurrence along the genome more accurately.
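Assuming that a dCn tract means an uninterrupted run of n cytosines (our reading, not necessarily the exact definition used in the cited study), the 15 ≤ n ≤ 81 window can be applied as in the sketch below. Runs of G on the given strand are reported as dCn tracts on the complementary strand; the function name `find_dc_tracts` is hypothetical.

```python
import re


def find_dc_tracts(seq: str, min_len: int = 15, max_len: int = 81):
    """Return (strand, start, end) for maximal C runs with min_len <= length <= max_len.

    '+' marks C runs on the given strand; '-' marks G runs, i.e. C runs on the complement.
    Coordinates are 0-based and end-exclusive on the given strand.
    """
    seq = seq.upper()
    hits = []
    for base, strand in (("C", "+"), ("G", "-")):
        for m in re.finditer(f"{base}+", seq):  # maximal runs of one base
            if min_len <= m.end() - m.start() <= max_len:
                hits.append((strand, m.start(), m.end()))
    return sorted(hits, key=lambda h: h[1])


example = "AT" + "C" * 20 + "AT" + "G" * 15 + "A" + "C" * 14
print(find_dc_tracts(example))  # the 14-nt C run is too short to be reported
```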
Existence of i-motif structures in vivo
The biological relevance of i-motif structures had been largely questioned due to the lack of experimental evidence of their existence in vivo and to the fact that the formation of these structures is favored at pH values more acidic than the intracellular pH. However, several recent studies have changed this paradigm.
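The pH dependence mentioned above follows from protonation of cytosine N3, which C:C+ pairing requires. The Henderson–Hasselbalch sketch below uses a pKa of about 4.8 for free cytosine, the value quoted in this review; cytosines inside a folded i-motif can have shifted pKa values, so the numbers are indicative only. The function name is illustrative.

```python
def protonated_fraction(ph: float, pka: float = 4.8) -> float:
    """Fraction of free cytosine protonated at N3 (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


for ph in (4.8, 5.5, 7.4):
    print(f"pH {ph}: {protonated_fraction(ph):.4f}")
# pH 4.8: 0.5000
# pH 5.5: 0.1663
# pH 7.4: 0.0025
```

Because a C:C+ pair combines one neutral and one protonated cytosine, pairing is expected to be most favorable near pH ≈ pKa, where about half of the cytosines are protonated; at intracellular pH only a small fraction of free cytosines is protonated.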
A particularly interesting study on the stability of i-motifs inside cells was performed by Dzatko et al., who applied NMR spectroscopy in living mammalian cells to investigate the stability of i-motif structures in the cellular environment (77). Several i-motif forming sequences from different human promoter regions (DAP, HIF-1α, PDGF-A and JAZF1), labeled with a fluorophore, were pre-folded as i-motifs and transfected into HeLa cells. Flow cytometry and confocal microscopy indicated that the transfected oligonucleotides entered the cells and localized in the nucleus without compromising cell viability or intracellular pH (pHi) homeostasis. The in-cell NMR spectra for DAP, PDGF-A and JAZF1 exhibited i-motif imino signals up to 35°C, suggesting the presence of folded i-motif structures under physiological conditions. In contrast, no i-motif-specific imino signals were detected for the HIF-1α sequence. Overall, these in-cell NMR spectra indicate that pre-formed i-motif structures introduced into cells remain stable and persist in the complex intracellular environment of living HeLa cells.
Very recently, Christ's and Dinger's groups developed an antibody, named iMab, that binds with high affinity and specificity to C-rich DNA sequences forming i-motif structures (6). The authors describe this antibody as probably the definitive probe for i-motif formation in living human cells. First, they demonstrated that iMab binds different well-defined i-motif structures while showing no binding to other DNA structures such as duplexes, hairpins or G-quadruplexes. Immunofluorescent staining with iMab in three different cell lines (MCF7, U2OS and HeLa) revealed punctate foci that were attributed to the recognition of i-motif structures in the cell nuclei. Interestingly, the number of foci varied along the cell cycle. The authors evaluated the number of foci in cells arrested at three different stages of the cell cycle (early S phase and the G0/G1 and G1/S boundaries) and found the highest number of foci at the G1/S boundary (Figure 4) (6). This observation suggests that i-motif formation may be associated with transcription, as late G1 phase is characterized by high transcriptional activity. In contrast, the significant decrease in iMab foci during S phase (Figure 4) suggests that i-motif structures are resolved during DNA replication. Likewise, the number of G4 foci detected in cells by immunofluorescence with the BG4 antibody also varies along the cell cycle (7,78). In contrast to i-motifs, however, G4 detection was higher during S phase, whose major event is DNA replication. Altogether, these results indicate that i-motif and G4 structures are differently populated at different stages of the cell cycle and suggest that they might play opposing roles in regulating gene expression. This study provides the strongest evidence so far for the existence of i-motif structures in vivo and for their relevance in key biological processes.
Interaction of DNA i-motifs with ligands and proteins
Compared to the well-documented examples of G4 ligands, the discovery of specific i-motif binding ligands lags far behind. Several compounds such as TMPyP4 (79), bis-acridine (BisA) (80) and phenanthroline derivatives (74) have been evaluated as i-motif ligands. Some of them have a stabilizing effect, but they are not selective since they also bind other DNA structures. Likewise, some complexes with terbium and ruthenium metals have been studied as potential i-motif binders (81,82); however, they lack specificity and lead to a slight destabilization of i-motif structures.
Carboxyl-modified single-walled carbon nanotubes (C-SWNTs) are considered the first selective i-motif ligands. Addition of C-SWNTs leads to a remarkable increase in the thermal stability of the intramolecular i-motif formed by the C-rich human telomeric sequence at acidic pH (83). Additionally, C-SWNTs were found to induce the formation of this i-motif structure at pH 8.0 and to inhibit duplex formation between the complementary human telomeric C- and G-rich sequences. Based on S1 nuclease assays and fluorescence changes of 2-aminopurine-labeled loops, it was proposed that the nanotubes bind to the 5′-end major groove of the telomeric i-motif structure. The stabilizing effect of C-SWNTs was explained by favorable electrostatic interactions between the positively charged C:C+ base pairs and the negatively charged C-SWNTs, which substantially raise the pKa of the C:C+ base pairs (83).
Qu et al. investigated the biomedical effect of C-SWNTs on telomerase activity and telomere function (discussed later in this review). This study determined that i-motif structures formed in the presence of C-SWNTs could inhibit telomerase activity, interfere with telomere functions, and lead to senescence and apoptosis in cancer cells in vitro and in vivo (84).
Following the carbon nanotubes, another study succeeded in identifying a selective i-motif ligand. After screening a library of 1990 compounds, Hurley et al. identified a compound (IMC-48) that binds and stabilizes the i-motif structure formed by the C-rich sequence of the BCL2 gene promoter. In parallel, a similar compound (IMC-76) was found to favor an alternative hairpin conformation formed by the same sequence (Figure 5). These molecules can be used to shift the dynamic equilibrium between the two structures formed by this BCL2 promoter sequence: a hairpin and an i-motif containing large loops (8:5:7) (85). IMC-48 binds within the central loop of the i-motif (Figure 5A) and is likely stabilized by stacking interactions with thymines. In contrast, IMC-76 was found to bind between the Watson-Crick hydrogen-bonded tracts of the hairpin structure (Figure 5B). Interestingly, the two molecules have opposite effects: IMC-48 activates gene expression, whereas IMC-76 markedly suppresses BCL2 mRNA levels.
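How a conformation-selective ligand can shift a hairpin/i-motif equilibrium can be sketched with a standard linked-equilibrium (conformational selection) model. The assumptions are ours: 1:1 binding to a single site on one conformation only, a known free ligand concentration (ligand in excess), and hypothetical constants rather than measured values for IMC-48 or IMC-76.

```python
import math


def imotif_fraction(k_conf, ligand=0.0, kd_imotif=math.inf, kd_hairpin=math.inf):
    """Fraction of DNA in the i-motif state (free plus ligand-bound).

    k_conf     -- [i-motif]/[hairpin] in the absence of ligand
    ligand     -- free ligand concentration (same units as the Kd values)
    kd_imotif  -- Kd for a ligand that binds only the i-motif (inf: no binding)
    kd_hairpin -- Kd for a ligand that binds only the hairpin (inf: no binding)
    """
    # Each conformation's weight is multiplied by its binding polynomial (1 + [L]/Kd).
    w_imotif = k_conf * (1.0 + ligand / kd_imotif)
    w_hairpin = 1.0 + ligand / kd_hairpin
    return w_imotif / (w_imotif + w_hairpin)


# Hypothetical case: equal populations without ligand, free ligand at 9x Kd.
print(imotif_fraction(1.0))                              # 0.5
print(imotif_fraction(1.0, ligand=9.0, kd_imotif=1.0))   # i-motif-selective ligand, about 0.91
print(imotif_fraction(1.0, ligand=9.0, kd_hairpin=1.0))  # hairpin-selective ligand, about 0.09
```

In this idealized model, a ligand that binds only one conformation pulls the population toward that conformation without changing the intrinsic constant k_conf, which is one way to read the opposite effects of the two compounds.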
Several other small-molecule ligands have been investigated in an attempt to expand the library of i-motif-specific ligands, such as the type II topoisomerase inhibitor mitoxantrone (86), the para-isomer of the peptidomimetic ligand PBP1 (87), which upregulates BCL2 gene expression, and thiazole orange (88). Recently, Shu et al. developed a series of acridone derivatives and found that one of them, B19, binds to and stabilizes the i-motif formed in the c-MYC promoter, resulting in downregulation of gene expression and, eventually, tumor cell death (89). The high-resolution characterization of more i-motif structures, as well as their stabilization at neutral pH, should help to increase the number of ligands that stabilize i-motifs.
Poly-C-binding proteins (PCBP) are proteins that interact with C-rich DNA sequences and play a fundamental role in regulating gene expression. The PCBP family consists of the hnRNP K (heterogeneous nuclear ribonucleoprotein K), αCP1-4, and αCP-KL proteins (90). However, in most cases, it is not clear whether these proteins bind to a given i-motif structure or to the C-rich strand resulting from i-motif unfolding. In an early study, Marsich et al. discovered a highly cytosine-specific protein in human HeLa cells (91). They did not determine the identity of the protein; however, they were able to show that the protein is specific for the human telomeric sequence, d(CCCTAA)n, containing at least four cytosine tracts (92). Later, Lacroix et al. found two proteins that bind telomeric C-rich sequences, hnRNP K and ASF/SF2 (93).
In a recent report, Niu et al. identified BmILF protein of Bombyx mori insect as an i-motif binding protein (94). By pull-down and EMSA assays, the authors demonstrated that BmILF binds to an i-motif structure formed at acidic pH by a C-rich sequence present in the BmPOUM2 gene promoter.
The BCL2-activating transcription factor heterogeneous nuclear ribonucleoprotein LL (hnRNP LL) is one of the very few i-motif binding proteins that have been studied in depth. It was identified through a pull-down assay aimed at finding proteins involved in transcription with affinity for the i-motif structure formed by the BCL2 oncogene promoter (95). hnRNP LL is a paralog of hnRNP L, a pre-mRNA splicing factor capable of binding and stabilizing BCL2 mRNA (95). hnRNP LL was found to bind specifically to i-motif structures, as no binding was observed either to the duplex form of the BCL2 promoter or to a mutated single-stranded DNA unable to fold into an i-motif. The protein has four RNA recognition motifs (RRMs), and the BCL2 promoter sequence contains two consensus sequences that can be recognized by hnRNP LL RRMs. These consensus sequences lie in the two lateral loops of the folded i-motif. EMSA experiments and luciferase reporter assays showed that hnRNP LL binds to the two lateral loops of the BCL2 i-motif (Figure 5C) through two of its four RRMs. Interestingly, CD and bromine footprinting experiments show that protein binding unfolds the i-motif structure, leading to single-stranded sequences. Therefore, the i-motif structure is the most kinetically favorable conformation for protein binding, and its unfolding upon protein binding provides the more thermodynamically favored single-stranded conformation (Figure 5D). The protein remains bound to the single-stranded DNA and activates BCL2 gene transcription. The discovery of hnRNP LL as an i-motif binding protein able to activate gene transcription brings i-motif structures into focus as protein recognition sites that can participate in the regulation of gene expression.
Inhibition of telomerase activity
Mammalian telomeric DNA is composed of tandem repeats of the unit 5′-TTAGGG-3′/3′-AATCCC-5′. Importantly, the G-rich strand is a few hundred nucleotides longer than its complement, resulting in a G-rich single-stranded 3′-overhang that may form G4 structures. Studies have shown that stabilization of certain human telomeric G4 topologies with ligands might inhibit telomerase activity (96–98); however, the effect of targeting the complementary C-rich strand has not been investigated in depth. The study conducted by Qu et al. in 2012 was the first to investigate telomerase activity on i-motifs formed by C-rich human telomeric sequences stabilized by C-SWNTs (84). As mentioned earlier, C-SWNTs were found to induce duplex dissociation and to stabilize human telomeric i-motifs at physiological pH and temperature under molecular crowding conditions. Likewise, C-SWNTs induce the formation of G4 structures on the complementary G-rich strand (83). In the presence of C-SWNTs, telomerase activity is inhibited, suggesting that the G4 stabilized on the leading strand can no longer be elongated by telomerase. Further investigation of the effect of C-SWNTs on cell growth and on telomere structure and function suggested that the growth inhibition produced by C-SWNTs was a consequence of an impact on telomere structure rather than on telomerase activity. The persistence of i-motif and G4 structures leads to telomere uncapping and release of telomere-binding proteins, resulting in telomere dysfunction. Telomere dysfunction induces a DNA damage response and activates DNA repair pathways, which in turn trigger cell cycle arrest, senescence and apoptosis (83).
i-Motif in centromeric sequences
The centromere is the chromosomal region on which the kinetochore, a key multiprotein complex for chromosome segregation, assembles during cell division. In most organisms, centromeres contain large arrays of tandemly repeated DNA sequences (satellite DNA). Centromeric DNA sequences vary substantially among species and can also differ between chromosomes of the same organism. In the absence of a shared genetic motif defining the centromere, the presence of nucleosomes containing the centromere-specific histone H3 variant (CENP-A) is recognized as the essential epigenetic feature of centromeric chromatin. However, a possible role for centromeric DNA sequences in directing the formation of such specialized chromatin has been suggested and remains a lively matter of debate (99,100). Interestingly, in a very recent report, Kasinathan and Henikoff proposed that the predominant formation of non-B DNA structures in centromeric DNA sequences might be the basis for the establishment of centromeric chromatin (101).
In humans, a 171-bp DNA sequence called the alphoid satellite is tandemly repeated along the centromeres (Figure 6A). The alphoid satellite is an AT-rich sequence that frequently contains a 17-bp GC-rich segment with two sequence variants, known as the CENP-B box and the A box (Figure 6A). Interestingly, the CENP-B box, the sequence specifically bound by the centromeric protein CENP-B, is absent in lower primates and in the human Y chromosome. Gallego et al. found that, whereas the G-rich sequence of the CENP-B box folds into a structure stabilized by canonical base pairs, its complementary C-rich sequence forms a dimeric i-motif at acidic pH (Figure 6B) (66,102). Likewise, the C-rich sequences of the two variants of the A box were reported to fold into dimeric i-motif structures (Figure 3A) (63). The dimeric i-motif structures of 11-residue truncated versions of both the CENP-B box and the A box were determined at atomic resolution by NMR spectroscopy (63,66) (Figure 6B). The main difference between the two structures is the relative disposition of the loops, which lie on the same side of the structure (face-to-face topology) in the CENP-B box i-motif and on opposite sides (head-to-tail topology) in the A box i-motif (Figure 6B).
The centromeric region of chromosome 3 of Drosophila melanogaster contains the dodeca satellite DNA, which is composed of tandem repeats of 11/12 bp (CCCGTACTGGT/CCCGTACTCGGT). The ability of these sequences to fold into i-motif structures has also been evaluated. Sequences derived from both the 11-bp and the 12-bp repeat units were able to fold into dimeric i-motif structures in vitro at acidic pH (103).
These observations led to the proposal that i-motif structures may act as structural elements providing long-range interactions between laterally associated centromeric nucleosomes (63) (Figure 6C). The presence of the above-mentioned i-motif forming sequences at the entrance and exit of the nucleosome would facilitate their participation in dimeric i-motif formation (Figure 6C). Thus, non-B DNA structures may have a role not only in directing centromere location (101) but also in providing particular architectural features to the centromere.
Transcriptional regulation of gene expression
A substantial amount of data suggests that i-motif structures may be involved in the regulation of transcription. A recent report of i-motif foci in vivo found that the number of i-motif foci was highest at the G1/S boundary of the cell cycle (Figure 4), when transcriptional activity is high (6). This fact, together with the evidence described below, suggests that these structures may have been selected as recognition motifs for proteins involved in activating the transcriptional machinery.
The BCL2 oncogene promotes cell survival and proliferation through an anti-apoptotic mechanism. It is overexpressed in some cancer cells, while its under-expression leads to neurodegenerative diseases (104). As discussed earlier, the C-rich sequence of the BCL2 promoter folds into a hairpin and an i-motif structure that are in dynamic equilibrium (105). Two back-to-back studies by Hurley's group showcased two compounds (Figure 5) and a protein capable of modulating BCL2 transcription in vitro and in vivo by specifically targeting either the i-motif or the hairpin form of this equilibrium (85,95). Apparent stabilization of the i-motif structure by the IMC-48 compound leads to significant upregulation of BCL2. Conversely, apparent stabilization of the flexible hairpin species by the IMC-76 compound leads to transcriptional repression in lymphoma cell lines (Figure 5B) (85).
In a follow-up study, Hurley's group determined the effect of the hnRNP LL protein on BCL2 transcription. Two of the four RNA recognition motifs (RRMs) in hnRNP LL are required for stable binding to single-stranded RNA or DNA. The lateral loops of the BCL2 i-motif contain sequences capable of binding to the RRMs of hnRNP LL (Figure 5C). These studies demonstrate that hnRNP LL activates transcription by recognizing and unfolding the BCL2 i-motif (Figure 5). The presence of IMC-48 shifts the equilibrium in favor of the i-motif and therefore increases the i-motif population available for binding to hnRNP LL. The group showed that IMC-48 binds to the central loop of the BCL2 i-motif, followed by recognition and binding of hnRNP LL to the two lateral loops. hnRNP LL binding then unfolds the i-motif, which in turn leads to transcriptional activation. In conclusion, these studies demonstrate the effect of two small molecules (IMC-76 and IMC-48) and a transcription factor (hnRNP LL) on the relative population of the i-motif and its impact on gene expression.
Recently, a study conducted by Muniyappa and co-authors demonstrated that the C-rich sequences of the PI and PII promoters of the human acetyl-CoA carboxylase 1 (ACC1) gene can fold into intramolecular i-motif structures at neutral pH under molecular crowding conditions (106). According to the authors, experiments in HeLa cells in which i-motif-forming sequences were placed in a promoter region driving luciferase transcription showed decreased protein expression, suggesting a significant role for i-motif-forming sequences in the regulation of ACC1 gene expression. However, the authors also report that G4 and i-motif structures in plasmids containing the wild-type sequences act cooperatively to decrease luciferase expression. Therefore, it is unclear whether i-motif or G4 structures are responsible for the transcriptional repression. This ambiguity was addressed in a recent study of the PDGFR-β promoter region (107), in which point mutations in the G4-forming sequence led to upregulation of gene expression; this upregulation was attributed to the effect of the G4 mutations on the complementary i-motif-forming strand.
In a related study, Niu et al. reported that binding of the BmILF protein to an i-motif activates transcription of BmPOUM2, a Bombyx mori gene involved in developmental processes during metamorphosis. Luciferase expression assays showed that vectors lacking the i-motif-forming sequence in the gene promoter produced significantly lower luciferase activity. Additionally, using oligonucleotides complementary to the i-motif- and G4-forming sequences present in the promoter, the authors found that only hybridization to the i-motif-forming sequence significantly decreased promoter activity. They propose that during transcription the dsDNA at the promoter is melted and the i-motif forms and is bound by BmILF, which may recruit other factors to activate BmPOUM2 expression.
Burrows et al. investigated how modified G4- and i-motif-forming sequences from the VEGF gene promoter affect luciferase and Renilla luciferase expression levels (108). This study also revealed that sequences containing nucleobases produced during oxidative stress, such as 8-oxo-7,8-dihydroguanine, lead to up- or down-regulation of transcription depending on whether the modified nucleobases are located on the coding or on the template strand of the promoter, respectively.
Kendrick et al. used IMC-76 in combination with the ellipticine derivative GQC-05 to simultaneously target the BCL2 and MYC oncogene promoters in diffuse large B-cell lymphoma (DLBCL) (109). GQC-05 stabilizes the G4 formed by the G-rich MYC promoter sequence, presumably acting as an ‘off-switch’ that impedes gene expression (110). The simultaneous stabilization of the hairpin over the i-motif (via IMC-76) in the BCL2 promoter and of the G4 (via GQC-05) in the MYC promoter decreased the mRNA levels of both genes and enhanced the sensitivity of DLBCL cells to cyclophosphamide (a chemotherapeutic drug). This study was the first to simultaneously target two different DNA secondary structures, i.e. i-motifs and G4s, leading to dual transcriptional repression and providing an effective approach to treat aggressive malignancies (109).
Following the above-mentioned reports, several studies have shown how G4s and i-motifs act as on/off molecular switches in the regulation of the tyrosine hydroxylase (Th) (111), MYC (112), platelet-derived growth factor receptor β (PDGFR-β) (107) and KRAS promoters (113). For instance, in the case of the MYC promoter, it is the extent of transcriptionally induced negative superhelicity that determines whether MYC expression can be turned on. The authors propose that binding of SP1 to its promoter binding sites induces negative superhelicity, which in turn drives duplex melting at the 4CT or 5CT GC-rich elements. The mechanism by which SP1 increases negative superhelicity is not known. At low levels of SP1, unwinding of the duplex is only sufficient to expose the 4CT element, which contains six C-tracts that provide the two hnRNP K binding sites in the lateral loops of the i-motif (Figure 7C to A). However, this provides insufficient binding strength for the hnRNP K–i-motif complex to compete with nucleolin, which binds to the G4 structure to repress gene transcription (i.e. acting as an OFF switch). With increased concentrations of SP1 bound to the duplex DNA, transcriptionally induced negative superhelicity increases and the two additional C-tracts become available (Figure 7C to B). This provides an additional CT binding site for hnRNP K, found uniquely in the 5CT element. Ultimately, two steps are needed to activate transcription: first, hnRNP K recognizes the two CT elements displayed in the lateral loops of the i-motif; second, the i-motif unfolds, providing access to the third CT element, which is found uniquely in the 5CT element. With three KH domains of hnRNP K bound to the three CT elements, this thermodynamically stable complex is able to compete with the nucleolin–G4 complex, and MYC transcription is turned on (112).
Regulation of DNA biosynthesis
Very recently, Sugimoto and co-authors investigated the effect of several non-canonical DNA structures on DNA replication by the Klenow fragment (KF) of DNA polymerase I (114). Different i-motif-forming sequences inserted in the template strand of the replication reaction were found to stall the DNA polymerase and thus impede DNA replication or repair. The stalling effect of i-motifs on replication was greater than that of other structures with similar thermodynamic stability, such as hairpins or mixed-type G4s. This is attributed to the unique intercalated topology of base pairs in i-motif structures, which complicates their unwinding and subsequent replication by DNA polymerase. As mentioned earlier, the base pairs in an i-motif structure are intercalated between two parallel duplexes and lack the stacking pattern of antiparallel duplexes (Figure 1B). Unzipping is therefore particularly disfavored, since consecutive base pairs belong to different duplexes. In addition, the arrangement of the loops in the i-motif structure might cause steric hindrance to polymerase binding. These two factors may explain why i-motif unwinding by KF requires a higher activation energy than that of other non-canonical structures. These data suggest that i-motifs could modulate DNA replication in vivo and have a greater impact on it than other secondary structures.
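The activation-energy argument can be made concrete with the Arrhenius equation, k = A·exp(−Ea/RT). Assuming equal pre-exponential factors (our simplification), the ratio of two unwinding rates depends only on the activation-energy difference. The 10 kJ/mol difference and the 37 °C temperature below are hypothetical, not values from the cited study.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def rate_ratio(delta_ea_j_per_mol: float, temp_k: float) -> float:
    """k_slow / k_fast for a process with an extra barrier delta_ea (same prefactor A)."""
    return math.exp(-delta_ea_j_per_mol / (R * temp_k))


ratio = rate_ratio(10_000.0, 310.15)  # hypothetical 10 kJ/mol extra barrier at 37 °C
print(f"rate ratio = {ratio:.3f}; about {1 / ratio:.0f}-fold slower")
```

Because the dependence is exponential, even a modest increase in the barrier translates into a large slowdown, which is consistent with the pronounced polymerase stalling described above.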
Mutual exclusivity of i-motifs and G-quadruplexes
Most earlier studies investigated the biological effects of G4s and i-motifs separately, in single-stranded DNA fragments. However, in the genome, except at chromosome ends, the formation of G4 and i-motif structures competes with hybridization of the complementary strands into duplex DNA. Several factors have been found to affect this competition. The first G4/i-motif–duplex interconversion studies were carried out by Phan and Mergny, who found that in 1:1 mixtures of the C-rich d((C3TA2)3C3T) and G-rich d(AG3(T2AG3)3) telomeric sequences at acidic pH (<5) and in the presence of KCl, the two strands predominantly formed an i-motif and a G4, respectively (115). However, at pH 7.0 and 100 mM NaCl, the duplex formed by hybridization of both sequences was the predominant species. Several groups have investigated the factors influencing interconversion kinetics and found that the sequence, the experimental conditions (i.e. ionic strength, temperature and pH) (116,117) and the incorporation of chemical modifications all play a significant role in favoring tetraplex structures over duplexes and vice versa.
Mao et al. reported single-molecule concentration-jump experiments showing that, in the absence of any force, human telomeric DNA i-motif structures exhibit a half-life of 2.60 s at neutral pH. This half-life is sufficient for i-motifs to interact with proteins or small molecules and thereby modulate several biological processes (118). The half-life of i-motifs in the presence of their complementary strands can be dramatically extended by introducing 2′F-araC modifications (Figure 2B) into telomeric sequences (25). When 2′F-araC-modified i-motif and/or 2′F-araG-modified G4 structures were pre-folded, their unfolding posed a significant barrier to duplex formation. In this scenario, i-motifs and G4s located at the termini of a duplex co-existed at neutral pH for >30 days (25).
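Assuming single-exponential (first-order) unfolding, which is our simplification, the reported 2.60 s half-life converts to a rate constant k = ln 2 / t½ and a mean lifetime τ = 1/k, and gives the fraction of i-motifs still folded after a time t. The constant and function names below are illustrative.

```python
import math

T_HALF_S = 2.60  # reported half-life at neutral pH, seconds

k_unfold = math.log(2) / T_HALF_S  # first-order unfolding rate constant, 1/s
tau = 1.0 / k_unfold               # mean lifetime, s


def fraction_folded(t_s: float) -> float:
    """Fraction of an initially folded population still folded after t_s seconds."""
    return math.exp(-k_unfold * t_s)


print(f"k = {k_unfold:.3f} 1/s, tau = {tau:.2f} s")
for t in (1.0, T_HALF_S, 10.0):
    print(f"t = {t:5.2f} s: {fraction_folded(t):.3f} still folded")
```

The mean lifetime is longer than the half-life by a factor of 1/ln 2 (about 1.44), so a typical i-motif in this model persists for a few seconds, the time window the text considers sufficient for protein or ligand binding.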
Using magneto-optical tweezers, Mao's group recently provided the first quantitative evaluation of how chemical factors (ions and pH) and mechanical factors (superhelicity and molecular crowding) influence the population dynamics of G4s and i-motifs formed by the insulin-linked polymorphic region (ILPR) (119). Across 32 different sets of chemical and mechanical experimental conditions, chemical factors had the most substantial effect. Potassium ion concentration most strongly affected the formation and stabilization of G4s, and acidic pH that of i-motifs. Among the mechanical factors, superhelicity had a greater impact than molecular crowding. Negative superhelicity reduces the stability of DNA duplexes, thereby favoring the formation of G4s in the G-rich strand and i-motifs in the complementary C-rich strand.
In the last few years, several researchers have begun to investigate the coexistence of G4 and i-motif structures. Contradictory reports had left open whether i-motif and G4 structures can form simultaneously in complementary strands or are mutually exclusive. In one of the earlier reports, Sun and Hurley studied the influence of negative superhelicity on the formation of G4 and i-motif structures in the promoter region of c-MYC and their impact on gene expression (20). Using enzymatic and chemical footprinting assays, they showed that the G4 and i-motif structures are present at the same time on opposite strands, slightly offset relative to each other. Three of the four required GC tracts were shared between the two tetraplexes.
However, in a later study carried out in collaboration with Mao's group, they showed that the G-quadruplex and the i-motif in the NHE III1 of the MYC promoter are mutually exclusive (112). The Mao group has also demonstrated mutual exclusivity in the ILPR (120). Using chemical footprinting and single-molecule techniques, they showed that the G-rich strand folds into a G4 structure at pH 7.4 and 100 mM K+, whereas the C-rich strand folds into an i-motif structure at pH 5.5 and 100 mM Li+. Under conditions that favor the formation of both structures (pH 5.5 and 100 mM K+), however, either the G4 or the i-motif forms, but not both. The authors attribute this to mutual steric hindrance. In more recent work, both tetraplexes were observed to form simultaneously when the G4- and i-motif-forming sequences are offset with respect to each other on complementary strands (121). This study further suggests that mutual exclusivity is governed by steric hindrance between the structures formed on the two complementary strands. Mutual exclusivity also suggests that G4 and i-motif structures may play opposing biological roles at the same genomic location. As described above, several studies strongly suggest that G4 formation suppresses transcription, whereas i-motif formation leads to activation of gene expression (59). The number of i-motif (6) and G4 (7) foci in cell nuclei peaks at different stages of the cell cycle, which also supports the idea that the two structures might not coexist in cells.
SOME APPLICATIONS OF i-MOTIF STRUCTURES
The particular features of i-motif structures have inspired the design of nanotechnological systems for analytical and biomedical purposes (8–10). Many of these devices rely on the structural transitions that i-motif-forming oligonucleotides undergo in response to pH changes. Exploiting this pH response, several nanodevices based on i-motif folding and unfolding have been devised to monitor pH changes in the cellular context. Measuring intracellular pH is a fundamental goal in the biosciences, given the crucial influence of pH on cellular processes and the involvement of dysregulated intracellular pH in diseases such as cancer (122). The ‘I-switch’ developed by the Krishnan group (123) was the first i-motif-based nanomachine able to sense and report pH changes during endosomal maturation, both in cultured living cells (123) and in a multicellular organism (124). Interestingly, by adding a tag to the device, it is possible to target any biotinylated protein and measure the pH associated with its function (123,125).
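The pH read-out of such devices rests on a two-state equilibrium. At acidic pH the sensor folds into an i-motif, and at higher pH it unfolds, so a fluorescence ratio that differs between the two states tracks pH along a sigmoidal curve. The Python sketch below is a generic calibration model under stated assumptions, not the I-switch's published analysis. It models the folded fraction with a Hill-type sigmoid and inverts the curve to estimate pH from a measured ratio. PH_HALF, HILL_N, R_OPEN and R_CLOSED are hypothetical values. A real sensor would need its own measured calibration.

```python
import math

# Hypothetical calibration parameters: illustrative assumptions only,
# not values reported for the I-switch or any AuNP-based sensor.
PH_HALF = 6.5    # pH at which half of the sensors are folded into i-motifs
HILL_N = 3.0     # steepness (cooperativity) of the pH-driven folding transition
R_OPEN = 1.0     # read-out ratio when all sensors are unfolded (higher pH)
R_CLOSED = 4.0   # read-out ratio when all sensors are folded (acidic pH)


def folded_fraction(ph: float) -> float:
    """Two-state model: fraction of sensors folded into an i-motif at a given pH."""
    return 1.0 / (1.0 + 10 ** (HILL_N * (ph - PH_HALF)))


def signal_ratio(ph: float) -> float:
    """Expected read-out ratio as a population-weighted mix of open and closed states."""
    f = folded_fraction(ph)
    return R_OPEN + (R_CLOSED - R_OPEN) * f


def ph_from_ratio(r: float) -> float:
    """Invert the calibration curve to estimate pH from a measured ratio."""
    if not R_OPEN < r < R_CLOSED:
        raise ValueError("ratio lies outside the sensor's dynamic range")
    f = (r - R_OPEN) / (R_CLOSED - R_OPEN)
    # Solve f = 1 / (1 + 10**(n * (pH - pH_half))) for pH.
    return PH_HALF + math.log10(1.0 / f - 1.0) / HILL_N


# Usage: simulate a calibration curve, then read a pH back from a "measured" ratio.
for ph in (5.0, 5.5, 6.0, 6.5, 7.0, 7.5):
    print(f"pH {ph:.1f}: folded {folded_fraction(ph):.2f}, ratio {signal_ratio(ph):.2f}")
print(f"ratio 2.50 -> pH {ph_from_ratio(2.5):.2f}")  # ratio 2.50 -> pH 6.50
```

In this model the ratio saturates far from PH_HALF, so small measurement errors there translate into large pH errors. For that reason the inversion rejects ratios outside the open/closed range.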
Gold nanoparticles (AuNPs) functionalized on their surface with C-rich, i-motif-forming sequences have also been successfully applied to monitor pH changes inside living cells. AuNPs have particular spectroscopic properties and can easily be functionalized and internalized into cells, which makes them attractive candidates for efficient pH sensors of the cellular environment. Accordingly, several designs based on AuNPs modified with i-motif-forming sequences have been used to monitor pH changes inside endocytic vesicles (126–130).
These examples provide evidence of the compatibility of i-motif structures with the intracellular environment and highlight their potential as building blocks for nanobiotechnological applications. Indeed, some other i-motif-based nanodevices such as biosensors or drug release platforms have already been developed and serve as proof of principle for their application in vivo (8,9,131).
CONCLUSIONS/PERSPECTIVE
Despite the tremendous advances in i-motif structural biology in recent years, many aspects still require further investigation. The present data strongly suggest that i-motifs form transiently in the cell. However, more in vivo studies are definitely needed to confirm i-motif formation in different phases of the cell cycle. In addition, further research on i-motif recognition by proteins and small ligands, in vitro and in vivo, is essential to elucidate the role of i-motifs in different biological processes. Because i-motifs are dynamic, these studies are hampered, particularly in vivo, by the difficulty of distinguishing recognition of C-rich sequences from recognition of the actual i-motif structure. Synthetic i-motif constructs stabilized by chemical modifications (e.g. 2′F-ANA) can facilitate these investigations by ‘freezing’ these intrinsically dynamic sequences in the potentially active i-motif conformation. Chemically modified i-motifs are stable over a wider range of conditions than their unmodified counterparts. They can therefore also be used in screening experiments to identify new proteins and small ligands that specifically recognize this structure.
There is still much to learn about i-motif structures. Compared with G4s, only a few i-motif structures have been determined by NMR or crystallographic methods. At present, it is not possible to predict the stability of an i-motif from its sequence. More structural information is therefore needed to fully understand how capping interactions and the loops connecting the C-tracts affect i-motif stability. Despite the recent discovery of i-motif ligands, the number of known i-motif-specific binders is very limited compared with G4 ligands. No three-dimensional structure of an i-motif/ligand complex has been determined yet. Such a structure would be a major achievement for the development of potential drugs based on i-motif recognition.
Since the beginning of this century, several aspects of G4s have attracted considerable research interest, mainly because of their thermodynamic stability under physiological conditions. The i-motif, however, was for many years the ugly duckling of the family of non-canonical DNA structures. Recent results have shed new light on the i-motif field, which is set to blossom in the coming years.
ACKNOWLEDGEMENTS
The authors would like to acknowledge Professor Daniel Christ and Professor Laurence Hurley for providing the original copies of Figures 4, 5 and 7 used in this review.
FUNDING
Natural Sciences and Engineering Research Council of Canada (NSERCC); MINECO grant [BFU2017-89707-P]; Juan de la Cierva postdoctoral Fellowship [FJCI-2016-28474 to M.G.]. Funding for open access charge: NSERCC Discovery grant ( to M.J.D.); MINECO grant [BFU2017-89707-P to C.G.].
Conflict of interest statement. None declared.
REFERENCES