nature biotechnology VOLUME 36 NUMBER 10 OCTOBER 2018 1005
RESOURCE

Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli

Guillaume Cambray (1,2), Joao C Guimaraes (1,3) & Adam Paul Arkin (3,4)

Comparative analyses of natural and mutated sequences have been used to probe mechanisms of gene expression, but small sample sizes may produce biased outcomes. We applied an unbiased design-of-experiments approach to disentangle factors suspected to affect translation efficiency in E. coli. We precisely designed 244,000 DNA sequences implementing 56 replicates of a full factorial design to evaluate nucleotide, secondary structure, codon and amino acid properties in combination. For each sequence, we measured reporter transcript abundance and decay, polysome profiles, protein production and growth rates. Associations between designed sequences' properties and these consequent phenotypes were dominated by secondary structures and their interactions within transcripts. We confirmed that transcript structure generally limits translation initiation and demonstrated its physiological cost using an epigenetic assay. Codon composition has a sizable impact on translatability, but only in comparatively rare elongation-limited transcripts. We propose a set of design principles to improve translation efficiency that would benefit from more accurate prediction of secondary structures in vivo.

(1) California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, California, USA. (2) DGIMI, Univ. Montpellier, INRA, Montpellier, France. (3) Department of Bioengineering, University of California, Berkeley, Berkeley, California, USA. (4) Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA. Correspondence should be addressed to G.C. (guillaume.cambray@inra.fr) or A.P.A. (aparkin@lbl.gov).

Received 24 November 2017; accepted 2 August 2018; published online 24 September 2018; doi:10.1038/nbt.4238

Translation is one of the most conserved and energy-intensive processes in any cell. In bacteria, synthesis and maintenance of ribosomes and related translational machinery accounts for more than 40% of the cellular energy budget[1]. This means that, for any translated sequence, the determinants of translational efficiency are likely to be under selection to minimize cost[2]. Aside from specific regulatory mechanisms, optimization of translation is thought to involve fine-tuning intrinsic sequence properties. Although hundreds of studies have linked such features to protein production, very few have also measured physiological outcomes of sequence variation or attempted to integrate translation efficiencies and physiology[3–5].

Highly expressed genes are enriched in codons cognate to abundant tRNAs[6], and gene expression levels usually correlate with indices describing these enrichments[7,8]. Biased codon usage is thought to optimize elongation by preventing wasteful ribosome collisions and stacking[9]. Similarly, incorporation of specific amino acids[10] and secondary structures in transcripts[11,12] has been linked with translation speed. Experimental studies on endogenous and designed transcripts have identified the importance of RNA structures in hindering initiation by blocking ribosome binding sites[13–16]. The sequence of the 5′ untranslated region and the first 30–50 codons of a gene is thought to contain most determinants of translation efficiency[17]. Selection for a translation speed ramp encoded at the 5′ end of coding sequences via the above mechanisms has been proposed to balance the flow of initiating ribosomes with the elongation capacity of the downstream sequence[18,19].

In practice, discerning the relative importance of sequence-intrinsic features and the interactions among them has proven difficult. This is because most sequence–activity studies involve analysis of endogenous genes, which have evolved highly scrambled control signals. For example, a bias for N-terminal positive charge is found almost exclusively in weakly expressed membrane proteins. This type of protein sequence enables association with the cytosolic membrane leaflet[20] rather than ribosomal slowing[19], but it is associated with both.

Though intended for tests of causation, studies using designed sequences might suffer from similar confounding effects. For example, occurrence of rare codons in a synthetic library comprising 13 synonymous variants of the first 10 codons from 137 E. coli genes fused to a standard reporter was found to correlate with increased protein production. Although this finding seemed to support the codon ramp hypothesis, post hoc analysis revealed that these rare codons were also A+T-rich, resulting in weaker secondary structures in mRNAs and presumably higher initiation[21]. Notwithstanding caveats associated with its simple design, this codon variant library was further used to show various effects of transcript structure, codon and amino acid properties on cell growth[5]. To our knowledge, the largest study to formally attempt to disentangle multiple sequence features by design used a library of 285 synthetic genes and concluded that A+T content over the first 35 nucleotides (nt) rather than structures or codons has the largest effect on translation efficiency[22]. This study was done in vitro and could not assess physiological effects in vivo.

We undertook a comprehensive analysis of factors affecting translation efficiency in E. coli by implementing a massive design of experiments (DoE) at the molecular level. DoE is widely used in many fields (for example, agronomy, clinical trials, chemical process optimization, and mechanical design)[23] and aims to explain the variance in a set of experimental observables by systematically planned variations of a set of explanatory variables[24]. In the framework of studying sequence–activity relationships, explanatory variables are not extrinsic and independent treatments (for example, temperature and
osmolarity) but rather intrinsic and interdependent properties of the DNA molecule (for example, synonymous sequences varying A+T content but not transcript secondary structures nor codon usage). As such constraints significantly complicate implementation, very few studies have applied DoE to optimize[25,26] or characterize[22,27,28] genetic systems.

Here we design 244,000 synthetic sequences to systematically explore an eight-dimensional space defined by the main sequence properties introduced above and use high-throughput DNA sequencing to measure the consequences of these sequence perturbations on several phenotypes linked to translation efficiency (Fig. 1a).

RESULTS

Design of experiments to understand translation

We selected eight intrinsic sequence properties reported to be the main factors affecting translation efficiency (Fig. 1b,c). These predictors describe sequence nucleotide content (A+T content (%AT)), patterns of codon usage (codon adaptation index (CAI), codon ramp bottleneck position (BtlP) and strength (BtlS)), hydrophobicity of the coded polypeptide (mean hydrophobicity index (MHI)) and stability of secondary structures tiled along the transcript (STR−30:+30, STR+01:+60, STR+31:+90). These explanatory variables were arrayed into a statistical full factorial design (Fig. 1b). Full factorial designs vary all predictors (over a discrete range of levels) in all combinations. A set of observations is then performed on each combination. The data are treated with analysis of variance (ANOVA) and other regression analyses to quantify the effect of each predictor, and all possible interactions between them, on the observables, given all possible values of the other predictors.

To derive relevant physiological ranges for our eight predictors, we calculated values of each sequence property for all 3,990 protein coding genes annotated in a reference E. coli genome and examined their associations with matched published measures of mRNA and protein abundance[29] (Online Methods and Supplementary Data 1–4). On
the basis of this analysis (Supplementary Fig. 1), we assigned either two or three discrete levels over which to vary each parameter, leading to 1,458 unique combinations of parameter levels (Fig. 1b). Nearly half these combinations are not represented in the genome, and randomly generated sequences produce a very skewed sampling of this discretized property space (Supplementary Fig. 2a,b and Supplementary Data 5 and 6). We used D-Tailor[30] to derive 56 fully independent mutational series, each representing a complete replicate of the full factorial design. For each sequence, we further generated two closely related replicates that differ by one to four random point mutations but retain the same particular combination of properties (Online Methods, Supplementary Figs. 2c–e and 3, and Supplementary Data 7 and 8).

The resulting set of 244,000 sequences was synthesized on a high-density oligonucleotide array, PCR-amplified and cloned into a reporter plasmid, and the resulting library was introduced into E. coli MDS42 by transformation (Online Methods). Amplicon sequencing of the cloned sequences readily permitted observation of 99.4% of the library (242,516 strains with 10 reads; Supplementary Fig. 4 and Supplementary Data 9 and 10).

[Figure 1a, panel labels: hypotheses; nucleotides, codons, amino acids, RNA structures; molecular DoE; 8 properties, 1,458 combinations, 3 variants (1–4 nt), 56 replicate series, 244,000 sequences; bioimplementation; HTP DNA synthesis, bulk cloning, controlled library; HTP measurements; protein, mRNA and decay, ribosomal densities, growth rates.]

[Figure 1b, table of sequence properties:]

Sequence property                          Score        Positions   Levels       Thresholds
Nucleotides
  A+T content                              %AT          +4, +21     2            0.58; NA
Codons
  Global usage bias (CAI)                  CAI          +1, +99     3            0.28; 0.44
  Local usage bias, position               BtlP         +1, +99     2            0.1; NA
  Local usage bias, strength               BtlS                     2 | BtlP=1   1.48; NA
                                                                    1 | BtlP=2   NA; NA
Amino acids
  Mean hydropathy index                    MHI          +28, +60    3            −1; 1
mRNA structures
  Predicted minimum free energy (ΔG)       STR−30:+30   −30, +30    3            −11.5; −7.5
                                           STR+01:+60   +01, +60    3            −14; −9
                                           STR+31:+90   +31, +90    3            −16.5; −11.5

[Figure 1d, panel labels: TEV cleavage, 6 His tag, 96 nt designed N-term, sfGFP, LacI, IPTG, flexible linkers, leader, unnatural tRNA, unnatural amino acid, inducible translational coupling, inducible promoter, reporter standardization, TAG, SDL, SDR.]

Figure 1 High-throughput design of experiments. (a) A workflow for hypothesis-driven functional genomics applied to translation. 244,000 digital DNA sequences were designed programmatically to produce 56 independent libraries. Each library covers every combination of eight sequence properties previously found to correlate with protein production in E. coli. Designer sequences were synthesized and cloned in bulk. Multiple phenotypes affording an integrative study of the genetic perturbations' impact on the cellular system were characterized using high-throughput (HTP) sequencing as a generic quantitative readout. DoE, design of experiments. (b) Sequence properties varied in the factorial design. (c) Topological mapping of sequence properties onto the reporter system. As the functional contributions of properties defined from the same underlying sequence are intricately intertwined, dissecting their separate contributions requires systematic sequence and experimental design. (d) A standard reporter for translational output. Synthetic sequences are cloned as N-terminal fusions to a modified sfGFP gene, which leads to the production of an invariant fluorescent reporter upon post-translational processing by the TEV protease. Translation of the reporter is driven by a perfect Shine–Dalgarno motif (SDR, underlined) embedded into the leader sequence of an inducible translational coupling device (yellow), itself driven by its own upstream SD (SDL). Translation of the leader cistron can only occur when we add a specific unnatural amino acid to the culture, which suppresses a stop codon placed early in the sequence through a cognate synthetic tRNA[32]. Ribosomes able to translate past this point unwind RNA structure ahead of them, thus making the embedded SDR more accessible. Ribosomes terminating on the leader need not dissociate from the transcript to reinitiate translation of the reporter, further improving initiation rates[13,35]. The system is borne on a medium-copy plasmid (colE1 origin).

Several features of our expression system were chosen to minimize potential bias in experimental outcomes. The expression cassette (Fig. 1d) accepts the library as a specific protease-cleavable N-terminal translational fusion to the superfolder GFP gene (sfGFP)[31]. This cleavage produces a processed fluorescent reporter that is invariant across strains, thus reducing potential post-translational effects of the designed sequence (for example, differential protein stability). Sequences are transcribed from an IPTG-inducible promoter to avoid strain competition during library construction and propagation. Translation initiates at a Shine–Dalgarno signal (SDR) embedded in a short leader sequence that overlaps the fused reporter by a single nucleotide (Fig. 1d). Translation of that leader cistron permits unwinding of potential mRNA structures around the SDR to mitigate their impact on reporter gene translation[15]. We derived the pEVOL unnatural suppressor system[32] to enable fully tunable control of leader sequence translation (Supplementary Fig. 5 and Supplementary Data 11–13). This allows us to assess structural control of initiation without confounding effects of sequence modification.

Quantification of effects on protein production

We used fluorescence-activated cell sorting followed by high-throughput targeted sequencing of the designed region to measure fluorescent protein production as a proxy for translation rate[28,33]. We quantified protein production under normal initiation (PNI) conditions for 242,269 strains in the library, aggregating four highly reproducible measurements of independent biological replicates (Online Methods, Supplementary Fig. 6a–c, Supplementary Code 1–6 and Supplementary Data 14 and 15).

We conducted a
multiway ANOVA[27,28] to quantify the relative contribution of each sequence property and its first-order interactions with PNI (Supplementary Code 7 and Supplementary Data 16). First-order interactions quantify the dependence of one property's effect on another property's levels. This analysis revealed that mRNA secondary structures around the start codon (30 nt in either direction; STR−30:+30) have the biggest effect on translation efficiency, accounting for 83% of design properties' total effect (Fig. 2a, top). The main other contributors to variability of translation are %AT (7%); STR+01:+60 (4%) and its interaction with STR−30:+30 (3%); and STR+31:+90 (2%) and its interaction with STR+01:+60 (1%). Most of these properties involve structures. Even %AT might capture unaccounted structural signal. Similar pictures emerge from conducting multiple and recursive regression analyses (Supplementary Fig. 6d,e, Supplementary Code 8 and 9, and Supplementary Data 17 and 18). Taken together, design properties and first-order interactions could only explain 28% of PNI variance, suggesting that the
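
To make the two kinds of percentage in this analysis concrete (each term's share of the design properties' total effect, and the fraction of total variance the design explains), the following is a minimal sketch of a balanced two-factor ANOVA on toy data. It is not the authors' Supplementary Code: the factor names `A` and `B`, the effect sizes and the noise values are hypothetical, and it assumes that each reported share is a term's sum of squares divided by the summed sums of squares of all modeled terms, which the text does not spell out.

```python
from statistics import fmean


def two_way_anova_shares(y):
    """Balanced two-factor ANOVA.

    y[i][j] is the list of replicate measurements for level i of
    factor A and level j of factor B (same replicate count in every cell).
    """
    a, b, r = len(y), len(y[0]), len(y[0][0])
    cell = [[fmean(y[i][j]) for j in range(b)] for i in range(a)]
    row = [fmean(cell[i]) for i in range(a)]                       # factor A level means
    col = [fmean([cell[i][j] for i in range(a)]) for j in range(b)]  # factor B level means
    grand = fmean(row)                                             # grand mean (balanced design)

    ss_a = b * r * sum((m - grand) ** 2 for m in row)
    ss_b = a * r * sum((m - grand) ** 2 for m in col)
    ss_ab = r * sum(
        (cell[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_total = sum(
        (v - grand) ** 2
        for i in range(a) for j in range(b) for v in y[i][j]
    )
    ss_model = ss_a + ss_b + ss_ab
    return {
        "A": ss_a / ss_model,            # share of the modeled effect
        "B": ss_b / ss_model,
        "A:B": ss_ab / ss_model,         # first-order interaction
        "explained": ss_model / ss_total,  # fraction of total variance explained
    }


# Hypothetical toy data: a strong factor A, a weak factor B,
# a small interaction, and identical zero-mean noise in every cell.
effect_a = [-2.0, 0.0, 2.0]
effect_b = [-0.5, 0.0, 0.5]
noise = [-1.5, -0.5, 0.5, 1.5]
y = [
    [
        [effect_a[i] + effect_b[j] + 0.25 * (i - 1) * (j - 1) + e for e in noise]
        for j in range(3)
    ]
    for i in range(3)
]

shares = two_way_anova_shares(y)
for term, value in shares.items():
    print(f"{term}: {value:.3f}")
```

In this toy case one factor dominates the modeled effect while the model still leaves a sizable part of the total variance unexplained, which is the same distinction the text draws between the 83% share for the structure around the start codon and the 28% of PNI variance explained overall.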

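The size of the full factorial design described in the Results can be checked by enumerating the level combinations. The sketch below is only an illustration under stated assumptions: level counts are read from the Fig. 1b table, levels are given hypothetical integer labels, and bottleneck strength (BtlS) is treated as nested within bottleneck position (BtlP), with two strength levels at the first position and one at the second.

```python
from itertools import product

# Number of discrete levels per property (read from the Fig. 1b table).
LEVELS = {
    "%AT": 2,
    "CAI": 3,
    "MHI": 3,
    "STR-30:+30": 3,
    "STR+01:+60": 3,
    "STR+31:+90": 3,
}

# BtlS is nested in BtlP: (BtlP level, BtlS level) pairs that exist.
# The integer labels are hypothetical placeholders for the real levels.
BOTTLENECK = [(1, 1), (1, 2), (2, 1)]

REPLICATE_SERIES = 56     # independent mutational series
VARIANTS_PER_DESIGN = 3   # each sequence plus two closely related replicates


def full_factorial():
    """Yield every combination of property levels as a dict."""
    names = list(LEVELS)
    ranges = [range(1, LEVELS[name] + 1) for name in names]
    for combo in product(*ranges):
        for btlp, btls in BOTTLENECK:
            design = dict(zip(names, combo))
            design["BtlP"] = btlp
            design["BtlS"] = btls
            yield design


designs = list(full_factorial())
n_combinations = len(designs)
n_sequences = n_combinations * REPLICATE_SERIES * VARIANTS_PER_DESIGN
print(n_combinations, n_sequences)  # prints: 1458 244944
```

The enumeration reproduces the 1,458 combinations stated in the text, and multiplying by 56 series and 3 variants gives 244,944 sequences, in line with the rounded figure of 244,000.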