Full-Length Transcriptome Papers You Must Read

Source: 生信人

Here are this year's latest full-length transcriptome papers:

1. Full-length transcriptome of tetraploid cotton

Wang, M., Wang, P., (2017), A global survey of alternative splicing in allopolyploid cotton: landscape, complexity and regulation. New Phytol. doi:10.1111/nph.14762

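Summary

Alternative splicing (AS) is an important regulatory mechanism in eukaryotes that greatly increases transcript diversity. Second-generation sequencing has revealed how widespread and complex AS is in model plants. In polyploid plants, however, the high sequence similarity between subgenomes makes this technology less effective at accurately identifying transcript isoforms. Here we characterize AS in tetraploid cotton. Using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq), we developed an integrated pipeline for Iso-Seq transcriptome data analysis (https://github.com/Nextomics/pipeline-for-isoseq). We identified 176,849 full-length transcript isoforms from 44,968 gene models and updated the corresponding gene annotation. These data allowed us to identify 15,102 fibre-specific AS events and to estimate that about 51.4% of homoeologous genes produce divergent isoforms in each subgenome. We found that AS allows miRNAs to differentially regulate different isoforms of the same gene. We also show that nucleosome occupancy and DNA methylation play a role in defining exons at the chromatin level. This study provides new insights into the complexity and regulation of AS and will improve our understanding of AS in polyploid species. Our Iso-Seq data analysis method can serve as a useful reference for AS studies in other species.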

English abstract

Alternative splicing (AS) is a crucial regulatory mechanism in eukaryotes, which acts by greatly increasing transcriptome diversity. The extent and complexity of AS has been revealed in model plants using high-throughput next-generation sequencing. However, this technique is less effective in accurately identifying transcript isoforms in polyploid species because of the high sequence similarity between coexisting subgenomes. Here we characterize AS in the polyploid species cotton. Using Pacific Biosciences single-molecule long-read isoform sequencing (Iso-Seq), we developed an integrated pipeline for Iso-Seq transcriptome data analysis (https://github.com/Nextomics/pipeline-for-isoseq). We identified 176 849 full-length transcript isoforms from 44 968 gene models and updated gene annotation. These data led us to identify 15 102 fibre-specific AS events and estimate that c. 51.4% of homoeologous genes produce divergent isoforms in each subgenome. We reveal that AS allows differential regulation of the same gene by miRNAs at the isoform level. We also show that nucleosome occupancy and DNA methylation play a role in defining exons at the chromatin level. This study provides new insights into the complexity and regulation of AS, and will enhance our understanding of AS in polyploid species. Our methodology for Iso-Seq data analysis will be a useful reference for the study of AS in other species.

2. Full-length transcriptome of rabbit

Chen S Y, Deng F, Jia X, et al. A transcriptome atlas of rabbit revealed by PacBio single-molecule long-read sequencing[J]. Scientific Reports, 2017, 7.

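Summary

Transcriptional diversity is widely acknowledged to contribute substantially to biological regulation in eukaryotes. Since the advent of second-generation sequencing, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. Obtaining full-length transcripts nevertheless remains a major challenge because of the difficulty of assembling short reads. In this study we used PacBio single-molecule long-read sequencing for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We obtained 36,186 high-confidence transcripts from 14,474 genic loci, of which more than 23% of the loci and 66% of the isoforms are not yet annotated in the current reference genome. In addition, about 17% of the transcripts were computationally predicted to be non-coding RNAs. Up to 24,797 alternative splicing (AS) events and 11,184 alternative polyadenylation (APA) events were detected in this de novo constructed transcriptome. The results provide a comprehensive set of reference transcripts and thereby help improve the annotation of the rabbit genome.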

English abstract

It is widely acknowledged that transcriptional diversity largely contributes to biological regulation in eukaryotes. Since the advent of second-generation sequencing technologies, a large number of RNA sequencing studies have considerably improved our understanding of transcriptome complexity. However, it still remains a huge challenge for obtaining full-length transcripts because of difficulties in the short read-based assembly. In the present study we employ PacBio single-molecule long-read sequencing technology for whole-transcriptome profiling in rabbit (Oryctolagus cuniculus). We totally obtain 36,186 high-confidence transcripts from 14,474 genic loci, among which more than 23% of genic loci and 66% of isoforms have not been annotated yet within the current reference genome. Furthermore, about 17% of transcripts are computationally revealed to be non-coding RNAs. Up to 24,797 alternative splicing (AS) and 11,184 alternative polyadenylation (APA) events are detected within this de novo constructed transcriptome, respectively. The results provide a comprehensive set of reference transcripts and hence contribute to the improved annotation of rabbit genome.

3. Full-length transcriptome of sugarcane

Hoang N V, Furtado A, Mason P J, et al. A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing[J]. BMC Genomics, 2017, 18(1):395.

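Summary

A pooled RNA sample derived from leaf, internode and root tissues at different developmental stages, from 22 sugarcane varieties, was sequenced with Iso-Seq to explore the potential for capturing full-length transcripts. A total of 107,598 non-redundant transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. Most of the dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of all sequences were annotated against the GO and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina second-generation RNA sequencing (RNA-Seq) of the internode samples from the same experiment and of public databases showed that Iso-Seq recovered more full-length transcripts and had a higher N50 and a greater average length of the largest 1,000 proteins, whereas RNA-Seq captured more of the gene content and RNA diversity. Only 62% of the PacBio transcripts matched 67% of the de novo contigs. The unmatched PacBio transcripts were attributed to the inclusion of leaf/root tissues and to normalization in the PacBio library, and the unmatched contigs to the broader gene content and RNA classes represented in the short-read assembly. About 69% of the PacBio transcripts aligned to the sorghum genome, compared with about 41% of the de novo contigs.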

English abstract

Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most of the sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short-reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms.

The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues, of different developmental stages, from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% was novel transcripts, and over 2% was putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA-Sequencing (RNA-Seq) of the internode samples from the same experiment and public databases showed that the Iso-Seq method recovered more full-length transcript isoforms, had a higher N50 and average length of largest 1,000 proteins; whereas a greater representation of the gene content and RNA diversity was captured in RNA-Seq. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs, while the non-matched proportions were attributed to the inclusion of leaf/root tissues and the normalization in PacBio, and the representation of more gene content and RNA classes in the de novo assembly, respectively. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating the high conservation of orthologs in the genic regions of the two genomes.

The transcriptome dataset should contribute to improved sugarcane gene models and sugarcane protein predictions; and will serve as a reference database for analysis of transcript expression in sugarcane.
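The sugarcane comparison above relies on N50 as a contiguity metric. As a reminder of what that number measures, here is a minimal Python sketch of the usual definition: the length L such that sequences of length L or longer contain at least half of the total bases. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the paper or from any particular assembler.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together account for at least half of the summed length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the study).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_lengths))
```

A higher N50 only says that the longer sequences carry more of the total length. It does not say how many genes are represented, which is why the abstract reports the two properties separately.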
