Drawing a word cloud from the titles of the TCGA project's CNS-level (Cell/Nature/Science) papers
Date: 2022-07-23
The official list of TCGA publications is at: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/publications
The English titles are easy to extract and tidy up. Here is the full list:
Comprehensive genomic characterization defines human glioblastoma genes and core pathways
Integrated genomic analyses of ovarian carcinoma
Comprehensive molecular characterization of human colon and rectal cancer
Comprehensive molecular portraits of human breast tumours
Comprehensive genomic characterization of squamous cell lung cancers
Integrated genomic characterization of endometrial carcinoma
Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia
Comprehensive molecular characterization of clear cell renal cell carcinoma
The Cancer Genome Atlas Pan-Cancer analysis project
The somatic genomic landscape of glioblastoma
Comprehensive molecular characterization of urothelial bladder carcinoma
Comprehensive molecular profiling of lung adenocarcinoma
Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin
The Somatic Genomic Landscape of Chromophobe Renal Cell Carcinoma
Comprehensive molecular characterization of gastric adenocarcinoma
Integrated genomic characterization of papillary thyroid carcinoma
Comprehensive genomic characterization of head and neck squamous cell carcinomas
Genomic Classification of Cutaneous Melanoma
Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas
Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer
The Molecular Taxonomy of Primary Prostate Cancer
Comprehensive Molecular Characterization of Papillary Renal-Cell Carcinoma
Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma
Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas
Integrated genomic characterization of oesophageal carcinoma
Comprehensive Molecular Characterization of Pheochromocytoma and Paraganglioma
Integrated Molecular Characterization of Uterine Carcinosarcoma
Integrative Genomic Analysis of Cholangiocarcinoma Identifies Distinct IDH-Mutant Molecular Profiles
Integrated genomic and molecular characterization of cervical cancer
Comprehensive and Integrative Genomic Characterization of Hepatocellular Carcinoma
Integrative Analysis Identifies Four Molecular and Clinical Subsets in Uveal Melanoma
Integrated Genomic Characterization of Pancreatic Ductal Adenocarcinoma
Comprehensive Molecular Characterization of Muscle-Invasive Bladder Cancer
Comprehensive and Integrated Genomic Characterization of Adult Soft Tissue Sarcomas
The Integrated Genomic Landscape of Thymic Epithelial Tumors
Pan-cancer Alterations of the MYC Oncogene and Its Proximal Network across the Cancer Genome Atlas
Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
Molecular Characterization and Clinical Relevance of Metabolic Expression Subtypes in Human Cancers
Systematic Analysis of Splice-Site-Creating Mutations in Cancer
Somatic Mutational Landscape of Splicing Factor Genes and Their Functional Consequences across 33 Cancer Types
The Cancer Genome Atlas Comprehensive Molecular Characterization of Renal Cell Carcinoma
Pan-Cancer Analysis of lncRNA Regulation Supports Their Targeting of Cancer Genes in Each Tumor Context
Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images
Machine Learning Detects Pan-cancer Ras Pathway Activation in The Cancer Genome Atlas
Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas
Driver Fusions and Their Implications in the Development and Treatment of Human Cancers
Genomic, Pathway Network, and Immunologic Features Distinguishing Squamous Carcinomas
Integrated Genomic Analysis of the Ubiquitin Pathway across Cancer Types
SnapShot: TCGA-Analyzed Tumors
The Cancer Genome Atlas: Creating Lasting Value beyond Its Data
Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation
Oncogenic Signaling Pathways in The Cancer Genome Atlas
Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics
Comprehensive Characterization of Cancer Driver Genes and Mutations
An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics
Pathogenic Germline Variants in 10,389 Adult Cancers
A Pan-Cancer Analysis of Enhancer Expression in Nearly 9000 Patient Samples
Genomic and Functional Approaches to Understanding Cancer Aneuploidy
A Comprehensive Pan-Cancer Molecular Study of Gynecologic and Breast Cancers
Comparative Molecular Analysis of Gastrointestinal Adenocarcinomas
lncRNA Epigenetic Landscape Analysis Identifies EPIC1 as an Oncogenic lncRNA that Interacts with MYC and Promotes Cell-Cycle Progression in Cancer
The Immune Landscape of Cancer
Integrated Molecular Characterization of Testicular Germ Cell Tumors
Comprehensive Analysis of Alternative Splicing Across Tumors from 8,705 Patients
A Pan-Cancer Analysis Reveals High-Frequency Genetic Alterations in Mediators of Signaling by the TGF-β Superfamily
Integrative Molecular Characterization of Malignant Pleural Mesothelioma
The chromatin accessibility landscape of primary human cancers
Comprehensive Molecular Characterization of the Hippo Signaling Pathway in Cancer
Before and After: Comparison of Legacy and Harmonized TCGA Genomic Data Commons’ Data
Comprehensive Analysis of Genetic Ancestry and Its Molecular Correlates in Cancer
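Once tidied, the titles can be kept in a plain text file, one title per line, which works on any operating system. A minimal sketch, assuming a temporary file and only three of the titles above (the file path and the `sample_titles` name are illustrative):

```r
# Save a few of the titles, one per line (hypothetical temporary path;
# in practice point this at your own file)
titles_file <- tempfile(fileext = ".txt")
sample_titles <- c(
  "Comprehensive genomic characterization defines human glioblastoma genes and core pathways",
  "Integrated genomic analyses of ovarian carcinoma",
  "The Immune Landscape of Cancer"
)
writeLines(sample_titles, titles_file)

# readLines() returns one character string per line, the same shape
# a clipboard read such as readLines(pipe("pbpaste")) produces
text <- readLines(titles_file)
text <- text[nzchar(trimws(text))]  # drop blank lines, if any
length(text)
```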
A quick Bing search for "word cloud in R" turns up a solution; the first hit is http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know, which splits the code into 5 steps:
- Step 1: Create a text file
- Step 2: Install and load the required packages
- Step 3: Text mining
- Step 4: Build a term-document matrix
- Step 5: Generate the word cloud
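Step 3 is mostly string cleaning. A base-R sketch of the same cleaning, swapping in `gsub()` and `strsplit()` for the tm functions (the `clean_titles` helper and the short `my_stopwords` list are hypothetical; tm's `stopwords("english")` is much longer). One difference: tm's `removePunctuation` deletes the hyphen in "Pan-Cancer" (giving "pancancer"), while this sketch replaces it with a space.

```r
# Hypothetical helper mirroring the tm cleaning steps in base R
clean_titles <- function(x, stop = character(0)) {
  x <- tolower(x)                      # like content_transformer(tolower)
  x <- gsub("[0-9]+", " ", x)          # like removeNumbers
  x <- gsub("[[:punct:]]+", " ", x)    # punctuation becomes a space here
  words <- unlist(strsplit(x, "[[:space:]]+"))
  words <- words[nzchar(words)]        # drop empty tokens from leading spaces
  words[!words %in% stop]              # like removeWords
}

my_stopwords <- c("of", "and", "the", "in", "across")
words <- clean_titles(c("Multiplatform analysis of 12 cancer types",
                        "The Cancer Genome Atlas Pan-Cancer analysis project"),
                      stop = my_stopwords)
words
```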
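To follow the code, first work through the basic R learning roadmap:
- Understand constants and variables
- Arithmetic (using R as a calculator)
- Data types (numeric, character, logical, factor)
- Data structures (vector, matrix, array, data frame, list)
- Reading and writing files
- Simple statistics and visualization
- Learning functions, an open-ended task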
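The core of the code is the wordcloud() function, but the input data it requires has to be prepared carefully.
# Install the packages first if needed: install.packages(c("tm", "SnowballC", "wordcloud", "RColorBrewer"))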
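The input wordcloud() needs is essentially two parallel vectors: the words and their counts. The script below builds them with tm's TermDocumentMatrix; this minimal sketch swaps in base R's `table()` on a hypothetical, already-cleaned token vector to show the target shape:

```r
# Hypothetical cleaned tokens
words <- c("molecular", "characterization", "cancer", "molecular",
           "genomic", "cancer", "molecular")

counts <- sort(table(words), decreasing = TRUE)   # count each distinct word
d <- data.frame(word = names(counts),
                freq = as.integer(counts),
                stringsAsFactors = FALSE)
d
# wordcloud(words = d$word, freq = d$freq) would then size each word by d$freq
```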
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
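# Read the titles straight from the clipboard.
# Before running the next line, make sure you have copied the article titles listed above!
text <- readLines(pipe("pbpaste"))
# pbpaste is macOS-only; on Windows use text <- readLines("clipboard") instead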
# Load the data as a corpus
docs <- Corpus(VectorSource(text))
toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))
docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")  # "\|" is not a valid R escape; the regex needs "\\|"
# Convert the text to lower case
docs <- tm_map(docs, content_transformer(tolower))
# Remove numbers
docs <- tm_map(docs, removeNumbers)
# Remove english common stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))
# Remove your own stop words
# specify your stopwords as a character vector ("blabla1", "blabla2" are placeholders)
docs <- tm_map(docs, removeWords, c("blabla1", "blabla2"))
# Remove punctuations
docs <- tm_map(docs, removePunctuation)
# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)
# Text stemming
# docs <- tm_map(docs, stemDocument)
dtm <- TermDocumentMatrix(docs)
m <- as.matrix(dtm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
head(d, 10)
set.seed(1234)
wordcloud(words = d$word, freq = d$freq, min.freq = 1,
max.words=200, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"))
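The last lines of the script turn the corpus into a term-document matrix (one row per word, one column per title), sum each row and sort. A tiny base-R sketch of the same idea, using two hypothetical pre-cleaned titles instead of the tm corpus:

```r
# Two hypothetical titles, already cleaned and split into words
docs_words <- list(
  t1 = c("comprehensive", "molecular", "characterization", "gastric", "adenocarcinoma"),
  t2 = c("comprehensive", "molecular", "portraits", "breast", "tumours")
)

# Term-document matrix: rows are terms, columns are documents, cells are counts
terms <- sort(unique(unlist(docs_words)))
m <- sapply(docs_words, function(w) table(factor(w, levels = terms)))

# Same steps as the script: total count per term, sorted, into a data frame
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)
head(d, 3)
```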
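The word cloud layout is randomized, so it changes from run to run unless the seed is fixed: set.seed(1234) above makes it reproducible, and a different seed gives a different layout.
Underneath, the plot just visualizes the word frequencies: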
> head(d, 10)
word freq
1 characterization 25
2 molecular 25
3 genomic 24
4 cancer 23
5 comprehensive 22
6 analysis 13
7 integrated 12
8 carcinoma 11
9 cell 8
10 genome 8
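Words that appear more often are simply drawn larger in the word cloud; that is all there is to it.
Three years ago I put together a video tutorial series on the knowledge map of the TCGA tumor database. A year and a half ago it was released for free on the 生信技能树 Bilibili channel, and it is now just about approaching 20,000 views.
- Video: https://www.bilibili.com/video/av49363776
- Code: https://github.com/jmzeng1314/tcga_example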
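wordcloud()'s `scale` argument (default `c(4, .5)`) sets the range of word sizes. A minimal sketch of a linear frequency-to-size mapping, assumed for illustration only (the `word_size` helper is hypothetical and not guaranteed to match the package's internals), applied to the top five frequencies from head(d, 10) above:

```r
# Top five frequencies from the head(d, 10) output above
freq <- c(characterization = 25, molecular = 25, genomic = 24,
          cancer = 23, comprehensive = 22)

# Hypothetical linear mapping: the most frequent word gets scale[1],
# smaller counts shrink toward scale[2] in proportion to freq / max(freq)
word_size <- function(freq, scale = c(4, 0.5)) {
  normed <- freq / max(freq)
  (scale[1] - scale[2]) * normed + scale[2]
}

round(word_size(freq), 2)
```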