「R」Obtain RNAseq Values for a Specific Gene in Xena Database
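I wrote this post because a user asked me how to get the expression values of a single gene. I covered this operation long ago in my survival analysis example post, but the fact that the user was still confused means that post was either unclear or hard to find, so here I address this kind of question directly.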
❝Hi Shixiang, How can I use Xena tools to extract and compare RNAseq values for a specific gene for TCGA LUAD tumor vs. LUAD adjacent normal? Are there instructions provided anywhere on how to specifically extract the adjacent normal data?❞
When using the UCSCXenaTools package, you may want to focus on single-gene analysis. A typical case is shown in my previous blog post, UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis[1]. Here I describe in detail how to get the values of a single gene (especially RNAseq data).
Let's load the package.
library(UCSCXenaTools)
First, Find Your Dataset of Interest
UCSC Xena provides more than 1000 datasets, so to get the values of a single gene you must first select a target dataset. You can find the datasets in the following table or on the UCSC Xena datasets page[2].
DT::datatable(UCSCXenaTools::XenaData)
❝The table here has more than 1000 rows; see the original post at https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/[3] ❞
Pick a dataset and get its XenaHosts and XenaDatasets values, i.e. its data hub host URL and dataset ID. You can copy them by hand, or use your R skills to get them and store them in an object. For example, a reader wants to study the RNAseq values of a gene in TCGA LUAD.
I can use R:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
ge <- XenaData %>%
  filter(XenaHostNames == "tcgaHub") %>% # select TCGA Hub
  XenaScan("TCGA Lung Adenocarcinoma") %>%
  filter(DataSubtype == "gene expression RNAseq", Label == "IlluminaHiSeq")
str(ge)
#> tibble [1 × 17] (S3: tbl_df/tbl/data.frame)
#> $ XenaHosts : chr "https://tcga.xenahubs.net"
#> $ XenaHostNames : chr "tcgaHub"
#> $ XenaCohorts : chr "TCGA Lung Adenocarcinoma (LUAD)"
#> $ XenaDatasets : chr "TCGA.LUAD.sampleMap/HiSeqV2"
#> $ SampleCount : int 576
#> $ DataSubtype : chr "gene expression RNAseq"
#> $ Label : chr "IlluminaHiSeq"
#> $ Type : chr "genomicMatrix"
#> $ AnatomicalOrigin: chr "Lung"
#> $ SampleType : chr "tumor"
#> $ Tags : chr "cancer,non-small cell lung cancer"
#> $ ProbeMap : chr "probeMap/hugo_gencode_good_hg19_V24lift37_probemap"
#> $ LongTitle : chr "TCGA lung adenocarcinoma (LUAD) gene expression by RNAseq (polyA+ IlluminaHiSeq)"
#> $ Citation : chr NA
#> $ Version : chr "2017-10-13"
#> $ Unit : chr "log2(norm_count+1)"
#> $ Platform : chr "IlluminaHiSeq_RNASeqV2"
Or I can just copy https://tcga.xenahubs.net and TCGA.LUAD.sampleMap/HiSeqV2.
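If you prefer not to depend on dplyr, the same lookup can be sketched in base R. The helper `find_dataset()` below is hypothetical (it is not part of UCSCXenaTools), and the toy metadata table only mimics the XenaData columns printed by `str(ge)`: its first row copies the values shown above, and its second row is a made-up placeholder. Unlike the pipeline above, this sketch matches the pattern against the cohort name only.

```r
# Hypothetical helper (not part of UCSCXenaTools): filter a metadata table
# shaped like XenaData with base R only and return the host URL and dataset ID.
find_dataset <- function(meta, host_name, cohort_pattern, subtype, label) {
  keep <- meta$XenaHostNames == host_name &
    grepl(cohort_pattern, meta$XenaCohorts, fixed = TRUE) &
    meta$DataSubtype == subtype &
    meta$Label == label
  # which() drops rows where the comparison is NA (e.g. a missing Label)
  meta[which(keep), c("XenaHosts", "XenaDatasets"), drop = FALSE]
}

# Toy metadata: row 1 copies the values printed by str(ge);
# row 2 is a made-up placeholder so the filter has something to drop.
meta <- data.frame(
  XenaHosts     = c("https://tcga.xenahubs.net", "https://example.org"),
  XenaHostNames = c("tcgaHub", "exampleHub"),
  XenaCohorts   = c("TCGA Lung Adenocarcinoma (LUAD)", "Example Cohort"),
  XenaDatasets  = c("TCGA.LUAD.sampleMap/HiSeqV2", "example/dataset"),
  DataSubtype   = c("gene expression RNAseq", "phenotype"),
  Label         = c("IlluminaHiSeq", NA),
  stringsAsFactors = FALSE
)

hit <- find_dataset(
  meta,
  host_name      = "tcgaHub",
  cohort_pattern = "Lung Adenocarcinoma",
  subtype        = "gene expression RNAseq",
  label          = "IlluminaHiSeq"
)
print(hit)
```

With the package loaded, passing `as.data.frame(XenaData)` instead of the toy table should behave the same way, assuming the column names match those in the `str(ge)` output.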
Get Your Gene Values
Once you have the dataset information, you can get the expression of a specific gene with fetch_dense_values (depending on your dataset, this also works for gene-level CNV, mutation, etc.). Run ?fetch in your R console to see more details.
For example, I will query the gene TP53.
TP53 <- fetch_dense_values(
  host = ge$XenaHosts, # You can also set "https://tcga.xenahubs.net"
  dataset = ge$XenaDatasets, # You can also set "TCGA.LUAD.sampleMap/HiSeqV2"
  identifiers = "TP53",
  use_probeMap = TRUE
) %>%
  .[1, ]
#> -> Checking identifiers...
#> -> use_probeMap is TRUE, skipping checking identifiers...
#> -> Done.
#> -> Checking samples...
#> -> Done.
#> -> Checking if the dataset has probeMap...
#> -> Done. ProbeMap is found.
head(TP53)
#> TCGA-69-7978-01 TCGA-62-8399-01 TCGA-78-7539-01 TCGA-50-5931-11 TCGA-73-4658-01
#> 9.89 8.31 10.35 9.62 10.02
#> TCGA-44-6775-01
#> 10.16
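Typically, a TCGA sample ID has 15 characters, and characters 14-15 encode the sample type. When this code is below 10 (01-09), the sample is a tumor; codes 10-19 mark normal samples, for example 11 for solid tissue normal, which is the adjacent normal tissue asked about above.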
table(as.integer(substr(names(TP53), 14, 15)))
#>
#> 1 2 11
#> 515 2 59
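To answer the tumor vs. adjacent normal question, the sample-type code can be turned into a label and used to split the values. The following is a minimal base R sketch, not a full analysis: the vector `tp53_demo` is a stand-in that reuses only the six values printed by `head(TP53)`, the variable names are my own, and the mapping of codes 20 and above to "control" is an assumption based on the TCGA barcode convention.

```r
# Stand-in for the full TP53 vector: the six values printed by head(TP53)
tp53_demo <- c(
  "TCGA-69-7978-01" = 9.89,
  "TCGA-62-8399-01" = 8.31,
  "TCGA-78-7539-01" = 10.35,
  "TCGA-50-5931-11" = 9.62,
  "TCGA-73-4658-01" = 10.02,
  "TCGA-44-6775-01" = 10.16
)

# Characters 14-15 of the sample ID hold the sample-type code
type_code <- as.integer(substr(names(tp53_demo), 14, 15))

# 01-09 tumor, 10-19 normal; anything else is labeled "control" (assumption)
sample_type <- ifelse(type_code < 10, "tumor",
  ifelse(type_code < 20, "normal", "control")
)

expr_df <- data.frame(
  sample     = names(tp53_demo),
  expression = unname(tp53_demo),
  type       = sample_type,
  stringsAsFactors = FALSE
)

# Keep only tumor and normal samples and summarise the expression per group
tumor_normal <- subset(expr_df, type %in% c("tumor", "normal"))
group_means <- aggregate(expression ~ type, data = tumor_normal, FUN = mean)
print(group_means)
```

With the full vector in place of `tp53_demo`, a two-group comparison such as `wilcox.test(expression ~ type, data = tumor_normal)` becomes meaningful; with six samples it is only a demonstration of the data layout.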
Now you can start your analysis with this data.
Other Things May Help
In addition to the fetch_* functions, I generated many low-level API functions for the UCSC Xena database, which are described at https://shixiangwang.github.io/home/en/tools/ucscxenatools-api/. These functions can access different levels of data information in UCSC Xena. Some of them are combined to build the core functionality that UCSCXenaTools currently provides.
NOTE: not all API functions work well. I haven't tested them all; they are all generated dynamically from XQuery[4] files.
The R Shiny package UCSCXenaShiny[5] provides a web-based platform to download datasets and analyze single genes. We have also written functions to get pan-cancer-level values for a single gene: expression, CNV, mutation, etc.
You can install the latest development version from GitHub with:
remotes::install_github("openbiox/UCSCXenaShiny")
After you load this package, you can use the following functions to get data easily.
- get_ccle_cn_value: Fetch copy number value from CCLE dataset
- get_ccle_gene_value: Fetch gene expression value from CCLE dataset
- get_ccle_protein_value: Fetch gene protein expression value from CCLE dataset
- get_ccle_mutation_status: Fetch gene mutation info from CCLE dataset
- get_pancan_value: Fetch identifier value from pan-cancer dataset
- get_pancan_gene_value: Fetch gene expression value from pan-cancer dataset
- get_pancan_protein_value: Fetch protein expression value from pan-cancer dataset
- get_pancan_mutation_status: Fetch mutation status value from pan-cancer dataset
- get_pancan_cn_value: Fetch gene copy number value from pan-cancer dataset processed by GISTIC 2.0
Any questions can be posted online at https://github.com/openbiox/UCSCXenaShiny/issues or https://github.com/ropensci/UCSCXenaTools/issues.
To sum up, besides the UCSCXenaTools README and counting this post, I have now written 4 introductory documents:
- Introduction and basic usage of UCSCXenaTools[6]
- UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis[7]
- Obtain RNAseq Values for a Specific Gene in Xena Database[8]
- UCSC Xena Access APIs in UCSCXenaTools[9]
References
- Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
- Wang, S.; Xiong, Y.; Gu, K.; Zhao, L.; Li, Y.; Zhao, F.; Li, X.; Liu, X. UCSCXenaShiny: An R Package for Exploring and Analyzing UCSC Xena Public Datasets in Web Browser. Preprints 2020, 2020070179 (doi: 10.20944/preprints202007.0179.v1).
References
[1]UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis: https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/
[2]UCSC Xena datasets page: https://xenabrowser.net/datapages/
[3]https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/: https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/
[4]XQuery: https://github.com/ropensci/UCSCXenaTools/tree/master/inst/queries
[5]UCSCXenaShiny: https://github.com/openbiox/UCSCXenaShiny/
[6]Introduction and basic usage of UCSCXenaTools: https://shixiangwang.github.io/home/en/tools/ucscxenatools-intro
[7]UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis: https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/
[8]Obtain RNAseq Values for a Specific Gene in Xena Database: https://shixiangwang.github.io/home/en/post/2020-07-22-ucscxenatools-single-gene/
[9]UCSC Xena Access APIs in UCSCXenaTools: https://shixiangwang.github.io/home/en/tools/ucscxenatools-api