rm sample redo DESeq2#
[1]:
library(UpSetR)
library(readxl)
library(ggplot2)
library(clusterProfiler)
library(DESeq2)
clusterProfiler v3.16.1 For help: https://guangchuangyu.github.io/software/clusterProfiler
If you use clusterProfiler in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:stats’:
filter
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames,
dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
union, unique, unsplit, which, which.max, which.min
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:clusterProfiler’:
rename
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Attaching package: ‘IRanges’
The following object is masked from ‘package:clusterProfiler’:
slice
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: DelayedArray
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following object is masked from ‘package:clusterProfiler’:
simplify
The following objects are masked from ‘package:base’:
aperm, apply, rowsum
[5]:
read_data <- function(file){
data <- data.frame(read_excel(file))
print(dim(data))
data$abslog2FC <- abs(data$log2FoldChange)
data<- subset(data, abslog2FC>2 & padj<0.01, select=c("ID", "regulated", "log2FoldChange", "pvalue", "padj"))
#data<- subset(data, abslog2FC>2 & padj<0.01)
rownames(data) <- data$ID
print(dim(data))
return(data)
}
read_data2 <- function(file){
data <- data.frame(read_excel(file))
print(dim(data))
data$abslog2FC <- abs(data$log2FoldChange)
#data<- subset(data, abslog2FC>2 & padj<0.01, select=c("ID", "regulated", "log2FoldChange", "pvalue", "padj"))
#data<- subset(data, abslog2FC>2 & padj<0.01)
rownames(data) <- data$ID
print(dim(data))
return(data)
}
[8]:
C24 <- read_data2("~/Documents/phd/tomato_metabolic/data/番茄转录组数据分析/CC:C/全部基因/C24h_vs_CC24h.all.annot.xlsx")
[1] 20527 34
[1] 20527 35
[16]:
C24_counts <- subset(C24, select=c("C24h.1_count", "C24h.2_count", "CC24h.1_count", "CC24h.2_count", "CC24h.3_count"))
[20]:
design = data.frame(c("C", "C", "CC", "CC", "CC"))
colnames(design) <- c("condition")
rownames(design) <- c("C24h.1_count", "C24h.2_count", "CC24h.1_count", "CC24h.2_count", "CC24h.3_count")
design
| condition | |
|---|---|
| <chr> | |
| C24h.1_count | C |
| C24h.2_count | C |
| CC24h.1_count | CC |
| CC24h.2_count | CC |
| CC24h.3_count | CC |
[48]:
ddsFullCountTable <- DESeqDataSetFromMatrix(countData = C24_counts, colData = design , design = ~condition)
dds = DESeq(ddsFullCountTable)
res = results(dds , contrast = c("condition", "CC", "C"))
converting counts to integer mode
Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
[49]:
res
log2 fold change (MLE): condition CC vs C
Wald test p-value: condition CC vs C
DataFrame with 20527 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
Solyc04g071083.1 7724.324 8.39113 0.2208068 38.0021 0.00000e+00
Solyc08g068730.1 1271.626 6.63353 0.2631825 25.2050 3.52647e-140
Solyc04g064760.3 1086.379 3.03536 0.1185985 25.5936 1.79831e-144
Solyc10g084240.3 841.275 4.35920 0.1849628 23.5680 8.21383e-123
Solyc05g008895.1 5964.746 2.54905 0.0926708 27.5065 1.46732e-166
... ... ... ... ... ...
Solyc11g044910.2 15952.8277 3.35938 0.511573 6.56676 5.14207e-11
Solyc12g013700.2 1929.0393 0.37018 0.210633 1.75746 7.88392e-02
Solyc12g098603.1 31.6306 -1.66827 0.469552 -3.55290 3.81015e-04
novel.1271 49.9403 1.93188 0.456657 4.23048 2.33195e-05
novel.6321 16.9013 1.04730 0.680772 1.53841 1.23949e-01
padj
<numeric>
Solyc04g071083.1 0.00000e+00
Solyc08g068730.1 3.29036e-137
Solyc04g064760.3 1.84570e-141
Solyc10g084240.3 6.02162e-120
Solyc05g008895.1 2.50997e-163
... ...
Solyc11g044910.2 5.34166e-10
Solyc12g013700.2 1.45311e-01
Solyc12g098603.1 1.40415e-03
novel.1271 1.11114e-04
novel.6321 2.10517e-01
[24]:
colnames(C24)
- 'ID'
- 'C24h.1_fpkm'
- 'C24h.2_fpkm'
- 'C24h.3_fpkm'
- 'CC24h.1_fpkm'
- 'CC24h.2_fpkm'
- 'CC24h.3_fpkm'
- 'C24h.1_count'
- 'C24h.2_count'
- 'C24h.3_count'
- 'CC24h.1_count'
- 'CC24h.2_count'
- 'CC24h.3_count'
- 'log2FoldChange'
- 'pvalue'
- 'padj'
- 'regulated'
- 'chr'
- 'start'
- 'end'
- 'strand'
- 'gene_name'
- 'gene_description'
- 'KEGG'
- 'KEGG.pathway'
- 'NR'
- 'Swissprot'
- 'Tremble'
- 'KOG'
- 'GO'
- 'Pfam'
- 'TF.Family'
- 'TF.Category'
- 'TF.Classification'
- 'abslog2FC'
[50]:
C24$log2FoldChange <- res$log2FoldChange
C24$pvalue <- res$pvalue
C24$padj <- res$padj
C24$abslog2FC <- abs(res$log2FoldChange)
[51]:
C24$regulated <- "false"
C24[subset(C24, log2FoldChange>2 & padj<0.01)$ID, "regulated"] = "up"
C24[subset(C24, log2FoldChange>-2 & padj<0.01)$ID, "regulated"] = "down"
[52]:
write.table(C24, "newdata/C24_rmC24-3_new_DE.xls", sep="\t", quote=FALSE)
C84#
[53]:
C84 <- read_data2("~/Documents/phd/tomato_metabolic/data/番茄转录组数据分析/CC:C/全部基因/C84h_vs_CC84h.all.annot.xlsx")
[1] 20690 34
[1] 20690 35
[54]:
C84_counts <- subset(C84, select=c("C84h.1_count", "C84h.3_count", "CC84h.1_count", "CC84h.2_count", "CC84h.3_count"))
[55]:
design = data.frame(c("C", "C", "CC", "CC", "CC"))
colnames(design) <- c("condition")
rownames(design) <- c("C84h.1_count", "C84h.3_count", "CC84h.1_count", "CC84h.2_count", "CC84h.3_count")
design
| condition | |
|---|---|
| <chr> | |
| C84h.1_count | C |
| C84h.3_count | C |
| CC84h.1_count | CC |
| CC84h.2_count | CC |
| CC84h.3_count | CC |
[56]:
ddsFullCountTable <- DESeqDataSetFromMatrix(countData = C84_counts, colData = design , design = ~condition)
dds = DESeq(ddsFullCountTable)
res = results(dds , contrast = c("condition", "CC", "C"))
converting counts to integer mode
Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
[57]:
C84$log2FoldChange <- res$log2FoldChange
C84$pvalue <- res$pvalue
C84$padj <- res$padj
C84$abslog2FC <- abs(res$log2FoldChange)
C84$regulated <- "false"
C84[subset(C84, log2FoldChange>2 & padj<0.01)$ID, "regulated"] = "up"
C84[subset(C84, log2FoldChange>-2 & padj<0.01)$ID, "regulated"] = "down"
[58]:
write.table(C84, "newdata/C84_rmC84-2_new_DE.xls", sep="\t", quote=FALSE)
CD 96-2#
[ ]:
[ ]:
[ ]:
[ ]:
[ ]:
[ ]: