rm sample redo DESeq2#

[1]:
library(UpSetR)
library(readxl)
library(ggplot2)
library(clusterProfiler)
library(DESeq2)


clusterProfiler v3.16.1  For help: https://guangchuangyu.github.io/software/clusterProfiler

If you use clusterProfiler in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.


Attaching package: ‘clusterProfiler’


The following object is masked from ‘package:stats’:

    filter


Loading required package: S4Vectors

Loading required package: stats4

Loading required package: BiocGenerics

Loading required package: parallel


Attaching package: ‘BiocGenerics’


The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB


The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs


The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min



Attaching package: ‘S4Vectors’


The following object is masked from ‘package:clusterProfiler’:

    rename


The following object is masked from ‘package:base’:

    expand.grid


Loading required package: IRanges


Attaching package: ‘IRanges’


The following object is masked from ‘package:clusterProfiler’:

    slice


Loading required package: GenomicRanges

Loading required package: GenomeInfoDb

Loading required package: SummarizedExperiment

Loading required package: Biobase

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Loading required package: DelayedArray

Loading required package: matrixStats


Attaching package: ‘matrixStats’


The following objects are masked from ‘package:Biobase’:

    anyMissing, rowMedians



Attaching package: ‘DelayedArray’


The following objects are masked from ‘package:matrixStats’:

    colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges


The following object is masked from ‘package:clusterProfiler’:

    simplify


The following objects are masked from ‘package:base’:

    aperm, apply, rowsum


[5]:
read_data <- function(file){
    data <- data.frame(read_excel(file))
    print(dim(data))
    data$abslog2FC <- abs(data$log2FoldChange)
    data<- subset(data, abslog2FC>2 & padj<0.01, select=c("ID", "regulated", "log2FoldChange", "pvalue", "padj"))
    #data<- subset(data, abslog2FC>2 & padj<0.01)
    rownames(data) <- data$ID
    print(dim(data))
    return(data)
}

read_data2 <- function(file){
    data <- data.frame(read_excel(file))
    print(dim(data))
    data$abslog2FC <- abs(data$log2FoldChange)
    #data<- subset(data, abslog2FC>2 & padj<0.01, select=c("ID", "regulated", "log2FoldChange", "pvalue", "padj"))
    #data<- subset(data, abslog2FC>2 & padj<0.01)
    rownames(data) <- data$ID
    print(dim(data))
    return(data)
}

[8]:
C24 <- read_data2("~/Documents/phd/tomato_metabolic/data/番茄转录组数据分析/CC:C/全部基因/C24h_vs_CC24h.all.annot.xlsx")
[1] 20527    34
[1] 20527    35
[16]:
C24_counts <- subset(C24, select=c("C24h.1_count", "C24h.2_count", "CC24h.1_count", "CC24h.2_count", "CC24h.3_count"))
[20]:
design = data.frame(c("C", "C", "CC", "CC", "CC"))
colnames(design) <- c("condition")
rownames(design) <- c("C24h.1_count", "C24h.2_count", "CC24h.1_count", "CC24h.2_count", "CC24h.3_count")
design
A data.frame: 5 × 1
condition
<chr>
C24h.1_countC
C24h.2_countC
CC24h.1_countCC
CC24h.2_countCC
CC24h.3_countCC
[48]:
ddsFullCountTable <- DESeqDataSetFromMatrix(countData = C24_counts, colData = design , design = ~condition)
dds = DESeq(ddsFullCountTable)
res = results(dds , contrast = c("condition", "CC", "C"))

converting counts to integer mode

Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

[49]:
res
log2 fold change (MLE): condition CC vs C
Wald test p-value: condition CC vs C
DataFrame with 20527 rows and 6 columns
                   baseMean log2FoldChange     lfcSE      stat       pvalue
                  <numeric>      <numeric> <numeric> <numeric>    <numeric>
Solyc04g071083.1   7724.324        8.39113 0.2208068   38.0021  0.00000e+00
Solyc08g068730.1   1271.626        6.63353 0.2631825   25.2050 3.52647e-140
Solyc04g064760.3   1086.379        3.03536 0.1185985   25.5936 1.79831e-144
Solyc10g084240.3    841.275        4.35920 0.1849628   23.5680 8.21383e-123
Solyc05g008895.1   5964.746        2.54905 0.0926708   27.5065 1.46732e-166
...                     ...            ...       ...       ...          ...
Solyc11g044910.2 15952.8277        3.35938  0.511573   6.56676  5.14207e-11
Solyc12g013700.2  1929.0393        0.37018  0.210633   1.75746  7.88392e-02
Solyc12g098603.1    31.6306       -1.66827  0.469552  -3.55290  3.81015e-04
novel.1271          49.9403        1.93188  0.456657   4.23048  2.33195e-05
novel.6321          16.9013        1.04730  0.680772   1.53841  1.23949e-01
                         padj
                    <numeric>
Solyc04g071083.1  0.00000e+00
Solyc08g068730.1 3.29036e-137
Solyc04g064760.3 1.84570e-141
Solyc10g084240.3 6.02162e-120
Solyc05g008895.1 2.50997e-163
...                       ...
Solyc11g044910.2  5.34166e-10
Solyc12g013700.2  1.45311e-01
Solyc12g098603.1  1.40415e-03
novel.1271        1.11114e-04
novel.6321        2.10517e-01
[24]:
colnames(C24)
  1. 'ID'
  2. 'C24h.1_fpkm'
  3. 'C24h.2_fpkm'
  4. 'C24h.3_fpkm'
  5. 'CC24h.1_fpkm'
  6. 'CC24h.2_fpkm'
  7. 'CC24h.3_fpkm'
  8. 'C24h.1_count'
  9. 'C24h.2_count'
  10. 'C24h.3_count'
  11. 'CC24h.1_count'
  12. 'CC24h.2_count'
  13. 'CC24h.3_count'
  14. 'log2FoldChange'
  15. 'pvalue'
  16. 'padj'
  17. 'regulated'
  18. 'chr'
  19. 'start'
  20. 'end'
  21. 'strand'
  22. 'gene_name'
  23. 'gene_description'
  24. 'KEGG'
  25. 'KEGG.pathway'
  26. 'NR'
  27. 'Swissprot'
  28. 'Tremble'
  29. 'KOG'
  30. 'GO'
  31. 'Pfam'
  32. 'TF.Family'
  33. 'TF.Category'
  34. 'TF.Classification'
  35. 'abslog2FC'
[50]:
C24$log2FoldChange <- res$log2FoldChange
C24$pvalue <- res$pvalue
C24$padj <- res$padj
C24$abslog2FC <- abs(res$log2FoldChange)

[51]:
C24$regulated <- "false"
C24[subset(C24, log2FoldChange>2 & padj<0.01)$ID, "regulated"] = "up"
C24[subset(C24, log2FoldChange>-2 & padj<0.01)$ID, "regulated"] = "down"
[52]:
write.table(C24, "newdata/C24_rmC24-3_new_DE.xls", sep="\t", quote=FALSE)

C84#

[53]:
C84 <- read_data2("~/Documents/phd/tomato_metabolic/data/番茄转录组数据分析/CC:C/全部基因/C84h_vs_CC84h.all.annot.xlsx")
[1] 20690    34
[1] 20690    35
[54]:
C84_counts <- subset(C84, select=c("C84h.1_count", "C84h.3_count", "CC84h.1_count", "CC84h.2_count", "CC84h.3_count"))
[55]:
design = data.frame(c("C", "C", "CC", "CC", "CC"))
colnames(design) <- c("condition")
rownames(design) <- c("C84h.1_count", "C84h.3_count", "CC84h.1_count", "CC84h.2_count", "CC84h.3_count")
design
A data.frame: 5 × 1
condition
<chr>
C84h.1_countC
C84h.3_countC
CC84h.1_countCC
CC84h.2_countCC
CC84h.3_countCC
[56]:
ddsFullCountTable <- DESeqDataSetFromMatrix(countData = C84_counts, colData = design , design = ~condition)
dds = DESeq(ddsFullCountTable)
res = results(dds , contrast = c("condition", "CC", "C"))

converting counts to integer mode

Warning message in DESeqDataSet(se, design = design, ignoreRank):
“some variables in design formula are characters, converting to factors”
estimating size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

[57]:
C84$log2FoldChange <- res$log2FoldChange
C84$pvalue <- res$pvalue
C84$padj <- res$padj
C84$abslog2FC <- abs(res$log2FoldChange)

C84$regulated <- "false"
C84[subset(C84, log2FoldChange>2 & padj<0.01)$ID, "regulated"] = "up"
C84[subset(C84, log2FoldChange>-2 & padj<0.01)$ID, "regulated"] = "down"
[58]:
write.table(C84, "newdata/C84_rmC84-2_new_DE.xls", sep="\t", quote=FALSE)

CD 96-2#

[ ]:

[ ]:

[ ]:

[ ]:

[ ]:

[ ]: