mold {ggbio} | R Documentation |
S4 methods to transform an object into a data.frame.
## S4 method for signature 'eSet'
mold(data)
## S4 method for signature 'GRanges'
mold(data)
## S4 method for signature 'IRanges'
mold(data)
## S4 method for signature 'GRangesList'
mold(data, indName = "grl_name")
## S4 method for signature 'Seqinfo'
mold(data)
## S4 method for signature 'matrix'
mold(data)
## S4 method for signature 'ExpressionSet'
mold(data)
## S4 method for signature 'SummarizedExperiment'
mold(data, assay.id = 1)
## S4 method for signature 'Views'
mold(data)
## S4 method for signature 'Rle'
mold(data)
## S4 method for signature 'RleList'
mold(data)
data |
original data object. |
indName |
character. When molding a GRangesList, the name of the column that indicates which list element each row came from. |
assay.id |
an integer indicating which assay to mold into the data.frame. |
For different objects, we try to maximize the information kept when molding to a data.frame. In most cases, the result differs from simply calling as.data.frame.
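For range-like objects, one piece of extra information is a 'midpoint' column computed as (start+end)/2. The following is a minimal base-R sketch of that calculation only. The helper `add_midpoint` and the `ranges_df` data.frame are hypothetical names used for illustration and are not part of ggbio; the plain data.frame stands in for a ranges object that has already been coerced.

```r
# Hypothetical helper (not part of ggbio): append a 'midpoint' column
# computed as (start + end) / 2.
add_midpoint <- function(df) {
  stopifnot(all(c("start", "end") %in% names(df)))
  df$midpoint <- (df$start + df$end) / 2
  df
}

# A plain data.frame standing in for a coerced ranges object (assumption).
ranges_df <- data.frame(start = c(160, 287, 12),
                        end   = c(234, 358, 81))
# Width of a closed integer interval [start, end].
ranges_df$width <- ranges_df$end - ranges_df$start + 1

molded <- add_midpoint(ranges_df)
print(molded)
```

The original columns are kept unchanged and the midpoint is added alongside them, which is convenient when one position per range is needed, for example as a plotting coordinate.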
GRanges
Returns a data.frame with an extra 'midpoint' column, computed as (start+end)/2.
GRangesList
Returns a data.frame with an extra 'midpoint' column, computed as (start+end)/2, and a column named by indName that indicates which list element each row originally came from.
IRanges
Returns a data.frame with an extra 'midpoint' column, computed as (start+end)/2.
Seqinfo
Returns a data.frame with the columns seqnames, start, end, width, strand, midpoint, seqlengths, isCircular, and genome.
matrix
Returns a data.frame with 'x', 'y', 'value', 'row', and 'col' columns. If colnames or rownames exist, a corresponding 'colnames' or 'rownames' column is added to the data.frame. Note that 'x' and 'y' are numeric coordinates in the matrix, while 'col' and 'row' hold the same values as factors.
Views
Returns a data.frame with 'x', 'y', 'value', 'start', 'end', 'width', 'midpoint', and 'group' columns. If colnames or rownames exist, a corresponding 'colnames' or 'rownames' column is added to the data.frame. This is achieved by first coercing the object to a matrix. An additional 'row' variable is added to indicate the group, but it is equal to 'y'.
ExpressionSet
Extracts the matrix with exprs and then molds it to a data.frame with 'x', 'y', 'value', 'col', and 'row' columns, plus 'colnames' for samples and 'rownames' for features. 'x' and 'y' are numeric coordinates in the matrix, while 'col' and 'row' hold the same values as factors. The pheno data is also integrated into the result.
Rle
Coerces to a data.frame with the columns 'x', 'y', 'col', 'row', and 'value'. 'x' and 'y' are numeric coordinates in the matrix, while 'col' and 'row' hold the same values as factors.
RleList
Coerces to a data.frame with the columns 'x', 'y', 'col', 'row', 'value', and 'group', where 'group' indicates the original list entry (its number, or its name if the list is named). 'x' and 'y' are numeric coordinates in the matrix, while 'col' and 'row' hold the same values as factors.
SummarizedExperiment
Extracts the assay matrix selected by assay.id and then molds it to a data.frame with 'x', 'y', 'value', 'col', and 'row' columns, plus 'colnames' for samples and 'rownames' for features. 'x' and 'y' are numeric coordinates in the matrix, while 'col' and 'row' hold the same values as factors. The colData and rowData are also integrated into the result.
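The matrix-based methods above all end in the same long layout, with one row per matrix cell. The following base-R sketch reproduces that layout under stated assumptions and is not ggbio's implementation. The helper name `mold_matrix_sketch` is hypothetical, and the cell ordering (all columns of the first matrix row, then the second row, and so on) is inferred from the printed example output on this page.

```r
# Hypothetical helper (not ggbio's code): reshape a matrix into a long
# data.frame with one row per cell.
mold_matrix_sketch <- function(m) {
  stopifnot(is.matrix(m))
  # x is the column index and y is the row index of each cell,
  # walking across the columns of one matrix row at a time.
  x <- rep(seq_len(ncol(m)), times = nrow(m))
  y <- rep(seq_len(nrow(m)), each = ncol(m))
  df <- data.frame(x = x, y = y,
                   value = as.vector(t(m)),  # t() yields row-by-row order
                   row = factor(y), col = factor(x))
  # Name columns are added only when the matrix carries dimnames.
  if (!is.null(colnames(m))) df$colnames <- colnames(m)[x]
  if (!is.null(rownames(m))) df$rownames <- rownames(m)[y]
  df
}

mx <- matrix(1:12, nrow = 3)
colnames(mx) <- letters[1:ncol(mx)]
long <- mold_matrix_sketch(mx)
head(long)
```

Keeping both the numeric 'x'/'y' and the factor 'col'/'row' lets the same data.frame serve continuous and discrete plotting scales without further conversion.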
a data.frame object.
Tengfei Yin
set.seed(1)
N <- 1000
library(GenomicRanges)
## GRanges
gr <- GRanges(seqnames =
                sample(c("chr1", "chr2", "chr3"),
                       size = N, replace = TRUE),
              IRanges(
                start = sample(1:300, size = N, replace = TRUE),
                width = sample(70:75, size = N, replace = TRUE)),
              strand = sample(c("+", "-", "*"), size = N,
                              replace = TRUE),
              value = rnorm(N, 10, 3), score = rnorm(N, 100, 30),
              sample = sample(c("Normal", "Tumor"),
                              size = N, replace = TRUE),
              pair = sample(letters, size = N,
                            replace = TRUE))
## GRangesList
grl <- split(gr, values(gr)$pair)
head(mold(grl))
## seqnames start end width strand grl_name value score sample pair
## 1 chr2 252 324 73 * a 3.631 156.58 Normal a
## 2 chr2 14 83 70 - a 14.871 68.92 Tumor a
## 3 chr1 272 342 71 + a 9.060 86.62 Tumor a
## 4 chr2 25 95 71 + a 10.725 117.33 Normal a
## 5 chr3 35 105 71 - a 13.912 113.62 Normal a
## 6 chr2 160 230 71 + a 7.125 105.82 Tumor a
## midpoint
## 1 288.0
## 2 48.5
## 3 307.0
## 4 60.0
## 5 70.0
## 6 195.0
head(mold(grl, indName = "group_sample"))
## seqnames start end width strand group_sample value score sample pair
## 1 chr2 252 324 73 * a 3.631 156.58 Normal a
## 2 chr2 14 83 70 - a 14.871 68.92 Tumor a
## 3 chr1 272 342 71 + a 9.060 86.62 Tumor a
## 4 chr2 25 95 71 + a 10.725 117.33 Normal a
## 5 chr3 35 105 71 - a 13.912 113.62 Normal a
## 6 chr2 160 230 71 + a 7.125 105.82 Tumor a
## midpoint
## 1 288.0
## 2 48.5
## 3 307.0
## 4 60.0
## 5 70.0
## 6 195.0
## IRanges
ir <- ranges(gr)
head(mold(ir))
## start end width midpoint
## 1 160 234 75 197.0
## 2 206 280 75 243.0
## 3 115 189 75 152.0
## 4 287 358 72 322.5
## 5 36 106 71 71.0
## 6 12 81 70 46.5
## Seqinfo
seqlengths(gr) <- c(400, 500, 420)
head(mold(seqinfo(gr)))
## seqnames start end width strand midpoint seqlengths isCircular genome
## chr1 chr1 1 400 400 * 200.5 400 NA <NA>
## chr2 chr2 1 500 500 * 250.5 500 NA <NA>
## chr3 chr3 1 420 420 * 210.5 420 NA <NA>
## matrix
mx <- matrix(1:12, nrow = 3)
head(mold(mx))
## x y value row col
## 1 1 1 1 1 1
## 2 2 1 4 1 2
## 3 3 1 7 1 3
## 4 4 1 10 1 4
## 5 1 2 2 2 1
## 6 2 2 5 2 2
colnames(mx)
## NULL
colnames(mx) <- letters[1:ncol(mx)]
mx
## a b c d
## [1,] 1 4 7 10
## [2,] 2 5 8 11
## [3,] 3 6 9 12
head(mold(mx))
## x y value row col colnames
## 1 1 1 1 1 1 a
## 2 2 1 4 1 2 b
## 3 3 1 7 1 3 c
## 4 4 1 10 1 4 d
## 5 1 2 2 2 1 a
## 6 2 2 5 2 2 b
rownames(mx)
## NULL
rownames(mx) <- LETTERS[1:nrow(mx)]
head(mold(mx))
## x y value row col colnames rownames
## 1 1 1 1 1 1 a A
## 2 2 1 4 1 2 b A
## 3 3 1 7 1 3 c A
## 4 4 1 10 1 4 d A
## 5 1 2 2 2 1 a B
## 6 2 2 5 2 2 b B
## ExpressionSet
library(Biobase)
data(sample.ExpressionSet)
sample.ExpressionSet
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 500 features, 26 samples
## element names: exprs, se.exprs
## protocolData: none
## phenoData
## sampleNames: A B ... Z (26 total)
## varLabels: sex type score
## varMetadata: labelDescription
## featureData: none
## experimentData: use 'experimentData(object)'
## Annotation: hgu95av2
set.seed(1)
## select 50 features
idx <- sample(seq_len(dim(sample.ExpressionSet)[1]), size = 50)
eset <- sample.ExpressionSet[idx,]
head(mold(eset))
## x y value row col colnames rownames sex type score
## A 1 1 177.1 1 1 A 31372_at Female Control 0.75
## B 2 1 171.2 1 2 B 31372_at Male Case 0.40
## C 3 1 277.2 1 3 C 31372_at Male Control 0.73
## D 4 1 190.2 1 4 D 31372_at Male Case 0.42
## E 5 1 138.4 1 5 E 31372_at Female Case 0.93
## F 6 1 174.0 1 6 F 31372_at Male Control 0.22
## Rle
library(IRanges)
lambda <- c(rep(0.001, 4500), seq(0.001, 10, length = 500),
            seq(10, 0.001, length = 500))
xVector <- rpois(1e4, lambda)
xRle <- Rle(xVector)
head(mold(xRle))
## value x y row col
## 1 0 1 1 1 1
## 2 0 2 1 1 2
## 3 0 3 1 1 3
## 4 0 4 1 1 4
## 5 0 5 1 1 5
## 6 0 6 1 1 6
## RleList
xRleList <- RleList(xRle, 2L * xRle)
xRleList
## SimpleRleList of length 2
## [[1]]
## numeric-Rle of length 10000 with 829 runs
## Lengths: 729 1 208 1 1599 1 ... 1 1 1 5 1 4512
## Values : 0 1 0 1 0 1 ... 1 0 1 0 1 0
##
## [[2]]
## numeric-Rle of length 10000 with 829 runs
## Lengths: 729 1 208 1 1599 1 ... 1 1 1 5 1 4512
## Values : 0 2 0 2 0 2 ... 2 0 2 0 2 0
head(mold(xRleList))
## value x y row col group
## 1 0 1 1 1 1 1
## 2 0 2 1 1 2 1
## 3 0 3 1 1 3 1
## 4 0 4 1 1 4 1
## 5 0 5 1 1 5 1
## 6 0 6 1 1 6 1
names(xRleList) <- c("a" ,"b")
xRleList
## SimpleRleList of length 2
## $a
## numeric-Rle of length 10000 with 829 runs
## Lengths: 729 1 208 1 1599 1 ... 1 1 1 5 1 4512
## Values : 0 1 0 1 0 1 ... 1 0 1 0 1 0
##
## $b
## numeric-Rle of length 10000 with 829 runs
## Lengths: 729 1 208 1 1599 1 ... 1 1 1 5 1 4512
## Values : 0 2 0 2 0 2 ... 2 0 2 0 2 0
head(mold(xRleList))
## value x y row col group
## 1 0 1 1 1 1 a
## 2 0 2 1 1 2 a
## 3 0 3 1 1 3 a
## 4 0 4 1 1 4 a
## 5 0 5 1 1 5 a
## 6 0 6 1 1 6 a
## SummarizedExperiment
library(GenomicRanges)
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
counts2 <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowData <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                   IRanges(floor(runif(200, 1e5, 1e6)), width = 100),
                   strand = sample(c("+", "-"), 200, TRUE))
colData <- DataFrame(Treatment = rep(c("ChIP", "Input"), 3),
                     row.names = LETTERS[1:6])
sset <- SummarizedExperiment(assays = SimpleList(counts = counts,
                                                 counts2 = counts2),
                             rowData = rowData, colData = colData)
head(mold(sset))
## x y value row col colnames cd.e seqnames start end width strand
## 1 1 1 1627 1 1 A ChIP chr1 527305 527404 100 -
## 1.1 2 1 8700 1 2 B Input chr1 527305 527404 100 -
## 1.2 3 1 8447 1 3 C ChIP chr1 527305 527404 100 -
## 1.3 4 1 4958 1 4 D Input chr1 527305 527404 100 -
## 1.4 5 1 8776 1 5 E ChIP chr1 527305 527404 100 -
## 1.5 6 1 9604 1 6 F Input chr1 527305 527404 100 -
## VCF
library(VariantAnnotation)
vcffile <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(vcffile, "hg19")