[Bioc-devel] a pattern to be avoided? mcols(x)$y <- z
Vincent Carey
2018-10-03 14:01:40 UTC
The following comes up in use of Fdb.InfiniumMethylation.hg19::getPlatform


debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))

Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.020   0.003   0.022

Browse[3]> system.time(mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
 47.263   0.067  47.373

Browse[3]> GR$channel[1]
factor-Rle of length 1 with 1 run
  Lengths:    1
  Values : Both
Levels(3): Both Grn Red

Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
   user  system elapsed
  0.058   0.006   0.065


Presumably the mcols()$<- copies/rewrites a lot of data needlessly?
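
For readers who want to try the two assignment forms side by side, here is a minimal sketch on a toy GRanges. The object `gr`, its size `n`, and the random channel labels are made up for illustration; the GR in the session above comes from Fdb.InfiniumMethylation.hg19::getPlatform and is much larger. Timings will depend on the installed package versions, so none are shown.

```r
library(GenomicRanges)

# Hypothetical toy data standing in for the real GR.
set.seed(1)
n <- 1000L
gr <- GRanges("chr1", IRanges(start = seq_len(n), width = 1L))
gr$channel450 <- sample(c("Both", "Grn", "Red"), n, replace = TRUE)

z <- Rle(as.factor(gr$channel450))

# Form 1: nested replacement. R evaluates this roughly as
#   gr <- `mcols<-`(gr, value = `$<-`(mcols(gr), "channel", z))
# so `$<-` runs on the extracted metadata columns, then `mcols<-` runs on gr.
gr_a <- gr
t_nested <- system.time(mcols(gr_a)$channel <- z)

# Form 2: `$<-` applied to the GRanges object itself.
gr_b <- gr
t_direct <- system.time(gr_b$channel <- z)

# Both forms should end up with the same metadata column names.
stopifnot(identical(names(mcols(gr_a)), names(mcols(gr_b))))

print(rbind(nested = t_nested, direct = t_direct)[, "elapsed"])
```

On the versions used in the session above, the second form was the fast one.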
Hervé Pagès
2018-10-03 15:20:46 UTC
Hi Vince,

This issue was reported here a couple of weeks ago:

https://github.com/Bioconductor/GenomicRanges/issues/11

Internally $<- uses something like:

do.call(DataFrame, list(DF1, DF2))

to combine the metadata columns. However in some situations
the do.call(DataFrame, list(...)) form is **very** inefficient
compared to the more direct DataFrame(...) form:

  library(S4Vectors)
  DF1 <- DataFrame(a=Rle(11:1999, 1011:2999), b=5)
  DF2 <- DataFrame(c=Rle(12:2000, 1011:2999))
  system.time(DF12 <- do.call(DataFrame, list(DF1, DF2)))
  #   user  system elapsed
  #  4.476   0.000   4.476
  system.time(DF12b <- DataFrame(DF1, DF2))
  #   user  system elapsed
  #  0.002   0.000   0.001
  identical(DF12, DF12b)
  # [1] TRUE
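
The thread does not establish why the two forms differ. One general base R behaviour that can make do.call() expensive is sketched below, offered as an assumption and not as a diagnosis of DataFrame(): do.call() places the evaluated arguments directly into the call, so any code that captures and deparses its arguments sees the full object instead of a short symbol. The helper `arg_text` is hypothetical and is not part of S4Vectors.

```r
# Hypothetical helper: captures its arguments unevaluated and deparses them,
# as code that guesses column names from argument expressions might do.
arg_text <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]
  vapply(exprs, function(e) paste(deparse(e), collapse = ""), character(1))
}

x <- runif(1e5)

# Direct call: the captured expression is just the symbol `x`.
direct <- arg_text(x)

# do.call() with an evaluated argument: the captured "expression" is the
# whole numeric vector, so deparsing has to print every value.
via_call <- do.call(arg_text, list(x))

# do.call() with a quoted symbol: the captured expression is `x` again.
quoted <- do.call(arg_text, list(quote(x)))

print(c(direct = nchar(direct),
        via_call = nchar(via_call),
        quoted = nchar(quoted)))
```

If something along these lines is involved, the cost would grow with the size of the objects passed through do.call(), which would fit the long Rle columns in the example above, but that link is speculation.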

@Michael: Any idea what's going on?

Thanks,
H.
--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: ***@fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Pages, Herve
2018-11-07 00:44:10 UTC
Hi Vince,

It looks like Michael took care of this in devel (thanks Michael):

  https://github.com/Bioconductor/GenomicRanges/issues/11

H.