I'll start with an example, and then describe the logic I'm trying to use.
I have two normal IRanges objects that span the same total range, but may do so in a different number of ranges. Each IRanges has one mcol, but that mcol is different across IRanges.
a
#IRanges object with 1 range and 1 metadata column:
# start end width | on_betalac
# <integer> <integer> <integer> | <logical>
# [1] 1 167 167 | FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
# start end width | on_other
# <integer> <integer> <integer> | <logical>
# [1] 1 107 107 | FALSE
# [2] 108 112 5 | TRUE
# [3] 113 167 55 | FALSE
You can see both of these IRanges span 1 to 167, but a has one range and b has three. I would like to combine them to get output like this:
my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_betalac on_other
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | FALSE TRUE
# [3] 113 167 55 | FALSE FALSE
The output is like a disjoin of the inputs, but it keeps the mcols, and even spreads them so that each output range has the same mcol value as the input range that led to it.
Option 1: Using IRanges::findOverlaps
m <- findOverlaps(b, a)
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
#IRanges object with 3 ranges and 2 metadata columns:
# start end width | on_other on_betacalc
# <integer> <integer> <integer> | <logical> <logical>
# [1] 1 107 107 | FALSE FALSE
# [2] 108 112 5 | TRUE FALSE
# [3] 113 167 55 | FALSE FALSE
The resulting object c is an IRanges object with two metadata columns.
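Option 1 returns the ranges of b, which works here because each range of b falls inside a single range of a. If neither input is a refinement of the other, one possible generalization is to disjoin the combined coordinates first and then look up each input's metadata for every disjoint piece. This is a minimal sketch, assuming the inputs are normal and cover the same total range (as in the question); combine_by_disjoin is a hypothetical name, not an IRanges function.

```r
library(IRanges)

# Hypothetical helper: disjoin the inputs, then carry over each input's mcols.
# Assumes every disjoint piece is covered by every input (same total range).
combine_by_disjoin <- function(...) {
  inputs <- list(...)
  # Strip metadata so that only coordinates are combined
  bare <- lapply(inputs, function(x) IRanges(start(x), end(x)))
  pieces <- disjoin(do.call(c, bare))
  # For each input, find the range that contains each disjoint piece
  cols <- lapply(inputs, function(x) {
    idx <- findOverlaps(pieces, x, select = "first")
    mcols(x)[idx, , drop = FALSE]
  })
  mcols(pieces) <- do.call(cbind, cols)
  pieces
}

# Same sample data as in the "Sample data" section
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

res <- combine_by_disjoin(a, b)
res
```

Because the pieces come from disjoin rather than from b, the argument order only affects the order of the metadata columns, not the ranges.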
Option 2: Using IRanges::mergeByOverlaps
c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
# b on_other a on_betacalc
# <IRanges> <logical> <IRanges> <logical>
#1 1-107 FALSE 1-167 FALSE
#2 108-112 TRUE 1-167 FALSE
#3 113-167 FALSE 1-167 FALSE
The resulting object is a DataFrame with IRanges columns and the original metadata columns as additional columns.
Option 3: Using data.table::foverlaps
library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]
setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
#    start end on_other on_betacalc
# 1:     1 107    FALSE       FALSE
# 2:   108 112     TRUE       FALSE
# 3:   113 167    FALSE       FALSE
The resulting object is a data.table.
Option 4: Using fuzzyjoin::interval_left_join
library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
# start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1 1 107 107 FALSE 1 167 167 FALSE
#2 108 112 5 TRUE 1 167 167 FALSE
#3 113 167 55 FALSE 1 167 167 FALSE
The resulting object is a data.frame.
Sample data
library(IRanges)
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)
Here's what I've been able to come up with. It's not as elegant as MauritsEvers's answer, but it may be useful to others in some way.
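combine_exposures <- function(...) {
  # Concatenate the inputs; each input fills its own metadata column,
  # and the other columns are NA for its ranges.
  cd <- c(...)
  mc <- mcols(cd)
  # Split into disjoint pieces; revmap lists, for each piece, the indices
  # of the ranges in cd that overlap it.
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap
  d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
  names(d) <- names(mc)
  for (i in seq_along(dj)) {
    # Assumes the j-th overlapping range carries the value for column j,
    # i.e. one mcol per input and every input covers every piece.
    d[i, ] <- sapply(X = seq_len(ncol(mc)), FUN = function(j) mc[r[[i]][j], j])
  }
  mcols(dj) <- d
  return(dj)
}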
Here is dput(c(e1, e2, e3, e4)), where e1, e2, e3, and e4 are example IRanges that all span 1 to 167:
new("IRanges", start = c(1L, 1L, 108L, 113L, 1L, 1L), width = c(167L,
107L, 5L, 55L, 167L, 167L), NAMES = NULL, elementType = "ANY",
elementMetadata = new("DataFrame", rownames = NULL, nrows = 6L,
listData = list(on_betalac = c(FALSE, NA, NA, NA, NA,
NA), on_other = c(NA, FALSE, TRUE, FALSE, NA, NA), on_pen = c(NA,
NA, NA, NA, FALSE, NA), on_quin = c(NA, NA, NA, NA, NA,
FALSE)), elementType = "ANY", elementMetadata = NULL,
metadata = list()), metadata = list())
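To see what the revmap column used by combine_exposures contains, here is a small sketch on the coordinates of the two-object sample data (a followed by b, without metadata). The variable names are illustrative only.

```r
library(IRanges)

# Coordinates of a (one range) followed by b (three ranges), no metadata
cd <- IRanges(c(1, 1, 108, 113), c(167, 107, 112, 167))

dj <- disjoin(cd, with.revmap = TRUE)
revmap <- as.list(mcols(dj)$revmap)

# Each element holds the indices in cd of the ranges overlapping that piece
for (i in seq_along(dj)) {
  cat(start(dj)[i], "-", end(dj)[i], ": ",
      paste(revmap[[i]], collapse = ", "), "\n", sep = "")
}
```

Every piece maps back to range 1 (from a) plus exactly one range from b. That is why picking the j-th hit for the j-th column works when each input supplies one column and all inputs cover the same total range.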