This project concerns multi-nucleotide variants (MNVs): for each sample, it collects data from nearby variants to look for alternate alleles in cis (on the same haplotype).
Grouping the window
Proposal 1: a window of a fixed number of rows
def collect_window(mt: MatrixTable, window_size: int) -> MatrixTable
would add an extra row field prev_rows of type array<mt.row.dtype>, and an extra entry field prev_entries of type array<mt.entry.dtype>.
mt.prev_rows[0] is the previous row, and mt.prev_rows[-1] is the first row in the window.
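To make the indexing convention concrete, here is a minimal pure-Python sketch of the row-count window. It is not Hail code: rows are modelled as plain dicts with a hypothetical "entries" list (one value per sample), and the helper name collect_window_rows is invented for illustration.

```python
from typing import Any, Dict, List


def collect_window_rows(rows: List[Dict[str, Any]], window_size: int) -> List[Dict[str, Any]]:
    """Attach prev_rows / prev_entries built from up to `window_size` preceding rows.

    Assumption: each row is a dict with an "entries" list holding one value per sample.
    """
    out = []
    for i, row in enumerate(rows):
        # Reverse the slice so index 0 is the immediately preceding row and
        # index -1 is the earliest row still inside the window.
        prev = rows[max(0, i - window_size):i][::-1]
        n_samples = len(row["entries"])
        out.append({
            **row,
            # Row-level fields of the preceding rows (entries stripped out).
            "prev_rows": [{k: v for k, v in r.items() if k != "entries"} for r in prev],
            # For each sample, that sample's entries in the preceding rows.
            "prev_entries": [[r["entries"][s] for r in prev] for s in range(n_samples)],
        })
    return out


rows = [
    {"pos": 100, "entries": ["0|1", "0|0"]},
    {"pos": 101, "entries": ["0|1", "1|1"]},
    {"pos": 105, "entries": ["0|0", "0|1"]},
]
windowed = collect_window_rows(rows, window_size=2)
```

For the last row, prev_rows[0] is the row at position 101 and prev_rows[-1] is the row at position 100, matching the convention described above.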
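Proposal 2: a window of a fixed number of base pairs
def collect_window(mt: MatrixTable, bp_window_size: int) -> MatrixTable
The result has the same semantics, except that the window is bounded by genomic distance (bp_window_size base pairs) rather than by row count. To keep partitioning manageable, we can cap the number of previous rows considered at 3 * bp_window_size.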
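The base-pair window can be sketched the same way in plain Python. This is an illustration under stated assumptions, not the Hail implementation: rows are dicts with hypothetical "contig" and "pos" keys, already sorted by locus; a previous row counts as inside the window when it is on the same contig and at most bp_window_size base pairs behind (the inclusive boundary is an assumption); and the 3 * bp_window_size cap is applied as a hard limit on the number of rows collected.

```python
from typing import Any, Dict, List


def collect_window_bp(rows: List[Dict[str, Any]], bp_window_size: int) -> List[Dict[str, Any]]:
    """Attach prev_rows: preceding rows on the same contig within bp_window_size bp.

    Assumptions: rows are sorted by (contig, pos); the distance bound is inclusive.
    """
    max_rows = 3 * bp_window_size  # cap on previous rows, as proposed in the text
    out = []
    for i, row in enumerate(rows):
        prev = []
        j = i - 1
        while j >= 0 and len(prev) < max_rows:
            cand = rows[j]
            too_far = row["pos"] - cand["pos"] > bp_window_size
            if cand["contig"] != row["contig"] or too_far:
                break  # rows are sorted, so nothing earlier can qualify
            prev.append(cand)  # nearest row first, earliest row last
            j -= 1
        out.append({**row, "prev_rows": prev})
    return out


loci = [("1", 100), ("1", 101), ("1", 105), ("1", 230), ("2", 231)]
rows = [{"contig": c, "pos": p} for c, p in loci]
windowed = collect_window_bp(rows, bp_window_size=5)
prev_positions = [[r["pos"] for r in w["prev_rows"]] for w in windowed]
```

Unlike the row-count version, the number of previous rows now varies with local variant density, which is why a cap is useful when deciding how much of the preceding partition a row may need.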
Proposal 2 (the base-pair window) is preferred.
Within the grouped window
Find pairs of nearby genotypes in the same sample that are phased heterozygotes in cis, or hom-var.
The result is a set of (variant1, variant2, sample) triples, one for each such pair of phased genotypes.
We care only about phased non-ref calls, so filtering entries down to those calls first drastically reduces the memory requirements.
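The filter-then-pair idea can be sketched in plain Python as follows. Everything here is illustrative rather than Hail API: the Call tuple, the "calls" dict keyed by sample, and the function names are hypothetical. The sketch also assumes biallelic rows (alleles 0 or 1) and treats a hom-var call as carrying the alternate allele on both haplotypes regardless of its phased flag, which is one possible reading of the requirement.

```python
from typing import Any, Dict, FrozenSet, List, NamedTuple, Optional, Tuple


class Call(NamedTuple):
    a0: int        # allele on haplotype 0 (0 = ref, 1 = alt)
    a1: int        # allele on haplotype 1
    phased: bool


def alt_haplotypes(call: Optional[Call]) -> FrozenSet[int]:
    """Haplotypes known to carry the alt allele; empty if the call is not usable."""
    if call is None:
        return frozenset()
    if call.a0 == 1 and call.a1 == 1:          # hom-var: alt on both haplotypes
        return frozenset({0, 1})
    if call.phased and (call.a0 == 1 or call.a1 == 1):   # phased het
        return frozenset({0}) if call.a0 == 1 else frozenset({1})
    return frozenset()                          # hom-ref or unphased het


Locus = Tuple[str, int]


def cis_pairs(rows: List[Dict[str, Any]], bp_window_size: int) -> List[Tuple[Locus, Locus, str]]:
    # Step 1: filter entries first, keeping only samples with a usable non-ref call.
    sparse = []
    for row in rows:
        kept = {s: alt_haplotypes(c) for s, c in row["calls"].items()}
        kept = {s: h for s, h in kept.items() if h}
        sparse.append((row["contig"], row["pos"], kept))
    # Step 2: compare each row with the preceding rows inside the window.
    pairs = []
    for i, (contig, pos, kept) in enumerate(sparse):
        for j in range(i - 1, -1, -1):
            p_contig, p_pos, p_kept = sparse[j]
            if p_contig != contig or pos - p_pos > bp_window_size:
                break
            for sample in sorted(kept.keys() & p_kept.keys()):
                # Cis: both variants have alt on at least one shared haplotype.
                if kept[sample] & p_kept[sample]:
                    pairs.append(((p_contig, p_pos), (contig, pos), sample))
    return pairs


rows = [
    {"contig": "1", "pos": 100, "calls": {
        "s1": Call(0, 1, True), "s2": Call(1, 0, True),
        "s3": Call(1, 1, False), "s4": Call(0, 1, False)}},
    {"contig": "1", "pos": 101, "calls": {
        "s1": Call(0, 1, True), "s2": Call(0, 1, True),
        "s3": Call(0, 1, True), "s4": Call(0, 1, True)}},
    {"contig": "1", "pos": 200, "calls": {"s1": Call(0, 1, True)}},
]
pairs = cis_pairs(rows, bp_window_size=2)
```

In this toy input, s1 is in cis (alt on haplotype 1 at both sites), s2 is in trans, s3 pairs a hom-var with a phased het, and s4 is dropped by the filter because its first call is an unphased het. Because step 1 discards most entries before any window is built, only the sparse per-row dicts need to be held for the preceding rows.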