Interesting problem: automatic phenotype group detection

Pavlos brought up an interesting question on the user forum.

The problem as I see it is: given a list of phenotypes Y, where each phenotype has its own number N of samples with a defined (non-missing) value, compute a list of lists of phenotypes (the groups) such that G, the number of samples included in a group, is not smaller than N by more than some small error term E for any phenotype in that group:

G/N > 1 - E

We want to minimize the number of groups, in order to take best advantage of the BLAS3 optimizations in the linear regression routines.
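
To make the criterion concrete, here is a minimal sketch of one possible approach: a greedy first-fit heuristic in Python. It is not taken from the forum thread and is not guaranteed to find the minimum number of groups. It assumes that G is the number of samples for which every phenotype in the group is defined (the intersection of the per-phenotype sample sets). The function name group_phenotypes, the dict-of-sets input, and the example phenotype names are all hypothetical.

```python
from typing import Dict, List, Set


def group_phenotypes(observed: Dict[str, Set[int]], e: float) -> List[List[str]]:
    """Greedy first-fit grouping (a heuristic, not an exact minimizer).

    observed maps a phenotype name to the set of sample IDs for which that
    phenotype is defined. e is the error term E from the text.
    Assumption: G is the number of samples shared by every phenotype
    in a group.
    """
    groups = []  # each entry: {"names": [...], "shared": set, "largest_n": int}

    # Visit phenotypes from most to fewest defined samples (ties broken by
    # name) so the result is deterministic.
    order = sorted(observed, key=lambda name: (-len(observed[name]), name))

    for name in order:
        samples = observed[name]
        for group in groups:
            shared = group["shared"] & samples
            largest_n = max(group["largest_n"], len(samples))
            # G/N > 1 - E must hold for every member; the member with the
            # largest N is the binding case. Written as a product so that a
            # phenotype with N = 0 never triggers a division by zero.
            if len(shared) > (1 - e) * largest_n:
                group["names"].append(name)
                group["shared"] = shared
                group["largest_n"] = largest_n
                break
        else:
            # No existing group can absorb this phenotype: open a new one.
            groups.append(
                {"names": [name], "shared": set(samples), "largest_n": len(samples)}
            )

    return [group["names"] for group in groups]


# Hypothetical example: three phenotypes measured on nearly the same samples
# and one measured on far fewer.
example = {
    "height": set(range(0, 100)),
    "weight": set(range(2, 100)),
    "bmi": set(range(2, 100)),
    "rare_trait": set(range(0, 40)),
}
print(group_phenotypes(example, e=0.05))
```

Because groups only ever shrink their shared sample set as phenotypes are added, checking the phenotype with the largest N is enough to check the whole group. An exact minimization would need a search over partitions; first-fit is only a cheap starting point.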