makeSimilar: Idea for Reproducible Debugging

Just thought of this; recording it here to see if others have thoughts, and as a reminder to myself.

MatrixTable.makeSimilar(rowCount, colCount) and Table.makeSimilar(rowCount)

I want methods that can generate data with the same schema as any given Table or MatrixTable, but filled with randomized values. I’m imagining this as a debugging tool. If a user encounters a bug, they could generate a table with the same structure and fields and write it to a file. They could then send us that file, and we could replicate their pipeline fairly easily without them having to share their actual data.
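A minimal sketch of the core idea, written in plain Python rather than Hail: given only a schema (field name mapped to a type name), produce rows of random values with the same fields. The `make_similar` function, the type names, and the `GENERATORS` table are hypothetical illustrations, not Hail's API or type system; a real version would need to walk Hail's nested types (structs, arrays, loci, etc.) and handle a MatrixTable's row, column, and entry fields separately.

```python
import random
from typing import Any, Callable, Dict, List

# Hypothetical mapping from simple type names to random-value generators.
# These names are illustrative only; they are NOT Hail's type system.
GENERATORS: Dict[str, Callable[[random.Random], Any]] = {
    "int32": lambda rng: rng.randint(-2**31, 2**31 - 1),
    "float64": lambda rng: rng.random(),
    "bool": lambda rng: rng.random() < 0.5,
    "str": lambda rng: "".join(rng.choices("ACGT", k=8)),
}


def make_similar(schema: Dict[str, str], row_count: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Return row_count rows with the same field names and types as schema,
    filled with random values. Only the schema is read, never original data."""
    rng = random.Random(seed)  # seeded so the same table can be regenerated
    return [
        {name: GENERATORS[type_name](rng) for name, type_name in schema.items()}
        for _ in range(row_count)
    ]


# Usage with a made-up schema:
schema = {"pos": "int32", "qual": "float64", "pass_filter": "bool", "sample_id": "str"}
rows = make_similar(schema, row_count=5, seed=42)
```

Fixing the seed makes the output reproducible on both ends. Note that uniformly random values ignore the original value distributions, so a pipeline whose behavior depends on them (for example, filters or joins on key fields) may not hit the same code paths.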

I’m not sure this would be worth the trouble, and it may not be straightforward to do it in a way that preserves pipeline behavior while leaking no information about the original tables beyond their structure. This could potentially be a co-op project if others think it would make their lives easier.
