makeSimilar: Idea for Reproducible Debugging

johnc1231 · February 26, 2021, 6:16pm

Just thought of this; recording it here to see if others have thoughts, and as a reminder to myself.

MatrixTable.makeSimilar(rowCount, colCount) and Table.makeSimilar(rowCount)

I want methods that can generate a Table or MatrixTable with the same schema as any given one, but filled with randomized data. I’m imagining this as a debugging tool: if a user encounters a bug, they could generate a table with the same structure and fields and write it to a file. They could then send us that file, and replicating their pipeline would be fairly easy without requiring them to share their data with us.

I’m not sure this would be worth the trouble, and it may not be straightforward to do in a way that preserves pipeline behavior (so the bug still reproduces) while leaking no information about the original tables beyond their structure. This could potentially be a co-op project if others think it would make their lives easier.
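A minimal sketch of the idea in plain Python, not Hail. It assumes a made-up schema encoding (type strings for primitives, dicts for structs, one-element lists for arrays), and `make_similar_table` / `make_similar_matrix` are hypothetical stand-ins for the proposed `Table.makeSimilar(rowCount)` and `MatrixTable.makeSimilar(rowCount, colCount)`. The generator reads only the schema, never the original values, and a fixed seed makes the fake data reproducible.

```python
import random
import string
from typing import Any, Dict

# Hypothetical schema encoding (NOT Hail's actual type system):
#   "int32" | "int64" | "float64" | "bool" | "str"  -> primitive field
#   {"name": type, ...}                              -> struct
#   ["elem_type"]  (a one-element list)              -> array of elem_type


def random_value(typ: Any, rng: random.Random) -> Any:
    """Draw one random value conforming to `typ`; no original data is read."""
    if isinstance(typ, dict):
        return {name: random_value(t, rng) for name, t in typ.items()}
    if isinstance(typ, list):
        (elem,) = typ  # arrays are encoded as a one-element list
        return [random_value(elem, rng) for _ in range(rng.randint(0, 5))]
    if typ == "int32":
        return rng.randint(-2**31, 2**31 - 1)
    if typ == "int64":
        return rng.randint(-2**63, 2**63 - 1)
    if typ == "float64":
        return rng.gauss(0.0, 1.0)
    if typ == "bool":
        return rng.random() < 0.5
    if typ == "str":
        return "".join(rng.choices(string.ascii_letters, k=rng.randint(1, 10)))
    raise ValueError(f"unsupported type: {typ!r}")


def make_similar_table(schema: Dict[str, Any], row_count: int, seed: int = 0):
    """Hypothetical analogue of the proposed Table.makeSimilar(rowCount)."""
    rng = random.Random(seed)
    return [random_value(schema, rng) for _ in range(row_count)]


def make_similar_matrix(row_schema, col_schema, entry_schema,
                        row_count: int, col_count: int, seed: int = 0):
    """Hypothetical analogue of MatrixTable.makeSimilar(rowCount, colCount).

    Returns (rows, cols, entries), where entries is a row_count x col_count grid.
    """
    rng = random.Random(seed)
    rows = [random_value(row_schema, rng) for _ in range(row_count)]
    cols = [random_value(col_schema, rng) for _ in range(col_count)]
    entries = [[random_value(entry_schema, rng) for _ in range(col_count)]
               for _ in range(row_count)]
    return rows, cols, entries


if __name__ == "__main__":
    # Example: a variant-like row schema with a nested struct and an array field.
    schema = {"locus": {"contig": "str", "position": "int32"},
              "qual": "float64",
              "filters": ["str"]}
    for row in make_similar_table(schema, row_count=3, seed=42):
        print(row)
```

Random values like these satisfy the types but not necessarily whatever the pipeline depends on (sorted or unique keys, realistic value ranges, overlapping keys between tables that get joined), which is the trade-off between preserving pipeline functionality and leaking data mentioned above.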

