Looks great. I have one minor question.

> There is a slight tradeoff here, I think, with the old version of mlprof;
> because we default to ChunkPerFunc for the native backend, this coarsest
> grain profiling can correspond to very large chunks of code.

I don't see why ChunkPerFunc is relevant. That doesn't affect the CPS functions at all.