Transferring Simulation to Real Data

A primary factor in the success of machine learning is the quality of labeled training data. However, in many fields, labeled data can be costly, difficult, or even impossible to acquire. In comparison, computer simulation data can now be generated in far greater quantity and at much lower cost. Such simulation data could potentially alleviate the data shortage in many machine learning tasks.

We are interested in developing machine learning and deep learning techniques that leverage the knowledge in simulation data and transfer it to tasks based on real data. In this process, we address the discrepancy between the two data domains, which arises from the model assumptions, simplifications, and possible errors in the simulation. We also attempt to distill the knowledge gained from simulation data so that it generalizes across the range of possible parameter settings of the simulation model. We are investigating this concept in a variety of clinical applications.
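
As a concrete illustration of what reducing the discrepancy between simulated and real data can look like, the sketch below applies a simple correlation-alignment style transform: it recolors simulated feature vectors so that their mean and covariance match those of real feature vectors. This is one generic, well-known option chosen for illustration only and is not a description of the methods developed here. The function names (`coral_align`, `_sym_matrix_power`), the regularizer `eps`, and the synthetic arrays are all assumptions made for the example.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    # Power of a symmetric positive-definite matrix via its eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * vals**power) @ vecs.T


def coral_align(sim, real, eps=1e-6):
    """Recolor simulated features so their mean and covariance match the real ones.

    sim, real: arrays of shape (n_samples, n_features); eps is a small
    regularizer (an assumed value) that keeps the covariances invertible.
    """
    d = sim.shape[1]
    cov_sim = np.cov(sim, rowvar=False) + eps * np.eye(d)
    cov_real = np.cov(real, rowvar=False) + eps * np.eye(d)
    # Whiten the centered simulated features, then apply the real-data covariance.
    whitened = (sim - sim.mean(axis=0)) @ _sym_matrix_power(cov_sim, -0.5)
    return whitened @ _sym_matrix_power(cov_real, 0.5) + real.mean(axis=0)


# Synthetic stand-ins for "simulated" and "real" features (illustrative only).
rng = np.random.default_rng(0)
sim = rng.normal(size=(500, 3))
mixing = np.array([[2.0, 0.3, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.7]])
real = rng.normal(size=(400, 3)) @ mixing + np.array([1.0, -2.0, 0.5])

aligned = coral_align(sim, real)
```

A model trained on `aligned` rather than `sim` sees inputs whose first- and second-order statistics resemble the real data. Matching these statistics does not address discrepancies in higher-order structure or in the labels, which is why richer transfer techniques are of interest.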