GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu, Jianfeng Wang, Zhengyuan Yang, Zhe Gan, Zicheng Liu, Junsong Yuan, and Lijuan Wang
arXiv preprint, 2022
Our new work, GRiT, is a general, open-set object understanding framework that localizes objects and
describes each one in any style of free-form text it was trained on, e.g., class names or descriptive sentences (including object
attributes, actions, etc.). GRiT can be applied to various region- and object-level tasks, such as object detection and object captioning.