Abstract | Face reenactment is challenging due to the need to establish dense correspondence between different face representations for motion transfer. Recent studies have adopted the Neural Radiance Field (NeRF) as the fundamental representation, further improving the photo-realism and 3D consistency of multi-view face reenactment. However, establishing dense correspondence between different face NeRFs is non-trivial, because implicit representations lack the ground-truth correspondence annotations provided by mesh-based 3D parametric models (e.g., 3DMM) with index-aligned vertices. Although aligning the 3DMM space with NeRF-based face representations can enable motion control, it is sub-optimal due to the 3DMM's face-only modeling and low identity fidelity. We are therefore led to ask: can we learn dense correspondence between different NeRF-based face representations without a 3D parametric model prior? To address this challenge, we propose a novel framework that adopts tri-planes as the fundamental NeRF representation and decomposes a face tri-plane into three components: canonical tri-planes, identity deformations, and motion. For motion control, our key contribution is a Plane Dictionary (PlaneDict) module, which efficiently maps motion conditions to a linear weighted addition of learnable orthogonal plane bases. To the best of our knowledge, our framework is the first to achieve one-shot multi-view face reenactment without a 3D parametric model prior. Extensive experiments demonstrate that our method achieves better fine-grained motion control and identity preservation than previous methods. Project page (video demo): https://songlin1998.github.io/planedict/. |
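The core idea of the PlaneDict module, as described in the abstract, can be sketched numerically: a motion condition is mapped to a set of coefficients, and the resulting plane deformation is a linear weighted sum of orthogonal plane bases. The following NumPy sketch illustrates this under stated assumptions; all names, dimensions, and the linear coefficient map are illustrative placeholders, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of a PlaneDict-style mapping (dimensions and the
# linear coefficient map are assumptions, not the paper's actual design).
rng = np.random.default_rng(0)

NUM_BASES = 8   # number of learnable plane bases (assumed)
PLANE_RES = 4   # spatial resolution of each plane (toy size)
COND_DIM = 6    # dimensionality of the motion condition (assumed)

# Build orthogonal plane bases: draw random planes, flatten them,
# and orthonormalize the columns with a QR decomposition.
flat = rng.standard_normal((PLANE_RES * PLANE_RES, NUM_BASES))
q, _ = np.linalg.qr(flat)                    # columns are orthonormal
bases = q.T.reshape(NUM_BASES, PLANE_RES, PLANE_RES)

# A simple "dictionary lookup": motion condition -> basis coefficients.
# (In the paper this mapping is learned; here it is a fixed random matrix.)
W = rng.standard_normal((NUM_BASES, COND_DIM)) * 0.1

def plane_dict(motion_cond):
    """Map a motion condition to a plane deformation expressed as a
    linear weighted addition of the orthogonal plane bases."""
    coeffs = W @ motion_cond                    # shape: (NUM_BASES,)
    return np.tensordot(coeffs, bases, axes=1)  # shape: (PLANE_RES, PLANE_RES)

motion = rng.standard_normal(COND_DIM)
deformation = plane_dict(motion)
print(deformation.shape)  # (4, 4)

# Verify the flattened bases are orthonormal (Gram matrix is identity).
gram = bases.reshape(NUM_BASES, -1) @ bases.reshape(NUM_BASES, -1).T
print(np.allclose(gram, np.eye(NUM_BASES)))  # True
```

The orthogonality of the bases keeps the contributions of the dictionary entries disentangled, so each coefficient independently scales one plane direction; this is one plausible reading of why the paper constrains the bases to be orthogonal.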
Affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences, China 2. CRIPAC & MAIS, Institute of Automation, Chinese Academy of Sciences, China 3. S-Lab, Nanyang Technological University, Singapore 4. SenseTime, China
|