Detailed Description
The following description of the embodiments of the present application is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of the present application.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art. Furthermore, in the description of the present application, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A alone, A and B together, and B alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In order to more clearly describe the technical solution of the embodiments of the present application, before the description, some concepts of the present application are described in detail so as to better understand the present solution.
Cost function: to obtain the parameters of a trained model, such as a logistic regression model, a cost function is usually required, and the parameters are obtained by training against the cost function.
Stylizing: converting a two-dimensional image from one style domain to another, such as converting a realistic two-dimensional image into a cartoon two-dimensional image.
Face key points: the positions of important feature points of the human face, such as the eyes, the nose tip, the mouth corners, the eyebrows, and the contour points of various parts of the face.
Texture map: one or more two-dimensional images representing details of the surface of an object. When the texture is mapped onto the surface of the object in a specific manner, the object looks more realistic. Texture here covers both the texture of the object surface in the general sense, that is, the rugged grooves exhibited by the surface, and the color patterns on a smooth surface of the object.
Conventional three-dimensional face reconstruction methods are mostly based on image information, such as three-dimensional face reconstruction based on one or more modeling techniques using image brightness, edge information, linear perspective, color, relative height, parallax, and the like. The model-based three-dimensional face reconstruction method is currently popular; widely used models include the general face model (CANDIDE-3), the three-dimensional morphable model (3DMM), and their variants, and model-based three-dimensional face reconstruction algorithms include traditional algorithms and deep learning algorithms.
In the prior art, whether the three-dimensional face reconstruction is based on a traditional algorithm or on a deep learning algorithm, the reconstruction result often looks strongly model-like and has low precision due to the lack of prior image feature information, and the reconstruction result is often not ideal for faces with large expressions or poses.
On this basis, a three-dimensional face model reconstruction method based on a deep learning algorithm is provided. A deep-learning-based face reconstruction model is trained in advance, and the smoothness and precision of the three-dimensional face model and the stability of the target texture map are repeatedly and iteratively trained during the training process. Three-dimensional face reconstruction is then performed on a two-dimensional image based on the trained face reconstruction model, so that a texture three-dimensional face model that is stable in effect, rich in detail, vivid, and contains texture information can be reconstructed from the two-dimensional image.
The following is a detailed description of specific embodiments. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application as detailed in the accompanying claims. The flow diagrams depicted in the figures are exemplary only and need not be followed in the order shown. For example, some steps may be performed in parallel with no strict logical ordering between them, so the actual order of execution may vary.
Referring to fig. 1, a flow chart of a three-dimensional face model reconstruction method is provided in an embodiment of the present application. In a specific embodiment, the three-dimensional face reconstruction method is applied to a three-dimensional face model reconstruction device and computer equipment provided with the three-dimensional face model reconstruction device. The specific flow of the present embodiment will be described below by taking a computer device as an example, and it will be understood that the computer device applied in the present embodiment may be a smart phone, a tablet computer, a desktop computer, a wearable device, etc., which is not limited herein. The following will describe the flow shown in fig. 1 in detail, and the method for reconstructing a three-dimensional face specifically may include the following steps:
S101, acquiring a target two-dimensional image;
the target two-dimensional image refers to a two-dimensional face image on which three-dimensional face reconstruction needs to be performed, and it can be obtained through modes such as offline shooting, cloud downloading, or retrieval from a local gallery.
S102, inputting the target two-dimensional image into a trained face reconstruction model, and outputting a target three-dimensional face model and a target texture map corresponding to the target two-dimensional image;
Specifically, the target two-dimensional image is input into the trained face reconstruction model, and the face reconstruction model reconstructs a target three-dimensional face model and a target texture map corresponding to the target two-dimensional image based on the face image information in the target two-dimensional image.
Optionally, before the target two-dimensional image is input into the trained face reconstruction model, the target two-dimensional image is preprocessed. The preprocessing may include: rotating the target two-dimensional image so that the face in the image faces upward, if it does not already; cropping the target two-dimensional image to remove the background area outside the face area, so as to reduce the amount of data the face reconstruction model needs to compute; and normalizing the cropped target two-dimensional image, that is, normalizing the gray value of each pixel point in the target two-dimensional image, to facilitate computation by the face reconstruction model.
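A minimal sketch of such preprocessing is given below; the rotation angle and the face bounding box are assumed to be supplied by an external face detector, which is not specified by this embodiment:

```python
import numpy as np
import cv2

def preprocess_face_image(image: np.ndarray, angle_deg: float,
                          face_box: tuple) -> np.ndarray:
    """Rotate, crop, and normalize a face image before reconstruction.

    `angle_deg` and `face_box` (x, y, w, h) are assumed to come from an
    external face detector; they are placeholders, not part of the model.
    """
    # Rotate so the face in the image faces upward.
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    rotated = cv2.warpAffine(image, m, (w, h))

    # Crop away the background area outside the face region.
    x, y, bw, bh = face_box
    cropped = rotated[y:y + bh, x:x + bw]

    # Normalize pixel values to [0, 1] for the reconstruction network.
    return cropped.astype(np.float32) / 255.0
```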
Alternatively, the face reconstruction model may include a convolutional neural network (Convolutional Neural Network, CNN)-based backbone and a Multi-layer Perceptron (MLP) model. The backbone is a CNN-based encoder network used for extracting image features of the target two-dimensional image, and may be, but is not limited to, a backbone network such as mobilenet, resnet, or xception.
Optionally, inputting the target two-dimensional image into the trained face reconstruction model and outputting the target three-dimensional face model and the target texture map corresponding to the target two-dimensional image involves two aspects. On the one hand, the target two-dimensional image is input into the CNN-based backbone network, the backbone extracts image features of the target two-dimensional image, the extracted image features are input into the MLP model, the MLP model outputs regression parameters, and the regression parameters are transformed to generate a deformation parameter matrix, from which the target three-dimensional face model corresponding to the target two-dimensional image is obtained. On the other hand, the backbone outputs illumination parameters and texture parameters based on the image features; an illumination map containing illumination information is obtained by modeling based on the illumination parameters, a first texture map is generated based on the texture parameters and a pre-designed texture template, and the illumination map and the first texture map are combined by a multiplication operation to generate the target texture map.
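A minimal sketch of this two-branch structure in PyTorch follows; the layer sizes, parameter dimensions, and head layout are illustrative assumptions rather than the configuration of this embodiment:

```python
import torch
import torch.nn as nn

class FaceReconstructionModel(nn.Module):
    """Sketch of the backbone + MLP structure described above; the toy
    backbone and all dimensions are illustrative assumptions."""

    def __init__(self, feat_dim=512, n_deform=199, n_tex=80, n_light=9):
        super().__init__()
        # CNN backbone (e.g. a mobilenet/resnet encoder); a toy stand-in here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # MLP regressing the parameters that form the deformation matrix.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, n_deform),
        )
        # Heads regressing texture and illumination parameters.
        self.tex_head = nn.Linear(feat_dim, n_tex)
        self.light_head = nn.Linear(feat_dim, n_light)

    def forward(self, image):
        feat = self.backbone(image)
        deform_params = self.mlp(feat)        # -> deformation parameter matrix
        tex_params = self.tex_head(feat)      # -> first texture map
        light_params = self.light_head(feat)  # -> illumination map
        return deform_params, tex_params, light_params
```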
Referring to fig. 2, an exemplary schematic diagram of three-dimensional face reconstruction is provided in an embodiment of the present application.
As shown in fig. 2, the target three-dimensional face model shown in the figure can be obtained by inputting the target two-dimensional image into the face reconstruction model. The face reconstruction model shown includes a convolutional neural network (Convolutional Neural Network, CNN)-based backbone and a Multi-layer Perceptron (MLP) model.
It may be understood that the face reconstruction model is obtained through a complex training process. In the embodiment of the present application, the face reconstruction model is generated based on cost function training, and the cost function includes a first cost function, a second cost function, a third cost function, and a fourth cost function. The first cost function is obtained based on a predicted three-dimensional face model corresponding to a sample two-dimensional image and a standard three-dimensional face model corresponding to the sample two-dimensional image; the second cost function is obtained based on the predicted three-dimensional face model and a smooth predicted three-dimensional face model obtained after smoothing the predicted three-dimensional face model; the third cost function is obtained based on face key points obtained after two-dimensional projection of the predicted three-dimensional face model and face key points in the sample two-dimensional image; and the fourth cost function is obtained based on a predicted texture map corresponding to the sample two-dimensional image and a standard texture map corresponding to the sample two-dimensional image. The predicted three-dimensional face model is obtained by prediction based on the created initial face reconstruction model, and the standard three-dimensional face model is obtained by constrained reconstruction of the average face three-dimensional model based on the face key points in the sample two-dimensional image.
And S103, generating a texture three-dimensional face model containing texture information based on the target three-dimensional face model and the target texture map.
Specifically, the target texture map is rendered onto the target three-dimensional face model according to the defined mapping relation between texture UV coordinates and the three-dimensional vertex coordinates in the target three-dimensional face model, generating a texture three-dimensional face model containing texture information.
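A minimal sketch of the per-vertex lookup implied by this UV-to-vertex mapping is given below; a full renderer would interpolate colors across triangle faces, which is omitted here:

```python
import numpy as np

def vertex_colors_from_texture(uv: np.ndarray, texture: np.ndarray) -> np.ndarray:
    """Look up a color for each 3D vertex from the target texture map.

    `uv` is (N, 2) in [0, 1] under the predefined UV-to-vertex mapping;
    `texture` is (H, W, 3). Nearest-pixel lookup is used for brevity.
    """
    h, w = texture.shape[:2]
    cols = np.clip((uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    rows = np.clip((uv[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return texture[rows, cols]
```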
Optionally, before generating the texture three-dimensional face model containing texture information based on the target three-dimensional face model and the target texture map, smoothing the target three-dimensional face model, thereby further improving the smoothing effect of the three-dimensional face model.
Optionally, before the texture three-dimensional face model containing texture information is generated based on the target three-dimensional face model and the target texture map, feature extraction is performed on the target two-dimensional image based on the backbone network to output projection parameters, and projection transformation is performed on the target three-dimensional face model so that its pose is closer to the pose in the target two-dimensional image, improving the effect of the finally generated texture three-dimensional face model.
By adopting the three-dimensional face model reconstruction method provided by the embodiment of the present application, the face reconstruction model generated in advance by training with a plurality of cost functions can reconstruct, based on the target two-dimensional image, a target texture map with a stable effect and a target three-dimensional face model. The target texture map is then rendered onto the target three-dimensional face model, so that a texture three-dimensional face model that is rich in detail, stable in effect, and contains texture information can be generated, improving the reconstruction effect for the three-dimensional face.
Referring to fig. 3, a flow chart of a three-dimensional face model reconstruction method is provided in another embodiment of the present application. As shown in fig. 3, the three-dimensional face model reconstruction method may include the following steps:
S201, collecting a sample two-dimensional image;
the sample two-dimensional image contains a face image and serves as training data for assisting in training the face reconstruction model.
Optionally, the face image area in the sample two-dimensional image should exceed a certain proportion of the total area of the sample two-dimensional image; the proportion may be, for example, 75%.
Optionally, if the face image area in the sample two-dimensional image does not exceed the preset proportion of the total area of the sample two-dimensional image, cutting the sample two-dimensional image to obtain a new sample two-dimensional image, so that the face image area in the new sample two-dimensional image exceeds the preset proportion of the total area of the sample two-dimensional image.
Optionally, if the face image in the sample two-dimensional image is skewed, performing rotation processing on the sample two-dimensional image to obtain a new sample two-dimensional image, so that the face image in the new sample two-dimensional image is a front-view face image.
The sample two-dimensional image may be acquired by off-line photography.
S202, a standard three-dimensional face model and a standard texture map corresponding to the sample two-dimensional image are obtained;
the standard three-dimensional face model is a three-dimensional face model of higher precision that is closer to the effect of a real face, and the standard texture map is a texture map of higher precision that is closer to the texture effect of a real face.
In an embodiment, obtaining the standard three-dimensional face model and the standard texture map corresponding to the sample two-dimensional image may include collecting, based on a depth camera, the real face of the person in the sample two-dimensional image to obtain three-dimensional point cloud data of the face, and generating the standard three-dimensional face model and the standard texture map according to the three-dimensional point cloud data.
In an embodiment, obtaining the standard three-dimensional face model and the standard texture map corresponding to the sample two-dimensional image may further include: extracting face key points in the sample two-dimensional image and acquiring an average face three-dimensional model; performing constrained reconstruction on the average face three-dimensional model based on the face key points in the sample two-dimensional image to obtain the standard three-dimensional face model corresponding to the sample two-dimensional image; and acquiring the UV coordinates of each vertex in the standard three-dimensional face model and obtaining a standard texture map of a preset size from the sample two-dimensional image through bilinear interpolation based on the UV coordinates of each vertex.
S203, an initial face reconstruction model is created, and the initial face reconstruction model is trained based on the sample two-dimensional image, the standard three-dimensional face model and the standard texture mapping, so that a trained face reconstruction model is obtained;
The initial face reconstruction model may generate a texture map corresponding to a two-dimensional image input into the model and a three-dimensional face model based on the two-dimensional image reconstruction.
In one embodiment, the initial face reconstruction model is trained based on the sample two-dimensional image, the standard three-dimensional face model corresponding to the sample two-dimensional image, and the standard texture map corresponding to the sample two-dimensional image, so as to improve the effect of the initial face reconstruction model in reconstructing and generating the texture map and the three-dimensional face model; the trained face reconstruction model thus obtained has a good reconstruction effect on the texture map and the three-dimensional face model.
S204, acquiring a target two-dimensional image;
S205, inputting the target two-dimensional image into a trained face reconstruction model, and outputting a target three-dimensional face model and a target texture map corresponding to the target two-dimensional image;
S206, generating a texture three-dimensional face model containing texture information based on the target three-dimensional face model and the target texture map.
Steps S204 to S206 may refer to the detailed description of steps S101 to S103 and are not repeated here.
By adopting the three-dimensional face model reconstruction method provided by the embodiment of the present application, the pre-trained face reconstruction model can reconstruct, based on the target two-dimensional image, a target texture map with a stable effect and a target three-dimensional face model. The target texture map is then rendered onto the target three-dimensional face model, so that a texture three-dimensional face model that is rich in detail, stable in effect, and contains texture information can be generated, improving the reconstruction effect for the three-dimensional face.
Referring to fig. 4, a flow chart of a three-dimensional face model reconstruction method is provided in another embodiment of the present application. In this embodiment, a training process of the face reconstruction model is described in detail. As shown in fig. 4, the three-dimensional face model reconstruction method may include the following steps:
S301, collecting a sample two-dimensional image;
S302, extracting face key points in the sample two-dimensional image, and acquiring an average face three-dimensional model;
In one embodiment, the face key points in the collected sample two-dimensional image are obtained through face key point detection, where the face key points refer to important feature points such as the eyes, the nose tip, the mouth corner points, the eyebrows, and the contour points of various parts of the face.
Face key point detection refers to locating the key region positions of a face and extracting the key points of those key region positions. Detection methods for face key points include, but are not limited to, the Active Shape Model (ASM) algorithm, the Active Appearance Model (AAM) algorithm, deep learning algorithms, and the like. The embodiment of the present application is not limited in this regard.
In one embodiment, an average face three-dimensional model is obtained.
The average face three-dimensional model can be obtained by collecting the face of a real person with a depth camera to obtain three-dimensional point cloud data of the face and generating a face three-dimensional model according to the three-dimensional point cloud data. A plurality of real people are collected to generate a plurality of face three-dimensional models, and the face three-dimensional models are finally averaged to obtain the average face three-dimensional model.
Optionally, the average face three-dimensional model may be obtained based on big data analysis or a face model designed in advance by a designer, which is not limited in the embodiment of the present application.
Optionally, in an embodiment, when the face key points in the sample two-dimensional image are extracted, the sample two-dimensional image is first stylized to obtain a stylized sample two-dimensional image, and the face key points are then extracted from the stylized sample two-dimensional image.
Alternatively, the stylizing of the sample two-dimensional image may be based on an unsupervised CycleGAN algorithm.
S303, carrying out constraint reconstruction on the average face three-dimensional model based on face key points in the sample two-dimensional image to obtain a standard three-dimensional face model corresponding to the sample two-dimensional image;
In one embodiment: an initial deformation matrix is acquired, and the average face three-dimensional model is reconstructed based on the deformation matrix to obtain an initial three-dimensional face model; the initial three-dimensional face model is projected into two-dimensional space to obtain a two-dimensional face image; the face key points in the two-dimensional face image are extracted and compared with the face key points in the sample two-dimensional image to obtain a difference value; and the deformation matrix is updated based on the difference value. This process is executed iteratively until the difference value between the face key points in the two-dimensional face image and the face key points in the sample two-dimensional image is smaller than a preset threshold, at which point iteration stops. The deformation matrix at this point is the resulting deformation matrix, and constrained reconstruction is performed on the average face three-dimensional model based on this deformation matrix to obtain the standard three-dimensional face model corresponding to the sample two-dimensional image.
Referring to fig. 5, a flowchart for reconstructing a standard three-dimensional face model is provided in an embodiment of the present application.
As shown in fig. 5, face key points are extracted from the sample two-dimensional image, and the average face three-dimensional model is subjected to constraint reconstruction by using the face key points extracted from the sample two-dimensional image, so that a standard three-dimensional face model can be obtained.
Optionally, in an embodiment, performing constrained reconstruction on the average face three-dimensional model based on the face key points in the sample two-dimensional image to obtain a standard three-dimensional face model corresponding to the sample two-dimensional image may include: performing constrained reconstruction on the average face three-dimensional model based on the face key points in the sample two-dimensional image to obtain a constrained three-dimensional face model and first projection parameters, and performing projective transformation on the constrained three-dimensional face model based on the first projection parameters to obtain the standard three-dimensional face model. It should be understood that performing constrained reconstruction on the average face three-dimensional model based on the face key points is an iterative optimization process: the deformation matrix is continuously updated to update the reconstructed standard three-dimensional face model, and projection parameter information representing the face pose is obtained along the way; the first projection parameters comprise a rotation parameter, a translation parameter, and a scale parameter. After the iterative optimization ends, projective transformation is performed, based on the first projection parameters, on the constrained three-dimensional face model obtained by key-point-constrained reconstruction, so as to obtain a standard three-dimensional face model with the same pose as the face in the sample two-dimensional image. For example, if the face in the sample two-dimensional image is in a head-tilted state, the standard three-dimensional face model can be transformed into the same head-tilted state based on the first projection parameters after iterative optimization.
Optionally, the iterative optimization process may be performed by solving for the optimal value of an energy function and continuously updating the deformation matrix and the projection parameters; see the following steps:
First, an initial deformation matrix and initial projection parameters are created. The iteration then proceeds as follows.
Equation 1: $\min_{R,t,P',w,\Pi} E_{def}(P', w) + \lambda E_{lan}(\Pi, R, t, P')$
where $w$ is the deformation matrix, $\Pi$ is the scale parameter among the projection parameters, $R$ is the rotation parameter among the projection parameters, $t$ is the translation parameter among the projection parameters, and $P'$ denotes the coordinates of the points in the standard three-dimensional face model. Using this formula, the deformation matrix and the projection parameters are fixed, and the coordinates $P'$ of the points in the standard three-dimensional face model are solved.
It is then judged whether the difference of $E_{def}$ between the current iteration and the last iteration is smaller than a set threshold, that is, whether $|E_{def}^{j} - E_{def}^{j-1}|$ is below the threshold, where $j$ denotes the iteration number. If the condition is satisfied, iteration stops; otherwise, the following process continues to be executed iteratively.
Equation 2: $E_{def} = \| P' - w \cdot P \|^2$
where $P$ denotes the coordinates of the points in the average face three-dimensional model and $P'$ is calculated using Equation 1. With $P'$ fixed in this formula, the deformation matrix $w$ is solved by the least squares method.
Equation 3: $E_{lan}(\Pi, R, t, P') = \sum_{i} \left\| q_i - (\Pi R P'_i + t) \right\|^2$
where $q_i$ is the coordinate of the $i$-th face key point in the sample two-dimensional image, and $P'_i$ is the coordinate of the point in the standard three-dimensional face model corresponding to $q_i$. The $P'$ obtained by Equation 1 is substituted into Equation 3 to update the projection parameters $\Pi$, $R$, and $t$.
It will be appreciated that equation 2 essentially updates the deformation matrix during each iteration, and equation 3 essentially updates the projection parameters during each iteration.
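The alternating loop described by Equations 1 to 3 can be sketched as follows; `update_P_prime` and `solve_projection` are assumed external solvers standing in for Equation 1 and Equation 3 respectively, since the embodiment does not fix a particular optimizer:

```python
import numpy as np

def constrained_reconstruction(P, q, lmk_idx, update_P_prime,
                               solve_projection, eps=1e-4, max_iter=50):
    """Loop skeleton for the alternating optimization of Equations 1-3.

    P: (N, 3) vertices of the average face three-dimensional model.
    q: (M, 2) face key points in the sample two-dimensional image.
    lmk_idx: indices of the model vertices corresponding to q.
    `update_P_prime` (Equation 1) and `solve_projection` (Equation 3) are
    assumed external solvers; only the iteration and stop logic is shown.
    """
    w = np.eye(3)                            # initial deformation matrix
    proj = solve_projection(q, P[lmk_idx])   # initial Pi, R, t
    P_prime = P.copy()
    e_prev = np.inf
    for j in range(max_iter):
        # Equation 1: fix w and the projection parameters, solve for P'.
        P_prime = update_P_prime(P, w, q, lmk_idx, proj)
        # Equation 2: E_def = ||P' - w.P||^2; update w by least squares.
        w = np.linalg.lstsq(P, P_prime, rcond=None)[0].T
        # Equation 3: update the projection parameters Pi, R, and t from P'.
        proj = solve_projection(q, P_prime[lmk_idx])
        # Stop once |E_def^j - E_def^(j-1)| falls below the threshold.
        e_def = float(np.sum((P_prime - P @ w.T) ** 2))
        if abs(e_prev - e_def) < eps:
            break
        e_prev = e_def
    return P_prime, w, proj
```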
Optionally, in an embodiment, after the average face three-dimensional model is constrained and reconstructed based on the face key points in the sample two-dimensional image to obtain a standard three-dimensional face model, smoothing is performed on the standard three-dimensional face model to obtain a new standard three-dimensional face model with a better smoothing effect. In this way, for the first cost function constructed in step S308 based on each vertex in the predicted three-dimensional face model and each vertex in the standard three-dimensional face model, the role played by the first cost function in the training process is not limited to vertex correction of the face reconstruction model; it can also enhance the smoothing effect of the face reconstruction model.
Specifically, the smoothing processing performed on the standard three-dimensional face model to obtain a new standard three-dimensional face model is as follows:

$$\tilde{V}_i' = (1 - \alpha)\, V_i' + \frac{\alpha}{|\mathcal{N}(i)|} \sum_{j \in \mathcal{N}(i)} V_j'$$

where $V_i'$ is the $i$-th three-dimensional vertex in the standard three-dimensional face model before smoothing, the $V_j'$ for $j \in \mathcal{N}(i)$ are the first-order neighborhood points directly adjacent to $V_i'$ ($i$ denotes the $i$-th three-dimensional vertex and $j$ the label of a first-order neighborhood point), $\tilde{V}_i'$ is the $i$-th three-dimensional vertex in the standard three-dimensional face model after smoothing, and $\alpha$ denotes the smoothing coefficient, $\alpha \in (0, 1)$.
Optionally, the smoothing processing performed on the standard three-dimensional face model to obtain the new standard three-dimensional face model may also adopt modes such as curvature-based smoothing or the Taubin smoothing algorithm; the embodiment of the present application is not limited to a specific implementation of the smoothing processing.
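A minimal sketch of one pass of the neighborhood-average smoothing given by the formula above (a single Laplacian-style iteration; repeated application smooths the mesh further):

```python
import numpy as np

def laplacian_smooth(vertices: np.ndarray, neighbors: list,
                     alpha: float = 0.5) -> np.ndarray:
    """One pass of the neighborhood-average smoothing described above.

    `vertices` is (N, 3); `neighbors[i]` lists the first-order neighborhood
    (directly adjacent vertices) of vertex i; `alpha` is the smoothing
    coefficient, alpha in (0, 1).
    """
    smoothed = vertices.copy()
    for i, nbrs in enumerate(neighbors):
        if not nbrs:
            continue
        nbr_mean = vertices[nbrs].mean(axis=0)
        # Blend each vertex toward the mean of its first-order neighbors.
        smoothed[i] = (1.0 - alpha) * vertices[i] + alpha * nbr_mean
    return smoothed
```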
S304, obtaining UV coordinates of each vertex in the standard three-dimensional face model, and obtaining a standard texture map with a preset size by bilinear interpolation of the sample two-dimensional image based on the UV coordinates of each vertex;
In one embodiment, the obtained standard three-dimensional face model is mapped through a predefined mapping relation between three-dimensional vertex coordinates and UV coordinates to obtain the UV coordinates corresponding to each vertex in the standard three-dimensional face model. A bilinear interpolation algorithm is then used to interpolate, according to the UV coordinates corresponding to each vertex in the standard three-dimensional face model and the pixel information corresponding to each pixel point in the sample two-dimensional image, to obtain a standard texture map of a preset size. The pixel information includes, but is not limited to, color information, transparency information, roughness information, metallic information, highlight information, and the like of each pixel point.
It should be appreciated that the standard texture map is for rendering to a surface of a standard three-dimensional face model, and thus, in one embodiment, the preset size is set based on the size of the standard three-dimensional face model.
Referring to fig. 6, a flowchart of generating a standard texture map is provided for an embodiment of the present application.
As shown in fig. 6, coordinate conversion is performed on the standard three-dimensional face model according to a predefined coordinate mapping relationship to obtain UV coordinates of each vertex in the standard three-dimensional face model, and bilinear interpolation is performed on the basis of the UV coordinates of each vertex and pixel information of each pixel point in the sample two-dimensional image to obtain a standard texture map with a preset size. For example, please refer to the exemplary schematic diagram of generating a standard texture map shown in fig. 7.
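The per-vertex bilinear sampling step can be sketched as follows; assembling the full preset-size texture map additionally requires rasterizing between vertices, which is omitted here:

```python
import numpy as np

def sample_texture_bilinear(image: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Bilinearly sample the sample two-dimensional image at UV positions.

    `image` is (H, W, C) with float pixel values; `uv` is (N, 2) in [0, 1].
    Returns one interpolated pixel value per UV coordinate.
    """
    h, w = image.shape[:2]
    x = uv[:, 0] * (w - 1)
    y = uv[:, 1] * (h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    fx = (x - x0)[:, None]
    fy = (y - y0)[:, None]
    # Weighted average of the four surrounding pixels.
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x0 + 1]
    bot = (1 - fx) * image[y0 + 1, x0] + fx * image[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot
```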
S305, an initial face reconstruction model is created, and the sample two-dimensional image is input into the initial face reconstruction model to be predicted so as to obtain a predicted three-dimensional face model and a predicted texture map;
the initial face reconstruction model refers to an untrained, coarse deep learning model.
In one embodiment, an initial face reconstruction model is created, the sample two-dimensional image is input into the initial face reconstruction model, and the initial face reconstruction model predicts a predicted three-dimensional face model and a predicted texture map based on the initial parameter information in the initial face reconstruction model and the image feature information in the sample two-dimensional image.
It can be understood that, since the initial face reconstruction model is an untrained and rough deep learning model, the predicted three-dimensional face model and predicted texture map obtained by prediction do not yet have a good model effect or texture effect. The initial face reconstruction model therefore needs to be trained to continuously update its parameter information and thereby continuously improve the model effect of the predicted three-dimensional face model and the texture effect of the predicted texture map.
Further, in an embodiment, creating an initial face reconstruction model and inputting the sample two-dimensional image into the initial face reconstruction model for prediction to obtain a predicted three-dimensional face model and a predicted texture map may include: creating an initial face reconstruction model and inputting the sample two-dimensional image into the initial face reconstruction model for prediction to obtain a deformation parameter matrix, illumination parameters, and texture parameters; obtaining the predicted three-dimensional face model by prediction based on the deformation parameter matrix; performing illumination modeling based on the illumination parameters to obtain an illumination map, and obtaining a first texture map based on the texture parameters and a preset texture template; and generating the predicted texture map based on the illumination map and the first texture map.
In this embodiment, the illumination modeling based on the illumination parameters may be based on a spherical harmonic illumination technique. Specifically, a normal map is extracted from the sample two-dimensional image based on the backbone network, and the illumination map is obtained based on the normal map and the illumination parameters. See the following formula:
$$SH_{shading} = \sum_{i=1}^{9} \gamma_i \, H_i(normal)$$

where the $\gamma_i$ are the illumination parameters, $normal$ is the normal map, and $H_i(normal)$ expands the normal map from $(h, w, c)$ dimensions to $(9, h, w)$ dimensions, where $h$ denotes the height of the normal map, $w$ its width, and $c$ the number of channels; $SH_{shading}$ is the illumination map.
Optionally, the illumination modeling based on the illumination parameters may further perform illumination modeling based on other achievable illumination models, which is not limited by the embodiment of the present application.
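As a sketch of the spherical-harmonic route described above; the nine basis functions $H_i$ used here are the standard second-order spherical-harmonic basis, which is an assumption since the embodiment does not specify the exact basis:

```python
import numpy as np

SH_CONST = [0.282095, 0.488603, 0.488603, 0.488603,
            1.092548, 1.092548, 0.315392, 1.092548, 0.546274]

def sh_shading(normal: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Compute SH_shading = sum_i gamma_i * H_i(normal).

    `normal` is an (h, w, 3) normal map; `gamma` holds the nine predicted
    illumination parameters. The basis choice is an assumption.
    """
    nx, ny, nz = normal[..., 0], normal[..., 1], normal[..., 2]
    # Expand the (h, w, 3) normal map into nine (h, w) basis channels.
    basis = np.stack([
        np.ones_like(nx),        # H_1 (constant band)
        ny, nz, nx,              # H_2..H_4 (first order)
        nx * ny, ny * nz,        # H_5, H_6
        3.0 * nz ** 2 - 1.0,     # H_7
        nx * nz,                 # H_8
        nx ** 2 - ny ** 2,       # H_9 (second order)
    ])                           # -> (9, h, w)
    weights = gamma * np.array(SH_CONST)
    return np.tensordot(weights, basis, axes=1)  # (h, w) illumination map
```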
Optionally, the texture template is a base texture template pre-designed based on factors such as different skin colors, different genders, and different ages. Further, the first texture map corresponding to the sample two-dimensional image may be generated based on the texture parameters corresponding to the sample two-dimensional image output by the backbone network and the base texture templates. Specifically, the first texture map may be calculated as follows:
$$Tex = Tex_{mean} + \sum_{i=1}^{M} \beta_i \, Tex_{basis}^{(i)}$$

where $Tex_{mean}$ is obtained by averaging the $M$ texture templates, $M$ denotes the number of texture templates, the $\beta_i$ are the texture parameters output by the backbone network during training ($M$ texture parameters in total), and $Tex_{basis}^{(i)}$ is the $i$-th texture template minus the texture template mean.
Optionally, the first texture map includes albedo information, roughness information, metallic information, highlight information, altitude information, transparency information, and the like.
It should be understood that, since the first texture map corresponding to the sample two-dimensional image is generated based on the pre-designed texture templates and the output texture parameters, the predicted texture map generated based on the first texture map and the illumination map always stays within the variation range defined by the pre-designed texture templates; no large error occurs, and the texture effect is relatively stable.
It will be appreciated that the greater the number of pre-designed texture templates, the more detail of the first texture map generated.
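A minimal sketch of building the first texture map from the template basis, following the formula above:

```python
import numpy as np

def first_texture_map(tex_templates: np.ndarray, beta: np.ndarray) -> np.ndarray:
    """Build the first texture map from M pre-designed templates.

    tex_templates: (M, H, W, 3) base texture templates.
    beta:          (M,) texture parameters output by the backbone.
    Implements Tex = Tex_mean + sum_i beta_i * Tex_basis_i, where each
    Tex_basis_i is the i-th template minus the template mean.
    """
    tex_mean = tex_templates.mean(axis=0)
    tex_basis = tex_templates - tex_mean            # (M, H, W, 3)
    offset = np.tensordot(beta, tex_basis, axes=1)  # weighted combination
    # For bounded beta the result stays within the variation range spanned
    # by the templates, which keeps the texture effect stable.
    return tex_mean + offset
```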
Referring to fig. 8, an exemplary schematic diagram of generating a predicted texture map is provided in an embodiment of the present application. As shown in fig. 8, a sample two-dimensional image is input into the backbone network in the face reconstruction model; the backbone network performs feature extraction on the sample two-dimensional image and outputs illumination parameters and texture parameters; illumination modeling is performed based on the illumination parameters to obtain an illumination map; a first texture map is obtained based on the texture parameters and a pre-designed texture template; and the final predicted texture map is obtained by a combination operation based on the first texture map and the illumination map.
S306, carrying out smoothing treatment on the predicted three-dimensional face model to obtain a smooth predicted three-dimensional face model;
Specifically, reference may be made to the specific implementation manner of performing the smoothing process on the standard three-dimensional face model in step S303 to obtain a new standard three-dimensional face model.
S307, carrying out two-dimensional projection on the predicted three-dimensional face model to obtain a predicted two-dimensional image, and extracting face key points in the predicted two-dimensional image;
In one embodiment, after the sample two-dimensional image is input into the CNN-based backbone network in the face reconstruction model, the backbone network outputs second projection parameters, two-dimensional projection is performed on the predicted three-dimensional face model based on the second projection parameters to obtain a predicted two-dimensional image, and the face key points in the predicted two-dimensional image are extracted.
It can be understood that the embodiment of the present application is essentially an iterative training process: in each iteration, the parameters of the initial face reconstruction model are adjusted and updated, and the updated initial face reconstruction model is used for the next round of iteration; after the parameters of the initial face reconstruction model are adjusted and updated, the updated initial face reconstruction model also updates the second projection parameters it outputs.
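The two-dimensional projection itself can be sketched as a weak-perspective transform; the weak-perspective form is an assumption, since the embodiment does not fix the projection type:

```python
import numpy as np

def project_vertices(V: np.ndarray, scale: float, R: np.ndarray,
                     t: np.ndarray) -> np.ndarray:
    """Project predicted 3D vertices to 2D with the second projection
    parameters (scale, rotation R, translation t); weak-perspective is
    assumed here for illustration.
    """
    # Rotate, keep x/y, then scale and translate into image coordinates.
    rotated = V @ R.T          # (N, 3)
    return scale * rotated[:, :2] + t
```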
S308, constructing a cost function based on the predicted three-dimensional face model, the standard three-dimensional face model, the smooth predicted three-dimensional face model, face key points in the predicted two-dimensional image, face key points in the sample two-dimensional image, the predicted texture map and the standard texture map;
illustratively, as shown in fig. 9 on the basis of fig. 4, step S308 may include S3081, S3082, S3083, S3084, and S3085.
S3081, constructing a first cost function based on each vertex in the predicted three-dimensional face model and each vertex in the standard three-dimensional face model;

$$L_1 = \frac{1}{N} \sum_{i=1}^{N} \left\| V_i - V_{GT,i} \right\|^2$$

where $N$ is the number of vertices, $V_i$ is the $i$-th vertex in the predicted three-dimensional face model, and $V_{GT,i}$ is the $i$-th vertex in the standard three-dimensional face model.
It is easy to understand that the predicted three-dimensional face model is obtained by directly predicting the sample two-dimensional image with the initial face reconstruction model, whereas the standard three-dimensional face model is obtained by constrained reconstruction of the average face three-dimensional model based on the face key points in the sample two-dimensional image, followed by smoothing. The standard three-dimensional face model is therefore more accurate and smoother than the predicted three-dimensional face model, in both the shape and the smoothness of the model. A first cost function is thus constructed based on each vertex in the predicted three-dimensional face model and each vertex in the standard three-dimensional face model, and the parameters of the initial face reconstruction model are adjusted and trained based on the first cost function, which can improve both the accuracy and the smoothing effect of the models reconstructed by the initial face reconstruction model.
S3082, constructing a second cost function based on each vertex in the predicted three-dimensional face model and each vertex in the smooth predicted three-dimensional face model;

$$L_2 = \frac{1}{N} \sum_{i=1}^{N} \left\| V_i - \tilde{V}_i \right\|^2$$

where $N$ is the number of vertices, $V_i$ is the $i$-th vertex in the predicted three-dimensional face model, and $\tilde{V}_i$ is the $i$-th vertex in the smooth predicted three-dimensional face model.
It is easy to understand that the predicted three-dimensional face model is obtained by directly predicting the sample two-dimensional image with the initial face reconstruction model, whereas the smooth predicted three-dimensional face model is obtained by smoothing the predicted three-dimensional face model; the smooth predicted three-dimensional face model therefore has a notable smoothing effect compared with the predicted three-dimensional face model. A second cost function is constructed based on each vertex in the predicted three-dimensional face model and each vertex in the smooth predicted three-dimensional face model, and the internal parameters of the initial face reconstruction model are adjusted and trained based on the second cost function, which can optimize the smoothing effect of the models reconstructed by the initial face reconstruction model.
S3083, constructing a third cost function based on the face key points in the predicted two-dimensional image and the face key points in the sample two-dimensional image;

$$L_3 = \frac{1}{M} \sum_{i=1}^{M} \left\| lmk_i - lmk_{GT,i} \right\|^2$$

where $M$ is the number of face key points, $lmk_i$ is the $i$-th face key point in the predicted two-dimensional image, and $lmk_{GT,i}$ is the $i$-th face key point in the sample two-dimensional image.
It is easy to understand that the predicted two-dimensional image is obtained by two-dimensional projection of the predicted three-dimensional face model; compared with the sample two-dimensional image, it may therefore differ considerably in its face key points. A third cost function is constructed based on the face key points in the predicted two-dimensional image and the face key points in the sample two-dimensional image, and the internal parameters of the initial face reconstruction model are adjusted and trained based on the third cost function, which can optimize the accuracy of the models reconstructed by the initial face reconstruction model so that the reconstructed target three-dimensional face model is closer to the standard three-dimensional face model corresponding to the sample two-dimensional image.
S3084, constructing a fourth cost function based on each pixel in the predicted texture map and each pixel in the standard texture map;

$$L_4 = \frac{1}{Pixel} \sum_{i=1}^{Pixel} \left\| Tex_i - Tex_{GT,i} \right\|^2$$

where $Pixel$ is the number of pixels, $Tex_i$ is the $i$-th pixel in the predicted texture map, and $Tex_{GT,i}$ is the $i$-th pixel in the standard texture map.
It can be understood that the standard texture map is obtained through bilinear interpolation based on the standard three-dimensional face model and the sample two-dimensional image; compared with the predicted texture map directly predicted by the initial face reconstruction model, it has a prominent texture effect and richer, more accurate details. A fourth cost function is constructed based on the predicted texture map and the standard texture map, and the internal parameters of the initial face reconstruction model are adjusted and trained based on the fourth cost function, which can optimize the prediction effect of the initial face reconstruction model on the texture map, making the predicted texture map closer to the standard texture map with its rich, accurate details and prominent texture effect.
S3085, carrying out weighted summation on the first cost function, the second cost function, the third cost function and the fourth cost function to obtain a cost function.
In one embodiment, different weight values are respectively set for the first cost function, the second cost function, the third cost function, and the fourth cost function, and a weighted summation is performed to obtain a total cost function containing the four cost functions, so that the parameters of the initial face reconstruction model are adjusted and trained based on the total cost function.
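A minimal sketch of this weighted summation, mirroring the MSE-based forms above; the weight values are illustrative placeholders, not a trained configuration:

```python
import numpy as np

def total_cost(V, V_gt, V_smooth, lmk, lmk_gt, tex, tex_gt,
               w1=1.0, w2=0.2, w3=1.0, w4=1.0):
    """Weighted sum of the four MSE cost functions; w1..w4 are
    illustrative placeholder weights."""
    l1 = np.mean(np.sum((V - V_gt) ** 2, axis=-1))      # vertex accuracy
    l2 = np.mean(np.sum((V - V_smooth) ** 2, axis=-1))  # smoothness
    l3 = np.mean(np.sum((lmk - lmk_gt) ** 2, axis=-1))  # face key points
    l4 = np.mean((tex - tex_gt) ** 2)                   # texture map
    return w1 * l1 + w2 * l2 + w3 * l3 + w4 * l4
```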
Further, in one possible implementation, MSE-based cost functions are used for calculation and back-propagation. Alternatively, cost functions of a form other than MSE may be used, which is not limited by the embodiment of the present application. For example, SAD-based cost functions may be used, in which case the first, second, and third cost functions may be expressed as:
First cost function: $L_1 = \frac{1}{N} \sum_{i=1}^{N} \left| V_i - V_{GT,i} \right|$

Second cost function: $L_2 = \frac{1}{N} \sum_{i=1}^{N} \left| V_i - \tilde{V}_i \right|$

Third cost function: $L_3 = \frac{1}{M} \sum_{i=1}^{M} \left| lmk_i - lmk_{GT,i} \right|$
S309, training the initial face reconstruction model based on the cost function to obtain a trained face reconstruction model;
In the embodiment of the present application, steps S305 to S309 form an iterative loop training process, and steps S301 to S304 provide the training data for the iterative loop training process of steps S305 to S309.
In one embodiment, the model parameters of the face reconstruction model are updated based on back-propagation of the cost function, and training is iterated until the function value of the cost function is smaller than a preset threshold or a preset number of loops is reached.
Optionally, in an implementation, the parameters of the initial face reconstruction model are first adjusted and trained based on the first cost function, the second cost function, and the third cost function; after the cost function falls below a certain threshold or the number of iterations reaches a preset number, the first-stage training ends. The first, second, third, and fourth cost functions are then combined for a second stage of training to further optimize the prediction effect of the face reconstruction model on the three-dimensional face model and the texture map.
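A skeleton of this optional two-stage schedule is sketched below; `make_losses` is an assumed helper returning the four per-batch cost values, and the epoch counts, learning rate, and weights are illustrative placeholders:

```python
import torch

def train_two_stage(model, loader, make_losses, epochs=(10, 10), lr=1e-4,
                    geo_weights=(1.0, 0.2, 1.0), tex_weight=1.0):
    """Skeleton of the two-stage schedule: geometry cost functions first,
    then all four cost functions. All hyperparameters are placeholders."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for stage, n_epochs in enumerate(epochs):
        for _ in range(n_epochs):
            for batch in loader:
                l1, l2, l3, l4 = make_losses(model, batch)
                loss = geo_weights[0] * l1 + geo_weights[1] * l2 \
                     + geo_weights[2] * l3
                if stage == 1:          # second stage adds the texture cost
                    loss = loss + tex_weight * l4
                opt.zero_grad()
                loss.backward()         # back-propagate the cost function
                opt.step()
```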
Referring to fig. 10, a flowchart for training a face reconstruction model is provided in an embodiment of the present application.
The sample two-dimensional image is input into the initial face reconstruction model, and the initial face reconstruction model predicts a predicted three-dimensional face model and a predicted texture map. A first cost function is constructed based on the predicted three-dimensional face model and the standard three-dimensional face model obtained in the flow shown in fig. 5. The predicted three-dimensional face model is smoothed to obtain a smooth predicted three-dimensional face model, and a second cost function is constructed based on the smooth predicted three-dimensional face model and the predicted three-dimensional face model. The predicted three-dimensional face model, combined with the projection parameters given by the initial face reconstruction model, is projected into two dimensions to obtain a predicted two-dimensional image; the face key points in the predicted two-dimensional image are extracted, and a third cost function is constructed based on the face key points 2 in the predicted two-dimensional image and the face key points 1 in the sample two-dimensional image. A fourth cost function is constructed based on the predicted texture map predicted by the initial face reconstruction model and the standard texture map. The first, second, third, and fourth cost functions are weighted and summed, and the initial face reconstruction model is trained based on the resulting total cost function. The above is one complete training pass; training is iterative, and the training end condition can be that the number of iterations reaches a preset number or that the weighted sum of the first, second, third, and fourth cost functions is smaller than a preset threshold. The trained face reconstruction model is obtained after training ends.
S310, acquiring a target two-dimensional image;
S311, inputting the target two-dimensional image into a trained face reconstruction model, and outputting a target three-dimensional face model and a target texture map corresponding to the target two-dimensional image;
S312, based on the target three-dimensional face model and the target texture map, a texture three-dimensional face model containing texture information is generated.
Step S310 to step S312 refer to the detailed description in step S101 to step S103, and are not described herein.
In the embodiment of the application, an initial face reconstruction model is first created and a sample two-dimensional image is collected. Face key points in the sample two-dimensional image are then extracted, an average face three-dimensional model is acquired, and constrained reconstruction is performed on the average face three-dimensional model based on the face key points in the sample two-dimensional image to obtain a standard three-dimensional face model corresponding to the sample two-dimensional image. The UV coordinates of each vertex in the standard three-dimensional face model are acquired, and a standard texture map of a preset size is obtained from the sample two-dimensional image through bilinear interpolation based on the UV coordinates of each vertex. The sample two-dimensional image is then input into the initial face reconstruction model for prediction to obtain a predicted three-dimensional face model and a predicted texture map; the predicted three-dimensional face model is smoothed to obtain a smooth predicted three-dimensional face model, two-dimensional projection is performed on the predicted three-dimensional face model to obtain a predicted two-dimensional image, and the face key points in the predicted two-dimensional image are extracted. Four cost functions are constructed based on the predicted three-dimensional face model, the standard three-dimensional face model, the smooth predicted three-dimensional face model, the face key points in the predicted two-dimensional image, the face key points in the sample two-dimensional image, the predicted texture map, and the standard texture map, and the initial face reconstruction model is iteratively trained using the four constructed cost functions to finally obtain a trained face reconstruction model. The iterative training process covers training of the smoothing effect, the model shape accuracy, the stability of the texture map, and the detail effect, so the trained face reconstruction model has good accuracy, a prominent smoothing effect, and good stability for three-dimensional face model reconstruction and target texture map prediction. The trained face reconstruction model can then perform, on a target two-dimensional image, three-dimensional face model reconstruction and target texture map prediction with high accuracy, an obvious smoothing effect, and a high degree of detail restoration. Since the texture map is generated based on parameters and pre-designed texture templates, the texture effect of the texture map always stays within the effect range defined by the texture templates and no large texture error occurs, which improves the texture effect of the texture map and ensures its stability.
According to the embodiment of the application, the face reconstruction model is trained based on a plurality of cost functions, the target texture map with stable effect and the target three-dimensional face model can be reconstructed based on the target two-dimensional image, then the target texture map is rendered into the target three-dimensional face model, the texture three-dimensional face model with rich details and stable effect and containing texture information can be generated, and the reconstruction effect of the three-dimensional face model is improved.
Optionally, stylization is added to the training process of the face reconstruction model, so that the face reconstruction model can reconstruct a stylized three-dimensional face model and a stylized target texture map.
Referring to fig. 11, a flow chart of a three-dimensional face model reconstruction method is provided in an embodiment of the present application. As shown in fig. 11, the three-dimensional face model reconstruction method includes the following steps.
S401, collecting a two-dimensional image of a sample;
S402, performing stylization processing on the sample two-dimensional image to obtain a stylized sample two-dimensional image, and extracting face key points in the stylized sample two-dimensional image;
The stylization processing refers to converting a sample two-dimensional image into a sample two-dimensional image of a specific style, such as a sketch portrait style, a cartoon (animated) style, a canvas style, and the like.
Referring to fig. 12, an exemplary schematic diagram of a stylized process is provided in accordance with an embodiment of the present application.
As shown in FIG. 12, the sample two-dimensional image is stylized to obtain a stylized sample two-dimensional image of the cartoon as shown.
S403, acquiring an average face three-dimensional model;
S404, carrying out constraint reconstruction on the average face three-dimensional model based on face key points in the stylized sample two-dimensional image to obtain a standard three-dimensional face model corresponding to the sample two-dimensional image;
Referring to fig. 13, an exemplary schematic diagram of reconstructing a standard three-dimensional face model using face key point constraints is provided in an embodiment of the present application.
As shown in fig. 13, the average face three-dimensional model is constrained by face key points in the stylized sample two-dimensional image in the process of reconstructing the standard three-dimensional face model to obtain the standard three-dimensional face model corresponding to the stylized sample two-dimensional image.
S405, acquiring UV coordinates of each vertex in the standard three-dimensional face model, and obtaining a standard texture map with a preset size by bilinear interpolation of the stylized sample two-dimensional image based on the UV coordinates of each vertex;
S406, an initial face reconstruction model is created, and the sample two-dimensional image is input into the initial face reconstruction model for prediction to obtain a predicted three-dimensional face model and a predicted texture map;
S407, carrying out smoothing treatment on the predicted three-dimensional face model to obtain a smooth predicted three-dimensional face model;
S408, carrying out two-dimensional projection on the predicted three-dimensional face model to obtain a predicted two-dimensional image, and extracting face key points in the predicted two-dimensional image;
S409, constructing a cost function based on the predicted three-dimensional face model, the standard three-dimensional face model, the smooth predicted three-dimensional face model, face key points in the predicted two-dimensional image, face key points in the stylized sample two-dimensional image, the predicted texture map and the standard texture map;
S410, training the initial face reconstruction model based on the cost function to obtain a trained face reconstruction model;
S411, acquiring a target two-dimensional image;
S412, inputting the target two-dimensional image into a trained face reconstruction model, and outputting a stylized three-dimensional face model and a stylized texture map corresponding to the target two-dimensional image;
The face reconstruction model is generated based on cost function training, and the cost function includes a first cost function, a second cost function, and a third cost function. The first cost function is obtained based on a predicted stylized three-dimensional face model corresponding to a sample two-dimensional image and a standard stylized three-dimensional face model corresponding to the sample two-dimensional image; the second cost function is obtained based on the predicted stylized three-dimensional face model and a smooth stylized three-dimensional face model obtained after smoothing the predicted stylized three-dimensional face model; and the third cost function is obtained based on the face key points obtained after two-dimensional projection of the predicted stylized three-dimensional face model and the face key points in the stylized sample two-dimensional image corresponding to the sample two-dimensional image;
The predicted stylized three-dimensional face model is obtained by inputting the sample two-dimensional image into the created initial face reconstruction model for prediction, and the standard stylized three-dimensional face model is obtained by carrying out constraint reconstruction on the average face three-dimensional model based on the face key points in the stylized sample two-dimensional image corresponding to the sample two-dimensional image.
And S413, generating a texture three-dimensional face model containing texture information based on the stylized three-dimensional face model and the stylized texture map.
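Step S413 amounts to binding the stylized texture map to the stylized mesh through its UV coordinates. A minimal sketch that writes the result as a Wavefront OBJ/MTL pair (file names are illustrative; numpy arrays assumed):

```python
import numpy as np

def write_textured_obj(path, vertices, faces, uvs, texture_png="face_tex.png"):
    """Write mesh + UVs as an OBJ referencing the stylized texture map.

    vertices: (V, 3) float, uvs: (V, 2) in [0, 1], faces: (F, 3) 0-based int.
    """
    with open(path + ".mtl", "w") as m:
        m.write("newmtl face\nmap_Kd %s\n" % texture_png)
    with open(path + ".obj", "w") as f:
        f.write("mtllib %s.mtl\nusemtl face\n" % path)
        for v in vertices:
            f.write("v %f %f %f\n" % tuple(v))
        for t in uvs:
            f.write("vt %f %f\n" % tuple(t))
        for a, b, c in faces + 1:                     # OBJ indices are 1-based
            f.write("f %d/%d %d/%d %d/%d\n" % (a, a, b, b, c, c))
```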
By adopting the three-dimensional face model reconstruction method provided by the embodiment of the application, stylization processing is added to the training of the face reconstruction model, so that the trained face reconstruction model, which is smooth in effect, accurate and stable in texture effect, can output a stylized three-dimensional face model and a stylized texture map. A texture three-dimensional face model containing texture information can then be generated based on the stylized three-dimensional face model and the stylized texture map, which increases the interest and functionality of three-dimensional face model reconstruction.
Referring to fig. 14, a schematic structural diagram of a three-dimensional face model reconstruction device is provided in an embodiment of the present application. As shown in fig. 14, the three-dimensional face model reconstruction device 1 may be implemented as all or part of a computer device by software, hardware, or a combination of both. According to some embodiments, the three-dimensional face model reconstruction device 1 includes an image acquisition module 11, a model prediction module 12 and a target model generating module 13, which specifically include:
an image acquisition module 11 for acquiring a target two-dimensional image;
the model prediction module 12 is configured to input the target two-dimensional image into a trained face reconstruction model, and output a target three-dimensional face model corresponding to the target two-dimensional image;
a target model generating module 13, configured to generate a texture three-dimensional face model containing texture information based on the target three-dimensional face model and the target texture map.
Optionally, the model prediction module 12 is specifically configured to:
Inputting the target two-dimensional image into a trained face reconstruction model, and outputting a stylized three-dimensional face model and a stylized texture map corresponding to the target two-dimensional image;
the object model generating module 13 is specifically configured to:
And generating a texture three-dimensional face model containing texture information based on the stylized three-dimensional face model and the stylized texture map.
Optionally, as shown in fig. 15, the apparatus further includes a model training module 14.
Optionally, referring to fig. 16, a schematic structural diagram of a model training module is provided in an embodiment of the present application. As shown in fig. 16, the model training module 14 includes:
an image acquisition unit 141 for acquiring a two-dimensional image of a sample;
A standard obtaining unit 142, configured to obtain a standard three-dimensional face model and a standard texture map corresponding to the sample two-dimensional image;
The model training unit 143 is configured to create an initial face reconstruction model, train the initial face reconstruction model based on the sample two-dimensional image, the standard three-dimensional face model and the standard texture map, and obtain a trained face reconstruction model.
Optionally, referring to fig. 17, a schematic structural diagram of a standard acquiring unit is provided in an embodiment of the present application. As shown in fig. 17, the standard acquisition unit 142 includes:
A first key point extracting subunit 1421, configured to extract key points of a face in the sample two-dimensional image, and obtain an average face three-dimensional model;
a model reconstruction subunit 1422, configured to perform constraint reconstruction on the average face three-dimensional model based on the face key points in the sample two-dimensional image, so as to obtain a standard three-dimensional face model corresponding to the sample two-dimensional image;
And a texture generation subunit 1423, configured to obtain UV coordinates of each vertex in the standard three-dimensional face model, and obtain a standard texture map with a preset size by performing bilinear interpolation on the sample two-dimensional image based on the UV coordinates of each vertex.
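The bilinear interpolation performed by the texture generation subunit 1423 may, for example, sample the sample two-dimensional image at the continuous pixel positions given by the vertex UV coordinates. A minimal numpy sketch (the UV-to-pixel mapping and the preset texture size are assumptions):

```python
import numpy as np

def bilinear_sample(image, xs, ys):
    """Bilinearly sample image (H, W, 3) at float pixel coordinates (xs, ys)."""
    h, w = image.shape[:2]
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    dx = (xs - x0)[..., None]
    dy = (ys - y0)[..., None]
    c00 = image[y0, x0]; c01 = image[y0, x0 + 1]
    c10 = image[y0 + 1, x0]; c11 = image[y0 + 1, x0 + 1]
    top = c00 * (1 - dx) + c01 * dx
    bot = c10 * (1 - dx) + c11 * dx
    return top * (1 - dy) + bot * dy

# Per-vertex colors for the standard texture map, assuming UVs in [0, 1]
# with a flipped v axis (a common but not mandated convention):
# colors = bilinear_sample(img, uv[:, 0] * (w - 1), (1 - uv[:, 1]) * (h - 1))
```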
Optionally, the model reconstruction subunit 1422 is specifically configured to:
Performing constraint reconstruction on the average face three-dimensional model based on face key points in the sample two-dimensional image to obtain a constrained three-dimensional face model and a first projection parameter;
And carrying out projection transformation on the constrained three-dimensional face model based on the first projection parameters to obtain a standard three-dimensional face model.
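The first projection parameters are not tied to a specific camera model; a common choice, assumed here purely for illustration, is scaled orthographic (weak perspective) projection with scale s, rotation R and translation t:

```python
import numpy as np

def weak_perspective(vertices, s, R, t):
    """Apply scaled-orthographic projection parameters to a face mesh.

    vertices: (V, 3); s: scalar scale; R: (3, 3) rotation; t: (2,) translation.
    Returns the transformed 3D vertices and their 2D image projections.
    """
    v3d = s * vertices @ R.T          # rotate and scale in 3D
    v2d = v3d[:, :2] + t              # drop depth, shift into the image plane
    return v3d, v2d
```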
Optionally, referring to fig. 18, a schematic structural diagram of a model training unit is provided in an embodiment of the present application. As shown in fig. 18, the model training unit 143 includes:
the model prediction subunit 1431 is configured to create an initial face reconstruction model, and input the sample two-dimensional image to the initial face reconstruction model to perform prediction to obtain a predicted three-dimensional face model and a predicted texture map;
a smoothing processing subunit 1432, configured to perform smoothing processing on the predicted three-dimensional face model to obtain a smooth predicted three-dimensional face model;
A second key point extraction subunit 1433, configured to two-dimensionally project the predicted three-dimensional face model to obtain a predicted two-dimensional image, and extract a face key point in the predicted two-dimensional image;
a cost function construction subunit 1434 configured to construct a cost function based on the predicted three-dimensional face model, the standard three-dimensional face model, the smooth predicted three-dimensional face model, the face keypoints in the predicted two-dimensional image, the face keypoints in the sample two-dimensional image, the predicted texture map, and the standard texture map;
And a model training subunit 1435, configured to train the initial face reconstruction model based on the cost function, so as to obtain a trained face reconstruction model.
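As one possible implementation of the smoothing processing subunit 1432 above (an assumption; the application does not fix the filter), uniform Laplacian smoothing moves each vertex toward the centroid of its mesh neighbors:

```python
import numpy as np

def laplacian_smooth(vertices, faces, iterations=3, lam=0.5):
    """Uniform Laplacian smoothing; vertices (V, 3) float, faces (F, 3) int."""
    v = vertices.copy()
    # Build vertex adjacency from the triangle list.
    nbrs = [set() for _ in range(len(v))]
    for a, b, c in faces:
        nbrs[a].update((b, c)); nbrs[b].update((a, c)); nbrs[c].update((a, b))
    nbrs = [np.fromiter(n, dtype=int) for n in nbrs]
    for _ in range(iterations):
        centroids = np.stack([v[n].mean(axis=0) if len(n) else v[i]
                              for i, n in enumerate(nbrs)])
        v = v + lam * (centroids - v)   # pull each vertex toward its neighbors
    return v
```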
Optionally, the model prediction subunit 1431 is specifically configured to:
establishing an initial face reconstruction model, and inputting the sample two-dimensional image into the initial face reconstruction model for prediction to obtain a deformation parameter matrix, illumination parameters and texture parameters;
predicting based on the deformation parameter matrix to obtain a predicted three-dimensional face model;
carrying out illumination modeling based on the illumination parameters to obtain an illumination map;
obtaining a first texture map based on the texture parameters and a preset texture template; and
generating a predicted texture map based on the illumination map and the first texture map.
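How the illumination map and the first texture map are combined is left open by the application; the following sketch assumes a linear texture basis for the preset template and per-texel multiplicative shading (both assumptions):

```python
import numpy as np

def predict_texture(tex_template, tex_basis, tex_params, illum_map):
    """Compose the predicted texture map; image arrays are (H, W, 3) in [0, 1].

    tex_template: preset texture template (mean texture)
    tex_basis:    (K, H, W, 3) linear texture basis (an assumption)
    tex_params:   (K,) texture coefficients predicted by the network
    illum_map:    per-texel shading derived from the illumination parameters
    """
    first_tex = tex_template + np.tensordot(tex_params, tex_basis, axes=1)
    return np.clip(first_tex * illum_map, 0.0, 1.0)   # multiplicative shading
```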
Optionally, the second keypoint extraction subunit 1433 is specifically configured to:
And acquiring second projection parameters in the initial face reconstruction model, carrying out two-dimensional projection on the predicted three-dimensional face model based on the second projection parameters to obtain a predicted two-dimensional image, and extracting face key points in the predicted two-dimensional image.
Optionally, the cost function construction subunit 1434 is specifically configured to:
constructing a first cost function based on each vertex in the predicted three-dimensional face model and each vertex in the standard three-dimensional face model: $L_1 = \frac{1}{N}\sum_{i=1}^{N}\left\|V_i - V_{GT,i}\right\|_2^2$, wherein $N$ is the number of vertices, $V_i$ is the $i$th vertex in the predicted three-dimensional face model, and $V_{GT,i}$ is the $i$th vertex in the standard three-dimensional face model;
constructing a second cost function based on each vertex in the predicted three-dimensional face model and each vertex in the smooth predicted three-dimensional face model: $L_2 = \frac{1}{N}\sum_{i=1}^{N}\left\|V_i - \tilde{V}_i\right\|_2^2$, wherein $N$ is the number of vertices, $V_i$ is the $i$th vertex in the predicted three-dimensional face model, and $\tilde{V}_i$ is the $i$th vertex in the smooth predicted three-dimensional face model;
constructing a third cost function based on the face key points in the predicted two-dimensional image and the face key points in the sample two-dimensional image: $L_3 = \frac{1}{M}\sum_{i=1}^{M}\left\|lmk_i - lmk_{GT,i}\right\|_2^2$, wherein $M$ is the number of face key points, $lmk_i$ is the $i$th face key point in the predicted two-dimensional image, and $lmk_{GT,i}$ is the $i$th face key point in the sample two-dimensional image;
constructing a fourth cost function based on each pixel in the predicted texture map and each pixel in the standard texture map: $L_4 = \frac{1}{P}\sum_{i=1}^{P}\left\|Tex_i - Tex_{GT,i}\right\|_2^2$, wherein $P$ is the number of pixels, $Tex_i$ is the $i$th pixel in the predicted texture map, and $Tex_{GT,i}$ is the $i$th pixel in the standard texture map;
And carrying out weighted summation on the first cost function, the second cost function, the third cost function and the fourth cost function to obtain a cost function.
Optionally, the cost function construction subunit 1434 is further configured to obtain the cost function as the weighted sum
$L = \beta \cdot L_1 + \mu \cdot L_2 + \lambda \cdot L_3 + \omega \cdot L_4$
wherein $\beta$ is the weight corresponding to the first cost function $L_1$, $\mu$ is the weight corresponding to the second cost function $L_2$, $\lambda$ is the weight corresponding to the third cost function $L_3$, and $\omega$ is the weight corresponding to the fourth cost function $L_4$.
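Taken together, the cost function construction subunit 1434 evaluates the four terms and their weighted sum. A minimal PyTorch sketch (squared-error terms and the weight values are assumptions to be tuned):

```python
import torch

def total_cost(V, V_gt, V_smooth, lmk, lmk_gt, tex, tex_gt,
               beta=1.0, mu=0.1, lam=1.0, omega=1.0):
    """Weighted sum L = β·L1 + μ·L2 + λ·L3 + ω·L4 over one sample."""
    L1 = ((V - V_gt) ** 2).sum(dim=-1).mean()        # vertices vs. standard model
    L2 = ((V - V_smooth) ** 2).sum(dim=-1).mean()    # vertices vs. smoothed model
    L3 = ((lmk - lmk_gt) ** 2).sum(dim=-1).mean()    # projected vs. sample key points
    L4 = ((tex - tex_gt) ** 2).mean()                # predicted vs. standard texture
    return beta * L1 + mu * L2 + lam * L3 + omega * L4
```

During training (S410 / model training subunit 1435), this scalar is backpropagated through the initial face reconstruction model.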
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
According to the three-dimensional face model reconstruction method provided by the embodiment of the application, the face reconstruction model pre-trained on a plurality of cost functions can reconstruct, from the target two-dimensional image, a target three-dimensional face model and a target texture map with a stable effect. Because the target texture map is generated from predicted parameters and a pre-designed texture template, its texture effect always stays within the effect range limited by the texture template, so that no large texture effect error occurs, and both the quality and the stability of the target texture map are improved. The target texture map is rendered onto the target three-dimensional face model to generate a texture three-dimensional face model that contains texture information, is rich in detail and stable in effect, thereby improving the three-dimensional face reconstruction result. Optionally, stylization processing may be added to the training of the face reconstruction model, so that the trained face reconstruction model, which is smooth in effect and accurate, outputs a stylized three-dimensional face model and a stylized texture map, from which a texture three-dimensional face model can be generated, increasing the interest and functionality of three-dimensional face model reconstruction.
The embodiment of the present application further provides a computer storage medium. The computer storage medium may store a plurality of instructions suitable for being loaded and executed by a processor; for the specific execution process, reference may be made to the description of the embodiments shown in fig. 1 to 13, and details are not repeated herein.
The present application also provides a computer program product storing at least one instruction, the at least one instruction being loaded and executed by a processor; for the specific execution process, reference may be made to the description of the embodiments shown in fig. 1 to 13, and details are not repeated herein.
Referring to fig. 19, a block diagram of a computer device according to an exemplary embodiment of the present application is shown. The computer device of the present application may include one or more of a processor 110, a memory 120, an input device 130, an output device 140, and a bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 connects various parts of the overall computer device using various interfaces and lines, and performs various functions of the computer device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and by invoking data stored in the memory 120. Alternatively, the processor 110 may be implemented in at least one hardware form of digital signal processing (DSP), field-programmable gate array (FPGA), or programmable logic array (PLA). The processor 110 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer-readable storage medium. The memory 120 may be used to store instructions, programs, code, code sets, or instruction sets.
The input device 130 is configured to receive input instructions or data, and the input device 130 includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and the output device 140 includes, but is not limited to, a display device, a speaker, and the like. In an embodiment of the present application, the input device 130 may be a temperature sensor for acquiring an operating temperature of the computer apparatus. The output device 140 may be a speaker for outputting audio signals.
In addition, those skilled in the art will appreciate that the structure of the computer device shown in the above figures is not limiting, and that a computer device may include more or fewer components than shown, combine certain components, or use a different arrangement of components. For example, the computer device may further include components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (WiFi) module, a power supply, and a Bluetooth module, which are not described herein.
In the embodiment of the present application, the execution subject of each step may be the computer device described above. Optionally, the execution subject of each step is an operating system of the computer device. The operating system may be an Android system, an iOS system, or another operating system, which is not limited by the embodiments of the present application.
In the computer device shown in fig. 19, the processor 110 may be configured to invoke a three-dimensional face model reconstruction program stored in the memory 120 and execute to implement the three-dimensional face model reconstruction method according to the method embodiments of the present application.
It will be clear to a person skilled in the art that the solution according to the application can be implemented by means of software and/or hardware. "Unit" and "module" in this specification refer to software and/or hardware capable of performing a particular function, either alone or in combination with other components, such as field-programmable gate arrays (FPGAs), integrated circuits (ICs), and the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some service interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. That is, equivalent changes and modifications are contemplated by the teachings of the present application, which fall within the scope of the present application. Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.