CN109068081A - Video generation method, device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN109068081A
CN109068081A (application number CN201810911033.XA)
Authority
CN
China
Prior art keywords
video
user
action
evaluation information
music
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810911033.XA
Other languages
Chinese (zh)
Inventor
韩旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Microlive Vision Technology Co Ltd
Original Assignee
Beijing Microlive Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Microlive Vision Technology Co Ltd filed Critical Beijing Microlive Vision Technology Co Ltd
Priority to CN201810911033.XA priority Critical patent/CN109068081A/en
Publication of CN109068081A publication Critical patent/CN109068081A/en
Priority to PCT/CN2018/124067 priority patent/WO2020029523A1/en
Pending legal-status Critical Current

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/76 — Television signal recording (under H04N 5/00, Details of television systems)
    • H04N 21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/4312 — Generation of visual interfaces for content selection or interaction involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N 21/4334 — Recording operations (under H04N 21/433, Content storage operation)
    • H04N 21/472 — End-user interface for requesting content, additional data or services; end-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The present disclosure provides a video generation method and apparatus, an electronic device, and a storage medium. The method comprises: when a video recording trigger operation of a user is received, obtaining a video recording resource, where the video recording resource includes music and human-shaped standard action pictures corresponding to the playing nodes of the music; playing the music and collecting a user video during the playing process, displaying the corresponding human-shaped standard action picture when playback reaches each playing node; determining action evaluation information for each user action according to the matching degree between the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding human-shaped standard action picture; and generating a target video according to the video recording resource, the user video, and the action evaluation information of each user action. The scheme of the disclosure provides users with more choices of video recording modes, improves the user's sense of participation in video recording, and effectively improves the user experience.

Description

Video generation method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of terminal technologies, and in particular, to a video generation method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of science and technology and the improvement of the living standard of people, terminal devices (such as smart phones, tablet computers and the like) become an indispensable part of the life of people, and users can install terminal Application programs (APP) on the terminal devices to enrich the experience of respective terminal use.
With the rapid increase of the kinds and the number of the APPs, the demands of users on the APPs are increasing. In order to better meet the requirements of users, the existing APP also starts to pay more and more attention to the interaction experience of users, and a plurality of social application platforms are produced accordingly. Through the platforms, users can record and upload videos and watch various types of videos, but the existing video recording mode is single, the entertainment requirements of the users cannot be met, and the participation sense of the users is low.
Disclosure of Invention
The present disclosure aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the disclosure is as follows:
in a first aspect, the present disclosure provides a video generation method, including:
when a video recording triggering operation of a user is received, video recording resources are obtained, and the video recording resources comprise music and human-shaped standard action pictures corresponding to all playing nodes of the music;
playing the music, collecting a user video during the playing process, and displaying the corresponding human-shaped standard action picture when playback reaches each playing node;
determining action evaluation information of each user action according to the matching degree of the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
and generating a target video according to the video recording resource, the user video and the action evaluation information of each user action.
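The four claimed steps form a simple pipeline: obtain a resource, collect the user video, evaluate each action against the standard one, and assemble the result. The sketch below is a minimal illustration, not the patented implementation; the `RecordingResource` type, the `match_fn` similarity callback, and all names are hypothetical stand-ins for the music, playing nodes, and matching logic the claim describes.

```python
from dataclasses import dataclass

@dataclass
class RecordingResource:
    music: str           # hypothetical id/path of the music track
    node_pictures: dict  # playing-node time (seconds) -> standard-action picture id

def evaluate_actions(user_actions, resource, match_fn):
    """Score the user's action at each playing node against the standard action."""
    return {t: match_fn(user_actions[t], pic)
            for t, pic in resource.node_pictures.items()}

def generate_target_video(resource, user_video, evaluations):
    """Bundle music, user video, and per-node evaluations (stub for the claim's last step)."""
    return {"music": resource.music, "video": user_video, "evaluations": evaluations}
```

Here `match_fn` stands in for whatever pose-similarity measure an implementation would use; the claim itself does not fix one.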
In an optional embodiment, the displayed corresponding standard action picture is a humanoid standard action picture with a first transparency;
generating a target video according to the video recording resource, the user video and the action evaluation information of each user action comprises the following steps:
and generating a target video according to the music, the human-shaped standard action pictures which correspond to the playing nodes of the music and have the second transparency, the user videos and the action evaluation information of each user action, wherein the second transparency is greater than the first transparency.
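The two transparencies can be pictured as alpha blending: during recording the overlay uses the lower (first) transparency so the user sees it clearly, while the generated target video uses the higher (second) transparency so the overlay is fainter. A minimal single-channel sketch; the transparency-to-alpha mapping is an assumption of this illustration, not specified by the patent.

```python
def blend_pixel(background, overlay, transparency):
    """Alpha-blend one 8-bit channel: transparency 0.0 means a fully opaque
    overlay, 1.0 means the overlay is invisible and only the background remains."""
    alpha = 1.0 - transparency
    return round(alpha * overlay + (1.0 - alpha) * background)
```

With a second transparency greater than the first, the same overlay pixel contributes less to the target video than it did on the shooting interface.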
In an alternative embodiment, generating the target video according to the video recording resource, the user video and the action evaluation information of each user action comprises:
adding a humanoid standard action picture in a video recording resource into a corresponding video frame image in a user video;
and generating a target video according to the music, the user video added with the human-shaped standard motion picture and the motion evaluation information of the user motion.
In an alternative embodiment, generating the target video according to the video recording resource, the user video and the action evaluation information of each user action comprises:
adding the action evaluation information of each user action into a corresponding video frame image in the user video;
and generating a target video according to the video recording resources and the user video added with the action evaluation information.
In an optional implementation, after determining the action evaluation information of each user action, the method further includes:
determining comprehensive evaluation information of the user video according to the action evaluation information of each user action;
generating a target video according to the video recording resources and the user video added with the action evaluation information, wherein the method comprises the following steps:
and generating a target video according to the video recording resources, the user video added with the action evaluation information and the comprehensive evaluation information.
In an alternative embodiment, the method further comprises:
and after the music playing is finished, displaying the comprehensive evaluation information.
In an optional implementation manner, the video recording resource further includes special effect information corresponding to the motion evaluation information, and the special effect information includes an animation special effect and/or a sound effect special effect;
after the action evaluation information of each user action is determined, the method further comprises the following steps:
and displaying the action evaluation information of each user action and/or the special effect information corresponding to the action evaluation information of each user action to a display interface of the corresponding humanoid standard action picture.
In an alternative embodiment, adding the motion evaluation information of each user motion to the corresponding video frame image in the user video comprises:
adding the action evaluation information of each user action and the special effect information corresponding to the action evaluation information of each user action into a corresponding video frame image in the user video;
generating a target video according to the video recording resources and the user video added with the action evaluation information, wherein the method comprises the following steps:
and generating a target video according to the video recording resource and the user video added with the action evaluation information and the special effect information.
In an alternative embodiment, before playing the music, the method further comprises:
it is determined that the user is within the video capture range.
In an optional implementation, after the target video is generated, the method further includes:
receiving target video release operation of a user;
according to the publishing operation, the target video is published to a video publishing platform; or,
and when a rephoto triggering operation of the user is received, regenerating the target video based on the video recording resource.
In an optional implementation manner, when a video recording trigger operation of a user is received, acquiring a video recording resource includes:
when receiving a video recording triggering operation of a user, controlling to display a music selection interface;
acquiring music selection operation of a user through a music selection interface;
and acquiring video recording resources according to the music selection operation.
In an optional implementation manner, when a video recording trigger operation of a user is received, acquiring a video recording resource includes:
when a video recording triggering operation of a user is received through the video playing interface, video recording resources corresponding to a video currently played through the video playing interface are obtained.
In a second aspect, the present disclosure provides a video generating apparatus, comprising:
the recording resource acquisition module is used for acquiring video recording resources when receiving video recording triggering operation of a user, wherein the video recording resources comprise music and human-shaped standard action pictures corresponding to each playing node of the music;
the video acquisition module is used for playing the music, collecting a user video during the playing process, and displaying the corresponding human-shaped standard action picture when playback reaches each playing node;
the evaluation information determining module is used for determining the action evaluation information of each user action according to the matching degree of the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
and the target video generation module is used for generating a target video according to the video recording resource, the user videos and the action evaluation information of each user action.
In an optional implementation manner, the video capture module is specifically configured to, when displaying the corresponding standard motion picture: displaying a human-shaped standard action picture with the first transparency;
the target video generation module is specifically configured to:
and generating a target video according to the music, the human-shaped standard action pictures corresponding to the playing nodes of the music and having the second transparency, the user videos and the action evaluation information of each user action, wherein the second transparency is greater than the first transparency.
In an optional implementation manner, the target video generation module is specifically configured to:
adding a humanoid standard action picture in a video recording resource into a corresponding video frame image in a user video;
and generating a target video according to the music, the user video added with the human-shaped standard motion picture and the motion evaluation information of the user motion.
In an optional implementation manner, the target video generation module is specifically configured to:
adding the action evaluation information of each user action into a corresponding video frame image in the user video;
and generating a target video according to the video recording resources and the user video added with the action evaluation information.
In an alternative embodiment, the evaluation information determination module is further configured to:
after the action evaluation information of each user action is determined, determining comprehensive evaluation information of the user video according to the action evaluation information of each user action;
the target video generation module is specifically configured to, when generating a target video according to the video recording resource and the user video to which the action evaluation information is added:
and generating a target video according to the video recording resources, the user video added with the action evaluation information and the comprehensive evaluation information.
In an alternative embodiment, the apparatus further comprises:
and the first display module is used for displaying the comprehensive evaluation information after the comprehensive evaluation information of the user video is determined and the music playing is finished.
In an optional implementation manner, the video recording resource further includes special effect information corresponding to the motion evaluation information, and the special effect information includes an animation special effect and/or a sound effect special effect; the device also includes:
and the second display module is used for displaying the action evaluation information of each user action and/or special effect information corresponding to the action evaluation information of each user action to a display interface of a corresponding humanoid standard action picture after the action evaluation information of each user action is determined.
In an alternative embodiment, the target video generation module, when adding the motion evaluation information of each user motion to the corresponding video frame image in the user video, is specifically configured to:
adding the action evaluation information of each user action and the special effect information corresponding to the action evaluation information of each user action into a corresponding video frame image in the user video;
the target video generation module is specifically configured to, when generating a target video according to the video recording resource and the user video to which the action evaluation information is added:
and generating a target video according to the video recording resource and the user video added with the action evaluation information and the special effect information.
In an optional embodiment, the video capture module is further configured to:
before playing music, it is determined that the user is within the video capturing range.
In an alternative embodiment, the apparatus further comprises:
and the target video publishing module is used for receiving the target video publishing operation of the user after the target video is generated, and publishing the target video to the video publishing platform according to the publishing operation.
In an alternative embodiment, the apparatus further comprises:
and the rephotograph module is used for regenerating the target video based on the video recording resource after the target video is generated and when the rephotograph triggering operation of the user is received.
In an optional implementation manner, the recording resource obtaining module is specifically configured to:
when receiving a video recording triggering operation of a user, controlling to display a music selection interface;
acquiring music selection operation of a user through a music selection interface;
and acquiring video recording resources according to the music selection operation.
In an optional implementation manner, the recording resource obtaining module is specifically configured to:
when a video recording triggering operation of a user is received through the video playing interface, video recording resources corresponding to a video currently played through the video playing interface are obtained.
In a third aspect, the present disclosure provides an electronic device comprising a memory and a processor;
the memory has stored therein computer program instructions;
a processor for reading computer program instructions to perform the video generation method as shown in the first aspect of the present disclosure or any one of the optional embodiments of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having computer program instructions stored therein, which when executed by a processor, implement the video generation method shown in the first aspect of the present disclosure or any one of the optional embodiments of the first aspect.
The technical solutions provided by the present disclosure have the following beneficial effects. In the video generation method and apparatus, electronic device, and storage medium of the disclosure, a user video is collected while the music plays, and a human-shaped standard action picture is displayed to the user at each playing node, so that the user can perform the corresponding action according to the picture; action evaluation information is obtained by comparing the user action with the standard action, and a target video is generated according to the video recording resource, the user video, and the action evaluation information. With this scheme, during video recording the user performs actions following the standard actions in the displayed pictures and completes the recording of a video containing dance actions, which effectively improves the user's sense of participation and experience, provides richer video recording modes, and better meets users' needs.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings used in the description of the embodiments of the present disclosure will be briefly described below.
Fig. 1 is a schematic flow chart of a video generation method provided in an embodiment of the present disclosure;
fig. 2a is a schematic diagram of an interface for receiving a video recording trigger operation according to an example of the present disclosure;
fig. 2b is a schematic diagram of an interface for receiving a video recording trigger operation according to another example of the present disclosure;
FIG. 3a is a schematic diagram illustrating a manner of displaying a standard human-shaped motion picture according to an example of the disclosure;
FIG. 3b is a schematic diagram illustrating a manner of displaying a standard human-shaped motion picture according to another example of the disclosure;
FIG. 4 is a schematic view of a music selection interface in an example of the present disclosure;
fig. 5 is a schematic illustration showing a video frame image in a target video in an example of the present disclosure;
FIG. 6 is a schematic flow chart diagram of a video generation method in one example of the present disclosure;
fig. 7 is a schematic structural diagram of a video generation apparatus provided in an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of a terminal device provided in an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, intended only to illustrate the present disclosure, and are not to be construed as limiting the present disclosure.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any elements and all combinations of one or more of the associated listed items.
The following describes the technical solutions of the present disclosure and how to solve the above technical problems in detail with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present disclosure will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of a video generation method provided in an embodiment of the present disclosure. As shown in Fig. 1, the method may be executed by a terminal device and mainly includes:
step S110: when a video recording triggering operation of a user is received, video recording resources are obtained, and the video recording resources comprise music and human-shaped standard action pictures corresponding to all playing nodes of the music;
it should be noted that the specific form of the video recording triggering operation of the user is not limited, and the video recording triggering operation may be configured in the corresponding application program according to needs, and specifically may include, but is not limited to, a triggering action of a specified position on a user interface of the application program, a video recording voice instruction of the user, and the like.
For example, in one possible implementation manner, a video recording key may be set on a user interface of an application program in the terminal device, for example, a virtual key, named "dance video shooting" or "personalized video shooting", displayed on the application user interface, in an example shown in fig. 2a, when the user triggers a key region corresponding to "dance video shooting", the terminal device receives a video recording triggering operation of the user.
In another possible implementation manner, a search key or a voice key may be set on a user interface of an application program in the terminal device, and the user may interact with the device by searching or by triggering the voice button. In the example shown in Fig. 2b, after the user enters a keyword such as "dance video recording" or "personalized video recording" in the search area of the application interface (the keywords can be configured as needed), clicking the search button constitutes the video recording trigger operation. Alternatively, the user may click the voice button and speak a video recording instruction, such as "record a personalized video" or "record a dance video"; in this case, the user's voice instruction is the video recording trigger operation.
After receiving the video recording trigger operation of the user, the terminal device obtains the video recording resources required for recording. The playing nodes of the music and the human-shaped standard action picture corresponding to each playing node can be pre-configured, with each playing node corresponding to one action picture. In practical applications, the correspondence between human-shaped standard action pictures and playing nodes can be configured according to the rhythm or other musical characteristics of each piece of music.
The human-shaped standard motion picture is a picture containing a virtual character form, the virtual character form has a pre-configured standard motion, and the standard motion is a motion required to be completed by a user.
Step S120: playing music, collecting a user video in the playing process, and displaying a corresponding human-shaped standard action picture when the user video is played to each playing node;
when music begins to be played, a shooting interface is started by starting a camera of the terminal equipment, shooting is started, and user videos are collected. When the pictures are played to a playing node, the human-shaped standard action pictures corresponding to the playing node are displayed on a shooting interface, so that a user can make dance actions according to the standard actions in the pictures to acquire images of video frames with the dance actions. As shown in fig. 3a, after music is played, the camera is controlled to be turned on, a user image is started to be shot, a user video is recorded, when the music is played to a music node, a human-shaped standard motion picture P is displayed on a shooting interface, and the user needs to complete a corresponding dance motion according to the motion in the picture.
In practical applications, the user can also pause the music as needed; when the music is paused, the video recording can be configured either to pause or to continue.
The specific mode for displaying the human-shaped standard motion picture can be configured according to actual needs. For example, in an alternative embodiment, the picture may be fixedly displayed at a pre-configured position in the user interface, or the picture may be controlled to move in the shooting interface according to a preset movement track, for example, the picture P is controlled to move from the bottom of the terminal device to a designated position in the shooting interface according to a preset track and then disappear.
In practical applications, a user needs a certain reaction time between seeing the human-shaped standard action picture and performing the corresponding action, so the picture corresponding to a playing node can be displayed a preset time before the node's actual playing time. The preset time may be configured according to actual needs: for example, it may be set to 0.5 seconds, so that each action picture is displayed 0.5 seconds before its playing node; it may be set to the average reaction time of the human brain; or it may be determined statistically, by measuring how long different users take to react after seeing the picture. It is understood that the display duration of the human-shaped standard action picture can likewise be configured as required.
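The lead-time idea can be expressed directly: each picture's display window is shifted earlier than its playing node by a configurable reaction allowance. The 0.5-second default, the fixed display duration, and the clamping at the start of the track are assumptions of this sketch.

```python
def display_schedule(node_times, lead=0.5, duration=1.5):
    """Return (show_at, hide_at) pairs: each humanoid standard-action picture
    is shown `lead` seconds before its playing node and stays visible for
    `duration` seconds; a node near the start of the track is clamped to 0."""
    schedule = []
    for t in node_times:
        show_at = max(0.0, t - lead)
        schedule.append((show_at, show_at + duration))
    return schedule
```

With this schedule, overlapping windows of consecutive nodes naturally produce the simultaneous display of several pictures described below for Fig. 3b.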
In practical applications, depending on the configured display modes, display durations and other factors, different human-shaped standard action pictures corresponding to different playing nodes may be displayed on the same interface at the same time. In the example shown in Fig. 3b, a human-shaped standard action picture moves from the bottom of the interface to the top along a preset trajectory and then disappears; before the picture near the top of the interface disappears, the human-shaped standard action picture corresponding to the next playing node is already shown, giving the user sufficient time to perform the action.
In addition, in practical applications, two or more sets of human-shaped standard action picture groups can be configured for different users, for example male and female users. When a user registers an application account or configures the application, the user's gender or other relevant information can be obtained so that the displayed pictures better meet the user's preferences, or picture-type options can be provided to the user when a video is recorded. For example, a female user can be provided with human-shaped standard action pictures in which the figure wears a skirt, and the like.
Step S130: determining action evaluation information of each user action according to the matching degree of the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
the matching degree may be a similarity between the user action and the standard action. The specific form of the action evaluation information may be configured as required; it may be, for example, an action score, such as a specific value from 0 to 100, or an evaluation grade, such as one of bad, average, good and very good. In an alternative embodiment, the action evaluation information may be configured as perfect when the similarity is between 95% and 100%, very good when the similarity is between 90% and 95%, good when the similarity is between 80% and 90%, OK when the similarity is between 70% and 80%, and miss when the similarity is lower than 70%.
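A minimal sketch of this threshold mapping (the cut-off values are the ones given in the alternative embodiment above; the function name `grade` is hypothetical):

```python
def grade(similarity):
    """Map a user-action/standard-action similarity (0.0-1.0) to an
    evaluation grade using the thresholds described above."""
    if similarity >= 0.95:
        return "perfect"
    if similarity >= 0.90:
        return "very good"
    if similarity >= 0.80:
        return "good"
    if similarity >= 0.70:
        return "OK"
    return "miss"
```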
In practical application, depending on how long the user holds an action, the video frame image corresponding to a playing node in the collected user video may be a single frame or multiple frames. For each playing node, the action evaluation information of the user action can be determined as follows: first obtain the evaluation information for any one frame, or for each frame, corresponding to the node according to the matching degree of the user action in that frame; then determine the action evaluation information of the user action at the node from the per-frame evaluation information. For example, the action evaluation information of the node may be obtained by combining the per-frame evaluation information, or the best evaluation among all frames corresponding to the node may be used as the action evaluation information of the node.
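One of the aggregation strategies described above, taking the best per-frame score as the node-level score, can be sketched as follows (a hypothetical illustration; the disclosure also allows combining the per-frame results instead):

```python
# Hypothetical sketch: a playing node may correspond to several frames;
# the node-level evaluation here is simply the best per-frame score.

def node_score(frame_scores):
    """Aggregate per-frame action scores (0-100) for one playing node
    by taking the best score among the frames."""
    return max(frame_scores)

node_score([62, 88, 75])  # the best frame wins
```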
The recognition of the user action, and the specific manner of determining the matching degree between the user action and the standard action, may be implemented with existing techniques. For example, the user action in an image may be recognized based on the depth information of the image or the joint-point information of the human body, and the matching degree may be determined from the key-point information of the user action and that of the standard action, or obtained by a neural network trained for this purpose, which is not described in detail herein.
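One common way to compare two poses from key-point information (an illustrative sketch, not the specific method of the disclosure) is the cosine similarity of the flattened key-point coordinate vectors:

```python
import math

def pose_similarity(user_pts, std_pts):
    """Cosine similarity between two poses, each given as a list of
    (x, y) key points in the same joint order. For non-negative
    coordinates the result lies in [0, 1]; higher means a closer match."""
    u = [c for p in user_pts for c in p]   # flatten to a coordinate vector
    s = [c for p in std_pts for c in p]
    dot = sum(a * b for a, b in zip(u, s))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in s))
    return dot / norm if norm else 0.0

# Identical poses give similarity 1.0.
```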
Step S140: and generating a target video according to the video recording resource, the user video and the action evaluation information of each user action.
According to the video generation method, the user video is collected while the music plays, and the humanoid standard action pictures are displayed so that the user can make the corresponding actions; action evaluation information is obtained by comparing the user actions with the standard actions, and the target video is generated from the video recording resource, the user video and the action evaluation information. This scheme enables recording of a video containing the user's dance actions, provides richer video recording modes, effectively improves the user's sense of participation and experience, and better meets user needs. In addition, by generating action evaluation information for each user action, the user can know from the evaluation information whether each action meets the standard, which further improves the user experience.
In an optional embodiment of the present disclosure, when a video recording trigger operation of a user is received, acquiring a video recording resource may include:
when receiving a video recording triggering operation of a user, controlling to display a music selection interface;
acquiring music selection operation of a user through a music selection interface;
and acquiring video recording resources according to the music selection operation.
In practical application, when a video recording trigger operation of the user is received on a user interface of the application, a music selection interface can be displayed for the user; the user performs a music selection operation through this interface, the operation indicating the music selected by the user, so that the application can obtain the corresponding video recording resource according to the user's selection. Through this scheme, the user can select music according to personal preference, which further improves the user experience.
The music selection interface is a user interface through which the user selects music, and its specific form can be configured as needed. For example, in an alternative embodiment, the names of all selectable music may be displayed in a list, and the user may click or otherwise select the name of a piece of music in the list to complete the music selection operation; music types can also be displayed in the interface, and after the user selects a type, the names of all music of that type are shown for the user to choose from. In another alternative embodiment, a music search option may be provided in the music selection interface, and according to a search instruction of the user (a search keyword, a voice search instruction, etc.), the corresponding music search results are presented for selection.
In an example shown in fig. 4, when a video recording trigger operation of the user is received, music names (e.g., music 1, music 2, etc. shown in the figure) may be presented on the music selection interface in the form of a music list, and the user may select music from the list according to preference, further improving the user experience.
In the embodiment of the present disclosure, the specific display content of the music selection interface (such as the song name list) may be content that was previously obtained from the server and stored locally by the terminal device, or content obtained from the server after the video recording trigger operation is received. Similarly, the video recording resource may be a resource that was previously obtained from the server and stored locally, or one obtained from the server after the music selection operation is received.
In practical application, in order to improve responsiveness to user operations and allow the user to record videos offline, thereby improving the user experience, the display content of the music selection interface and the video recording resources may preferably be resources that were obtained from the server in advance and stored locally.
In an optional embodiment of the present disclosure, when a video recording trigger operation of a user is received, acquiring a video recording resource may include:
when a video recording triggering operation of a user is received through the video playing interface, video recording resources corresponding to a video currently played through the video playing interface are obtained.
In practical application, if a user performs a video recording triggering operation when playing a certain video or entering a playing interface of a certain video, it indicates that the user is likely to want to perform video recording with music corresponding to the current video, and therefore, the user can directly perform video recording based on the video recording resource corresponding to the current video, so that the user can enter video recording quickly.
It can be understood that, when the video recording triggering operation is an operation received through the video playing interface, the video recording resource may also be determined by using the above-described manner of displaying the music selection interface.
In an optional embodiment of the present disclosure, before playing the music, the method may further include:
it is determined that the user is within the video capture range.
In order to ensure the effect of the target video, before the music starts playing, it may first be determined whether the user is within the video shooting range, and shooting is started only when the user is determined to be within range, so that the user appears in the video frame images. When the user is not within the shooting range, a voice or text prompt can be given so that the user moves into the shooting range.
In practical applications, the music may be played automatically once the video recording resource is determined, or only after the resource is determined and a trigger operation of the user to start recording is received. In either mode, before playback actually starts, it can be verified that the user is within the video shooting range, so that the video effect of the target video is ensured and user satisfaction is improved.
In an optional embodiment of the present disclosure, the displayed corresponding standard action picture may be a humanoid standard action picture with a transparency of a first transparency;
generating the target video according to the video recording resource, the user video, and the action evaluation information of each user action may include:
and generating a target video according to the music in the video recording resource, the human-shaped standard action picture which corresponds to each playing node of the music and has the second transparency, the user video and the action evaluation information of each user action.
In practical applications, the first transparency may be selected to be zero, i.e. completely opaque, and the second transparency may be set to a value greater than zero, e.g. may be set to 50%.
The humanoid standard action picture is a picture that guides the user in completing the dance actions; the lower its transparency (for example, when it is completely opaque), the more clearly the user can see the standard action, giving a better guiding effect. In the generated target video, however, if the transparency of the picture is too low, the picture is likely to occlude the user, affecting the effect of the target video and reducing user satisfaction.
In an optional embodiment, the second transparency may be set to less than 100%, so that in the target video the user can still compare the completed actions with the humanoid standard action picture and know whether each action was standard, further improving the user experience.
In one example shown in fig. 5, one video frame image of the target video is shown, containing a user action H, a humanoid standard action picture P, special effect information (described in detail later), and so on. The picture P shown in fig. 5 is the same picture as the picture P shown during video recording in fig. 3a; the transparency of the picture in fig. 3a is the first transparency, and the transparency in fig. 5 is the second transparency. As can be seen from the two figures, the second transparency is higher than the first: the lower transparency in fig. 3a shows the user a clearer indication of the action during recording, while the higher transparency in fig. 5 effectively prevents the picture from occluding the user image. It can be understood that fig. 5 is only an example, and the effect in practical application is more obvious, thereby improving the user experience.
It should be noted that the humanoid standard action picture displayed during music playback and the humanoid standard action picture on which the target video is based both correspond to the humanoid standard action picture in the video recording resource; the only difference is that, when the picture is displayed and when the target video is generated from it, the transparency of the picture may be adjusted in order to better meet actual requirements and improve the user experience.
It can be understood that, in practical application, if the transparency of the humanoid standard action picture in the configured video recording resource is already the first transparency, no adjustment is needed at display time, and the transparency needs to be adjusted only when the target video is generated. If the configured transparency is the second transparency, then after the resource is obtained the transparency must be adjusted to the first transparency before display, and no adjustment is needed when the target video is generated. If the configured transparency is neither the first nor the second transparency, the transparency must be adjusted to the first transparency before display, and adjusted again when the target video is generated.
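The three cases above reduce to a simple decision per configured resource; a hypothetical sketch (the values 0.0 and 0.5 follow the example transparencies given earlier, and the function name is illustrative):

```python
# Hypothetical sketch: decide which transparency adjustments are needed,
# given the transparency configured in the video recording resource.

FIRST = 0.0    # fully opaque, used while guiding the user during recording
SECOND = 0.5   # semi-transparent, used in the generated target video

def transparency_plan(configured):
    """Return (adjust_for_display, adjust_for_target): whether the picture
    transparency must be changed before display and before generating
    the target video, respectively."""
    adjust_for_display = (configured != FIRST)
    adjust_for_target = (configured != SECOND)
    return adjust_for_display, adjust_for_target
```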
In an optional embodiment of the present disclosure, generating the target video according to the video recording resource, the user video, and the action evaluation information of each user action includes:
adding a humanoid standard action picture in a video recording resource into a corresponding video frame image in a user video;
and generating a target video according to the music, the user video added with the human-shaped standard motion picture and the motion evaluation information of the user motion.
In this way, the humanoid standard action pictures in the video recording resource are added to the video frames collected while the user performed the same actions, so that when the target video plays, the user can see both the standard action and the user's own action in the same video frame image, understand how well each action was completed, and enjoy a better experience. In the example shown in fig. 5, after obtaining the target video the user can play it; on the target video playing interface, when the music reaches a playing node, the user simultaneously sees the standard action in the standard action picture P and the action H the user completed in the corresponding video frame of the user video.
In an optional embodiment of the present disclosure, generating the target video according to the video recording resource, the user video, and the action evaluation information of each user action includes:
adding the action evaluation information of each user action into a corresponding video frame image in the user video;
and generating a target video according to the video recording resources and the user video added with the action evaluation information.
Through this scheme, the video frame image corresponding to each playing node in the generated target video carries the corresponding action evaluation information, so that by playing the target video the user can see the effect of each dance action, knowing which actions were completed well and which need improvement.
As can be seen from the foregoing description, the video frame image corresponding to a playing node may consist of multiple frames. When the action evaluation information of a user action is added to the corresponding video frame image, it may be added to any one of the corresponding frames, or to each of them; and when the evaluation information was obtained from one particular frame among the multiple frames, it may also be added to that frame.
In an optional embodiment of the present disclosure, the video recording resource may further include special effect information corresponding to the action evaluation information, where the special effect information includes an animation special effect and/or a sound effect special effect; after the determining the action evaluation information of each user action, the method may further include:
and displaying the action evaluation information of each user action and/or the special effect information corresponding to the action evaluation information of each user action to a display interface of the corresponding humanoid standard action picture.
After the action evaluation information of each user action is determined, it can be displayed on the display interface of the corresponding humanoid standard action picture, and the corresponding special effect information can also be displayed for the user according to the evaluation information, so that the user knows from the evaluation information and/or the special effect how well the action was completed.
The specific form of the special effect may be configured as required, for example a flower effect, an animation effect, a sound effect, and so on, with different action evaluation information corresponding to different special effect information. As shown in fig. 3a, after the action evaluation information (e.g., good) of the current user action is determined from the matching degree between the standard action in the humanoid standard action picture P and the action the user made according to the picture, the evaluation information "good" and the animation effect shown in fig. 3a may be displayed on the display interface of picture P, allowing the user to better understand the completion of the action; in addition, the sound effect "good" may be played at the same time.
In an optional embodiment of the present disclosure, adding the motion evaluation information of each user motion to a corresponding video frame image in the user video may include:
adding the action evaluation information of each user action and the special effect information corresponding to the action evaluation information of each user action into a corresponding video frame image in the user video;
correspondingly, generating the target video according to the video recording resource and the user video added with the action evaluation information may include:
and generating a target video according to the video recording resource and the user video added with the action evaluation information and the special effect information.
By adopting the scheme, the user can see the action evaluation information and the special effect information in the generated target video, the satisfaction degree of the user can be effectively improved, and the content of the target video is enriched.
In an optional embodiment of the present disclosure, after determining the action evaluation information of each user action, the method may further include:
determining comprehensive evaluation information of the user video according to the action evaluation information of all user actions;
generating a target video according to the video recording resources and the user video added with the action evaluation information, wherein the method comprises the following steps:
and generating a target video according to the video recording resources, the user video added with the action evaluation information and the comprehensive evaluation information.
Accordingly, in an optional embodiment of the present disclosure, after determining the comprehensive evaluation information of the user video, the method may further include:
and after the music playing is finished, displaying the comprehensive evaluation information.
After the music finishes playing, displaying the comprehensive evaluation information of the user video lets the user know how well all the standard actions were completed during the recording.
By adding the comprehensive evaluation information into the target video, a user can obtain the target video containing the evaluation information of each action and the comprehensive evaluation information at the same time, and the user can know the completion condition and the comprehensive completion condition of each action by playing the target video.
The form of the comprehensive evaluation information may be configured as needed; it may be, for example, a comprehensive score or a comprehensive evaluation grade. In an alternative embodiment, the action evaluation information of each user action may be an action score, and the comprehensive evaluation information may be calculated as a weighted average of the action scores.
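The weighted-average strategy mentioned above can be sketched as follows (the weights are illustrative, not part of the disclosure; e.g., harder actions could count more):

```python
# Hypothetical sketch of the weighted-average comprehensive score:
# per-action scores (0-100) combined with optional per-node weights.

def total_score(scores, weights=None):
    """Weighted average of the per-action scores; equal weights
    when none are given."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

total_score([80, 90, 100], weights=[1, 1, 2])  # the last action counts double
```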
In practical application, special effect information can be configured for the comprehensive evaluation information, different comprehensive evaluation information corresponds to different special effect information, and when the comprehensive evaluation information is determined, the corresponding special effect information is displayed for a user. It is to be understood that the special effect information corresponding to the comprehensive evaluation information may be configured in the same manner as the special effect information corresponding to each user action, or may be configured in a different manner.
In an optional embodiment of the present disclosure, after generating the target video, the method may further include:
receiving target video release operation of a user;
and according to the publishing operation, publishing the target video to a video publishing platform.
For example, for a short-video application such as Douyin, the video publishing platform may be the application's own video publishing platform, a third-party publishing platform, or another application; for instance, the user may share the target video with others through other applications.
It can be understood that if the platform is that of the application itself, the user may only need to perform a publishing trigger operation, such as clicking a target-video publishing button; if the platform is another publishing platform or application, then after the publishing trigger operation, platform options and/or application options may be displayed for the user, who selects the specific platform or application to publish to.
In an optional embodiment of the present disclosure, after the generating the target video, the method further includes:
and when a re-shoot trigger operation of the user is received, regenerating the target video based on the video recording resource.
After the target video is generated, a re-shoot option, such as a re-shoot button, can be provided on the user interface; through this option the user can record the video again and regenerate the target video based on the same video recording resource.
In an optional implementation, the re-shoot option may be displayed on the comprehensive evaluation information display interface, so that the user can decide whether to re-record according to the comprehensive evaluation information.
In another optional embodiment, a target video playing option may also be provided on the user interface; the user can trigger it to play the target video and decide whether to re-record by watching it, and a re-shoot option can be provided on the playback-completed page, which the user triggers if re-recording is needed.
It should be understood that the above two embodiments are only examples and do not limit the manner in which the user's re-shoot trigger operation may be received.
It should be noted that the user interfaces referred to in the embodiments of the present disclosure are all display interfaces on the application program, and the user interfaces can receive operations of the user. For the user interfaces corresponding to different operations, in practical application, the same user interface may be configured as required, or different user interfaces may be configured.
The video generation method provided in the embodiments of the present disclosure is further described below with reference to a specific example. The target video generated by the scheme of the embodiments is a video containing user actions; in this example, for vividness, the target video is called a dance video, and the video recording resource needed to generate it is called a dance video recording resource. The action evaluation information of a user action in this example is an action score ranging from 0 to 100; the higher the matching degree between the user action and the standard action, the higher the score.
Fig. 6 is a schematic flow chart of the video generation method in this example, which may be mainly divided into three main parts, i.e., dance video recording resource production, dance video recording resource acquisition, and dance video generation.
Dance video recording resource production: this part is the preparation phase of the embodiments of the present disclosure, in which the video recording resources needed to generate dance videos are produced. As shown in fig. 6, the dance video recording resources may be configured according to the requirements of the practical application and may include background music (the music a user may select when recording a video), sound effects (i.e., music special effects), animation special effects, resource pictures (i.e., humanoid standard action pictures), and so on. For each piece of background music, corresponding resource pictures are configured, together with the correspondence between each resource picture and a playing node, i.e., which resource picture is displayed at which point in the music; the correspondence between the different special effect information and action scores must also be configured, i.e., which special effect is played for which score or score range. After the dance video recording resources are produced, they are uploaded to a server.
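One way such a resource and its score-to-effect correspondence could be structured is sketched below; all field names and values are hypothetical illustrations, not taken from the patent:

```python
# Hypothetical sketch of one dance video recording resource: background
# music, node -> resource picture mapping, and score range -> effect mapping.

resource = {
    "music": "song_01.mp3",
    "nodes": [                      # playing node -> humanoid action picture
        {"time": 4.0, "picture": "pose_arms_up.png"},
        {"time": 8.5, "picture": "pose_kick.png"},
    ],
    "effects": [                    # score range -> special effect
        {"min_score": 90, "animation": "stars.webm", "sound": "perfect.wav"},
        {"min_score": 70, "animation": "sparkle.webm", "sound": "good.wav"},
    ],
}

def effect_for(score, effects):
    """Pick the special effect whose score range the score falls into,
    checking the highest threshold first; None if no range matches."""
    for e in sorted(effects, key=lambda e: -e["min_score"]):
        if score >= e["min_score"]:
            return e
    return None
```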
Obtaining dance video recording resources: different users can send dance video recording resource acquisition requests to the server through application programs installed on respective terminal equipment, namely clients, and the server sends dance video recording resources to the clients after receiving the requests.
Generation of a dance video: when the client receives a dance video recording trigger operation of the user, for example when the user clicks a "record dance video" button on a user interface of the application, the client can display the dance video recording resources downloaded from the server to the user in the form of song names (i.e., music names); the user selects a song to enter recording mode, i.e., the camera is started and the video recording page is entered.
Before the music selected by the user starts playing, it can first be determined whether the user is standing in the shot, i.e., within the shooting range. Specifically, images can be captured continuously and checked for the presence of the user; if the user is absent, a prompt can ask the user to move into the shooting range, and once the user is within range, the music starts playing and recording of the user video begins.
While the music plays, each time a playing node is reached, the humanoid standard action picture corresponding to that node is displayed to the user, and an action score is obtained from the matching degree between the displayed action and the action the user makes in the corresponding video frame. In this example, the humanoid standard action picture is displayed with zero transparency, and the corresponding special effect information is shown according to the action score, such as a score animation and a score sound effect; when the music finishes, recording of the user video ends. After recording, the total score of the user video, i.e., the comprehensive evaluation information, may be obtained from the action scores of the user actions, for example as a weighted average or in another preconfigured manner. Finally, the user's dance video is generated from the dance video recording resource selected by the user, the recorded user video, the action score of each user action, the corresponding special effect information, and the total score. When the dance video is generated, the transparency of the humanoid standard action pictures can be adjusted to 50%, and the total score can be placed in the first frame of the dance video, in the last frame, or in a newly added frame.
After the dance video is generated, the user can play it. Because the humanoid standard action pictures in the generated dance video are semi-transparent, they do not block the user's dance actions; the user can see the corresponding standard action at the same time, compare the performed action with it, and, together with the action scores and special effect information, know at any time how well each dance action was completed. The user can also learn the overall completion from the total score.
In addition, after the dance video is generated or played, the user can decide, based on the relevant information of the video (such as the action scores and the total score), whether to re-record it, or can publish the dance video to a video publishing platform as needed. The platform records the dance videos published by different users; users can watch each other's dance videos through the platform and challenge their authors, i.e., record a video based on the same music as the author, which further promotes interaction among users and stimulates their enthusiasm for video recording.
Based on the same principle as the method shown in fig. 1, a video generation apparatus is also provided in the embodiments of the present disclosure. As shown in fig. 7, the video generation apparatus 400 may include a recording resource obtaining module 410, a video capture module 420, an evaluation information determining module 430, and a target video generating module 440. Wherein:
a recording resource obtaining module 410, configured to obtain video recording resources when a video recording trigger operation of a user is received, where the video recording resources include music and a humanoid standard motion picture corresponding to each playing node of the music;
the video capture module 420 is configured to play the music, capture a user video during playback, and display the corresponding humanoid standard action picture when the music is played to each playing node;
the evaluation information determining module 430 is configured to determine action evaluation information of each user action according to a matching degree between the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
and the target video generating module 440 is configured to generate a target video according to the video recording resource, the user video and the action evaluation information of each user action.
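The four modules above form a simple pipeline (obtain resource, capture, evaluate, compose). The patent describes only their responsibilities, not their implementation; the sketch below wires hypothetical stand-ins together purely to illustrate the data flow between them:

```python
class VideoGenerationApparatus:
    """Sketch of the four-module apparatus 400; the module names follow
    the text, but their internals here are hypothetical stubs."""

    def __init__(self, get_resource, capture, evaluate, compose):
        self.get_resource = get_resource   # recording resource obtaining module 410
        self.capture = capture             # video capture module 420
        self.evaluate = evaluate           # evaluation information determining module 430
        self.compose = compose             # target video generating module 440

    def record(self, trigger):
        resource = self.get_resource(trigger)          # music + standard action pictures
        user_video = self.capture(resource)            # frames captured during playback
        evaluations = self.evaluate(resource, user_video)
        return self.compose(resource, user_video, evaluations)

# Wiring in stub callables to show how data moves through the modules.
app = VideoGenerationApparatus(
    get_resource=lambda trig: {"music": "song.mp3", "pictures": ["pose1"]},
    capture=lambda res: ["frame0"],
    evaluate=lambda res, vid: [90],
    compose=lambda res, vid, ev: {"video": vid, "scores": ev},
)
print(app.record("tap"))  # {'video': ['frame0'], 'scores': [90]}
```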
The video generation device of this embodiment collects the user video during music playback and displays the humanoid standard action picture to the user, so that the user can make the corresponding actions according to the picture. It obtains action evaluation information by comparing the user actions with the standard actions, and generates the target video according to the video recording resource, the user video, and the action evaluation information. Through this scheme, a video containing the user's dance actions can be recorded, providing the user with richer video recording modes, effectively improving the user's sense of participation and experience, and better meeting the user's needs. In addition, by generating the action evaluation information of each user action, the user can know from the evaluation information whether his or her actions meet the standard, further improving the user experience.
It is to be understood that the above modules of the video generating apparatus in the embodiment of the present disclosure have the functions of implementing the corresponding steps of the video generation method shown in fig. 1. These functions may be implemented by hardware, or by hardware executing corresponding software, where the hardware or software includes one or more modules corresponding to the above functions. Each module may be implemented separately, or multiple modules may be integrated. For the functional description of each module of the video generation apparatus, reference may be made to the corresponding description in the video generation method shown in fig. 1, and details are not repeated here.
In an optional embodiment of the present disclosure, the video capture module 420 is specifically configured to, when displaying the corresponding standard motion picture:
displaying a human-shaped standard action picture with the first transparency;
the target video generation module 440 is specifically configured to:
and generating a target video according to the music, the human-shaped standard action pictures corresponding to the playing nodes of the music and having the second transparency, the user videos and the action evaluation information of each user action, wherein the second transparency is greater than the first transparency.
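The first/second transparency scheme amounts to alpha-compositing the standard action picture over the user video at two different opacities. The patent does not give a formula; below is a minimal per-pixel blending sketch under the assumption that transparency 0 means a fully opaque guide picture and larger transparency shows more of the underlying user video (consistent with the second transparency being greater than the first):

```python
def overlay(frame_px, guide_px, transparency):
    """Alpha-blend one guide-picture pixel onto one user-video frame pixel.

    transparency: 0.0 = guide fully opaque, 1.0 = guide invisible.
    """
    alpha = 1.0 - transparency  # opacity of the guide picture
    return tuple(
        round(alpha * g + (1.0 - alpha) * f)
        for f, g in zip(frame_px, guide_px)
    )

frame = (200, 200, 200)   # user-video pixel (RGB)
guide = (0, 0, 0)         # humanoid guide-picture pixel

# First transparency (during recording): guide fully opaque.
print(overlay(frame, guide, 0.0))   # (0, 0, 0)
# Second transparency (in the generated video): guide at 50%.
print(overlay(frame, guide, 0.5))   # (100, 100, 100)
```

A real implementation would apply this per pixel over the picture's region in each affected frame (e.g. via a library blend operation) rather than one pixel at a time.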
In an optional embodiment of the present disclosure, the target video generation module 440 may be specifically configured to:
adding a humanoid standard action picture in a video recording resource into a corresponding video frame image in a user video;
and generating a target video according to the music, the user video added with the human-shaped standard motion picture and the motion evaluation information of the user motion.
In an optional embodiment of the present disclosure, when the target video generation module 440 generates the target video according to the video recording resource, the user video, and the action evaluation information of each user action, it may specifically be configured to:
adding the action evaluation information of each user action into a corresponding video frame image in the user video;
and generating a target video according to the video recording resources and the user video added with the action evaluation information.
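Adding the evaluation information "into the corresponding video frame image" implies mapping each playing node's timestamp to a frame index in the user video. The mapping below is a hypothetical sketch (the patent does not specify frame rates or timing); a real implementation would then burn the score or its effect animation into that frame:

```python
def attach_scores(num_frames, fps, node_scores):
    """Map each playing node's score onto its corresponding frame index.

    node_scores: {node_time_in_seconds: score}. Indices are clamped so a
    node at or past the video's end lands on the last frame.
    """
    overlays = {}
    for t, score in node_scores.items():
        idx = min(int(t * fps), num_frames - 1)
        overlays[idx] = score
    return overlays

# 10-second user video at 30 fps, with scored playing nodes at 2 s and 5 s.
print(attach_scores(300, 30, {2.0: 85, 5.0: 95}))  # {60: 85, 150: 95}
```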
In an embodiment of the present disclosure, the evaluation information determining module 430 may further be configured to:
after determining the action evaluation information of each user action, determining comprehensive evaluation information of the user video according to the action evaluation information of each user action;
when the target video generation module 440 generates the target video according to the video recording resource and the user video to which the action evaluation information is added, the target video generation module may be specifically configured to:
and generating a target video according to the video recording resources, the user video added with the action evaluation information and the comprehensive evaluation information.
In an embodiment of the present disclosure, the video generating apparatus 400 may further include:
and the first display module is used for displaying the comprehensive evaluation information after the music playing is finished.
In the embodiment of the disclosure, the video recording resource further comprises special effect information corresponding to the action evaluation information, wherein the special effect information comprises an animation special effect and/or a sound effect special effect; the video generation apparatus 400 may further include:
and the second display module is used for displaying the action evaluation information of each user action and/or special effect information corresponding to the action evaluation information of each user action to a display interface of a corresponding humanoid standard action picture after the action evaluation information of each user action is determined.
It is understood that the first display module and the second display module may be the same module or different modules.
In the embodiment of the present disclosure, when adding the motion evaluation information of each user motion to the corresponding video frame image in the user video, the target video generation module 440 may specifically be configured to:
adding the action evaluation information of each user action and the special effect information corresponding to the action evaluation information of each user action into a corresponding video frame image in the user video;
correspondingly, when the target video generation module 440 generates the target video according to the video recording resource and the user video to which the action evaluation information is added, the target video generation module may be specifically configured to:
and generating a target video according to the video recording resource and the user video added with the action evaluation information and the special effect information.
In an embodiment of the present disclosure, the video capture module 420 may be further configured to:
before playing music, it is determined that the user is within the video capturing range.
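One plausible way to decide that the user is within the video capturing range is to check that a detected person bounding box lies inside the camera frame; the person detector itself is out of scope here, and the margin value is an arbitrary assumption:

```python
def user_in_range(bbox, frame_w, frame_h, margin=0.05):
    """Check that a person bounding box lies inside the frame.

    bbox: (x0, y0, x1, y1) from some person detector (not shown);
    margin keeps the user slightly away from the frame edges.
    """
    x0, y0, x1, y1 = bbox
    mx, my = frame_w * margin, frame_h * margin
    return (x0 >= mx and y0 >= my
            and x1 <= frame_w - mx and y1 <= frame_h - my)

# A 720x1280 portrait frame with a centered vs. edge-clipped person box.
print(user_in_range((100, 80, 600, 700), 720, 1280))  # True
print(user_in_range((0, 50, 600, 700), 720, 1280))    # False
```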
In an embodiment of the present disclosure, the video generating apparatus 400 may further include:
and the target video publishing module is used for receiving the target video publishing operation of the user after the target video is generated, and publishing the target video to the video publishing platform according to the publishing operation.
In an embodiment of the present disclosure, the video generating apparatus 400 may further include:
and the rephoto module is used for regenerating the target video through the video acquisition module 420, the evaluation information determination module 430 and the target video generation module 440 based on the video recording resource when receiving the rephoto trigger operation of the user.
In an embodiment of the present disclosure, the recording resource obtaining module 410 may be specifically configured to:
when receiving a video recording triggering operation of a user, controlling to display a music selection interface;
acquiring music selection operation of a user through a music selection interface;
and acquiring video recording resources according to the music selection operation.
In an embodiment of the present disclosure, the recording resource obtaining module 410 may be specifically configured to:
when a video recording triggering operation of a user is received through the video playing interface, video recording resources corresponding to a video currently played through the video playing interface are obtained.
It can be understood that the actions performed by the modules in the video generation apparatus in the embodiments of the present disclosure correspond to the steps in the video generation method in the embodiments of the present disclosure, and for the detailed functional description of the modules of the video generation apparatus, reference may be specifically made to the description in the corresponding video generation method shown in the foregoing, and details are not described here again.
Based on the same principle as the video generation method in the embodiments of the present disclosure, an electronic device is further provided in the embodiments of the present disclosure, where the electronic device includes a memory and a processor, where the memory stores computer program instructions, and the processor is configured to read the computer program instructions to execute the video generation method shown in any one of the embodiments of the present disclosure.
Based on the same principle as the video generation method of the embodiments of the present disclosure, a computer-readable storage medium is also provided in the embodiments of the present disclosure, in which computer program instructions are stored, and when the computer program instructions are executed by a processor, the processor implements the video generation method shown in any embodiment of the present disclosure.
The embodiment of the disclosure also provides a terminal device, as shown in fig. 8. The terminal device 2000 may include, but is not limited to: a processor 2001, a memory 2002, and a communication bus 2003 connecting the different components of the device to enable communication between them. The memory 2002 may store computer programs and data, and the processor 2001 may implement the video generation method in the embodiments of the present disclosure by calling the computer programs in the memory 2002 to perform the corresponding actions and processing. The structure of the terminal device 2000 shown in the figure does not constitute a limitation on the embodiments of the present disclosure.
The terminal device 2000 may also include a display 2004. While performing the above actions or processing, the processor 2001 may display a user interface, prompt information, or information for interacting with the end user to the user via the display 2004 as needed.
The processor 2001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 2001 may also be a combination implementing computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication bus 2003 may include a path to transfer information between the above components. The bus 2003 may be a PCI bus or an EISA bus, etc. The bus 2003 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
The memory 2002 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage (including compact disk, laser disk, digital versatile disk, Blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Terminal device 2000 may also include input/output component 2005 for enabling input/output of information and user interaction with the device via input/output component 2005.
In practical applications, the input/output component 2005 can be configured according to actual needs and can include, but is not limited to, a keyboard, a mouse, a touch screen, an audio component, a video component, and the like. The audio/video components may be configured for input and/or output of the device's audio/video signals. The audio components may include, but are not limited to, speakers, microphones, etc., and the video components may include, but are not limited to, cameras and video interfaces (HDMI, VGA, and/or DVI interfaces), etc.
It is understood that the input/output components 2005 can be used alone or in combination to process information, for example, when a music playing instruction of a user is received through a touch screen, music is played through an audio component.
The terminal device 2000 can further include a communication component 2006, configured to enable communicative interaction between the terminal device 2000 and other devices (e.g., terminal devices, storage devices). The communication component 2006 can include, but is not limited to, a mobile network communication unit (e.g., 3G, 4G, 5G), a wireless communication component (e.g., a Bluetooth or Wi-Fi communication unit), a USB communication component, an audio component, a video component, and the like.
The terminal device 2000 may further include a power management module 2007, and the power management module 2007 may be configured to supply power to the device, convert power of the device, manage charging and discharging of the power, and the like, and may be further configured with a charging interface.
It should be noted that the terminal device of the embodiment of the present disclosure may be implemented as, but not limited to, a smart phone, a smart television, a Personal Digital Assistant (PDA), a tablet computer, a desktop computer, a portable terminal device (e.g., a portable computer), an in-vehicle device, and the like.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and whose order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and such modifications and refinements shall also fall within the protection scope of the present invention.

Claims (16)

1. A method of video generation, comprising:
when a video recording triggering operation of a user is received, video recording resources are obtained, wherein the video recording resources comprise music and human-shaped standard action pictures corresponding to all playing nodes of the music;
playing the music, collecting a user video in the playing process, and displaying a corresponding human-shaped standard action picture when the user video is played to each playing node;
determining action evaluation information of each user action according to the matching degree of the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
and generating a target video according to the video recording resource, the user video and the action evaluation information of each user action.
2. The method according to claim 1, wherein the corresponding standard motion picture displayed is a humanoid standard motion picture with a transparency of a first transparency;
the generating a target video according to the video recording resource, the user video and the action evaluation information of each user action comprises:
and generating the target video according to the music, the human-shaped standard action picture which corresponds to each playing node of the music and has a second transparency, the user video and the action evaluation information of each user action, wherein the second transparency is greater than the first transparency.
3. The method according to claim 1 or 2, wherein the generating a target video according to the video recording resource, the user video and the action evaluation information of each user action comprises:
adding the human-shaped standard action picture in the video recording resource into a corresponding video frame image in the user video;
and generating the target video according to the music, the user video added with the human-shaped standard action picture and the action evaluation information of the user action.
4. The method according to claim 1 or 2, wherein the generating a target video according to the video recording resource, the user video and the action evaluation information of each user action comprises:
adding the action evaluation information of each user action to a corresponding video frame image in the user video;
and generating the target video according to the video recording resource and the user video added with the action evaluation information.
5. The method of claim 4, wherein after determining the action rating information for each user action, further comprising:
determining comprehensive evaluation information of the user video according to the action evaluation information of each user action;
generating a target video according to the video recording resource and the user video added with the action evaluation information, wherein the method comprises the following steps:
and generating the target video according to the video recording resource, the user video added with the action evaluation information and the comprehensive evaluation information.
6. The method of claim 5, further comprising:
and after the music playing is finished, displaying the comprehensive evaluation information.
7. The method of claim 4, wherein the video recording resource further comprises effect information corresponding to the action evaluation information, the effect information comprising an animation effect and/or a sound effect;
after the action evaluation information of each user action is determined, the method further comprises the following steps:
and displaying the action evaluation information of each user action and/or the special effect information corresponding to the action evaluation information of each user action to a display interface of a corresponding humanoid standard action picture.
8. The method according to claim 7, wherein the adding the motion evaluation information of each user motion to the corresponding video frame image in the user video comprises:
adding the action evaluation information of each user action and the special effect information corresponding to the action evaluation information of each user action into a corresponding video frame image in the user video;
generating a target video according to the video recording resource and the user video added with the action evaluation information, wherein the method comprises the following steps:
and generating the target video according to the video recording resource and the user video added with the action evaluation information and the special effect information.
9. The method of claim 1 or 2, wherein before playing the music, further comprising:
determining that the user is within a video capture range.
10. The method according to claim 1 or 2, wherein after the generating the target video, further comprising:
receiving target video publishing operation of the user;
according to the release operation, the target video is released to a video release platform; or,
and when the rephoto triggering operation of the user is received, regenerating the target video based on the video recording resource.
11. The method according to claim 1 or 2, wherein the acquiring a video recording resource when receiving a video recording trigger operation of a user comprises:
when receiving a video recording triggering operation of the user, controlling to display a music selection interface;
acquiring music selection operation of the user through the music selection interface;
and acquiring the video recording resource according to the music selection operation.
12. The method according to claim 1 or 2, wherein the acquiring a video recording resource when receiving a video recording trigger operation of a user comprises:
and when the video recording triggering operation of the user is received through a video playing interface, acquiring a video recording resource corresponding to a video currently played by the video playing interface.
13. A video generation apparatus, comprising:
the recording resource acquisition module is used for acquiring video recording resources when a video recording triggering operation of a user is received, wherein the video recording resources comprise music and human-shaped standard action pictures corresponding to each playing node of the music;
the video acquisition module is used for playing the music, acquiring a user video in the playing process, and displaying a corresponding human-shaped standard action picture when the user video is played to each playing node;
the evaluation information determining module is used for determining the action evaluation information of each user action according to the matching degree of the user action in the video frame image corresponding to each playing node in the user video and the standard action in the corresponding humanoid standard action picture;
and the target video generation module is used for generating a target video according to the video recording resource, the user video and the action evaluation information of each user action.
14. The apparatus of claim 13, wherein the video capture module, when displaying the corresponding standard motion picture, is specifically configured to:
displaying a human-shaped standard action picture with the first transparency;
the target video generation module is specifically configured to:
and generating a target video according to the music, the human-shaped standard action pictures which correspond to the playing nodes of the music and have the second transparency, the user videos and the action evaluation information of each user action, wherein the second transparency is larger than the first transparency.
15. An electronic device comprising a memory and a processor;
the memory having stored therein computer program instructions;
the processor for reading the computer program instructions to perform the video generation method of any of claims 1 to 12.
16. A computer-readable storage medium, characterized in that the storage medium has stored therein computer program instructions, which when executed by a processor, implement the video generation method of any of claims 1 to 12.
CN201810911033.XA 2018-08-10 2018-08-10 Video generation method, device, electronic equipment and storage medium Pending CN109068081A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810911033.XA CN109068081A (en) 2018-08-10 2018-08-10 Video generation method, device, electronic equipment and storage medium
PCT/CN2018/124067 WO2020029523A1 (en) 2018-08-10 2018-12-26 Video generation method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810911033.XA CN109068081A (en) 2018-08-10 2018-08-10 Video generation method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109068081A true CN109068081A (en) 2018-12-21

Family

ID=64683426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810911033.XA Pending CN109068081A (en) 2018-08-10 2018-08-10 Video generation method, device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN109068081A (en)
WO (1) WO2020029523A1 (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011204113A (en) * 2010-03-26 2011-10-13 Kddi Corp Video content generation system, metadata construction device, video content generation device, portable terminal, video content distribution device, and computer program
CN106022208A (en) * 2016-04-29 2016-10-12 北京天宇朗通通信设备股份有限公司 Human body motion recognition method and device
CN107920269A (en) * 2017-11-23 2018-04-17 乐蜜有限公司 Video generation method, device and electronic equipment
CN107968921A (en) * 2017-11-23 2018-04-27 乐蜜有限公司 Video generation method, device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109068081A (en) * 2018-08-10 2018-12-21 北京微播视界科技有限公司 Video generation method, device, electronic equipment and storage medium


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020029523A1 (en) * 2018-08-10 2020-02-13 北京微播视界科技有限公司 Video generation method and apparatus, electronic device, and storage medium
CN109618184A (en) * 2018-12-29 2019-04-12 北京市商汤科技开发有限公司 Method for processing video frequency and device, electronic equipment and storage medium
CN110008814A (en) * 2019-01-25 2019-07-12 阿里巴巴集团控股有限公司 Method for processing video frequency, video process apparatus and electronic equipment
CN109828741A (en) * 2019-01-29 2019-05-31 北京字节跳动网络技术有限公司 Method and apparatus for playing audio
CN111506186B (en) * 2019-01-31 2023-06-09 广州艾美网络科技有限公司 Sports entertainment system
CN111506186A (en) * 2019-01-31 2020-08-07 广州艾美网络科技有限公司 Sports entertainment system
CN110266968A (en) * 2019-05-17 2019-09-20 北京小糖科技有限责任公司 A kind of production method and device of video of dancing together
CN110266968B (en) * 2019-05-17 2022-01-25 小糖互联(北京)网络科技有限公司 Method and device for making dancing video
CN110266982A (en) * 2019-06-27 2019-09-20 广州酷狗计算机科技有限公司 Method and system for providing a song while recording a video
CN110266982B (en) * 2019-06-27 2021-10-29 广州酷狗计算机科技有限公司 Method and system for providing songs while recording video
CN112399234A (en) * 2019-08-18 2021-02-23 聚好看科技股份有限公司 Interface display method and display equipment
US11924513B2 (en) 2019-08-18 2024-03-05 Juhaokan Technology Co., Ltd. Display apparatus and method for display user interface
CN114303142A (en) * 2019-08-29 2022-04-08 西铁城时计株式会社 Image generation device
CN112287848A (en) * 2020-10-30 2021-01-29 腾讯科技(深圳)有限公司 Live broadcast-based image processing method and device, electronic equipment and storage medium
CN112287848B (en) * 2020-10-30 2024-08-06 腾讯科技(深圳)有限公司 Live broadcast-based image processing method and device, electronic equipment and storage medium
WO2022116751A1 (en) * 2020-12-02 2022-06-09 北京字节跳动网络技术有限公司 Interaction method and apparatus, and terminal, server and storage medium
CN114697742A (en) * 2020-12-25 2022-07-01 华为技术有限公司 Video recording method and electronic equipment
CN113350780A (en) * 2021-01-08 2021-09-07 北京爱奇艺科技有限公司 Cloud game control method and device, electronic equipment and storage medium
WO2022193425A1 (en) * 2021-03-19 2022-09-22 深圳市韶音科技有限公司 Exercise data display method and system
WO2022231515A1 (en) * 2021-04-30 2022-11-03 Lemon Inc. Content creation based on rhythm
US11961537B2 (en) 2021-04-30 2024-04-16 Lemon Inc. Content creation based on rhythm
CN114666516A (en) * 2022-02-17 2022-06-24 Hisense Visual Technology Co., Ltd. Display device and streaming media file synthesis method
CN114666516B (en) * 2022-02-17 2024-12-03 Hisense Visual Technology Co., Ltd. Display device and streaming media file synthesis method
CN116320583A (en) * 2023-03-20 2023-06-23 Douyin Vision Co., Ltd. Video call method, device, electronic device and storage medium

Also Published As

Publication number Publication date
WO2020029523A1 (en) 2020-02-13

Similar Documents

Publication Publication Date Title
CN109068081A (en) Video generation method, device, electronic equipment and storage medium
CN108900902B (en) Method, device, terminal equipment and storage medium for determining video background music
CN109167950B (en) Video recording method, video playing method, device, equipment and storage medium
CN108616696A (en) Video capture method, apparatus, terminal device and storage medium
CN108989609A (en) Video cover generation method, device, terminal device and computer storage medium
WO2020029526A1 (en) Method for adding special effect to video, device, terminal apparatus, and storage medium
CN113852767B (en) Video editing methods, devices, equipment and media
JP7652903B2 (en) VIDEO PROCESSING METHOD, APPARATUS, DEVICE, STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
US20050204287A1 (en) Method and system for producing real-time interactive video and audio
CN112261481A (en) Interactive video creating method, device and equipment and readable storage medium
CN111800668B (en) Barrage processing method, barrage processing device, barrage processing equipment and storage medium
CN115002359B (en) Video processing method, device, electronic device and storage medium
WO2022068479A1 (en) Image processing method and apparatus, and electronic device and computer-readable storage medium
US20200104030A1 (en) User interface elements for content selection in 360 video narrative presentations
JP2024529251A (en) Media file processing method, apparatus, device, readable storage medium and product
US12333669B2 (en) Data processing method and apparatus, electronic device, computer-readable storage medium, and computer program product
EP4496317A1 (en) Video generation method and apparatus, and device, storage medium and program product
CN114285988B (en) Display method, device, electronic device and storage medium
CN106936830B (en) Method and device for playing multimedia data
CN115499672B (en) Image display method, device, equipment and storage medium
CN116112617B (en) Method and device for processing performance picture, electronic equipment and storage medium
CN116016817B (en) Video editing method, device, electronic device and storage medium
CN117786159A (en) Text material acquisition method, apparatus, device, medium and program product
CN112073826B (en) Method for displaying state of recorded video works, server and terminal equipment
CN117135393A (en) Recording processing method and device based on virtual reality and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181221