CN117133057A - Sports counting and illegal action identification method based on human posture recognition
- Publication number: CN117133057A (application CN202311209213.0A)
- Authority: CN (China)
- Prior art keywords: model, training, actions, action, human body
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The invention discloses a sports counting and illegal action identification method based on human posture recognition, comprising the following steps: step A, human body identification; step B, personnel screening; step C, repeated action detection based on human keypoint detection; and step D, illegal action identification. The invention uses deep-learning human keypoint detection and computes the relative position and relative angle of each limb from the keypoint positions, thereby determining which action the test subject is performing, whether the action is executed correctly, and whether a violation has occurred. Finally, temporal information is incorporated to associate actions across consecutive frames and determine whether the subject has accurately completed each standard action. Each time an action is completed in sequence, the system counts automatically. The invention adapts automatically to different body characteristics and, compared with infrared radio-frequency technology, achieves high counting accuracy.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a sports counting and illegal action identification method based on human posture recognition.
Background
In sports tests such as rope skipping, sit-ups and pull-ups, traditional manual counting involves heavy workload and is prone to serious human error.
To address the shortcomings of manual counting, the prior art uses infrared counting sensors, for example the invention patent entitled "Combined device of a long-rope automatic counting rope-swinging device and a jumping stand" (application publication number CN 101757767A). Such devices must be installed on site and adjusted for differences in height, weight and other human characteristics, and their counting accuracy is low.
Disclosure of Invention
The invention aims to solve the above technical problems in the prior art by providing a sports counting and illegal action identification method based on human posture recognition that improves counting accuracy.
To this end, the invention adopts the following technical scheme:
The sports counting and illegal action identification method based on human posture recognition comprises the following steps:
step A, human body identification: acquiring real-time video data from a camera and detecting persons in the video using deep-learning object detection;
step B, personnel screening: determining the test area, excluding irrelevant persons and retaining the athletes;
step C, repeated action detection based on human keypoint detection:
1) data collection and labeling: collecting an image or video dataset containing repeated actions and labeling the start and end keypoints of each action to identify a complete action cycle;
2) model training: training on the prepared dataset using a convolutional neural network;
3) keypoint detection: predicting the keypoint positions in each frame of the dataset using the trained human keypoint detection model;
4) action cycle identification: identifying the start and end of an action from changes in the keypoint positions;
5) action counting: computing the number of actions from the identified action cycles;
step D, illegal action identification:
1) feature extraction: extracting features from the keypoint sequence;
2) rule establishment: formulating rules according to the characteristics of illegal actions;
3) model training and evaluation: training a classification model by deep learning to identify illegal actions, training on a labeled dataset and evaluating on test data;
4) illegal action judgment: judging the actions in the image or video according to the extracted features and the trained classification model, and judging an action illegal if it violates the rules or the model predicts a violation;
5) feedback and warning: issuing alerts, prompts and records for detected violations.
Further, in step B, the test area is obtained in one of two ways:
in the first way, the test area is designated on site; in the second way, the test equipment is automatically identified using deep-learning object detection and a test area is automatically generated around the equipment.
Further, the deep learning model adopts a convolutional neural network comprising convolutional layers, pooling layers, fully connected layers, activation functions and batch normalization layers. A convolutional layer slides a set of learned filters over the input image to extract local features; each filter convolves a region of the input image to produce a feature map. The pooling layers reduce the spatial size of the feature maps, lowering computational complexity and giving the network translation invariance. The fully connected layers connect the feature maps extracted by the convolutional and pooling layers to the output layer for classification or regression tasks. The activation functions introduce nonlinearity after the convolutional and fully connected layers so the network can learn complex feature mappings. The batch normalization layers normalize the input of each feature channel to zero mean and unit variance, mitigating the vanishing-gradient problem, accelerating training and improving model stability.
Further, in step D, the classification model is a lightweight classification model based on a convolutional neural network, operated as follows:
S1, import software libraries and data: first import the software libraries, load the image dataset, preprocess the data, and split it into a training set and a test set.
S2, load a pretrained model: use a classification model pretrained on a large-scale image dataset as the base model.
S3, modify the model architecture: the pretrained model includes an output layer for classifying a large number of categories; modify the structure of this output layer to suit the specific classification problem by replacing the last fully connected layer with a new one whose size matches the number of classification categories.
S4, freeze some layers: to speed up training and preserve the feature extraction ability of the classification model, select convolutional layers to freeze so that they remain unchanged during training.
S5, train the model: train the modified model on the training set; during training, the model weights are updated by backpropagation so that the model learns from the training data.
S6, evaluate the model: evaluate the performance of the model on the test set, typically by computing classification accuracy, the confusion matrix and other evaluation metrics.
S7, fine-tune: if the model does not perform well enough in practice, fine-tune its parameters or try different data augmentation strategies.
S8, predict new data: after training, use the model to classify new unlabeled images.
Further, in step C1), conventional human actions and sports actions are collected, and video actions are also collected from public Internet channels.
Further, in step C2), the training process adjusts the network parameters by minimizing the error between the predicted and labeled keypoint positions so that the model predicts keypoints accurately.
Further, in step C4), the identification is achieved by determining whether the position of a specific keypoint crosses a threshold.
Further, in step D1), distance, angle and speed features between keypoints are used to describe changes in human motion.
By adopting the above technical scheme, the invention provides the following beneficial effects:
The invention uses deep-learning human keypoint detection to obtain the positions of human keypoints: eyes, ears, nose, mouth, shoulders, hands, elbows, hips, knees, feet and the like. From these positions, the relative position and relative angle of each limb are computed to judge whether an action is executed correctly and whether a violation has occurred. Finally, temporal information is incorporated to associate actions across consecutive frames and determine whether the person has accurately completed each standard action. Each time an action is completed in sequence, the system counts automatically, so the total number of repeated movements over a period of time can be recorded.
The overall process has the following characteristics:
1. Modular design allows developers to build quickly according to design requirements, improving development efficiency, shortening the development cycle and reducing development cost.
2. Asynchronous parallel processing makes full use of computing resources, reducing deployment cost, improving detection efficiency and shortening response time.
The invention adapts automatically to different body characteristics and, compared with infrared radio-frequency technology, achieves high counting accuracy.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the keypoint detection of the invention;
FIG. 2 compares the counting accuracy of the invention with that of the infrared radio-frequency technique.
Detailed Description
As shown in FIGS. 1 and 2, the sports counting and illegal action identification method based on human posture recognition comprises the following steps:
Step A, human body identification: real-time video data is acquired from the camera, and persons in the video are detected using deep-learning object detection.
Step B, personnel screening: the test area is determined, irrelevant persons are excluded and the athletes are retained. The test area is obtained in one of two ways: (1) the test area is designated on site; (2) the test equipment is automatically identified using deep-learning object detection and a test area is automatically generated around the equipment.
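As an illustration of steps A and B, the following Python sketch keeps only the detections that fall inside the test area. It assumes a generic person detector that returns bounding boxes; the helper names (`screen_people`, `box_center`) and the rectangular test area are assumptions for illustration, not details fixed by the invention.

```python
# Illustrative sketch of steps A-B: given person bounding boxes from any
# deep-learning object detector, keep only those inside the test area.

def box_center(box):
    """Center (x, y) of a bounding box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def inside(point, area):
    """True if point (x, y) lies inside rectangle (x1, y1, x2, y2)."""
    x, y = point
    x1, y1, x2, y2 = area
    return x1 <= x <= x2 and y1 <= y <= y2

def screen_people(person_boxes, test_area):
    """Step B: keep only detections whose center falls in the test area."""
    return [b for b in person_boxes if inside(box_center(b), test_area)]

# Example: two detections, only the first is inside the designated area.
area = (100, 50, 500, 400)               # test area designated on site
detections = [(150, 80, 220, 300),       # athlete inside the area
              (600, 90, 660, 310)]       # bystander outside the area
print(screen_people(detections, area))   # -> [(150, 80, 220, 300)]
```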
Step C, repeated action detection based on human keypoint detection:
1) Data collection and labeling: image or video datasets containing repeated actions are collected, and the start and end keypoints of each action are annotated to identify a complete action cycle. So that the technique achieves good detection performance, adapts to different body structures and can detect extreme actions under special conditions, video actions such as yoga, fitness, gymnastics and dance are collected from public Internet channels in addition to conventional human actions and sports actions.
2) Model training: the prepared dataset is used to train a convolutional neural network. The training process adjusts the network parameters by minimizing the error between the predicted and labeled keypoint positions so that the model predicts keypoints accurately (a training sketch is given after step C below).
3) Keypoint detection: the trained human keypoint detection model is used to predict the keypoint positions in each frame of the dataset.
4) Action cycle identification: the start and end of an action are identified from changes in the keypoint positions. This can be achieved by determining whether the position of a specific keypoint crosses a threshold; for example, the arm rising to a certain height indicates the start of the action, and the arm falling back to that height indicates its end.
5) Action counting: the number of actions is computed from the identified action cycles; each time a complete action cycle is detected, the count is incremented (a counting sketch also follows step C).
In general, human keypoint detection makes it possible to monitor the start and end keypoints of an action, thereby identifying the cycles of repeated actions and counting them. The method can be applied to any scene in which repeated actions must be counted, such as fitness, physical training and production lines.
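As a minimal sketch of the training objective in step C2), the following PyTorch fragment minimizes the mean-squared error between predicted and labeled keypoint positions. The tiny coordinate-regression network, the 17-keypoint skeleton and the random stand-in data are assumptions for illustration; a production keypoint model would be deeper and typically heatmap-based.

```python
# Sketch of step C2): adjust network parameters by minimizing the error
# between predicted and labeled keypoint positions.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 17                         # e.g. a COCO-style skeleton

model = nn.Sequential(                     # toy coordinate regressor
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, NUM_KEYPOINTS * 2),      # (x, y) per keypoint
)
criterion = nn.MSELoss()                   # error between prediction and label
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(8, 3, 128, 128)       # stand-in batch of frames
labels = torch.rand(8, NUM_KEYPOINTS * 2)  # annotated keypoint positions

optimizer.zero_grad()
loss = criterion(model(images), labels)    # prediction error to minimize
loss.backward()                            # adjust network parameters
optimizer.step()
```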
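The threshold-crossing cycle detection and counting of steps C4) and C5) can be sketched as follows. The wrist keypoint, the image convention that a smaller y-coordinate means higher in the frame, and the concrete threshold values are all assumptions for illustration.

```python
# Sketch of steps C4)-C5): detect action cycles from a keypoint coordinate
# crossing thresholds, and count each complete cycle.

def count_reps(wrist_ys, up_thresh, down_thresh):
    """Count cycles in which the wrist y-coordinate drops below up_thresh
    (arm raised, action starts) and then exceeds down_thresh (arm lowered,
    action ends)."""
    count, in_action = 0, False
    for y in wrist_ys:                      # one y-coordinate per video frame
        if not in_action and y < up_thresh:     # arm raised: cycle starts
            in_action = True
        elif in_action and y > down_thresh:     # arm lowered: cycle ends
            in_action = False
            count += 1                          # one complete action cycle
    return count

# Example: three pull-up-like cycles traced by the wrist height.
trace = [200, 120, 90, 180, 210, 95, 190, 215, 85, 205]
print(count_reps(trace, up_thresh=100, down_thresh=180))  # -> 3
```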
Step D, illegal action identification. During conventional action recognition, illegal actions must be recognized and excluded. Identifying an illegal action involves analyzing human posture, action features and predefined rules. The specific implementation steps are as follows:
1) Feature extraction: features such as the distances, angles and speeds between keypoints are extracted from the keypoint sequence to describe changes in human motion (see the sketch following these steps).
2) Rule establishment: rules are formulated according to the characteristics of illegal actions; for example, some violations are incorrect body posture or incorrect hand or foot position.
3) Model training and evaluation: because rule-based judgment alone is not accurate enough, a classification model is trained by deep learning to identify illegal actions. Training is performed on the labeled dataset and evaluation on the test data.
4) Illegal action judgment: the actions in the image or video are judged according to the extracted features and the trained model; if an action violates the rules or the model predicts a violation, it is judged an illegal action.
5) Feedback and warning: for detected violations, alerts, prompts and records are issued.
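The following sketch illustrates steps D1) and D2): distance, angle and speed features computed from keypoints, followed by a hand-written rule check. The elbow-angle rule and its 160-degree limit are invented examples, not values from the invention.

```python
# Sketch of steps D1)-D2): keypoint features and a simple violation rule.
import math

def distance(p, q):
    """Euclidean distance between two keypoints."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

def speed(p_prev, p_curr, dt):
    """Keypoint speed between consecutive frames, dt seconds apart."""
    return distance(p_prev, p_curr) / dt

# Example rule: flag a pull-up repetition if the arm never straightens.
shoulder, elbow, wrist = (0, 0), (5, 8), (10, 0)
if joint_angle(shoulder, elbow, wrist) < 160:
    print("violation: arm not fully extended")
```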
A convolutional neural network is a deep learning model widely used for computer vision tasks such as image classification, object detection and image segmentation. Its design is inspired by the biological visual system, in particular by the way the human visual cortex processes visual information.
The main features and components of a convolutional neural network are as follows (a minimal wiring sketch follows this list):
Convolutional layer: the convolution operation is the core of a convolutional neural network. It extracts local features using a set of learnable filters (also called convolution kernels) that slide over the input image. Each filter convolves a small region of the input image to generate a feature map, which helps capture local features such as edges and textures.
Pooling layer: pooling reduces the spatial size of the feature maps, lowers computational complexity and gives the network translation invariance. Common pooling operations are max pooling and average pooling, which take the maximum or the average of a local region, respectively, to shrink the feature map.
Activation function: after the convolutional and fully connected layers, the activation function introduces the nonlinearity that enables the network to learn complex feature mappings. Common activation functions include ReLU (Rectified Linear Unit), Sigmoid and Tanh.
Fully connected layer: the fully connected layer connects the feature maps extracted by the convolutional and pooling layers to the output layer for classification or regression tasks.
Batch normalization layer: batch normalization is a regularization technique that helps accelerate training and improve model stability. It normalizes the input of each feature channel to zero mean and unit variance, mitigating the vanishing-gradient problem.
Convolutional neural network architecture: a stack of multiple convolutional and pooling layers is typically employed to progressively extract abstract, high-level features. Common architectures include LeNet, AlexNet, VGG, GoogLeNet and ResNet, which perform well on different tasks.
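A minimal PyTorch sketch wiring these components together is shown below; the layer sizes and the two-class output are illustrative assumptions.

```python
# Sketch combining the components named above: convolution, batch
# normalization, ReLU activation, pooling, and a fully connected output.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1),   # convolution: local features
            nn.BatchNorm2d(16),               # batch normalization
            nn.ReLU(),                        # nonlinear activation
            nn.MaxPool2d(2),                  # pooling: shrink feature map
            nn.Conv2d(16, 32, 3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 32 * 32, num_classes)  # fully connected

    def forward(self, x):                     # x: (N, 3, 128, 128)
        x = self.features(x)                  # -> (N, 32, 32, 32)
        return self.classifier(x.flatten(1))  # class scores

print(TinyCNN()(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 2])
```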
Convolutional neural networks have enjoyed great success in image processing and have also found application in other fields, such as natural language processing and speech recognition. Their success is due in part to the locality of the convolution and pooling operations and to the ability of deep networks to learn feature representations automatically.
A classification model is a machine learning or deep learning model that classifies input data into different categories or labels. The invention selects a lightweight classification model based on a convolutional neural network, designed specifically to run in resource-constrained environments such as mobile devices and embedded systems. The steps are as follows (a condensed sketch follows the steps):
S1, import software libraries and data: first import the necessary software libraries, load the image dataset, preprocess the data, and split it into a training set and a test set.
S2, load a pretrained model: use a classification model pretrained on a large-scale image dataset as the base model. This can be achieved through the functionality provided by the deep learning framework.
S3, modify the model architecture: the pretrained model typically includes an output layer for classifying a large number of categories; modify the structure of this output layer to suit the specific classification problem by replacing the last fully connected layer with a new one whose size matches the number of classification categories.
S4, freeze some layers: to speed up training and preserve the feature extraction ability of the classification model, select some convolutional layers to freeze so that they remain unchanged during training.
S5, train the model: train the modified model on the training set. During training, the model weights are updated by backpropagation so that the model learns from the training data.
S6, evaluate the model: evaluate the performance of the model on the test set, typically by computing classification accuracy, the confusion matrix and other evaluation metrics.
S7, fine-tune: if the model does not perform well enough in practice, fine-tune its parameters or try different data augmentation strategies.
S8, predict new data: after training, use the model to classify new unlabeled images.
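The steps S1 to S8 can be condensed into the following PyTorch sketch. The choice of torchvision's MobileNetV2 as the lightweight pretrained base and the two-class head are assumptions; the invention does not name a specific model.

```python
# Condensed sketch of S1-S8 with a lightweight pretrained base model.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2                                   # e.g. legal vs. illegal action

# S2: load a model pretrained on a large-scale image dataset.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)

# S4: freeze the convolutional feature extractor.
for p in model.features.parameters():
    p.requires_grad = False

# S3: replace the final fully connected layer to match the class count.
model.classifier[1] = nn.Linear(model.last_channel, NUM_CLASSES)

# S5: train only the new head (S1's data loading is elided here).
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 224, 224)              # stand-in training batch
labels = torch.tensor([0, 1, 0, 1])
model.train()
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()

# S8: after training, classify a new unlabeled image.
model.eval()
with torch.no_grad():
    pred = model(torch.randn(1, 3, 224, 224)).argmax(1)
```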
The invention uses deep-learning human keypoint detection to obtain the positions of human keypoints: eyes, ears, nose, mouth, shoulders, hands, elbows, hips, knees, feet and the like. From these positions, the relative position and relative angle of each limb are computed to judge whether an action is executed correctly and whether a violation has occurred. Finally, temporal information is incorporated to associate actions across consecutive frames and determine whether the person has accurately completed each standard action. Each time an action is completed in sequence, the system counts automatically, so the total number of repeated movements over a period of time can be recorded.
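One simple way to realize the temporal association described above is to smooth each keypoint trajectory across consecutive frames before cycle detection, as in the following sketch; the moving-average filter and window size are assumptions for illustration.

```python
# Sketch of frame-to-frame association: smooth keypoint positions over
# consecutive frames so single-frame detection noise does not produce
# spurious action starts or ends.
from collections import deque

def smooth(values, window=5):
    """Moving average over the last `window` frames."""
    buf, out = deque(maxlen=window), []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / len(buf))
    return out

# Example: a noisy wrist-height trace; smoothing damps the one-frame spike
# at index 2 that a threshold test could misread as the start of an action.
noisy = [200, 198, 90, 201, 199, 120, 95, 93, 180, 205]
print([round(v) for v in smooth(noisy, window=3)])
```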
The overall process has the following characteristics:
1. Modular design allows developers to build quickly according to design requirements, improving development efficiency, shortening the development cycle and reducing development cost.
2. Asynchronous parallel processing makes full use of computing resources, reducing deployment cost, improving detection efficiency and shortening response time.
As shown in FIG. 2, the abscissa represents the counting accuracy and the ordinate represents the number of tests achieving that accuracy. The counting accuracy of the infrared radio-frequency device is above 98%, while the counting accuracy of the invention is above 99%; the counting accuracy of the invention is therefore superior to that of infrared radio-frequency counting. In 50,000 high-intensity tests in actual scenes, the accuracy of the invention reached 99.7%.
The invention adapts automatically to different body characteristics and, compared with infrared radio-frequency technology, achieves high counting accuracy.
The above is only a specific embodiment of the invention, but the technical features of the invention are not limited thereto. Any simple changes, equivalent substitutions or modifications made on the basis of the invention to solve substantially the same technical problems and achieve substantially the same technical effects fall within the scope of the invention.
Claims (8)
1. A sports counting and illegal action identification method based on human posture recognition, characterized by comprising the following steps:
step A, human body identification: acquiring real-time video data from a camera and detecting persons in the video using deep-learning object detection;
step B, personnel screening: determining the test area, excluding irrelevant persons and retaining the athletes;
step C, repeated action detection based on human keypoint detection:
1) data collection and labeling: collecting an image or video dataset containing repeated actions and labeling the start and end keypoints of each action to identify a complete action cycle;
2) model training: training on the prepared dataset using a deep learning model;
3) keypoint detection: predicting the keypoint positions in each frame of the dataset using the trained human keypoint detection model;
4) action cycle identification: identifying the start and end of an action from changes in the keypoint positions;
5) action counting: computing the number of actions from the identified action cycles;
step D, illegal action identification:
1) feature extraction: extracting features from the keypoint sequence;
2) rule establishment: formulating rules according to the characteristics of illegal actions;
3) model training and evaluation: training a classification model by deep learning to identify illegal actions, the classification model being a machine learning or deep learning model for classifying input data into different categories or labels, training on a labeled dataset and evaluating on test data;
4) illegal action judgment: judging the actions in the image or video according to the extracted features and the trained classification model, and judging an action illegal if it violates the rules or the model predicts a violation;
5) feedback and warning: issuing alerts, prompts and records for detected violations.
2. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: in step B, the test area is obtained in one of two ways:
in the first way, the test area is designated on site;
in the second way, the test equipment is automatically identified using deep-learning object detection and a test area is automatically generated around the equipment.
3. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: the deep learning model adopts a convolutional neural network comprising convolutional layers, pooling layers, fully connected layers, activation functions and batch normalization layers;
the convolutional layers slide a set of learned filters over the input image to extract local features, each filter convolving a region of the input image to produce a feature map;
the pooling layers reduce the spatial size of the feature maps, lowering computational complexity and giving the network translation invariance;
the fully connected layers connect the feature maps extracted by the convolutional and pooling layers to the output layer for classification or regression tasks;
the activation functions introduce nonlinearity after the convolutional and fully connected layers so the network can learn complex feature mappings;
the batch normalization layers normalize the input of each feature channel to zero mean and unit variance, mitigating the vanishing-gradient problem, accelerating training and improving model stability.
4. The sports counting and illegal action identification method based on human posture recognition according to claim 3, characterized in that: in step D, the classification model is a lightweight classification model based on a convolutional neural network, operated as follows:
S1, importing software libraries and data: importing the software libraries, loading the image dataset, preprocessing the data and splitting it into a training set and a test set;
S2, loading a pretrained model: using a classification model pretrained on a large-scale image dataset as the base model;
S3, modifying the model architecture: the pretrained model includes an output layer for classifying a large number of categories, and the structure of this output layer is modified to suit the specific classification problem by replacing the last fully connected layer with a new one whose size matches the number of classification categories;
S4, freezing some layers: to speed up training and preserve the feature extraction ability of the classification model, convolutional layers are selected to be frozen so that they remain unchanged during training;
S5, training the model: the modified model is trained on the training set, and the model weights are updated by backpropagation during training so that the model learns from the training data;
S6, evaluating the model: the performance of the model is evaluated on the test set, typically by computing classification accuracy, the confusion matrix and other evaluation metrics;
S7, fine-tuning: if the model does not perform well enough in practice, its parameters are fine-tuned or different data augmentation strategies are tried;
S8, predicting new data: after training, the model is used to classify new unlabeled images.
5. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: in step C1), conventional human actions and sports actions are collected, and video actions are also collected from public Internet channels.
6. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: in step C2), the training process adjusts the network parameters by minimizing the error between the predicted and labeled keypoint positions so that the model predicts keypoints accurately.
7. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: in step C4), the identification is achieved by determining whether the position of a specific keypoint crosses a threshold.
8. The sports counting and illegal action identification method based on human posture recognition according to claim 1, characterized in that: in step D1), distance, angle and speed features between keypoints are used to describe changes in human motion.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202311209213.0A (CN117133057A) | 2023-09-18 | 2023-09-18 | Sports counting and illegal action identification method based on human posture recognition |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN117133057A (en) | 2023-11-28 |
Family
ID=88860042
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202311209213.0A (CN117133057A, pending) | Sports counting and illegal action identification method based on human posture recognition | 2023-09-18 | 2023-09-18 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN117133057A (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117893953A (en) * | 2024-03-15 | 2024-04-16 | 四川深蓝鸟科技有限公司 | Soft digestive tract endoscope operation standard action evaluation method and system |
| CN118230427A (en) * | 2024-05-24 | 2024-06-21 | 浪潮软件科技有限公司 | Sit-up counting method and system suitable for online sports activities |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN107122798A (en) * | 2017-04-17 | 2017-09-01 | 深圳市淘米科技有限公司 | Chin-up count detection method and device based on depth convolutional network |
| CN113856186A (en) * | 2021-09-02 | 2021-12-31 | | A method, system and device for judging and counting pull-up actions |
| US20220203165A1 (en) * | 2020-12-29 | 2022-06-30 | NEX Team Inc. | Video-based motion counting and analysis systems and methods for virtual fitness application |
| CN115565244A (en) * | 2022-10-10 | 2023-01-03 | 安徽一视科技有限公司 | Winding body action counting method and counting system based on machine vision |
- 2023-09-18: CN application CN202311209213.0A filed; published as CN117133057A; status: active, pending
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |