Human identification based on biometrics has become one of the most significant and fundamental research topics in computer vision. Biometrics is the study of uniquely recognizing humans based on their intrinsic physical traits (face, fingerprint, hand geometry, iris, retina, and voice) or behavioral traits (typing patterns, gait, or gestures). Human gait, also known as a soft biometric, enables people to be recognized by the way they walk and can be captured at large distances and low resolution. Gait recognition therefore has broad application prospects in crime investigation and wide-area surveillance [1]. Compared with other biometrics (fingerprint, hand geometry, and iris), human gait has unique characteristics: 1) it can identify a person remotely, whereas other methods require physical contact or close proximity to the recording device; 2) it works in challenging conditions such as low light and low resolution; 3) it does not require explicit cooperation from the subject; 4) it is difficult to hide, steal, or fake; and 5) images of an individual's gait can easily be captured in public areas without special devices. Although studies of human kinesiology have shown that subjects can be identified by their gait, several obstacles make gait recognition a difficult task: a subject may walk at different speeds, and the viewing angle, footwear, carrying conditions, or belongings may change [2].
Gait recognition follows two approaches: the model-based approach and the model-free (appearance-based) approach. The model-based approach extracts the gait and stride parameters of the subject using the structure of the human body. The parameters can be static (the size ratios of various body parts) or dynamic (stride length and speed) [3]. The extracted information is then used to construct a recognition model. The model-based approach requires higher-resolution images of the subject and is computationally expensive, even though gait recognition is expected to be effective at low resolution and in real time [1].
The appearance-based approach operates directly on the gait sequence, or on gait silhouettes, and focuses on the motion of the human body. Gait descriptors or gait patterns are extracted from the silhouettes. Unlike the model-based approach, it works on the raw gait sequence; on the other hand, it can be influenced by several covariates such as carried bags, clothing, walking speed, and view changes [1,4]. In this category, most researchers extract gait representations from well-engineered silhouette features such as the Gait Flow Image (GFI), Gait Energy Image (GEI), Masked GEI based on GEnI (MGEI), Gait Entropy Image (GEnI), and Frequency-Domain Feature (FDF) [1]. Zhang et al. proposed using the GEI instead of the raw gait sequence to address the data-limitation problem. The GEI removes noisy information and keeps only the shape of the human and how it changes during walking; with the help of the GEI representation, a deep neural network can quickly capture the discriminative biometric information in human gait. The GEI has also been used for gender classification and fused biometrics in multi-dimensional data analysis [2,5]. Toby H.W. Lam et al. argue that although the GEI represents the recency of motion, it embeds no information about the movement itself. They therefore proposed a new data representation called the Gait Flow Image (GFI). The GFI can be computed without constructing any model, and, by using the optical flow field, it captures the relative motion information of the gait sequence. The framework of the appearance-based approach comprises silhouette extraction, period detection, representation generation, and recognition [1,5].
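The GEI described above is simply the per-pixel temporal average of the aligned binary silhouettes over a gait cycle. A minimal NumPy sketch (the tiny silhouettes here are synthetic placeholders, not real gait data):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a sequence of aligned binary silhouettes (T, H, W) into a GEI."""
    stack = np.asarray(silhouettes, dtype=np.float64)
    return stack.mean(axis=0)  # pixel values in [0, 1]

# Toy example: three 4x4 "silhouettes" of a swaying blob
s1 = np.zeros((4, 4)); s1[1:3, 1:3] = 1
s2 = np.zeros((4, 4)); s2[1:3, 1:4] = 1
s3 = np.zeros((4, 4)); s3[1:3, 0:3] = 1
gei = gait_energy_image([s1, s2, s3])
```

Pixels that are foreground in every frame get value 1 (the stable body shape), while pixels that are foreground only occasionally get fractional values (the moving limbs), which is why the GEI keeps the shape but blurs fine temporal detail.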
Silhouette extraction / background subtraction
Silhouette extraction is the detection and extraction of the human body from the gait video using a background subtraction method. Toby H.W. Lam et al. suggested that silhouettes can be extracted from the gait sequence with a simple background subtraction or segmentation method. They used a binarization process that renders the background black (pixel = 0) and the foreground white (pixel = 1). From the binarized image, the bounding box of the silhouette is computed, and the image is cropped to the size and position of that bounding box. The image is then normalized to a fixed size for the next step, gait period estimation [5]. In 2012, Shajina T. et al. proposed another silhouette extraction method based on the Gaussian Mixture Model (GMM). In this algorithm, a mixture of Gaussians models the color of each pixel. Each pixel of the current frame is checked against every Gaussian in the background model until a matching Gaussian is found. If a match is found, its mean and variance are updated; otherwise, a new Gaussian with mean equal to the current pixel color and some initial variance is introduced into the mixture [6].
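The binarize-and-crop step Lam et al. describe can be sketched in NumPy alone; the static background frame and the threshold value here are illustrative assumptions, and resizing to a fixed height would follow in practice:

```python
import numpy as np

def extract_silhouette(frame, background, thresh=30):
    """Binarize |frame - background| and crop to the silhouette's bounding box."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    binary = (diff > thresh).astype(np.uint8)  # background -> 0, foreground -> 1
    rows = np.any(binary, axis=1)
    cols = np.any(binary, axis=0)
    if not rows.any():
        return binary  # no foreground detected
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return binary[r0:r1 + 1, c0:c1 + 1]

bg = np.zeros((6, 6), dtype=np.uint8)
fr = bg.copy(); fr[2:5, 3:5] = 200  # a bright "person" on a dark background
sil = extract_silhouette(fr, bg)
```

A GMM-based subtractor would replace the single static `background` frame with per-pixel Gaussian mixtures, as in Shajina T. et al.'s method.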
As the first step of a gait recognition system, Negin K. Hosseini et al. proposed silhouette extraction in a less complicated way. Since the main purpose of silhouette extraction is to divide video frames into background and foreground to obtain a binary silhouette of the walking subject, they used the TUM-IITKGP gait database with black-and-white sequences. They tracked the subject and detected the moving blob in each frame of the gait sequence. The binary silhouette was then extracted and centralized, and morphological operators were applied to reduce noise [7]. Zhang et al. extract the human silhouette and average it into the GEI representation, which captures the motion of the person in a single image. Because of the averaging operation, the GEI preserves only coarse temporal information and does not maintain the original information well [2].
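The morphological noise reduction Hosseini et al. mention can be illustrated with a binary opening (erosion followed by dilation), which removes specks smaller than the structuring element. This hand-rolled 3×3 version is a sketch, not their implementation:

```python
import numpy as np

def erode(img):
    """3x3 binary erosion: a pixel survives only if its whole neighborhood is 1."""
    p = np.pad(img, 1)
    out = np.ones_like(img)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out &= p[1 + dr:1 + dr + img.shape[0], 1 + dc:1 + dc + img.shape[1]]
    return out

def dilate(img):
    """3x3 binary dilation: a pixel fires if any neighbor is 1."""
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out |= p[1 + dr:1 + dr + img.shape[0], 1 + dc:1 + dc + img.shape[1]]
    return out

def opening(img):
    return dilate(erode(img))

sil = np.zeros((7, 7), dtype=np.uint8)
sil[1:6, 1:6] = 1       # the silhouette blob
sil[0, 6] = 1           # an isolated speck of noise
cleaned = opening(sil)  # blob kept, speck removed
```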
As the second step, period detection estimates the gait period, i.e., the number of image frames in each gait cycle. Toby H.W. Lam et al. obtain the gait period for the GFI generation process. The number of foreground pixels is smallest in the mid-stance position and greatest in the double-support position, so the gait period can be obtained by counting the pixels in the silhouette images. In their work, the gait period is the median of the distances between three consecutive gait cycles. To generate the GFIs, the binary silhouette sequence is divided into cycles according to this gait period [5]. Negin K. Hosseini et al. suggested a simpler period-detection method based on counting the white pixels of the frames in the subject's gait sequence. The number of white pixels is minimal during the initial swing phase, when the legs are closer together than in other phases. They counted the white pixels in each frame and selected the frames lying between the three frames with the fewest white pixels [7]. Shajina T. et al. use key-pose extraction instead of period detection. They represent the gait cycle with key poses extracted using eigenspace projection and K-means clustering. In eigenspace projection, each silhouette image is represented as a column vector; the mean of these column vectors is computed and subtracted from each silhouette vector to obtain normalized silhouette images. K-means clustering is then applied to the weight vectors found in this projection [6]. Chao Li et al. proposed deep gait generation for appearance-based gait recognition. They compute the Normalized AutoCorrelation (NAC) of each normalized gait sequence along the temporal axis, and they also adopt a state-of-the-art deep convolutional model (VGG-D).
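The pixel-counting idea behind both Lam et al.'s and Hosseini et al.'s period detectors can be sketched as follows; the synthetic sinusoidal pixel-count signal stands in for a real silhouette sequence:

```python
import numpy as np

def gait_period(pixel_counts):
    """Estimate the period as the median gap between local minima of the
    per-frame foreground pixel count (legs together -> fewest pixels)."""
    c = np.asarray(pixel_counts, dtype=np.float64)
    # local minima: strictly smaller than both neighbors
    minima = [i for i in range(1, len(c) - 1) if c[i] < c[i - 1] and c[i] < c[i + 1]]
    gaps = np.diff(minima)
    return float(np.median(gaps))

# Synthetic count signal with a minimum every 10 frames
t = np.arange(60)
counts = 500 + 100 * np.cos(2 * np.pi * t / 10)
period = gait_period(counts)
```

Note that the legs pass each other twice per full gait cycle, so on real data the gap between minima corresponds to half a cycle; the median makes the estimate robust to a single noisy minimum, in the spirit of Lam et al.'s median over consecutive cycles.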
VGG-D consists of 16 convolutional layers and 3 fully connected layers (19 parameterized layers) and uses an architecture with very small convolution filters to evaluate a very deep convolutional network [1]. F.M. Castro et al. also proposed a CNN architecture for gait signature extraction. In their case, they use optical flow (OF) as the input representation of the video and decode simple, low-resolution optical flow with the CNN. The original videos may have different temporal lengths, while the CNN architecture needs a fixed-size input; an input of size 60×60×50 is therefore assembled from the OF frames [8]. Zhang et al. also suggested a CNN architecture in their research. They convert raw video sequences from surveillance cameras into GEIs as the input to a deep convolutional neural network (DCNN), and introduce a Siamese network to learn a sufficient feature representation of the gait. The Siamese network contains two parallel CNNs that share the same parameters. Because a plain CNN focuses on the classification problem, and there is a large gap between recognition and classification, they proposed the Siamese network to bridge this gap [2].
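The "very small convolution filters" of VGG-style networks are 3×3 kernels slid over the input. A naive valid convolution plus ReLU in NumPy shows the basic operation (a sketch for illustration, not the actual VGG-D implementation):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D cross-correlation, as used in CNN layers."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

img = np.arange(25, dtype=np.float64).reshape(5, 5)
edge = np.array([[-1., 0., 1.]] * 3)  # simple 3x3 vertical-edge filter
feat = relu(conv2d_valid(img, edge))
```

Stacking many such small-filter layers gives the same receptive field as larger filters with fewer parameters and more nonlinearities, which is the design rationale of the VGG family.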
For representation generation, researchers have suggested a wide range of methods depending on the input data, such as GEIs, GFIs, and optical flow. Negin K. Hosseini et al. applied Principal Component Analysis (PCA) to the eigenspace transformation mentioned in the period-estimation stage; the main reason for choosing this method is to reduce the dimensionality of the feature space. The largest eigenvectors span a subspace used to reconstruct the average silhouette, while the small eigenvalues and their eigenvectors are discarded [7]. Chao Li et al. generate and visualize their representation from GEIs: they simply average each silhouette sequence using the result of gait period estimation, capturing both temporal and spatial information, and use max-pooling over the per-period features to combine the spatio-temporal information. To validate DeepGait, they also tried an average-pooling variant, which showed inferior performance [1]. Shajina T. et al. suggested applying time-series shapelets, including the brute-force method, subsequence distance early abandon, and admissible entropy pruning [6]. GFIs contain the motion information of the human gait and are generated by computing the optical flow field of each gait cycle. As described by Toby H.W. Lam et al., there are two optical flow fields: the horizontal and the vertical components of the flow. To obtain the fields, they adopted Horn and Schunck's approach, with the regularization parameter set to 0.5 and the computation repeated for five iterations [5]. F.M. Castro et al., who used optical flow as the input data and a CNN for estimation, implemented this phase with the MatConvNet library.
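The PCA dimensionality reduction Hosseini et al. rely on amounts to projecting mean-centered silhouette vectors onto the top principal directions; a compact NumPy sketch via the SVD (the random "silhouette vectors" are placeholders):

```python
import numpy as np

def pca_project(X, k):
    """Project rows of X (n_samples, n_features) onto the top-k principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                # top-k principal directions
    return Xc @ components.T, components, mean

rng = np.random.default_rng(0)
# 20 fake 'silhouette vectors' varying mostly along a single direction
X = np.outer(rng.normal(size=20), np.ones(50)) + 0.01 * rng.normal(size=(20, 50))
proj, comps, mean = pca_project(X, 2)
```

Dropping the directions with small eigenvalues, as Hosseini et al. do, corresponds to keeping only the first few rows of `Vt`.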
They also used CUDA and cuDNN to develop the CNN architecture easily and quickly [8]. Zhang et al. proposed Siamese-network-based gait recognition, which relaxes the constraints of a plain CNN. Using a distance metric learning architecture, the Siamese network simultaneously minimizes the distance between pairs of the same subject and maximizes the distance between dissimilar pairs [2].
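The distance-metric objective behind such a Siamese network is commonly written as a contrastive loss: pull embeddings of the same subject together and push different subjects at least a margin m apart. A NumPy sketch (the embeddings and margin are illustrative, not Zhang et al.'s exact formulation):

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """same=1: penalize distance^2; same=0: penalize (margin - distance)^2 if too close."""
    d = np.linalg.norm(e1 - e2)
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2

a = np.array([0.1, 0.2])
b = np.array([0.1, 0.2])   # same subject, identical embedding -> zero loss
c = np.array([2.0, 2.0])   # different subject, already far apart -> zero loss
loss_same = contrastive_loss(a, b, same=1)
loss_diff = contrastive_loss(a, c, same=0)
loss_close = contrastive_loss(a, np.array([0.1, 0.7]), same=0)  # too close -> penalized
```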
Gait recognition can be divided into two major tasks: gait verification and gait identification. Zhang et al. proposed a verification method with intra-view and inter-view settings. They organize the dataset into a probe (i.e., query) set and a gallery (i.e., source) set. The gallery is used for training, since it contains multiple gait cycles for each subject; the probe set is not used in training. For intra-view identification tasks, they compare against state-of-the-art methods, including FDF and GEI template-matching strategies, HWLD, and CNN-based methods. Their proposed Siamese-network-based method achieved better accuracy on all tests and can also handle the verification scenario of gait recognition [2]. Toby H.W. Lam et al. use a similarity score that represents the level of similarity between the gallery and probe data. Each subject contributes gait sequences to the gallery, and the comparison between a probe sequence and a gallery sequence yields a similarity score. They used three algorithms: direct matching, dimension reduction, and a classifier. Direct matching computes the similarity score (SimScore) directly, but the number of gallery and probe sequences increases the computational complexity. For dimension reduction, they adopted linear discriminant analysis (LDA), which transforms a high-dimensional data space into a lower-dimensional one. Finally, a nearest-neighbor classifier, which also uses the SimScore between the projected testing and training sequences, is used for classification [5]. Chao Li et al. also use a similarity score to decide whether two input gait sequences (gallery and probe) belong to the same subject; however, they adopt the Joint Bayesian algorithm to evaluate the similarity of a given pair of sequences.
For comparison, they also adopted Euclidean distance as a baseline method and Principal Component Analysis (PCA) for dimension reduction [1]. Negin K. Hosseini et al. likewise use Euclidean distance in the classification stage to compute the distance between the input image and the training set [7]. Lastly, F.M. Castro et al. classify the gait signatures to identify the subjects after the signatures have been obtained. Although the softmax layer of the CNN is itself a classifier, they feed the fully connected layers' output, as the gait signature, into an SVM (Support Vector Machine). They also use a nearest-neighbor (NN) classifier, which does not require any training step [8].
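The Euclidean nearest-neighbor matching used by several of the surveyed papers reduces to comparing the probe feature against every gallery feature and returning the label of the closest one. A minimal sketch with made-up gallery features and subject labels:

```python
import numpy as np

def nn_classify(probe, gallery_feats, gallery_labels):
    """Return the label of the gallery feature closest to the probe (Euclidean)."""
    dists = np.linalg.norm(gallery_feats - probe, axis=1)
    return gallery_labels[int(np.argmin(dists))]

gallery = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
labels = ["subjA", "subjB", "subjC"]
pred = nn_classify(np.array([0.9, 1.2]), gallery, labels)
```

As Castro et al. note, this classifier needs no training step: the gallery features themselves are the model, which is why it pairs naturally with learned gait signatures.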
This review has presented several recent methods for gait recognition. Gait recognition has become a popular research topic in the biometrics field because, unlike fingerprint, retina, or hand geometry, it does not require direct contact with the subject. We have seen that there are two approaches to gait recognition, appearance-based and model-based, and that the appearance-based approach does not require high-quality images. Gait recognition comprises four stages: silhouette extraction, period detection, representation generation, and recognition. As discussed above, a considerable number of algorithms and methods have been used at each stage.