Copyright ©The Author(s) 2023.
World J Psychiatry. Jan 19, 2023; 13(1): 1-14
Published online Jan 19, 2023. doi: 10.5498/wjp.v13.i1.1
| Feature | Neutral | Angry |
|---|---|---|
| **Prosody features** | | |
| Mean fundamental frequency (F0) | 200 Hz | 225 Hz |
| Minimum fundamental frequency | 194 Hz | 223 Hz |
| Maximum fundamental frequency | 213 Hz | 238 Hz |
| Mean intensity | 60 dB | 78 dB |
| **Spectral features** | | |
| First formant frequency (F1) | 853 Hz | 686 Hz |
| Second formant frequency (F2) | 2055 Hz | 1660 Hz |
| Third formant frequency (F3) | 3148 Hz | 2847 Hz |
| Fourth formant frequency (F4) | 4245 Hz | 3678 Hz |
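To make these measurements concrete, the following is a minimal Python sketch of how the prosody features above (mean/minimum/maximum F0 and mean intensity) might be extracted with the open-source librosa library. The file name utterance.wav is a hypothetical placeholder, and the dB values depend on the chosen amplitude reference; the formant frequencies (F1-F4) in the spectral rows are typically estimated separately via linear predictive coding (e.g., in Praat) and are not shown here.

```python
# A minimal sketch, assuming librosa is installed and "utterance.wav"
# is a hypothetical speech recording.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)

# Fundamental frequency (F0) via the pYIN tracker; unvoiced frames are NaN
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
f0 = f0[~np.isnan(f0)]  # keep voiced frames only
print(f"Mean F0: {f0.mean():.0f} Hz, "
      f"min: {f0.min():.0f} Hz, max: {f0.max():.0f} Hz")

# Intensity proxy: frame-wise RMS energy converted to decibels
# (the reference level here is arbitrary, so only relative dB is meaningful)
rms = librosa.feature.rms(y=y)[0]
intensity_db = librosa.amplitude_to_db(rms, ref=1.0)
print(f"Mean intensity: {intensity_db.mean():.0f} dB")
```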
| Approach | Emotional indicators |
|---|---|
| Psychological | (1) Positive and negative emotion words; (2) standard function-word categories; (3) content categories; (4) patterns of pronoun usage; and (5) acoustic variables (such as pitch variety, pause time, speaking rate, and emphasis) |
| Linguistic | (1) Phonetic: spectral analysis, temporal analysis; (2) semantic and discourse-pragmatic: words, field, cultural identity, emotional implicatures, illocutionary acts, deixis and indexicality; and (3) cognitive: metaphor, metonymy |
| Data science | (1) SER: analyzing sounds via acoustic and spectral features; and (2) NLP: analyzing words via specific semantic properties and word embeddings |
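As a concrete illustration of the first psychological indicator and of the NLP side of the data-science approach, the following is a minimal sketch that counts positive and negative emotion words against a toy lexicon. Both word lists are hypothetical stand-ins for a validated resource such as LIWC.

```python
# A minimal, illustrative sketch: lexicon-based emotion word rates.
# POSITIVE and NEGATIVE are toy lexicons, not a validated word list.
POSITIVE = {"happy", "calm", "love", "good"}
NEGATIVE = {"angry", "sad", "hate", "bad"}

def emotion_word_rates(text: str) -> dict:
    """Return the proportion of positive and negative emotion words."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    pos = sum(t.strip(".,!?") in POSITIVE for t in tokens)
    neg = sum(t.strip(".,!?") in NEGATIVE for t in tokens)
    return {"positive_rate": pos / n, "negative_rate": neg / n}

print(emotion_word_rates("I am so angry, this is bad!"))
# {'positive_rate': 0.0, 'negative_rate': 0.2857...}
```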
| Method/Model | Short description | Ref. |
|---|---|---|
| HMM | An HMM is a statistical model describing the evolution of observable events that depend on internal factors which are not directly observable. The observed event is called a 'symbol' and the invisible factor underlying the observation is called a 'state'. An HMM consists of two stochastic processes: an invisible process of hidden states and a visible process of observable symbols. The hidden states form a Markov chain, and the probability distribution of the observed symbol depends on the underlying state. The observations are thus modeled in two layers, one visible and the other invisible, which makes HMMs useful in classification problems where raw observations must be assigned to categories that are more meaningful to us (Supplementary Figure 1) | [121,122] |
| Gaussian mixture model | A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters (Supplementary Figure 2) | [123] |
| KNN | KNN is a supervised learning algorithm used for classification. It predicts the class of a test point by computing the distance between the test point and all training points, selecting the K training points closest to it, and assigning the class with the highest probability among those K neighbors (by majority voting) (Supplementary Figure 3) | [123] |
| SVM | The SVM is an algorithm that finds a hyperplane in an N-dimensional space (N: the number of features) that distinctly classifies the data points with the maximum margin, i.e., the maximum distance between data points of the two classes. Maximizing this margin allows future test points to be classified more accurately. Support vectors are the data points closest to the hyperplane; they influence its position and orientation (Supplementary Figure 4) | [123] |
| Artificial neural network | An artificial neural network is a network of interconnected artificial neurons. An artificial neuron, inspired by the biological neuron, is modeled with inputs that are multiplied by weights and then passed to a mathematical function that determines the activation of the neuron. The neurons are grouped into layers of three main types: an input layer, one or more hidden layers, and an output layer. Depending on the architecture of the network, outputs of some neurons are carried, with certain weights, as inputs to other neurons. By passing an input through these layers, the network outputs a value (discrete or continuous) that can be used for classification or regression tasks. The network first has to learn the weights from the patterns in a training dataset: a sufficiently large set of input data labeled with the corresponding correct (expected) outputs (Supplementary Figure 5) | [124] |
| Bayes classifier | The Bayes classifier, based on Bayes' theorem in probability, models the probabilistic relationships between the feature set and the class variable. From these relationships it estimates the class-membership probability of an unseen example so as to minimize the probability of misclassification | [123] |
| Linear discriminant analysis | Linear discriminant analysis is a statistical machine learning method that finds a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting linear combination can be used as a linear classifier or as a means of dimensionality reduction prior to the actual classification task | [124] |

Minimal, illustrative code sketches of each of these models are given below.
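To make the HMM's two-layer structure concrete, the following is a minimal numpy sketch of the forward algorithm, which computes the likelihood of a visible symbol sequence given the hidden states. All probabilities and state/symbol names are illustrative toy values, not values from the article.

```python
# A minimal HMM sketch: hidden states, visible symbols, and the
# forward algorithm for sequence likelihood. All numbers are toy values.
import numpy as np

states = ["calm", "aroused"]                  # hidden states
symbols = {"low_pitch": 0, "high_pitch": 1}   # observable symbols

pi = np.array([0.6, 0.4])        # initial state distribution
A = np.array([[0.7, 0.3],        # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.8, 0.2],        # emission probabilities P(symbol | state)
              [0.3, 0.7]])

def forward(obs):
    """Return P(observation sequence) under the HMM."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # one step of the forward recursion
    return alpha.sum()

seq = [symbols[s] for s in ["low_pitch", "high_pitch", "high_pitch"]]
print(f"Sequence likelihood: {forward(seq):.4f}")
```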
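A minimal Gaussian mixture sketch, assuming scikit-learn is available; the synthetic one-dimensional "F0" data and the two-cluster setup are illustrative only.

```python
# Fit a two-component Gaussian mixture to synthetic 1-D pitch data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters (e.g., neutral vs angry mean F0)
X = np.concatenate([rng.normal(200, 8, 100),
                    rng.normal(225, 8, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("Component means:", gmm.means_.ravel())
print("Cluster of a 220 Hz frame:", gmm.predict([[220.0]])[0])
```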
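A minimal KNN sketch, assuming scikit-learn; the toy [F0, intensity] samples and their labels are hypothetical.

```python
# KNN classification by majority vote of the K nearest training points.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]  # [F0 Hz, dB]
y_train = ["neutral", "neutral", "angry", "angry"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[222, 75]]))        # majority vote of the 3 nearest points
print(knn.predict_proba([[222, 75]]))  # class-membership probabilities
```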
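A minimal SVM sketch, assuming scikit-learn; it fits a linear maximum-margin hyperplane on the same kind of toy samples and exposes the support vectors that determine its position and orientation.

```python
# Linear SVM: the maximum-margin hyperplane is defined by the
# support vectors identified during fitting.
from sklearn.svm import SVC

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]  # toy [F0, dB] samples
y_train = ["neutral", "neutral", "angry", "angry"]

svm = SVC(kernel="linear").fit(X_train, y_train)
print("Support vectors:", svm.support_vectors_)
print("Prediction:", svm.predict([[222, 75]]))
```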
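A minimal artificial-neural-network sketch, assuming scikit-learn's MLPClassifier with a single hidden layer; the training data are toy values, and the features are standardized before training since their raw scales differ.

```python
# Feed-forward neural network (input -> one hidden layer -> output),
# trained on toy labeled data; weights are learned from the training set.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = [[200, 60], [198, 62], [202, 61], [225, 78], [230, 76], [228, 77]]
y_train = [0, 0, 0, 1, 1, 1]   # 0 = neutral, 1 = angry

clf = make_pipeline(
    StandardScaler(),  # scale features so training behaves well
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
print(clf.predict([[222, 75]]))
```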
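A minimal Bayes-classifier sketch, assuming scikit-learn's Gaussian naive Bayes variant; it reports the estimated class-membership probabilities for an unseen toy example.

```python
# Gaussian naive Bayes: class-membership probabilities from Bayes' theorem.
from sklearn.naive_bayes import GaussianNB

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]
y_train = ["neutral", "neutral", "angry", "angry"]

nb = GaussianNB().fit(X_train, y_train)
print(nb.predict([[222, 75]]))        # most probable class
print(nb.predict_proba([[222, 75]]))  # estimated class-membership probabilities
```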
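A minimal linear discriminant analysis sketch, assuming scikit-learn; it shows both uses named in the table: as a linear classifier and as a projection for dimensionality reduction.

```python
# LDA: a linear combination of features that separates the classes.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_train = [[200, 60], [198, 62], [202, 61], [225, 78], [230, 76], [228, 77]]
y_train = ["neutral", "neutral", "neutral", "angry", "angry", "angry"]

lda = LinearDiscriminantAnalysis(n_components=1).fit(X_train, y_train)
print(lda.predict([[222, 75]]))    # use as a linear classifier
print(lda.transform([[222, 75]]))  # projection onto the discriminant axis
```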
- Citation: Adibi P, Kalani S, Zahabi SJ, Asadi H, Bakhtiar M, Heidarpour MR, Roohafza H, Shahoon H, Amouzadeh M. Emotion recognition support system: Where physicians and psychiatrists meet linguists and data engineers. World J Psychiatry 2023; 13(1): 1-14
- URL: https://www.wjgnet.com/2220-3206/full/v13/i1/1.htm
- DOI: https://dx.doi.org/10.5498/wjp.v13.i1.1