Copyright ©The Author(s) 2023.
World J Psychiatry. Jan 19, 2023; 13(1): 1-14
Published online Jan 19, 2023. doi: 10.5498/wjp.v13.i1.1
| Feature | Neutral | Angry |
|---|---|---|
| **Prosody features** | | |
| Mean fundamental frequency (F0) | 200 Hz | 225 Hz |
| Minimum fundamental frequency | 194 Hz | 223 Hz |
| Maximum fundamental frequency | 213 Hz | 238 Hz |
| Mean intensity | 60 dB | 78 dB |
| **Spectral features** | | |
| First formant frequency (F1) | 853 Hz | 686 Hz |
| Second formant frequency (F2) | 2055 Hz | 1660 Hz |
| Third formant frequency (F3) | 3148 Hz | 2847 Hz |
| Fourth formant frequency (F4) | 4245 Hz | 3678 Hz |
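To make these measurements concrete, the following is a minimal Python sketch of how the prosody features above (mean/minimum/maximum F0 and mean intensity) might be extracted with the open-source librosa library. The file name utterance.wav is a hypothetical placeholder, and the dB values depend on the chosen amplitude reference; the formant frequencies (F1-F4) in the spectral rows are typically estimated separately via linear predictive coding (e.g., in Praat) and are not shown here.

```python
# A minimal sketch, assuming librosa is installed and "utterance.wav"
# is a hypothetical speech recording.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)

# Fundamental frequency (F0) via the pYIN tracker; unvoiced frames are NaN
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
)
f0 = f0[~np.isnan(f0)]  # keep voiced frames only
print(f"Mean F0: {f0.mean():.0f} Hz, "
      f"min: {f0.min():.0f} Hz, max: {f0.max():.0f} Hz")

# Intensity proxy: frame-wise RMS energy converted to decibels
# (the reference level here is arbitrary, so only relative dB is meaningful)
rms = librosa.feature.rms(y=y)[0]
intensity_db = librosa.amplitude_to_db(rms, ref=1.0)
print(f"Mean intensity: {intensity_db.mean():.0f} dB")
```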
| Approach | Emotional indicators |
|---|---|
| Psychological | (1) Positive and negative emotion words; (2) standard function-word categories; (3) content categories; (4) patterns of pronoun usage; and (5) acoustic variables (such as pitch variety, pause time, speaking rate, and emphasis) |
| Linguistic | (1) Phonetic: spectral analysis, temporal analysis; (2) semantic and discourse-pragmatic: words, field, cultural identity, emotional implicatures, illocutionary acts, deixis and indexicality; and (3) cognitive: metaphor, metonymy |
| Data science | (1) SER: analyzing sounds via acoustic and spectral features; and (2) NLP: analyzing words via specific semantic properties and word embeddings |
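As a concrete illustration of the first psychological indicator and of the NLP side of the data-science approach, the following is a minimal sketch that counts positive and negative emotion words against a toy lexicon. Both word lists are hypothetical stand-ins for a validated resource such as LIWC.

```python
# A minimal, illustrative sketch: lexicon-based emotion word rates.
# POSITIVE and NEGATIVE are toy lexicons, not a validated word list.
POSITIVE = {"happy", "calm", "love", "good"}
NEGATIVE = {"angry", "sad", "hate", "bad"}

def emotion_word_rates(text: str) -> dict:
    """Return the proportion of positive and negative emotion words."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    pos = sum(t.strip(".,!?") in POSITIVE for t in tokens)
    neg = sum(t.strip(".,!?") in NEGATIVE for t in tokens)
    return {"positive_rate": pos / n, "negative_rate": neg / n}

print(emotion_word_rates("I am so angry, this is bad!"))
# {'positive_rate': 0.0, 'negative_rate': 0.2857...}
```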
| Method/Model | Short description | Ref. |
|---|---|---|
| HMM | An HMM is a statistical model describing the evolution of observable events that depend on internal factors which are not directly observable. The observed event is called a 'symbol' and the invisible factor underlying the observation is called a 'state'. An HMM consists of two stochastic processes: an invisible process of hidden states and a visible process of observable symbols. The hidden states form a Markov chain, and the probability distribution of the observed symbol depends on the underlying state. The observations are thus modeled in two layers, one visible and the other invisible, which makes HMMs useful in classification problems where raw observations must be assigned to categories that are more meaningful to us (Supplementary Figure 1) | [121,122] |
| Gaussian mixture model | A Gaussian mixture model is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters (Supplementary Figure 2) | [123] |
| KNN | KNN is a supervised learning algorithm used for classification. It predicts the class of a test point by computing the distance between the test point and all training points, selecting the K training points closest to it, and assigning the class with the highest probability among those K neighbors (by majority voting) (Supplementary Figure 3) | [123] |
| SVM | The SVM is an algorithm that finds a hyperplane in an N-dimensional space (N: the number of features) that distinctly classifies the data points with the maximum margin, i.e., the maximum distance between data points of the two classes. Maximizing this margin allows future test points to be classified more accurately. Support vectors are the data points closest to the hyperplane; they influence its position and orientation (Supplementary Figure 4) | [123] |
| Artificial neural network | An artificial neural network is a network of interconnected artificial neurons. An artificial neuron, inspired by the biological neuron, is modeled with inputs that are multiplied by weights and then passed to a mathematical function that determines the activation of the neuron. The neurons are grouped into layers of three main types: an input layer, one or more hidden layers, and an output layer. Depending on the architecture of the network, outputs of some neurons are carried, with certain weights, as inputs to other neurons. By passing an input through these layers, the network outputs a value (discrete or continuous) that can be used for classification or regression tasks. The network first has to learn the weights from the patterns in a training dataset: a sufficiently large set of input data labeled with the corresponding correct (expected) outputs (Supplementary Figure 5) | [124] |
| Bayes classifier | The Bayes classifier, based on Bayes' theorem in probability, models the probabilistic relationships between the feature set and the class variable. From these relationships it estimates the class-membership probability of an unseen example so as to minimize the probability of misclassification | [123] |
| Linear discriminant analysis | Linear discriminant analysis is a statistical machine learning method that finds a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting linear combination can be used as a linear classifier or as a means of dimensionality reduction prior to the actual classification task | [124] |

Minimal, illustrative code sketches of each of these models are given below.
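To make the HMM's two-layer structure concrete, the following is a minimal numpy sketch of the forward algorithm, which computes the likelihood of a visible symbol sequence given the hidden states. All probabilities and state/symbol names are illustrative toy values, not values from the article.

```python
# A minimal HMM sketch: hidden states, visible symbols, and the
# forward algorithm for sequence likelihood. All numbers are toy values.
import numpy as np

states = ["calm", "aroused"]                  # hidden states
symbols = {"low_pitch": 0, "high_pitch": 1}   # observable symbols

pi = np.array([0.6, 0.4])        # initial state distribution
A = np.array([[0.7, 0.3],        # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.8, 0.2],        # emission probabilities P(symbol | state)
              [0.3, 0.7]])

def forward(obs):
    """Return P(observation sequence) under the HMM."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # one step of the forward recursion
    return alpha.sum()

seq = [symbols[s] for s in ["low_pitch", "high_pitch", "high_pitch"]]
print(f"Sequence likelihood: {forward(seq):.4f}")
```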
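A minimal Gaussian mixture sketch, assuming scikit-learn is available; the synthetic one-dimensional "F0" data and the two-cluster setup are illustrative only.

```python
# Fit a two-component Gaussian mixture to synthetic 1-D pitch data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two overlapping Gaussian clusters (e.g., neutral vs angry mean F0)
X = np.concatenate([rng.normal(200, 8, 100),
                    rng.normal(225, 8, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print("Component means:", gmm.means_.ravel())
print("Cluster of a 220 Hz frame:", gmm.predict([[220.0]])[0])
```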
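A minimal KNN sketch, assuming scikit-learn; the toy [F0, intensity] samples and their labels are hypothetical.

```python
# KNN classification by majority vote of the K nearest training points.
from sklearn.neighbors import KNeighborsClassifier

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]  # [F0 Hz, dB]
y_train = ["neutral", "neutral", "angry", "angry"]

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.predict([[222, 75]]))        # majority vote of the 3 nearest points
print(knn.predict_proba([[222, 75]]))  # class-membership probabilities
```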
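A minimal SVM sketch, assuming scikit-learn; it fits a linear maximum-margin hyperplane on the same kind of toy samples and exposes the support vectors that determine its position and orientation.

```python
# Linear SVM: the maximum-margin hyperplane is defined by the
# support vectors identified during fitting.
from sklearn.svm import SVC

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]  # toy [F0, dB] samples
y_train = ["neutral", "neutral", "angry", "angry"]

svm = SVC(kernel="linear").fit(X_train, y_train)
print("Support vectors:", svm.support_vectors_)
print("Prediction:", svm.predict([[222, 75]]))
```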
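A minimal artificial-neural-network sketch, assuming scikit-learn's MLPClassifier with a single hidden layer; the training data are toy values, and the features are standardized before training since their raw scales differ.

```python
# Feed-forward neural network (input -> one hidden layer -> output),
# trained on toy labeled data; weights are learned from the training set.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_train = [[200, 60], [198, 62], [202, 61], [225, 78], [230, 76], [228, 77]]
y_train = [0, 0, 0, 1, 1, 1]   # 0 = neutral, 1 = angry

clf = make_pipeline(
    StandardScaler(),  # scale features so training behaves well
    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
)
clf.fit(X_train, y_train)
print(clf.predict([[222, 75]]))
```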
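A minimal Bayes-classifier sketch, assuming scikit-learn's Gaussian naive Bayes variant; it reports the estimated class-membership probabilities for an unseen toy example.

```python
# Gaussian naive Bayes: class-membership probabilities from Bayes' theorem.
from sklearn.naive_bayes import GaussianNB

X_train = [[200, 60], [198, 62], [225, 78], [230, 76]]
y_train = ["neutral", "neutral", "angry", "angry"]

nb = GaussianNB().fit(X_train, y_train)
print(nb.predict([[222, 75]]))        # most probable class
print(nb.predict_proba([[222, 75]]))  # estimated class-membership probabilities
```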
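A minimal linear discriminant analysis sketch, assuming scikit-learn; it shows both uses named in the table: as a linear classifier and as a projection for dimensionality reduction.

```python
# LDA: a linear combination of features that separates the classes.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_train = [[200, 60], [198, 62], [202, 61], [225, 78], [230, 76], [228, 77]]
y_train = ["neutral", "neutral", "neutral", "angry", "angry", "angry"]

lda = LinearDiscriminantAnalysis(n_components=1).fit(X_train, y_train)
print(lda.predict([[222, 75]]))    # use as a linear classifier
print(lda.transform([[222, 75]]))  # projection onto the discriminant axis
```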
- Citation: Adibi P, Kalani S, Zahabi SJ, Asadi H, Bakhtiar M, Heidarpour MR, Roohafza H, Shahoon H, Amouzadeh M. Emotion recognition support system: Where physicians and psychiatrists meet linguists and data engineers. World J Psychiatry 2023; 13(1): 1-14
- URL: https://www.wjgnet.com/2220-3206/full/v13/i1/1.htm
- DOI: https://dx.doi.org/10.5498/wjp.v13.i1.1