Construction and analysis of the word network based on the Random Reading Frame (RRF) method
WenJun Zhang
School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China
Network Biology
ISSN 2220-8879
http://www.iaees.org/publications/journals/nb/online-version.asp
2021
11
3
154
193
International Academy of Ecology and Environmental Sciences
Hong Kong
9 August 2018
15 January 2021
1 September 2021
word association
association rules
correlation measures
Random Reading Frame
network construction
network analysis
algorithm
text mining
In the present study, a method was developed to construct and analyze a word network. The core of the method is the Random Reading Frame (RRF) method. First, text files (in various formats, e.g., pdf, txt, doc, docx, rtf, html) on the topics of interest are downloaded or collected from the internet or a local machine. All files are then combined into a single text file. With splitting words and stop words excluded, all remaining words are arranged in a word vector in the order in which they appear in the combined text file. In the RRF method, for a given pair of unique words (x, y), where x and y belong to {u1, u2, ..., um}, a reading frame of randomly varying width is placed at a random position on the vector, and the number of occurrences of each of the two words within the frame is counted. Repeating this procedure p times yields the paired counts (x1, y1), (x2, y2), ..., (xp, yp). Paired counts are obtained in this way for all pairs of unique words. Thereafter, for a given pair of unique words (x, y), Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is used to calculate the correlation value from the paired counts, and statistical significance is determined by a t-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a chi-square test (point correlation). All statistically significant word pairs are thus obtained for the correlation measure chosen by the user. Finally, the word network for the chosen correlation measure is constructed from these word pairs; statistically insignificant word pairs are not linked. Network analysis is conducted on the word network constructed from the significant positive between-word correlations among all unique words. Word centrality measures, word trees, word chains, word modules, etc., can be calculated with the method. The Matlab software for the method, wordNetwork, is also provided.
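The RRF procedure described above can be illustrated with a minimal Python sketch. It is not the author's wordNetwork Matlab software, and every function and parameter name in it (rrf_counts, pearson_r, t_statistic, build_network, degree, min_width, t_crit, seed) is hypothetical. The sketch assumes that the frame width and position are drawn uniformly, uses Pearson correlation only, and approximates the t-test with a fixed large-sample critical value of 1.96 (two-sided, 5% level). The abstract does not specify any of these choices.

```python
import math
import random


def rrf_counts(words, x, y, p=200, min_width=1, rng=None):
    """Place p random frames on the word vector and count x and y in each.

    Frame width and start position are drawn uniformly (an assumption).
    """
    rng = rng or random.Random()
    n = len(words)
    pairs = []
    for _ in range(p):
        width = rng.randint(min(min_width, n), n)
        start = rng.randint(0, n - width)
        frame = words[start:start + width]
        pairs.append((frame.count(x), frame.count(y)))
    return pairs


def pearson_r(pairs):
    """Pearson correlation of paired counts; 0.0 if either count is constant."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for a correlation r estimated from n pairs (n - 2 d.f.)."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def build_network(words, p=200, t_crit=1.96, seed=0):
    """Link every pair of unique words with a significant positive correlation.

    t_crit = 1.96 is a large-sample stand-in for the exact t critical value
    (an assumption of this sketch).
    """
    rng = random.Random(seed)
    vocab = sorted(set(words))
    edges = {}
    for i, x in enumerate(vocab):
        for y in vocab[i + 1:]:
            pairs = rrf_counts(words, x, y, p=p, rng=rng)
            r = pearson_r(pairs)
            if r > 0 and t_statistic(r, p) > t_crit:
                edges[(x, y)] = r
    return edges


def degree(edges):
    """Number of links per word, a simple centrality measure."""
    deg = {}
    for x, y in edges:
        deg[x] = deg.get(x, 0) + 1
        deg[y] = deg.get(y, 0) + 1
    return deg


if __name__ == "__main__":
    # Toy word vector, already stripped of stop words (illustrative data only).
    text = "gene network gene network protein gene network cell protein cell"
    word_vector = text.split() * 5
    network = build_network(word_vector, p=100, seed=1)
    for (x, y), r in sorted(network.items()):
        print(x, y, round(r, 3))
    print(degree(network))
```

For a real corpus, the word vector would come from the combined text file after removing splitting words and stop words. Because this sketch scans every frame once per word pair, it is only practical for small vocabularies.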
http://www.iaees.org/publications/journals/nb/articles/2021-11(3)/construction-and-analysis-of-word-network-from-Random-Reading-Frame.pdf