
Network Biology, 2021, 11(3): 154-193

Article

Construction and analysis of the word network based on the Random Reading Frame (RRF) method

WenJun Zhang
School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China

Received 9 August 2018; Accepted 15 January 2021; Published 1 September 2021
IAEES

Abstract
In the present study, a method was developed to construct and analyze word networks. The core of the method is the Random Reading Frame (RRF) method. First, text files (in various formats, e.g., pdf, txt, doc, docx, rtf, html) on the topics of interest are downloaded from the internet or collected from a local machine. All files are then combined into a single text file. Except for splitting words and stop words, all words are arranged in a word vector in the order in which they appear in the combined text file. In the RRF method, for a given pair of unique words (x, y), with x, y ∈ {u1, u2, ..., um}, a reading frame of randomly varying width is placed at a random position on the vector, and the number of occurrences of each of the two words inside the frame is counted. Repeating this random procedure p times yields the paired counts (x1, y1), (x2, y2), ..., (xp, yp). The paired counts for all pairs of unique words are obtained in the same way. Thereafter, for a given pair of unique words (x, y), Pearson correlation, Pearson partial correlation, Spearman rank correlation, or point correlation is used to calculate a correlation value from the paired counts (x1, y1), (x2, y2), ..., (xp, yp), and statistical significance is determined by a t-test (Pearson correlation, Pearson partial correlation, Spearman rank correlation) or a chi-square test (point correlation). This gives all statistically significant word pairs under the correlation measure chosen by the user. Finally, the word network for the chosen correlation measure is constructed from these word pairs; statistically insignificant word pairs are not linked. Network analysis is conducted on the word network constructed from the significant positive between-word correlations among all unique words. Word centrality measures, word trees, word chains, word modules, etc., can be calculated with the method. The Matlab software for the method, wordNetwork, is also provided.
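The frame-sampling and correlation steps described in the abstract can be illustrated with a small sketch. This is not the author's wordNetwork Matlab code; it is a minimal Python illustration under stated assumptions. The function names (`rrf_counts`, `pearson`, `t_statistic`), the uniform choice of frame width and position, and the default parameters are all hypothetical, since the abstract does not specify how the random width and placement are drawn. Only Pearson correlation and its t statistic are shown.

```python
import math
import random


def rrf_counts(words, x, y, p=200, min_width=2, max_width=None, rng=None):
    """Place p random frames on the word vector and count x and y in each.

    Assumption: width and start position are drawn uniformly; the abstract
    only says that both are random.
    """
    rng = rng or random.Random()
    n = len(words)
    max_width = min(max_width or n, n)
    pairs = []
    for _ in range(p):
        width = rng.randint(min_width, max_width)   # random frame width
        start = rng.randint(0, n - width)           # random frame position
        frame = words[start:start + width]
        pairs.append((frame.count(x), frame.count(y)))
    return pairs


def pearson(pairs):
    """Pearson correlation of the paired counts (0.0 if either count is constant)."""
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Toy word vector (hypothetical data, for illustration only).
    text = "gene network gene protein network gene protein cell network gene".split()
    pairs = rrf_counts(text, "gene", "network", p=100, rng=random.Random(0))
    r = pearson(pairs)
    print(r, t_statistic(r, len(pairs)))
```

In a full pipeline, the t statistic would be compared with a critical value (or converted to a p-value) to decide whether the pair (x, y) is linked in the network, and the same procedure would be repeated for every pair of unique words.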

Keywords: word association; association rules; correlation measures; Random Reading Frame; network construction; network analysis; algorithm; text mining.




