Network Biology, 2018, 8(3): 126-136

Article

Analysis of word occurrence frequency and word association in English text file: A big data analytics method

YanHong Qi1, GuangHua Liu2, WenJun Zhang3
1Sun Yat-sen University Libraries, Sun Yat-sen University, Guangzhou 510275, China
2Guangdong AIB Polytech College, Guangzhou 510507, China
3School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China

Received 9 August 2017; Accepted 15 October 2017; Published 1 September 2018
IAEES

Abstract
In the present study, we present an algorithm for analyzing word occurrence frequency and word association in English text files. Various delimiters were used to split the text into words. Commonly used grammatical words were ignored in both the occurrence and the association analysis. All distinct words were listed in descending order of occurrence frequency. Word association was detected using one-dimensional ordered cluster analysis. Words falling in the same class are likely to be strongly associated. Theoretically, classes at distinct levels of the clustering hierarchy may represent topics at different hierarchical levels. Java software implementing the algorithm is provided.
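The word-splitting and frequency-counting steps described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' software: the class name `WordFrequency`, the delimiter pattern `DELIMITERS`, and the short `STOP_WORDS` list are assumptions chosen for illustration, and the alphabetical tie-break among words of equal frequency is likewise an assumption. The ordered cluster analysis step is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WordFrequency {
    // Assumed (hypothetical) list of grammatical words to ignore; the paper's actual list is not given here.
    static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "of", "and", "in", "to", "is", "are", "was", "were");

    // Assumed (hypothetical) delimiter set: whitespace and common punctuation.
    static final String DELIMITERS = "[\\s,.;:!?()\\[\\]\"]+";

    // Splits the text on the delimiters, skips stop words, and returns
    // (word, count) pairs sorted by count in descending order.
    static List<Map.Entry<String, Integer>> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split(DELIMITERS)) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // skip empty fragments and ignored grammatical words
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Higher counts first; ties are broken alphabetically (an assumption).
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        String sample = "The cat and the dog. The cat sleeps!";
        for (Map.Entry<String, Integer> e : countWords(sample)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the sample sentence, "cat" is listed first with a count of 2, followed by "dog" and "sleeps" with a count of 1 each; "the" and "and" are dropped as grammatical words.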

Keywords: big data analytics; word splitting; word occurrence frequency; word association; English text; algorithm; software.



International Academy of Ecology and Environmental Sciences. E-mail: office@iaees.org
Copyright © 2009-2024 International Academy of Ecology and Environmental Sciences. All rights reserved.