The goal is to build a program that compares the contents of two sites and shows their similarities in a heat map. This kind of comparison is used in professional SEO services.
Using hierarchical clustering and k-means clustering, a group of terms is selected according to term frequency (TF). The selected terms are then shown side by side in a document heat map, where the colors represent each term's TF in that particular document.
To download a package from outside the CRAN archive:
The “genefilter” package, which is distributed through Bioconductor rather than CRAN, must be downloaded in order to plot the heat map.
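A minimal sketch of that installation step, assuming the current Bioconductor installer (the BiocManager package) is used. Older write-ups often used a different installer script, so treat this as one possible route rather than the original command.

```r
# Assumption: genefilter is installed from Bioconductor through BiocManager.
# Install the installer itself from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install genefilter only when it is missing, then load it.
if (!requireNamespace("genefilter", quietly = TRUE)) {
  BiocManager::install("genefilter")
}
library(genefilter)
```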
Loading stopwords files:
Text Data input:
This line reads every text file (.txt) present in the default directory into a list.
The lapply function applies an operation to each element of a list and returns a list of the same length as the original.
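The file-reading step can be sketched as follows. This is an illustration, not the original code: the scratch directory, the file names and the sample sentences are hypothetical stand-ins so that the example runs on its own. In practice the files would already sit in the working directory.

```r
# Self-contained demo: create a scratch directory with two small .txt files.
# Directory, file names and contents are hypothetical stand-ins for real site text.
demo_dir <- file.path(tempdir(), "site_text_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("seo tools compare site content", file.path(demo_dir, "doc1.txt"))
writeLines("heat map of site content terms", file.path(demo_dir, "doc2.txt"))

# List every .txt file, then read each one into its own list element.
txt_files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
docs <- lapply(txt_files, function(f) paste(readLines(f), collapse = " "))
names(docs) <- basename(txt_files)

# lapply returns exactly one element per input file.
length(docs) == length(txt_files)
```

Collapsing each file into a single string keeps one list element per document, which is the shape the later corpus step expects.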
Creating a Corpus:
Creating Term Doc Matrix:
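To make the idea of a term-document matrix concrete, here is a small base R sketch that also applies a stopword list. The documents, the stopword list and the helper name tokenize are all hypothetical. A text-mining package would normally do this work, so this only illustrates the resulting structure: terms as rows, documents as columns, raw term frequencies as cells.

```r
# Hypothetical inputs: two short documents and a tiny stopword list.
docs <- c(doc1 = "The site content and the site links",
          doc2 = "The content of the heat map")
stop_words <- c("the", "and", "of")

# Lower-case the text, split on non-letters, drop stopwords and empty tokens.
tokenize <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
  tokens[nzchar(tokens) & !(tokens %in% stop_words)]
}
tokens <- lapply(docs, tokenize)

# Count every term in every document against a shared vocabulary.
terms <- sort(unique(unlist(tokens)))
tdm <- vapply(tokens,
              function(tk) as.numeric(table(factor(tk, levels = terms))),
              numeric(length(terms)))
rownames(tdm) <- terms

tdm
```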
Distance Between vectors:
In this case, the algorithm uses Euclidean distance to measure the distance between each pair of terms.
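As a small worked example of that distance step, the sketch below runs dist on a made-up term-document matrix. The terms and counts are invented for illustration. Each row is treated as a vector, so the distance between two terms depends on how differently they are used across the documents.

```r
# Hypothetical term-document matrix: rows are terms, columns are documents.
tdm <- matrix(c(3, 0,
                0, 4,
                3, 4),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("seo", "heatmap", "content"),
                              c("doc1", "doc2")))

# Pairwise Euclidean distance between the rows (terms).
term_dist <- dist(tdm, method = "euclidean")

# "seo" is (3, 0) and "heatmap" is (0, 4), so the distance is sqrt(3^2 + 4^2) = 5.
as.matrix(term_dist)["seo", "heatmap"]
```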
Number of groups to visualize:
Creating clusters using Ward's method:
hang = -1 makes all the leaf labels hang from the same baseline, which levels the output.
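The clustering and plotting steps might look like the sketch below. The matrix and the value of k are made up, and the method name "ward.D" is an assumption: recent versions of R offer both "ward.D" and "ward.D2", and the text does not say which variant was used. The plot goes to a temporary PDF so that the example also runs non-interactively.

```r
# Hypothetical term-document matrix with two obvious groups of terms.
tdm <- matrix(c(5, 4,
                6, 5,
                0, 1,
                1, 0),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("seo", "site", "map", "heat"),
                              c("doc1", "doc2")))

k <- 2  # number of groups to visualize

# Hierarchical clustering on Euclidean distances, using a Ward linkage.
fit <- hclust(dist(tdm, method = "euclidean"), method = "ward.D")

# Draw the dendrogram with leveled labels and outline the k groups.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(fit, hang = -1)
rect.hclust(fit, k = k, border = "red")
invisible(dev.off())

# Group membership of every term for the chosen k.
groups <- cutree(fit, k = k)
groups
```

In an interactive session the pdf and dev.off calls can be dropped so that the dendrogram appears in the plot window.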
K means Clustering:
Plotting the K means
set.seed fixes the starting point used to generate a sequence of pseudo-random numbers; it takes an (arbitrary) integer argument. So we can pass any value, say 1, 123, 300 or 12345, to get reproducible random numbers. This matters here because k-means starts from randomly chosen cluster centers.
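The effect of the seed on k-means can be checked with a short sketch. The matrix, the seed value 123 and the choice of two centers are illustrative assumptions. Running the clustering twice from the same seed should give the same assignment.

```r
# Hypothetical term-document matrix: rows are terms, columns are documents.
tdm <- matrix(c(5, 4,
                6, 5,
                0, 1,
                1, 0),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("seo", "site", "map", "heat"),
                              c("doc1", "doc2")))

# Same seed before each run, so the random starting centers are the same.
set.seed(123)
km1 <- kmeans(tdm, centers = 2, nstart = 10)

set.seed(123)
km2 <- kmeans(tdm, centers = 2, nstart = 10)

# Cluster number assigned to each term.
km1$cluster

# TRUE when both runs produced the same assignment.
identical(km1$cluster, km2$cluster)
```

The sketch covers the clustering only, not the plot of the result.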
Row Variance Of An Array:
Calculates the variance of each row of an array.
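The genefilter package provides rowVars for this. The sketch below uses the base R equivalent instead, so that it runs without Bioconductor. The matrix is made up, and ranking terms by variance is an assumed use of the result, not something the text spells out.

```r
# Hypothetical term-document matrix: rows are terms, columns are documents.
tdm <- matrix(c(2, 2,
                0, 4,
                1, 7),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("seo", "heatmap", "content"),
                              c("doc1", "doc2")))

# Sample variance of every row (base R stand-in for genefilter::rowVars).
row_var <- apply(tdm, 1, var)
row_var

# Assumed follow-up: terms ordered from most to least variable across documents.
top_terms <- names(sort(row_var, decreasing = TRUE))
top_terms
```

A term used equally in both documents has zero variance, so high-variance terms are the ones that most distinguish the two sites.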
Get Indices For Significant Edges
Get the indices for the significant edges in a network.
Plotting the heatmap:
Heat map plotting
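A minimal heat map sketch using base R's heatmap function is shown below. The original plotting function and color scheme are not given in the text, so the function choice, the palette and the sample counts are all assumptions. The column order is kept fixed so that the two documents stay side by side.

```r
# Hypothetical term frequencies for the selected terms in two documents.
tdm <- matrix(c(5, 1,
                0, 4,
                3, 3,
                1, 6),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("seo", "map", "site", "heat"),
                              c("Doc 1", "Doc 2")))

# Draw to a temporary PDF so the sketch also runs non-interactively.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
hm <- heatmap(tdm,
              Colv = NA,               # keep Doc 1 and Doc 2 in their given order
              scale = "none",          # color raw TF values, no row scaling
              col = heat.colors(16),   # assumed palette
              margins = c(6, 8))
invisible(dev.off())

# Row order chosen by the row dendrogram.
rownames(tdm)[hm$rowInd]
```

With scale = "none" the colors map directly to the raw term frequencies, which matches the description of colors representing TF in each document.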
[Figure: document heat map with columns Doc 1 and Doc 2]