Using a network diagram to facilitate eliminating predictor variables

01 Jan 2015

In ecological niche modelling, we often face the problems of variable selection and collinearity (Dormann et al. 2013). Because highly correlated predictor variables may lead to model overfitting, we introduced a method to facilitate eliminating predictor variables when the aim is to avoid collinearity (see our paper, Feng et al. 2015, for more details). We used a network diagram, which shows the complex collinearity structure among all predictor variables in a straightforward way.

The general steps:


Figure 1. Network diagram of 19 bioclimatic variables. The yellow circles represent temperature variables and the green circles represent precipitation variables. Two circles are linked if the two represented variables have a high correlation (|r|≥ 0.7). The selected variables are marked with red boundaries.

*The network diagram is implemented in R, with raster and igraph libraries.
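
The linking rule from Figure 1 (two variables are linked when |r| ≥ 0.7) can be tried out on a small synthetic example before running the full script. This is only a minimal sketch in base R: the variables v1 to v4 are made up for illustration and are not bioclimatic layers.

```r
# Toy illustration of the linking rule: |r| >= 0.7 means "linked".
# The variables v1..v4 are synthetic and purely hypothetical.
set.seed(1)
n  <- 200
v1 <- rnorm(n)
v2 <- v1 + rnorm(n, sd = 0.1)   # built to be strongly correlated with v1
v3 <- rnorm(n)                  # generated independently of v1 and v2
v4 <- -v3 + rnorm(n, sd = 0.1)  # strongly negatively correlated with v3
dat <- data.frame(v1, v2, v3, v4)

# Absolute Pearson correlations between all pairs of variables
r <- abs(cor(dat, method = "pearson"))

# Adjacency matrix: TRUE where two different variables are highly correlated
adj <- r >= 0.7
diag(adj) <- FALSE

# List each linked pair once (upper triangle only)
idx   <- which(adj & upper.tri(adj), arr.ind = TRUE)
links <- data.frame(from = rownames(adj)[idx[, "row"]],
                    to   = colnames(adj)[idx[, "col"]],
                    r    = r[idx])
print(links)
```

Taking the absolute value first means that strong negative correlations (v3 and v4 here) are linked as well, which matches the |r| in the figure caption.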

# Load the igraph and raster packages
library(igraph)
library(raster)

# Load predictor variables (replace the path with the folder that holds your layers)
#bioFiles <- list.files(path = "path for variable", pattern = "\\.asc$", recursive = FALSE, full.names = TRUE)
bioFiles <- list.files(path = "D:/projects/2012.IOZ.pheasant/2012.11-ioz-modify/9.rejected/layer4/prjArea_china_10m_asc/", pattern = "\\.asc$", recursive = FALSE, full.names = TRUE)
bioRaster <- stack(bioFiles)

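# Build the correlation matrix; the first element of the list returned by
# layerStats() holds the Pearson correlation coefficients
m <- layerStats(bioRaster, 'pearson', na.rm = TRUE)[[1]]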

#Set threshold for "high" correlation, here we used 0.7
m <- abs(m)
m[m<0.7] <- 0

# Optional: rescale the edge weights (disabled)
#m <- m*10

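# Link variables: every non-zero entry of m becomes a weighted edge
# (graph_from_adjacency_matrix() is the current name of graph.adjacency())
net <- graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE, diag = FALSE)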

# Colour the temperature variables (bio1 to bio11) yellow and the
# precipitation variables (bio12 to bio19) green.
# This assumes the layers are named "bio1" ... "bio19".
tempVars <- paste("bio", 1:11, sep = "")
precVars <- paste("bio", 12:19, sep = "")
V(net)$color <- NA
V(net)$color[V(net)$name %in% tempVars] <- "yellow"
V(net)$color[V(net)$name %in% precVars] <- "green"
# Use the same node size for all variables
V(net)$size <- 18

# Draw the network diagram
tkplot(net)
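
Once a set of variables has been picked from the diagram, it can be useful to confirm that no two of them are still linked. The following is a minimal sketch in base R, not the procedure from our paper: `find_conflicts` is a hypothetical helper, and the correlation matrix `cm` is made up for illustration (in practice it would be the matrix returned by layerStats, before thresholding).

```r
# Hypothetical helper: list the pairs within `selected` whose |r| >= threshold.
# `cor_mat` is a correlation matrix with variable names as row/column names.
find_conflicts <- function(cor_mat, selected, threshold = 0.7) {
  sub <- cor_mat[selected, selected, drop = FALSE]
  # Look at each pair once (upper triangle), ignoring the sign of r
  hit <- which(abs(sub) >= threshold & upper.tri(sub), arr.ind = TRUE)
  data.frame(var1 = rownames(sub)[hit[, "row"]],
             var2 = colnames(sub)[hit[, "col"]],
             r    = sub[hit])
}

# Made-up correlation matrix for four variables (not real data)
cm <- matrix(c( 1.00,  0.85, 0.10, -0.20,
                0.85,  1.00, 0.05, -0.75,
                0.10,  0.05, 1.00,  0.30,
               -0.20, -0.75, 0.30,  1.00),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("bio", 1:4), paste0("bio", 1:4)))

res_ok  <- find_conflicts(cm, c("bio1", "bio3", "bio4"))  # no linked pairs remain
res_bad <- find_conflicts(cm, c("bio1", "bio2", "bio3"))  # bio1 and bio2 are still linked
print(res_ok)
print(res_bad)
```

An empty result means the selection contains no pair at or above the threshold; any rows returned point to variables that still need to be dropped or swapped.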

References: