A Comparative Study of the CNN Based Models Used for Remote Sensing Image Classification

░ ABSTRACT-Remotely sensed images, their classification and its accuracy play a vital role in measuring a country’s scientific growth and technological development. Remote Sensing (RS) can be interpreted as a way of assessing the characteristics of a surface or an entity from a distance. The task of identifying and classifying RS image datasets can be performed using a Convolutional Neural Network (CNN). When classifying images of large-scale areas, the traditional CNN approach produces coarse maps; the object-based CNN method can be used to address this issue. Images with high spatial resolution can be classified effectively using object-based image analysis. Deep learning methods offer the strength of automatically learning the spatial features of an image. The object-scale adaptive CNN is a novel technique that can improve the classification accuracy of high spatial resolution images. For efficient RS image classification, a novel deep learning approach called distributed CNN can be used, which leads to enhanced classification accuracy. In this paper, three CNN models are compared, using training time and classification efficiency as the measures for assessing the models.


░ 1. INTRODUCTION
The process of identifying, measuring and studying the features of an entity or land cover surface from a distance is called remote sensing. In the past few years, there has been a remarkable increase in the generation and collection of remote sensing data [31] due to the introduction of numerous active and passive remote sensors into space. This has led to a huge growth in the size of RS datasets. Moreover, given the large variety of resolutions available among RS data, classifying these images has become a true challenge. The processing capacity of the available remote sensing image classifiers has not kept pace with the volume of RS data being generated. Therefore, CNN-based methods, including multiscale CNNs that can perform object analysis, are very important for precise and speedy classification of RS images. Even though there has been a lot of research in the field of image processing, classifying large areas remains difficult for aerial imagery and high-resolution satellite imagery. High spatial resolution images can be classified using Object based image analysis (OBIA), since it first segments the images and then classifies the segments. The segments are identified as object primitives and scaling is performed. However, due to the variety and complexity present in remotely sensed images, the idea of object scaling used in OBIA leads to over-segmentation and under-segmentation at the same time. Deep learning algorithms such as multiscale CNNs make use of patch-based schemes [19] and pixel-to-pixel scaling for precise classification of RS images.

Dataset Preparation
The multiscale Object based CNN [5] [26] uses aerial imagery taken from the National Agriculture Imagery Program (NAIP) conducted in 2015 in the state of Iowa, USA. A large dataset was created consisting of ~6100 image tiles, resulting in around 1 million images and roughly 955 GB of data at 140-170 MB per tile. The dataset was used to create 10 land cover classes: structures, roads, rivers, ponds, cultivated crops, fallow, shadow, forest, grassland and barren land. Every class consists of 100,000 patch images, and a central label was given for each central 2 in the patches, termed as seed points [8]. In this way, a multiscale dataset with 6 patch sizes was created. The OCNN model was then trained with the false-colour Iowa Net dataset. The convolutional neural network based on Object-Scale Adaption (OSA-CNN) [2] efficiently combines Object based image analysis (OBIA) [27] and Deep Learning techniques [23] to attain better classification outcomes for high spatial resolution (HSR) images. Deep learning technologies [10] [14] are capable of automatically learning image features, which avoids the hassle of manual feature extraction. The experimental data contained aerial images from the Ohio Statewide Imagery Program. Here, the patch-based scheme obtains image patches for creating the net dataset, and for this purpose a fully connected layer is used that includes a sliding window for super-pixel segmentation [11]. A Fully Convolutional Network (FCN) [25] is used in the pixel-to-pixel scheme. The high spatial resolution RS images are segmented into object primitives and then the Hard Boundary Constrained semantic segmentation method (HBC-SEG) [6] is adopted.
The RS-Distributed CNN model [1] makes use of images of land covers such as deserts, forests, rivers and mountains to prepare the dataset. The dataset is partitioned into two sets: the first for training the RS-DCNN so that it can predict the labels correctly, and the second for testing the accuracy of the trained network. The images are labelled for training the CNN model in such a way that each label uniquely represents a land cover type. The process has two main steps: the first is to ingest the RS big data, which separates the data across several distributed file systems, and the second applies the Distributed CNN in order to classify the RS images precisely. Initially, the RS images are read and split into smaller images of equal size; later, to classify every image pixel, the RS-DCNN model applies the Maximum Likelihood algorithm [16], a supervised classification algorithm. In this CNN model, polygons showing particular regions of Saudi Arabia were digitized to obtain thematic information.
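The tiling and train/test partitioning described for RS-DCNN can be illustrated with a minimal sketch. The function names `split_into_tiles` and `train_test_split`, the 80/20 split fraction, and the choice to drop incomplete edge tiles are assumptions made for illustration; the source does not state these details.

```python
import random


def split_into_tiles(image, tile_size):
    """Split a 2-D image (list of rows) into non-overlapping square tiles.

    Assumption: edge regions that do not fill a whole tile are dropped,
    so every returned tile has exactly tile_size x tile_size pixels.
    Each result is ((top, left), tile) so the tile can be located later.
    """
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height - tile_size + 1, tile_size):
        for left in range(0, width - tile_size + 1, tile_size):
            tile = [row[left:left + tile_size]
                    for row in image[top:top + tile_size]]
            tiles.append(((top, left), tile))
    return tiles


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into training and test sets.

    The 0.8 fraction is a hypothetical default, not a value from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Toy 4 x 6 "image" whose pixel value encodes its position (row * 10 + col).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
tiles = split_into_tiles(image, 2)
train, test = train_test_split(tiles)
print(len(tiles), len(train), len(test))
```

In a distributed setting, each tile would typically be labelled and stored on the cluster's file system before training; that step is outside the scope of this sketch.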

Convolutional Neural Network (CNN)
A convolutional neural network is a deep learning neural network [20] designed to process data arranged in structured arrays, such as images. A typical CNN architecture comprises the following layers: (i) convolutional layers, (ii) pooling layers and finally (iii) fully connected (FC) layers. The convolutional layers contain trainable weights with which spatial features [17] of the images can be extracted and classified as structures of increasing complexity, from low level to high level.
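The three layer types listed above can be sketched in plain Python. This is an illustrative toy, not any of the models compared in this paper: the kernel, weights and input are hypothetical values chosen by hand, and the "convolution" is the cross-correlation form that most deep learning libraries use.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [
        [max(feature_map[i + di][j + dj]
             for di in range(size) for dj in range(size))
         for j in range(0, len(feature_map[0]) - size + 1, size)]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One dense layer: a weighted sum of all features per output unit."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


# 5 x 5 input with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]  # hand-picked; a real CNN learns this

pooled = max_pool(relu(conv2d_valid(image, edge_kernel)))
flat = [v for row in pooled for v in row]
scores = fully_connected(flat,
                         weights=[[0.5, 0, 0.5, 0], [0, 1, 0, 1]],
                         biases=[0, 0.1])
print(pooled, scores)
```

In a trained network the kernel and the fully connected weights are learned from labelled data rather than set by hand.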

The Multiscale Object based CNN
Multiscale CNNs have different window sizes that suit different applications. To capture information that depends on the scale of the image, the range of input window sizes must be clearly related to the target images being studied; multiscale CNNs provide this capacity when performing object-based image classification. The multiscale OCNN model [3] consists of six CNN architectures, all of which are based on the recognized architectures AlexNet and LeNet-5. All the CNN architectures were trained inside the multiscale CNN framework with varying input patch resolutions: CNN8 (8 × 8), CNN16 (16 × 16), CNN32 (32 × 32), CNN64 (64 × 64), CNN128 (128 × 128) and CNN256 (256 × 256) pixels.
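The six input resolutions listed above can be illustrated by cutting one patch per scale around a single labelled location. This is a minimal sketch: the function names are hypothetical, and padding out-of-image pixels with a fill value is an assumption, since the source does not say how patches near tile borders are handled.

```python
PATCH_SIZES = (8, 16, 32, 64, 128, 256)  # input resolutions named in the text


def extract_patch(image, center_row, center_col, size, fill=0):
    """Return a size x size patch centred on the given pixel.

    Assumption: pixels that fall outside the image are set to `fill`.
    For even sizes the centre pixel lands at index size // 2.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    patch = []
    for r in range(center_row - half, center_row - half + size):
        row = []
        for c in range(center_col - half, center_col - half + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else fill)
        patch.append(row)
    return patch


def multiscale_patches(image, seed_point, sizes=PATCH_SIZES):
    """Build one patch per scale around the same seed point."""
    row, col = seed_point
    return {size: extract_patch(image, row, col, size) for size in sizes}


# Toy 20 x 20 image whose pixel value encodes its position (row * 100 + col).
image = [[r * 100 + c for c in range(20)] for r in range(20)]
patches = multiscale_patches(image, seed_point=(10, 12))
for size, patch in patches.items():
    print(size, len(patch), len(patch[0]), patch[size // 2][size // 2])
```

Each patch would then be fed to the CNN trained for that resolution, so the same location is seen with progressively more spatial context.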

Figure 2: Overview of Multiscale CNN framework
The Mean-Shift algorithm [29] is used for image segmentation, and the resulting segments contain similar spatial and spectral information, which helps in edge detection [28] for high-resolution images [18]. The algorithm works with three scale parameters: (i) spatial scale (hr), (ii) spectral scale (hs) and (iii) minimum segment size (Ms). Based on the outcomes, the global scale parameters were chosen as [hr, hs, Ms] = [16.0, 16.5, 25] for segmenting the images of the NAIP dataset. As a result of the segmentation step, approximately 1.01 billion segments were generated for the state of Iowa 2015 data, ranging from 2.7 to 24.5 million segments per region. For object analysis, the semantic-free segments act as input polygons, and the mean-shift algorithm defines the input patch size and the convolutional locations needed to make CNN predictions. The main processing module generates input patches from the original NAIP images for particular locations, along with the size of each patch. After the CNNs are trained, the model calculates the per-class probability from these input patches, and the land cover with the maximum class score is

The Object-Scale Adaptive CNN Model
The Deep Learning based Object-scale Adaptive CNN model is used here because DL techniques automatically learn the image features of High Spatial Resolution images [11]. Merging OBIA with DL provides an appropriate method for classifying HSR images. The RS image dataset contained aerial images obtained from the Ohio Statewide Imagery Program. Based on the conversion from object primitives to image patches, and following the rules of MSA, the size of the image patches is adjusted automatically and determined by the width of the object primitives. These schemes naturally combine OBIA with the CNN methods, apply the benefit of MSA, and conduct the classification with OBIA. To obtain highly accurate results, OSA-CNN considers two scales for comparing the probabilities and fusing information. OSA-CNN follows the OBIA workflow: first segment, then classify. Because the images are segmented first, OSA-CNN gathers sample information with a single click on an object primitive. This enhances the overall effectiveness of sample collection and empowers CNN applications [20]. The RS-Finder software was used along with Visual C++. The method was run on the Windows 7 platform using a variety of HSR images. High accuracy was attained in the experiments conducted with OSA-CNN for classifying HSR images.

RS-DCNN: Distributed CNN to Classify Remote Sensing Images
The framework of the RS-DCNN model is built on a big data processing architecture that runs on the Apache Spark big data [9] cluster shown in Figure 7. A huge RS dataset containing a variety of images is given as input to the system, and the classified images are obtained as output. To make image classification efficient and faster, Apache Spark conducts computations in memory and, in addition, uses some very efficient libraries available in SQL. Machine learning tools and library packages and Fuzzy C-means segmentation methods [15] are used here. The architecture used in Apache Spark exploits a master/slave arrangement that includes 2 main daemon nodes and a node acting as the cluster manager. The daemon nodes act as the master node and the slave nodes.
Website: www.ijeer.forexjournal.co.in

Figure 5: Architecture of RS-DCNN
The components of the RS-DCNN approach are: (1) the SparkDriver (master node), with the SparkContext located inside the main program, (2) the cluster manager, and (3) the worker nodes (slave nodes) on which the executors run. In this architecture, ten slave nodes were used.
The master node is the entry point to the Spark architecture and also acts as its central point. Once the SparkDriver receives the information from the Spark master, it coordinates with the Spark cluster, allocates tasks uniformly to the executors and collects data back from them. Each executor process needs both the application and the tasks to be carried out, which the SparkContext provides to every executor process. The SparkDriver, which operates on the master node of the Spark cluster, negotiates with the cluster manager to create the schedules for executing every job. This process translates the Resilient Distributed Datasets (RDDs) into an execution graph, and the transformations and actions present in the code are converted into a Directed Acyclic Graph (DAG). Once the physical execution plan has been created, the SparkDriver process creates small physical threads. These threads are also called tasks; they are clustered together so that they can communicate efficiently, and further information is transmitted to the Spark cluster. The cluster manager then prompts the SparkDriver to negotiate for the essential resources and launches the distributed executor processes on the worker nodes (slave nodes) before execution starts.
The RS-DCNN approach primarily implements two steps: (1) preparation of the training dataset, which is done by dividing large RS satellite images into smaller images and then applying the Maximum Likelihood algorithm, a supervised classification algorithm; and (2) application of a distributed CNN algorithm to carry out accurate classification of the big satellite images. To bring parallelism to image classification, the Asynchronous Distributed Stochastic Gradient Descent (ADSGD) algorithm [24] is used, which distributes the execution of the CNN algorithm across the entire big data cluster.
The ADSGD algorithm procedure works with the following worker-node steps:

Requirement: a set of worker nodes S = {W1, W2, …, Wn}
For every worker Wi in the set S, do the following: obtain the server parameters from the master node; from the local dataset, choose a sample set at random with uniform distribution; calculate the partial gradient ∆Vi, which depends on the size of the input image X and on Vi, for each worker Wi of the set S; and finally, forward the partial gradient value to the master node. The scale gradient parameter V is found at every iteration of training on the dataset samples. Consider the scaling factor ∆Vij, which can be calculated from iteration i+1 to iteration j of SGD training using the following equation:

$V_{i+1,j} = V_i - \alpha \, \Delta L_j$
On the master node, the ADSGD algorithm receives the learning rate α, the loss function Lj and the partial gradient parameters from the worker nodes, computes the global gradient parameter V for the entire set of RS dataset samples, and sends the updated value to all the worker nodes.
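The worker and master steps described above can be sketched as a single-process simulation. This is a minimal illustration under strong assumptions, not the RS-DCNN implementation: the model is a one-parameter regression y = v * x instead of a CNN, asynchrony is imitated by letting a randomly chosen worker report a gradient computed from its possibly stale copy of the parameter, and all names (`partial_gradient`, `adsgd_simulation`, the data shards) are hypothetical.

```python
import random


def partial_gradient(v, batch):
    """Gradient of the mean squared error for the toy model y = v * x."""
    return sum(2.0 * (v * x - y) * x for x, y in batch) / len(batch)


def adsgd_simulation(shards, alpha=0.05, rounds=500, batch_size=2, seed=0):
    """Simulate asynchronous distributed SGD with one master and n workers.

    shards[w] is the local dataset of worker w, a list of (x, y) pairs.
    """
    rng = random.Random(seed)
    v = 0.0                          # global parameter held by the master
    local_v = [v] * len(shards)      # parameter copy each worker last pulled
    for _ in range(rounds):
        w = rng.randrange(len(shards))             # worker that reports next
        batch = rng.sample(shards[w], batch_size)  # uniform local sample
        grad = partial_gradient(local_v[w], batch) # may use stale parameters
        v = v - alpha * grad                       # master: V <- V - alpha * grad
        local_v[w] = v                             # worker pulls fresh value
    return v


# Noise-free toy data generated from y = 3 * x, split across three workers.
xs = [0.1 * k for k in range(1, 13)]
shards = [[(x, 3.0 * x) for x in xs[w::3]] for w in range(3)]
print(adsgd_simulation(shards))
```

With this toy data the returned parameter settles close to 3.0. In a real cluster the workers run concurrently and the master applies gradients in arrival order, which is what the stale `local_v` copies stand in for here.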
To validate the efficiency of the RS-DCNN model, the experimental analysis was performed on a satellite image taken from the SPOT7 RS dataset. In this approach, five land cover types, namely water, vegetation, road, urban and soil, were considered for classification. The RS-DCNN achieved an overall classification accuracy of 92.06% and a Kappa value of 88.3%. In addition, to assess the efficiency of the RS-DCNN, its training time was compared with that of a CNN model executed on a single node. Since multiple daemon nodes are used in RS-DCNN, it outperforms GAN (Generative Adversarial Networks) [21] with respect to classification accuracy in several scenarios. When the size of the image is 8 GB, training on a single-node computer takes approximately 6 hours, whereas the RS-DCNN approach reduces this time to 20.3 minutes. Four different scenarios were considered, with the parameters varied in each case. The parameters are the number of workers W and the size of the image sample X in GB; the accuracy rate is determined for each scenario.
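The two accuracy measures and the training-time comparison reported above can be computed as in the following sketch. The confusion matrix is a hypothetical two-class example, not data from the RS-DCNN experiments; only the 6 hour and 20.3 minute figures come from the text, and they correspond to a speedup of roughly 17.7 times.

```python
def overall_accuracy(confusion):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohen_kappa(confusion):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = overall_accuracy(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    return (observed - expected) / (1.0 - expected)


def speedup(single_node_minutes, distributed_minutes):
    """How many times faster the distributed run is."""
    return single_node_minutes / distributed_minutes


# Hypothetical confusion matrix (rows: reference class, columns: predicted).
confusion = [[45, 5],
             [10, 40]]
print(overall_accuracy(confusion))   # 0.85 for this toy matrix
print(round(cohen_kappa(confusion), 3))
print(round(speedup(6 * 60, 20.3), 2))
```

Kappa is lower than overall accuracy because it discounts the agreement expected by chance, which is why the paper's Kappa of 88.3% sits below its overall accuracy of 92.06%.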

░ 4. RESULTS AND DISCUSSION
Convolutional neural networks are capable of processing the deep spatial features of images for land cover classification. In this paper, three CNN models were identified and compared for the classification of large-scale remote sensing images. The Multiscale OCNN performs segmentation and classification of ten different types of land cover and is more efficient than the traditional CNN model. The OSA-CNN classifies high spatial resolution images by applying deep learning techniques to automatically learn the image features, and performs the classification task in remarkably less time. The Distributed CNN model is capable of classifying huge remote sensing datasets by making use of Apache Spark clusters. A comparative study was made to recognize the types of land cover images being classified and to assess the accuracy and training time of the three models under different scenarios. The overall accuracy and the kappa values were found to be the highest for the RS-DCNN model when more worker nodes were available in the cluster. Therefore, it can be concluded that the RS-DCNN model is the better model for RS image classification.
International Journal of Electrical and Electronics Research (IJEER) Open Access | Rapid and quality publishing Review Article | Volume 11, Issue 3 | Pages 646-651 | e-ISSN: 2347-470X

░ 5. ACKNOWLEDGMENTS
Our sincere thanks to everyone who has contributed towards the completion of this study. Special thanks to the reviewers for their timely reviews and suggestions.