Year 6, Issue 4 (Winter 2017)                   E.E.R. 2017, 6(4): 47-67




Adhami M, Zabihi M, Zare Naghadeh S, Mostafazadeh R. Choosing the Best Hierarchical Clustering Technique Based on Principal Components Analysis for Suspended Sediment Load Estimation. E.E.R. 2017; 6 (4) :47-67
URL: http://magazine.hormozgan.ac.ir/article-1-357-en.html
Mohaghegh Ardabili University , Raoofmostafazadeh@yahoo.com
Abstract:

1- INTRODUCTION

Assessing watershed sediment load is necessary for controlling soil erosion and reducing potential sediment production. Inconsistent estimates of sediment amounts, together with the lack of long-term measurements, limit access to reliable data series of erosion rate and sediment yield. Observed suspended sediment load data can therefore be used to estimate soil loss in the upstream catchment. One valid approach to estimating soil erosion is to combine the recorded data of hydrometric stations with catchment characteristics in order to obtain more accurate predictions. For this purpose, identifying similar sub-watersheds according to climatic, physiographic, geologic, and land use characteristics could be useful in erosion control operations.

2- THEORETICAL FRAMEWORK

To estimate sediment amounts in ungauged areas, clustering is introduced as a key step. Various methods and techniques have been used to determine the best number of clusters. However, studies that apply different clustering methods and select the best one are rare. The objective of the present study is therefore to determine the most important variables in sediment production and to compare the Single linkage, Ward, and β-Flexible methods for clustering the sub-watersheds of the Gorganroud and Qareh-Sou river basins in Golestan Province.

3- METHODOLOGY

The Gorganroud and Qareh Sou watersheds are located in the north-eastern part of Iran. Seventeen hydrometric stations with 24 years (1986–2010) of recorded discharge and suspended sediment load data were selected. The Grubbs and Beck method was used to detect outliers in the measured discharge data. The correlation method was used to fill in missing data in the time series. The normality of the discharge and suspended sediment data was tested using the Kolmogorov-Smirnov test in order to choose the appropriate trend analysis method: linear regression was used for normally distributed data and the Mann-Kendall tau method for non-normally distributed data. The autocorrelation function (ACF) test was used to check for serial dependence within each data series.
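The trend and autocorrelation screening described above can be illustrated with a minimal sketch. It assumes a plain Mann-Kendall test without tie correction and a lag-1 autocorrelation coefficient; the function names and the sample series are hypothetical and are not taken from the study.

```python
import math


def mann_kendall(series):
    """Return (S, tau, z) for the Mann-Kendall trend test (no tie correction)."""
    n = len(series)
    s = 0
    # S counts concordant minus discordant pairs over all i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, tau, z


def lag1_autocorrelation(series):
    """Return the lag-1 sample autocorrelation coefficient."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    return num / denom


# Hypothetical annual discharge values, for illustration only.
discharge = [12.1, 10.4, 11.8, 9.7, 10.9, 8.8, 9.5, 8.1]
s, tau, z = mann_kendall(discharge)
print(f"S={s}, tau={tau:.2f}, z={z:.2f}, r1={lag1_autocorrelation(discharge):.2f}")
```

A series whose |z| exceeds the chosen critical value (1.96 at the 5% level), or whose lag-1 coefficient is significantly different from zero, would be flagged in a screening of this kind.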

A set of 38 factors from five main categories was investigated to determine the independent variables controlling sediment yield. Principal Component Analysis (PCA) was used to determine the most effective variables. To identify the best classification method, three hierarchical clustering techniques (Single linkage, Ward’s, and β-flexible) were examined in the study area. Single linkage, also called nearest neighbor, is a simple clustering method in which object pairs form clusters hierarchically, starting from the most similar pair and proceeding in descending order of similarity. Ward’s algorithm is one of the most frequently used techniques in regionalization studies of hydrological and climatological factors. β-Flexible is a generalized hierarchical method in which groups are formed by computing the distance from an external object (a point) to an existing group.
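The three techniques can all be written as special cases of the Lance-Williams distance update, d(k, i∪j) = αi·d(k,i) + αj·d(k,j) + β·d(i,j) + γ·|d(k,i) − d(k,j)|. The sketch below is an illustration under stated assumptions, not the implementation used in the study: the function names, the default β = −0.25, and the sample points are hypothetical, and Ward’s update is applied to squared Euclidean distances.

```python
import math
from itertools import combinations


def lance_williams_coeffs(method, ni, nj, nk, beta=-0.25):
    """Return (alpha_i, alpha_j, beta, gamma) for the chosen method."""
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "ward":
        total = ni + nj + nk
        return (ni + nk) / total, (nj + nk) / total, -nk / total, 0.0
    if method == "flexible":
        return (1 - beta) / 2, (1 - beta) / 2, beta, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(points, n_clusters, method="single", beta=-0.25):
    """Merge the closest clusters until n_clusters remain.

    Returns a sorted list of clusters, each a sorted list of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}
    # Ward's update is defined on squared Euclidean distances.
    power = 2 if method == "ward" else 1
    dist = {}
    for i, j in combinations(range(len(points)), 2):
        dist[(i, j)] = math.dist(points[i], points[j]) ** power

    def d(a, b):
        return dist[(a, b) if a < b else (b, a)]

    next_id = len(points)
    while len(clusters) > n_clusters:
        # Pick the pair of current clusters with the smallest distance.
        i, j = min(combinations(sorted(clusters), 2), key=lambda p: d(*p))
        ni, nj = len(clusters[i]), len(clusters[j])
        new = next_id
        next_id += 1
        # Update distances from every other cluster to the merged one.
        for k in clusters:
            if k in (i, j):
                continue
            ai, aj, b, g = lance_williams_coeffs(method, ni, nj, len(clusters[k]), beta)
            dist[(k, new)] = (ai * d(k, i) + aj * d(k, j)
                              + b * d(i, j) + g * abs(d(k, i) - d(k, j)))
        clusters[new] = clusters.pop(i) + clusters.pop(j)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical two-variable sub-watershed descriptors, for illustration only.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
for method in ("single", "ward", "flexible"):
    print(method, agglomerate(points, 3, method=method))
```

With the single-linkage coefficients the update reduces to min(d(k,i), d(k,j)), which is the nearest-neighbor rule described above.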

Many indices have been developed to examine the validity of clustering techniques by finding an optimal partitioning. In the present study, the pseudo-F and Dunn indices were used to assess the accuracy of the clustering algorithms. Accurate clustering means having non-overlapping partitions. One of the most commonly used criteria for selecting the number of groups is maximization of the pseudo-F statistic, which assumes a multivariate normal distribution of the data.
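As an illustration of the two validity indices, the following sketch assumes the common definitions: pseudo-F as the ratio of between-cluster to within-cluster mean squares (the Calinski-Harabasz form), and the Dunn index as the smallest between-cluster distance divided by the largest within-cluster diameter. The function names and sample clusters are hypothetical, and the exact variants used in the study may differ.

```python
import math
from itertools import combinations


def centroid(pts):
    """Return the coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def pseudo_f(clusters):
    """Pseudo-F: (between SS / (k - 1)) / (within SS / (n - k)).

    Assumes k >= 2, n > k, and a non-zero within-cluster sum of squares.
    """
    all_pts = [p for c in clusters for p in c]
    n, k = len(all_pts), len(clusters)
    grand = centroid(all_pts)
    within = sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    between = sum(len(c) * math.dist(centroid(c), grand) ** 2 for c in clusters)
    return (between / (k - 1)) / (within / (n - k))


def dunn_index(clusters):
    """Dunn index: minimum separation / maximum diameter.

    Assumes at least one cluster contains two or more distinct points.
    """
    min_sep = min(math.dist(p, q)
                  for a, b in combinations(clusters, 2) for p in a for q in b)
    max_diam = max(math.dist(p, q)
                   for c in clusters for p, q in combinations(c, 2))
    return min_sep / max_diam


# Hypothetical clusters of two-variable points, for illustration only.
clusters = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]
print(pseudo_f(clusters), dunn_index(clusters))
```

Larger values of both indices indicate more compact, better separated clusters, which is why the number of groups is usually chosen where they peak.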

4- RESULTS

All data series of the 17 sub-watersheds in the Gorganroud and Qareh Sou basins were tested with the different clustering algorithms. Two data series showed autocorrelation according to the ACF test, and two data sets showed trends according to Kendall’s test. Therefore, 13 sub-watersheds remained for the final classification. Some 38 independent variables were calculated and screened with PCA. The variables with similar effects on sediment yield were grouped into 7 components. The components were selected according to the amount of variance they explained. The PCA results and the representative variable selected for each component are given in Table 1.

Table 1: Results of the Principal Component Analysis of variables affecting sediment yield in the Gorganroud and Qareh Sou Watersheds, Iran

| Component | Eigenvalue | Variance (%) | Cumulative variance (%) | Representative variable |
|---|---|---|---|---|
| 1 | 7.99 | 21.60 | 21.60 | Main stream length |
| 2 | 6.82 | 18.43 | 40.03 | Flow discharge with a 10-year return period |
| 3 | 5.97 | 16.12 | 56.16 | Percent of forest area |
| 4 | 5.25 | 14.18 | 70.33 | Percent of agricultural land area |
| 5 | 4.98 | 13.47 | 83.81 | Drainage density |
| 6 | 2.56 | 6.92 | 90.73 | Percentage of permeable formations area |
| 7 | 1.95 | 5.28 | 96.01 | Concentration time |
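The cumulative variance column of Table 1 is the running total of the per-component variance percentages. A short sketch of that arithmetic, using the values reported in the table, is given below; the function name is hypothetical, and the 0.02 tolerance is an assumption that allows for rounding in the published figures.

```python
# Variance percentages and cumulative values as reported in Table 1.
variance_pct = [21.60, 18.43, 16.12, 14.18, 13.47, 6.92, 5.28]
reported_cumulative = [21.60, 40.03, 56.16, 70.33, 83.81, 90.73, 96.01]


def running_total(values):
    """Return the cumulative sums of values, rounded to two decimals."""
    total, out = 0.0, []
    for v in values:
        total += v
        out.append(round(total, 2))
    return out


cumulative = running_total(variance_pct)
for computed, reported in zip(cumulative, reported_cumulative):
    # Small differences are expected because the table values are rounded.
    print(f"{computed:6.2f}  {reported:6.2f}  diff={abs(computed - reported):.2f}")
```

The seven components together account for roughly 96% of the total variance, consistent with the last row of the table.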

The results of the Ward’s, Single linkage, and β-flexible hierarchical methods are summarized in Table 2.

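Table 2: Results of the hierarchical clustering techniques in the Gorganroud and Qareh Sou Watersheds, Iran

| Method | Number of clusters | Dunn coefficient | Pseudo-F |
|---|---|---|---|
| Single Linkage | 2 | 0.29 | 2.12 |
| Single Linkage | 3 | 0.45 | 3.50 |
| Single Linkage | 4 | 0.32 | 2.89 |
| Single Linkage | 5 | 0.43 | 3.30 |
| Ward | 2 | 0.29 | 4.06 |
| Ward | 3 | 0.19 | 2.73 |
| Ward | 4 | - | - |
| Ward | 5 | - | - |
| β-Flexible | 2 | 0.29 | 3.57 |
| β-Flexible | 3 | - | - |
| β-Flexible | 4 | 0.37 | 4.06 |
| β-Flexible | 5 | - | - |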

5- CONCLUSIONS & SUGGESTIONS

The results showed that the Single linkage method performed better with respect to the accuracy criterion. The suspended sediment values were determined using measured discharge and the available sediment rating curves; the identified clusters can therefore be regarded as a reliable and appropriate watershed grouping and as a useful tool in watershed management, particularly in the context of erosion and sedimentation.

Type of Study: Research |
Received: 2016/09/27 | Published: 2017/06/6



Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2024 CC BY-NC 4.0 | Environmental Erosion Research Journal
