Year: 2018 | Volume: 7 | Issue: 2 | Page: 73-79
Mining relationships among knowledge, attitude, and practice of drivers using self-organizing map and decision tree: The case of Bandar Abbas city taxi drivers
Esmaeil Hadavandi¹, Leila Omidi², Abdolhamid Tajvar³, Ali Ghanbari⁴
1 Department of Industrial Engineering, Birjand University of Technology, Birjand, Iran
2 Department of Occupational Health Engineering, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
3 Research Center for Social Determinants in Health Promotion, Hormozgan University of Medical Sciences, Bandar Abbas, Iran
4 Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences, Tehran, Iran
Date of Web Publication: 18-Nov-2018
Mr. Abdolhamid Tajvar
Lecturer in Occupational Health Engineering, Research Center for Social Determinants in Health Promotion, Hormozgan University of Medical Sciences, Bandar Abbas
Source of Support: None, Conflict of Interest: None
Background and Objectives: Traffic accidents are a leading cause of fatal and nonfatal work-related injuries in many countries. Analyzing the factors that influence the knowledge, attitude, and practice of drivers is of interest to policymakers seeking to reduce the number of traffic accident victims. Materials and Methods: In this article, a two-stage data mining approach is presented for mining the relationships among knowledge, attitude, and practice of drivers. In the first stage, because the practice variables are multidimensional, a self-organizing map neural network was used to automatically arrange drivers into two clusters: safe and unsafe driving practice. In the second stage, a decision tree was used to model the relationships between the knowledge and attitude of drivers and the practice clusters. Author-designed questionnaires were used to collect data from 235 male taxi drivers of Bandar Abbas city in Iran regarding the drivers' knowledge of and attitude toward traffic regulations. The driving practices were assessed using a prepared checklist. Results: The most important attribute affecting the practice of drivers was the maximum safe speed in the city. Conclusions: The results of this investigation showed that drivers' knowledge of traffic regulations had a dramatic impact on safe driving practices. Drivers' education level can also influence their practice.
Keywords: Attitude, decision tree, drivers, knowledge, practice, self-organizing map
How to cite this article:
Hadavandi E, Omidi L, Tajvar A, Ghanbari A. Mining relationships among knowledge, attitude, and practice of drivers using self-organizing map and decision tree: The case of Bandar Abbas city taxi drivers. Arch Trauma Res 2018;7:73-9
How to cite this URL:
Hadavandi E, Omidi L, Tajvar A, Ghanbari A. Mining relationships among knowledge, attitude, and practice of drivers using self-organizing map and decision tree: The case of Bandar Abbas city taxi drivers. Arch Trauma Res [serial online] 2018 [cited 2018 Dec 14];7:73-9. Available from: http://www.archtrauma.com/text.asp?2018/7/2/73/245579
Introduction
Traffic accidents are the leading causes of fatal or nonfatal injuries in many countries across the world. More than 20 million injuries and 1.17 million deaths are caused each year by road traffic accidents. Traffic accidents are responsible for six deaths per 100,000 people in Iran. The economic costs of traffic accidents in Iran during 2013 were 6% of gross national product. Traffic crashes and injuries are the most common causes of disabilities (approximately 90%) in developing countries.
Taxi driving is considered one of the highest-risk occupations. High levels of knowledge and positive attitudes toward traffic regulations may reduce accident risk among drivers and may affect their behavior and practice. An inappropriate attitude toward traffic safety may be associated with high-risk behavior (practice) and is thought to be a contributing factor in road traffic crashes and injuries. Knowledge, attitude, and practice of drivers toward traffic regulations have been identified as the most important factors in the causation of road traffic crashes. Some personal and behavioral factors, including drug abuse, drunkenness, sensation-seeking, prestige-seeking, not wearing seat belts, and risk-accepting attitudes, may increase the rate of traffic crashes. Previous studies have examined the factors that contribute to safe driving practice using statistical hypothesis tests. Prior findings suggest significant relationships between the practice of drivers and demographic variables such as marital status (P = 0.02), education level (P = 0.01), occupation (P = 0.04), and gender (P < 0.001). Tajvar et al. found that safer driving practice was observed in individuals with higher education. It has been demonstrated that younger drivers have higher accident rates than older ones. In line with this, younger drivers showed less positive safety attitudes and greater risk-taking attitudes than older ones.
Data mining approaches have been used in previous research on the knowledge, attitude, and practice of drivers. Kashani and Mohaymany used a decision tree (DT) to analyze the factors influencing crash injury severity on two-lane, two-way rural roads in Iran. According to the DT analysis, serious injuries occurred in individuals who failed to wear seat belts. Another important variable was the cause of the crash, such as weather conditions. Improper overtaking was the most important variable in the severity of road traffic crashes.
Clustering is an unsupervised task in data mining and is used to group individuals based on the similarity of their values for a set of input variables. The basic idea is to discover clusters such that the individuals within each cluster are similar to each other and different from individuals in other clusters. Clustering methods can be classified into two categories: (i) partitioning methods that take the number of clusters as an input parameter, such as K-means or self-organizing map (SOM) neural networks, and (ii) hierarchical methods that leave the decision on the number of clusters to the user. Among clustering methods, SOM neural networks have been used in a wide range of applications because of their flexible and stable architecture.
It is therefore of interest to determine the relationships between the practice cluster of a driver (safe or unsafe) and that driver's knowledge and attitude by considering the simultaneous impact of all influencing factors. In this article, the multivariate relationships among knowledge, attitude, and practice of drivers of Bandar Abbas city were mined using a two-stage data mining approach. In the first stage, the SOM was used to cluster drivers, based on their practice factors, into two major groups: safe driving and unsafe driving practice. Once the SOM had identified the clusters of drivers, the relationships between the practice clusters and other variables such as demographics, knowledge, and attitude were identified using a DT as an interpretable classification model.
Materials and Methods
A two-stage data mining model is presented to automatically group drivers, based on their practice, into two clusters (safe and unsafe driving practice) and to identify the relationships between the practice clusters and other variables. The framework of the proposed data mining approach is shown in [Figure 1].
Study subjects and measurement tools
A total of 252 male taxi drivers working within Bandar Abbas, a city in southern Iran, were enrolled in this cross-sectional study. Participants who completed the questionnaires and checklist were included; drivers who returned incomplete questionnaires or were unwilling to participate were excluded. An author-designed questionnaire was used to collect data on the drivers' knowledge of traffic regulations. The questionnaire consisted of 15 questions posed on Iran's driver's license examination. The knowledge questionnaire included questions about the yellow lights at intersections, the maximum speed limit in the city, proper braking techniques, the maximum speed on suburban main roads, the time interval between two cars on a wet road, the driver's actions when leaving a parking space, priority at a junction, the driver's actions when a vehicle appears suddenly in front of the vehicle, the best time to check the vehicle's tire pressure, the primary rule for driving on the highway, the driver's actions in “no stopping (waiting)” zones, and the driver's actions when making a right turn at an intersection. A good level of internal consistency (Cronbach's alpha value of 0.82) was observed for this questionnaire. The attitude questionnaire, which was prepared under the supervision of experienced traffic police officers, included seven questions regarding using a seat belt, exceeding speed limits, driving in prohibited areas, keeping a safe distance from the next vehicle, crossing the road centerline, stopping before entering a main street from a side street, and eating and drinking while driving. An acceptable level of internal consistency (Cronbach's alpha value of 0.75) was observed for the attitude questionnaire.
The driving practice checklist (Cronbach's alpha coefficient = 0.70) included 33 items about not using a seat belt, using a mobile phone while driving, speaking with passengers while driving, smoking while driving, exchanging taxi fare while driving, using an impaired kilometer counter, eating and drinking while driving, driving over the speed limit, fatigued and drowsy driving, crossing against a red traffic light, and using the horn in a restricted area. Demographic features including age, work experience, marital status, and education level of drivers were also collected. The drivers were divided into nine groups based on their age (from range 1 [20–25 years] to range 9 [61–70 years]). According to their level of education, they were divided into six groups (from illiterate to Bachelor of Science degree). Based on work experience, the drivers were divided into seven groups (from range 1 [1–5 years] to range 7 [31–35 years]). All questions were answered by the drivers. For the knowledge questionnaire, participants were required to select the correct answers. The attitude questions were all assessed on a five-point Likert scale. The driving practices were assessed with the drivers' behavior checklist by ticking “yes” or “no” boxes in a specially prepared checklist.
Self-organizing maps clustering
The self-organizing map is an unsupervised artificial neural network model proposed by Kohonen and further evaluated as a high-performance data clustering model to partition high-dimensional data. In SOM, a type of spatially organized clustering is done to group similar samples together and map the input space to a two-dimensional (2-D) space that detects the multidimensional proximity relationships between the clusters. Thus, SOMs are of great interest for pattern detection in behavioral science.
The SOM network consists of K neurons arranged in a 2-D hexagonal or rectangular grid. In an n-dimensional feature space, each neuron i is assigned a weight vector w_i ∈ R^n. The training algorithm suggested by Kohonen for organizing a feature map is stated as follows:
- Step 1: Initialization: Choose random values for the initial weights w_i of neuron i (index i = (p, q) for a 2-D map)
- Step 2: Finding the winner neuron: At each training epoch t, for each training sample x ∈ R^n, the Euclidean distances between x and all neurons are calculated. The winning neuron j, with weight w_j, is the one at the minimum distance from x, Eq. (1):
$$ j = \arg\min_{i} \lVert x - w_i(t) \rVert \tag{1} $$
- Step 3: Updating the weights of the winner neuron and its neighbors: The SOM updates the weights of the winner neuron and its neighboring neurons, moving them closer to the input sample according to Eq. (2):
$$ w_i(t+1) = w_i(t) + \alpha_t \, h_{ji}(t) \, \bigl( x - w_i(t) \bigr) \tag{2} $$
where α_t and h_ji(t) are the monotonically decreasing learning rate and the neighborhood kernel at epoch t, respectively. Starting from the initial learning rate α_0, a linearly decreasing function is used, Eq. (3):
$$ \alpha_t = \alpha_0 \left( 1 - \frac{t}{Ep} \right) \tag{3} $$
where Ep is the number of epochs for training the SOM. The neighborhood kernel is a function of the epoch number and of the distance between neighbor neuron i and the winning neuron. A widely applied neighborhood function is defined in terms of the Gaussian function, Eq. (4):
$$ h_{ji}(t) = \exp\left( -\frac{\lVert r_j - r_i \rVert^2}{2 \sigma_t^2} \right) \tag{4} $$
where r_j and r_i are the positions of the winner neuron and the neighbor neuron on the map, respectively, and σ_t is the kernel width, which decreases during the training phase. The weight-updating process is performed for a specified number of epochs (Ep).
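The three training steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `train_som` and `assign_cluster`, the default parameter values, the linear shrinking of the kernel width, and the toy data are all hypothetical, and the neurons are placed on a 1 × 2 line to mirror the two-cluster setting used later in the article.

```python
import math
import random


def train_som(samples, n_neurons=2, epochs=50, alpha0=0.5, sigma0=1.0, seed=0):
    """Train a 1 x n_neurons SOM and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Step 1: random initial weights, one vector per neuron.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]
    positions = list(range(n_neurons))  # neuron coordinates on a 1-D grid

    for t in range(epochs):
        # Eq. (3): linearly decreasing learning rate.
        alpha_t = alpha0 * (1.0 - t / epochs)
        # Assumed schedule: the kernel width shrinks linearly as well.
        sigma_t = max(sigma0 * (1.0 - t / epochs), 1e-3)
        for x in samples:
            # Step 2 / Eq. (1): winner = neuron closest to x (Euclidean distance).
            j = min(positions, key=lambda i: math.dist(x, weights[i]))
            for i in positions:
                # Eq. (4): Gaussian neighborhood kernel on the grid distance.
                grid_d2 = (positions[j] - positions[i]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma_t ** 2))
                # Step 3 / Eq. (2): move neuron i toward the sample.
                weights[i] = [w + alpha_t * h * (xk - w)
                              for w, xk in zip(weights[i], x)]
    return weights


def assign_cluster(x, weights):
    """Return the index of the best-matching neuron for sample x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))


# Hypothetical toy data: two well-separated groups in two dimensions.
group_a = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]]
group_b = [[1.0, 1.0], [1.0, 0.9], [0.9, 1.0]]
som = train_som(group_a + group_b)
print([assign_cluster(x, som) for x in group_a + group_b])
```

With only two neurons the map acts much like a two-centre clustering; the neighborhood kernel matters mainly in the early epochs, when both neurons are pulled toward every sample before the kernel narrows.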
Conventionally, statistical models such as logistic regression have been used to analyze classification problems in the traffic domain. However, these models rely on their own assumptions and on predefined underlying relationships between the dependent (class) and independent variables. If the assumptions are violated, the model can lead to erroneous predictions. One popular data mining model used for classification problems is the DT. It is a nonparametric model that does not depend on any functional form and requires no prior probabilistic knowledge of the phenomena under study. Furthermore, a DT extracts a set of decision rules that can detect patterns and behaviors in large data sets. Several studies in safety science have used DTs to find rules for understanding the events leading up to a crash and to identify the variables that determine how serious an accident will be.
There are different algorithms for DT construction. The most widely used in the literature is the Classification and Regression Tree (CART), which builds a binary tree. Another is the C4.5 algorithm, which is not restricted to binary splits. In this article, we use the Chi-square Automatic Interaction Detector (CHAID) method, a highly efficient statistical technique for tree growing. CHAID evaluates all of the values of a potential input variable using the significance of the Chi-square statistical test; it merges values that are judged to be statistically homogeneous with respect to the class variable and keeps all other, heterogeneous values separate. It then selects the best input variable as the root of the tree to form the first branch, such that each child node is made up of a group of homogeneous values of the selected field. This process continues recursively until the tree is fully grown. The pseudocode of the CHAID method is shown in Algorithm 1 [Figure 2].
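The chi-square comparison that underlies CHAID's choice of split variable can be illustrated with a short Python sketch. It is deliberately simplified and is not the full CHAID procedure: it computes only the Pearson chi-square statistic of each categorical predictor against the class variable and ranks predictors by that statistic, which is only meaningful when the predictors have the same degrees of freedom. Category merging and the adjusted significance tests that CHAID applies are omitted. The function names, variable names, and toy data are hypothetical.

```python
from collections import Counter


def chi_square(predictor, target):
    """Pearson chi-square statistic and degrees of freedom for two categorical lists."""
    n = len(predictor)
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    observed = Counter(zip(predictor, target))
    stat = 0.0
    for r, r_count in row_totals.items():
        for c, c_count in col_totals.items():
            expected = r_count * c_count / n  # count expected under independence
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def rank_predictors(features, target):
    """Sort predictor names by descending chi-square statistic."""
    scores = {name: chi_square(values, target)[0]
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two knowledge answers and a practice cluster per driver.
features = {
    "speed_answer_correct": ["yes", "yes", "yes", "no", "no", "no", "yes", "no"],
    "parking_answer_correct": ["yes", "no", "yes", "no", "yes", "no", "yes", "no"],
}
cluster = ["safe", "safe", "safe", "unsafe", "unsafe", "unsafe", "safe", "unsafe"]
print(rank_predictors(features, cluster))
```

In this toy example the speed question separates the two clusters perfectly, so it is ranked first and would be the candidate for the first split.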
For the DT, the research dataset was partitioned into two subsets: a training set consisting of 80% of the data, used for model building, and a testing set consisting of the remaining 20%, used for testing the model. Model performance was evaluated on the testing set. Given a classifier and an individual, there are four possible outcomes. If the pattern is positive and it is classified as positive, it is counted as a true positive (TP); if it is classified as negative, it is counted as a false negative (FN). If it is negative and is classified as negative, it is counted as a true negative (TN); and if it is classified as positive, it is counted as a false positive (FP). These counts are the basis for calculating many common performance metrics such as classification accuracy (CA), sensitivity, specificity, and G-mean. Sensitivity is the proportion of positives that are correctly classified, and specificity is the proportion of negatives that are correctly classified. The CA is the proportion of true results (both TPs and TNs) among all individuals. The geometric mean (G-mean) measures the balance between model performance on the negative and positive classes and guards against overfitting to the negative class.
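The four metrics follow directly from the TP, FN, TN, and FP counts. The sketch below shows one way to compute them in Python; the function name and the example counts are hypothetical, and G-mean is taken here to be the square root of sensitivity times specificity, which is the usual definition but is an assumption as far as this article's exact formula is concerned.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, and G-mean from confusion counts."""
    sensitivity = tp / (tp + fn)   # share of positives classified as positive
    specificity = tn / (tn + fp)   # share of negatives classified as negative
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "g_mean": g_mean,
    }


# Hypothetical counts for a small test set (not taken from the study).
print(classification_metrics(tp=8, fn=2, tn=6, fp=4))
```

Because G-mean multiplies the two class-wise rates, a classifier that labels every case with the majority class scores zero, which is why it is preferred to accuracy alone when the clusters are unequal in size.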
Results
Approximately, 43% of the drivers were in the age range 30–40 years, more than 90% of them were married, 40% had earned a high school diploma, 0.8% had a bachelor's degree, 32% had 6–10 years of work experience, and 27% had 11–15 years of work experience.
In this section, the proposed model was implemented using knowledge, attitude, and practice data of drivers in Bandar Abbas city, Iran. We used variables related to practice of a driver to obtain practice clusters.
Based on expert knowledge, it is common to group drivers into two groups according to their practice variables. Hence, the variables related to the practice of drivers were used to obtain practice clusters in the first stage of the proposed model. The clustering model used in this paper was arranged as a 2-D SOM with a 1 × 2 array of neurons. Each of these neurons was connected to the input vectors (practice variables) through synaptic weights, which were adjusted during learning. The centers of the resulting clusters are shown in [Table 1]. Cluster 1 and cluster 2 contain 98 and 137 drivers, respectively. Based on the profile of the clusters (the mean of the practice variables for the drivers in each cluster), we labeled cluster 1 as safe drivers and cluster 2 as unsafe drivers. The study showed that drivers in cluster 2 tended to engage in unsafe driving practice, and the cluster mean points in this group were far from 2 and close to 1.
To visualize the two clusters obtained, principal component analysis was used to plot the samples on the first two principal components, as shown in [Figure 3]. As can be seen in [Figure 3], the SOM clustering partitioned the drivers into two distinct clusters.
Figure 3: The obtained clusters (1: “safe” and 2: “unsafe drivers”) in the first two principal components
To establish the statistical significance of the difference between the obtained practice clusters, a paired t-test was performed to compare the mean values of all variables in the two clusters. For this purpose, the following hypotheses were proposed for all variables (knowledge, attitudes, and practice of drivers):
- H0: There is no difference between the mean values of the two clusters in a variable
- H1: A difference exists between the mean values of the two clusters in a variable.
We considered P > 0.95 as indicating an important and significant difference. The P values of the significance test of the difference between cluster mean values for the important practice variables of drivers are shown in [Figure 4]. There were significant differences in the mean values of many variables between the two clusters, which shows the capability of the SOM method to automatically identify practice clusters of drivers. The most important attribute and the second most important attribute, those with the highest predictive power, were failure to follow the priority rule and crossing a prohibited area, respectively.
Figure 4: P values of significant test of difference between mean values of clusters for important practice variables of drivers
In the second stage of the proposed model, the relationships between the practice cluster of the drivers and other variables such as demographics, knowledge, and attitude were identified using the CHAID method. The values of G-mean, accuracy, specificity, and sensitivity of the CHAID method on the testing data were 0.81, 0.81, 0.79, and 0.84, respectively. The generalization accuracy of the CHAID method was therefore judged to be high enough for the model to be used in the analysis phase.
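As a quick consistency check on these figures, and assuming the usual definition of G-mean as the geometric mean of sensitivity and specificity (the article does not state the formula explicitly), the reported values agree with one another:

```latex
\[
\text{G-mean} = \sqrt{\text{sensitivity} \times \text{specificity}}
             = \sqrt{0.84 \times 0.79}
             = \sqrt{0.6636} \approx 0.81
\]
```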
[Figure 5] presents the DT constructed for the safe and unsafe practice of the studied drivers. [Table 2] provides the six predictive rules obtained from the DT. As can be seen from [Figure 5], the target feature in the root node was the practice cluster. Maximum safe speed in the city (a question related to drivers' knowledge of traffic regulations) was the first-level attribute, and other driving knowledge questions, including the driver's actions when leaving a parking space and whether a driver can cross broken lines to overtake with caution, were the third-level attributes. The fourth-level attributes were the education level of drivers (a demographic variable) and the time interval between the driver's car and the car in front (a knowledge question).
Discussion
In this study, the aim was to determine the relationships among knowledge, attitude, and practice of drivers toward traffic regulations using SOM and DT based on data obtained from taxi drivers of Bandar Abbas City. The cluster analysis showed two distinct groups of drivers including safe driving practice and unsafe driving practice.
Speaking with passengers while driving and exchanging taxi fare while driving had the lowest cluster mean points among drivers with unsafe driving practice. This finding is in agreement with previous research conducted to determine the relationships between demographic features and the knowledge, attitude, and practice of taxi drivers toward traffic regulations in Bandar Abbas. That study reported that 78.8% of drivers exchange taxi fare while driving and 71.8% speak with passengers while driving.
Failure to follow the priority rules was the most important attribute of the drivers' unsafe practice. Among the 20 most important attributes, some practice variables ranked first. These were followed by the attitude question on whether, in an emergency, a driver can drive in a prohibited area, and by some knowledge questions, such as whether a driver can cross broken lines to turn or overtake with caution and the question on yellow lights at intersections, which gave high values. Among the demographic features studied, the work experience of drivers had a high importance value. This result can be compared with that of Redhwan and Karim, who found a significant association between years of driving experience and both a positive attitude toward speed rules and exposure to road traffic accidents. Furthermore, according to a previous study, the rules of priority at intersections may be among the most important variables in the traffic environment. Failure to follow the priority rules may lead to major accidents with serious adverse consequences.
According to the DT for the practice cluster, if a driver gave the correct answer to the knowledge question on maximum speed in the city (the first option, “30 km/h in streets and squares”) and chose the first option (the correct answer) for the driver's actions when leaving a parking space, then the driver tended to engage in safe driving practice with a probability of 65.15% (Rule 1). The speed limit is an important factor in traffic safety. Increases in speed limits have been shown to increase the number of severe crashes and the incidence of traffic accidents. Drivers' knowledge of traffic regulations, especially regarding high-speed driving, was the most important factor in traffic accidents in some studies conducted among university students in Malaysia. If a driver did not give the correct answer to the knowledge question on the driver's actions when leaving a parking space (choosing option 2, 3, or 4 of the knowledge questionnaire) and the driver's education level was middle school or diploma, then 82.35% of the drivers in this node tended to engage in unsafe practice (Rule 3). These results are consistent with those of other studies and suggest that drivers with lower education levels are more likely to engage in unsafe practices than drivers with higher education levels. If the answer to the “maximum speed in city” question was incorrect but the answers to the “driver can cross broken lines to turn or overtake with caution” question and “the time interval between your car and the vehicle in front on a wet road” question were correct (option 4 for both questions), then six drivers (100% of those in node 12) tended to engage in safe driving practice (Rule 6). Although, based on the DT model, the attitude questions were not important attributes for the practice cluster of drivers, Mirzaei et al. identified a significant association between drivers' attitude and unsafe driving practices (road traffic crashes) in Iranian drivers.
Safety training programs can enhance safety knowledge. Further studies are needed that relate the knowledge, attitudes, and practice of drivers to traffic crashes and injuries.
Conclusions
The purpose of the current study was to mine the relationships among knowledge, attitude, and practice of drivers toward traffic regulations using SOM and DT, based on data obtained from taxi drivers of Bandar Abbas city. The attribute with the highest impact on the practice cluster of drivers was “maximum speed in the city.” It was also shown that the drivers' knowledge of traffic regulations had a tremendous impact on safe driving practice. The education level of drivers can influence unsafe driving practice.
Financial support and sponsorship
This research was funded by Hormozgan University of Medical Sciences (Project No. 9290). We thank the staff of the University as well as the drivers for their sincere support and cooperation.
Conflicts of interest
There are no conflicts of interest.
References
Tajvar A, Yekaninejad MS, Aghamolaei T, Shahraki SH, Madani A, Omidi L. Knowledge, attitudes, and practice of drivers towards traffic regulations in Bandar-Abbas, Iran. Electron Physician 2015;7:1566-74.
Redhwan A, Karim A. Knowledge, attitude and practice towards road traffic regulations among university students, Malaysia. Int Med J Malays 2010;9:29-34.
Salari H, Motevalian SA, Arab M, Esfandiari A, Akbari Sari A. Exploring measures to control road traffic injuries in Iran: Key informants points of view. Iran J Public Health 2017;46:671-6.
Moghaddam AM, Tabibi Z, Sadeghi A, Ayati E, Ravandi AG. Screening out accident-prone Iranian drivers: Are their at-fault accidents related to driving behavior? Transp Res F Traffic Psychol Behav 2017;46:451-61.
Machin MA, De Souza JM. Predicting health outcomes and safety behaviour in taxi drivers. Transp Res F Traffic Psychol Behav 2004;7:257-70.
Al-Khaldi YM. Attitude and practice towards road traffic regulations among students of Health Sciences College in Aseer Region. J Family Community Med 2006;13:109-13.
Mirzaei R, Hafezi-Nejad N, Sadegh Sabagh M, Ansari Moghaddam A, Eslami V, Rakhshani F, et al. Dominant role of drivers' attitude in prevention of road traffic crashes: A study on knowledge, attitude, and practice of drivers in Iran. Accid Anal Prev 2014;66:36-42.
Yunesian M, Moradi A. Knowledge, attitude and practice of drivers regarding traffic regulations in Tehran. J Sch Public Health Inst Public Health Res 2005;3:57-66.
Turner C, McClure R. Age and gender differences in risk-taking behaviour as an explanation for high incidence of motor vehicle crashes as a driver in young males. Inj Control Saf Promot 2003;10:123-30.
Ulleberg P, Rundmo T. Risk-taking attitudes among young drivers: The psychometric qualities and dimensionality of an instrument to measure young drivers' risk-taking attitudes. Scand J Psychol 2002;43:227-37.
Kashani AT, Mohaymany AS. Analysis of the traffic injury severity on two-lane, two-way rural roads based on classification tree models. Saf Sci 2011;49:1314-20.
De Oliveira JV, Pedrycz W. Advances in Fuzzy Clustering and its Applications. Chichester: John Wiley & Sons; 2007.
Zarandi M, Hadavandi E, Turksen I. A hybrid fuzzy intelligent agent-based system for stock price prediction. Int J Intell Syst 2012;27:947-69.
Hadavandi E, Shavandi H, Ghanbari A. Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting. Knowl Based Syst 2010;23:800-8.
Kohonen T. The self-organizing map. Proc IEEE 1990;78:1464-80.
Kohonen T. Essentials of the self-organizing map. Neural Netw 2013;37:52-65.
Al-Ghamdi AS. Using logistic regression to estimate the influence of accident factors on accident severity. Accid Anal Prev 2002;34:729-41.
Abellán J, López G, De Oña J. Analysis of traffic accident severity using decision rules via decision trees. Expert Syst Appl 2013;40:6047-54.
de Oña J, López G, Abellán J. Extracting decision rules from police accident reports through decision trees. Accid Anal Prev 2013;50:1151-60.
Kass GV. An exploratory technique for investigating large quantities of categorical data. Appl Statist 1980;29:119-27.
Bekkar M, Djemaa HK, Alitouche TA. Evaluation measures for models assessment over imbalanced data sets. J Inf Eng Appl 2013;3:27-38.
Jolliffe I. Principal Component Analysis. New York: Wiley Online Library; 2005.
Björklund GM, Åberg L. Driver behaviour in intersections: Formal and informal traffic rules. Transp Res F Traffic Psychol Behav 2005;8:239-53.
Greibe P. Accident prediction models for urban roads. Accid Anal Prev 2003;35:273-85.
Hassen A, Godesso A, Abebe L, Girma E. Risky driving behaviors for road traffic accident among drivers in Mekele City, Northern Ethiopia. BMC Res Notes 2011;4:535.
Poursadeghiyan M, Omidi L, Hami M, Raei M, Biglari H. Epidemiology of fatal and non-fatal industrial accidents in Khorasan Razavi Province, Iran. Int J Trop Med 2016;11:170-4.