Phytotoxic activity of aerial part exudates of Salvia spp: data collection and advanced statistical analysis
GIACOMINI, MAURO; BISIO, ANGELA; GIACOMELLI, EMANUELA; PIVETTI, SUSANNA; BERTOLINI, SIMONA; ROMUSSI, GIOVANNI
2010-01-01
Abstract
Incorporating crop residues, cover crops and plant residues with a high content of allelochemicals into the soil, or applying extracts of allelopathic plants, can provide selective weed control through the release of allelochemicals [1-3] and can contribute to the maintenance and development of sustainable agro-ecosystems. Because Salvia exudates showed inhibitory activity on the germination of Papaver rhoeas L. and Avena sativa L. in a screening study [4], we developed a database for fast and efficient data collection among the collaborating groups, to be used in the subsequent research steps that define the allelopathic potential of these species. Web-based tools allow structured data to be exchanged securely, which speeds up their processing. The results of the germination experiments are stored automatically in the database, together with morphological data collected on the subsequent growth of the seedlings (height, cotyledon length, root length, seedling fresh and dry weight). The database is complemented by algorithms that calculate the indexes usually reported in the literature (Total Germination - GT, Speed of Germination - S, Speed of Accumulated Germination - AS, Coefficient of the Rate of Germination - CRG), new indexes defined by us [5] (Weighted Average Damage - WAD, Differential Weighted Average Damage - DWAD, Germination Weighted Average Velocity - GWAV) and other variables usually recorded in phytotoxicity experiments (LC50, LC90). Further algorithms perform a one-way ANOVA followed by Duncan's multiple range test to highlight significant differences between the species automatically (Fig. 1). Because these indexes do not fully agree with one another, inferential statistics alone cannot give a comparative description of the activity of the different species.
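As an illustration of the literature indexes named above, the following minimal sketch computes GT, S, AS and CRG from daily cumulative germination counts. The formulas follow one common formulation found in the germination literature and are an assumption here, since the abstract does not state the exact definitions used in the database; the function name `germination_indices` and its arguments are hypothetical. The indexes WAD, DWAD and GWAV are not sketched because their definitions are given only in [5].

```python
def germination_indices(cumulative, seeds_sown):
    """Compute four germination indexes from cumulative daily counts.

    cumulative[i] is the number of seeds germinated by day i + 1.
    The formulas are assumed (one common literature formulation),
    not taken from the abstract.
    """
    if not cumulative or seeds_sown <= 0:
        raise ValueError("need at least one count and a positive seed number")

    days = range(1, len(cumulative) + 1)
    # Seeds newly germinated on each day, derived from the cumulative counts.
    daily = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]

    # GT: percentage of sown seeds germinated at the end of the test.
    gt = 100.0 * cumulative[-1] / seeds_sown
    # S: newly germinated seeds, each weighted by 1 / day of germination.
    s = sum(n / d for n, d in zip(daily, days))
    # AS: cumulative germination on each day divided by that day.
    accumulated = sum(c / d for c, d in zip(cumulative, days))
    # CRG: 100 * germinated seeds / sum of (seeds * day of germination).
    weighted = sum(n * d for n, d in zip(daily, days))
    crg = 100.0 * sum(daily) / weighted if weighted else 0.0

    return {"GT": gt, "S": s, "AS": accumulated, "CRG": crg}


# Example: 20 seeds sown, counts recorded on four consecutive days.
indices = germination_indices([2, 5, 8, 10], seeds_sown=20)
```

With these assumed definitions, a treatment that delays germination without changing the final count lowers S and AS while leaving GT unchanged, which is one reason the indexes can disagree.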
Because the indexes alone do not allow such a comparison, a further analysis based on artificial neural networks was carried out. Previous experience of some of the authors [6-8] showed that this approach can be very valuable where a large amount of quantitative data is used to reach evidence-based conclusions. The relational structure of the database allowed us to apply data mining methods natively to identify hidden relationships and connections within the full data set, provided that these algorithms are fed with data of high information content. We therefore created an algorithm, associated with the database, that filters incoming data according to their information content, both within and among the attributes. The filtering criteria rest on two observations: germination values are null in the first days after sowing, and the variables fresh weight and dry weight are highly correlated. Accordingly, rows and columns were excluded when their coefficient of variation was smaller than 10% (within one element) or when their correlation with another element was greater than 90% (in which case only one element of the pair was kept in the considered set). Among the data mining algorithms available in the literature, we applied neural networks in the form of self-organizing maps (SOMs), also called Kohonen maps. SOMs are single-layer feed-forward neural networks whose output neurons are organized in a grid [9]. Each SOM is trained by unsupervised learning to produce a low-dimensional representation of the training samples that preserves the topological properties of the input space. After training the SOMs, we applied the k-means algorithm to define clusters of neurons. It is well known from the literature that the result of k-means depends strongly on the initial choice of centroids.
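The filtering rules described above (drop elements whose coefficient of variation is below 10%, and keep only one element of any pair correlated above 90%) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the table layout (a dictionary of named columns), the function names, the use of the population standard deviation and the use of the Pearson coefficient are all hypothetical choices.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by |mean|; 0.0 for a constant column."""
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    centre = abs(mean(values))
    return float("inf") if centre == 0 else spread / centre


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def filter_columns(table, min_cv=0.10, max_corr=0.90):
    """Return the names of the columns that survive both filters.

    table maps a column name to its list of values (assumed layout).
    A column is dropped if its variation is too small, or if it is
    strongly correlated with a column that has already been kept.
    """
    kept = []
    for name, values in table.items():
        if coefficient_of_variation(values) < min_cv:
            continue  # nearly constant, e.g. null germination on early days
        if any(abs(pearson(values, table[k])) > max_corr for k in kept):
            continue  # redundant with an already kept column
        kept.append(name)
    return kept


# Made-up values for illustration only.
example = {
    "germination_day1": [0, 0, 0, 0],
    "fresh_weight": [10, 20, 30, 40],
    "dry_weight": [1.0, 2.1, 2.9, 4.0],
    "root_length": [5, 1, 4, 2],
}
selected = filter_columns(example)  # ['fresh_weight', 'root_length']
```

In this sketch the first column of a correlated pair is the one retained, so the column order determines which of fresh weight and dry weight survives; the abstract does not say how that choice was made.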
Because k-means is sensitive to the initial centroids, we selected 5 neurons at random as initial centroids, to avoid any bias, and repeated the clustering 100 times. Starting from the filtered files, we set the number of epochs to 200 for the first phase (rough training) and to 1000 for the second phase (fine tuning); the number of neurons was set to 90 and the maximum number of clusters to search to 5. An example of the clustering test results is shown in Fig. 2. Since 5 was only the maximum number of clusters to be searched, in some cases the algorithm found 3 or 4 clusters, when convergence was reached earlier. Table 1 summarizes the results of the clustering tests. The clusterings were found to be fairly homogeneous even though the tests are independent of each other. As stated above, the order of the neurons is randomized for each test, so that all tests use the same input data in a different random order and the clustering does not depend on the order of the input data. Because in half of the cases we obtained only limited statistical evidence for the real presence of distinct clusters, we are currently developing an algorithm based on the Kohonen map topology and inspired by the Indicator Value Analysis method of Dufrène and Legendre [10] to obtain a ranking of the activity of the various Salvia species. This new set of calculations also relies on the relational structure of the database, so we can state that the present tool can serve as a prototype for the fast and efficient management of this type of data. Further applications of the tool will also make it possible to demonstrate its versatility.
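The repeated clustering with randomly chosen initial centroids can be illustrated with the sketch below, which runs plain k-means (Lloyd's algorithm) several times on a set of vectors standing in for SOM neuron weights and counts how often each partition is found. It is only a sketch under assumptions: the function names are hypothetical, the SOM training itself is not reproduced, and dropping clusters that become empty is just one simple way in which a run can end with fewer clusters than the maximum; the abstract does not describe the mechanism actually used.

```python
import random
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's k-means; the initial centroids are k points drawn at random."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Recompute the centroids. Clusters that lost all members are
        # dropped (an assumption), so fewer than k clusters may remain.
        members = {}
        for p, lab in zip(points, labels):
            members.setdefault(lab, []).append(p)
        centroids = [
            [sum(col) / len(group) for col in zip(*group)]
            for _, group in sorted(members.items())
        ]
    return labels


def repeat_kmeans(points, k=5, runs=100, seed=0):
    """Run k-means `runs` times and count how often each partition occurs."""
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(runs):
        labels = kmeans(points, k, rng)
        # Describe a partition by its groups of point indices, so that the
        # arbitrary numbering of the clusters does not matter.
        partition = tuple(sorted(
            tuple(i for i, lab in enumerate(labels) if lab == c)
            for c in set(labels)
        ))
        outcomes[partition] += 1
    return outcomes


# Made-up two-dimensional vectors standing in for neuron weights.
neurons = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
frequencies = repeat_kmeans(neurons, k=2, runs=20, seed=1)
```

If most runs return the same partition, the clustering is fairly homogeneous in the sense used above; a wide spread of different partitions would instead suggest weak evidence for distinct clusters.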