The data come from a PLoS Computational Biology paper by Hasan and colleagues (Hasan et al., 2006), who collected attributes for the 3927 genes in the H37Rv genome of Mycobacterium tuberculosis, including druggability, enzyme function, essentiality, and DNA microarray results (gene expression profiles from models simulating latency conditions, e.g. hypoxia, starvation, high pH...).
The Hasan paper combines the values for each gene using a weighted-sum (linear combination) scoring function. While the Hasan paper proposed a particular set of weights, we recognize that other researchers may have different goals in mind and may want to try different weighting schemes.
Column Selection
Different researchers use different criteria to evaluate possible drug targets. To give them this flexibility, the first page of the tool lets the user view all the categories of information available for each target. The user can then select the criteria that they want to explore further. A user who wishes to reselect criteria at any stage can use the Reselect Columns button on the second page.
Weights and Score
Each of the selected criteria can be assigned a user-defined weight. (Default weights based on the Hasan paper are shown and can be chosen automatically.) Users can use this feature to examine how varying the influence of each criterion affects the target prioritization. Each target is assigned a final score, computed as the weighted sum over all the selected criteria. The drug targets are sorted by this score. (Targets with the highest score are at the top of the list.)
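The weighted-sum scoring described above can be sketched as follows. This is only an illustration, not the tool's own code: the gene names, column names, values, and weights are hypothetical, and the columns are assumed to be numeric already.

```python
# Illustrative sketch of weighted-sum scoring. All names and numbers
# below are hypothetical; they are not taken from the Hasan data set.

def weighted_score(row, weights):
    """Return the weighted sum of the selected criteria for one target."""
    return sum(weights[col] * row[col] for col in weights)


def rank_targets(targets, weights):
    """Return (score, gene) pairs sorted with the highest score first."""
    scored = [(weighted_score(row, weights), gene) for gene, row in targets.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


targets = {
    "geneA": {"essentiality": 1.0, "druggability": 0.0, "hypoxia": 0.5},
    "geneB": {"essentiality": 0.0, "druggability": 1.0, "hypoxia": 1.0},
    "geneC": {"essentiality": 1.0, "druggability": 1.0, "hypoxia": 0.0},
}
weights = {"essentiality": 2.0, "druggability": 1.0, "hypoxia": 0.5}

ranking = rank_targets(targets, weights)
for score, gene in ranking:
    print(gene, score)
```

Changing a single entry in the hypothetical weights dictionary and re-running the ranking shows how the influence of one criterion shifts the order of the targets.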
Sorting
As mentioned earlier, the targets are sorted in descending order of overall score. It is also possible to sort the targets by any of the selection criteria, in either ascending or descending order. The data can be sorted by only one column at a time. To re-sort based on a selection, either click the Rescore button in the upper left-hand corner of the page or press Enter anywhere inside the table.
Normalization
The user has two options for normalizing the data in all the columns: unit normalization and standard normalization. The unit normalize option rescales the data to the range [0, 1]. The standard normalize option rescales the data to a distribution with a mean of 0 and a variance of 1.
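A minimal sketch of the two normalization options, assuming each column is a plain list of numbers. The function names are hypothetical. Using the population standard deviation and mapping a constant column to zeros are assumptions made here, not documented behavior of the tool.

```python
from statistics import mean, pstdev


def unit_normalize(values):
    """Rescale values linearly to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standard_normalize(values):
    """Rescale values to mean 0 and variance 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


column = [2.0, 4.0, 6.0]
print(unit_normalize(column))
print(standard_normalize(column))
```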
Selection Criterion
The user can choose to view only those targets that have values greater than a threshold. They
can specify the selection criteria for each column (for example) as
Correlate
The user can examine the relationships between criteria by calculating the correlation between their columns. At this point we cannot correlate discrete data with continuous data. If the two data columns being correlated contain discrete data, a table containing the counts for each set of discrete values is displayed. If the data columns contain continuous data, a graph showing the distribution is displayed.
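For two discrete columns, the table of counts described above amounts to a contingency table. The sketch below builds one with the standard library. It also includes a Pearson coefficient for two continuous columns, which is an assumption added for illustration: the text says only that a graph is displayed for continuous data, not which statistic, if any, is computed. Function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def contingency_table(col_a, col_b):
    """Count how often each pair of discrete values occurs together."""
    return Counter(zip(col_a, col_b))


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns.

    Assumption: neither column is constant (otherwise the denominator is 0).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical discrete columns (e.g. yes/no flags coded as 1/0).
table = contingency_table([0, 0, 1, 1], [1, 1, 0, 1])
for pair, count in sorted(table.items()):
    print(pair, count)
```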
Transform Threshold
The user can set a threshold for the values in each column. If the data value in a column is greater than or equal to the threshold, 1 is added to the target's count; otherwise 0 is added. A final count is computed for each target. The assumption here is that if a target responds to a treatment, the level of expression matters less than the fact that it is expressed. The count is accompanied by color coding (green if data ≥ threshold), which makes the results easier to scan. Additionally, the targets are sorted by count. (The top 200 targets are shown.)
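The counting rule above can be sketched as follows. The gene names, column names, and threshold values are hypothetical; the limit of 200 comes from the text. Ties in the count keep their original order here, which is an assumption.

```python
def threshold_count(row, thresholds):
    """Count the columns whose value is greater than or equal to its threshold."""
    return sum(1 for col, cutoff in thresholds.items() if row[col] >= cutoff)


def top_targets(targets, thresholds, limit=200):
    """Return (gene, count) pairs sorted by count, highest first, cut to `limit`."""
    counts = [(gene, threshold_count(row, thresholds)) for gene, row in targets.items()]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical expression values under three conditions.
targets = {
    "geneA": {"hypoxia": 2.1, "starvation": 0.3, "ph": 1.0},
    "geneB": {"hypoxia": 0.2, "starvation": 0.1, "ph": 0.4},
    "geneC": {"hypoxia": 1.5, "starvation": 1.8, "ph": 1.2},
}
thresholds = {"hypoxia": 1.0, "starvation": 1.0, "ph": 1.0}

shown = top_targets(targets, thresholds)
for gene, count in shown:
    print(gene, count)
```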
Statistics
For each column, the minimum and maximum values, the mean, the standard deviation, and the range of the data values are computed.
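A minimal sketch of these per-column summaries using Python's standard library. The function name is hypothetical, and the choice of the sample standard deviation is an assumption; the tool may use the population form instead.

```python
from statistics import mean, stdev


def column_stats(values):
    """Return min, max, range, mean, and standard deviation for one column."""
    lo, hi = min(values), max(values)
    return {
        "min": lo,
        "max": hi,
        "range": hi - lo,
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (an assumption)
    }


stats = column_stats([1.0, 2.0, 3.0, 4.0, 5.0])
for name, value in stats.items():
    print(name, value)
```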