Replication dataset

This page contains the data and scripts required to replicate the results of the study entitled "Predicting Vulnerable Components: Software Metrics vs Text Mining".

All data files and scripts on this page were prepared or authored by Riccardo Scandariato, Jeffrey Stuckman, or James Walden.

Feature data

(9.3KB) File metrics for Drupal 6.0

(1.9MB) Text mining token data for Drupal 6.0

(121KB) File metrics for Moodle 2.0.0

(18MB) Text mining token data for Moodle 2.0.0

(15KB) File metrics for PHPMyAdmin 3.3.0

(5.3MB) Text mining token data for PHPMyAdmin 3.3.0

Product source code and tokenization scripts

(829) Script to tokenize PHP files

(572KB) Single-version archive containing Drupal 6.0

(304MB) Git repository providing files for all versions of Moodle

(215MB) Git repository providing files for all versions of PHPMyAdmin

README describing how product files can be extracted

Vulnerability prediction training and testing scripts

(1752) Script to perform cross-project prediction with file metrics

(1979) Script to perform cross-project prediction with text mining features

(1997) Script to train and test within-project prediction models with file metrics

(2212) Script to train and test within-project prediction models with text mining

(520) Script used by cross-project prediction scripts to print results

(577) Script used by within-project prediction scripts to print results

(938) Script to compare and test within-project results

Instructions

To run the vulnerability prediction experiments with the scripts on this page, follow these steps: (1) ensure that the system requirements are met; (2) download the scripts and data into the proper directory structure; (3) set the environment variables; (4) train and test the predictive models; (5) view the prediction results; (6) perform cross-project prediction.

System requirements

Downloading scripts and data

All data files and scripts required to perform the experiments are available from this page. Data files should be uncompressed and placed in one directory structure (here, called php-pred), while scripts should be uncompressed and placed in another structure (here, called pred-scripts).

The following files and subdirectories should be created in php-pred:

The following files and subdirectories should be created in pred-scripts:

Setting environment variables

The environment variable PHP_PRED should be set to the path of the php-pred directory created in the previous step. The environment variable WEKA_JAR should be set to the full path, including the filename, of the weka.jar file included with Weka.
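
As a minimal sketch, the two variables could be set in a shell session as follows. Both paths are assumptions chosen for illustration (a php-pred directory in the home directory and a Weka installation under ~/weka); replace them with the actual locations on your system.

```shell
# Assumed locations; replace with the actual paths on your system.
export PHP_PRED="$HOME/php-pred"
export WEKA_JAR="$HOME/weka/weka.jar"

# Warn (without aborting) if either path does not exist yet.
[ -d "$PHP_PRED" ] || echo "warning: PHP_PRED directory not found: $PHP_PRED" >&2
[ -f "$WEKA_JAR" ] || echo "warning: WEKA_JAR file not found: $WEKA_JAR" >&2
```

The exports only last for the current session; adding them to a shell startup file makes them persistent.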

Training and testing predictive models

Switch to the previously created pred-scripts directory and run the following series of scripts:

./xvalid-metrics.sh -f 3 -l 10 -u -m forest drupal-6_0
./xvalid-text.sh -f 3 -l 10 -u -m forest drupal-6_0
./xvalid-metrics.sh -f 3 -l 10 -u -m forest phpmyadmin-3_3_0
./xvalid-text.sh -f 3 -l 10 -u -m forest phpmyadmin-3_3_0
./xvalid-metrics.sh -f 3 -l 10 -u -m forest moodle-2_0_0
./xvalid-text.sh -f 3 -l 10 -u -m forest moodle-2_0_0
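
The six commands above differ only in the script name and the product, so they can also be written as a loop. The following is a sketch rather than part of the replication package: the function name xvalid_all is hypothetical, and the options are copied unchanged from the commands above. Passing echo as the first argument prints the commands instead of running them.

```shell
# Run both within-project scripts for each product, in the same order
# as the commands listed above. Any arguments are used as a command
# prefix, so "xvalid_all echo" prints the commands instead of running them.
xvalid_all() {
    for project in drupal-6_0 phpmyadmin-3_3_0 moodle-2_0_0; do
        for kind in metrics text; do
            "$@" "./xvalid-$kind.sh" -f 3 -l 10 -u -m forest "$project"
        done
    done
}

xvalid_all echo   # dry run: prints the six commands
```

Calling xvalid_all with no arguments from the pred-scripts directory runs the scripts themselves.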

Viewing and comparing prediction results

In the same directory, run the following series of scripts, changing ~/php-pred to the actual location of the php-pred directory:

Rscript R/xvalid-test.R ~/php-pred/out/drupal-6_0/xvalid.metrics.forest.csv ~/php-pred/out/drupal-6_0/xvalid.text.forest.csv
Rscript R/xvalid-test.R ~/php-pred/out/phpmyadmin-3_3_0/xvalid.metrics.forest.csv ~/php-pred/out/phpmyadmin-3_3_0/xvalid.text.forest.csv
Rscript R/xvalid-test.R ~/php-pred/out/moodle-2_0_0/xvalid.metrics.forest.csv ~/php-pred/out/moodle-2_0_0/xvalid.text.forest.csv
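
If PHP_PRED is already set as described earlier, the same three comparisons can be generated without editing the paths by hand. This is a sketch under that assumption: the function name xvalid_compare is hypothetical, and it falls back to ~/php-pred when PHP_PRED is unset.

```shell
# Compare metrics and text-mining results for each product.
# Any arguments are used as a command prefix, so "xvalid_compare echo"
# prints the commands instead of running them.
xvalid_compare() {
    # Fall back to ~/php-pred if PHP_PRED is not set.
    out="${PHP_PRED:-$HOME/php-pred}/out"
    for project in drupal-6_0 phpmyadmin-3_3_0 moodle-2_0_0; do
        "$@" Rscript R/xvalid-test.R \
            "$out/$project/xvalid.metrics.forest.csv" \
            "$out/$project/xvalid.text.forest.csv"
    done
}

xvalid_compare echo   # dry run: prints the three commands
```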

Cross-project prediction

In the same directory, run the following series of scripts:

./xproject-metrics.sh -u -l 10 -m forest -t phpmyadmin-3_3_0 -T moodle-2_0_0
./xproject-text.sh -u -l 10 -m forest -t phpmyadmin-3_3_0 -T moodle-2_0_0
./xproject-metrics.sh -u -l 10 -m forest -t phpmyadmin-3_3_0 -T drupal-6_0
./xproject-text.sh -u -l 10 -m forest -t phpmyadmin-3_3_0 -T drupal-6_0
./xproject-metrics.sh -u -l 10 -m forest -t moodle-2_0_0 -T phpmyadmin-3_3_0
./xproject-text.sh -u -l 10 -m forest -t moodle-2_0_0 -T phpmyadmin-3_3_0
./xproject-metrics.sh -u -l 10 -m forest -t moodle-2_0_0 -T drupal-6_0
./xproject-text.sh -u -l 10 -m forest -t moodle-2_0_0 -T drupal-6_0
./xproject-metrics.sh -u -l 10 -m forest -t drupal-6_0 -T phpmyadmin-3_3_0
./xproject-text.sh -u -l 10 -m forest -t drupal-6_0 -T phpmyadmin-3_3_0
./xproject-metrics.sh -u -l 10 -m forest -t drupal-6_0 -T moodle-2_0_0
./xproject-text.sh -u -l 10 -m forest -t drupal-6_0 -T moodle-2_0_0
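
The twelve commands above cover every ordered pair of distinct products, once per feature type, so they can be generated with nested loops. The sketch below simply reproduces those pairs in the same order; the function name xproject_all and the loop variables are hypothetical, and the -t and -T options are passed exactly as in the commands above.

```shell
# Run both cross-project scripts for every ordered pair of distinct products.
# Any arguments are used as a command prefix, so "xproject_all echo"
# prints the commands instead of running them.
xproject_all() {
    projects="phpmyadmin-3_3_0 moodle-2_0_0 drupal-6_0"
    for a in $projects; do
        for b in $projects; do
            # Skip pairing a product with itself.
            if [ "$a" != "$b" ]; then
                for kind in metrics text; do
                    "$@" "./xproject-$kind.sh" -u -l 10 -m forest -t "$a" -T "$b"
                done
            fi
        done
    done
}

xproject_all echo   # dry run: prints the twelve commands
```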