Methods that perform very well under one challenge (e.g., background motion) may not perform well under another (e.g., strong shadows or night videos). Moreover, authors tend to compare their own method against a handful of easily implementable algorithms, giving disproportionate importance to a limited number of methods while marginalizing those that are more difficult to implement. Furthermore, they often evaluate on their own videos, which makes the comparison somewhat biased.

In addition to providing a fine-grained and accurate annotation of videos (2012 DATASET and 2014 DATASET), we also provide tools to compute the following performance metrics (a minimal computation sketch follows the list):

  • Average ranking across categories : (sum of ranks for all categories) / (number of categories)
  • Average ranking : (rank:Recall + rank:Spec + rank:FPR + rank:FNR + rank:PWC + rank:FMeasure + rank:Precision) / 7
  • TP : True Positive
  • FP : False Positive
  • FN : False Negative
  • TN : True Negative
  • Re (Recall) : TP / (TP + FN)
  • Sp (Specificity) : TN / (TN + FP)
  • FPR (False Positive Rate) : FP / (FP + TN)
  • FNR (False Negative Rate) : FN / (TP + FN)
  • PWC (Percentage of Wrong Classifications) : 100 * (FN + FP) / (TP + FN + FP + TN)
  • F-Measure : (2 * Precision * Recall) / (Precision + Recall)
  • Precision : TP / (TP + FP)
  • FPR-S : Average false positive rate in hard shadow areas
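
As a rough illustration of the definitions above (and not a substitute for the code provided in UTILITIES), the sketch below computes the confusion-matrix counts and the derived per-frame metrics. It assumes the detection result and the ground truth are simple binary NumPy arrays; the actual ground-truth files also contain shadow and unknown labels, which the official tools handle and which are ignored here. The frame_metrics helper and its interface are illustrative.

```python
# Minimal sketch of the per-frame metrics listed above. Assumes binary masks;
# not the official UTILITIES implementation.
import numpy as np

def frame_metrics(detection: np.ndarray, ground_truth: np.ndarray) -> dict:
    """Compute TP/FP/FN/TN and the derived metrics for one frame.

    Both inputs are boolean (or 0/1) arrays where True/1 marks foreground.
    """
    det = detection.astype(bool)
    gt = ground_truth.astype(bool)

    tp = np.count_nonzero(det & gt)    # foreground correctly detected
    fp = np.count_nonzero(det & ~gt)   # background reported as foreground
    fn = np.count_nonzero(~det & gt)   # foreground missed
    tn = np.count_nonzero(~det & ~gt)  # background correctly rejected
    total = tp + fp + fn + tn

    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (tp + fn) if tp + fn else 0.0
    pwc = 100.0 * (fn + fp) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)

    return {"Recall": recall, "Specificity": specificity, "FPR": fpr,
            "FNR": fnr, "PWC": pwc, "F-Measure": f_measure,
            "Precision": precision}
```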

These metrics are reported for each dataset (2012 DATASET RESULTS and 2014 DATASET RESULTS) and make it possible to identify algorithms that are robust across the various challenges. The source code to compute all performance metrics is provided in UTILITIES.
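
The average rankings defined earlier can be aggregated as sketched below, under the assumption that, within each category, methods are ranked (1 = best) on each of the seven metrics, those ranks are averaged per method, and the per-category averages are then averaged across categories. The function names and the data layout are illustrative, not taken from UTILITIES, and ties are broken arbitrarily here.

```python
# Sketch of the ranking aggregation described above: rank every method on each
# of the seven metrics within a category (1 = best), average those ranks per
# method, then average the per-category results. Data layout and names are
# illustrative; this is not the official ranking code.
from statistics import mean

# Whether a larger value is better for each metric (used to orient the ranks).
HIGHER_IS_BETTER = {"Recall": True, "Specificity": True, "FPR": False,
                    "FNR": False, "PWC": False, "F-Measure": True,
                    "Precision": True}

def rank_methods(scores: dict, higher_is_better: bool) -> dict:
    """Rank methods on one metric; scores maps method name -> value."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {method: position + 1 for position, method in enumerate(ordered)}

def average_ranking(category_scores: dict) -> dict:
    """Average ranking within one category (mean rank over the 7 metrics)."""
    per_metric = [rank_methods(category_scores[metric], better)
                  for metric, better in HIGHER_IS_BETTER.items()]
    methods = category_scores["Recall"]
    return {m: mean(r[m] for r in per_metric) for m in methods}

def average_ranking_across_categories(all_scores: dict) -> dict:
    """Mean of the per-category average rankings for each method."""
    per_category = [average_ranking(scores) for scores in all_scores.values()]
    methods = per_category[0]
    return {m: mean(r[m] for r in per_category) for m in methods}
```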