Dataset of Developer-Labeled Commit Messages


Current research on change classification centers on automated and semi-automated approaches that are evaluated either by the researchers themselves or by external experts. In most cases, the people evaluating the effectiveness of the classification schemes are not the authors of the original changes and can therefore only make assumptions about the intent of those changes. To support validation of existing labeling mechanisms and to provide a training set for future approaches, we present a survey of source code changes that were labeled by their original authors. Seven developers from six different projects applied three existing classification schemes from current literature to enrich their own changes with meta-information, so that the intent of the changes becomes more evident. The final data set consists of 967 classified changes and is available as an SQLite database as part of the MSR data set.

Talk: The 12th Working Conference on Mining Software Repositories (MSR), Florence, Italy; 05-16-2015 - 05-17-2015; in: "Proceedings of the 12th Working Conference on Mining Software Repositories (MSR)", IEEE, (2015), ISBN: 978-0-7695-5594-2; 490 - 493
Christian Schanes
Projektass. Dipl.-Ing. Dr.techn.
Thomas Grechenig
Ao.Univ.Prof. Dipl.-Ing. Dr.techn.