Search-based Software Refactoring: Towards Semantic Preservation


Successful software products evolve through continuous change. However, these changes can degrade the design and make systems increasingly complex. This complexity significantly reduces productivity, degrades system performance, increases fault-proneness, raises software costs, and can even lead to canceled projects. Many studies report that software engineers spend around 60% of their time understanding code. Clearly, software engineers urgently need better ways to reduce and manage the growing complexity of software systems and to improve their productivity. Refactoring, which improves the design structure while preserving overall functionality and behavior, is an important way to address this challenge.

After more than a decade of research in the field, it has become clear that detecting refactoring opportunities and suggesting refactoring solutions based only on structural (metric-based) indications is not sufficient. In object-oriented programs, objects reify domain concepts and/or physical objects, implementing their characteristics and behavior. Unlike in other programming paradigms, the grouping of data and behavior into classes is not guided by development or maintenance considerations. Instead, the methods and fields of classes characterize the structure and behavior of the implemented domain elements. Consequently, a program can be syntactically correct and implement the appropriate behavior, yet still violate the domain semantics if the reification of domain elements is incorrect. During the initial design and implementation, programs usually capture the domain semantics well when object-oriented principles are applied. However, when these programs are refactored during maintenance, their adequacy with regard to domain semantics can be compromised. Semantics preservation is an important issue that is not well addressed in the current refactoring literature.

In this talk, I will explore several approaches that we proposed, based on mono-, multi-, and many-objective optimization techniques, for detecting refactoring opportunities and suggesting viable refactoring solutions. These approaches rely on several vocabulary-based similarity measures inspired by natural language processing (NLP), such as cosine similarity, to estimate the similarity between the names of code elements, comments, and code documentation, and to recommend new code changes. I will also present several other applications of our vocabulary-based techniques to model transformation and meta-model matching. An evaluation of the proposed contributions on large-scale open-source and industrial systems will also be presented. Finally, I will describe several challenges and future research directions in the refactoring area.
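
To give a flavor of the vocabulary-based similarity mentioned above, here is a minimal sketch of cosine similarity computed over the terms extracted from code element names. It is only an illustration under simplifying assumptions (plain term frequencies, no stemming or stop-word removal), not the implementation evaluated in the talk; the function names `tokenize` and `cosine_similarity` and the sample identifiers are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split identifiers or free text into lowercase terms.

    Handles camelCase and snake_case; digits and punctuation are dropped.
    """
    # Insert a space at lowercase/digit-to-uppercase boundaries (camelCase).
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [term.lower() for term in re.findall(r"[A-Za-z]+", spaced)]


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    vec_a = Counter(tokenize(text_a))
    vec_b = Counter(tokenize(text_b))
    # Dot product over the terms the two vocabularies share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vocabulary is treated as sharing nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical scenario: which class vocabulary is closer to a method name?
    method = "computeInvoiceTotal"
    print(cosine_similarity(method, "Invoice invoiceLines total tax"))
    print(cosine_similarity(method, "Logger logLevel writeMessage"))
```

In a search-based setting, a score of this kind could serve as one objective among several: a candidate refactoring that moves a method into a class with which it shares little vocabulary would be penalized even if it improves structural metrics.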