An IDE tool for data structure changes

A few days ago I was writing some unit tests for my code when I realized that the data structure my code was based on was incorrect.  One of the variants of my data type was RegExBase char, where I wanted RegExBase string instead.  I wanted to be able to express the empty string as a valid regular expression (other strings can just be built up from RegExConcat (regex * regex)).
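As a rough sketch of the change, here is what the type might look like before and after.  Only RegExBase and RegExConcat are named above; the Before module, the to_string pretty printer, and the example values are assumptions made for illustration, and the real type presumably has more variants.

```ocaml
(* Hypothetical reconstruction: only RegExBase and RegExConcat come from
   the post; everything else here is assumed for illustration. *)

(* Before: the base case holds a single character, so the empty string
   cannot be expressed. *)
module Before = struct
  type regex =
    | RegExBase of char
    | RegExConcat of regex * regex
end

(* After: the base case holds a string, so RegExBase "" is a valid value. *)
type regex =
  | RegExBase of string
  | RegExConcat of regex * regex

(* A minimal pretty printer (assumed shape): concatenate the base strings. *)
let rec to_string = function
  | RegExBase s -> s
  | RegExConcat (l, r) -> to_string l ^ to_string r

(* The old form still type-checks only inside Before. *)
let old_a = Before.RegExBase 'a'

(* The new form can express the empty string directly. *)
let empty = RegExBase ""
let ab = RegExConcat (RegExBase "a", RegExBase "b")
```

Under this sketch, a test value such as RegExBase 'a' becomes RegExBase "a", and the compiler flags every place that still uses the old form.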

This change was a fairly minor one, and the only real code changes I needed to make were in the pretty printer… and in the unit tests.  The changes I needed to make for the unit tests were fairly easy (search and replace ‘ for “).  However, it did highlight a critical objection that many people who are unwilling to adopt unit testing bring up: adding unit tests increases development time and makes refactoring more difficult.  Whether it increases development time is up for debate.  I saw a recent study that showed that TDD increased initial development time by ~20%, but resulted in ~20% fewer bugs (I could be wrong on those percentages).  Of course, unit tests aren’t merely for decreasing bugs right then and there; they also serve as documentation for future users about the intention of the code.  Refactoring, however, is certainly more difficult for someone who already knows the codebase.  Unit tests make refactoring easier for future developers, as they help them understand the code, but for the original author, who understands how everything fits together, the unit tests are merely a barrier in the way of refactoring.

However, the IDE has a wealth of information available to it during these transformations.  Not only does it see the before and after state of the data structure, it also sees how the actual code is changed to work with the new data structure, and it knows that all previously failing tests should continue failing and all previously succeeding tests should continue succeeding.  The IDE can use this information to make (or suggest) automated changes to the unit tests.

But it can go a step further.  Just as it can automatically synthesize updated unit tests, perhaps it can go in the reverse direction: the user changes the unit tests, and the code is changed automatically.  This then begins to look like a normal input-output synthesis problem, but with additional information about how the code in the unit tests was changed.  And it can then go even further, where maybe just one or two unit tests are changed, and it tries to generalize to the normal code and the rest of the unit tests.  Or maybe only one function is changed, and it tries to generalize to the rest of the code and all the unit tests.  This could make refactoring significantly easier for simple changes like turning a character parameter into a string parameter, especially in a large codebase, where large amounts of time can be spent chasing down where the build failures occur.


Falling into the pit of success of following conventions

Falling into the pit of success is an often-touted phrase in programming, and among people who work on programming languages.  The idea behind the phrase is that in a well-designed language, it is easier to write a correct program than an incorrect one.  Because it’s easier to write a correct program, the programmer will fall into writing it almost by accident.  This is usually done through strong typing, where incorrect programs aren’t considered programs at all, and through languages where the meaning of a program is easy to tell from looking at it.  It is also done through some software engineering libraries, where the code written against them reads like a sentence in English.

 .Call(handler) //TODO: better example

When another user reads this code, they can understand it as well as they can understand normal English sentences.  Similarly, writing that code is as easy as writing normal English sentences.
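As a small illustration of code that reads like a sentence, here is a sketch of a test-style assertion.  The names expect and to_equal are invented for this example and do not come from any particular library.

```ocaml
(* Hypothetical helpers, invented for illustration only. *)

(* expect just hands its argument along; it exists so the line reads well. *)
let expect actual = actual

(* to_equal compares the expected value against the actual one. *)
let to_equal expected actual = actual = expected

(* Reads roughly as: "expect 1 + 1 to equal 2". *)
let sum_is_two = expect (1 + 1) |> to_equal 2

(* Reads roughly as: "expect "a" to equal "b"", which is false. *)
let a_is_b = expect "a" |> to_equal "b"
```

The pipeline operator |> is what lets the pieces line up in the order you would say them out loud.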

Conventions are certain restrictions imposed on developers.  Some conventions are used to remove bad and dangerous patterns, such as not allowing global variables.  Others make the program easier for readers to understand, such as prefixing private variables with m_ in C#.  Many of the reasons for conventions are the same reasons we want developers to fall into the pit of success.  Unfortunately, unlike a well-designed programming language or API, which can use its type system and syntax to disallow things that go against conventions, conventions can be broken.  In high-pressure situations, like a looming deadline, justifications can be made for breaking those conventions, and discipline must be exercised to keep them intact.  In my experience, these justifications are never truly worth it and don’t seem to pay off, but we humans are weak and will take the path of least resistance.  Perhaps languages should make themselves easily modifiable, allowing companies to build their own conventions into the system.  Or perhaps, in large releases of a language, existing and widely accepted conventions should be incorporated into the language itself.
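To make the contrast concrete, here is a minimal sketch of a convention that the type system enforces instead of leaving it to discipline.  The convention itself ("this string is never empty") and the NonEmpty module are hypothetical, chosen only to show the idea.

```ocaml
(* Hypothetical module, invented for illustration.  The signature hides the
   representation, so the only way to obtain a NonEmpty.t is of_string. *)
module NonEmpty : sig
  type t                               (* abstract: cannot be built directly *)
  val of_string : string -> t option   (* None when the rule is violated *)
  val to_string : t -> string
end = struct
  type t = string
  let of_string s = if s = "" then None else Some s
  let to_string s = s
end

(* Any function taking a NonEmpty.t can rely on the rule without rechecking. *)
let shout (name : NonEmpty.t) = String.uppercase_ascii (NonEmpty.to_string name) ^ "!"

let greeting =
  match NonEmpty.of_string "hello" with
  | Some n -> shout n
  | None -> "(rejected)"
```

With this shape, a deadline cannot talk anyone into skipping the check: code that passes a plain string where a NonEmpty.t is expected simply does not compile.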