The YACC Cleans up my Code!
Interesting week! Even in my wildest dreams, I never imagined that I would get my hands dirty again with the beautiful craft of writing a compiler. I did this as part of my BTech curriculum, but did not expect it to pop up during my Summer of Code.
My project includes building a query interface, and thus, obviously, a query language. Since the intended language was simple, my plan was to implement a basic parsing technique such as a recursive descent parser. Once I started coding, I found recursive descent parsing painful, and I ended up writing naive code that processed an array of strings. This was error prone and very hard to manage: it led to far too many index errors and, more dangerously, it silently ignored extra parameters.
My next option was a regular-expression-based parser, which I built successfully and which worked fine. I stored the regular expression for each command in the config.ini file and used it to validate the command. This approach did not break anywhere, but the code looked rather ugly, with far too many `pops`. Another huge drawback was the lack of good error reporting, as I could not report where the command went wrong. All I could report was that the syntax was wrong, followed by the command usage string. I asked on the Python IRC channel about a way to print better error messages for failed regular expressions, and it was there that I got the suggestion to use a parser. Since this is a PSF project, I could not quite ignore a suggestion from the Python IRC. Further, the regular expression approach only handled validation; I still had to hard-code the command parsing. That would make the project difficult to extend in the future.
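To make the error-reporting problem concrete, here is a minimal sketch of that kind of regex validation. The `show` command, its pattern and its usage string are made up for illustration (the real patterns lived in config.ini), so treat every name here as an assumption.

```python
import re

# Hypothetical pattern and usage string for one command.
# In the project these were read from config.ini; they are inlined here.
PATTERNS = {
    "show": re.compile(r"show\s+(list|user)(\s+where\s+\w+\s*=\s*\w+)?"),
}
USAGE = {
    "show": "show (list|user) [where <key> = <value>]",
}


def validate(command):
    """Return (ok, message) for a command string."""
    words = command.split()
    name = words[0] if words else ""
    pattern = PATTERNS.get(name)
    if pattern is None:
        return False, "Unknown command: %r" % name
    if pattern.fullmatch(command.strip()) is None:
        # A failed match carries no position, so the usage string
        # is the only hint that can be given.
        return False, "Syntax error. Usage: " + USAGE[name]
    return True, ""
```

Calling `validate("show list where name test")` fails, but the message cannot say that the `=` is what is missing. Even when the match succeeds, the pieces of the command still have to be pulled out by hand.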
I did some research on the parser libraries available for Python and settled on PLY, a project with good documentation and plenty of examples. I tried a few sample grammars and then began writing the parser for my project. I used PLY's class-based approach. I had to choose between a common parser for the whole project and a separate parser for each command; I settled on the latter, on the assumption that it would be cleaner and easier to manage and extend. In two days' time, I completed the parser for each command and also rebuilt the environment variable management part. I completely stashed the decorators used for command validation and pre-processing.
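For a flavour of what one parser per command looks like, here is a rough sketch of a class-based PLY parser. The `show` command and its grammar are hypothetical and not the project's actual grammar; the sketch also assumes PLY 3.x is installed and that the code runs from a file.

```python
import ply.lex as lex
import ply.yacc as yacc


class ShowParser:
    """Parser for a made-up command: show <scope> [where <key> = <value>]"""

    reserved = {"show": "SHOW", "where": "WHERE"}
    tokens = ("SHOW", "WHERE", "WORD", "EQUALS")

    t_EQUALS = r"="
    t_ignore = " \t"

    def __init__(self):
        self.errors = []
        self.lexer = lex.lex(module=self)
        # Keep PLY from writing parser.out / parsetab.py to disk.
        self.parser = yacc.yacc(module=self, debug=False, write_tables=False)

    def t_WORD(self, t):
        r"[A-Za-z_][A-Za-z0-9_]*"
        # Keywords are lexed as words first, then re-tagged.
        t.type = self.reserved.get(t.value, "WORD")
        return t

    def t_error(self, t):
        self.errors.append(
            "illegal character %r at position %d" % (t.value[0], t.lexpos))
        t.lexer.skip(1)

    def p_command(self, p):
        """command : SHOW WORD
                   | SHOW WORD WHERE WORD EQUALS WORD"""
        p[0] = {"command": "show", "scope": p[2]}
        if len(p) == 7:
            p[0]["filter"] = (p[4], p[6])

    def p_error(self, p):
        # p is the offending token, or None at end of input.
        if p is None:
            self.errors.append("unexpected end of command")
        else:
            self.errors.append(
                "unexpected %r at position %d" % (p.value, p.lexpos))

    def parse(self, text):
        self.errors = []
        result = self.parser.parse(text, lexer=self.lexer)
        return result, self.errors
```

With this, `ShowParser().parse("show list where name test")` can point at the offending word and its position instead of only printing a usage string, and a valid command comes back as a structured dictionary rather than a list to pop from.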
The code is now a lot cleaner and more readable than before. Errors are reported and handled beautifully, and everything works fine. Time is running fast, and I plan to complete my tasks at least a week before the deadline of 11/08/2014. I have announced on the mm-dev list that I am working with 6th August as my soft deadline.
Last but not least, I am blogging from the Kanyakumari Bangalore Island Express, thanks to the Indian Railways!