Sunday, February 15, 2015

Intermittent Errors or issues which occur randomly.

There can be many reasons why a piece of software fails at random. These are the ones I have come across.

1. Code or Data Issue.
When a developer writes a piece of code, he may not have considered the scenario where a SQL statement returns two rows, and may have coded on the assumption that it returns exactly one. In that case the code that loops over the result set can end up using the wrong row.
The code then tries to process the wrong value, which can lead to a failure. The behavior is random because the query does not specify the order in which rows are fetched.
So sometimes, just by chance, it picks up the correct value and succeeds, and sometimes it fails.
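
A minimal sketch of two ways to guard against this, using Python's built-in sqlite3 module. The customer_address table, its columns, and the fetch_exactly_one helper are hypothetical names made up for illustration; they are not from the system described above. The idea is either to state the intended row explicitly with ORDER BY, or to fail loudly when the "one row" assumption does not hold.

```python
import sqlite3

def fetch_exactly_one(conn, sql, params=()):
    """Return the only row of the query, or raise if there are zero or several."""
    # Fetching at most two rows is enough to detect an unexpected duplicate.
    rows = conn.execute(sql, params).fetchmany(2)
    if len(rows) != 1:
        found = "none" if not rows else "more than 1"
        raise LookupError("expected exactly 1 row, got " + found)
    return rows[0]

# Hypothetical table in which one customer has two address rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_address (customer_id INTEGER, city TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customer_address VALUES (?, ?, ?)",
    [(1, "Pune", "2015-01-10"), (1, "Mumbai", "2015-02-01")],
)

# Option 1: say which row is wanted, so the result no longer depends on fetch order.
latest = conn.execute(
    "SELECT city FROM customer_address WHERE customer_id = ? "
    "ORDER BY updated_at DESC LIMIT 1",
    (1,),
).fetchone()

# Option 2: keep the one-row assumption, but turn a silent wrong pick into an error.
try:
    fetch_exactly_one(
        conn, "SELECT city FROM customer_address WHERE customer_id = ?", (1,)
    )
    duplicate_detected = False
except LookupError:
    duplicate_detected = True
```

Either way the failure stops being random: the query returns the same row every time, or the duplicate is reported on every run instead of only when the wrong row happens to come first.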



2. Performance Issue.
In one case I found that my software (MDM) writes to history tables using AFTER triggers, to keep a history of updated, deleted, and newly inserted records.
Take the UPDATE trigger on one particular table, say the ANSWER table. It creates a new entry in the H_ANSWER table and updates the previous entries with end dates (to mark them as ended). The update statement inside the trigger had a WHERE clause that searched on a column which was not indexed, and that was the problem.
So whenever an update came in on the ANSWER table, the UPDATE trigger ran its operations on H_ANSWER, and these failed randomly because of poor performance.
My assumption was that when a user updated a lot of answers in one transaction, the updates to the H_ANSWER table timed out.
After I added an index on that column, the problem was resolved.
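
The effect of the missing index can be sketched with SQLite's EXPLAIN QUERY PLAN, again through Python's sqlite3 module. This is an illustration under assumptions, not the actual MDM schema: the h_answer columns, the index name, and the exact update statement are hypothetical, and the original system ran on a different database whose plan output will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical history table; the real H_ANSWER layout is not known.
conn.execute(
    "CREATE TABLE h_answer "
    "(answer_id INTEGER, value TEXT, start_date TEXT, end_date TEXT)"
)

# The kind of statement a history trigger might run to end-date earlier entries.
UPDATE_SQL = (
    "UPDATE h_answer SET end_date = ? "
    "WHERE answer_id = ? AND end_date IS NULL"
)

def query_plan(conn, sql, params):
    """Return SQLite's plan for the statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Without an index, every update has to scan the whole history table.
plan_before = query_plan(conn, UPDATE_SQL, ("2015-02-15", 42))

# With an index on the searched column, the update can seek directly.
conn.execute("CREATE INDEX idx_h_answer_answer_id ON h_answer (answer_id)")
plan_after = query_plan(conn, UPDATE_SQL, ("2015-02-15", 42))

print(plan_before)
print(plan_after)
```

A full scan per updated row is cheap while the history table is small and gets slower as it grows, which fits a failure that only shows up for large transactions.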


3. Session
In one case, a tester came back and said he was getting an intermittent failure on one of the screens (user interface). Sometimes he got the error and sometimes he did not.
I asked for the logs of that environment and realized that, in the cases where he got the error, he had executed a different flow first. Somehow the session state from that flow was being kept and carried over to the screen where the failure occurred.
In the cases where he simply tested other flows and then tested this flow, it worked fine.
So the culprit here was improper session management.
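
The shape of this bug can be sketched with a plain dictionary standing in for the session. Everything here is hypothetical (the pending_item key, the flow names, and both functions); the real application's session contents are not known. The sketch only shows how state left behind by one flow can break a screen in another, and how clearing flow-specific state on entry removes the dependence on what the user did earlier.

```python
def render_summary_screen(session):
    """Hypothetical screen that assumes any pending_item belongs to its own flow."""
    item = session.get("pending_item")
    if item is not None and item["flow"] != "summary":
        raise ValueError("stale pending_item from flow " + item["flow"])
    return "summary ok"

def enter_flow(session, flow):
    """Start a flow with a clean slate by dropping leftovers from earlier flows."""
    session.pop("pending_item", None)
    session["current_flow"] = flow

# A different flow ran first and left its state in the session.
session = {"pending_item": {"flow": "checkout", "id": 7}}

# Going straight to the screen fails, but only after that particular flow.
try:
    render_summary_screen(session)
    failed_with_stale_state = False
except ValueError:
    failed_with_stale_state = True

# Resetting flow-specific state on entry makes the screen behave the same every time.
enter_flow(session, "summary")
result_after_reset = render_summary_screen(session)
```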



So intermittent failures or errors usually come from issues that are hard to debug but can be fixed with some analysis. I have just sketched out a few cases. If you come across any others, please mention them in the comments so that others know about them and can use them in their own analysis.




1 comment:

  1. And today I found one caused by a comparison between a timestamp and CurrentTimeStamp. Sometimes the check passed and sometimes it failed.
