I had a bad case of bug hunting today which took me > 5 hours to track down (with the help of Adrien in the end).
I was trying to start a CubicWeb instance on my computer, and was encountering some strange pyro error at startup. So I edited some source file to add a pdb.set_trace() statement and restarted the instance, waiting for Python's debugger to kick in. But that did not happen. I was baffled. I first checked for standard problems:
- no pdb.py or pdb.pyc was lying around in my Python sys.path
- the pdb.set_trace function had not been silently redefined
- no other thread was bugging me
- the standard input and output were what they were supposed to be
- I was not able to reproduce the issue on other machines
After triple checking everything, grepping everywhere, I asked a question on StackOverflow before taking a lunch break (if you go there, you'll see the answer). After lunch, no useful answer had come in, so I asked Adrien for help, because two pairs of eyes are better than one in some cases. We dutifully traced down the pdb module's code to the underlying bdb and cmd modules and learned some interesting things on the way down there. Finally, we found out that the Python code frames which should have been identical where not. This discovery caused further bafflement. We looked at the frames, and saw that one of those frames's class was a psyco generated wrapper.
It turned out that CubicWeb can use two implementation of the RQL module: one which uses gecode (a C++ library for constraint based programming) and one which uses logilab.constraint (a pure python library for constraint solving). The former is the default, but it would not load on my computer, because the gecode library had been replaced by a more recent version during an upgrade. The pure python implementation tries to use psyco to speed up things. Installing the correct version of libgecode solved the issue. End of story.
When I checked out StackOverflow, Ned Batchelder had provided an answer. I didn't get the satisfaction of answering the question myself...
Once this was figured out, solving the initial pyro issue took 2 minutes...