python - Searching basic comments in C++ by regex
I'm writing a Python program that searches for comments in a C++ program using regex. I wrote the following code:
import re

regex = re.compile(r'(\/\/(.*?))\n|(\/\*(.|\n)*\*\/)')
comments = []
text = ""
# Read the whole C++ source from standard input
while True:
    try:
        x = raw_input()
        text = text + "\n" + x
    except EOFError:
        break
z = regex.finditer(text)
for match in z:
    print match.group(1)
This code should detect comments of the form //i'm comment and /*blah blah blah blah blah*/
I'm getting the following output:
// program in c++
None
//use cout
which I'm not expecting. I thought match.group(1) should capture the first parenthesis of (\/\*(.|\n)*\*\/), but it does not. The C++ program I'm testing is:
// program in c++
#include <iostream>
/** love c++ awesome **/
using namespace std;
int main ()
{
    cout << "hello world"; //use cout
    return 0;
}
You didn't use the right order, since an inline comment can be included inside a multiline comment. So you need to begin your pattern with the multiline comment. Example:
/\*[\s\S]*?\*/|//.*
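A minimal sketch of this pattern run against a sample modeled on the test program from the question. The pattern has no capture groups, so the sketch reads match.group(0) (the whole match) instead of group(1). The names COMMENT_RE, source and comments are illustrative only.

```python
import re

# Multiline alternative first, then the inline one.
COMMENT_RE = re.compile(r'/\*[\s\S]*?\*/|//.*')

source = '''// program in c++
#include <iostream>
/** love c++
awesome **/
using namespace std;
int main ()
{
    cout << "hello world"; //use cout
    return 0;
}
'''

# group(0) is the entire match, whichever alternative succeeded.
comments = [m.group(0) for m in COMMENT_RE.finditer(source)]
for c in comments:
    print(c)
```

With group(0) there is no None entry for the block comment, because the result no longer depends on which alternative matched.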
Note that you can improve this pattern if you have long multiline comments (this syntax is an emulation of the atomic group feature, which the re module did not support before Python 3.11):
/\*(?:(?=([^*]+|\*(?!/)))\1)*\*/|//.*
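As a rough check, the sketch below compares the lazy pattern with the lookahead-plus-backreference form on a small sample. It assumes the lookahead is closed before \1, so that the lookahead captures a chunk and \1 then consumes it. The names LAZY_RE, ATOMIC_RE and sample are illustrative.

```python
import re

LAZY_RE = re.compile(r'/\*[\s\S]*?\*/|//.*')

# (?=(...))\1 : the lookahead captures a chunk into group 1, then \1
# consumes exactly that chunk, so the engine cannot backtrack into it.
ATOMIC_RE = re.compile(r'/\*(?:(?=([^*]+|\*(?!/)))\1)*\*/|//.*')

sample = 'int a; /* first * second\n third */ int b; // tail\n'

# Group 1 is only a helper here, so read the whole match with group(0).
lazy = [m.group(0) for m in LAZY_RE.finditer(sample)]
atomic = [m.group(0) for m in ATOMIC_RE.finditer(sample)]
print(lazy == atomic)
```

Both patterns should find the same comments. The second one differs only in how much backtracking it allows on long comment bodies.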
But note that there are other traps, such as a string that contains /*...*/ or //.....
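A short sketch of the trap, using a made-up line of C++ in which // appears inside a string literal. The URL and variable names are invented for illustration.

```python
import re

COMMENT_RE = re.compile(r'/\*[\s\S]*?\*/|//.*')

line = 'const char *url = "http://example.com"; // real comment'

# The first "//" sits inside the string literal, so the match starts there
# and swallows the rest of the line, including part of the string.
found = [m.group(0) for m in COMMENT_RE.finditer(line)]
print(found)
```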
So if you want to avoid these cases, for example if you want to make a replacement, you need to capture strings first and use a backreference in the replacement string, like this:
(pattern for strings)|/\*[\s\S]*?\*/|//.*
replacement: $1 (written \1 or \g<1> in Python's re module)
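A hedged sketch of this idea in Python. The string sub-pattern below is an assumption: it covers only double-quoted literals with backslash escapes, not character literals or raw strings. The sketch uses a replacement function rather than a backreference template, so that a match where group 1 did not participate becomes an empty string. The names STRING, STRIP_RE and strip_comments are hypothetical.

```python
import re

# Assumed string pattern: double-quoted, with backslash escapes.
STRING = r'"(?:\\.|[^"\\])*"'
STRIP_RE = re.compile('(' + STRING + r')|/\*[\s\S]*?\*/|//.*')

def strip_comments(code):
    # Keep group 1 (a string literal) when it matched; drop comments.
    return STRIP_RE.sub(lambda m: m.group(1) or '', code)

code = 'puts("a // b"); /* gone */ x = 1; // also gone\n'
result = strip_comments(code)
print(result)
```

Because the string alternative comes first, comment markers inside a string literal are consumed as part of the string and never reach the comment alternatives.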