-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HunkBlame.py Operation #103
Comments
@linzhp knows best about this stuff. As far as your first point, yes, it only goes down to the commit. This was all we needed it to do for our experiment. If you have a way of keeping the current functionality but going down to file level, that is a pull request I would take :) The second question I don't have the answer to. |
The initial implementation of HunkBlame was able to trace from a bug fixing hunk to bug introducing hunks: https://github.com/SoftwareIntrospectionLab/cvsanaly/blob/5a3a0edf6fc108d102351e457520a11db173dc5f/pycvsanaly2/extensions/HunkBlame.py hunk_blames id: int Though the coexistence of bug_commit_id and bug_hunk_id is a redundant, this structure won't break the code @lewisham is experimenting with. With regard to the second question, HunkBlame assign blames for all hunks in a given commit, the query in the code is: select distinct h.file_id, h.commit_id |
Hi guys, thanks for getting back to me! I appreciate it and your time answering me. @linzhp, I'll throw that code back in and take a look at the result. Glad to know that functionality exists. With the current version HunkBlame.py, here's what I'm looking at ... I just need a little help interpreting what/why/how it is operating: Here's some example code from the voldemort repository (it turns out that commit "3" 46a5b fixes a bug in the initial commit "1" fdb0f). Here's the diff for commit 3: git diff -U1 46a5b^ 46a5b | more diff --git a/.classpath b/.classpath
index 3c765c8..e393f40 100644
--- a/.classpath
+++ b/.classpath
@@ -9,3 +9,2 @@
<classpathentry kind="lib" path="lib/catalina-ant.jar"/>
- <classpathentry kind="lib" path="lib/colt.jar"/>
<classpathentry kind="lib" path="lib/commons-codec-1.3.jar"/>
@@ -31,2 +30,3 @@
<classpathentry kind="lib" path="lib/xerces.jar"/>
+ <classpathentry kind="lib" path="lib/colt-1.2.0.jar"/>
<classpathentry kind="output" path="classes"/>
diff --git a/src/java/voldemort/serialization/json/JsonTypeSerializer.java b/src/java/voldemort/seri
alization/json/JsonTypeSerializer.java
index 9fa1fc0..a28ad4b 100644
--- a/src/java/voldemort/serialization/json/JsonTypeSerializer.java
+++ b/src/java/voldemort/serialization/json/JsonTypeSerializer.java
@@ -133,2 +133,3 @@ public class JsonTypeSerializer implements Serializer<Object> {
writeBoolean(output, (Boolean) object);
+ break;
default:
@@ -480,3 +481,3 @@ public class JsonTypeSerializer implements Serializer<Object> {
+ " but got " + object);
- for(Map.Entry<String, Object> entry : type.entrySet()) {
+ for(Map.Entry<String, Object> entry: type.entrySet()) {
if(!object.containsKey(entry.getKey()))
@@ -493,3 +494,3 @@ public class JsonTypeSerializer implements Serializer<Object> {
Map<String, Object> m = new HashMap<String, Object>(type.size());
- for(String property : type.keySet())
+ for(String property: type.keySet())
m.put(property, read(stream, type.get(property)));
@@ -508,3 +509,3 @@ public class JsonTypeSerializer implements Serializer<Object> {
output.writeShort(objects.size());
- for(Object o : objects)
+ for(Object o: objects)
write(output, o, entryType); And here's what's in the database (after running the prior extensions and HunkBlame.py) for these hunks: sqlite> select * from hunks where commit_id = 3;
307|1|3|10|10|||0|
308|1|3|||31|31|0|
309|155|3|||134|134|0|
310|155|3|481|481|482|482|0|
311|155|3|494|494|495|495|0|
312|155|3|509|509|510|510|0|
sqlite> select * from files where id = 1 or id = 155;
1|.classpath|1
155|JsonTypeSerializer.java|1
sqlite> select * from hunk_blames where (hunk_id >= 307) and (hunk_id <= 312);
1|307|1
219|312|1
220|310|1
221|311|1 Let me see if I can work through the hunks: 307 is the delection of line 10 in classpath Ok, that seems to work the way I would expect. I'm not sure why I couldn't follow it through before -- part of it might have been that I was using the sqlitebrowser tool which seems to only handle queries correctly when it wants to. I redid the queries with the sqlite3 cmd line tool. Best, |
So, I loaded the linked version of HunkBlame: and it seems to be unhappy about the content handler's use of a db connection/cursor that was created in the (parent?) thread: line 41, in init Do you think it would be easier for me (1) to make the old version (of HunkBlame) use connections created within the handler (or at least within the BlameJob?) or (2) to make the new version (of HunkBlame) use the processing of the old version to grab the hunk to hunk blames? Or is there a better way? The recent versions of HunkBlame seem to do all the db querying outside the job, so the queries are all within the parent/root thread. Is that right? Best, |
If SQLite didn't allow you to share a database connection between different threads, try creating a connection for each thread, but remember to close the connection once that thread is done. |
So, I ran the Content extension and then the old HunkBlame extension. It seems to want a "content" member variable from the blame_line instance. And, I can't find where that might get assigned. Also, Content.py threw many exceptions (lots of :path specifiers being either (1) "not a valid object name" or (2) "path does not exist in ). . I'm going to dig into Content.py and the content table today and try to figure out what's going on. Here's the current show stopper: File "/usr/lib64/python2.7/site-packages/cvsanaly-2.4-py2.7.egg/pycvsanaly2/extensions/HunkBlameH2H.py", line 95, in line Best, Ok, so I'm practicing my "git-fu" and I found all occurances of ".contents" in the 5a3a0e commit: mfenner [599] % git grep ".content" 5a3a0ed | more That leads me to look at PatchParser and/or go outside cvsanaly and look at repohandler ... or am I getting farther away from the problem? |
In order to have "content" attribute of blame_line, apply this patch to your Guilty: linzhp/guilty@4e7f2cc |
Nice, thank you. I had found that BlameLine instantiation and added this on my own (I like very verbose regexs :) ) ... this is just for git): line_pattern = re.compile(r"""^
([^ ]+)[\t ]+ # commit id
([^ ]+)[\t ]+ # filename
\( # (
(.+) # author
\ # space
([0-9]+) # epoch secs
\ [+-][0-9][0-9][0-9][0-9][\t ]+
([0-9]+) # edited line no
\) # )
\ # space
(.*)$""", re.VERBOSE) I've been working on a couple other issues to get the old HunkBlame to work with the newer code around it. I "fixed" (I hope!) one piece in HunkBlame.run(): self.fp.get_path() -> self.fp.get_path_from_database(). I've now got problems with Blame.process_finished_jobs (the BlameJob isn't assigning failed ... so I'm looking into that): It's very quick & dirty, but I put the DB connection and cursor creation in the ContentHandler.init. Is it safe to put the DB tear-down in the ContentHandler.end_file() method? Best, |
Closing DB connection in end_file is OK off the top of my mind, the only concern is if it needs to do other db operation after end_file is called. You could try that. |
So, I think I have this "working" now. I renamed it HunkBlameH2H (hunk to hunk) to avoid confusion, mangling, etc. I need to clean it up quite a bit, test it quite a bit, and b/c of the connection creation inside jobs I may need to look at performance so it can run on a "real" repository. But, it's moving forward nicely. Thank you @linzhp. If I'm on the west coast, I'll buy you your choice of beer, coffee, tea, smoothie, etc. :) |
You are welcome. Good to know it's working |
Hi folks, a couple questions:
It appears that HunkBlame.py assigns the blame of a particular (fixing) hunk to a particular commit. Any thoughts on getting from the commit down to a file (and maybe down to a hunk)? (From a fix_hunk F, we know the file so we should be able to look it up in the commit. Finding a specific bug_hunk B within that file might be more difficult?)
Is HunkBlame only assigning a blame for the first hunk in a given commit? At least in one test project, that seems to be what it is doing for me. Thoughts on getting it to work through all hunks in a commit? I'm looking at the break (near line 48) in the BlameContentHandler.line and wondering if that drops out too soon.
Thanks and best,
Mark
The text was updated successfully, but these errors were encountered: