-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathtmp.txt
483 lines (369 loc) · 16.8 KB
/
tmp.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
commit 6d28d82fe19a86523d26c101039ca90e16ec85a0
Author: Keira Haskins <[email protected]>
Date: Wed Oct 9 09:58:57 2019 -0600
Cleaning up Main.hs in compareRepos and adding experiments.
commit 94bdfd365131e83efc11df8692587729d2c1ba8f
Author: Keira Haskins <[email protected]>
Date: Tue Oct 8 21:29:54 2019 -0600
Split up Main.hs, setting up process for analyzing all repos.
commit da6d5824396ddb886be55dfbfec2ce43c99d247a
Author: Keira Haskins <[email protected]>
Date: Tue Oct 8 11:34:12 2019 -0600
Added flag and case to app/Main.hs to make testing easier.
commit ec7e52797abc737f2d9549d8c1a0fdfc6ffd98c4
Author: Keira Haskins <[email protected]>
Date: Tue Oct 8 11:15:08 2019 -0600
Modified compareCoinHashes src/Util.hs to use longest list length.
commit 2e868221a2fd39e835568c90b163238b18d3ba33
Author: Keira Haskins <[email protected]>
Date: Wed Oct 2 15:14:53 2019 -0600
Cleaned up src/Main.hs by moving old commented code into misc/extracode.txt.
commit 665d0fc235efddccf1dd1c779299230cc0403486
Author: Keira Haskins <[email protected]>
Date: Wed Oct 2 11:38:35 2019 -0600
Modified compareCoinHashes in src/Util to also return a list of the files that match between two repositories.
commit a55c0547e987060fad4542437fe9ccf3e5f7e17f
Author: Keira Haskins <[email protected]>
Date: Tue Oct 1 20:52:52 2019 -0600
Tested bitcoin against bitcoin 2 and micro bitcoin and wrote data to notes.txt. Modified app/Main.hs line 127/8.
commit 8f61554281e6881bba560867bfd9b996a5360780
Author: Keira Haskins <[email protected]>
Date: Tue Oct 1 20:13:22 2019 -0600
stopped using regular expressions and switched to pattern matching functions: line 54 src/Util.hs.
commit 59353d75bccfc29ff5437b5a17882c03c54eeaba
Author: Keira Haskins <[email protected]>
Date: Tue Oct 1 14:08:05 2019 -0600
Testing regular expressions line 37 src/Util.hs, added src/Parser.hs.
commit 610079190f7eb0ca54d6fa767099ce1e8db4c129
Author: Keira Haskins <[email protected]>
Date: Sun Sep 29 18:31:57 2019 -0600
Added PDetect paper.
commit b18f805bb2d3c0a7127b1e4b0d9e7aec6c6f83ab
Author: Keira Haskins <[email protected]>
Date: Sat Sep 28 16:31:16 2019 -0600
Added util.rs to code_analysis.
commit 3f5ecba89596d234246f5dad3d0b053f0cfdcfec
Author: Keira Haskins <[email protected]>
Date: Thu Sep 26 16:33:02 2019 -0600
Have started whitlisting appropriate files rather than blacklisting.
commit b221958c2035b895623a33a88d860b2cdf6ffff9
Author: Keira Haskins <[email protected]>
Date: Fri Sep 20 08:23:20 2019 -0600
have gotten to where we can compare directories via hashes in full.
commit 85ab5cfc917f577662b1b86f7157f0d6acb873b0
Author: Keira Haskins <[email protected]>
Date: Tue Sep 17 10:02:46 2019 -0600
Have started working on the first version of comparing coin hashes.
commit e717d29da8240d839fdc99fb7a4d33ed0fef1a42
Author: Keira Haskins <[email protected]>
Date: Sat Sep 14 14:25:49 2019 -0600
Can now output CSV files containing filename, hash, and number of lines per file.
commit af337fe6ffe86164288fcfb36dc0b6748677c3e9
Author: Keira Haskins <[email protected]>
Date: Fri Sep 13 17:08:22 2019 -0600
Was able to clean up everything and package file names and hashes together via tuples.
commit cebc674a5e42bd9ada05d6826538478c8c602512
Author: Keira Haskins <[email protected]>
Date: Thu Sep 12 14:17:04 2019 -0600
Cleaned up Main.hs somewhat.
commit e17af149cf58b9746d431ca8a9581d6404187841
Author: Keira Haskins <[email protected]>
Date: Thu Sep 12 14:15:04 2019 -0600
Finally filtered out all the files that can't be read from BTC. Are now producing a list of hashes.
commit c0aeeb631ddb07832e796746564c7d75dd617418
Author: Keira Haskins <[email protected]>
Date: Thu Sep 12 11:22:50 2019 -0600
Have been struggling with exceptions in Haskell. Have reached about the same point with Rust now.
commit 3d8c567a94d47704a2eeff543ae8409793fa4bc7
Author: Keira Haskins <[email protected]>
Date: Wed Sep 11 12:08:12 2019 -0600
Have been able to cut back on exceptions by filtering various types of files.
commit b8ef1b2fdaded864d2dcd89eaf921d676a091e32
Author: Keira Haskins <[email protected]>
Date: Wed Sep 11 11:23:07 2019 -0600
Was able to filter hidden files and directories. Still getting exceptions for invalid arguments.
commit 34460aaba9dbe84595764ad6a13e2d5742921ed0
Author: Keira Haskins <[email protected]>
Date: Wed Sep 11 10:56:00 2019 -0600
Code is compiling, now running into an exception when encountering .git files. need to add a filter for this.
commit 8b947477041f5235d30415b626917486741ee756
Author: Keira Haskins <[email protected]>
Date: Tue Sep 10 11:47:22 2019 -0600
Able to compile, running into exceptions with .git files. Need to add some kind of filter.
commit 2abf079a7c210dd7626253f24e5636432493a25e
Author: Keira Haskins <[email protected]>
Date: Tue Sep 10 08:50:46 2019 -0600
unable to figure out typing error in Main.hs.
commit 83038cfb77dc19857800a1a55aa3d0e8944d3900
Author: Keira Haskins <[email protected]>
Date: Tue Sep 10 08:28:38 2019 -0600
Overcame first big IO challenge in Main.hs.
commit 12de56e39428f0ad45d40468e67b386c87bb91c3
Author: Keira Haskins <[email protected]>
Date: Mon Sep 9 18:01:01 2019 -0600
Working on building a directory hash tree in Rust.
commit c6cdff5549640da7f7e3288d915e2a5ce8f32830
Author: Keira Haskins <[email protected]>
Date: Mon Sep 9 16:03:47 2019 -0600
Going to work on the IO part of the code in Rust.
commit a0f056aac6397f05b87940170dd2f08d47544969
Author: Keira Haskins <[email protected]>
Date: Mon Sep 9 09:30:37 2019 -0600
Continuing work on IO issues in app/Main.hs
commit 36da1d84123fade23fc9362a498786284130e6fd
Author: Keira Haskins <[email protected]>
Date: Fri Sep 6 10:46:19 2019 -0600
put hashing stuff into Main.hs.
commit 2bdb392234534bf501edfd01efb758093f87f03b
Author: Keira Haskins <[email protected]>
Date: Thu Sep 5 21:00:56 2019 -0600
working on hashing function in util.hs.
commit 05d35f7f1cd5a6e5ec584fff2cc10202f7d1cf40
Author: Keira Haskins <[email protected]>
Date: Thu Sep 5 11:21:41 2019 -0600
added cabal file.
commit 97f13073d0a1cd867b62676b2daf4658ed6dadf0
Author: Keira Haskins <[email protected]>
Date: Thu Aug 29 13:18:15 2019 -0600
Have gotten directory names in a list. The next step is to generate a hash tree to describe the structure of our repository
commit 575ece60f347f166e7792c087e9f070cc90b10bb
Author: Keira Haskins <[email protected]>
Date: Thu Aug 29 11:47:35 2019 -0600
Working on building a directory hash tree.
commit 0354c716a0468a6dd6c3953a2a42ede00fd52ae4
Author: Keira Haskins <[email protected]>
Date: Wed Aug 28 20:44:07 2019 -0600
Added names.csv and repos.csv and added functionality for automated git repo cloning.
commit f80640946c2dc375d0632b190e3bc8d580e1e797
Merge: 8c8a723 7db9d95
Author: Keira Haskins <[email protected]>
Date: Mon Aug 26 12:03:06 2019 -0600
Merge branch 'master' of github.com:unm-secon/fraud-cryptocurrency-codebases
commit 8c8a72305151174118cb4549c73804676445473f
Author: Keira Haskins <[email protected]>
Date: Mon Aug 26 12:02:14 2019 -0600
Setup Main.hs to use names.csv as input.
commit 7db9d95da19cabb600760223a200c254c8ab0ddb
Author: Keira Haskins <[email protected]>
Date: Fri Aug 23 10:37:47 2019 -0600
Update README.md.
commit 8f97600c130c6e2a579d8baef78d3ec50709743d
Author: Keira Haskins <[email protected]>
Date: Fri Aug 23 10:36:37 2019 -0600
Working on regular expression in Scrape.hs.
commit 6fc959ec48c5546ebc7f0eb49be30fa46d15eac7
Author: Keira Haskins <[email protected]>
Date: Mon Aug 19 08:29:14 2019 -0600
First commit
*** Notes for repos.csv ***
- rpc, seeder, p2p, node, core, src, build -> Appended to coin name or similar we keep
- Unique names which include: message, ui, token, stats, js, py, or other languages other than cpp
-> do not keep
* Bitcloud seems to use LIMXTEC/bitcore-*, megacoin, electrumx, europecoin, etc. For now I have excluded these
* I have come across a number of coins/tokens which seem to use direct copies of repos from other coins directly in their own github hierarchies, as well as tokens
* e-Gulden
* TrustPlus
* Pylon Network
* APR Coin
* 808Coin
* Chronologic
* XGOX
* Signals Network
* Photon
* Parkgene
* Deutsche eMark -> German eMark?
* Machinecoin ***
* InsaneCoin
*** ParseTrees.hs ***
--serialize2 :: Cursor -> String
--serialize2 root = '(' : (show $ cursorKind root)
-- ++ (L.foldl (\acc y -> if (L.foldr (\yy xx -> if (show $ cursorKind y)
-- == yy then False
-- else xx) True filterOut)
-- then acc ++ serialize2 y
-- else acc) "" (cursorChildren root))
-- ++ ")"
--lcstr xs ys = maximumBy (compare `on` length) . concat $ [f xs' ys | xs' <- tails xs] ++ [f xs ys' | ys' <- drop 1 $ tails ys]
-- where f xs ys = scanl g [] $ zip xs ys
-- g z (x, y) = if x == y then z ++ [x] else []
-- Rosetta Code: Longest Common Substring
-- So far seems to be too inefficient
--subStrings :: [a] -> [[a]]
--subStrings s =
-- let intChars = length s
-- in [ take n $ drop i s
-- | i <- [0..intChars - 1]
-- , n <- [1..intChars - i] ]
--
--longestCommon :: Eq a => [a] -> [a] -> [a]
--longestCommon a b =
-- maximumBy (comparing length) (subStrings a `intersect` subStrings b)
--longestCommon :: Eq a => [a] -> [a] -> [a]
--longestCommon a b =
-- L.maximumBy (comparing length) $
-- (uncurry L.intersect . pair) $ [tail . L.inits <=< L.tails] <*> [a, b]
--
--pair :: [a] -> (a, a)
--pair [x, y] = (x, y)
--stringArr :: Int -> Int -> Array Int Int
--stringArr x y =
-- array (0, x * y) [(x, z) | x <- [0..(x * y)], ]
--
--longestCommon :: String -> String -> Int
--longestCommon a b
-- where lcSuff =
--parseCPP :: FilePath -> IO String
--parseCPP file = do
-- idx <- createIndex
-- tu <- parseTranslationUnit idx file ["-I/usr/local/include"]
-- let root = translationUnitCursor tu
-- --children = cursorChildren root
-- --children = L.foldr (\y x -> if (L.foldr (\yy xx -> if (show $ cursorKind y)
-- -- == yy then False else xx) True filterOut)
-- -- then y : x else x) [] (cursorDescendants root)
-- children = serialize2 root
--functionDecls = filter (\c -> cursorKind c == FunctionDecl) children
--forM_ children (print . cursorSpelling)
--print root
--print $ map cursorKind children
--mapM_ print $ map cursorKind children
--return children
--keep :: [String]
--keep = [ "FunctionDecl"
-- , "ParmDecl"
-- , "StructDecl"
-- , "UnexposedDecl"
-- , "UnexposedExpr"
-- , "IntegerLiteral"
-- ]
--buildTrees :: FilePath -> FilePath -> IO ()
--buildTrees file1 file2 = do
-- x <- evaluate $ treeToString file1
-- y <- evaluate $ treeToString file2
-- m <- x
-- n <- y
-- writeTreeToFile file1 m "1"
-- writeTreeToFile file2 n "2"
--let out = longestCommonSubstring $ x : [y]
--let test = (takeFileName file1, takeFileName file2, length x, length y, length out)
--return test
--compareAllParseTrees :: [FilePath] -> [FilePath] -> IO [(String, String, Int, Int, Int)]
--compareAllParseTrees xs ys = sequence $ helper xs ys []
-- where
-- helper :: [FilePath] -> [FilePath] -> [IO (String, String, Int, Int, Int)]
-- -> [IO (String, String, Int, Int, Int)]
-- helper [] _ acc = acc
-- helper (f:fs) ms acc = let
-- --fun <- L.foldr (\y a -> a ++ [(compareTrees f y)]) [] ms
-- fun = map (\y -> [(compareTrees f y)]) ms in
-- --let test = fmap (\n -> let (h,i,j,k,l) = n in (h,i,j,k,l)) fun
-- --let xx = L.foldr (\r m -> let (a, b, c, d, e) = r in (a,b,c,d,e) : m) [] test
-- helper fs ms (acc ++ (concat fun))
--compareAllParseTrees (f:fs) ys acc = compareAllParseTrees fs ys (acc ++ fun)
-- where
-- fun = L.foldr (\y a -> a ++ [(compareTrees f y)]) [] ys
--xx f2 = do
-- m <- compareTrees f f2
-- let yy = m
-- yy
--buildAllParseTrees :: [FilePath] -> [FilePath] -> IO [()]
--buildAllParseTrees xs ys = sequence $ helper xs ys []
-- where
-- helper :: [FilePath] -> [FilePath] -> [IO ()]
-- -> [IO ()]
-- helper [] _ acc = acc
-- helper (f:fs) ms acc = let
-- fun = map (\y -> [(buildTrees f y)]) ms in
-- helper fs ms (acc ++ (concat fun))
--buildParseTreesRepos :: String -> String -> IO [()]
--buildParseTreesRepos repo1 repo2 = do
-- dirlist1 <- traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] repo1
-- dirlist2 <- traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] repo2
-- let dirs1 = map (\x -> x ++ " ") dirlist1
-- let dirs2 = map (\x -> x ++ " ") dirlist2
-- let inter1 = map init $ filterFileType ".cpp " dirs1
-- let inter2 = map init $ filterFileType ".cpp " dirs2
-- let out = buildAllParseTrees inter1 inter2
-- out
--input <- fmap Txt.lines $ Txt.readFile "misc/repos.csv"
--input <- fmap Txt.lines $ Txt.readFile "/wheeler/scratch/khaskins/fraud-cryptocurrency-codebases/misc/repos.csv"
--input <- fmap Txt.lines $ Txt.readFile "/wheeler/scratch/khaskins/fraud-cryptocurrency-codebases/misc/names.csv"
--print (map snd $ transferToTuple tmp)
--print $ length $ (map snd $ filtered)
--print filtered
--print $ map (takeBaseName . last) tmp
--compareParseTreesRepos (head args) (head (tail args))
--generateAST :: String -> String -> IO ()
--generateAST repo file = do
-- DIR.createDirectoryIfMissing True ("/tmp/AST/" ++ repo ++ "/")
-- let command = prefix ++ suffix
-- (errc, out, err) <- readCreateProcessWithExitCode (shell command) []
-- print command
-- where
-- prefix = "clang -cc1 -ast-dump " ++ file
-- suffix = " > /tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast"
--parseCSource file = do
-- idx <- createIndex
-- transUnit <- parseTranslationUnit idx file [""]
-- print transUnit -- translationUnitCursor transUnit
--readAST repo file = do
-- h <- Txt.readFile ("/tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast")
-- let y = Txt.unpack h
-- let x = parseDot y
-- prettyPrintDot x
-- DOT.putDot x
-- let x = DOT.readDotFile ("/tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast") -- :: DotGraph a
-- let y = x
-- DOT.putDot y
*** Util.hs ***
*** Basic.hs ***
-- let x = map (MD.md5 . BLU.fromString . Txt.unpack) a
-- let o = map (toText . fromBytes . MD.md5DigestBytes) x
-- let z = zip3 o n inter1
-- --LB.writeFile ("inter/" ++ name1 ++ "pre.csv") $ encode z
-- csv <- parseFromFile CSV.csvFile (encode z)
-- -- Second iteration
-- -- Get number of lines per file.
-- let n2 = map (length . Txt.lines) out2
-- let m2 = out2
--
-- -- To test on unmodified vs preprocessed use either m2 or b. (end of first line)
-- let x2 = map (MD.md5 . BLU.fromString . Txt.unpack) b
-- let o2 = map (toText . fromBytes . MD.md5DigestBytes) x2
-- let z2 = zip3 o2 n2 inter1a
-- --LB.writeFile ("inter/" ++ name2 ++ "pre.csv") $ encode z2
-- csv2 <- parseFromFile CSV.csvFile (encode z2)
--
-- let p = rights [csv]
-- let p2 = rights [csv2]
-- let l = sort (concat p)
-- let l2 = sort (concat p2)
--
-- let k = compressFiles l
-- let k2 = compressFiles l2
--
-- let yy = compareCoinHashes k k2 (length k) (length k2) ([], 0)
-- let aa = snd yy
-- let bb = fst yy
--
-- let dat = convertToCSVLine (name1, name2, length k, length k2, length bb)
-- writeDataToFile "level2" (dat ++ "\n")
-- input <- Txt.lines <$> Txt.readFile "misc/testset.csv"
-- let clean = fmap (\x -> fmap Txt.unpack x) $
-- fmap (\x -> (Txt.splitOn $ (Txt.pack ",") ) x) input
-- let tmp = splitEvery 3 $ fmap (filter (/= '\n')
-- . filter (/= '\r')) $ concat clean
-- let repos = map takeFileName $ concat $ map (tail . tail) tmp
slug slug.1 date date.1 \
0 dogecoin bitcoin Sun Aug 30 03:46:39 2009 Sun Aug 30 03:46:39 2009
commit-id commit-id2 overlap-basic1 overlap-basic2 overlap-parse id \
0 4405b78d6 4405b78d6 0.0 0.004082 0.581615 74
... quote.USD.percent_change_30d quote.USD.percent_change_60d \
0 ... -35.806242 12.165671
quote.USD.percent_change_90d quote.USD.market_cap \
0 441.77887 4.066116e+10
quote.USD.last_updated platform.id platform.name platform.symbol \
0 2021-06-16T19:00:03.000Z NaN NaN NaN
platform.slug platform.token_address
0 NaN NaN
[1 rows x 36 columns]