tmp.txt

commit 6d28d82fe19a86523d26c101039ca90e16ec85a0
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Wed Oct 9 09:58:57 2019 -0600

    Cleaning up Main.hs in compareRepos and adding experiments.

commit 94bdfd365131e83efc11df8692587729d2c1ba8f
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 8 21:29:54 2019 -0600

    Split up Main.hs, setting up process for analyzing all repos.

commit da6d5824396ddb886be55dfbfec2ce43c99d247a
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 8 11:34:12 2019 -0600

    Added flag and case to app/Main.hs to make testing easier.

commit ec7e52797abc737f2d9549d8c1a0fdfc6ffd98c4
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 8 11:15:08 2019 -0600

    Modified compareCoinHashes src/Util.hs to use longest list length.

commit 2e868221a2fd39e835568c90b163238b18d3ba33
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Wed Oct 2 15:14:53 2019 -0600

    Cleaned up src/Main.hs by moving old commented code into misc/extracode.txt.

commit 665d0fc235efddccf1dd1c779299230cc0403486
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Wed Oct 2 11:38:35 2019 -0600

    Modified compareCoinHashes in src/Util to also return a list of the files that match between two repositories.

commit a55c0547e987060fad4542437fe9ccf3e5f7e17f
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 1 20:52:52 2019 -0600

    Tested bitcoin against bitcoin 2 and micro bitcoin and wrote data to notes.txt. Modified app/Main.hs line 127/8.

commit 8f61554281e6881bba560867bfd9b996a5360780
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 1 20:13:22 2019 -0600

    stopped using regular expressions and switched to pattern matching functions: line 54 src/Util.hs.

commit 59353d75bccfc29ff5437b5a17882c03c54eeaba
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Tue Oct 1 14:08:05 2019 -0600

    Testing regular expressions line 37 src/Util.hs, added src/Parser.hs.

commit 610079190f7eb0ca54d6fa767099ce1e8db4c129
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Sun Sep 29 18:31:57 2019 -0600

    Added PDetect paper.

commit b18f805bb2d3c0a7127b1e4b0d9e7aec6c6f83ab
Author: Keira Haskins <wasp@unm.edu>
Date:   Sat Sep 28 16:31:16 2019 -0600

    Added util.rs to code_analysis.

commit 3f5ecba89596d234246f5dad3d0b053f0cfdcfec
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Sep 26 16:33:02 2019 -0600

    Have started whitlisting appropriate files rather than blacklisting.

commit b221958c2035b895623a33a88d860b2cdf6ffff9
Author: Keira Haskins <wasp@unm.edu>
Date:   Fri Sep 20 08:23:20 2019 -0600

    have gotten to where we can compare directories via hashes in full.

commit 85ab5cfc917f577662b1b86f7157f0d6acb873b0
Author: Keira Haskins <wasp@unm.edu>
Date:   Tue Sep 17 10:02:46 2019 -0600

    Have started working on the first version of comparing coin hashes.

commit e717d29da8240d839fdc99fb7a4d33ed0fef1a42
Author: Keira Haskins <wasp@unm.edu>
Date:   Sat Sep 14 14:25:49 2019 -0600

    Can now output CSV files containing filename, hash, and number of lines per file.

commit af337fe6ffe86164288fcfb36dc0b6748677c3e9
Author: Keira Haskins <wasp@unm.edu>
Date:   Fri Sep 13 17:08:22 2019 -0600

    Was able to clean up everything and package file names and hashes together via tuples.

commit cebc674a5e42bd9ada05d6826538478c8c602512
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Sep 12 14:17:04 2019 -0600

    Cleaned up Main.hs somewhat.

commit e17af149cf58b9746d431ca8a9581d6404187841
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Sep 12 14:15:04 2019 -0600

    Finally filtered out all the files that can't be read from BTC. Are now producing a list of hashes.

commit c0aeeb631ddb07832e796746564c7d75dd617418
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Sep 12 11:22:50 2019 -0600

    Have been struggling with exceptions in Haskell. Have reached about the same point with Rust now.

commit 3d8c567a94d47704a2eeff543ae8409793fa4bc7
Author: Keira Haskins <wasp@unm.edu>
Date:   Wed Sep 11 12:08:12 2019 -0600

    Have been able to cut back on exceptions by filtering various types of files.

commit b8ef1b2fdaded864d2dcd89eaf921d676a091e32
Author: Keira Haskins <wasp@unm.edu>
Date:   Wed Sep 11 11:23:07 2019 -0600

    Was able to filter hidden files and directories. Still getting exceptions for invalid arguments.

commit 34460aaba9dbe84595764ad6a13e2d5742921ed0
Author: Keira Haskins <wasp@unm.edu>
Date:   Wed Sep 11 10:56:00 2019 -0600

    Code is compiling, now running into an exception when encountering .git files. need to add a filter for this.

commit 8b947477041f5235d30415b626917486741ee756
Author: Keira Haskins <wasp@unm.edu>
Date:   Tue Sep 10 11:47:22 2019 -0600

    Able to compile, running into exceptions with .git files. Need to add some kind of filter.

commit 2abf079a7c210dd7626253f24e5636432493a25e
Author: Keira Haskins <wasp@unm.edu>
Date:   Tue Sep 10 08:50:46 2019 -0600

    unable to figure out typing error in Main.hs.

commit 83038cfb77dc19857800a1a55aa3d0e8944d3900
Author: Keira Haskins <wasp@unm.edu>
Date:   Tue Sep 10 08:28:38 2019 -0600

    Overcame first big IO challenge in Main.hs.

commit 12de56e39428f0ad45d40468e67b386c87bb91c3
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Sep 9 18:01:01 2019 -0600

    Working on building a directory hash tree in Rust.

commit c6cdff5549640da7f7e3288d915e2a5ce8f32830
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Sep 9 16:03:47 2019 -0600

    Going to work on the IO part of the code in Rust.

commit a0f056aac6397f05b87940170dd2f08d47544969
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Sep 9 09:30:37 2019 -0600

    Continuing work on IO issues in app/Main.hs

commit 36da1d84123fade23fc9362a498786284130e6fd
Author: Keira Haskins <wasp@unm.edu>
Date:   Fri Sep 6 10:46:19 2019 -0600

    put hashing stuff into Main.hs.

commit 2bdb392234534bf501edfd01efb758093f87f03b
Author: Keira Haskins <keirahaskins26@gmail.com>
Date:   Thu Sep 5 21:00:56 2019 -0600

    working on hashing function in util.hs.

commit 05d35f7f1cd5a6e5ec584fff2cc10202f7d1cf40
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Sep 5 11:21:41 2019 -0600

    added cabal file.

commit 97f13073d0a1cd867b62676b2daf4658ed6dadf0
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Aug 29 13:18:15 2019 -0600

    Have gotten directory names in a list. The next step is to generate a hash tree to describe the structure of our repository

commit 575ece60f347f166e7792c087e9f070cc90b10bb
Author: Keira Haskins <wasp@unm.edu>
Date:   Thu Aug 29 11:47:35 2019 -0600

    Working on building a directory hash tree.

commit 0354c716a0468a6dd6c3953a2a42ede00fd52ae4
Author: Keira Haskins <wasp@unm.edu>
Date:   Wed Aug 28 20:44:07 2019 -0600

    Added names.csv and repos.csv and added functionality for automated git repo cloning.

commit f80640946c2dc375d0632b190e3bc8d580e1e797
Merge: 8c8a723 7db9d95
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Aug 26 12:03:06 2019 -0600

    Merge branch 'master' of github.com:unm-secon/fraud-cryptocurrency-codebases

commit 8c8a72305151174118cb4549c73804676445473f
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Aug 26 12:02:14 2019 -0600

    Setup Main.hs to use names.csv as input.

commit 7db9d95da19cabb600760223a200c254c8ab0ddb
Author: Keira Haskins <wasp@unm.edu>
Date:   Fri Aug 23 10:37:47 2019 -0600

    Update README.md.

commit 8f97600c130c6e2a579d8baef78d3ec50709743d
Author: Keira Haskins <wasp@unm.edu>
Date:   Fri Aug 23 10:36:37 2019 -0600

    Working on regular expression in Scrape.hs.

commit 6fc959ec48c5546ebc7f0eb49be30fa46d15eac7
Author: Keira Haskins <wasp@unm.edu>
Date:   Mon Aug 19 08:29:14 2019 -0600

    First commit

*** Notes for repos.csv ***

 - rpc, seeder, p2p, node, core, src, build -> Appended to coin name or similar we keep
 - Unique names which include: message, ui, token, stats, js, py, or other languages other than cpp
   -> do not keep
* Bitcloud seems to use LIMXTEC/bitcore-*, megacoin, electrumx, europecoin, etc. For now I have excluded these
* I have come across a number of coins/tokens which seem to use direct copies of repos from other coins directly in their own github hierarchies, as well as tokens
  * e-Gulden
  * TrustPlus
  * Pylon Network
  * APR Coin
  * 808Coin
  * Chronologic
  * XGOX
  * Signals Network
  * Photon
  * Parkgene
  * Deutsche eMark -> German eMark?
  * Machinecoin ***
  * InsaneCoin

*** ParseTrees.hs ***

--serialize2 :: Cursor -> String
--serialize2 root = '(' : (show $ cursorKind root)
--                    ++ (L.foldl (\acc y -> if (L.foldr (\yy xx -> if (show $ cursorKind y)
--                                                                == yy then False
--                                                                        else xx) True filterOut)
--                                            then acc ++ serialize2 y
--                                              else acc) "" (cursorChildren root))
--                    ++ ")"

--lcstr xs ys = maximumBy (compare `on` length) . concat $ [f xs' ys | xs' <- tails xs] ++ [f xs ys' | ys' <- drop 1 $ tails ys]
--  where f xs ys = scanl g [] $ zip xs ys
--        g z (x, y) = if x == y then z ++ [x] else []

-- Rosetta Code: Longest Common Substring
-- So far seems to be too inefficient
--subStrings :: [a] -> [[a]]
--subStrings s =
--  let intChars = length s
--  in [ take n $ drop i s
--     | i <- [0..intChars - 1]
--     , n <- [1..intChars - i] ]
--
--longestCommon :: Eq a => [a] -> [a] -> [a]
--longestCommon a b =
--  maximumBy (comparing length) (subStrings a `intersect` subStrings b)

--longestCommon :: Eq a => [a] -> [a] -> [a]
--longestCommon a b =
--  L.maximumBy (comparing length) $
--  (uncurry L.intersect . pair) $ [tail . L.inits <=< L.tails] <*> [a, b]
--
--pair :: [a] -> (a, a)
--pair [x, y] = (x, y)

--stringArr :: Int -> Int -> Array Int Int
--stringArr x y =
--  array (0, x * y) [(x, z) | x <- [0..(x * y)], ]
--
--longestCommon :: String -> String -> Int
--longestCommon a b
--  where lcSuff =

--parseCPP :: FilePath -> IO String
--parseCPP file = do
--    idx <- createIndex
--    tu  <- parseTranslationUnit idx file ["-I/usr/local/include"]
--    let root = translationUnitCursor tu
--        --children = cursorChildren root
--        --children = L.foldr (\y x -> if (L.foldr (\yy xx -> if (show $ cursorKind y)
--        --                                             == yy then False else xx) True filterOut)
--        --                           then y : x else x) [] (cursorDescendants root)
--        children = serialize2 root
        --functionDecls = filter (\c -> cursorKind c == FunctionDecl) children
    --forM_ children (print . cursorSpelling)
    --print root
    --print $ map cursorKind children
    --mapM_ print $ map cursorKind children
    --return children

--keep :: [String]
--keep =      [ "FunctionDecl"
--            , "ParmDecl"
--            , "StructDecl"
--            , "UnexposedDecl"
--            , "UnexposedExpr"
--            , "IntegerLiteral"
--            ]

--buildTrees :: FilePath -> FilePath -> IO ()
--buildTrees file1 file2 = do
--  x <- evaluate $ treeToString file1
--  y <- evaluate $ treeToString file2
--  m <- x
--  n <- y
--  writeTreeToFile file1 m "1"
--  writeTreeToFile file2 n "2"
  --let out  = longestCommonSubstring $ x : [y]
  --let test = (takeFileName file1, takeFileName file2, length x, length y, length out)
  --return test

--compareAllParseTrees :: [FilePath] -> [FilePath] -> IO [(String, String, Int, Int, Int)]
--compareAllParseTrees xs ys = sequence $ helper xs ys []
--  where
--    helper :: [FilePath] -> [FilePath] -> [IO (String, String, Int, Int, Int)]
--                                       -> [IO (String, String, Int, Int, Int)]
--    helper []     _  acc = acc
--    helper (f:fs) ms acc = let
--      --fun <- L.foldr (\y a -> a ++ [(compareTrees f y)]) [] ms
--      fun = map (\y -> [(compareTrees f y)]) ms in
--      --let test = fmap (\n -> let (h,i,j,k,l) = n in (h,i,j,k,l)) fun
--      --let xx = L.foldr (\r m -> let (a, b, c, d, e) = r in (a,b,c,d,e) : m) [] test
--      helper fs ms (acc ++ (concat fun))

--compareAllParseTrees (f:fs) ys acc = compareAllParseTrees fs ys (acc ++ fun)
--  where
--    fun = L.foldr (\y a -> a ++ [(compareTrees f y)]) [] ys
    --xx f2 = do
    --  m <- compareTrees f f2
    --  let yy = m
    --  yy

--buildAllParseTrees :: [FilePath] -> [FilePath] -> IO [()]
--buildAllParseTrees xs ys = sequence $ helper xs ys []
--  where
--    helper :: [FilePath] -> [FilePath] -> [IO ()]
--                                       -> [IO ()]
--    helper []     _  acc = acc
--    helper (f:fs) ms acc = let
--      fun = map (\y -> [(buildTrees f y)]) ms in
--      helper fs ms (acc ++ (concat fun))

--buildParseTreesRepos :: String -> String -> IO [()]
--buildParseTreesRepos repo1 repo2 = do
--  dirlist1  <- traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] repo1
--  dirlist2  <- traverseDir (\_ -> True) (\fs f -> pure (f : fs)) [] repo2
--  let dirs1  = map (\x -> x ++ " ") dirlist1
--  let dirs2  = map (\x -> x ++ " ") dirlist2
--  let inter1 = map init $ filterFileType ".cpp " dirs1
--  let inter2 = map init $ filterFileType ".cpp " dirs2
--  let out    = buildAllParseTrees inter1 inter2
--  out

--input <- fmap Txt.lines $ Txt.readFile "misc/repos.csv"
--input <- fmap Txt.lines $ Txt.readFile "/wheeler/scratch/khaskins/fraud-cryptocurrency-codebases/misc/repos.csv"
--input <- fmap Txt.lines $ Txt.readFile "/wheeler/scratch/khaskins/fraud-cryptocurrency-codebases/misc/names.csv"

--print (map snd $ transferToTuple tmp)
--print $ length $ (map snd $ filtered)
--print filtered

--print $ map (takeBaseName . last) tmp

--compareParseTreesRepos (head args) (head (tail args))

--generateAST :: String -> String -> IO ()
--generateAST repo file = do
--  DIR.createDirectoryIfMissing True ("/tmp/AST/" ++ repo ++ "/")
--  let command = prefix ++ suffix
--  (errc, out, err) <- readCreateProcessWithExitCode (shell command) []
--  print command
--  where
--    prefix = "clang -cc1 -ast-dump " ++ file
--    suffix = " > /tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast"

--parseCSource file = do
--  idx       <- createIndex
--  transUnit <- parseTranslationUnit idx file [""]
--  print transUnit -- translationUnitCursor transUnit

--readAST repo file = do
--  h <- Txt.readFile ("/tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast")
--  let y = Txt.unpack h
--  let x = parseDot y
--  prettyPrintDot x
--  DOT.putDot x
--  let x = DOT.readDotFile ("/tmp/AST/" ++ repo ++ "/" ++ (dropExtension $ takeFileName file) ++ ".ast") -- :: DotGraph a
--  let y = x
--  DOT.putDot y

*** Util.hs ***

*** Basic.hs ***

--      let x = map (MD.md5 . BLU.fromString . Txt.unpack) a
--      let o = map (toText . fromBytes . MD.md5DigestBytes) x
--      let z = zip3 o n inter1
--      --LB.writeFile ("inter/" ++ name1 ++ "pre.csv") $ encode z
--      csv <- parseFromFile CSV.csvFile (encode z)
--      -- Second iteration
--      -- Get number of lines per file.
--      let n2 = map (length . Txt.lines) out2
--      let m2 = out2
--
--      -- To test on unmodified vs preprocessed use either m2 or b. (end of first line)
--      let x2 = map (MD.md5 . BLU.fromString . Txt.unpack) b
--      let o2 = map (toText . fromBytes . MD.md5DigestBytes) x2
--      let z2 = zip3 o2 n2 inter1a
--      --LB.writeFile ("inter/" ++ name2 ++ "pre.csv") $ encode z2
--      csv2 <- parseFromFile CSV.csvFile (encode z2)
--
--      let p  = rights [csv]
--      let p2 = rights [csv2]
--      let l  = sort (concat p)
--      let l2 = sort (concat p2)
--
--      let k = compressFiles l
--      let k2 = compressFiles l2
--
--      let yy = compareCoinHashes k k2 (length k) (length k2) ([], 0)
--      let aa = snd yy
--      let bb = fst yy
--
--      let dat = convertToCSVLine (name1, name2, length k, length k2, length bb)
--      writeDataToFile "level2" (dat ++ "\n")

--  input <- Txt.lines <$> Txt.readFile "misc/testset.csv"
--  let clean = fmap (\x -> fmap Txt.unpack x) $
--              fmap (\x -> (Txt.splitOn $ (Txt.pack ",") ) x) input
--  let tmp   = splitEvery 3 $ fmap (filter (/= '\n')
--            . filter (/= '\r')) $ concat clean
--  let repos = map takeFileName $ concat $ map (tail . tail) tmp


       slug   slug.1                      date                    date.1  \
0  dogecoin  bitcoin  Sun Aug 30 03:46:39 2009  Sun Aug 30 03:46:39 2009   

   commit-id commit-id2  overlap-basic1  overlap-basic2  overlap-parse  id  \
0  4405b78d6  4405b78d6             0.0        0.004082       0.581615  74   

   ... quote.USD.percent_change_30d quote.USD.percent_change_60d  \
0  ...                   -35.806242                    12.165671   

   quote.USD.percent_change_90d quote.USD.market_cap  \
0                     441.77887         4.066116e+10   

     quote.USD.last_updated  platform.id  platform.name  platform.symbol  \
0  2021-06-16T19:00:03.000Z          NaN            NaN              NaN   

   platform.slug  platform.token_address  
0            NaN                     NaN  

[1 rows x 36 columns]