-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Sensitive Data Detection in files like (.csv , .xlsx , json) #761
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for endearing-brigadeiros-63f9d0 canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code is well-structured, handles different file types effectively.
few improvements are recommended:
- Test Coverage: Add tests for no sensitive data, empty files, and file-not-found scenarios.
- Optimization: Consider streaming large files for better memory management.
@coopernetes @JamieSlome Could you please review this PR and share your thoughts? |
const sensitivePatterns = [ | ||
/\d{3}-\d{2}-\d{4}/, // Social Security Number (SSN) | ||
/\b\d{16}\b/, // Credit card numbers | ||
/\b\d{5}-\d{4}\b/, // ZIP+4 codes | ||
// Add more patterns as needed | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent behind this change is good, though it must be noted these will produce a large number of false positives.
Ideally this wouldn't block (only warn), or would have an easy way to exclude false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point @rgmz ! I will think about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Not to mention, this does not cover all geographies.
I'm inclined to merge it as it is not configured by default. A more holistic approach with better heuristics is worth investing in for the GitProxy project granted but this is a good enough start.
@@ -2,6 +2,7 @@ const Step = require('../../actions').Step; | |||
const simpleGit = require('simple-git') | |||
|
|||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mistaken change?
|
||
|
||
exec.displayName = 'logFileChanges.exec'; | ||
exports.exec = exec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exports.exec = exec; | |
exports.exec = exec; | |
@@ -11,3 +11,4 @@ exports.checkCommitMessages = require('./checkCommitMessages').exec; | |||
exports.checkAuthorEmails = require('./checkAuthorEmails').exec; | |||
exports.checkUserPushPermission = require('./checkUserPushPermission').exec; | |||
exports.clearBareClone = require('./clearBareClone').exec; | |||
exports.checkSensitiveData = require('./checkSensitiveData').exec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing newline at the end of the file.
exports.checkSensitiveData = require('./checkSensitiveData').exec; | |
exports.checkSensitiveData = require('./checkSensitiveData').exec; | |
const exec = async (req, action) => { | ||
const diffStep = action.steps.find((s) => s.stepName === 'diff'); | ||
const step = new Step('checksensitiveData'); | ||
|
||
if (diffStep && diffStep.content) { | ||
console.log('Diff content:', diffStep.content); | ||
|
||
// Use the parsing function to get file paths | ||
const filePaths = extractFilePathsFromDiff(diffStep.content); | ||
|
||
if (filePaths.length > 0) { | ||
// Check for sensitive data in all files | ||
const sensitiveDataFound = await Promise.all(filePaths.map(parseFile)); | ||
const anySensitiveDataDetected = sensitiveDataFound.some(found => found); | ||
|
||
if (anySensitiveDataDetected) { | ||
step.blocked= true; | ||
step.error = true; | ||
step.errorMessage = 'Your push has been blocked due to sensitive data detection.'; | ||
console.log(step.errorMessage); | ||
} | ||
} else { | ||
console.log('No file paths provided in the diff step.'); | ||
} | ||
} else { | ||
console.log('No diff content available.'); | ||
} | ||
action.addStep(step); | ||
return action; // Returning action for testing purposes | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Psingle20 since #793 has been released in 1.6.0, can we restructure this functionality into its own plugin? It'll require moving some files around and creating an npm package using npm init
.
The other change will be this:
const Step = require('@finos/git-proxy/src/proxy/actions').Step;
const config = require('@finos/git-proxy/src/config');
Use plugins/git-proxy-sample-plugins and refer to the docs (to be improved via #811) for details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will do that. Btw we have added some refactor in #810 which also contains some more feature should I do a PR individually for all the plugins ?
ALong with this we have added gitleaks support , EXIF metadata check and AIML usage check.
fs.mkdirSync(testDataPath, { recursive: true }); // Using recursive to ensure all directories are created | ||
} | ||
// Write the Excel file to the test_data directory | ||
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add missing newline at the end of the file.
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx')); | |
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx')); | |
@@ -25,6 +26,7 @@ const mockPushProcessors = { | |||
pullRemote: sinon.stub(), | |||
writePack: sinon.stub(), | |||
getDiff: sinon.stub(), | |||
checkSensitiveData : sinon.stub(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By moving this new functionality in its own plugin, it will keep the proxy chain easier to test. Having to constantly add new functions here will not be maintainable long term. This test itself is also not exactly well structured or easy to maintain so we want to not add to it as much as possible. Plugins are preferred.
const sensitivePatterns = [ | ||
/\d{3}-\d{2}-\d{4}/, // Social Security Number (SSN) | ||
/\b\d{16}\b/, // Credit card numbers | ||
/\b\d{5}-\d{4}\b/, // ZIP+4 codes | ||
// Add more patterns as needed | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Not to mention, this does not cover all geographies.
I'm inclined to merge it as it is not configured by default. A more holistic approach with better heuristics is worth investing in for the GitProxy project granted but this is a good enough start.
Co-authored-by: Thomas Cooper <[email protected]>
Co-authored-by: Thomas Cooper <[email protected]>
Co-authored-by: Thomas Cooper <[email protected]>
This PR introduces the checkSensitiveData feature, which enhances the security by scanning files like .csv for vulnerabilities and sensitive information.
The implementation includes:
Functionality:
I think this Functionality Solves the issue #745
you can run the custom test implemented using command
npx mocha test/SensitiveData.test.js
Edit
Proxy.config.json
and add the file ext intoProxyFileTypes
array . Eg :".csv"
Also Please run the
test/CreateExcel.js
file to create a test data for testing.@JamieSlome Please review this PR and suggest any changes necessary
Citi Hackathon
Team Members
Prachit Ingle Psingle20
Shabbir Kaderi shabbirflow
Chaitanya Deshmukh ChaitanyaD48