feat: Sensitive Data Detection in files like (.csv , .xlsx , json) #761

Open: wants to merge 11 commits into base: main

@Psingle20 Psingle20 commented Oct 26, 2024

This PR introduces the checkSensitiveData feature, which improves security by scanning files such as .csv for sensitive information.
The implementation includes:

Functionality:

  • Created a push action, checkSensitiveData, which takes the diff and scans the changed files for sensitive information.
  • Integrated it into the push action chain.
  • Added a test file for the push action and updated the chain test file to make sure it works with the new feature.

I think this functionality resolves issue #745.
You can run the custom test with the command npx mocha test/SensitiveData.test.js.
Edit proxy.config.json and add the file extension to the ProxyFileTypes array, e.g. ".csv".
Also, please run the test/CreateExcel.js file to create the test data.

@JamieSlome Please review this PR and suggest any necessary changes.

Citi Hackathon
Team Members
Prachit Ingle Psingle20
Shabbir Kaderi shabbirflow
Chaitanya Deshmukh ChaitanyaD48

linux-foundation-easycla bot commented Oct 26, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

netlify bot commented Oct 26, 2024

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 868c074
🔍 Latest deploy log https://app.netlify.com/sites/endearing-brigadeiros-63f9d0/deploys/67513cacebea8d0008a644e4

@Psingle20 Psingle20 changed the title Feat: Sensitive Data Detection in files like (.csv , .xlsx , json) feat: Sensitive Data Detection in files like (.csv , .xlsx , json) Oct 26, 2024
.husky/commit-msg (outdated, resolved)
@laukik-target (Contributor) left a comment

The code is well-structured and handles different file types effectively.
A few improvements are recommended:

  • Test Coverage: Add tests for no sensitive data, empty files, and file-not-found scenarios.
  • Optimization: Consider streaming large files for better memory management.
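
The streaming suggestion could look roughly like the sketch below. It is a sketch under stated assumptions, not the PR's implementation: the helper names lineHasSensitiveData and scanFileStreaming are hypothetical, and the patterns are copied from the PR. It reads line-oriented text such as .csv one line at a time and stops at the first match; .xlsx is a zipped binary format and would need a different reader.

```javascript
const fs = require('fs');
const readline = require('readline');

// Patterns copied from the PR under review.
const sensitivePatterns = [
  /\d{3}-\d{2}-\d{4}/, // Social Security Number (SSN)
  /\b\d{16}\b/, // Credit card numbers
  /\b\d{5}-\d{4}\b/, // ZIP+4 codes
];

// Hypothetical helper: true if any pattern matches the given line.
const lineHasSensitiveData = (line) =>
  sensitivePatterns.some((pattern) => pattern.test(line));

// Hypothetical helper: scans a text file line by line instead of loading
// it fully into memory, and returns as soon as one line matches.
const scanFileStreaming = async (filePath) => {
  const input = fs.createReadStream(filePath, { encoding: 'utf8' });
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  try {
    for await (const line of rl) {
      if (lineHasSensitiveData(line)) return true;
    }
    return false;
  } finally {
    // Release the file descriptor even when returning early.
    input.destroy();
  }
};
```

Because only one line is held in memory at a time, memory use stays roughly constant regardless of file size.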

test/CheckSensitive.test.js (outdated, resolved)
@Psingle20 (Author)

@coopernetes @JamieSlome Could you please review this PR and share your thoughts?

Comment on lines +9 to +14
const sensitivePatterns = [
  /\d{3}-\d{2}-\d{4}/, // Social Security Number (SSN)
  /\b\d{16}\b/, // Credit card numbers
  /\b\d{5}-\d{4}\b/, // ZIP+4 codes
  // Add more patterns as needed
];
@rgmz (Contributor), Oct 30, 2024

The intent behind this change is good, though it must be noted these will produce a large number of false positives.

Ideally this wouldn't block (only warn), or would have an easy way to exclude false positives.
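
One way the warn-only and exclusion ideas could be sketched is shown below. The function name evaluateText and the options mode and allowlist are hypothetical and not part of the PR or of GitProxy's configuration. The patterns are the PR's, with the g flag added so that every match can be collected.

```javascript
// Patterns from the PR, with names attached and the global flag added.
const sensitivePatterns = [
  { name: 'SSN', regex: /\d{3}-\d{2}-\d{4}/g },
  { name: 'Credit card', regex: /\b\d{16}\b/g },
  { name: 'ZIP+4', regex: /\b\d{5}-\d{4}\b/g },
];

// Hypothetical options:
//   mode      - 'warn' reports findings without blocking; 'block' blocks.
//   allowlist - exact matched strings to treat as known false positives.
const evaluateText = (text, { mode = 'warn', allowlist = [] } = {}) => {
  const ignored = new Set(allowlist);
  const findings = [];
  for (const { name, regex } of sensitivePatterns) {
    for (const match of text.matchAll(regex)) {
      if (!ignored.has(match[0])) {
        findings.push({ name, value: match[0] });
      }
    }
  }
  // Only block when explicitly configured to and something was found.
  return { findings, blocked: mode === 'block' && findings.length > 0 };
};

// Default mode only warns; the allowlist suppresses a known false positive.
const warned = evaluateText('id: 123-45-6789');
const excluded = evaluateText('id: 123-45-6789', {
  mode: 'block',
  allowlist: ['123-45-6789'],
});
```

Defaulting to 'warn' means a noisy pattern cannot stop a push unless someone opts in to blocking.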

@Psingle20 (Author)

Great point, @rgmz! I will think about this.

Contributor

I agree. Not to mention, this does not cover all geographies.

I'm inclined to merge it, as it is not configured by default. Granted, a more holistic approach with better heuristics is worth investing in for the GitProxy project, but this is a good enough start.

@@ -2,6 +2,7 @@ const Step = require('../../actions').Step;
const simpleGit = require('simple-git')



Contributor

Mistaken change?

Suggested change



exec.displayName = 'logFileChanges.exec';
exports.exec = exec;
Contributor

Suggested change
exports.exec = exec;
exports.exec = exec;

@@ -11,3 +11,4 @@ exports.checkCommitMessages = require('./checkCommitMessages').exec;
exports.checkAuthorEmails = require('./checkAuthorEmails').exec;
exports.checkUserPushPermission = require('./checkUserPushPermission').exec;
exports.clearBareClone = require('./clearBareClone').exec;
exports.checkSensitiveData = require('./checkSensitiveData').exec;
Contributor

Add missing newline at the end of the file.

Suggested change
exports.checkSensitiveData = require('./checkSensitiveData').exec;
exports.checkSensitiveData = require('./checkSensitiveData').exec;

Comment on lines +137 to +166
const exec = async (req, action) => {
  const diffStep = action.steps.find((s) => s.stepName === 'diff');
  const step = new Step('checksensitiveData');

  if (diffStep && diffStep.content) {
    console.log('Diff content:', diffStep.content);

    // Use the parsing function to get file paths
    const filePaths = extractFilePathsFromDiff(diffStep.content);

    if (filePaths.length > 0) {
      // Check for sensitive data in all files
      const sensitiveDataFound = await Promise.all(filePaths.map(parseFile));
      const anySensitiveDataDetected = sensitiveDataFound.some(found => found);

      if (anySensitiveDataDetected) {
        step.blocked = true;
        step.error = true;
        step.errorMessage = 'Your push has been blocked due to sensitive data detection.';
        console.log(step.errorMessage);
      }
    } else {
      console.log('No file paths provided in the diff step.');
    }
  } else {
    console.log('No diff content available.');
  }
  action.addStep(step);
  return action; // Returning action for testing purposes
};
Contributor

@Psingle20 since #793 has been released in 1.6.0, can we restructure this functionality into its own plugin? It'll require moving some files around and creating an npm package using npm init.

The other change will be this:

const Step = require('@finos/git-proxy/src/proxy/actions').Step;
const config = require('@finos/git-proxy/src/config');

Use plugins/git-proxy-sample-plugins and refer to the docs (to be improved via #811) for details.
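
As a rough illustration of what the restructured action could look like, the sketch below separates the check from the proxy chain by injecting its dependencies. The factory name createCheckSensitiveData, the injected extractFilePaths and scanFile helpers, and the FakeStep and makeAction stand-ins are all assumptions made for this example. In a real plugin, Step would come from the require shown above, and registration would follow the plugin docs.

```javascript
// Hypothetical factory: dependencies are injected so the action can be
// unit-tested on its own, without touching the proxy chain test.
const createCheckSensitiveData = ({ Step, extractFilePaths, scanFile }) => {
  const exec = async (req, action) => {
    const step = new Step('checkSensitiveData');
    const diffStep = action.steps.find((s) => s.stepName === 'diff');

    if (diffStep && diffStep.content) {
      const filePaths = extractFilePaths(diffStep.content);
      const results = await Promise.all(filePaths.map(scanFile));

      if (results.some(Boolean)) {
        step.blocked = true;
        step.error = true;
        step.errorMessage =
          'Your push has been blocked due to sensitive data detection.';
      }
    }

    action.addStep(step);
    return action;
  };
  exec.displayName = 'checkSensitiveData.exec';
  return exec;
};

// Stand-ins for illustration only (not the real GitProxy classes).
class FakeStep {
  constructor(stepName) {
    this.stepName = stepName;
    this.blocked = false;
    this.error = false;
    this.errorMessage = null;
  }
}

const makeAction = (diffContent) => ({
  steps: [{ stepName: 'diff', content: diffContent }],
  addStep(step) {
    this.steps.push(step);
  },
});

const checkSensitiveData = createCheckSensitiveData({
  Step: FakeStep,
  // Toy parser: takes paths from "+++ b/<path>" lines of a unified diff.
  extractFilePaths: (diff) =>
    diff
      .split('\n')
      .filter((line) => line.startsWith('+++ b/'))
      .map((line) => line.slice('+++ b/'.length)),
  // Toy scanner: flags any file whose path contains "secret".
  scanFile: async (filePath) => filePath.includes('secret'),
});
```

With this shape, a test only needs the fakes above and never has to add another stub to the shared chain test.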

@Psingle20 (Author)

I will do that. By the way, we have added some refactoring in #810, which also contains some more features. Should I open an individual PR for each plugin?
Along with this, we have added gitleaks support, an EXIF metadata check, and an AI/ML usage check.

fs.mkdirSync(testDataPath, { recursive: true }); // Using recursive to ensure all directories are created
}
// Write the Excel file to the test_data directory
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx'));
Contributor

Add missing newline at the end of the file.

Suggested change
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx'));
XLSX.writeFile(workbook, path.join(testDataPath, 'sensitive_data2.xlsx'));

@@ -25,6 +26,7 @@ const mockPushProcessors = {
pullRemote: sinon.stub(),
writePack: sinon.stub(),
getDiff: sinon.stub(),
checkSensitiveData : sinon.stub(),
Contributor

Moving this new functionality into its own plugin will keep the proxy chain easier to test. Constantly adding new functions here will not be maintainable in the long term. This test itself is also not especially well structured or easy to maintain, so we want to add to it as little as possible. Plugins are preferred.

proxy.config.json (outdated, resolved)
.husky/commit-msg (outdated, resolved)
.gitignore (outdated, resolved)

Psingle20 and others added 3 commits December 1, 2024 12:10
Co-authored-by: Thomas Cooper <[email protected]>
Co-authored-by: Thomas Cooper <[email protected]>
Co-authored-by: Thomas Cooper <[email protected]>