Plagiarism Checker #3245

Closed
wants to merge 5 commits into from

WannaCry016

Problem

Identifying similarities between text files is essential for detecting potential plagiarism in academic and professional settings. Checking for overlapping content by hand is time-consuming and inefficient.

Solution

This PR adds a Python script that uses scikit-learn to automate plagiarism detection. The script reads the .txt files in the directory, converts their content into numerical vectors using TF-IDF, and computes cosine similarity scores between them to measure how similar the files are.
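
To make the two steps concrete, here is a minimal standard-library sketch of TF-IDF weighting followed by cosine similarity. It is an illustration only, not the code in this PR: the function names `tfidf_vectors` and `cosine` are hypothetical, tokenization is a plain whitespace split, and the smoothed IDF formula is one common variant chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (simplified TF-IDF)."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF, so a term found in every document keeps a nonzero weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Two identical documents score close to 1.0 and two documents with no terms in common score 0.0, which is the range the similarity scores in this PR fall into.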

Changes proposed in this Pull Request:

  • Added plag.py, which uses TfidfVectorizer to vectorize text documents and cosine_similarity to measure similarity between pairs of files.
  • Implemented a function to loop through each document, comparing it with others in the directory and calculating similarity scores.
  • Stored results in a set plagiarism_results, displaying file pairs with their respective similarity scores.
  • Ensured compatibility with .txt files and UTF-8 encoding for broader accessibility across text sources.
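
The changes listed above could be sketched roughly as follows. This is a reconstruction from the description, not the contents of plag.py: the function name `check_plagiarism`, its `directory` parameter, and the rounding of scores are assumptions made for illustration. Only `TfidfVectorizer`, `cosine_similarity`, and the `plagiarism_results` set come from the PR text.

```python
from itertools import combinations
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def check_plagiarism(directory="."):
    """Return a set of (file_a, file_b, score) tuples for every pair of .txt files."""
    paths = sorted(Path(directory).glob("*.txt"))
    if len(paths) < 2:
        return set()  # nothing to compare

    # Read every file as UTF-8, as the PR description specifies.
    texts = [p.read_text(encoding="utf-8") for p in paths]

    # One TF-IDF row per document, then the full pairwise similarity matrix.
    matrix = TfidfVectorizer().fit_transform(texts)
    scores = cosine_similarity(matrix)

    plagiarism_results = set()
    # combinations() visits each unordered pair once, so (a, b) and (b, a)
    # are not both reported.
    for i, j in combinations(range(len(paths)), 2):
        # Rounding to 4 places is an assumption, chosen for readable output.
        score = round(float(scores[i, j]), 4)
        plagiarism_results.add((paths[i].name, paths[j].name, score))
    return plagiarism_results
```

Calling `sorted(check_plagiarism("."))` and printing each tuple would list the file pairs with their scores. Fitting the vectorizer once on all documents keeps every vector in the same vocabulary, so the scores are comparable across pairs.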

@WannaCry016 WannaCry016 closed this Nov 1, 2024
@WannaCry016 WannaCry016 deleted the project3 branch November 1, 2024 17:42