Detect Plagiarism in files using Python

Introduction

Hello Curious Coders,
In this project we are going to discuss how to check for plagiarism between the
contents of two different text files. In Python this can be done with the
standard-library module difflib, but we are going to do the check manually. Let's get into it.
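
For comparison, the difflib route mentioned above might look like the following minimal sketch. It uses difflib.SequenceMatcher from the standard library to compare two word lists. The helper name difflib_similarity and the sample sentences are made up for illustration and are not part of this project.

```python
from difflib import SequenceMatcher


def difflib_similarity(text_a, text_b):
    """Return a 0-100 similarity score for two strings, compared word by word."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() returns a float in [0, 1]; 1.0 means the two sequences are identical
    return round(SequenceMatcher(None, words_a, words_b).ratio() * 100)


if __name__ == "__main__":
    print(difflib_similarity("the quick brown fox", "the quick brown fox"))
    print(difflib_similarity("the quick brown fox", "the quick red fox"))
```

Unlike the manual approach in this project, SequenceMatcher takes word order into account, so the two methods can give different scores for the same pair of files.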

Code

# Import the required library
from tkinter import *

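# First, read both files and split their contents into lowercase words.
# isalnum() keeps only tokens made up entirely of letters and digits, so a
# word with punctuation attached (such as "files,") is skipped entirely.
with open('File_1.txt') as f1:
    l1 = [w for w in f1.read().lower().split() if w.isalnum()]
with open('File_2.txt') as f2:
    l2 = [w for w in f2.read().lower().split() if w.isalnum()]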

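# Count the distinct words that appear in both files
plag_words = len(set(l1).intersection(l2))

# Count the total number of words in both files (repeats included)
total_words = len(l1) + len(l2)

# Formula to calculate the plagiarism percent
# (guard against division by zero when neither file has any usable words)
if total_words == 0:
    plag_percent = 0
else:
    plag_percent = 100 - round((total_words - plag_words * 2) / total_words * 100)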

# Displaying the result
result = "      The Plagiarized Content Percent among two files is " + str(plag_percent) + "%"

# Pick the background color from the score. The first branch was lost from
# the original listing; green for scores up to 30% is an assumption.
if plag_percent <= 30:
    color = "Green"
elif plag_percent <= 60:
    color = "Yellow"
else:
    color = "Red"

win = Tk()
win.geometry("800x200")
canvas = Canvas(win, width=700, height=650, bg=color)
canvas.create_text(300, 100, text=result, fill="black", font=('Helvetica 15 bold'))
canvas.pack()
win.mainloop()

Code Explanation

1. First, we open the two files using the open() function.
2. We read the contents of each file using the read() method, which returns them as a single string.
We convert that string to lowercase and split it into a list of words, keeping only the words made up entirely of letters and digits. A word with punctuation attached, such as "files,", is therefore left out.
3. We extract the common words from the two lists by converting them to sets.
4. Next, we calculate the number of common words and the total number of words present in the two files.
5. Finally, we apply a formula to calculate the percentage of plagiarized content in the two files and display it in a Tkinter window.
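
The steps above can be sketched as a single function that works on strings instead of files, which makes the formula easy to try out without Tkinter. This is only a rough sketch: the function name plagiarism_percent and the sample sentences are hypothetical, and returning 0 for two empty inputs is an assumption. As a worked example, "the cat sat" and "the dog sat" share 2 distinct words out of 6 words in total, so the score is 100 - round((6 - 2*2) / 6 * 100) = 100 - 33 = 67.

```python
def plagiarism_percent(text_1, text_2):
    """Apply the word-overlap formula from this project to two strings."""
    # Keep only lowercase tokens made up entirely of letters and digits
    words_1 = [w for w in text_1.lower().split() if w.isalnum()]
    words_2 = [w for w in text_2.lower().split() if w.isalnum()]

    common = len(set(words_1) & set(words_2))  # distinct words found in both
    total = len(words_1) + len(words_2)        # all words, repeats included
    if total == 0:
        return 0  # assumption: treat two empty inputs as 0% overlap
    return 100 - round((total - common * 2) / total * 100)


if __name__ == "__main__":
    print(plagiarism_percent("the cat sat", "the dog sat"))  # 67
    # "hello," is dropped by isalnum(), so only "world" is shared here
    print(plagiarism_percent("hello, world", "hello world"))  # 67
```

Because common words are counted once while the total includes repeats, a file that repeats the same words many times will score lower than its overlap might suggest.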

Output

[Image: Detect plagiarism in files using Python]
