For this project we need two libraries
- PyPDF2
- nltk
If these libraries are not installed, you can install them with the conda install command from Anaconda, with pip install from a terminal/cmd, or from a cell in a Jupyter notebook (the following cell has an example).
#!pip install PyPDF2
#importing the necessary libraries
import PyPDF2 as p2
from nltk import flatten
content = []
token = []
have_skills = []
#these tools are the company's requirements for a data scientist post
tools = ["python", "hadoop", "r", "sql", "apache", "spark", "java", "git"]
#read the pdf file. the files can be renamed in sequential order and read in a for loop
#to score multiple pdfs
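#note: PdfFileReader, getNumPages, getPage and extractText are the older PyPDF2 names;
#PyPDF2 3.0 and later use PdfReader, len(pdfr.pages), pdfr.pages[i] and extract_text()
pdf = open("Abrar's_CV.pdf", "rb")
pdfr = p2.PdfFileReader(pdf)
#append the text of every page to a list; getNumPages() returns the number of pages,
#which is used as the termination condition of the loop
for i in range(pdfr.getNumPages()):
    page = pdfr.getPage(i)
    content.append(page.extractText())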
#split the text of each page into words
for i in range(len(content)):
    token.append(content[i].split())
#each page is split separately, so token is a list with one list of words per page;
#flatten it to get all the words in a single list
token2 = flatten(token)
for i in range(len(token2)):
    for j in range(len(tools)):
        #convert both the word and the tool keyword to lower case before comparing
        if tools[j].lower() in token2[i].lower():
            have_skills.append(tools[j])
#print all the skills that match the company requirements
print(set(have_skills))
#score the cv by how many of the required skills (really, keywords) it mentions
#this score can be customised with a weight for each skill or with a different equation
score = len(set(have_skills))/len(tools)*100
#and finally printing the score
print(str(round(score, 2)) + "%")
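The comparison above is a substring test, so a one-letter keyword such as "r" also matches any word that contains the letter r. A minimal sketch of whole-word matching combined with a weight per skill might look like the following. The function names `find_skills` and `weighted_score`, the regular expression, the sample text, and the weights are assumptions made for illustration, not part of the original notebook.

```python
import re

def find_skills(text, tools):
    """Return the set of tools that appear in text as whole words."""
    # lower-case the text and keep runs of letters, digits, '+' and '#'
    words = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    return {tool for tool in tools if tool.lower() in words}

def weighted_score(found, weights):
    """Percentage of the total weight covered by the skills in found."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[s] for s in found if s in weights) / total * 100

# hypothetical weights: python and sql count double
weights = {"python": 2, "hadoop": 1, "r": 1, "sql": 2,
           "apache": 1, "spark": 1, "java": 1, "git": 1}
sample = "Worked with Python, SQL and Git on reporting pipelines."
found = find_skills(sample, weights.keys())
print(sorted(found))
print(str(round(weighted_score(found, weights), 2)) + "%")
```

With equal weights this reduces to the same percentage as the unweighted score, so the weights can be introduced gradually.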
Further work
- The CV name and score can be stored in an Excel file
- Sort the CVs by score and approach the people who have the higher scores
- At the least, remove the people who have a very low score
- There is an encoding problem: some CVs sometimes get a zero score
- So CVs with a zero score should be handled carefully
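
The items in this list can be sketched as follows. This is only a sketch under assumptions: the `scores` dictionary and its file names, the `min_score` threshold of 20, and the output file name `cv_scores.csv` are hypothetical, and a CSV file (which Excel can open) stands in for a native Excel file.

```python
import csv

def rank_cvs(scores, min_score=20.0):
    """Split CV scores into a ranked shortlist and a list to re-check by hand."""
    # a score of exactly zero may mean the text could not be extracted,
    # so those CVs are set aside instead of being rejected outright
    needs_review = sorted(name for name, s in scores.items() if s == 0)
    # keep CVs at or above the threshold, highest score first
    shortlist = sorted(
        ((name, s) for name, s in scores.items() if s >= min_score),
        key=lambda item: item[1],
        reverse=True,
    )
    return shortlist, needs_review

def write_scores(rows, path):
    """Write (name, score) rows to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["cv", "score"])
        writer.writerows(rows)

# hypothetical scores for four CVs
scores = {"cv_1.pdf": 62.5, "cv_2.pdf": 0.0, "cv_3.pdf": 12.5, "cv_4.pdf": 87.5}
shortlist, needs_review = rank_cvs(scores)
write_scores(shortlist, "cv_scores.csv")
print(shortlist)
print(needs_review)
```

Keeping the zero-score CVs in a separate list means a CV whose text failed to extract gets a manual look before anyone is rejected.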
