sort_CV

This project needs two libraries:

  1. PyPDF2
  2. nltk

If these libraries are not installed, you can install them with the conda install command (Anaconda), with pip install from a terminal/cmd, or from a cell in a Jupyter notebook (the following cell has an example).

In [1]:
#!pip install PyPDF2


#importing the necessary libraries
import PyPDF2 as p2
from nltk import flatten
In [2]:
content = []
token = []
have_skills = []

#these tools are the company's requirements for a data scientist post
tools = ["python", "hadoop", "r", "sql", "apache", "spark", "java", "git"]

#read the pdf file. the files can be renamed in sequential order and read in a for loop to
#score multiple pdfs


pdf = open("Abrar's_CV.pdf", "rb")
#PyPDF2 3.x removed PdfFileReader, getNumPages(), getPage() and extractText();
#PdfReader, .pages and extract_text() replace them (an older release may still need the old names)
pdfr = p2.PdfReader(pdf)


#append the text of every page of the pdf to a list
#pdfr.pages yields one page object per page, so the loop ends after the last page
for page in pdfr.pages:
    content.append(page.extract_text())
    

#split the text of each page into words
for page_text in content:
    token.append(page_text.split())
    

#each page was split separately, so token is a list of lists (one list of words per page)
#flatten it to get all the words in a single list

token2 = flatten(token)
for word in token2:
    for tool in tools:
        #convert both the CV word and the company keyword to lowercase before comparing
        #note: `in` is a substring test here, so a short keyword such as "r" also
        #matches any longer word that contains it
        if tool.lower() in word.lower():
            have_skills.append(tool.lower())
            
#print all the skills that match the company requirements
print(set(have_skills))

#score the cv by the share of required skills (strictly, keywords) mentioned in it
#the score can be customised with a weight for each skill or with a different equation

score = len(set(have_skills))/len(tools)*100

#and finally printing the score
print(str(round(score, 2)) + "%")
{'r', 'python', 'git', 'java'}
50.0%
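
The comparison in the cell above is a substring test, so a keyword such as "r" is counted whenever it appears inside a longer word. The sketch below shows one possible whole-word variant together with the per-skill weighting mentioned in the comments. The names `match_skills` and `weighted_score`, the sample sentence, and the weights are illustrative assumptions and not part of the original notebook.

```python
import string

def match_skills(words, tools):
    """Return the set of tools that appear as whole words in `words`."""
    wanted = {t.lower() for t in tools}
    found = set()
    for word in words:
        # Strip surrounding punctuation so "Python," still counts as "python".
        cleaned = word.lower().strip(string.punctuation)
        if cleaned in wanted:
            found.add(cleaned)
    return found

def weighted_score(found, weights):
    """Percentage of the total weight covered by the matched skills."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[s] for s in found if s in weights) / total * 100

tools = ["python", "hadoop", "r", "sql", "apache", "spark", "java", "git"]

# Made-up sample text standing in for the words extracted from a CV.
words = "Experienced in Python, SQL and R; uses Git daily.".split()
found = match_skills(words, tools)

# Hypothetical weights: every skill counts once, python and sql count double.
weights = {t: 1 for t in tools}
weights["python"] = 2
weights["sql"] = 2

print(sorted(found))
print(str(round(weighted_score(found, weights), 2)) + "%")
```

With whole-word matching, "Experienced" no longer counts as a match for "r", while "R;" still does. Stripping punctuation is a simplification and would need care for keywords that contain punctuation themselves.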

Further work

  • Store each CV's name and score in an Excel file
  • Sort the scores in descending order and approach the candidates with the highest scores first
  • At minimum, filter out the candidates with very low scores
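
A minimal sketch of these three steps, assuming the scores have already been collected in a dictionary. It writes CSV (which Excel can open) rather than a native Excel file, and the file names, scores, threshold, and the helpers `rank_cvs` and `write_scores` are all hypothetical.

```python
import csv
import io

def rank_cvs(scores, min_score=0.0):
    """Drop CVs scoring below min_score and sort the rest from highest to lowest."""
    kept = [(name, score) for name, score in scores.items() if score >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)

def write_scores(rows, handle):
    """Write (cv_name, score) rows as CSV to any open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["cv_name", "score"])
    writer.writerows(rows)

# Made-up scores for illustration only.
scores = {"cv_1.pdf": 50.0, "cv_2.pdf": 12.5, "cv_3.pdf": 87.5}

ranked = rank_cvs(scores, min_score=25.0)

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_scores(ranked, buffer)
print(buffer.getvalue())
```

To write a real file instead of the buffer, the handle could come from `open("scores.csv", "w", newline="")`.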

Shortcomings

  • There is an encoding problem with some CVs, which sometimes produces a score of zero
  • CVs with a zero score should therefore be handled carefully
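
One possible way to handle zero scores is to separate CVs where no usable text was extracted from CVs that were read correctly but matched no keyword. The sketch below assumes that the zero scores come from extraction returning empty or non-alphabetic text, which the notebook does not confirm. The function `diagnose_cv` and its labels are hypothetical.

```python
def diagnose_cv(page_texts, matched_skills):
    """Label a CV as 'no_text', 'no_match' or 'scored'."""
    # Treat a missing page text (None) as an empty string.
    text = " ".join(t or "" for t in page_texts)
    # No letters at all suggests the extraction failed, not that skills are missing.
    if not any(ch.isalpha() for ch in text):
        return "no_text"
    if not matched_skills:
        return "no_match"
    return "scored"

print(diagnose_cv(["", None], set()))
print(diagnose_cv(["Ten years of sales experience"], set()))
print(diagnose_cv(["Python and SQL developer"], {"python", "sql"}))
```

CVs labelled `no_text` could then be set aside for manual review instead of being rejected with the low scorers.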

GitHub link to this project