evancy score was defined as the fraction of the surface forms coexisting in the input text that also appear in the word2vec vector of related surface forms (Zhao et al. 2015).
Formally, let X be the set of all candidate surface forms extracted from an input text. Then, for any surface form x ∈ X with word2vec vector of related surface forms v, the relevancy score is defined as in equation 1,
where I_A(x) is the indicator function such that I_A(x) = 1 if x ∈ A and 0 otherwise. Note that we configure the vectors so that x ∉ v; that is, a vector never includes its own surface form.
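The frequency-based score can be illustrated with a minimal Python sketch. It reads the score as the share of the candidate surface forms X that also occur in v. The function names, the sample resume surface forms, and the neighbour list for "C" are all hypothetical; a real system would obtain v from a trained word2vec model.

```python
from typing import Iterable, Set


def indicator(item: str, collection: Set[str]) -> int:
    """I_A(x): 1 if x is in A, 0 otherwise."""
    return 1 if item in collection else 0


def frequency_relevancy(x: str, candidates: Set[str], related: Iterable[str]) -> float:
    """Frequency-based relevancy of surface form x (hypothetical sketch).

    candidates -- X, every candidate surface form extracted from the input text
    related    -- v, the word2vec related surface forms of x
    """
    v = set(related)
    v.discard(x)  # mirror the setting x not in v: a vector never lists its own surface form
    denominator = sum(indicator(xj, candidates) for xj in candidates)  # equals |X|
    if denominator == 0:
        return 0.0
    numerator = sum(indicator(xj, v) for xj in candidates)  # coexisting forms found in v
    return numerator / denominator


# Hypothetical input: surface forms extracted from a resume and a made-up neighbour list for "C".
X = {"C", "C++", "Visual C++", "Java"}
v_c = ["C++", "Visual C++", "Objective-C", "Assembly"]
print(frequency_relevancy("C", X, v_c))  # 2 of the 4 candidates are in v -> 0.5
```

Every matched neighbour adds exactly 1 to the numerator, however close or distant it is to x. This is the property the next paragraph identifies as a drawback.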
In this formulation, all surface forms in the word2vec vector of related surface forms v are treated as equally important, which leads to a major drawback. For example, if a resume contains both C++ and Visual C++, the relevancy score of C (a closely related programming language) should be high, whereas if Hewlett-Packard Graphics Language (HPGL) and automata theory appear instead, the score should be lower. Under the frequency-based approach, however, both cases receive the same relevancy score, because each contributes the same number of matched related surface forms. To address this drawback, we propose a weighted semantic relevancy scoring approach. This score takes into account the weight of each matched surface form in the vector of related surface forms, where the weights are the word2vec cosine similarities between x and each of its related skills. Hence, if c denotes the vector of cosine similarities c_j, the new relevancy score is formally defined by equation 2, where v_j and c_j denote the jth components of v and c, respectively.
As a result, the occurrence of closely related surface forms substantially increases the relevancy score of a matched surface form, while the occurrence of loosely related surface forms barely affects it.
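The following hypothetical sketch makes the drawback and its fix concrete by scoring C against the two resumes from the example. The cosine similarities are invented for illustration only; a real system would read them from the trained word2vec model. The names `matched` and `weighted_relevancy` are illustrative and are not the authors' code.

```python
from typing import Dict, List, Set

# Hypothetical word2vec neighbours of "C" with made-up cosine similarities c_j;
# in practice these would come from the trained word2vec model.
RELATED_TO_C: Dict[str, float] = {
    "C++": 0.85,
    "Visual C++": 0.80,
    "HPGL": 0.25,
    "automata theory": 0.20,
}


def matched(x: str, candidates: Set[str], related: Dict[str, float]) -> List[str]:
    """Related surface forms v_j of x that also occur in the input text X."""
    return [v_j for v_j in related if v_j != x and v_j in candidates]


def weighted_relevancy(x: str, candidates: Set[str], related: Dict[str, float]) -> float:
    """Weighted relevancy: sum of c_j over the matched related surface forms v_j."""
    return sum(related[v_j] for v_j in matched(x, candidates, related))


resume_a = {"C", "C++", "Visual C++"}
resume_b = {"C", "HPGL", "automata theory"}

for label, X in (("A", resume_a), ("B", resume_b)):
    count = len(matched("C", X, RELATED_TO_C))
    score = weighted_relevancy("C", X, RELATED_TO_C)
    print(label, count, round(score, 2))
```

Both resumes match two related surface forms, so a pure count cannot separate them. With these made-up similarities, however, the weighted score is 1.65 for resume A and 0.45 for resume B.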
In practice, the relevancy scores were extremely low, with a 2 percent median and a 0.003 percent variance, which made relevancy ranking difficult. This is because the set of related surface forms is much larger than the set of surface forms coexisting in an input text. We addressed this problem by fitting a beta distribution, as illustrated in figure 3. The distribution of the raw scores (black curve) is heavily right-skewed (centered at 0.02), giving the misleading impression of low relevancy. After beta scaling, the final scores (red curve) spread more evenly over the [0, 1] interval, making relevancy rating easier.
$$\mathrm{RelScore}(x) = \frac{\sum_{j:\, x_j \in X} I_v(x_j)}{\sum_{j:\, x_j \in X} I_X(x_j)} \tag{1}$$

$$\mathrm{RelScore}(x) = \sum_{j:\, v_j \in X} c_j \tag{2}$$
The parameters of the beta distribution (α = 0.1627, β = 6.2385) were chosen empirically so that the final relevancy scores spread more evenly over the desired [0, 1] interval. Only surface forms with a final score of 70 percent or higher are returned.
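One plausible reading of the beta scaling is a probability integral transform. Under this reading, each raw score is passed through the cumulative distribution function (CDF) of the fitted Beta(α, β), and only scaled scores of at least 0.7 are kept. The sketch below assumes that reading and assumes that raw scores lie in [0, 1]. It evaluates the CDF (the regularized incomplete beta function) with the standard continued-fraction method, so it needs only the Python standard library; `scipy.stats.beta.cdf` computes the same quantity. As a sanity check, the fitted distribution has mean α/(α + β) = 0.1627/6.4012 ≈ 0.025, close to the 0.02 center of the raw scores. The function names are hypothetical.

```python
import math
from typing import Dict

ALPHA, BETA = 0.1627, 6.2385  # beta parameters reported in the text
THRESHOLD = 0.7               # keep surface forms scoring 70 percent or higher


def _beta_cont_frac(a: float, b: float, x: float, max_iter: int = 200, eps: float = 1e-12) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def scale_score(raw: float) -> float:
    """Map a raw relevancy score to [0, 1] through the fitted beta CDF (assumed reading)."""
    return beta_cdf(raw, ALPHA, BETA)


def select_surface_forms(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale every raw score and keep only surface forms at or above THRESHOLD."""
    scaled = {sf: scale_score(s) for sf, s in raw_scores.items()}
    return {sf: s for sf, s in scaled.items() if s >= THRESHOLD}
```

Because a CDF is monotonic, this transform preserves the ranking of the raw scores. It only stretches the crowded region near 0.02 across the whole unit interval.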
We evaluated our skill tagging framework through sampling-based user surveys. Although an automatic evaluation approach is available, we believe that users know best which skills they possess. We analyzed more than 1,300 responses from active users, collected over six months across all industries as categorized by SOC. To measure precision, we asked
Table 2. A Resume Sample.
Georgia Institute of Technology, Atlanta, GA
M.S. Computer Science (Specialization: Machine Learning). Aug 2015 - May 2018
Georgia Institute of Technology, Atlanta, GA
B.S. Chemical and Biomolecular Engineering. Aug 2009 - Dec 2011
Skills: Java, Android, Python, Git, software development,
Project: A System for Automated Testing of Android Apps
Design architecture of a crawler-based robot system to
automate mobile app testing.
Integrate various testing framework (e.g. Appium,
SL4A) APIs into system implementation.
Developer Intern, AT&T, Atlanta, GA. June 2017 – Present
Evaluate U-verse data and apply machine learning
techniques to predict potential failures.
Developer (Part-time), BlueFletch Mobile, Atlanta, GA.
Aug 2016 - April 2017.
Developed client Android apps (implemented app
features, utilized APIs to achieve various functionalities
efficiently, refined user interface and user interface
Android Application Developer, Academia Sinica, Taipei,
Taiwan. Jan 2016 - June 2016
Implemented/tested SmartHear 2.0 app functionalities,
and published to Google Play store.
Issuing Services Consultant (Software Development),
FIME, Taipei, Taiwan. Dec 2013 - Jan 2016
Coordinated with clients to define/refine requirements
for software development projects.
Fluent in Mandarin Chinese, volunteer at Atlanta
Shakespeare Tavern, member of Atlanta Android
Developers group, member of Taiwanese Student