Phuong Hoang is a data scientist in the
Data Science R&D group at CareerBuilder.
His research lies in the fields of natural language processing and machine learning
with applications to the human capital management domain. He attended North Carolina State University, where he earned his
B.S. in financial mathematics, and his M.S.
and Ph.D. in applied mathematics, specializing in machine learning applications for
medical diagnostics and sports analytics. He
currently resides in Atlanta, Georgia, USA.
Thomas Mahoney is the manager of candidate and data services at CareerBuilder,
where he oversees a group of teams focused
on building out and maintaining classification, data enrichment, and candidate management web services to power CareerBuilder’s products. He has worked in the
human capital management space for the
past five years and is passionate about building fast, reliable, and highly scalable
microservices that deliver high customer
value. He holds a B.S. degree in computer
science from the Georgia Institute of Technology and currently resides in Roswell,
Georgia, USA.
Faizan Javed is a manager of data science at
CareerBuilder, where he leads the Data Science group responsible for data enrichment
technologies such as knowledge bases, entity taxonomies and relationships, data standardization, and deduplication and normalization algorithms for the online
recruitment domain. He has almost 10 years
of industry experience in diverse domains
with multiple technology stacks. Faizan has
over 30 publications in areas ranging from
data science and machine learning to software systems and model-driven engineering. His current area of focus is the application of data science to end-to-end human
capital management processes. He holds an
M.S. degree in computer science and bioinformatics, a Ph.D. degree in computer and
information sciences, and a certificate in
technology entrepreneurship, all from the
University of Alabama in Birmingham, Alabama, USA.
Matt McNair is vice president of global
services strategy at CareerBuilder, where he focuses on the
edges of the company’s products, ensuring
that they drive recruiter efficiency by integrating well with each other and with the
tools that recruiters use daily. His 12 years of
experience in the recruitment space have
made him passionate about applying data
science, running high-scale microservices,
and bringing it all together into a recruiter-friendly sourcing product. He currently
resides in Atlanta, Georgia, USA.
socioeconomic problem. To this end,
we describe the SKILL system for skill
normalization that has been in production at CareerBuilder for more than a
year. More specifically, we follow up on
our previous work, which described what was
then an emerging prototype, by
describing how the system evolved
over time as it gained greater traction
and usage across the company. We also
focus on the collaboration between the
data science and data engineering
teams and describe how both organizational teams are needed to bring large-scale, high-impact ideas to fruition.
We have received extensive and
valuable feedback both from our internal stakeholders and from external
customers on areas for improvement,
and we are currently researching several future directions for the system. We
plan to improve it by supporting case-sensitive tagging to minimize false positives and by building a more comprehensive skill hierarchy. We will also
support the categorization of skills as emerged, established, or saturated. As we
expand the service in various international markets, we will also support
multilingual skill tagging and taxonomies. To support our compensation analytics and career path efforts,
we plan to extend the service with
skills proficiency, expertise inference,
and skill effort capabilities.
Notes
1. www.bbc.com/news/education-25714313
2. money.cnn.com/2016/02/09/news/economy/america-5-6-million-record-job-openings/
3. www.wsj.com/articles/indias-skills-shortfall-challenges-modis-manufacturing-vision-1470653407
4. ec.europa.eu/esco
5. www.careerbuilder.com/
6. www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
7. en.wikipedia.org/w/api.php
8. www.bls.gov/soc/2010/2010_major_groups.htm
9. developers.google.com/web-search/docs
References
Bastian, M.; Hayes, M.; Vaughan, W.; Shah,
S.; Skomoroch, P.; Kim, H.; Uryasev, S.; and
Lloyd, C. 2014. LinkedIn Skills: Large-Scale
Topic Extraction and Inference. In Proceedings
of the 8th ACM Conference on Recommender
Systems, 1–8. New York: Association
for Computing Machinery. doi.org/10.1145/2645710.2645729
Kivimäki, I.; Panchenko, A.; Dessy, A.;
Verdegem, D.; Francq, P.; Fairon, C.; Bersini, H.; and Saerens, M. 2013. A Graph-Based
Approach to Skill Extraction from Text.
Paper presented at the Graph-Based Methods for Natural Language Processing workshop, 18 October, Seattle, WA.
Luo, Q.; Zhao, M.; Javed, F.; and Jacob, F.
2015. Macau: Large-scale Skill Sense Disambiguation in the Online Recruitment
Domain. In Proceedings of the 2015 IEEE
International Conference on Big Data, 1324–1329.
Piscataway, NJ: Institute of Electrical
and Electronics Engineers. doi.org/10.1109/BigData.2015.7363890
Mikolov, T.; Chen, K.; Corrado, G.; and
Dean, J. 2013. Efficient Estimation of Word
Representations in Vector Space. Unpublished Manuscript. arXiv preprint
arXiv:1301.3781. Ithaca, NY: Cornell University Library.
Singh, S.; Subramanya, A.; Pereira, F.; and
McCallum, A. 2011. Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models. In Proceedings
of the 49th Annual Meeting of the Association
for Computational Linguistics: Human Language Technologies, Volume 1, 793–803.
Stroudsburg, PA: Association for Computational Linguistics.
Singh, S.; Wick, M.; and McCallum, A.
2012. Monte Carlo MCMC: Efficient Inference by Approximate Sampling. In
Proceedings of the 2012 Joint Conference on Empirical
Methods in Natural Language Processing and
Computational Natural Language Learning,
1104–1113. Stroudsburg, PA: Association for
Computational Linguistics.
Varshney, K. R.; Wang, J.; Mojsilovic, A.;
Fang, D.; and Bauer, J. H. 2013. Predicting
and Recommending Skills in the Social
Enterprise. In Social Computing for Workforce
2.0: Papers from the 2013 ICWSM Workshop,
AAAI Technical Report WS-13-02, 20–23.
Palo Alto, CA: AAAI Press.
Wang, Z.; Li, S.; Shi, H.; and Zhou, G. 2014.
Skill Inference with Personal and Skill Connections. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), 520–529. Stroudsburg, PA:
Association for Computational Linguistics.
Zhao, M.; Javed, F.; Jacob, F.; and McNair, M.
2015. SKILL: A System for Skill Identification
and Normalization. In Proceedings of the
29th AAAI Conference on Artificial Intelligence, 4012–4018. Palo Alto, CA: AAAI Press.