all been renamed for clarity. The codebase was upgraded to Java 8 and uses features introduced in that version of the Java SDK, such as the Optional type and the Streams API, to make the code more expressive and maintainable.
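To illustrate the Java 8 style described above, the sketch below uses Optional to make a missing dictionary entry explicit, and a stream pipeline to normalize a batch of terms. The class name SkillLookup, its alias dictionary, and the normalization rules are hypothetical and are not taken from the SKILL codebase.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical example: SkillLookup and its tiny alias dictionary are
// illustrative only, not the SKILL service's real classes or data.
final class SkillLookup {
    private final Map<String, String> canonicalByAlias = new HashMap<>();

    SkillLookup() {
        canonicalByAlias.put("js", "JavaScript");
        canonicalByAlias.put("javascript", "JavaScript");
        canonicalByAlias.put("k8s", "Kubernetes");
    }

    // Optional makes "no match" explicit instead of returning null.
    // Optional.map yields an empty Optional when the lookup returns null.
    Optional<String> canonicalName(String rawTerm) {
        return Optional.ofNullable(rawTerm)
                .map(String::trim)
                .map(s -> s.toLowerCase(Locale.ROOT))
                .map(canonicalByAlias::get);
    }

    // Stream pipeline: normalize each term, drop unknown terms, remove duplicates.
    List<String> canonicalize(List<String> rawTerms) {
        return rawTerms.stream()
                .map(this::canonicalName)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .distinct()
                .collect(Collectors.toList());
    }
}
```

For example, canonicalize applied to " JS", "javascript", "COBOL", and "k8s" returns JavaScript and Kubernetes: the two JavaScript aliases collapse into one entry and the unknown term is dropped.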
The development team that deploys and operates the SKILL service was also able to reduce the project's line count by ripping out the old servlet configuration code and using a homegrown web service chassis instead. This chassis is used in many web services at CB and handles application startup, request deserialization, response serialization, and other boilerplate web service functionality, thereby removing complexity from the SKILL service and making it easier to maintain. The service has also been enhanced to respond with descriptive HTTP status codes for different errors, such as 400 Bad Request for improperly structured requests and 401 Unauthorized for requests that do not present the required authentication credentials.
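A minimal sketch of how a handler might choose between the 400 and 401 responses described above. The class RequestValidator, the use of an Authorization header, and the shallow JSON-shape check are assumptions for illustration; the article does not describe the service's actual validation rules or the chassis API.

```java
import java.util.Map;

// Hypothetical sketch: the header name and the structural check are
// illustrative assumptions, not the SKILL service's real validation logic.
final class RequestValidator {
    static final int OK = 200;
    static final int BAD_REQUEST = 400;
    static final int UNAUTHORIZED = 401;

    // Returns the HTTP status a handler would send for this request.
    // Assumes header names were already normalized to "Authorization",
    // since a plain Map lookup is case-sensitive but HTTP header names are not.
    static int statusFor(Map<String, String> headers, String body) {
        String credentials = headers.get("Authorization");
        if (credentials == null || credentials.trim().isEmpty()) {
            return UNAUTHORIZED; // required credentials not presented
        }
        if (body == null || !looksLikeJsonObject(body)) {
            return BAD_REQUEST; // improperly structured request
        }
        return OK;
    }

    // Deliberately shallow check; a real service would fully parse the body.
    private static boolean looksLikeJsonObject(String body) {
        String trimmed = body.trim();
        return trimmed.startsWith("{") && trimmed.endsWith("}");
    }
}
```

Checking credentials before the body means an unauthenticated caller learns nothing about the expected request format, which is one reasonable ordering rather than a requirement.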
The SKILL service has been available for production use within CB’s technology department for more
than two years at the time of writing. In this time, a
large number of development teams have found
applications for the service.
Figure 4 shows production traffic to the service over a three-day window, broken down by caller. In total, the service provides skill tagging for over a dozen applications within the CB ecosystem. Traffic patterns vary by customer: some callers send consistently high volume, while others send spiky bursts of traffic. Figure 5 shows production traffic for a single application, CB's demand data processing system, which runs in very large batches and generates massive amounts of traffic in short bursts. This caller's traffic was omitted from Figure 4 to keep that graph legible. Even during our highest-traffic periods, the SKILL service remains highly performant, with a 0.00 percent error rate and a 99th-percentile response time of 35ms. Over the past year, we reached these performance and scalability levels by moving the service to a Docker-based containerized infrastructure, which lets us bring up new instances in seconds during traffic spikes and also reduces operational overhead costs.
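The article does not say how its 99th-percentile response time is computed. The sketch below uses the nearest-rank definition, one common choice among several: sort the samples and take the value at 1-based rank ⌈p·n/100⌉. LatencyStats is a hypothetical helper name.

```java
import java.util.Arrays;

// Nearest-rank percentile; LatencyStats is a hypothetical helper, and other
// percentile definitions (e.g. interpolating ones) give slightly different values.
final class LatencyStats {
    // Smallest sample such that at least p percent of samples are <= it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0 || !(p > 0 && p <= 100)) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = latenciesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With 1,000 samples, for example, p = 99 selects the 990th-smallest latency, so the values of the 10 slowest requests do not affect the reported figure.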
Scaling up our server fleet to handle these traffic spikes smoothly proved quite difficult. Our first solution was to run more instances at all times, but this was wasteful and expensive. The service itself was already well optimized, so there were no easy performance gains to be made. Ultimately, we found that the best solution was to consult with our users and ask them to build gradual scaling into their batch processes. Currently, a locally deployable, offline version of the SKILL service is being developed that will let teams perform skills enrichment without sending requests to our service at all, at whatever speed their own hardware allows.
Usage and Maintenance
After maintaining the SKILL service in production for some time, we received customer feedback requesting a service that would return the skills related to a given skill. We were able to develop and deploy this functionality in a short amount of time and with
Figure 4. Skills traffic over a three-day window (Sat Jan 20 to Mon Jan 22), broken down by caller. The y-axis indicates the total calls per 30 minutes.