You can try to optimize your resume for biggies like Taleo and Greenhouse, but the odds are better than even that the job you end up applying to won't use either. Complicating matters, the third largest slice ("Homegrown") is not a clever ATS name; it represents the large share of companies that simply built their own in-house systems, each with unknowable quirks. Larger ATS like Greenhouse also offer directories of third-party plugins that may end up doing the actual screening, further fragmenting the market.

You would be foolish to spend much time or energy optimizing for thousands of different ATS. Fortunately, you don't have to: every ATS filtering your resume is built on very similar principles of data science. This basic crash course in those techniques will give you all the knowledge you need to beat the ATS.

Natural Language Processing 101

The history of natural language processing (NLP) can roughly be divided into “deductive” and “inductive” phases.

For the first half of data science history, the field was dominated by a “deductive” approach, in which programmers would write complicated programs to mimic the rules of grammar, and then use these rules to try to deconstruct real world language.

The “inductive” approach reversed this, starting by collecting very large sets of real world language, and then feeding it en masse to machine learning libraries and letting the computer figure out the rules.

Instead of trying to get your resume past thousands of individual ATS, it's far simpler to think about how to get it through programs designed in either of these two eras.

The Deductive Phase

The deductive approach to NLP is similar to sentence diagramming you may have had to do in your grammar classes.

But how he got in my pajamas, I’ll never know…

These deductive approaches were born of necessity: data was scarce in the early days of computing. For decades, computers were rarely connected to the internet, and transfer speeds were sluggish even when they were. For reference, data used to be moved around on floppy disks, which held up to 1.44 MB (about half the size of a modern resume). Nowadays, if you can't transfer that much data every second, you complain loudly and angrily that your internet is broken.

Since good datasets were tough to come by, inductive methods were impractical and the deductive approach dominated. For several decades, natural language processing consisted of trying to encode every rule of linguistics into computer code and parsing sentences accordingly.

As it turns out, language is incredibly complicated. You can create grammatically correct sentences that are nonetheless highly ambiguous. A classic example used to trip up NLP parsers: "Time flies like an arrow; fruit flies like a banana."
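You can see the ambiguity mechanically by counting parses. Here is a minimal sketch of the deductive approach: a hypothetical toy grammar (the rules and lexicon below are invented for illustration) fed to a CYK chart parser, the classic rule-driven algorithm of this era.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (rules invented for
# illustration): each entry is parent -> left right.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "N", "N"),      # noun compound: "time flies"
    ("NP", "Det", "N"),    # "an arrow"
    ("VP", "V", "NP"),     # "like an arrow"
    ("VP", "V", "PP"),     # "flies like an arrow"
    ("PP", "P", "NP"),
]
LEXICON = {
    "time": {"N", "NP"},
    "flies": {"N", "NP", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N", "NP"},
}

def count_parses(words):
    # chart[i][j][sym] = number of distinct ways sym can derive words[i..j]
    n = len(words)
    chart = [[defaultdict(int) for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        for sym in LEXICON.get(word, ()):
            chart[i][i][sym] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # every way to split the span in two
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k + 1][j][right]
    return chart[0][n - 1]["S"]

print(count_parses("time flies like an arrow".split()))  # prints 2: the sentence is ambiguous
```

Both readings are grammatical under the toy rules: time that flies the way an arrow does, or "time flies" (insects) that are fond of arrows. Nothing in the grammar alone can choose between them.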

Parsing language is tough enough on its own, and human idiosyncrasies make it tougher still. For example, most human readers won't particularly care if you omit periods at the end of your bullet points. An NLP parser, however, may run those bullets together and misread them as one incomprehensible run-on sentence.

ONE MISPLACED PERIOD CAN HELP SOMEBODY ELSE. GET A JOB.

Programmers burned through millions of dollars in early AI hype cycles trying to build increasingly complicated models to diagram complicated sentences. Progress was slow, and onlookers scoffed that natural language processing was dead on arrival.

Haters gonna hate, but those decades of research were not for naught. Much of that work became the basis of open source libraries like Stanford NER and Python's NLTK, which remain popular and powerful tools today. These libraries continue to improve, thanks in large part to general advances in AI spurred by the rise of the inductive phase.

The Inductive Phase

Once people became connected to high-speed internet, the amount of data available exploded and artificial intelligence swiftly became more sophisticated. Larger datasets allowed the inductive approach to more successfully process language. Instead of trying to program a complicated series of rules and hope the data would fit, it became possible to feed a massive set of data to computers and see if the computer could isolate the patterns and construct rules.

The inductive approach provided a shot in the arm to the stagnant field of natural language processing. After decades with little progress, your messaging programs got pretty good at completing your sentences almost overnight.

Of course, much work remains.

Data science today requires more machine processing power than it does human brainpower — nowadays you can simply throw tens of thousands of pictures of cats at any number of deep learning libraries, and out pops a pretty good cat classifier.

The catch with this approach? The rules generated inductively get wired into a black box that is almost impossible to reverse engineer. Heisenberg's uncertainty principle seemingly has a cousin in information processing: you can get the right answer, or you can see how the answer was reached, but not both. If that holds, we predict machine learning will one day solve the traveling salesman problem, but the efficiency of the solution will be impossible to verify, and academics will continue to debate whether P vs. NP was actually solved.

More practically, the black box of machine learning creates tangible issues for an ATS built on the inductive approach. If you want to understand why an ATS built on last generation's deductive approach missed a great MIT candidate, you can read through the code and change your regular expression to catch "M.I.T." with periods. With the inductive approach, there's no way to know why the machine decided as it did. If you spot the error, the best you can do is add dozens of examples to your dataset, retrain your engine from scratch, and hope you didn't introduce any unintended errors.
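As a sketch of what that deductive-era fix looks like, here is a hypothetical school-matching rule in Python (the pattern and function names are invented for illustration). Widening it to catch "M.I.T." is a one-line change a programmer can read and reason about:

```python
import re

# Hypothetical deductive-era screening rule: one hand-maintained pattern per
# school. Allows "MIT", "M.I.T.", "M I T", and the full name.
MIT_PATTERN = re.compile(
    r"\b(?:M\.?\s?I\.?\s?T\.?|Massachusetts Institute of Technology)\b",
    re.IGNORECASE,  # caveat: would also match a standalone lowercase "mit"
)

def mentions_mit(resume_text):
    return bool(MIT_PATTERN.search(resume_text))

print(mentions_mit("B.S. in EECS, MIT, 2019"))   # True
print(mentions_mit("M.I.T., Class of 2019"))     # True
print(mentions_mit("B.S., Caltech, 2019"))       # False
```

The transparency cuts both ways: the rule is easy to audit and patch, but somebody has to notice each miss and patch it by hand.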

For more information about the machine learning powering this inductive phase of data science, grab your favorite big dataset, check out TensorFlow, Keras, or PyTorch, and get ready to spend on GPU processing credits.

Optimizing Toward These Two Approaches

As we’ve seen, the landscape of ATS software is very diverse. Some companies have been using the same ATS software for decades. Some companies use more modern systems based on technology like the RezScore API. If you are applying to a few dozen jobs, you will likely confront a blend of both strategies.

The long-term trend favors the inductive approach, but some obstacles will slow its arrival. Inductive techniques are only as good as the size of their training dataset, so startups and midsize companies will have too few resumes to train good machine learning models. With so many ATS out there, many will never reach the scale required for truly state-of-the-art machine learning. As a result, you can assume the average ATS today is likely running deductive techniques.

Optimizing Toward a Deductive ATS

If the ATS is taking the old-fashioned deductive approach, you can expect that programmers are writing the rules to screen out your resume. You may not know exactly how the ATS is doing this, but you can count on a few things:

Resume parsing is quite hard, and many parsers cannot do it with 100% accuracy.

Because resume parsing is hard, the programmers will always be looking for shortcuts and easy workarounds.

Because the workarounds are imperfect, programmers are looking to codify the most important filters for a company.

Therefore, you can maximize your chances by preparing your resume to pass not just the most sophisticated ATS checks, but also the least sophisticated.

In other words, a good ATS will be relatively well programmed, whereas a poor one will show some common errors. If you consider both extremes, you will maximize your chance against any ATS.

The other rule of thumb that emerges is to keep things simple! In fact, this is pretty good advice just generally. All parsers can parse a simple plain resume with minimal formatting. What use is the most beautifully designed resume if the parser knocks it out before anybody sees it?

Think about what kind of resume the hirer wants the applicant tracking system to let through, make your resume fit that, and you'll get through 99% of ATS. Here are some particular things hirers care about and how they may code for them:

Geography

Objective: Hire locally when possible, so as not to pay relocation bonuses.

Common ATS Approaches:

Parse out all addresses on the resume and contextualize them.

Addresses closer to the top or most recent job position receive greatest precedence.

Geolocate the candidate relative to the job in question and score the candidate by proximity (the closer, the better).

Strict matching: The job says New York, NY, but Jersey City is in NJ, not NY.

Ignore the first listed address in favor of the address associated with your last job.
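A geolocation scorer of the kind described above might look like the following minimal Python sketch. The scoring formula and the 50-mile cutoff are invented for illustration; a real ATS would tune these.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3956 * asin(sqrt(a))  # 3956 = Earth's radius in miles

def proximity_score(candidate, job, max_miles=50.0):
    """1.0 for a same-spot candidate, decaying linearly to 0.0 at max_miles."""
    distance = haversine_miles(*candidate, *job)
    return max(0.0, 1.0 - distance / max_miles)

nyc = (40.71, -74.01)           # job location: New York, NY
jersey_city = (40.73, -74.08)   # a few miles away
philadelphia = (39.95, -75.17)  # roughly 80 miles away

print(proximity_score(jersey_city, nyc))   # close by: high score
print(proximity_score(philadelphia, nyc))  # past the cutoff: 0.0
```

Note what this sketch gets wrong on purpose: Jersey City scores high because it is geographically close, even though a strict state-matching rule would reject it for being in NJ rather than NY. Different ATS land on different sides of that trade-off.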

Tips:

List your city the same way the job posting does: give the general metropolitan area that best matches the position. (If the job is in Brooklyn and you live in Brooklyn, write your address as Brooklyn. If you apply to the Brooklyn job but live in Manhattan, write your address as New York City.)

If your last job position was in a different city than you currently live, or if you have relocated a lot, consider omitting location on your work experience.

Format geographic terms as City, XX (two-letter state abbreviation). This is how about 90% of people format it, so this is what parsers seek.

If you live on London Street, consider omitting the street address. You don't want a badly programmed ATS to reject you because it thinks you're in the UK and need a visa.
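The "City, XX" convention is easy to see from the parser's side. A naive deductive-era location rule might look like this hypothetical sketch:

```python
import re

# A naive location pattern of the kind a deductive parser might hard-code
# (hypothetical): a capitalized city name, a comma, a two-letter state code.
CITY_STATE = re.compile(r"^([A-Z][A-Za-z .'-]+),\s*([A-Z]{2})$")

def parse_location(text):
    match = CITY_STATE.match(text.strip())
    return (match.group(1), match.group(2)) if match else None

print(parse_location("Brooklyn, NY"))       # ('Brooklyn', 'NY')
print(parse_location("brooklyn new york"))  # None: off-format input fails to parse
```

Anything outside the expected shape simply fails to parse, which is exactly why matching the common format is the safe move.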

Skills

Objective: Ensure applicants have all required skills.

Common ATS Approaches:

Gauge the relative time value of skills; if you used Python at a job you started five years ago, credit the application with five years of Python.

Use clustering and/or document similarity to interpolate missing keywords based on similar resumes.

Calculate the similarity between the job description and the resume.

Reject resumes that do not have specific mandatory keywords.

Grant additional weight to keywords that appear close to the top of the resume.

Grant additional weight to keywords that appear frequently throughout the resume.
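Several of the approaches above — mandatory-keyword rejection, frequency weighting, and position weighting — fit in a few lines. This is a hypothetical sketch, not any real ATS's scoring function; the weights are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # crude lowercase tokenizer; keeps + and # so "c++" and "c#" survive
    return re.findall(r"[a-z0-9+#]+", text.lower())

def keyword_score(resume_text, required, nice_to_have):
    words = tokenize(resume_text)
    vocab = set(words)
    if not all(kw in vocab for kw in required):
        return 0.0  # a mandatory keyword is missing: auto-reject
    counts = Counter(words)
    score = 0.0
    for kw in nice_to_have:
        if kw in counts:
            first = words.index(kw) / max(len(words), 1)  # 0.0 = top of resume
            score += counts[kw] * (1.0 - 0.5 * first)     # earlier and more frequent = better
    return score

resume = "Skills: Python, SQL, Docker. Built Python ETL pipelines with SQL."
print(keyword_score(resume, required=["python"], nice_to_have=["sql", "docker"]))
print(keyword_score(resume, required=["java"], nice_to_have=["sql"]))  # 0.0, rejected
```

Notice the hard cliff: a resume missing one required keyword scores zero no matter how strong it is otherwise, which is why mirroring the job description's exact terms matters so much.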

Tips:

Make sure major keywords in the job description exist in your resume.

Make sure spelling and capitalization of search keywords is standard (and matches the job description).

Move your technical skills section to the top of the document for applications that will go through an ATS. (If you know the resume will be read only by a human and not an ATS, the skills section is better at the bottom.)

Keep your experience section formatted like most standard experience section templates, so the parser can easily find your details.

Education

Common ATS Approaches:

Incorporate school rankings to weight candidates.

Parse out schools from the .edu extension of the candidate's email address.

Parse out the last school you attended and present this in the dashboard summary of candidates.
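The .edu trick takes only a few lines. Here is a hypothetical sketch; the domain table below is invented for illustration, and a real ATS would use a much larger school directory.

```python
import re

# Hypothetical lookup table mapping .edu domains to school names.
EDU_DOMAINS = {
    "mit.edu": "Massachusetts Institute of Technology",
    "berkeley.edu": "University of California, Berkeley",
}

def school_from_email(email):
    """Infer a school from a .edu address, ignoring subdomains like cs.mit.edu."""
    match = re.search(r"@(?:[\w-]+\.)*([\w-]+\.edu)$", email.strip().lower())
    return EDU_DOMAINS.get(match.group(1)) if match else None

print(school_from_email("ada@cs.mit.edu"))  # Massachusetts Institute of Technology
print(school_from_email("ada@gmail.com"))   # None
```

The corollary for applicants: if you have a .edu address from a school you want the ATS to notice, consider applying with it.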

Tips: