A recent Harvard Business Review article, “Why Job Postings Don’t Equal Jobs,” explains that job postings data can be unreliable for estimating job demand in several circumstances. Specifically:
- Professional-type jobs are more likely to be posted online
- Companies often advertise the same job multiple times, and
- For job boards that require payment to post openings, firms may post more openings when there is a discount offered, whether or not they currently need those workers.
A few additional concerns were not mentioned in that article:
- Some jobs are posted for legal reasons, such as when firms sponsor foreign workers for permanent residence (a green card). Firms have to “test” the labor market by advertising those jobs even though they have already hired a foreign worker and have no intention of hiring someone else. Most of these cases involve professional jobs as well.
- The methods used to collect and clean online postings and estimate trends over time can be problematic.
The methods used to obtain and clean job postings data vary and are typically closely guarded. For an objective review of several providers of these data, see the Vendor Product Review: A Consumer’s Guide to Real-time Labor Market Information. Vendors scrape and spider job boards automatically and manually, code results into anywhere from 5 to 70 data elements, and deduplicate 60 to 90 percent of job ads. Based on around 4 million job postings daily, that is 2.4 to 3.6 million ads that appear more than once. Assuming each of those ads appears exactly twice (duplicates only, not triplicates, etc.), one copy of each pair is kept and the other is dropped, so anywhere from 1.2 to 1.8 million job postings could be thrown out as duplicates every day.
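The duplicate estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name and the `copies_per_job` parameter are made up for illustration, and it assumes the 60 to 90 percent figure is the share of all postings that belong to a duplicated group.

```python
def discarded_duplicates(daily_postings, duplicate_pct, copies_per_job=2):
    """Estimate how many postings are dropped per day as duplicates.

    duplicate_pct: percent of all postings that belong to a duplicated group
                   (assumed interpretation of the 60 to 90 percent figure).
    copies_per_job: how many times each duplicated job appears (2 = pairs).
    One copy per group is kept, so (copies - 1) / copies of the group is dropped.
    """
    in_duplicate_groups = daily_postings * duplicate_pct / 100
    return in_duplicate_groups * (copies_per_job - 1) / copies_per_job


low = discarded_duplicates(4_000_000, 60)
high = discarded_duplicates(4_000_000, 90)
print(f"{low:,.0f} to {high:,.0f} postings discarded per day")
```

If some ads appear three or more times, a larger share of each group is dropped; passing `copies_per_job=3` shows how sensitive the estimate is to that assumption.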
Methods for analyzing the postings range from keyword searches to natural language processing and text analytics, but small details in methodology can have outsized effects on what gets counted. Take, for example, the difference between searching job postings on Indeed.com for registered nurses using different keywords such as “rn,” “registered nurse,” or “registered nurses.”
This simple search raises a few questions:
- Which keyword or collection of search terms best represents job postings for registered nurses?
- Do the keyword results change by region?
- In another field, how might different data providers distinguish between R, the statistical programming language, and H.R. (Human Resources) or R&D (Research & Development)?
The answers to these types of questions will likely vary by data provider and should be considered before relying on the data for analysis.
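To make the keyword problem concrete, here is a minimal sketch comparing a plain substring search with a pattern that respects word boundaries. The job titles, skill lines, and matching rules are all invented for illustration; they are not how Indeed.com or any data provider actually matches postings.

```python
import re

# Hypothetical matching rules for illustration; actual vendor rules are not public.
RN_PATTERN = re.compile(r"\b(?:rn|registered nurses?)\b", re.IGNORECASE)

# A standalone capital "R": not glued to letters or digits, not preceded by "."
# (as in "H.R.") or "&", and not followed by "&" (as in "R&D").
R_LANGUAGE = re.compile(r"(?<![A-Za-z0-9.&])R(?![A-Za-z0-9&])")


def substring_hits(term, texts):
    """Naive search: keep any text that contains the term anywhere."""
    return [text for text in texts if term in text.lower()]


def pattern_hits(pattern, texts):
    """Pattern search: keep texts where the compiled regex matches."""
    return [text for text in texts if pattern.search(text)]


# Made-up examples, not real postings.
titles = [
    "Registered Nurse, ICU",
    "RN - Night Shift",
    "Registered Nurses needed",
    "Intern, Pediatrics",
    "Learner Support",
]
skills = [
    "Statistical analysis in R, Python, or SAS",
    "H.R. generalist",
    "R&D engineer",
    "HR coordinator",
    "Proficiency in R.",
]

naive_rn = substring_hits("rn", titles)     # picks up "Intern" and "Learner"
strict_rn = pattern_hits(RN_PATTERN, titles)
r_language = pattern_hits(R_LANGUAGE, skills)

print(naive_rn)
print(strict_rn)
print(r_language)
```

Even in this toy case, the two approaches return the same number of titles but different ones, and the stricter rules still leave edge cases open (an abbreviation such as "R.N." would be read as the R language here). Real postings are far messier.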
Many providers make a concerted effort to improve collection, parsing, and deduplication methods; however, significant methodology changes can introduce confusion and inconsistency when job advertisement data are analyzed over time. Changes to the deduplication methodology used by The Conference Board, for example, resulted in revisions lowering estimates by about 460,000 jobs for every month in the series. The overall curves were fairly consistent, showing similar shape and trends, but anyone relying on the actual levels for measuring or forecasting employment demand could find old estimates too high by hundreds of thousands of jobs.
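A small sketch shows why a revision like this can leave trends intact while still invalidating levels. The monthly figures below are invented, not Conference Board data, and the revision is treated as a uniform shift of exactly 460,000 per month, which simplifies the "about 460,000" cited above.

```python
def month_over_month(series):
    """Change from each month to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up monthly ad counts for illustration only.
old_estimates = [5_100_000, 5_180_000, 5_150_000, 5_240_000]

REVISION = 460_000  # assumed uniform downward shift, per the figure in the text
revised_estimates = [value - REVISION for value in old_estimates]

old_changes = month_over_month(old_estimates)
revised_changes = month_over_month(revised_estimates)
level_gaps = [old - new for old, new in zip(old_estimates, revised_estimates)]

print(old_changes == revised_changes)  # trends are unaffected by a uniform shift
print(level_gaps)                      # every level is off by the same amount
```

Under this assumption, any analysis built on month-to-month changes survives the revision, while any analysis built on the levels themselves inherits the full error.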
In summary, the use of online job postings data to glean labor market information is promising, but there are a number of concerns that suggest these data are not sufficient replacements for traditional labor market data.
Research assistance for this post was provided by Patrick Clapp.