Using Machine Learning To Automate Data Coding At The Bureau Of Labor Statistics (BLS)


Government agencies are awash in documents. Many of these documents are paper-based, but even electronic documents often still need a human to process and understand them before they can be used for vital services. Federal agencies are increasingly looking to AI to improve these document-heavy, human-bound processes by applying advanced machine learning, neural network, and natural language processing (NLP) technologies. While these technologies may be fairly new to many organizations, some government agencies have been using them for years to augment and enhance various workflows and tasks.

In the case of the Bureau of Labor Statistics (BLS), the agency is mandated to conduct a Survey of Occupational Injuries and Illnesses to determine workplace injuries and help guide policy. To perform this survey, BLS has dozens of trained staff in offices throughout the country who classify injuries and illnesses using workplace-generated survey data. However, this classification work was done manually, causing inconsistencies in labeling, coding errors, and speed and cost bottlenecks.

To streamline this process, BLS turned to machine learning. About a decade ago, Alex Measure, an economist at the Bureau of Labor Statistics, decided to explore how machine learning (ML) could help the agency improve. In this article, he shares how he incorporated AI into BLS, some of the unique data usage challenges that could be obstacles for federal agencies looking to use AI, and what he’s most excited to see in the coming years, along with his insights on applying ML to the sorts of document and human-bound processes that exist throughout the government.

What are some of the unique challenges of the BLS with regards to data and data collection?

Alex Measure: The Bureau of Labor Statistics produces information about a wide variety of topics covering everything from employment and prices to time-use and workplace injuries. One thing that all of these activities have in common, however, is language. When we go out there to collect this information, whether by interviews, surveys, or some other means, most of the information we are collecting is communicated in the form of language. One of the ways we convert this language into statistics is through a process we call coding, in which we assign standardized classifications to indicate key characteristics of interest. For example, the Survey of Occupational Injuries and Illnesses collects hundreds of thousands of written descriptions of work-related injury and illness each year. In order to answer questions like “What is the most common cause of injuries for janitors?”, we go through each of these descriptions and assign codes to indicate things like the occupation of the worker and the event that caused their injury. The resulting information can then be aggregated to answer our questions. One problem, at least until recently, is that this is a lot of work, and work that mostly has to be done by hand. For the Survey of Occupational Injuries and Illnesses, we estimate it requires about 25,000 hours of labor each year. If you want it done quickly, that means you need to have a lot of people working on it simultaneously, and that means you need to train a lot of people and make sure they are all interpreting things consistently. It’s not easy. In fact, we find that even when we ask experienced experts to code the exact same injury narratives, any two experts will only agree on the same codes for the same case about 70% of the time. That’s a big challenge not just in BLS, but in many organizations working on similar tasks around the world.
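The agreement figure mentioned above can be made concrete with a small calculation. The following is a minimal sketch, not BLS's actual evaluation procedure: the function name `agreement_rate` and the injury labels are hypothetical, and agreement is assumed to mean the share of cases on which two coders assign exactly the same code.

```python
def agreement_rate(codes_a, codes_b):
    """Fraction of cases on which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same cases")
    if not codes_a:
        raise ValueError("need at least one case")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Hypothetical codes for ten injury narratives (the labels are made up).
coder_1 = ["fall", "fall", "cut", "burn", "strain",
           "cut", "fall", "burn", "strain", "cut"]
coder_2 = ["fall", "slip", "cut", "burn", "strain",
           "cut", "slip", "burn", "sprain", "cut"]

rate = agreement_rate(coder_1, coder_2)
print(f"{rate:.0%}")  # the coders agree on 7 of the 10 cases
```

Raw agreement like this does not correct for agreement that would occur by chance, so it is only a rough indicator of consistency.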

How is the Bureau of Labor Statistics using machine learning to solve these problems?

Alex Measure: Seven years ago, BLS did all of the coding for the Survey of Occupational Injuries and Illnesses by hand. This past year we did more than 85% of it automatically using supervised machine learning, specifically with deep neural networks. BLS is increasingly applying these same techniques to a wide variety of related tasks covering everything from the classification of occupations and products to medical benefits and job requirements.

How has the BLS’s view and use of AI evolved over the years?

Alex Measure: When I started at BLS nearly 12 years ago, the main approach people were using was what’s sometimes called the knowledge engineering or rule-based approach. The basic idea is, if you want a computer to do something, you need to explicitly tell it every rule and piece of information that is necessary to perform the task. If you’re classifying occupations, for example, that could mean creating a list of all the job titles that might show up, and the corresponding occupation codes that should be assigned when they do. 

This approach works well when working with simple and standardized things, but unfortunately that’s rarely the case with human language, even in a domain as narrow as job titles. In the Survey of Occupational Injuries and Illnesses, for example, we found that each year we received about 2,000 different job titles all corresponding to the occupation “janitor”. To make matters worse, many of those job titles had never occurred in our data previously. To make matters still worse, many of those job titles were associated with different occupations depending on other factors like the naming practices of the individual company or the industry of the employer. The result is that you need a huge number of often complicated rules, all to assign just one of the more than 840 occupation classifications we assign. Building and maintaining this sort of system can be incredibly time consuming and difficult.
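The rule-based approach and its brittleness can be illustrated with a toy lookup table. This is only a sketch under stated assumptions: the table `TITLE_RULES`, the function `code_by_rule`, and the occupation codes are hypothetical placeholders, not real classification codes or any system BLS used.

```python
# Hypothetical title-to-code rules; the codes are placeholders,
# not real occupation classification codes.
TITLE_RULES = {
    "janitor": "JANITOR",
    "custodian": "JANITOR",
    "registered nurse": "NURSE",
}


def code_by_rule(job_title):
    """Return the code for an exact match on the normalized title, else None."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(job_title.lower().split())
    return TITLE_RULES.get(key)


print(code_by_rule("Custodian"))           # covered by a rule
print(code_by_rule("bldg maint tech II"))  # no rule exists, so None
```

Every previously unseen title falls through to `None` until someone writes a new rule for it, which is the maintenance burden described above.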

Supervised machine learning provides an alternative. Instead of telling the computer everything it needs to know and do, we tell the computer how to learn from data and then feed it lots of data showing how some task should be performed. If you have lots of this data, which we do, since we’ve been doing this by hand for many years, you can often build a very effective system with very little additional work. In our case, we built our first machine learning systems using free and open source software in just a couple of weeks, and found they vastly outperformed the far more expensive rule-based alternatives we had been exploring. Even more surprisingly, they also outperformed our average human coders.
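To show what learning from labeled examples looks like in practice, here is a minimal sketch of a supervised text classifier. It is an assumption-laden illustration, not the BLS system: the interview does not say which algorithm the first system used, so a small multinomial naive Bayes model is swapped in here, and the class `NaiveBayesCoder`, the training narratives, and the codes are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayesCoder:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            # Start from the log prior, then add log likelihoods per word.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-coded examples; a real system would use far more.
train_texts = [
    "janitor slipped on wet floor while mopping",
    "custodian fell from ladder changing light",
    "nurse strained back lifting patient",
    "rn injured shoulder moving patient",
]
train_codes = ["JANITOR", "JANITOR", "NURSE", "NURSE"]

model = NaiveBayesCoder().fit(train_texts, train_codes)
print(model.predict("cleaning custodian slipped on stairs"))
print(model.predict("nurse hurt back lifting patient"))
```

No title-specific rules are written here; the model's behavior comes entirely from the hand-coded examples it is given.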

This opened the door to a lot more automation than was previously feasible, and this same technique is now being rapidly adopted for many similar tasks in BLS and in statistical agencies throughout the world. And of course, machine learning is useful for many other tasks. We’re now also using machine learning to automatically detect data errors, and to automatically match records in datasets with imperfect identifiers. That’s a big deal for us because it allows us to increasingly use the vast amounts of data being generated by many different sources.  

How has AI / ML changed the role of humans who performed these tasks before?

Alex Measure: There was a lot of concern when we started thinking about implementing this sort of automation that staff would be resistant and see this as a threat. That’s not what happened however, and I think that’s due to a combination of how we implemented things, and the circumstances of our situation. First and foremost, a decision was made very early on that the focus of our automation would be on improving data quality. This was important not just because data quality is important but also because this was a very new way of doing things, and we needed to make sure we did it right, and had a good backup plan if something went wrong. Our plan was basically the following:

  1. Automate the stuff that the computer does best, leave the stuff that humans do best to humans.
  2. Introduce automation gradually, so staff have time to adjust to the changes in workload.
  3. Keep humans in charge by asking staff to review all automatically assigned codes and make changes when they feel the computer is wrong. 
  4. Direct the remaining resource savings to other important tasks like data collection and data review.

The result was that over the course of 6 years, large amounts of routine coding workload were gradually replaced by more and better review and data collection. One unexpected consequence is that although our need for routine manual coding has sharply decreased, our need for small amounts of expert human coding has actually increased as experts are still needed for the very difficult cases that the model cannot handle, and are also now critical for verifying that the machine learning system is working correctly.
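One way to picture the division of labor described above is a routing step that decides who assigns each code. This is a sketch only: the interview does not say how BLS decides which cases the model handles, so the confidence threshold, the function `route_case`, and the field names are all assumptions made for illustration.

```python
def route_case(predicted_code, confidence, threshold=0.9):
    """Decide who assigns the final code for one case.

    The threshold of 0.9 is an arbitrary illustrative value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        # The model assigns the code, and staff can still review and change it.
        return {"code": predicted_code, "assigned_by": "model",
                "needs_review": True}
    # Difficult cases are left for an expert human coder to code from scratch.
    return {"code": None, "assigned_by": "human", "needs_review": False}


print(route_case("JANITOR", 0.97))
print(route_case("JANITOR", 0.42))
```

Raising the hypothetical threshold sends more cases to human coders, while lowering it automates more of the workload.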

What are some interesting or surprising insights you can share about BLS’s use of AI?

Alex Measure: There were two really big surprises for me. The first was that free open-source software made it so easy to build a machine learning system that outperformed the expensive rule-based approaches we had been considering. The second was that the system assigned codes more accurately, on average, than trained human coders. I certainly did not expect that when we first started and I think it took all of us a while to believe it, but the results have been consistent. On average, our machine learning systems assign these codes more accurately than our trained human staff and this gap has grown as we have introduced more and better training data and switched to better machine learning algorithms such as deep neural networks.

What are some unique challenges that the federal government has around data usage that could be obstacles for agencies looking to use AI?

Alex Measure: One of the big challenges is data confidentiality. Machine learning requires data. Government agencies have lots of data that is useful for many very important tasks, but they also have many restrictions on how that data can be shared, and this restricts the way agencies can use machine learning. When we first started exploring deep neural networks, for example, it meant we couldn’t use cloud resources because they were prohibited by existing policy. That was a problem because BLS did not have the hardware we needed to train the sorts of neural networks we were interested in. Ultimately we resolved the issue by purchasing and installing the necessary hardware in-house, but it’s the sort of thing that can easily delay or halt these sorts of initiatives.

Another important challenge is sharing the models. When BLS develops a machine learning model that can automatically classify occupations or injuries into standardized categories, that model is useful not just to BLS, and not just to the many other federal agencies that perform similar tasks, but also to external researchers and members of the general public. BLS and other agencies are uniquely positioned to train these models because of our access to large amounts of relevant data, so it would be nice if we could share, but research has also shown that models can inadvertently reveal information about the data used to train them, which means we have to be very careful. Recent research has demonstrated techniques that can be used to mitigate these risks and BLS has recently started exploring them, but it is not an easy challenge. 

Looking more broadly, what are a few areas where you’re seeing AI effectively being used in the federal government?

Alex Measure: If you could peek inside every federal agency, I think you would see what I see, which is lots of opportunities for applying supervised machine learning to automate routine tasks that we have been doing by hand for a very long time. One clear example in the statistical agencies is the language coding and classification work. I’m also seeing more statistical agencies using similar techniques for automatic error detection and matching records from different datasets.   

What can federal agencies do to attract the skilled workforce they need to keep up with technological innovations?

Alex Measure: I think the best thing federal agencies have going for them is mission. Federal agencies tend to work on very important things that benefit the entire country, and that is very attractive to civic-minded people. It’s also not the sort of thing you get to do just anywhere. I should also point out that hiring is not the only way to get a skilled workforce. The injury coding project, and many subsequent machine learning projects within the BLS, did not start by hiring outside experts in AI. They started with people already in the agency who became interested in automation and then learned the relevant techniques from free online resources like Coursera, and later from each other. BLS may be especially well suited for this sort of thing as staff already have lots of experience in the closely related field of statistics, but I’ve seen similar stories play out many times in many other locations. A successful machine learning project requires knowledge both of machine learning and the subject matter. Agencies already have the latter and the internet is full of excellent free resources for acquiring the former.

What AI technologies are you most looking forward to in the coming years?

Alex Measure: I rely heavily on supervised machine learning for my work, but it has one very large limitation: you need training data to get good performance, and not just a little data, often tons of it, far more than the typical human would need to learn a similar task. I am talking about, in many cases, hundreds or thousands of examples of training data for every concept you want the model to learn. That is a huge barrier because most tasks just don’t have that kind of data lying around.

In the last several years researchers have made a huge amount of progress addressing this, and it has come largely from advances in two areas: transfer learning, that is, transferring the knowledge learned on one task to another, and what’s sometimes called self-supervised learning, which is basically applying supervised learning techniques to data that is not explicitly labeled. One popular self-supervised task for language these days is some variation of the following: gather up a huge collection of text, then repeatedly sample some small subset, hide some of the words in the sample, and train the model to predict the missing pieces from the context. If you do this correctly you can get a model that knows a huge amount about language, all without any explicit labels. You can then use the techniques from transfer learning to apply this model to a different language task, such as predicting injury classifications, and if you’re lucky, dramatically reduce the amount of injury-specific training data needed to automate the task. This opens the doors to all sorts of automation that isn’t currently feasible because of the lack of training data.
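The word-hiding task described above can be sketched as a function that turns unlabeled text into training pairs. This is a simplified illustration, not any particular model's procedure: the function `make_masked_example`, the `[MASK]` placeholder, and the 15% masking fraction are assumptions chosen for the example, and a real system would go on to train a neural network on pairs like these.

```python
import random

MASK = "[MASK]"


def make_masked_example(tokens, mask_fraction=0.15, rng=None):
    """Hide a random subset of tokens and return (masked_tokens, targets).

    targets maps each hidden position to the original word that the
    model would be trained to predict from the surrounding context.
    """
    if not tokens:
        raise ValueError("need at least one token")
    rng = rng or random.Random()
    # Always hide at least one token so every sample yields a training signal.
    n_to_mask = max(1, round(len(tokens) * mask_fraction))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


tokens = "employee slipped on wet floor and fractured left wrist".split()
masked, targets = make_masked_example(tokens, rng=random.Random(0))
print(" ".join(masked))
print(targets)
```

The labels here come from the text itself, which is why no human coding is needed to produce them.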

Another area that I’m following very closely is differential privacy, which has applications both inside and outside AI. Recent advances have produced mechanisms that allow the sharing of machine learning models while providing rigorous privacy protections to the underlying training data. Advances in this area are likely to increasingly allow trusted data collectors to share useful aggregates (like trained machine learning models and statistical estimates), while improving the confidentiality of the underlying data.
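For a flavor of how differential privacy protects a released statistic, here is a minimal sketch of the Laplace mechanism applied to a simple count. It illustrates the general idea only, not the mechanisms used for sharing trained models and not anything BLS has deployed: the function names and the choice of `epsilon` are hypothetical, the count is assumed to change by at most 1 when one record is added or removed, and floating-point implementations like this one are not considered production-grade.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng=None):
    """Release a count with epsilon-differential privacy (Laplace mechanism).

    Assumes sensitivity 1: one record changes the count by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical example: number of injury cases with a given code.
print(private_count(100, epsilon=1.0))
```

Smaller values of `epsilon` add more noise and give stronger privacy, at the cost of a less accurate released count.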


