Art Unit Prediction using Machine Learning – Limited Prototype

I have been working on a tool for patent practitioners that aims to predict the Art Unit to which a given patent application is likely to be assigned. Before digging into the details, here is a link to an early prototype: https://tools.engineerturnedlawyer.com

Context

The USPTO has thousands of Patent Examiners who examine newly filed patent applications. Typically, a patent application is handled by one Examiner from initial filing to final disposition (e.g., issuance as a patent or abandonment by the applicant). Accordingly, the USPTO needs to assign each application to a Patent Examiner whose technical background aligns with the technology of that application. For instance, an application directed to a new electronic device should be assigned to a Patent Examiner with a background in that technology area (e.g., a degree in Electrical Engineering and/or experience designing electronic devices).

The USPTO organizes the Patent Examiners by technology to facilitate this assignment of new patent applications to an appropriate Patent Examiner. At the highest level, the USPTO organizes examiners into Technology Centers including:

TC 1600 – Biotechnology & Organic Fields
TC 1700 – Chemical and Materials Engineering Fields
TC 2100 – Computer Architecture Software and Information Security
TC 2400 – Computer Networks, Multiplex, Cable and Cryptography/Security
TC 2600 – Communications
TC 2800 – Semiconductors, Electrical and Optical Systems and Components
TC 3600 – Transportation, Electronic Commerce, Construction, Agriculture, Licensing and Review
TC 3700 – Mechanical Engineering, Manufacturing and Products

Each of these Technology Centers is subdivided into Groups with a narrower technology focus within their respective Technology Center. Further, each Group within a Technology Center is subdivided into Art Units with an even narrower technology focus than their parent Group. The diagram below depicts this hierarchy.
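As a rough illustration of the same hierarchy in code, here is a minimal Python sketch. It relies on an assumption that is not spelled out in this post: that Art Units carry four-digit numbers in which the first two digits identify the Technology Center and the first three digits identify the Group. The function name `locate_art_unit` and the sample number are hypothetical.

```python
def locate_art_unit(art_unit: int) -> dict:
    """Place a four-digit Art Unit number in the TC > Group > Art Unit hierarchy.

    Assumption (not stated in this post): the first two digits identify the
    Technology Center and the first three digits identify the Group.
    """
    if not 1000 <= art_unit <= 9999:
        raise ValueError("expected a four-digit Art Unit number")
    return {
        "technology_center": art_unit // 100 * 100,  # e.g. 3689 -> 3600
        "group": art_unit // 10 * 10,                # e.g. 3689 -> 3680
        "art_unit": art_unit,
    }


# Hypothetical Art Unit number used only for illustration.
print(locate_art_unit(3689))
```

Under that assumption, the sample number resolves to TC 3600, Group 3680, Art Unit 3689.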

When a new patent application is filed, the USPTO finds the Art Unit that most closely aligns with the technology covered by the patent application and assigns a Patent Examiner from the identified Art Unit. Typically, the USPTO uses the language of the Claims, which appear at the end of a patent application and define the scope of the invention, to find a corresponding Art Unit.

In practice, some Art Units, Groups, and/or Technology Centers tend to be “easier” in terms of examination than others. For instance, some Art Units may allow 70% of assigned patent applications to issue as patents (i.e., have a 70% allowance rate) while other Art Units may allow only 10% of assigned patent applications to issue as patents (i.e., have a 10% allowance rate).
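To make the allowance rate arithmetic concrete, here is a minimal Python sketch that follows the definition used above (allowed applications divided by assigned applications). The counts are hypothetical and were chosen only to reproduce the 70% and 10% figures; they are not real USPTO data.

```python
def allowance_rate(allowed: int, assigned: int) -> float:
    """Fraction of assigned applications that issued as patents."""
    if assigned <= 0:
        raise ValueError("assigned must be positive")
    if not 0 <= allowed <= assigned:
        raise ValueError("allowed must be between 0 and assigned")
    return allowed / assigned


# Hypothetical counts chosen to match the 70% and 10% examples above.
easier = allowance_rate(allowed=700, assigned=1000)
tougher = allowance_rate(allowed=100, assigned=1000)
print(f"easier Art Unit: {easier:.0%}, tougher Art Unit: {tougher:.0%}")
```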

Given this disparity in how Art Units treat assigned patent applications, it can be beneficial to steer patent applications away from “tougher” Art Units with low allowance rates and toward “easier” Art Units with high allowance rates. Many patent practitioners draft claims to avoid some of the toughest Art Units, such as those that examine “Business Method” inventions. For instance, patent practitioners may avoid using words that are related to financial concepts (e.g., money, loan, stocks, bonds, etc.) to reduce the likelihood of being assigned to a Business Method Art Unit.
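The word-avoidance habit described above can be approximated with a simple keyword check. This is only a sketch of the manual practice, not how the USPTO routes applications and not how the prototype works. The term list contains just the example words from this post, and the function name `flag_financial_terms` and the sample claim are hypothetical.

```python
import re

# Illustrative list only: just the example words named in this post.
FINANCIAL_TERMS = ("money", "loan", "stock", "bond")


def flag_financial_terms(claim_text: str) -> list[str]:
    """Return the listed terms (singular or plural) that appear in the claim."""
    found = []
    for term in FINANCIAL_TERMS:
        # \b keeps "bond" from matching inside longer words such as "bonding".
        if re.search(rf"\b{term}s?\b", claim_text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical draft claim used only for illustration.
claim = "A method comprising receiving a loan request and pricing bonds."
print(flag_financial_terms(claim))
```

A check like this only catches exact words, which is part of why a learned model that reads the whole claim is the more interesting approach.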

Project Goal

The goal of this project is to make a free software tool, built on the latest transformer-based machine learning models, that accurately predicts the Art Unit(s) to which a patent application is likely to be assigned based on a draft claim.

Current Prototype

The current prototype is my first try at using a transformer-based machine learning model for Art Unit classification. It is constrained to identifying only the Technology Center (rather than the Art Unit) because that requires differentiating between just 8 distinct classes (corresponding to the 8 Technology Centers) rather than 50+ classes (corresponding to each of the many Art Units). At present, the accuracy of the underlying model's predictions is above 80% on examples that were not part of the model's training dataset. In the future, I plan both to improve the model's performance at distinguishing between Technology Centers (e.g., to 90%+ accuracy) and to layer in predictions for the Group and individual Art Units.
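For readers curious how an 8-class prediction and a held-out accuracy figure fit together, here is a minimal Python sketch. It does not reproduce the prototype's model: it assumes a classifier that emits one raw score per Technology Center, and the scores, labels, and function names below are all hypothetical.

```python
import math

# The 8 Technology Centers listed earlier in this post.
TECHNOLOGY_CENTERS = ["1600", "1700", "2100", "2400", "2600", "2800", "3600", "3700"]


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tc(scores):
    """Return the most probable Technology Center and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TECHNOLOGY_CENTERS[best], probs[best]


def accuracy(predicted, actual):
    """Share of examples whose predicted label matches the true label."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical scores for three held-out claims (not real model output).
held_out_scores = [
    [0.1, 0.2, 3.0, 1.5, 0.3, 0.2, 0.4, 0.1],  # highest score at TC 2100
    [0.2, 0.1, 0.3, 0.2, 0.4, 0.3, 2.5, 0.6],  # highest score at TC 3600
    [2.0, 2.4, 0.1, 0.1, 0.2, 0.3, 0.2, 0.5],  # highest score at TC 1700
]
held_out_labels = ["2100", "3600", "1600"]  # hypothetical true labels

predictions = [predict_tc(scores)[0] for scores in held_out_scores]
print(predictions, accuracy(predictions, held_out_labels))
```

In this toy run two of the three predictions match, so the accuracy is 2/3. The reported 80%+ figure is the same ratio computed over the real held-out examples.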

Feel free to give it a try: https://tools.engineerturnedlawyer.com