ClaimCompare Patent Novelty Destruction Data Pipeline

aravp21/claim-compare

ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs

Paper Abstract

A fundamental step in the patent application process is determining whether any prior patents are novelty destroying. Both applicants and examiners routinely perform this step to assess the novelty of proposed inventions among the millions of applications filed annually. However, the search is time- and labor-intensive, as searchers must navigate complex legal and technical jargon while covering a large number of legal claims. Automated approaches that use information retrieval (IR) and machine learning (ML) to detect novelty destroying patents are a promising way to streamline this process, yet research in this space remains limited. In this paper, we introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models for novelty destruction assessment. To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets. To illustrate the practical relevance of this pipeline, we use it to construct a sample dataset comprising over 27K patents in the electrochemical domain: 1,045 base patents from USPTO, each associated with 25 related patents labeled according to whether they destroy the novelty of the base patent. We then conduct preliminary experiments showing that fine-tuning transformer models on this dataset improves their ability to identify novelty destroying patents, with absolute improvements of 29.2% in MRR and 32.7% in P@1.
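For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how mean reciprocal rank (MRR) and precision at 1 (P@1) are commonly computed over ranked candidate lists. It is not the evaluation code from the paper: the function names and the toy rankings are assumptions made for illustration, with each inner list holding binary labels (1 for a novelty destroying candidate, 0 otherwise) in ranked order for one base patent.

```python
from typing import Sequence


def reciprocal_rank(labels: Sequence[int]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Sequence[Sequence[int]]) -> float:
    """Average the reciprocal rank over all queries (here, base patents)."""
    if not rankings:
        return 0.0
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: Sequence[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked candidate is relevant."""
    if not rankings:
        return 0.0
    hits = sum(1 for r in rankings if r and r[0])
    return hits / len(rankings)


# Hypothetical toy rankings, one list per base patent (not data from the paper).
toy_rankings = [
    [1, 0, 0],     # relevant candidate ranked first
    [0, 1, 0],     # relevant candidate ranked second
    [0, 0, 0, 1],  # relevant candidate ranked fourth
    [0, 0],        # no relevant candidate retrieved
]

mrr = mean_reciprocal_rank(toy_rankings)
p_at_1 = precision_at_1(toy_rankings)
```

On these toy rankings the reciprocal ranks are 1, 0.5, 0.25, and 0, so the MRR is 0.4375, and only one of the four lists has a relevant candidate at the top, so P@1 is 0.25.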

Repository Overview

This repository contains the ClaimCompare submission for the 2024 PatentSemTech Workshop. It includes the code for the ClaimCompare pipeline as well as our sample dataset for the electrochemical domain. The sample dataset contains the claims and relevant metadata for 1,045 base patents, each matched with 25 patents that are either novelty destroying or related but not novelty destroying. To fit the entire dataset on GitHub, we split it into 11 chunks of 95 base patents each (11 × 95 = 1,045), all of which can be found under the sample_dataset folder.
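Because the dataset is split across several files, it usually needs to be reassembled before use. The sketch below shows one way to do that, under the loud assumption that the chunks are CSV files with a header row; the README does not state the file format, file names, or column names, so `load_chunks`, the `*.csv` pattern, and the added `_chunk` field are all hypothetical and should be adapted to what the sample_dataset folder actually contains.

```python
import csv
from pathlib import Path


def load_chunks(folder="sample_dataset", pattern="*.csv"):
    """Read every chunk file matching `pattern` in `folder` and return all rows.

    Assumption: each chunk is a CSV file with a header row. The real files
    may use a different format, in which case the reader must be swapped out.
    """
    rows = []
    # Sort the paths so the chunks are read in a stable, reproducible order.
    for path in sorted(Path(folder).glob(pattern)):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Record the source chunk so rows can be traced back to a file.
                row["_chunk"] = path.name
                rows.append(row)
    return rows
```

Calling `load_chunks()` from the repository root would then return a single list of row dictionaries covering all chunks, or an empty list if no file matches the assumed pattern.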
