GitHub - aravp21/claim-compare: ClaimCompare Patent Novelty Destruction Data Pipeline

ClaimCompare: A Data Pipeline for Evaluation of Novelty Destroying Patent Pairs

Paper Abstract

A fundamental step in the patent application process is determining whether there exist prior patents that are novelty destroying. This step is routinely performed by both applicants and examiners in order to assess the novelty of proposed inventions among the millions of applications filed annually. However, conducting this search is time- and labor-intensive, as searchers must navigate complex legal and technical jargon while covering a large number of legal claims. Automated approaches that use information retrieval (IR) and machine learning (ML) to detect novelty destroying patents present a promising avenue to streamline this process, yet research in this space remains limited. In this paper, we introduce a novel data pipeline, ClaimCompare, designed to generate labeled patent claim datasets suitable for training IR and ML models to address this challenge of novelty destruction assessment. To the best of our knowledge, ClaimCompare is the first pipeline that can generate multiple novelty destroying patent datasets. To illustrate the practical relevance of this pipeline, we use it to construct a sample dataset comprising over 27K patents in the electrochemical domain: 1,045 base patents from the USPTO, each associated with 25 related patents labeled according to their novelty destruction towards the base patent. Subsequently, we conduct preliminary experiments showcasing the efficacy of this dataset in fine-tuning transformer models to identify novelty destroying patents, demonstrating 29.2% and 32.7% absolute improvement in MRR and P@1, respectively.
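The abstract reports results in MRR (mean reciprocal rank) and P@1 (precision at rank 1). As a minimal sketch of how these two metrics are commonly computed, and not the evaluation code used in the paper, the example below assumes each base patent yields a ranked list of binary labels, where 1 marks a novelty destroying candidate and 0 a non-novelty destroying one. The function names and the toy rankings are hypothetical.

```python
from typing import List, Sequence


def reciprocal_rank(labels: Sequence[int]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, label in enumerate(labels, start=1):
        if label == 1:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[int]]) -> float:
    """Average the reciprocal rank over all queries (base patents)."""
    return sum(reciprocal_rank(r) for r in rankings) / len(rankings)


def precision_at_1(rankings: List[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked candidate is relevant."""
    hits = sum(1 for r in rankings if len(r) > 0 and r[0] == 1)
    return hits / len(rankings)


# Toy data (assumption): one ranked label list per base patent,
# ordered from the model's highest-scored candidate to its lowest.
toy_rankings = [
    [0, 1, 0],  # first relevant candidate at rank 2
    [1, 0, 0],  # first relevant candidate at rank 1
    [0, 0, 0],  # no relevant candidate retrieved
    [0, 0, 1],  # first relevant candidate at rank 3
]

mrr = mean_reciprocal_rank(toy_rankings)
p_at_1 = precision_at_1(toy_rankings)
print(f"MRR = {mrr:.4f}, P@1 = {p_at_1:.4f}")
```

An "absolute improvement" in this setting is the difference between the fine-tuned model's score and the baseline's score on the same metric, expressed in percentage points.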

Repository Overview

This repository contains the ClaimCompare submission for the 2024 PatentSemTech Workshop. It includes the code for the ClaimCompare pipeline as well as our sample dataset for the electrochemical domain. The sample dataset contains the claims and relevant metadata for 1,045 base patents, each matched with 25 novelty destroying or related (non-novelty destroying) patents. To fit the entire dataset on GitHub, we split it into 11 chunks of 95 patents each, all of which can be found under the sample_dataset folder.
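The dataset figures above can be checked with a short sketch. It is only an illustration of the arithmetic, not the repository's chunking code: it assumes the 95 patents per chunk are base patents (consistent with 11 × 95 = 1,045), and the helper `split_into_chunks` and the placeholder IDs are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> List[Sequence[T]]:
    """Split a sequence into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


NUM_BASE_PATENTS = 1045      # from the README
RELATED_PER_BASE = 25        # from the README
CHUNK_SIZE = 95              # from the README

# Placeholder IDs (assumption): real entries would be patent identifiers.
base_ids = list(range(NUM_BASE_PATENTS))
chunks = split_into_chunks(base_ids, CHUNK_SIZE)

# Each base patent plus its 25 associated patents.
total_patents = NUM_BASE_PATENTS * (1 + RELATED_PER_BASE)

print(f"{len(chunks)} chunks, {total_patents} patents in total")
```

Under these assumptions the split yields 11 equally sized chunks, and the total of 27,170 patents matches the "over 27K" figure in the abstract.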

Folders and files

sample_dataset
README.md
pipeline.ipynb
