Feature Request: Add Taiwan-specific predefined recognizers
Is your feature request related to a problem? Please describe.
Presidio does not seem to include Taiwan-specific predefined recognizers yet. This makes it harder to detect common Taiwan identifiers in Traditional Chinese / Taiwan datasets using the built-in recognizer set.
Describe the solution you'd like
I would like to contribute Taiwan-specific predefined recognizers to presidio-analyzer, starting with identifiers that seem to have clear public formats and validation logic.
Suggested first scope:
TW_NATIONAL_ID — Taiwan's national identification number (commonly called "身分證字號"); format is 1 leading letter plus 9 digits, with public checksum rules available.
- Similar country-specific personal ID entities already exist in Presidio, such as US_SSN, PL_PESEL, and IT_IDENTITY_CARD.
  - US_SSN does not seem like a good fit because it is explicitly tied to the United States social security system.
  - IT_IDENTITY_CARD is closer at the document level, but the concept commonly used in Taiwan is the personal ID number itself rather than the document type.
- My current view is that the Taiwan ID number is better modeled as a country-specific personal identifier, but I would welcome maintainer guidance on the final naming.
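To illustrate the "1 leading letter plus 9 digits" format and its checksum, here is a minimal standalone sketch. The function name `is_valid_tw_national_id` is hypothetical, and the letter-to-number table is the commonly documented mapping, which should be checked against an official source before it backs a recognizer's validation step.

```python
import re

# Commonly documented mapping of the leading letter to a two-digit code.
# Assumption: verify this table against an official source before use.
LETTER_CODES = {
    "A": 10, "B": 11, "C": 12, "D": 13, "E": 14, "F": 15, "G": 16,
    "H": 17, "I": 34, "J": 18, "K": 19, "L": 20, "M": 21, "N": 22,
    "O": 35, "P": 23, "Q": 24, "R": 25, "S": 26, "T": 27, "U": 28,
    "V": 29, "W": 32, "X": 30, "Y": 31, "Z": 33,
}

# One ASCII letter followed by nine ASCII digits.
TW_ID_PATTERN = re.compile(r"[A-Z][0-9]{9}")

# Weights for the eight body digits plus the final check digit.
DIGIT_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 1, 1)


def is_valid_tw_national_id(candidate: str) -> bool:
    """Return True if the candidate matches the format and checksum."""
    text = candidate.strip().upper()
    if not TW_ID_PATTERN.fullmatch(text):
        return False
    code = LETTER_CODES[text[0]]
    # The tens digit of the letter code has weight 1, the ones digit weight 9.
    total = (code // 10) + (code % 10) * 9
    total += sum(int(ch) * w for ch, w in zip(text[1:], DIGIT_WEIGHTS))
    return total % 10 == 0
```

A checksum like this is what would keep false positives low compared with a bare `[A-Z][0-9]{9}` pattern, since most random letter-plus-digit strings fail it.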
TW_PHONE_NUMBER — Taiwan phone numbers (commonly called "電話號碼"); Presidio already includes region-aware phone-number recognition, and Taiwan numbering has public structural rules.
- Related existing support already appears in Presidio's phone recognizer flow, and maintainers have discussed region-based phone support such as US and UK.
- Taiwan fixed-line numbers seem to have clearer area-code and length rules, so landline support looks like a strong first candidate.
- Taiwan mobile numbers also appear structurally clear, but I would likely scope the first PR as either landline-only or fixed-line-first with bounded mobile coverage, depending on maintainer preference.
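As a rough illustration of the structural rules discussed above, the sketch below normalizes a candidate to national format and classifies it. It is independent of Presidio's existing phone recognizer; the helper names are hypothetical, and the landline rule (a leading `0`, a first area-code digit from 2 to 8, and 9 or 10 digits in total) is a deliberately coarse assumption that a real recognizer would replace with a proper area-code table.

```python
import re
from typing import Optional

# Separators commonly seen in written phone numbers.
_SEPARATORS = re.compile(r"[\s\-()]")


def normalize_tw_number(raw: str) -> Optional[str]:
    """Return the number in national format (leading 0), or None."""
    text = _SEPARATORS.sub("", raw)
    if text.startswith("+886"):
        # Replace the country code with the national trunk prefix.
        text = "0" + text[4:]
    if not re.fullmatch(r"0[0-9]+", text):
        return None
    return text


def classify_tw_number(raw: str) -> Optional[str]:
    """Return "mobile", "landline", or None for non-matching input."""
    national = normalize_tw_number(raw)
    if national is None:
        return None
    # Mobile: 09 followed by eight digits.
    if re.fullmatch(r"09[0-9]{8}", national):
        return "mobile"
    # Landline (coarse assumption): 9 or 10 digits, area code starts 02-08.
    if re.fullmatch(r"0[2-8][0-9]{7,8}", national):
        return "landline"
    return None
```

Splitting normalization from classification would let a first PR ship one branch (for example landline-only) without changing how input is cleaned.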
I also plan to update the relevant documentation as part of the contribution.
Describe alternatives you've considered
As of Presidio 2.2.359 on June 16, 2026, the published recognizer list and documentation suggest several country-specific suffix patterns:
_PASSPORT: used by India and Italy; this suffix is for passport identifiers, and Taiwan seems possible, but public validation logic looks weaker than the candidates above.
_IDENTITY_CARD: used by Italy; this suffix is for national identity-card style document numbers, but Taiwan may still be better modeled as a country-specific personal identifier.
_BANK_NUMBER: used by the United States; this suffix is for banking identifiers, and Taiwan does not seem like a good first fit because I have not confirmed a clear low-false-positive validation rule.
_DRIVER_LICENSE: used by Italy and the United States; this suffix is for driver's license identifiers, and Taiwan may be possible, but I have not confirmed a stable public validation approach suitable for a first contribution.
_MEDICARE, _MBI, _NPI, _NHS: used by Australia, the United States, and the United Kingdom; these suffixes are for healthcare-related identifiers, and while Taiwan has healthcare identifiers, I have not yet confirmed a clearly suitable public validation rule for a safe first contribution.
_NINO, _PESEL, _AADHAAR, _PAN, _UEN: used by the United Kingdom, Poland, India, and Singapore; these are country-specific identifier systems without a direct Taiwan counterpart.
Taiwan also has a business registration identifier commonly called "統一編號" or "統編". Its 8-digit validation logic appears to be public and deterministic, but I have not yet found a clearly matching suffix already used across other countries in Presidio, so I think it should remain under maintainer discussion before proposing a final entity name.
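For reference during that discussion, the sketch below shows the 8-digit check as it is commonly described: fixed weights, digit sums of each product, and a special case when the seventh digit is 7. The function name `is_valid_tw_ubn` is hypothetical. The divisor is left as a parameter because, as far as I understand, the published rule was relaxed from divisibility by 10 to divisibility by 5; this is an assumption that should be confirmed against the current official specification.

```python
import re

# Commonly described per-position weights for the eight digits.
UBN_WEIGHTS = (1, 2, 1, 2, 1, 2, 4, 1)


def is_valid_tw_ubn(candidate: str, divisor: int = 5) -> bool:
    """Check an 8-digit business number against the weighted checksum.

    Assumption: divisor=5 reflects the relaxed rule; pass divisor=10
    for the older, stricter rule.
    """
    text = candidate.strip()
    if not re.fullmatch(r"[0-9]{8}", text):
        return False
    digits = [int(ch) for ch in text]
    total = 0
    for digit, weight in zip(digits, UBN_WEIGHTS):
        product = digit * weight
        # Add the tens and ones digits of each product.
        total += product // 10 + product % 10
    if total % divisor == 0:
        return True
    # Special case: when the seventh digit is 7, its product (28) may
    # contribute either 0 or 1, so total + 1 is also accepted.
    return digits[6] == 7 and (total + 1) % divisor == 0
```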
Additional context
References: