IPV6 recognizer not working properly #907

Closed

Description

I was trying to use Presidio to identify and remove IP addresses and ran into the following issue: the analyzer flags '::' as an IP address, but does not recognize '2345:0425:2CA1:0000:0000:0567:5673:23b5' as one. I ran a few tests:

from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

# Test 1: a bare double colon
results = analyzer.analyze(text='::',
                           entities=['IP_ADDRESS'],
                           language='en')
print(results)

# Test 2: a full, uncompressed IPv6 address
results2 = analyzer.analyze(text='2345:0425:2CA1:0000:0000:0567:5673:23b5',
                            entities=['IP_ADDRESS'],
                            language='en')
print(results2)

# Test 3: the same address with the zero groups compressed to '::'
results3 = analyzer.analyze(text='2345:0425:2CA1::0567:5673:23b5',
                            entities=['IP_ADDRESS'],
                            language='en')
print(results3)

Output:

[type: IP_ADDRESS, start: 0, end: 2, score: 0.6]
[]
[type: IP_ADDRESS, start: 13, end: 30, score: 0.6]

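This suggests that the recognizer treats any text containing two consecutive colons as an IPv6 address: the full address in the second test is missed entirely, and the third test only matches from offset 13 onward instead of the whole address. I then checked the source code and found this comment in the tests: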

# IPv6 tests TODO IPv6 regex needs to be fixed

Can the IPv6 regex be fixed?
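
For illustration, here is a minimal sketch of one possible direction. It is not Presidio's actual recognizer or its eventual fix. It assumes that a loose regex is used only to find candidate tokens, and that Python's standard `ipaddress` module then decides whether each candidate is a real IPv6 address. The names `find_ipv6` and `_CANDIDATE` are hypothetical, and skipping the unspecified address '::' is my own assumption about the desired behavior, since '::' is technically valid IPv6.

```python
import ipaddress
import re

# Hypothetical candidate pattern: any run of hex digits and colons.
# It is deliberately loose; real validation is delegated to ipaddress.
_CANDIDATE = re.compile(r"[0-9A-Fa-f:]+")


def find_ipv6(text):
    """Return (start, end, match) for each candidate that parses as IPv6."""
    hits = []
    for m in _CANDIDATE.finditer(text):
        candidate = m.group()
        if ":" not in candidate:
            continue  # plain hex words such as "dead" or "cafe"
        try:
            addr = ipaddress.IPv6Address(candidate)
        except ValueError:
            continue  # not a well-formed IPv6 address
        if addr.is_unspecified:
            continue  # assumption: a bare '::' is treated as a false positive
        hits.append((m.start(), m.end(), candidate))
    return hits


if __name__ == "__main__":
    for sample in ("::",
                   "2345:0425:2CA1:0000:0000:0567:5673:23b5",
                   "2345:0425:2CA1::0567:5673:23b5"):
        print(repr(sample), find_ipv6(sample))
```

With the three inputs from the tests above, this sketch should reject '::' and return a single hit spanning the whole string for both the full and the compressed address. Inside Presidio, the same check could probably live in a recognizer's validation step, but that wiring is not shown here.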
