compliance/sancation_list_update_centralization #45

Open: wants to merge 63 commits into master from compliance/sancation_list_update_centralization

Commits (63)
f8dc107
compliance/sancation_list_update_centralization
mat-fs Sep 13, 2022
1d178ec
[ci] redis functionality added
mat-fs Sep 13, 2022
0508281
[ci] typo and minor bugs - tests pass
mat-fs Sep 13, 2022
baefd84
[ci] json module added
mat-fs Sep 13, 2022
e39cedb
[ci] singleton object implemented
mat-fs Sep 13, 2022
10838b4
[ci] comment added about singleton cunstructor
mat-fs Sep 13, 2022
9b5a4ad
[ci] comments + test script fix
mat-fs Sep 13, 2022
4a9584b
[ci] version updated
mat-fs Sep 13, 2022
3db7903
[ci] version fixed
mat-fs Sep 13, 2022
d38f513
[ci] two new subtests
mat-fs Sep 14, 2022
1aa9ffb
[ci] tests finished
mat-fs Sep 15, 2022
0367572
[ci] minor fixes
mat-fs Sep 15, 2022
7c37288
[ci] verified time restored from redis
mat-fs Sep 15, 2022
f0b53a1
cleanup + load verified data
mat-fs Sep 15, 2022
0604ffe
trigger tests [ci] 2022-09-15 12:41:48
mat-fs Sep 15, 2022
2e8a508
[ci] just one redis arg + fix the failure for the missing sanction file
mat-fs Oct 2, 2022
08f8b6a
Merge branch 'compliance/sancation_list_update_centralization' of git…
mat-fs Oct 2, 2022
e483298
[ci] test failed
mat-fs Oct 2, 2022
34804c5
[ci] code cleanup
mat-fs Oct 2, 2022
59a1435
[ci] export data added
mat-fs Oct 2, 2022
3a3e746
trigger tests [ci] 2022-10-03 10:39:05
mat-fs Oct 3, 2022
06ee215
[ci] factory pattern
mat-fs Oct 3, 2022
3d11d82
Merge branch 'compliance/sancation_list_update_centralization' of git…
mat-fs Oct 3, 2022
e002113
[ci] trigger
mat-fs Oct 3, 2022
7d95b0c
Update Redis.pm
mat-fs Oct 9, 2022
2d7863f
trigger tests [ci] 2022-10-18 22:44:55
mat-fs Oct 18, 2022
414d9b9
trigger tests [ci] 2022-10-19 05:56:19
mat-fs Oct 19, 2022
653e6c2
[ci] load data is moved to the parent class
mat-fs Oct 19, 2022
81c2c35
[ci] test files are renamed + error load and save error is fixed
mat-fs Oct 19, 2022
6fad98a
[ci] error message cleanup
mat-fs Oct 19, 2022
9665904
trigger tests [ci] 2022-10-19 13:57:35
mat-fs Oct 19, 2022
4a67464
trigger tests [ci] 2022-10-19 14:28:36
mat-fs Oct 19, 2022
598cabd
[ci] missing values initialized with default values + new test for lo…
mat-fs Oct 20, 2022
2434af8
trigger tests [ci] 2022-10-20 15:51:18
mat-fs Oct 20, 2022
64b6646
trigger tests [ci] 2022-10-23 16:01:55
mat-fs Oct 23, 2022
8603eda
trigger tests [ci] 2022-10-23 16:04:17
mat-fs Oct 23, 2022
df4777f
trigger tests [ci] 2022-10-28 17:45:29
mat-fs Oct 28, 2022
86b0749
trigger tests [ci] 2022-10-28 19:01:35
mat-fs Oct 28, 2022
831663e
trigger tests [ci] 2022-10-30 10:42:05
mat-fs Oct 30, 2022
35a5f66
trigger tests [ci] 2022-11-01 08:26:31
ragheb-deriv Nov 1, 2022
0ede5ef
add MockTime module to cpanfile [ci]
ragheb-deriv Nov 1, 2022
8ab564d
added redis docker image for circleci [ci]
ragheb-deriv Nov 2, 2022
be9cc9f
added apt instal redis-server and remved image[ci]
ragheb-deriv Nov 2, 2022
bdf450d
wip [ci]
ragheb-deriv Nov 2, 2022
d561505
added docker run step [ci]
ragheb-deriv Nov 2, 2022
c5dae72
fix lint issues [ci]
ragheb-deriv Nov 2, 2022
5242ccc
add docker client [ci]
ragheb-deriv Nov 2, 2022
b36e31d
wip [ci]
ragheb-deriv Nov 2, 2022
a6f4f50
wip [ci]
ragheb-deriv Nov 2, 2022
d2779d9
refactor [ci]
ragheb-deriv Nov 2, 2022
25ad2b0
bug fix [ci]
ragheb-deriv Nov 2, 2022
0624ae1
bug fix [ci]
ragheb-deriv Nov 2, 2022
a00ff76
make tidy and fixed lint errors [ci]
ragheb-deriv Nov 2, 2022
86c5f99
added no critic to pod SYNOPSIS [ci]
ragheb-deriv Nov 2, 2022
fad04a3
bug fix [ci]
ragheb-deriv Nov 2, 2022
5a5b28c
fixed eol errors [ci]
ragheb-deriv Nov 2, 2022
b02ef49
trigger tests [ci] 2022-11-03 06:45:58
ragheb-deriv Nov 3, 2022
dd463f1
trigger tests [ci] 2022-11-03 14:11:28
ragheb-deriv Nov 3, 2022
8ece397
trigger tests [ci] 2022-11-07 06:35:24
ragheb-deriv Nov 7, 2022
d65d0b9
trigger tests [ci] 2022-11-07 07:55:58
ragheb-deriv Nov 7, 2022
bbb52af
removed redis-derive from ci [ci]
ragheb-deriv Nov 7, 2022
f8b1b00
trigger tests [ci] 2022-11-07 08:53:50
ragheb-deriv Nov 7, 2022
a4639c1
trigger tests [ci] 2022-11-09 10:32:31
ragheb-deriv Nov 9, 2022
4 changes: 4 additions & 0 deletions .circleci/config.yml
@@ -9,6 +9,10 @@ jobs:
- image: perldocker/perl-tester:<< parameters.perl-version >>
steps:
- checkout
- run:
name: Install Redis server
command: |
apt-get install -y redis
- run:
command:
cpm install -g --no-test Dist::Zilla Dist::Zilla::App::Command::cover ExtUtils::MakeMaker
1 change: 1 addition & 0 deletions Changes
@@ -1,6 +1,7 @@
Revision history for Data-Validate-Sanctions

{{$NEXT}}

0.13 2022-07-26 13:55:00 CST
Improving the search for larger sanction lists

6 changes: 6 additions & 0 deletions cpanfile
@@ -19,14 +19,20 @@ requires 'Getopt::Long', '2.42';
requires 'Syntax::Keyword::Try', '0.18';
requires 'Locale::Country', '3.66';
requires 'Text::Trim', 0;
requires 'JSON::MaybeUTF8', 0;
requires 'Clone', 0;

on test => sub {
requires 'Test::More', '0.96';
requires 'Test::Warn', '0.23';
requires 'Test::Warnings', '0.026';
requires 'Test::MockModule', '0.15';
requires 'Test::MockObject', '1.20161202';
requires 'Test::MockTime';
requires 'Test::Deep', '0';
requires 'FindBin', '0';
requires 'Path::Tiny', '0';
requires 'Class::Unload', '0';
requires 'Test::RedisServer', '0.23';
requires 'RedisDB', '2.57';
};
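The new test-only dependencies above (Test::MockTime, Test::RedisServer, RedisDB) suggest tests that freeze time and run against a throw-away Redis instance. A minimal sketch of that setup, not taken from the PR's test files (the timestamp and skip message are invented):

    use strict;
    use warnings;
    use Test::More;
    use Test::MockTime qw(set_fixed_time restore_time);
    use Test::RedisServer;

    # Start a private redis-server for this test, or skip when the binary is absent.
    my $redis_server = eval { Test::RedisServer->new }
        or plan skip_all => 'redis-server is required for this test';

    # Freeze the clock so "updated" timestamps written by the code under test are deterministic.
    set_fixed_time('2022-11-01T00:00:00Z');

    # ... exercise the sanction-list code against $redis_server here ...

    restore_time();
    done_testing();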
90 changes: 67 additions & 23 deletions lib/Data/Validate/Sanctions.pm
@@ -9,26 +9,33 @@ our @EXPORT_OK = qw/is_sanctioned set_sanction_file get_sanction_file/;

use Carp;
use Data::Validate::Sanctions::Fetcher;
use Data::Validate::Sanctions::Redis;
use File::stat;
use File::ShareDir;
use YAML::XS qw/DumpFile LoadFile/;
use YAML::XS qw/DumpFile LoadFile/;
use Scalar::Util qw(blessed);
use Date::Utility;
use Data::Compare;
use List::Util qw(any uniq max min);
use Locale::Country;
use Text::Trim qw(trim);
use Clone qw(clone);

our $VERSION = '0.13';
our $VERSION = '0.14';

my $sanction_file = _default_sanction_file();
my $sanction_file;
my $instance;

# for OO
sub new { ## no critic (RequireArgUnpacking)
my ($class, %args) = @_;

my $storage = delete $args{storage} // '';

return Data::Validate::Sanctions::Redis->new(%args) if $storage eq 'redis';

my $self = {};

$self->{sanction_file} = $args{sanction_file} // _default_sanction_file();

$self->{args} = {%args};
@@ -43,18 +50,27 @@ sub update_data {
$self->_load_data();

my $new_data = Data::Validate::Sanctions::Fetcher::run($self->{args}->%*, %args);

my $updated;
my $updated = 0;
foreach my $k (keys %$new_data) {
$self->{_data}->{$k} //= {};
$self->{_data}->{$k}->{updated} //= 0;
$self->{_data}->{$k}->{content} //= [];
if ($self->{_data}{$k}->{updated} != $new_data->{$k}->{updated}

if (!$new_data->{$k}->{error} && $self->{_data}->{$k}->{error}) {
delete $self->{_data}->{$k}->{error};
$updated = 1;
}

if ($new_data->{$k}->{error}) {
warn "$k list update failed because: $new_data->{$k}->{error}";
$self->{_data}->{$k}->{error} = $new_data->{$k}->{error};
$updated = 1;
} elsif ($self->{_data}{$k}->{updated} != $new_data->{$k}->{updated}
|| scalar $self->{_data}{$k}->{content}->@* != scalar $new_data->{$k}->{content}->@*)
{
print "Source $k is updated with new data \n" if $args{verbose};
$self->{_data}->{$k} = $new_data->{$k};
$updated = 1;
print "Source $k is updated with new data \n" if $args{verbose};
} else {
print "Source $k is not changed \n" if $args{verbose};
}
Expand All @@ -76,7 +92,7 @@ sub last_updated {
return $self->{_data}->{$list}->{updated};
} else {
$self->_load_data();
return max(map { $_->{updated} } values %{$self->{_data}});
return max(map { $_->{updated} // 0 } values %{$self->{_data}});
}
}

@@ -87,6 +103,7 @@ sub set_sanction_file { ## no critic (RequireArgUnpacking)
}

sub get_sanction_file {
$sanction_file //= _default_sanction_file();
Comment (Contributor):
Would it be better to use state here?

Reply (Contributor):
In some cases a state declaration does not apply; here, both set_sanction_file and get_sanction_file share the same file-scoped variable.
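For readers following this thread, a minimal sketch of the difference being discussed (not part of the diff; the package name and path are invented): a file-scoped lexical is shared by the setter and the getter, while a state variable would be private to a single subroutine.

    package My::Demo;    # hypothetical package, for illustration only
    use strict;
    use warnings;
    use feature 'state';

    # File-scoped lexical: visible to every sub below, so set/get share it.
    my $sanction_file;

    sub set_sanction_file { $sanction_file = shift; return }

    sub get_sanction_file {
        $sanction_file //= '/tmp/sanctions.yml';    # lazy default, as in the diff above
        return $sanction_file;
    }

    # A state variable lives inside one sub only; set_sanction_file could not reach it.
    sub get_sanction_file_with_state {
        state $file = '/tmp/sanctions.yml';
        return $file;
    }

    1;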

return $instance ? $instance->{sanction_file} : $sanction_file;
}

@@ -103,6 +120,14 @@ sub is_sanctioned { ## no critic (RequireArgUnpacking)
return (get_sanctioned_info(@_))->{matched};
}

sub data {
my ($self) = @_;

$self->_load_data() unless $self->{_data};

return $self->{_data};
}

=head2 _match_other_fields

Matches fields possibly available in addition to name and date of birth.
@@ -173,9 +198,12 @@ It returns a hash-ref containg the following data:
=over 4

=item - matched: 1 if a match was found; 0 otherwise
list: the source for the matched entry,
matched_args: a name-value hash-ref of the similar arguments,
comment: additional comments if necessary,

=item - list: the source for the matched entry,

=item - matched_args: a name-value hash-ref of the similar arguments,

=item - comment: additional comments if necessary,

=back
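A short usage sketch of the return value documented above (not part of the diff; the names, date of birth, and file path are invented):

    use Data::Validate::Sanctions;

    my $validator = Data::Validate::Sanctions->new(sanction_file => '/tmp/sanctions.yml');

    # Name parts, optionally followed by a date of birth.
    my $info = $validator->get_sanctioned_info('John', 'Doe', '1980-01-01');

    if ($info->{matched}) {
        print "matched entry from list: $info->{list}\n";
        print "matched on: " . join(', ', sort keys %{ $info->{matched_args} }) . "\n";
        print "comment: $info->{comment}\n" if defined $info->{comment};
    }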

@@ -184,7 +212,7 @@ It returns a hash-ref containg the following data:
sub get_sanctioned_info { ## no critic (RequireArgUnpacking)
my $self = blessed($_[0]) ? shift : $instance;
unless ($self) {
$instance = __PACKAGE__->new(sanction_file => $sanction_file);
$instance = __PACKAGE__->new(sanction_file => get_sanction_file());
$self = $instance;
}

@@ -219,7 +247,7 @@ sub get_sanctioned_info { ## no critic (RequireArgUnpacking)
# and deduplicate the list
my $filtered_sanctioned_names = {};
foreach my $token (@client_name_tokens) {
foreach my $name ( keys %{$self->{_token_sanctioned_names}->{$token}}) {
foreach my $name (keys %{$self->{_token_sanctioned_names}->{$token}}) {
$filtered_sanctioned_names->{$name} = 1;
}
}
@@ -286,12 +314,12 @@ sub get_sanctioned_info { ## no critic (RequireArgUnpacking)
}

sub _load_data {
my $self = shift;
my $sanction_file = $self->{sanction_file};
$self->{last_time} //= 0;
$self->{_data} //= {};
$self->{_sanctioned_name_tokens} //= {};
$self->{_token_sanctioned_names} //= {};
my $self = shift;
my $sanction_file = $self->{sanction_file};
$self->{last_time} //= 0;
$self->{_data} //= {};
$self->{_sanctioned_name_tokens} //= {};
$self->{_token_sanctioned_names} //= {};

if (-e $sanction_file) {
return $self->{_data} if stat($sanction_file)->mtime <= $self->{last_time} && $self->{_data};
@@ -303,8 +331,8 @@ sub _load_data {
foreach my $sanctioned_name (keys $self->{_index}->%*) {
my @tokens = _clean_names($sanctioned_name);
$self->{_sanctioned_name_tokens}->{$sanctioned_name} = \@tokens;
foreach my $token (@tokens){
$self->{_token_sanctioned_names}->{$token}->{$sanctioned_name}=1;
foreach my $token (@tokens) {
$self->{_token_sanctioned_names}->{$token}->{$sanctioned_name} = 1;
}
}

@@ -320,10 +348,11 @@ Indexes data by name. Each name may have multiple matching entries.
sub _index_data {
my $self = shift;

$self->{_data} //= {};
$self->{_index} = {};
for my $source (keys $self->{_data}->%*) {
my @content = ($self->{_data}->{$source}->{content} // [])->@*;
warn "Content is empty for the sanction source $source. The sanctions file should be updated." unless @content;
my @content = clone($self->{_data}->{$source}->{content} // [])->@*;

for my $entry (@content) {
$entry->{source} = $source;
for my $name ($entry->{names}->@*) {
@@ -392,6 +421,12 @@ sub _name_matches {
return 0;
}

sub export_data {
my ($self, $path) = @_;

return DumpFile($path, $self->{_data});
}

1;
__END__

@@ -487,6 +522,15 @@ set sanction_file which is used by L</is_sanctioned> (procedure-oriented)

Pass in the client's name and sanctioned individual's name to see if they are similar or not


=head2 export_data

Exports the sanction lists to a local file in YAML format.

=head2 data

Gets the sanction list content with lazy loading.

=head1 AUTHOR

Binary.com E<lt>fayland@binary.comE<gt>
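Putting the changes to this module together, a rough usage sketch (not from the PR; the paths are invented, and the extra arguments the Redis backend expects are not shown in this diff, so they are left out):

    use Data::Validate::Sanctions;

    # Default behaviour: sanction data lives in a YAML file on disk.
    my $validator = Data::Validate::Sanctions->new(sanction_file => '/var/lib/sanctions.yml');

    # With storage => 'redis' the same constructor hands back a
    # Data::Validate::Sanctions::Redis object; its remaining arguments are
    # passed through unchanged.
    # my $redis_backed = Data::Validate::Sanctions->new(storage => 'redis', ...);

    # data() loads the lists lazily and returns them keyed by source.
    my $data = $validator->data;
    for my $source (sort keys %$data) {
        printf "%s: %d entries, updated %d\n",
            $source,
            scalar @{ $data->{$source}{content} // [] },
            $data->{$source}{updated} // 0;
    }

    # export_data() dumps the in-memory lists to a YAML file.
    $validator->export_data('/tmp/sanctions-export.yml');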
34 changes: 17 additions & 17 deletions lib/Data/Validate/Sanctions/Fetcher.pm
@@ -6,15 +6,15 @@ use warnings;
use DateTime::Format::Strptime;
use Date::Utility;
use IO::Uncompress::Unzip qw(unzip $UnzipError);
use List::Util qw(uniq any);
use List::Util qw(uniq any);
use Mojo::UserAgent;
use Text::CSV;
use Text::Trim qw(trim);
use Syntax::Keyword::Try;
use XML::Fast;
use Locale::Country;

our $VERSION = '0.10';
# VERSION

=head2 config

@@ -234,7 +247,7 @@ sub _ofac_xml {
$ref->{publshInformation}{Publish_Date} =~ m/(\d{1,2})\/(\d{1,2})\/(\d{4})/
? _date_to_epoch("$3-$1-$2")
: undef; # publshInformation is a typo in ofac xml tags
die 'Publication date is invalid' unless defined $publish_epoch;
die "Corrupt data. Release date is invalid\n" unless defined $publish_epoch;

my $parse_list_node = sub {
my ($entry, $parent, $child, $attribute) = @_;
@@ -301,16 +301,16 @@ sub _hmt_csv {
my $raw_data = shift;
my $dataset = [];

my $csv = Text::CSV->new({binary => 1}) or die "Cannot use CSV: " . Text::CSV->error_diag();
my $csv = Text::CSV->new({binary => 1}) or die "Cannot use CSV: " . Text::CSV->error_diag() . "\n";

my @lines = split("\n", $raw_data);

my $parsed = $csv->parse(trim(shift @lines));
my @info = $parsed ? $csv->fields() : ();
die 'Publication date was not found' unless @info && _date_to_epoch($info[1]);
die "Currupt data. Release date was not found\n" unless @info && _date_to_epoch($info[1]);

my $publish_epoch = _date_to_epoch($info[1]);
die 'Publication date is invalid' unless defined $publish_epoch;
die "Currupt data. Release date is invalid\n" unless defined $publish_epoch;

$parsed = $csv->parse(trim(shift @lines));
my @row = $csv->fields();
@@ -342,7 +342,7 @@ sub _hmt_csv {
# Fields to be added in the new file format (https://redmine.deriv.cloud/issues/51922)
# We can read these fields normally after the data is released in the new format
my ($passport_no, $non_latin_alias);
$passport_no = $row[$column{'Passport Number'}] if defined $column{'Passport Number'};
$passport_no = $row[$column{'Passport Number'}] if defined $column{'Passport Number'};
$non_latin_alias = $row[$column{'Name Non-Latin Script'}] if defined $column{'Name Non-Latin Script'};

_process_sanction_entry(
@@ -393,7 +393,7 @@ sub _eu_xml {
my @place_of_birth = map { $_->{'-countryIso2Code'} || () } $entry->{birthdate}->@*;
my @citizen = map { $_->{'-countryIso2Code'} || () } $entry->{citizenship}->@*;
my @residence = map { $_->{'-countryIso2Code'} || () } $entry->{address}->@*;
my @postal_code = map { $_->{'-zipCode'} || $_->{'-poBox'} || () } $entry->{address}->@*;
my @postal_code = map { $_->{'-zipCode'} || $_->{'-poBox'} || () } $entry->{address}->@*;
my @nationality = map { $_->{'-countryIso2Code'} || () } $entry->{identification}->@*;
my @national_id = map { $_->{'-identificationTypeCode'} eq 'id' ? $_->{'-number'} || () : () } $entry->{identification}->@*;
my @passport_no = map { $_->{'-identificationTypeCode'} eq 'passport' ? $_->{'-number'} || () : () } $entry->{identification}->@*;
@@ -415,7 +415,7 @@ sub _eu_xml {
my @date_parts = split('T', $ref->{'-generationDate'} // '');
my $publish_epoch = _date_to_epoch($date_parts[0] // '');

die 'Publication date is invalid' unless $publish_epoch;
die "Corrupt data. Release date is invalid\n" unless $publish_epoch;

return {
updated => $publish_epoch,
@@ -440,7 +440,7 @@ sub run {
foreach my $id (sort keys %$config) {
my $source = $config->{$id};
try {
die "Url is empty for $id" unless $source->{url};
die "Url is empty for $id\n" unless $source->{url};

my $raw_data;

Expand All @@ -461,8 +461,8 @@ sub run {
my $count = $data->{content}->@*;
print "Source $id: $count entries fetched \n" if $args{verbose};
}
} catch {
warn "$id list update failed because: $@";
} catch ($e) {
$result->{$id}->{error} = $e;
}
}

Expand All @@ -480,7 +480,7 @@ sub _entries_from_file {

my $entries;

open my $fh, '<', "$1" or die "Can't open $id file $1 $!";
open my $fh, '<', "$1" or die "Can't open $id file $1 $!\n";
$entries = do { local $/; <$fh> };
close $fh;

@@ -513,16 +513,16 @@ sub _entries_from_remote_src {
try {
my $resp = $ua->get($src_url);

die "File not downloaded for $id" if $resp->result->is_error;
die "File not downloaded for $id\n" if $resp->result->is_error;
$entries = $resp->result->body;

last;
} catch {
$error_log = $@;
} catch ($e) {
$error_log = $e;
}
}

return $entries // die "An error occurred while fetching data from '$src_url' due to $error_log";
return $entries // die "An error occurred while fetching data from '$src_url' due to $error_log\n";
}

1;
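To summarise the error-handling change in this file: run() no longer warns when a source fails; it records the failure under that source's error key, and update_data() in Data::Validate::Sanctions (see above) warns about it while keeping the previously loaded entries. A sketch of how a caller might inspect the result (the file:// URLs and override argument names are assumptions, not taken from this diff):

    use Data::Validate::Sanctions::Fetcher;

    # Hypothetical overrides pointing the fetcher at local fixture files.
    my $result = Data::Validate::Sanctions::Fetcher::run(
        eu_url  => 'file://t/data/sample_eu.xml',
        hmt_url => 'file://t/data/missing.csv',    # deliberately absent
    );

    for my $source (sort keys %$result) {
        if (my $error = $result->{$source}{error}) {
            warn "$source failed: $error";
        }
        else {
            printf "%s: %d entries fetched\n", $source, scalar @{ $result->{$source}{content} };
        }
    }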