TAB-delimited text parser and serializer for ABAP
changelog
Text2tab is an utility to parse TAB-delimited text into an internal table of an arbitrary flat structure.
- supports "non-strict" mode which allows to skip fields in the source data (for the case when only certain fields are being loaded).
- supports "header" specification as the first line in the text - in this case field order in the text may differ from the internal abap structure field order.
- supports loading into a structure (the first data line of the text is parsed).
- supports type-less parsing, when the data is not checked against existing structure but dynamically create as a table with string fields.
- supports specifying date and amount formats
- supports on-the-fly field name remapping (e.g. field
FOO
in the parsed text move toBAR
component of the target internal table) - supports "deep" parsing - filling structure or table components in the target data container
- supports "corresponding" parsing - filling only those fields which are in target structure. Kind of opposite to "non-strict" feature above.
And vice versa - serialize flat table or structure to text.
- support specifying date and amount formats, and line-break symbol
The package also contains 2 examples:
ZTEXT2TAB_EXAMPLE
- simple parsing codeZTEXT2TAB_BACKUP_EXAMPLE
- example of DB table backup with serializer
You can install the whole code using abapGit tool (recommended way). Alternatively, you can also copy content of *.clas.abap
to your program (please keep the homepage reference and license text).
The tool is open source and distributed under MIT license. It was initially created as a part of another project - mockup loader - but then separated as an independent multi-usage tool.
Source text file (CRLF as a line delimiter, TAB as a field delimiter)
NAME BIRTHDATE
ALEX 01.01.1990
JOHN 02.02.1995
LARA 03.03.2000
Simple parsing code
types:
begin of my_table_type,
name type char10,
birthdate type datum,
end of my_table_type.
data lt_container type my_table_type.
zcl_text2tab_parser=>create( lt_container )->parse(
exporting
i_data = my_get_some_raw_text_data( )
importing
e_container = lt_container ).
... or a more complex one ...
types:
begin of my_table_type,
name type char10,
city type char40, " <<< New field, but the text still contains just 2 !
birthdate type datum,
end of my_table_type.
...
zcl_text2tab_parser=>create(
i_pattern = lt_container " table or structure
i_amount_format = ' .' " specify thousand and decimal delimiters
)->parse(
exporting
i_data = my_get_some_raw_text_data( )
i_strict = abap_false " text may contain not all fields (city field will be skipped)
i_has_head = abap_true " headers in the first line of the text
importing
e_container = lt_container ). " table or structure (first data line from text)
Of course, you can keep the object reference returned by create()
and use it to parse more data of the same pattern.
Change target field name on-the-fly. (Useful e.g. for data migration)
* Input text to parse:
* FULL_NAME BIRTHDATE
* ALEX 01.01.1990
* JOHN 02.02.1995
data lt_map type zcl_text2tab_parser=>tt_field_name_map.
field-symbols <map> like line of lt_map.
append initial line to lt_map assigning <map>.
<map>-from = 'Full_name'.
<map>-to = 'name'.
...
zcl_text2tab_parser=>create( lt_container )->parse(
exporting
i_rename_fields = lt_map " <<<<<<< FIELD MAP
i_data = my_get_some_raw_text_data( )
importing
e_container = lt_container ).
* Output data:
* NAME BIRTHDATE
* ALEX 01.01.1990
* JOHN 02.02.1995
Renames can be also passed as a string in this format: 'field_to_rename:new_name;another_field:another_new_name'
. Coding convenience is important ;)
zcl_text2tab_parser=>create( lt_container )->parse(
exporting
i_rename_fields = 'Full_name:name'
i_data = my_get_some_raw_text_data( )
importing
e_container = lt_container ).
Sometimes your structure contain technical non-flat fields which are not supposed to be in source data but present in the target structure. In example, color settings for ALV. It would be cumbersome to create a separate structure just for data. You can specify i_ignore_nonflat = abap_true
during parser creation so that non-flat components are ignored. The parser will not throw Structure must be flat
error. However, it will still check that ignored fields are not in the source data.
E.g. target structure is 'DATE,CHAR,COLORCOL'
, where colorcol
is a structure. The parser will accept and parse data like
DATE CHAR
01.01.2019 ABC
But will fail on
DATE CHAR COLORCOL
01.01.2019 ABC 123
If you have a target data with deep fields - tables or structures - it is possible to fill them in one run. For this you have to create a data source provider implementing zif_text2tab_deep_provider
and pass it to the parser. An example of implementation can be found in mockup_loader. But let's consider an simple example here.
Let's assume you have 2 linked tables - header and lines
DOCUMENT
========
ID DATE ...
1 ...
2 ...
LINES
========
DOCID LINEID AMOUNT ...
1 1 100.00 ...
1 2 123.00 ...
2 1 990.00 ...
Let's assume you have a target data of the following structure
types:
begin of ty_line,
docid type numc10,
lineid type numc3,
" ...
end of ty_line,
tt_line type table of ty_line,
begin of ty_document,
id type numc10,
" ...
lines type tt_line, " <<< DEEP FIELD, supposed to be filled with lines of the document
end of ty_document,
tt_documents type table of ty_document.
So you can run the parser as follows to parse all at once
zcl_text2tab_parser=>create(
i_pattern = lt_container
i_deep_provider = lo_implementation_of_zif_text2tab_deep_provider
)->parse(
exporting
i_data = my_raw_text_data
importing
e_container = lt_container ).
So what is lo_implementation_of_zif_text2tab_deep_provider
in this example? The parser does not know how to get additional data other than my_raw_text_data
. The implementation of zif_text2tab_deep_provider
should. Maybe you want to get it from a file, which is located somewhere near, or download from web, or just select from a DB table - this is up to your design and need. For example, the mentioned mockup_loader uses the text value in place of "deep" field in order to find the data and parse it. The source file should contain a reference in place of "deep" field to help finding the source data. E.g.
DOCUMENT
========
ID DATE ... LINES
1 ... lines.txt[docid=@id]
2 ... lines.txt[docid=@id]
where mockup_loader interprets lines.txt[docid=@id]
as "go find lines.txt file, parse it, extract the lines, filter those where docid
= id
value of the current header record".
zif_text2tab_deep_provider
requires just one method to implement - select
.
methods select
importing
i_address type string " e.g. filename[key_field_name=123]
i_cursor type any " reference to currently processed data line to fetch field values
exporting
e_container type any.
i_address
is the value of "deep" field in text data (in our example value ofLINES
field -lines.txt[docid=@id]
)i_cursor
is the current record with parsed field values (in our example: id, date, ...)e_container
is the target component of the current record (in our example -LINES
component ofty_document
)
So the implementation of the interface must parse the address and get the "deep" data. You are not limited to the specific address format, however, if you want to follow the format described above (<sourceref>[<sourcekeyfield>=<cursorkeyfield>]
), you can reuse parse_deep_address
and get_struc_field_value_by_name
methods implemented in zcl_text2tab_utils
.
You can also create an instance that does not validate type against some existing type structure. Instead it generates the table dynamically, where each field if the line is unconverted string.
data:
lr_data type ref to data,
lt_fields type string_table.
zcl_text2tab_parser=>create_typeless( )->parse(
exporting
i_data = my_get_some_raw_text_data( )
importing
e_head_fields = lt_fields " Contain the list of field names !
e_container = lr_data ). " The container is created inside the parser
It does actually what it states - move only those field which are the same, example is below.
Source text file
NAME BIRTHDATE
ALEX 01.01.1990
JOHN 02.02.1995
LARA 03.03.2000
types:
begin of my_table_type,
name type char10, " The only corresponding name
_other type string, " "Birth date" is not here but "_other" is
end of my_table_type.
data lt_container type my_table_type.
zcl_text2tab_parser=>create( lt_container )->parse(
exporting
i_data = my_get_some_raw_text_data( )
i_strict = abap_false " corresponding is in fact non-strict, so must be false
i_corresponding = abap_true
importing
e_container = lt_container ).
create
accepts parameters that control expected data formatting:
- i_amount_format - a
char2
value, 1st char specifies thousand delimiter, 2nd - fractional delimiter. E.g.' ,'
- for1 234,56
and also1234,56
,',.'
would expect1,234.56
. - i_date_format - a
char4
value, where first 3 defines the order of day, month and year, and the last is the separator between them. E.g.YMD
for20180901
,DMY-
for01-09-2018
. Supported separators are./-
and empty char.
To do serialization use ZCL_TEXT2TAB_SERIALIZER
class. Flat tables and structures are supported. In case of a structure it is serialized as one-line table.
data lo_serializer type ref to zcl_text2tab_serializer.
lo_serializer = zcl_text2tab_serializer=>create(
" the below params are all optional and have defaults inside
i_decimal_sep = ','
i_date_format = 'DMY-'
i_max_frac_digits = 5 " For floats only ! not for decimals
i_use_lf = abap_true " to use LF as line-break (not CRLF)
).
data lv_string type string.
lv_string = lo_serializer->serialize( lt_some_table ).
The serialize
method also accepts i_fields_only
param - a explicit field list to serialize. You can also pass i_keep_order = abap_true
to keep serilization order according to the fields only list (by default the order is takes from the data structure).
The serializer can also create a tab-delimited string with human readable field descriptions or technical names. E.g. User Name \t Date \t Time
(c_header-descriptions) or UNAME \t DATUM \t UZEIT
(c_header-tech_names). This may be useful to prepend the description before the technical fields. To get such a string use method serialize_header
.
lv_string = lo_serializer->serialize_header(
i_header_type = lo_serializer->c_header-descriptions
i_data = lt_some_table ).
" OR
lv_string = lo_serializer->serialize_header(
i_data = lt_some_table
i_header_type = lo_serializer->c_header-descriptions
i_lang = 'D'
" i_keep_order = abap_true " to respect i_fields_only order
i_fields_only = value#( ( 'UNAME' ) ( 'DATUM' ) ) ).
The feature allows serialization to html table <table><tr>...
. In particular this form is natively supported when pasted to Excel sheet. And in particular it is possible to mark fields as text, so that no auto conversion happen to the values. E.g. "1.1" is auto converted to a date which is not intended.
lv_string = lo_serializer->as_html( i_text_fields = value#( ( 'CODE' ) ) )->serialize( lt_some_table ).
To see the impact of modes, check ZTEXT2TAB_BACKUP_EXAMPLE
- the selection screen allows control over the html/text modes. As well it allows copying the result to the clipboard for pasting to Excel after.
Since version 2.2.0 the text file can contain comment lines. A comment lines begins with one specific char, which is supplied in the factory method create
.
A sample text-file:
* comment line: expected birthdays in test class ...
NAME BIRTHDAY
JOHN 01.01.1990
Now we should call the factory method like this and the first line is interpreted as a comment:
zcl_text2tab_parser=>create( i_pattern = ls_birthday i_begin_comment = '*' ).
The char '*' must have the first position in the text line. Otherwise it isn't interpreted as a comment.
The comment (or rather the description) line can also be auto-detected with i_begin_comment = zif_text2tab=>c_auto_detect_by_space
. It tries to find spaces in the first line of the text and, if found, ignores this first line. The idea is that the descriptions will probably have spaces, while abap field names will not:
Name Date of birth <<< containes spaces
NAME BIRTHDAY
JOHN 01.01.1990
The exception class - zcx_text2tab_error
- exposes structure
, field
, line
and msg
attributes (and some others). They can be used to reformat the message text if needed. For example:
...
catch zcx_text2tab_error into lx. " Reformat to -> Import error at line LINE, field 'FIELD': MSG
l_error_msg = 'Import error'.
if lx->line is not initial.
l_error_msg = |{ l_error_msg } at line { lx->line }|.
endif.
if lx->field is not initial.
l_error_msg = |{ l_error_msg }, field '{ lx->field }'|.
endif.
l_error_msg = |{ l_error_msg }: { lx->msg }|.
raise exception type lcx_my_program_error
exporting msg = l_error_msg.
endtry.
This is supported in parser only at the moment. Serializer does not produce many error on line level.
zif_text2tab_constants
has the version
attribute. And there is a helper method check_version_fits
to check if the text2tab package has the minimal required version.
* Assuming version is 2.1.0
" Returns false, 2.1.2 is required
zcl_text2tab_parser=>check_version_fits( 'v2.1.2' ).
" Returns true, 2.1.0 > 2.0.0
zcl_text2tab_parser=>check_version_fits( 'v2.0.0' ).