This repository was archived by the owner on Jan 28, 2021. It is now read-only.
This repository was archived by the owner on Jan 28, 2021. It is now read-only.
Support for regexp_matches #756
Closed
Description
Currently when there isn't a parser available for a file type, it would be very useful to be able to do substring extractions based on regular expressions.
The use case I was trying was to build a table of base images use in Dockerfiles. In Postgres I would use the following function for this.
regexp_matches(string text,pattern text [, flags text]) | setof text[] | Return all captured substrings resulting from matching a POSIX regular expression against the string. See Section 9.7.3 for more information. | regexp_matches('foobarbequebaz', '(bar)(beque)') | {bar,beque}
source: https://www.postgresql.org/docs/9.1/functions-string.html
In gitbase I can imagine this working like this:
SELECT
r.repository_id AS repo,
c.committer_when AS date,
file_path AS path,
EXPLODE(REGEXP_MATCHES( f.blob_content,'FROM ([^\s]+)')) AS base_images
FROM
refs AS r
NATURAL JOIN commits AS c
NATURAL JOIN commit_files AS cm
NATURAL JOIN files AS f
WHERE r.ref_name = 'HEAD'
AND f.file_path REGEXP('Dockerfile')
AND NOT IS_BINARY(f.blob_content)
AND f.blob_size < 1000000
AND f.file_path NOT REGEXP 'vendor.*'
AND f.blob_content REGEXP 'FROM.*'