Skip to content
This repository was archived by the owner on Jan 28, 2021. It is now read-only.
This repository was archived by the owner on Jan 28, 2021. It is now read-only.

Support for regexp_matches #756

Closed
@eiso

Description

@eiso

Currently when there isn't a parser available for a file type, it would be very useful to be able to do substring extractions based on regular expressions.

The use case I was trying was to build a table of base images use in Dockerfiles. In Postgres I would use the following function for this.

regexp_matches(string text,pattern text [, flags text]) | setof text[] | Return all captured substrings resulting from matching a POSIX regular expression against the string. See Section 9.7.3 for more information. | regexp_matches('foobarbequebaz', '(bar)(beque)') | {bar,beque}

source: https://www.postgresql.org/docs/9.1/functions-string.html

In gitbase I can imagine this working like this:

SELECT
    r.repository_id AS repo,
    c.committer_when AS date,
    file_path AS path,
    EXPLODE(REGEXP_MATCHES( f.blob_content,'FROM ([^\s]+)')) AS base_images
FROM
    refs AS r
    NATURAL JOIN commits AS c
    NATURAL JOIN commit_files AS cm
    NATURAL JOIN files AS f
WHERE r.ref_name = 'HEAD'
    AND f.file_path REGEXP('Dockerfile')
    AND NOT IS_BINARY(f.blob_content)
    AND f.blob_size < 1000000
    AND f.file_path NOT REGEXP 'vendor.*'
    AND f.blob_content REGEXP 'FROM.*'

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions