Skip to content

Incorrect identification for JS content starting with comments #86

Closed
@tzabaman

Description

@tzabaman

I am opening this issue for discussion due to some inconsistent behaviour, coming from ActiveStorage #identify method...

There is an inconsistent behaviour in method Marcel::MimeType.for concerning (at least) javascript files starting with comments and given specific declared types (text/javascript or application/x-javascript). For example:

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/javascript')
# => application/javascript --> Correct

Marcel::MimeType.for("console.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/javascript --> Correct

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'text/javascript')
# => text/plain --> Inconsistent (Incorrect?)

Marcel::MimeType.for("//Comment\nconsole.log('test');", name: 'sample.js', declared_type: 'application/x-javascript')
# => text/plain --> Inconsistent (Incorrect?)

Of course, the explanation is that on (generated) file lib/marcel/tables.rb there are definitions only for type application/javascript so the above behaviour is somehow expected.

The thing is that on file data/tika.xml, we have reference for types text/javascript and application/x-javascript, as they are described as "alias" of application/javascript.

marcel/data/tika.xml

Lines 335 to 340 in 8e28563

<mime-type type="application/javascript">
<alias type="application/x-javascript"/>
<alias type="text/javascript"/>
<sub-class-of type="text/plain"/>
<_comment>JavaScript Source Code</_comment>
<glob pattern="*.js"/>

So my question is, should we change the script/generate_tables.rb script in order to also take into consideration the aliases, when adding values on constants TYPE_EXTS and TYPE_PARENTS?? Something like:

TYPE_EXTS = {
  ...
  'application/javascript' => %w(js),
  'application/x-javascript' => %w(js),
  'text/javascript' => %w(js),
  ...
}

TYPE_PARENTS = {
    ...
    'application/javascript' => %w(text/plain),
    'application/x-javascript' => %w(text/plain),
    'text/javascript' => %w(text/plain),
    ...
}

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions