
[Bug] [microTVM] Truncated names of fused operators exceed CRT graph executor buffer lengths #8953

@PhilippvK

Description

I realized that two models (the int8-quantized vww and aww models of the MLPerf Tiny benchmark), which worked fine with the graph executor a few months ago, stopped working with the latest TVM version.

Expected behavior

Running the built models (on x86 for simplicity) via the CRT graph runtime should not result in any error messages, segmentation faults, or other crashes.

Actual behavior

The following is printed to the terminal during the initialization of the graph executor (i.e. while parsing the JSON graph):

Error: string size greater than buffer size (120).
Error at line 331, JSON object expect '}' or ',' but got '"'
Error at line 331, Expect '"' but reach end of line
Error at line 331, Expect ':' but get '"'
do not support key  
invalid format
failed to load an element in `nodes` field in graph executor node.
invalid format
Segmentation fault (core dumped)

The actual error is on the first line; the others just seem to be consequences of that initial problem.

Investigations

The big question to me was: why does this only happen for these two models? Since the error message is about string sizes, I figured out that this is related to the char func_name[120]; array in the struct TVMOpParam (see include/tvm/runtime/crt/graph_executor.h).

Looking at the graph.json file, I found that there are in fact function names slightly longer than 120 characters. I was also able to get rid of the error temporarily by limiting the maximum number of fused ops (via relay.FuseOps.max_depth), which results in shorter function names.
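
For reference, this workaround looks roughly as follows. This is only a minimal sketch on a toy Relay module (the real models are imported via relay.frontend.from_tflite), and the max_depth value of 5 is just an assumed example:

    import tvm
    from tvm import relay

    # Toy stand-in for the quantized MLPerf Tiny models (which are imported
    # via relay.frontend.from_tflite in the real reproduction script).
    x = relay.var("x", shape=(1, 8), dtype="float32")
    y = relay.nn.relu(relay.add(x, relay.const(1.0, "float32")))
    mod = tvm.IRModule.from_expr(relay.Function([x], y))

    # Capping the fusion depth keeps the generated "fused_..." names short,
    # so they are never truncated/hashed and fit into the 120-byte buffer.
    with tvm.transform.PassContext(opt_level=3,
                                   config={"relay.FuseOps.max_depth": 5}):
        lib = relay.build(mod, target="c", mod_name="default")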

Here is the code used by TVM to generate unique truncated function names by appending a hash:

    readable_name_stream_ << "fused";
    auto outputs = this->VisitExpr(prim_func->body);
    auto candidate_name = readable_name_stream_.str();
    constexpr static size_t kMaxFuncNameLength = 80;
    if (candidate_name.size() > kMaxFuncNameLength) {
      std::stringstream truncated_name;
      truncated_name << candidate_name.substr(0, kMaxFuncNameLength);
      truncated_name << "_" << std::hash<std::string>{}(candidate_name) << "_";
      candidate_name = truncated_name.str();
    }
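
To make the length budget easier to follow, here is a hypothetical Python restatement of that truncation (not actual TVM code). The tvmgen_<mod_name>_ prefix and the trailing _<N> de-duplication suffix are attached in separate codegen steps and are only approximated here:

    # Hypothetical restatement of the C++ snippet above, only to illustrate
    # how the pieces of the final symbol name add up.
    K_MAX_FUNC_NAME_LENGTH = 80

    def truncate(candidate_name: str) -> str:
        if len(candidate_name) <= K_MAX_FUNC_NAME_LENGTH:
            return candidate_name
        # std::hash<std::string>{} returns a size_t, i.e. up to 20 decimal
        # digits on a 64-bit host (19 digits in the example entry below).
        digest = hash(candidate_name) & (2**64 - 1)
        return f"{candidate_name[:K_MAX_FUNC_NAME_LENGTH]}_{digest}_"

    # Op list chosen to mirror the example entry below; the last op ("sum")
    # is only a guess for the token that gets cut off at "..._cast_s".
    ops = ["nn_conv2d", "add", "cast", "multiply", "add", "right_shift",
           "cast", "add", "clip", "cast", "clip", "cast", "sum"]
    candidate = "fused_" + "_".join(ops)
    symbol = "tvmgen_" + "default" + "_" + truncate(candidate) + "_2"
    print(len(symbol))  # typically 118-119 characters already; the NUL
                        # terminator and a longer mod_name push this past
                        # the 120-byte func_name buffer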

Then I did the following calculation based on this (example) JSON func_name entry: tvmgen_default_fused_nn_conv2d_add_cast_multiply_add_right_shift_cast_add_clip_cast_clip_cast_s_9959535092109263429__2

120 (max allowed function length)
- 80 (truncated original function name)
- 19 (length of appended hash)
- 5 ('fused' prefix)
- 4 (total number of '_' characters)
=  12 (remaining number of characters for the model name INCLUDING the 'tvmgen_' prefix which requires another 7 characters)

This leads to the conclusion that the mod_name passed to relay.build() may be at most 5 characters long, which is suboptimal, especially considering that the default mod_name already seems to be 'default' (resulting in the tvmgen_default prefix).

I suppose that the easiest possible fixes would be either

  • increasing the size of the aforementioned func_name[] array by a few characters
  • or reducing kMaxFuncNameLength by some characters.

In addition, it might be good to limit the length of the model name that can be specified by the user, to make sure this does not break again in the future for model names that are too long.

Why does this only happen for a few models? Most operator/function names after the FuseOps transformation do not exceed kMaxFuncNameLength and therefore do not need to be truncated.
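
As a quick check of which functions are affected, the func_name entries in the generated graph.json can be scanned directly. This is a small helper script I am assuming for illustration (not part of TVM); the path and the threshold follow the 120-byte buffer discussed above:

    import json

    # Print every func_name in the graph JSON that is at risk of overflowing
    # the CRT's 120-byte buffer (one byte is needed for the terminating NUL).
    with open("graph.json") as f:  # path to the graph produced by relay.build
        graph = json.load(f)

    for node in graph["nodes"]:
        name = node.get("attrs", {}).get("func_name", "")
        if len(name) >= 119:
            print(f"{len(name)} chars: {name}")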

Environment

  • OS: Ubuntu 20.04
  • Architecture: x86_64
  • Python version: 3.8.10
  • TVM version: based on commit 9b034d7 from yesterday
  • Target: MicroTVM (Standalone) using CRT graph executor/runtime (apps/bundle_deploy/bundle_static.c)
  • Script: See next section!

Steps to reproduce

Here is a branch I set up to reproduce this issue using the aforementioned TFLite models:

Step-by-step instructions for reproducing the issue can be found here.

There is also a GitHub Action which runs the demonstration of the bug. The log output can be investigated here: https://github.com/PhilippvK/tvm/runs/3531514621?check_suite_focus=true
