Problems running the start baseline #5

@Aoidayo

Description

Hello Lin Yan,

Recently, I started researching UniTE and encountered several issues while trying to run the start module:

  1. It seems that the start module cannot use gatEncoding. The road network data required by GAT does not appear to be included in the h5 file.
  2. Additionally, the gat_num_features, gat_num_heads, and node_fea_dim parameters passed to the bertEncoder function in the start model are never actually used; they all keep their default values.
  3. Furthermore, in the Bert class, the line self.device = self.config.get('device', torch.device('cpu')) appears twice. It feels like there are several minor bugs in start.py (a sketch of the fix I mean follows the YAML below).
  4. While writing a YAML configuration to test the Chengdu dataset, I encountered numerous errors. For instance, num_roads seems to be hardcoded as 2505 in many parts of the code (I'm not entirely sure, as I only started reading your code this week; see the second sketch below).
  5. I want to figure out how to use MultiTrainer with start. I have tried to run data.py with mlm-18, trim-0.15, and shift-0.15; my test YAML is below, but I don't know how to fix the problems it raises.
  6. At the same time, about the mlm meta in data.py: I initially thought the sample rate was a proportion, so I set it to 0.3, and as a result every point was masked with 2505. After reviewing the code, I realized the value is actually in seconds (see the third sketch below). Haha!
- repeat: 1
  data:
    name: chengdu
    meta:
      - type:
          - trip
          - mlm-18
          - trim-0.15
          - shift-0.15
  models:
    - name: bert
      config:
        # 1. Required parameters
        d_model: 256
        dis_feats: [1] # road
        num_embeds: [2505]
        # road_prop
        con_feats: [2]
        # 2. Parameters required by bert
        hidden_size: 256
        num_layers: 6
        num_heads: 8
        output_size: 256
        road_feat: [1] # road
        token_feat: [8] # data type: the token column added by mlm
        add_gat: False
        # GAT-related config; may not need to be filled in
        # gat_num_features: unused
        # gat_num_heads: unused
        gat_dropout: 0.1
        # 3. data_feature
        # vocab_size: 2505 # num_roads
        # node_fea_dim is actually not used
        # node_fea_dim: 9 # with token
      preprocessor:
        name: pass
  pretrain:
    load: False
    loss:
      # Reconstruction + Contrastive
      - name: mlm
        config:
          out_dis:
            feats: [1]
            num_embeds: [2505]
          out_con_feats: [2]
          latent_size: 256
          con_weight: 1.0
          dis_weight: 1.0
      - name: simclr
        config:
          embed_dim: 256
          similarity: inner
          temperature: 0.05
    trainer:
      name: multiple
      config:
        # basic config
        num_epoch: 30
        batch_size: 64
        lr: 2.0e-4
        # MultiTrainer's config
        meta_types: [trip, mlm-18, trim-0.15, shift-0.15]
        loss_coef: [0, 1] # unused? I don't know its purpose
        contra_meta_i: [2, 3] # trim, shift
        gen_enc_meta_i: [1] # mlm-18
        gen_rec_meta_i: [0] # trip
  downstream:
    # Destination prediction task
    - task: destination
      # Use first model (encoder) for prediction
      select_models: [ 0 ]
      # Use test set for evaluation
      eval_set: 2
      config:
        # Number of points to use for prediction
        pre_length: 1
        # Whether to fine-tune pre-trained model
        finetune: true
        num_epoch: 20
        batch_size: 64
        save_prediction: false
        lr: 2.0e-4
        # Early stopping patience
        es_epoch: 10
        meta_types:
          - trip
        # Meta feature indices for encoder and labels
        enc_meta_i: [ 0 ]
        label_meta_i: [ 0 ]
