Problems running the start baseline #5

@Aoidayo

Description

Hello Lin Yan,

Recently, I started researching UniTE and encountered several issues while trying to run the start module:

  1. It seems that the start module cannot use gatEncoding. The road network data required by GAT does not appear to be included in the h5 file.
  2. Additionally, the gat_num_features, gat_num_heads, and node_fea_dim parameters passed to the bertEncoder function in the start model are never actually used; they all keep their default values.
  3. Furthermore, in the Bert class, the line self.device = self.config.get('device', torch.device('cpu')) appears twice. It feels like there are several minor bugs in start.py (a sketch of the fix I mean follows the YAML below).
  4. While writing a YAML configuration to test the Chengdu dataset, I encountered numerous errors. For instance, num_roads seems to be hardcoded as 2505 in many parts of the code (I'm not entirely sure, as I only started reading your code this week; see the second sketch below).
  5. I want to figure out how to use MultiTrainer with start. I have tried to run data.py with mlm-18, trim-0.15, and shift-0.15; my test YAML is below, but I don't know how to fix the problems it raises.
  6. At the same time, about the mlm meta in data.py: I initially thought the sample rate was a proportion, so I set it to 0.3, and as a result every point was masked with 2505. After reviewing the code, I realized the value is actually in seconds (see the third sketch below). Haha!
- repeat: 1
  data:
    name: chengdu
    meta:
      - type:
          - trip
          - mlm-18
          - trim-0.15
          - shift-0.15
  models:
    - name: bert
      config:
        # 1. Required parameters
        d_model: 256
        dis_feats: [1] # road
        num_embeds: [2505]
        # road_prop
        con_feats: [2]
        # 2. Parameters required by bert
        hidden_size: 256
        num_layers: 6
        num_heads: 8
        output_size: 256
        road_feat: [1] # road
        token_feat: [8] # data type: the token column added by mlm
        add_gat: False
        # GAT-related config; may not need to be filled in
        # gat_num_features: unused
        # gat_num_heads: unused
        gat_dropout: 0.1
        # 3. data_feature
        # vocab_size: 2505 # num_roads
        # node_fea_dim is actually not used
        # node_fea_dim: 9 # with token
      preprocessor:
        name: pass
  pretrain:
    load: False
    loss:
      # Reconstruction + Contrastive
      - name: mlm
        config:
          out_dis:
            feats: [1]
            num_embeds: [2505]
          out_con_feats: [2]
          latent_size: 256
          con_weight: 1.0
          dis_weight: 1.0
      - name: simclr
        config:
          embed_dim: 256
          similarity: inner
          temperature: 0.05
    trainer:
      name: multiple
      config:
        # basic config
        num_epoch: 30
        batch_size: 64
        lr: 2.0e-4
        # MultiTrainer's config
        meta_types: [trip, mlm-18, trim-0.15, shift-0.15]
        loss_coef: [0, 1] # unused? I don't know its purpose
        contra_meta_i: [2, 3] # trim, shift
        gen_enc_meta_i: [1] # mlm-18
        gen_rec_meta_i: [0] # trip
  downstream:
    # Destination prediction task
    - task: destination
      # Use first model (encoder) for prediction
      select_models: [ 0 ]
      # Use test set for evaluation
      eval_set: 2
      config:
        # Number of points to use for prediction
        pre_length: 1
        # Whether to fine-tune pre-trained model
        finetune: true
        num_epoch: 20
        batch_size: 64
        save_prediction: false
        lr: 2.0e-4
        # Early stopping patience
        es_epoch: 10
        meta_types:
          - trip
        # Meta feature indices for encoder and labels
        enc_meta_i: [ 0 ]
        label_meta_i: [ 0 ]
