Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Pathlib everywhere #23

Open
rusheb opened this issue Feb 20, 2023 · 5 comments
Open

Use Pathlib everywhere #23

rusheb opened this issue Feb 20, 2023 · 5 comments
Labels
code-quality code quality improvement good first issue Good for newcomers

Comments

@rusheb
Copy link
Collaborator

rusheb commented Feb 20, 2023

Let's use pathlib everywhere and stop passing strings around.

This could also be a good opportunity to add more unit tests.

@rusheb rusheb added the good first issue Good for newcomers label Mar 2, 2023
@valedan
Copy link
Contributor

valedan commented Mar 6, 2023

@rusheb Can you clarify where these changes are needed?

@rusheb
Copy link
Collaborator Author

rusheb commented Mar 7, 2023

I just did a quick search for the regex path.*: str. I doubt this is exhaustive, but probably a good start:

10 results - 6 files

maze_transformer/evaluation/eval_model.py:
   45  def load_model_with_configs(
   46:     model_path: str, data_cfg_class: type, verbose: bool = False
   47  ) -> LoadedModelConfigs:

  179      # split the tokens into maze (prompt) and path
  180:     path_start_token: str = SPECIAL_TOKENS["start_path"]
  181      path_start_idx: int = tokens.index(path_start_token) + 1

  242  def generate_plot_predicted_path(
  243:     model_path: str,
  244      n_tokens_pred: int = 5,

maze_transformer/evaluation/plot_loss.py:
  12  def plot_loss(
  13:     log_path: str,
  14      window_sizes: int | Iterable[int] | str = (10, 50, 100),

maze_transformer/training/mazedataset.py:
  300          self,
  301:         path_base: str = "data/test-001",
  302          do_config: bool = True,

  345          cls,
  346:         path_base: str,
  347          do_config: bool = False,

maze_transformer/training/training.py:
  209      # ==================================================
  210:     final_model_path: str = output_dir / TRAIN_SAVE_FILES.model_final
  211      logger.saving(f"saving final model to {final_model_path.as_posix()}", 10)

scripts/create_dataset.py:
   36  def create_dataset(
   37:     path_base: str,
   38      n_mazes: int,

  103  
  104: def load(path: str) -> None:
  105      d = MazeDataset.disk_load(path, do_tokenized=True)

scripts/train_model.py:
  18  
  19: def train_model(basepath: str, cfg_name: str = "tiny-v1"):
  20      dataset = MazeDataset.disk_load(basepath, do_config=True, do_tokenized=True)

@valedan
Copy link
Contributor

valedan commented Mar 7, 2023

Ah okay great, this is a good starting point! I guess part of this ticket is finding a good way to locate places where we're doing things the bad way.

@rusheb
Copy link
Collaborator Author

rusheb commented Mar 7, 2023

In my opinion if we replaced all of the above (and added unit tests for all those functions?) then that would be sufficient to close this ticket. Any stragglers could be picked up on an ad-hoc bases.

@valedan
Copy link
Contributor

valedan commented Mar 22, 2023

Nice article about some of the benefits of using Pathlib: https://treyhunner.com/2018/12/why-you-should-be-using-pathlib/

@mivanit mivanit added the code-quality code quality improvement label Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code-quality code quality improvement good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants