Add the new feature of customized initial population #1352

Open
wants to merge 1 commit into
base: master

t-harden

What does this PR do?

Adds a feature that lets users specify a customized initial pipeline population for TPOT.

Where should the reviewer start?

tpot/tests/test_custom_iniPop.py and tpot/tpot/base.py

How should this PR be tested?

The test code is at tpot/tests/test_custom_iniPop.py:

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

individual_str1 = 'MultinomialNB(input_matrix, MultinomialNB__alpha=0.1, MultinomialNB__fit_prior=True)'
individual_str2 = 'GaussianNB(DecisionTreeClassifier(input_matrix, DecisionTreeClassifier__criterion=entropy, DecisionTreeClassifier__max_depth=4, DecisionTreeClassifier__min_samples_leaf=17, DecisionTreeClassifier__min_samples_split=13))'
individual_str3 = 'GaussianNB(SelectFwe(CombineDFs(input_matrix, ZeroCount(input_matrix))))'

est = TPOTClassifier(generations=3, population_size=5, verbosity=2, random_state=42, config_dict=None,
                     customized_initial_population=[individual_str1, individual_str2, individual_str3])
est.fit(X_train, y_train)
print(est.score(X_test, y_test))

Run the test with:

cd tpot
nosetests tests/test_custom_iniPop.py -s

Any background context you want to provide?

With this change, users can supply a well-defined initial pipeline population themselves, with each pipeline written as a string. Seeding the search this way may improve the algorithm's performance and reduce evolution time.

Several Tips:

1. The pipeline strings can be obtained in two ways:

  • Start from the examples in test_custom_iniPop.py and modify them according to TPOT's config_dict.
  • Take the keys of self.evaluated_individuals_ from a previous TPOT run. This is particularly useful for building initial pipelines that give the evolution a better starting point.
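The second approach can be sketched as follows. This is a minimal illustration rather than part of the PR: the helper name top_pipeline_strings is hypothetical, and it assumes each value in evaluated_individuals_ is a dict carrying an internal_cv_score entry, with -inf taken to mark a failed evaluation. The sample dict stands in for a real run and its scores are invented.

```python
def top_pipeline_strings(evaluated_individuals, k=3):
    """Return the k pipeline strings with the highest recorded CV score.

    Hypothetical helper: assumes each value is a dict with an
    'internal_cv_score' entry, as in TPOT's evaluated_individuals_.
    """
    scored = [
        (stats["internal_cv_score"], pipeline_str)
        for pipeline_str, stats in evaluated_individuals.items()
        # Skip entries with a missing, NaN, or -inf score (assumed to be failures).
        if stats.get("internal_cv_score", float("-inf")) > float("-inf")
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pipeline_str for _, pipeline_str in scored[:k]]


# Stand-in for est.evaluated_individuals_ from an earlier run (scores invented).
previous_run = {
    "GaussianNB(input_matrix)": {"internal_cv_score": 0.81},
    "MultinomialNB(input_matrix, MultinomialNB__alpha=0.1, MultinomialNB__fit_prior=True)": {
        "internal_cv_score": 0.90
    },
    "BernoulliNB(input_matrix, BernoulliNB__alpha=1.0, BernoulliNB__fit_prior=False)": {
        "internal_cv_score": float("-inf")
    },
}

seeds = top_pipeline_strings(previous_run, k=2)
```

The resulting list could then be passed as customized_initial_population, provided the operators it uses are still covered by the config_dict of the new run.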

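2. The number of customized initial pipelines must not exceed population_size. If fewer are given, the remaining slots are filled with randomly generated individuals; if more are given, an exception is raised: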

# Check that #customized initial pipelines <= #population
if len(iniPop) <= self.population_size:
    for _ in range(self.population_size - len(iniPop)):
        individual_rand = self._toolbox.individual()
        iniPop.append(individual_rand)
    print(len(customized_initial_population), "customized pipelines +", self.population_size - len(customized_initial_population), "randomized pipelines as initial population.")
else:
    raise Exception("the number of customized initial pipelines > the number of population size!")
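The same padding rule can be tried outside of TPOT with a small standalone sketch. The function name build_initial_population and the make_random_individual callback are hypothetical stand-ins (the callback plays the role of self._toolbox.individual), and ValueError is used here only as an illustrative choice of exception type.

```python
def build_initial_population(custom_individuals, population_size, make_random_individual):
    """Pad user-supplied individuals with random ones up to population_size.

    Hypothetical standalone version of the rule above; returns the population
    and the number of randomly generated individuals that were added.
    """
    if len(custom_individuals) > population_size:
        raise ValueError(
            "number of customized initial pipelines exceeds population_size"
        )
    population = list(custom_individuals)  # copy so the caller's list is untouched
    n_random = population_size - len(population)
    population.extend(make_random_individual() for _ in range(n_random))
    return population, n_random


# Usage: three custom pipelines in a population of five leaves two random slots.
custom = ["pipeline_a", "pipeline_b", "pipeline_c"]
population, n_random = build_initial_population(custom, 5, lambda: "random_pipeline")
print(len(custom), "customized pipelines +", n_random, "randomized pipelines as initial population.")
```

Copying the input list keeps the count of customized pipelines stable for reporting, which avoids relying on two separately named lists as in the fragment above.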

3. In this version, the configurations (i.e., operators and parameters) used in the customized initial pipelines must be a subset of those specified by the config_dict parameter. This limitation can be noted in the documentation, or addressed in a follow-up change if you agree with this PR.
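Until that is addressed, a pre-check along these lines could catch mismatches before fitting. This is only a sketch and not part of the PR: unknown_operators is a hypothetical helper, it checks operator names only (not parameter names or values), it assumes config_dict keys are dotted import paths such as sklearn.naive_bayes.GaussianNB, and it assumes CombineDFs is always available as a built-in TPOT operator. The regular expression would also misfire on parameter values that contain parentheses.

```python
import re

# Assumption: operators TPOT provides regardless of config_dict.
BUILT_IN_OPERATORS = {"CombineDFs"}


def unknown_operators(pipeline_str, config_dict):
    """Return operator names used in pipeline_str that config_dict does not cover.

    Hypothetical helper: compares the last component of each config_dict key
    with every identifier that is directly followed by '(' in the string.
    """
    allowed = {key.rsplit(".", 1)[-1] for key in config_dict} | BUILT_IN_OPERATORS
    used = set(re.findall(r"([A-Za-z_]\w*)\(", pipeline_str))
    return sorted(used - allowed)


# A deliberately small config for illustration; not one of TPOT's shipped configs.
small_config = {
    "sklearn.naive_bayes.GaussianNB": {},
    "sklearn.tree.DecisionTreeClassifier": {"criterion": ["gini", "entropy"]},
}

pipeline = "GaussianNB(SelectFwe(CombineDFs(input_matrix, ZeroCount(input_matrix))))"
missing = unknown_operators(pipeline, small_config)
print(missing)  # ['SelectFwe', 'ZeroCount']
```

An empty result means every operator in the string is covered by the given config; a non-empty result names the operators that would need to be added to config_dict or removed from the pipeline.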

What are the relevant issues?

#1321

Questions:

  • Do the docs need to be updated?
    Yes
  • Does this PR add new (Python) dependencies?
    No

@t-harden t-harden marked this pull request as draft June 29, 2024 11:44
@t-harden t-harden changed the base branch from development to master June 29, 2024 11:45
@t-harden t-harden marked this pull request as ready for review June 29, 2024 11:47