File Name | Data Type | Rows | Columns |
---|---|---|---|
merged_data | Tabular Text | 40457 | 3 |
papers_final_data | Tabular Text | 36398 | 2 |
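A quick way to sanity-check the two files listed above is to load them with pandas and compare their shapes against the row and column counts in the table. The `.csv` extensions and file paths below are assumptions for illustration.

```python
# Minimal sketch: load the two tabular files and confirm their shapes match the
# table above. The .csv extensions/paths are assumed, not stated in the original.
import pandas as pd

merged = pd.read_csv("merged_data.csv")        # expected shape: (40457, 3)
papers = pd.read_csv("papers_final_data.csv")  # expected shape: (36398, 2)

print(merged.shape)
print(papers.shape)
```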
- BERT
- DistilBERT
- RoBERTa
- I froze the model with its pre-trained weights and searched for a suitable learning-rate range.
- Then I trained the model for 10 epochs using the fit_one_cycle() method.
- After that, I unfroze the trained model, selected a new learning-rate range, and trained it for another 10 epochs (see the sketch below).
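The steps above roughly map onto the following fastai-style sketch. A prepared `DataLoaders` object `dls` (texts plus one-hot labels) is assumed to already exist, and the wrapper class, parameter splitter, and learning-rate choices are illustrative assumptions rather than the exact original code.

```python
# Sketch of the two-stage schedule: freeze the pre-trained body, pick a learning-rate
# range, train 10 epochs with fit_one_cycle(), then unfreeze and repeat.
# `dls` (a fastai DataLoaders with one-hot labels) is assumed to exist already.
from fastai.text.all import *
from transformers import AutoModelForSequenceClassification

class HFClassifier(Module):
    "Wrap a Hugging Face sequence classifier so fastai sees plain logits."
    def __init__(self, name, n_labels):
        self.hf = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=n_labels, problem_type="multi_label_classification")
    def forward(self, input_ids):
        return self.hf(input_ids=input_ids).logits

def hf_splitter(m):
    "Two parameter groups: transformer body vs. classification head (BERT layout assumed)."
    return L(params(m.hf.base_model), params(m.hf.classifier))

model = HFClassifier("bert-base-uncased", n_labels=dls.c)  # or distilbert-base-uncased / roberta-base
learn = Learner(dls, model, loss_func=BCEWithLogitsLossFlat(),
                metrics=accuracy_multi, splitter=hf_splitter)

# Stage 1: body frozen, only the classification head trains for 10 epochs.
learn.freeze()
suggested = learn.lr_find()
learn.fit_one_cycle(10, lr_max=suggested.valley)

# Stage 2: unfreeze everything and fine-tune the whole network with a fresh lr range.
learn.unfreeze()
suggested = learn.lr_find()
learn.fit_one_cycle(10, lr_max=slice(suggested.valley / 10, suggested.valley))
```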
Model | Micro Precision | Micro Recall | Micro F1-Score | Weighted Precision | Weighted Recall | Weighted F1-Score |
---|---|---|---|---|---|---|
BERT | 62.211 | 45.104 | 52.294 | 60.635 | 45.104 | 50.618 |
DistilBERT | 65.810 | 40.588 | 50.209 | 63.739 | 40.588 | 48.119 |
RoBERTa | 69.113 | 20.353 | 31.446 | 59.215 | 20.353 | 24.646 |
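For reference, micro- and weighted-average scores of this kind can be computed with scikit-learn on the binarized label matrices. The `y_true`/`y_pred` arrays below are toy values for illustration, not the project's actual predictions.

```python
# How micro- and weighted-average precision/recall/F1 are computed for multi-label
# predictions. The arrays here are small toy examples.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 1], [1, 0, 0]])

for avg in ("micro", "weighted"):
    p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg}: precision={p*100:.3f}  recall={r*100:.3f}  f1={f1*100:.3f}")
```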
Model | Size (MB) | Micro F1-Score |
---|---|---|
BERT | 838.8 | 52.2939 |
Compressed BERT | 105.3 | 50.8322 |
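The table above does not restate which compression technique produced these numbers, so the sketch below is only an illustration of one common option: PyTorch dynamic int8 quantization of the linear layers, with a helper that compares serialized model sizes. The model name, label count, and temporary path are assumptions.

```python
# Illustrative only: one common way to shrink a fine-tuned BERT is dynamic int8
# quantization of the Linear layers. This is not necessarily the method used for the
# numbers in the table above.
import os
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # placeholder label count
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

def size_mb(m, path="tmp.pt"):
    "Serialize the state dict and report the file size in MB."
    torch.save(m.state_dict(), path)
    mb = os.path.getsize(path) / 1e6
    os.remove(path)
    return mb

print(f"original:  {size_mb(model):.1f} MB")
print(f"quantized: {size_mb(quantized):.1f} MB")
```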
I prepared a short video demonstration and shared it as a LinkedIn post. Check it out here.
- After a scraper script had been running for a long time, the running Chrome window sometimes showed an "Aw, Snap!" message. In that case, I just reloaded the webpage manually and the scraper kept working as before (a retry sketch follows this list).
- The required web elements were not laid out the same way on every webpage. The scraper collected details fine for most pages but raised exceptions for the differently structured ones, so I had to rewrite parts of the code to handle those layouts and generalize it.
- Since I had to collect a lot of data, I created several copies of the same scraper and ran them simultaneously, each starting from a different index. This sped up data collection somewhat, although it still depended heavily on internet speed.
- Some abstracts contained values like "Retracted.", "Final version", "IEEE Plagiarism Policy." and other unusable values, so I went through the whole dataset and found these values manually during data cleaning.
- In the end, it took a long time to collect a desirable amount of data, so I had to wait patiently.
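As a rough illustration of the reload-and-retry handling mentioned above, the sketch below wraps a Selenium page fetch so that an "Aw, Snap!" crash or a page with a different layout triggers a refresh instead of killing the run. The CSS selector, retry count, and helper name are assumptions, not the original scraper code.

```python
# Sketch: reload-and-retry around a Selenium fetch, assuming Chrome and an
# illustrative CSS selector for the abstract text.
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, WebDriverException
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

def scrape_abstract(url, retries=3):
    "Load a paper page and pull the abstract, reloading on crashes or missing elements."
    driver.get(url)
    for attempt in range(retries):
        try:
            return driver.find_element(By.CSS_SELECTOR, "div.abstract-text").text
        except (NoSuchElementException, WebDriverException):
            # Page crashed ("Aw, Snap!") or uses a different layout: wait, reload, retry.
            time.sleep(5)
            driver.refresh()
    return None  # give up after `retries` attempts so one bad page doesn't stall the run
```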