Documentation, code, and data for the study "Classifying Positive Results in Clinical Psychology Using Natural Language Processing" by Louis Schiekiera, Jonathan Diederichs & Helen Niemeyer. The preprint for this study is available on PsyArXiv.
The best-performing model, SciBERT, was deployed under the name 'NegativeResultDetector' on HuggingFace. It can be used directly via a graphical user interface for evaluating single abstracts, or for larger-scale inference by downloading the model files from HuggingFace and using this script from the GitHub repository.
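For batch inference, something like the following minimal sketch with the HuggingFace `transformers` pipeline should work. The repository id `ClinicalMetaScience/NegativeResultDetector` and the printed label names are assumptions; check the HuggingFace model page for the exact identifiers.

```python
# Minimal sketch of batch inference with the deployed model.
# The repository id below is an assumption; verify it on the HuggingFace page.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ClinicalMetaScience/NegativeResultDetector",  # assumed repo id
)

abstracts = [
    "The intervention significantly reduced symptoms relative to the control group.",
    "No significant group differences were observed on the primary outcome.",
]

# truncation=True guards against abstracts longer than the model's input limit
for abstract, prediction in zip(abstracts, classifier(abstracts, truncation=True)):
    print(f"{prediction['label']} ({prediction['score']:.3f}): {abstract[:60]}...")
```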
Background: This study addresses the gap in machine learning tools for the classification of positive results by evaluating the performance of SciBERT, a transformer model pretrained on scientific text, and a random forest classifier on clinical psychology abstracts.
Methods: Over 1,900 abstracts were annotated into two categories: ‘positive results only’ and ‘mixed or negative results’. Model performance was evaluated on three benchmarks. The best-performing model was utilized to analyze trends in over 20,000 psychotherapy study abstracts.
Results: SciBERT outperformed all benchmarks and random forest on both in-domain and out-of-domain data. The trend analysis revealed non-significant effects of publication year on positive results from 1990 to 2005, but a significant decrease in positive results from 2005 to 2022. When examining the entire time span, significant positive linear and negative quadratic effects were observed.
Discussion: Machine learning could support future efforts to understand patterns of positive results in large data sets. The fine-tuned SciBERT model was deployed for public use.
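As an illustration of the trend analysis described above, a hypothetical sketch of a logistic regression with linear and quadratic publication-year terms is shown below. The file and column names (`predictions.csv`, `year`, `positive_only`) are assumptions, not the study's actual pipeline.

```python
# Hypothetical sketch of the trend analysis: a logistic regression of the
# predicted label (1 = positive results only) on linear and quadratic
# publication-year terms. File and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("predictions.csv")            # assumed columns: "year", "positive_only"
df["year_c"] = df["year"] - df["year"].mean()  # center year to reduce collinearity

model = smf.logit("positive_only ~ year_c + I(year_c ** 2)", data=df).fit()
print(model.summary())                         # inspect linear and quadratic terms
```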
Table 1
Metric scores for model evaluation on test data from the annotated MAIN corpus, consisting of n = 198 abstracts authored by researchers affiliated with German clinical psychology departments and published between 2012 and 2022
| Model | Accuracy | Mixed & Negative Results: F1 | Mixed & Negative Results: Recall | Mixed & Negative Results: Precision | Positive Results Only: F1 | Positive Results Only: Recall | Positive Results Only: Precision |
|---|---|---|---|---|---|---|---|
| SciBERT | 0.864 | 0.867 | 0.907 | 0.830 | 0.860 | 0.822 | 0.902 |
| Random Forest | 0.803 | 0.810 | 0.856 | 0.769 | 0.796 | 0.752 | 0.844 |
| Extracted p-values | 0.515 | 0.495 | 0.485 | 0.505 | 0.534 | 0.545 | 0.524 |
| Extracted NL Indicators | 0.530 | 0.497 | 0.474 | 0.523 | 0.559 | 0.584 | 0.536 |
| Number of Words | 0.475 | 0.441 | 0.423 | 0.461 | 0.505 | 0.525 | 0.486 |
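For reference, the per-class scores reported in Table 1 correspond to standard classification metrics; the following toy sketch shows how they can be computed with scikit-learn (the label vectors are placeholders, not the study's data).

```python
# Toy sketch of how the metrics in Table 1 can be computed with scikit-learn.
# The label vectors below are placeholders, not the study's data.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# 0 = mixed or negative results, 1 = positive results only
y_true = [0, 1, 1, 0, 1, 1, 0, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1])

print(f"Accuracy: {accuracy:.3f}")
for idx, name in enumerate(["Mixed & Negative Results", "Positive Results Only"]):
    print(f"{name}: F1={f1[idx]:.3f}, Recall={recall[idx]:.3f}, Precision={precision[idx]:.3f}")
```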
Figure 1
Comparison of model performance across in-domain and out-of-domain data. Colored bars represent different model types. Samples: MAIN test, n = 198 abstracts; VAL1, n = 150 abstracts; VAL2, n = 150 abstracts.

This study was conducted as part of the PANNE Project (German acronym for “publication bias analysis of non-publication and non-reception of results in a disciplinary comparison”) at Freie Universität Berlin and was funded by the Berlin University Alliance.
If you use the data or the code, please cite the paper as follows:
Schiekiera, L., Diederichs, J., & Niemeyer, H. (2024). Classifying positive results in clinical psychology using natural language processing [Preprint]. PsyArXiv.