A comprehensive R-based analysis of text data comparing immersive and maladaptive daydreaming using topic modeling, sentiment analysis, and embedding-based approaches. This project processes Reddit posts from two communities and applies advanced NLP techniques to extract insights about cognitive and emotional patterns.
This study analyzes written descriptions from two Reddit communities:
- r/MaladaptiveDreaming - Maladaptive Dreaming (MD)
- r/ImmersiveDaydreaming - Immersive Daydreaming (ID)
The analysis pipeline includes text preprocessing, parallel embedding generation, topic modeling (LDA), sentiment/valence analysis, and visualization of findings across multiple dimensions.
## Project Structure

- `imports.r`: Initializes the R environment with all required packages and memory settings. Source this file first to ensure all dependencies are loaded.
- `preprocessing.r`: Modular text-cleaning functions using spaCy (via `spacyr`). Implements tokenization, lemmatization, part-of-speech filtering, and text normalization. Functions include `initialize_spacy()` and `clean_text_spacy()`.
- `data_descriptives.r`: Generates descriptive statistics, including valence t-tests, word frequencies, and basic dataset characterizations.
- `parallel_predictions.r`: Efficiently processes large text datasets using batch processing and parallel computing. Generates word embeddings and valence predictions across multiple CPU cores, splitting the data into configurable batches and exporting results to `.rds` files.
- `topicPlots.r`: Implements Latent Dirichlet Allocation (LDA) topic modeling with visualization. Produces topic distributions, prevalence plots, 1D valence distributions, and 2D subreddit-vs-valence scatter plots. Includes custom stopword handling.
- `sentences_dataset.r`: Processes sentence-level datasets and manages sentence embeddings.
- `sentences_analysis.r`: Analyzes sentence-level data, extracts example sentences, and builds lookup functions for mapping posts to authors. Generates result tables with representative sentences.
- `lollipop.r`: Creates lollipop-style visualizations for topic comparisons with colorblind-friendly palettes. Reorders topics by category (Immersive Daydreaming vs. Maladaptive Dreaming).
- `textTrainExamples.r`: Visualizes the most extreme sentence examples from the two datasets.
- `clustering_experiments.r`: Experimental clustering analyses and comparisons.
- `testLDA.r`: Test and validation scripts for the LDA implementations.
## Workflow

**Initialize Environment**

```r
source("imports.r")
```
**Preprocess Text Data**

```r
source("preprocessing.r")
initialize_spacy()
# Apply the cleaning functions to the raw text
```
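The cleaning step might look roughly like the following sketch. This is an assumption about what `clean_text_spacy()` does internally (the actual part-of-speech filters and normalization rules may differ); it uses `spacyr` to lemmatize and keep content words, as the preprocessing script is described to do.

```r
# Hypothetical sketch of spaCy-based cleaning via spacyr; the real
# clean_text_spacy() may use different POS filters and normalization.
library(spacyr)
library(dplyr)

spacy_initialize(model = "en_core_web_sm")

clean_text_sketch <- function(texts) {
  parsed <- spacy_parse(texts, lemma = TRUE, pos = TRUE)
  parsed %>%
    # Keep content words only; drop function words and punctuation
    filter(pos %in% c("NOUN", "VERB", "ADJ", "ADV")) %>%
    group_by(doc_id) %>%
    summarise(clean = paste(tolower(lemma), collapse = " "),
              .groups = "drop")
}

clean_text_sketch(c("I was daydreaming for hours yesterday"))
```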
**Generate Embeddings and Valence Predictions**

```r
source("parallel_predictions.r")  # processes data in parallel batches
```
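The batch-and-parallelize pattern described above can be sketched as follows. The batch size, worker count, and the `posts$text` column are assumptions for illustration, not the script's actual configuration; the embedding call uses `textEmbed()` from the `text` package listed under requirements.

```r
# Hedged sketch of the batching pattern in parallel_predictions.r
# (batch size, core count, and data column are assumptions).
library(parallel)

batch_size <- 1000
batches <- split(posts$text,
                 ceiling(seq_along(posts$text) / batch_size))

cl <- makeCluster(max(1, detectCores() - 1))
clusterEvalQ(cl, library(text))
results <- parLapply(cl, batches, function(batch) {
  textEmbed(batch)  # embeddings for one batch
})
stopCluster(cl)

# Cache results so they need not be recomputed
saveRDS(results, "data/embeddings.rds")
```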
**Analyze Descriptive Statistics**

```r
source("data_descriptives.r")
```
**Perform Topic Modeling**

```r
source("topicPlots.r")  # fits LDA models and creates visualizations using the 'topics' package
```
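For orientation, a generic LDA fit over cleaned text can be sketched with `quanteda` and the `topicmodels` package; note the project itself uses the `topics` package, whose API differs, and the column name `posts$clean_text` and `k = 20` are assumptions.

```r
# Generic LDA sketch (the project uses the 'topics' package instead;
# data column and number of topics are assumptions).
library(quanteda)
library(topicmodels)

toks <- tokens(posts$clean_text, remove_punct = TRUE) |>
  tokens_remove(stopwords("en"))       # plus any custom stopwords

dtm <- dfm(toks) |>
  dfm_trim(min_termfreq = 5) |>
  convert(to = "topicmodels")

lda <- LDA(dtm, k = 20, control = list(seed = 42))
terms(lda, 10)  # top 10 terms per topic
```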
**Create Sentence Dataset**

```r
source("sentences_dataset.r")  # builds sentence-level data and embeddings for downstream analysis
```
**Analyze Sentence-Level Patterns**

```r
source("sentences_analysis.r")  # extracts representative sentences and analyzes sentence-level valence
```
**Create Publication-Ready Visualizations**

```r
source("lollipop.r")
source("topicPlots.r")  # for additional plots
```
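A minimal lollipop plot of the kind `lollipop.r` produces can be built with `ggplot2` alone; the topic labels, estimates, and palette values below are illustrative assumptions, not the project's actual results.

```r
# Minimal lollipop-plot sketch; topic names, estimates, and colours
# are made-up placeholders for illustration only.
library(ggplot2)

topic_df <- data.frame(
  topic    = c("fantasy worlds", "distress", "music triggers", "time lost"),
  estimate = c(0.8, -0.6, 0.4, -0.9),
  group    = c("ID", "MD", "ID", "MD")
)

ggplot(topic_df,
       aes(x = estimate, y = reorder(topic, estimate), colour = group)) +
  geom_segment(aes(x = 0, xend = estimate,
                   yend = reorder(topic, estimate))) +
  geom_point(size = 3) +
  # Colorblind-friendly (Okabe-Ito) hues
  scale_colour_manual(values = c(ID = "#0072B2", MD = "#D55E00")) +
  labs(x = "Topic estimate", y = NULL, colour = "Community")
```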
## Requirements

- R: version 4.0 or higher recommended
- Python: spaCy with the English model (`en_core_web_sm`)
- Key R packages:
  - Topic modeling: `topics` (Latent Dirichlet Allocation models)
  - NLP/embeddings: `text` (word embeddings and text representation learning)
  - Text processing: `spacyr`, `stringr`, `quanteda`, `stopwords`
  - Data manipulation: `tidyverse`, `dplyr`, `tibble`, `purrr`
  - Visualization: `ggplot2`, `ggforce`
  - Parallel computing: `parallel`, `future`
  - Utilities: `reticulate`, `crayon`
## Installation

1. Clone the repository.
2. Install the R packages; see `imports.r` for the complete list.
3. Install the spaCy model:

   ```sh
   python -m spacy download en_core_web_sm
   ```
## Performance Notes

- For datasets with 10,000+ records, parallel processing is strongly recommended (see `parallel_predictions.r`).
- Adjust the batch size to the available RAM (typically 1,000-5,000 records per batch).
- LDA computation time scales with corpus size and the number of topics; start with a small number of topics for testing.
- Pre-computed embeddings are cached in `data/` to avoid recomputation.
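The caching behaviour mentioned above typically follows a compute-once pattern like this sketch; the exact file path and embedding call are assumptions rather than the project's actual code.

```r
# Hypothetical caching pattern: reuse the .rds file if present,
# otherwise compute and store (path and call are assumptions).
library(text)

emb_path <- "data/embeddings.rds"
if (file.exists(emb_path)) {
  embeddings <- readRDS(emb_path)
} else {
  embeddings <- textEmbed(posts$text)
  saveRDS(embeddings, emb_path)
}
```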
Part of the Maladaptive Dreaming Study research conducted at The Harmony Lab.