DISCO: A Hierarchical Disentangled Cognitive Diagnosis Framework for Interpretable Job Recommendation
by Xiaoshan Yu, Chuan Qin (corresponding author), Qi Zhang, Chen Zhu, Haiping Ma (corresponding author) https://arxiv.org/pdf/2410.07671v1
- Abstract
- I Introduction
- II Related Work
- III Preliminaries
- IV Methodology
- V Experiments
- VI Conclusion
Job Recommendation Systems
Online Recruitment Platforms:
- Create unprecedented opportunities for job seekers
- Pose significant challenge of quickly and accurately aligning jobs with skills/preferences
Job Recommendation Models:
- Text-matching based methods
- Behavior modeling based methods
- Have achieved impressive results, but their explainability remains largely unexplored
Proposed Framework: DISCO
- Hierarchical Disentanglement based Cognitive diagnosis framework
- Accommodates underlying representation learning model for effective and interpretable job recommendations
DISCO Components:
- Hierarchical representation disentangling module: Mines hierarchical skill-related factors in hidden representations of job seekers and jobs
- Level-aware association modeling:
- Inter-level knowledge influence module: Enhances information communication and robust representation learning between levels
- Level-wise contrastive learning: Improves inter- and intra-level knowledge transfer
- Interaction diagnosis module:
- Incorporates a neural diagnosis function for modeling multi-level recruitment interaction process
- Introduces cognitive measurement theory
Experimental Results:
- Demonstrate the effectiveness and interpretability of DISCO on real-world recruitment recommendation datasets and an educational recommendation dataset
Availability:
- Code available at https://github.com/LabyrinthineLeo/DISCO
Job Recommender Systems:
- Online platforms like LinkedIn and Glassdoor have revolutionized the job-seeking process
- Accurate and trustworthy recommender systems are needed to suggest positions based on a job seeker's preferences and capabilities
- Recent studies focus on text-matching based methods using vast textual data from resumes and job postings for job suggestions
- Interaction behavior based approaches explore users' personalized preferences and intentions through modeling interaction behaviors between job seekers and recruiters (DPGNN)
Challenges in Traditional Job Recommendation Systems:
- Lack of reasons for recommendations makes it difficult for users to choose from suggestions
- Job seekers may pursue positions misaligned with their abilities or career aspirations, hindering job search success
- Need for interpretable and explainable job recommendation systems
Proposed Hierarchical Disentanglement based Cognitive diagnosis framework (DISCO):
- Re-examines recruitment recommendation process from a hierarchical disentanglement based cognitive diagnosis perspective
- Facilitates evaluation of diverse skill demands of various job positions and current competitive standing of candidates
- Addresses technical challenges in mapping user and job representations to specific skill dimensions, hierarchizing competency levels, and mitigating interaction bias.
DISCO Framework:
- Hierarchical representation disentangling module: extracts and clarifies hierarchical skill-related factors embedded in hidden representations of job seekers and jobs.
- Level-aware self-attention network: explores intrinsic associations between inter-level skill prototypes.
- Noise perturbation based level-wise contrastive module: enhances robust representation learning.
- Interaction diagnosis module with neural diagnosis function: effectively captures multi-level recruitment interaction process and incorporates cognitive measurement theory for explainability.
- Extensive experiments demonstrate the effectiveness and interpretability of DISCO in job recommendation task on real-world datasets.
Online Job Recommendation:
- Has emerged as a pivotal task because of its potential to accurately match job seekers with suitable positions [18, 19, 20, 21]
- Two main categories: text-matching based methods and interaction behavior based methods
Text-Matching Based Methods:
- Focus on matching textual content [5, 4, 22]
- Text-matching strategies or text enhancement techniques [23, 24]
- Example: APJFNN [4] uses an RNN to learn word-level semantic representations, combined with a hierarchical ability-aware attention strategy.
Interaction Behavior Based Methods:
- Focus on users' personalized preferences and intentions [9, 10]
- Modeling interaction behaviors between job seekers and jobs (recruiters)
- Example: SHPJF [10] models users' search histories in addition to learning semantic information from text content.
Limitations:
- Significant shortfall in interpretable exploration of matching job seekers with job positions.
Cognitive Diagnosis (CD)
Background:
- Classical methodology for assessing ability in educational psychology [25, 26]
- Analyzes learning behaviors to portray learners’ proficiency profile [27, 28]
Traditional Psychometric-Based CD Approaches:
- Based on psychological theories to depict student knowledge state through latent factors [25, 29]
- Examples:
- Deterministic Inputs, Noisy And gate (DINA) model [29]
- Characterizes each student with a binary vector indicating mastery of knowledge concepts
- Requires all relevant skills for highest positive response probability
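The DINA response rule described above can be illustrated with a minimal sketch. The slip and guess parameters belong to the standard DINA formulation; the function name `dina_prob` and all numeric values are made up for illustration and do not come from the paper.

```python
from math import prod

def dina_prob(alpha, q, slip, guess):
    """Probability of a positive response under the standard DINA model.

    alpha: binary mastery vector of the student (1 = skill mastered)
    q:     binary requirement vector of the item (1 = skill required)
    """
    # eta is 1 only if every required skill is mastered, otherwise 0.
    eta = prod(a ** r for a, r in zip(alpha, q))
    # Mastering everything gives 1 - slip; missing any skill gives guess.
    return (1 - slip) ** eta * guess ** (1 - eta)

all_mastered = dina_prob([1, 1, 0], [1, 1, 0], slip=0.1, guess=0.2)  # equals 1 - slip
one_missing = dina_prob([1, 0, 0], [1, 1, 0], slip=0.1, guess=0.2)   # equals guess
```

The conjunctive "and gate" is visible in `eta`: a single unmastered required skill drops the response probability from `1 - slip` to `guess`.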
Neural Network (NN)-Based CD Approaches:
- Propelled by the development of deep learning [30, 31, 32]
- Model complex interactions among learning elements (students, exercises, and knowledge concepts)
Examples:
- NeuralCD [30]
- Employs multidimensional parameters for detailed depiction of students’ knowledge level and exercise attributes
- Incorporates Multi-Layer Perceptron (MLP) to model complex interactions between students and exercises
- RDGT [31]
- Designs a relation-guided dual-side graph transformer model to mine potential associations between learners and exercises
Gap in Application:
- Effectively modeling the job recommendation task with cognitive diagnosis remains an open problem.
Disentangled Learning
- Purpose: Identify and disentangle the underlying explanatory factors of complex observed data, enhancing efficiency and interpretability [33, 34]
- Initially applied in computer vision, where it proved effective [34]
- Recent approaches for graph-structured data: DGCL, DisenGCN [35, 36]
- DGCL: Employs contrastive learning to uncover latent factors within the graph, extracting disentangled representations [36]
- DisenGCN: Proposes a unique neighborhood routing mechanism for disentangling node representation in graph networks, enabling dynamic identification of latent factors [35]
- In the recommendation domain: learning disentangled representations of users' latent intents from interaction feedback [37, 38, 39]
- MacridVAE: Proposes a macro-micro disentangled variational auto-encoder to learn disentangled representations based on user behavior across multiple geometric spaces [37]
Learning Disentangled Competency Representations (Cognitive Diagnostic Perspective)
- Remains unexplored in this area
Overview Architecture of Proposed DISCO Framework
- Shown in Figure 2 of the paper
Job Recommendation System
Introduction:
- Problem definition for job recommendation
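- Set of job seekers: $\mathcal{C} = \{c_1, c_2, \ldots, c_N\}$
- Set of jobs: $\mathcal{J} = \{j_1, j_2, \ldots, j_M\}$
- Set of skills at different granularity levels: $\mathcal{S} = \bigcup_{l=1}^{L} \mathcal{S}_l$, where $\mathcal{S}_L$ is the atomic skill level and $K = \sum_{l=1}^{L} |\mathcal{S}_l|$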
- Each job seeker and job associated with textual documents describing resumes and requirements
Job-Skill Relationship:
- Q-matrix representing the relationship between jobs and skills: $\mathcal{Q} = \{q_{uv}\}_{M \times K}$
- $q_{uv} = 1$ if job $j_u$ requires skill $s_v$, and 0 otherwise
Interaction Matrix:
- Interaction matrix between job seekers and jobs: $\mathcal{R} = \{r_{uv}\}_{N \times M}$
- $r_{uv} \in \{0, 1, 2, 3\}$ corresponds to four interaction behaviors:
- Browse: candidate browses job
- Click: candidate clicks on job
- Chat: candidate engages in chat with recruiter about job
- Match: both parties are satisfied and the pair is matched ($r_{uv} = 3$)
Goals:
- Predict compatibility between jobs and candidates using interaction records ℛ, relationship matrix 𝒬, and resume/job descriptions
- Top-K job recommendation based on predicted degree of matching between candidates and jobs.
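The problem setup above can be made concrete with a toy instance. This is only a sketch: every value is made up, the helper names `required_skills` and `top_k_jobs` are hypothetical, and the numeric coding of Browse, Click and Chat is an assumption (the text only fixes Match = 3).

```python
# Toy instance with made-up values: N = 2 job seekers, M = 3 jobs, K = 4 skills.
BROWSE, CLICK, CHAT, MATCH = 0, 1, 2, 3  # assumed coding; only Match = 3 is stated

# Q-matrix (M x K): Q[job][skill] == 1 means the job requires the skill.
Q = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

# Sparse view of the interaction matrix R: (seeker, job) -> behavior code.
R = {(0, 0): MATCH, (0, 1): CLICK, (1, 2): CHAT}

def required_skills(job):
    """Indices of the skills a job requires according to Q."""
    return [k for k, flag in enumerate(Q[job]) if flag == 1]

def top_k_jobs(scores, k):
    """Indices of the k jobs with the highest predicted matching score."""
    return sorted(range(len(scores)), key=lambda v: scores[v], reverse=True)[:k]

predicted = [0.2, 0.9, 0.5]               # hypothetical matching scores for seeker 0
skills_of_job_0 = required_skills(0)      # [0, 2]
recommended = top_k_jobs(predicted, 2)    # [1, 2]
```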
DISCO Framework:
- Goal: model interaction patterns between job seekers and jobs while remaining flexible with respect to the underlying recommendation model
- Base embedding model ($\mathcal{M}(\mathcal{C}, \mathcal{J}, \mathcal{R})$): encodes job seekers ($\mathcal{C}$) and jobs ($\mathcal{J}$) into d-dimensional embedding matrices C and J, respectively
- How the embeddings are obtained depends on the base model
- Example: NGCF uses the user-item interaction graph for propagation and embedding learning (right part of Figure 2)
- Refined embeddings after k layers of propagation: $c_u^{(k)}$ and $j_v^{(k)}$
- Nonlinear activation function: $\sigma$
- Element-wise multiplication operator: $\odot$
- Sets of interacted users/items: $\mathcal{N}_u$, $\mathcal{N}_v$
- Trainable weight matrices for feature transformation: $W_1$, $W_2$
- Output embeddings obtained by concatenating all layers' representations ($c_u \in C$, $j_v \in J$)
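For reference, the propagation rule of the original NGCF model can be written with the symbols listed above. This is a sketch based on the NGCF formulation rather than a copy of the DISCO paper's equation: the per-layer superscripts on the weight matrices are dropped to match the symbol list, and the exact form used in the paper may differ.

```latex
c_u^{(k)} = \sigma\Big( W_1\, c_u^{(k-1)}
  + \sum_{v \in \mathcal{N}_u} \frac{1}{\sqrt{|\mathcal{N}_u|\,|\mathcal{N}_v|}}
    \big( W_1\, j_v^{(k-1)} + W_2\, ( j_v^{(k-1)} \odot c_u^{(k-1)} ) \big) \Big)
```

The job-side update for $j_v^{(k)}$ is symmetric, summing over $u \in \mathcal{N}_v$.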
DISCO Framework Overview
The DISCO framework has four main components, as depicted in Figure 2:
- Hierarchical skill-aware representation disentangling
- Level-aware self-attention network
- Level-wise contrastive learning
- Interaction diagnosis module
Modeling Latent Skill Factors
- The main goal is to predict job seeker-job matching based on observed interactions
- Skill factors influence outcome: candidate's skill mastery, job difficulty [43]
- Prediction objective: $y = f(X) = g(E(z_c), E(z_j))$
- $X = (c_u, j_v, s_{j_v})$
- $y$: learned matching score
- $z_c$, $z_j$: skill factors for candidates and jobs, respectively
- $E(z)$: encoding function
- Optimization objective: $\theta^{*} = \arg\min \sum_{i}^{|\mathcal{R}|} -\log p(y_i \mid X_i) = \arg\max \sum_{i}^{|\mathcal{R}|} \log p(z_i \mid X_i)$
- Approximation: $E(g(z)) \approx g(E(z))$ [44]
- Constraining approximation error within prediction function g(⋅)
Hierarchical Skill-Aware Disentangling
- Primary determinants of interaction outcome: skill proficiency, demand [18]
- Explicitly disentangle skill factors for understanding and enhancing interpretability
- Skills at different levels of granularity [43]
- Construct L-layer mappers to project embeddings into hierarchical skill spaces: $c_{u,l}^{h}$ and $j_{v,l}^{h}$
- $W_l^{c}$, $W_l^{j}$: trainable matrices
- $d_h$: hidden dimension
- Build multi-level encoders for users and jobs to learn ability prototypes ($c_{u,l}^{z}$) and skill difficulty prototypes ($j_{v,l}^{z}$) at each skill layer
- Encoder architecture: multilayer perceptron (MLP)
- $d_z = |\mathcal{S}_L|$: number of atomic skills.
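The mapper-plus-encoder structure described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the one-hidden-layer MLP, the tanh activation, the random initialization, and the names `make_level` and `disentangle` are all hypothetical choices.

```python
import math
import random

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def make_level(d, d_h, d_z, rng):
    # One skill level: a linear mapper (d -> d_h) followed by an
    # MLP encoder (d_h -> d_h -> d_z). Layer sizes are assumptions.
    return {
        "W_map": rand_matrix(d_h, d, rng),
        "W_hid": rand_matrix(d_h, d_h, rng),
        "W_out": rand_matrix(d_z, d_h, rng),
    }

def disentangle(embedding, levels):
    """Return one d_z-dimensional prototype per skill level."""
    prototypes = []
    for params in levels:
        h = matvec(params["W_map"], embedding)                       # level-specific projection
        hidden = [math.tanh(x) for x in matvec(params["W_hid"], h)]  # MLP hidden layer
        prototypes.append(matvec(params["W_out"], hidden))           # skill-space prototype
    return prototypes

rng = random.Random(0)
d, d_h, d_z, L = 8, 6, 4, 3
seeker_levels = [make_level(d, d_h, d_z, rng) for _ in range(L)]
c_u = [rng.gauss(0.0, 1.0) for _ in range(d)]   # made-up base embedding of one job seeker
c_z = disentangle(c_u, seeker_levels)           # L ability prototypes
```

A second, independently parameterized set of levels would play the same role on the job side to produce the skill difficulty prototypes.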
Level-Aware Self-Attention Network
- Enhances learning by exploring inter-level skill relationships
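- Incorporates correlated information into enhanced skill representations ($\tilde{c}_{u,l}^{z}$, $\tilde{j}_{v,l}^{z}$)
- Uses a Self-Attention module over query, key and value vectors ($\mathcal{Q}_c, \mathcal{K}_c, \mathcal{V}_c$; $\mathcal{Q}_j, \mathcal{K}_j, \mathcal{V}_j$)
- Design:
- $\tilde{c}_{u,l}^{z}, \tilde{j}_{v,l}^{z} \in \mathbb{R}^{d_z}$ are the enhanced l-level skill-aware representations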
- SelfAtt(⋅) indicates the Self-Attention module [45]
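A minimal sketch of self-attention applied across the L level prototypes follows. It uses standard scaled dot-product attention with a single head; the paper may add projections, multiple heads, or residual connections that are not shown, and the function name `level_self_attention` is hypothetical.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def level_self_attention(Z, Wq, Wk, Wv):
    """Z is an L x d_z matrix with one row per skill level.

    Returns the enhanced rows and the L x L attention weights.
    """
    Q, K, V = matmul(Z, Wq), matmul(Z, Wk), matmul(Z, Wv)
    scale = math.sqrt(len(K[0]))
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / scale for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]   # how much each level attends to the others
    return matmul(weights, V), weights

identity = [[1.0, 0.0], [0.0, 1.0]]              # stand-in for trained projection matrices
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # made-up prototypes for L = 3 levels
enhanced, weights = level_self_attention(Z, identity, identity, identity)
```

Each row of `weights` shows how strongly one skill level draws on the others, which is the inter-level association the module is meant to capture.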
Level-Wise Contrastive Learning
- Enhances robustness of skill-aware disentangled representations
- Inspired by recent developments in contrastive learning [44]
- Proposed level-wise contrastive learning loss:
- Maximizes the expectation of L subtasks ($p_\theta(c_u' \mid c_u, z_{c,l})$)
- Formulated for job seeker side only
- Contrastive learning subtask for l-level ability:
- Defined in Eq. (8) of the paper
- $p_\theta(c_u' \mid c_u, z_{c,l})$ denotes the candidate ability contrastive learning subtask
- $z_{c,l}$ is the l-th level latent skill factor of the job seeker
- Goal: Learn the L optimal ability prototypes that maximize the expectation of the L subtasks.
- Implementation:
- Augment the l-level ability representation ($\tilde{c}_{u,l}^{z+}$) by adding random noise ($\Delta_{u,l}'$), subject to $\|\Delta\|_2 = \epsilon$ and a second constraint that keeps the positive samples valid.
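The noise-based augmentation can be sketched as follows. Two things here are assumptions rather than facts from the text: the second constraint is taken to be the sign-preserving rule used in SimGCL-style augmentation (the noise has the same sign as the representation in every dimension), and the contrastive objective is taken to be an InfoNCE loss with temperature `tau`. The names `perturb` and `info_nce` are hypothetical.

```python
import math
import random

def perturb(vec, eps, rng):
    """Add sign-preserving uniform noise rescaled to L2 norm eps (assumed constraint)."""
    raw = [rng.random() * (1.0 if x >= 0 else -1.0) for x in vec]
    norm = math.sqrt(sum(r * r for r in raw)) or 1.0
    return [x + eps * r / norm for x, r in zip(vec, raw)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, tau=0.2):
    """InfoNCE loss: pull the positive close, push the negatives away."""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

rng = random.Random(0)
ability = [1.0, 2.0, -1.0]                  # made-up l-level ability prototype
ability_plus = perturb(ability, eps=0.1, rng=rng)
other = [-1.0, -2.0, 1.0]                   # made-up prototype of another job seeker
loss = info_nce(ability, ability_plus, [other])
```

Because the perturbation is small and keeps every sign, the augmented vector stays close to the original, which is what makes it a valid positive sample.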
Interaction Diagnosis Module for Job Seekers and Jobs
Cognitive Diagnosis Theory:
- Key research focus: measuring a tester's ability level by modeling their ability representations against the difficulty representations of exercises across different knowledge concepts
- In DISCO framework, skill characteristics of job seekers and jobs are disentangled and mapped to skill dimensions, enabling interaction modeling using diagnosis functions
Neural Diagnosis Function:
- Seamlessly integrates with non-linear neural network layers
- Capable of modeling high-dimensional interactive elements, enabling acquisition of extensive knowledge and presentation of interpretable information
- Formalized as Eq. (12): $\mathcal{T}(c_{u,l}^{z}, j_{v,l}^{z}) = \mathcal{Q}_{j_v}^{l} \odot \left(\sigma(c_{u,l}^{z}) - \sigma(j_{v,l}^{z})\right)$
- Obtains the matching distance in the l-level skill space between $c_u$ and $j_v$ from a diagnosis perspective
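Eq. (12) translates almost directly into code. In this sketch $\sigma$ is assumed to be the sigmoid function, as is common in neural cognitive diagnosis models; the function name `diagnose` and the input values are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def diagnose(q_row, ability, difficulty):
    """Per-skill matching distance: Q mask * (sigmoid(ability) - sigmoid(difficulty))."""
    return [q * (sigmoid(a) - sigmoid(d)) for q, a, d in zip(q_row, ability, difficulty)]

q_row = [1, 0, 1]                 # the job requires skills 0 and 2 at this level
ability = [2.0, 5.0, -1.0]        # made-up ability prototype of the job seeker
difficulty = [0.0, 0.0, 1.0]      # made-up difficulty prototype of the job
distance = diagnose(q_row, ability, difficulty)
```

A positive entry means the seeker's proficiency exceeds the job's demand on that skill, a negative entry means it falls short, and skills the job does not require are masked to zero. This per-skill reading is what makes the output interpretable.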
Hierarchical Diagnosis Prediction:
- Aggregates hierarchical competency matching distances between candidates and jobs
- Concatenates L-layer matching distance representations into an aggregated interaction vector hu,v
- Uses fully connected layers to model high-order interaction features
- Predicts the probabilities of different interaction categories between cu and jv
Loss Function:
- Multi-class cross-entropy loss for predicting job seeker-job interaction categories
- Adds the level-wise contrastive learning loss, weighted by coefficient λ, to form the complete optimization objective
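The prediction and loss steps can be sketched together. This is a simplification under stated assumptions: a single linear layer stands in for the fully connected layers, the weights are zero instead of trained, and the names `predict` and `total_loss` are hypothetical. The default λ of 1e-3 is the value reported in the implementation details.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(distances, W, b):
    """distances: L per-level matching-distance vectors; returns 4 class probabilities."""
    h_uv = [x for level in distances for x in level]   # concatenate all levels
    logits = [sum(w * x for w, x in zip(row, h_uv)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

def total_loss(probs, label, contrastive_loss, lam=1e-3):
    """Cross-entropy on the interaction category plus lam * contrastive loss."""
    return -math.log(probs[label]) + lam * contrastive_loss

distances = [[0.3, -0.1], [0.0, 0.2], [0.1, 0.1]]   # L = 3 levels, made-up values
W = [[0.0] * 6 for _ in range(4)]                   # untrained stand-in weights
b = [0.0] * 4
probs = predict(distances, W, b)                    # uniform over the 4 behaviors
loss = total_loss(probs, label=3, contrastive_loss=2.0)
```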
Statistics of Experimental Datasets:
| Statistics | Technology | Service | Edu-Rec |
|---|---|---|---|
| #Candidates | 4,726 | 10,022 | 61,567 |
| #Items | 34,962 | 23,866 | 20,828 |
| #Skills | 986 | 3,241 | 384 |
| #Interactions | 616,504 | 866,065 | 2,200,731 |
| Avg. interactions per user | 130.45 | 86.41 | 35.74 |
Experimental Validation of DISCO Framework
Compared Models:
- MF
- NCF
- AutoInt
- FINAL
- NGCF
- LightGCN
- DPGNN
Performance Metrics:
- AUC (Area Under the ROC Curve)
- HR@5, HR@10 (Hit Ratio at cutoff k)
- NDCG@5, NDCG@10 (Normalized Discounted Cumulative Gain at positions n)
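For a single held-out positive item per ranked list, the two ranking metrics reduce to very short functions. This sketch assumes exactly one relevant item per list, so the ideal DCG is 1; the function names are hypothetical.

```python
import math

def hit_ratio_at_k(ranked, positive, k):
    """1.0 if the positive item appears in the top k, else 0.0."""
    return 1.0 if positive in ranked[:k] else 0.0

def ndcg_at_k(ranked, positive, k):
    """With one relevant item, NDCG is 1 / log2(rank + 1) for a 1-based rank."""
    if positive in ranked[:k]:
        return 1.0 / math.log2(ranked.index(positive) + 2)
    return 0.0

ranked = [7, 3, 9, 1, 4]                   # made-up job ids, best score first
hr_at_2 = hit_ratio_at_k(ranked, 9, 2)     # 0.0: job 9 is at rank 3
hr_at_5 = hit_ratio_at_k(ranked, 9, 5)     # 1.0
ndcg_at_5 = ndcg_at_k(ranked, 9, 5)        # 0.5 = 1 / log2(4)
```

Reported scores are the averages of these per-list values over all test positives.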
Base Models and Baselines:
- Base models: MF, NGCF, LightGCN, DPGNN
- Interaction modeling methods: NCF, AutoInt, FINAL
- Proposed framework: DISCO
Results:
- DISCO outperforms baselines in most metrics on all datasets
- Significant improvements marked with "*"
Model Abbreviations:
- MF (Matrix Factorization)
- NCF (Neural Collaborative Filtering)
- AutoInt (Automatic Feature Interaction Learning)
- FINAL (Factorized Interaction Layer)
- NGCF (Neural Graph Collaborative Filtering)
- LightGCN (Light Graph Convolution Network)
- DPGNN (Dual-Perspective Graph Neural Network)
Dataset Description and Preparation:
- Dataset provided by an online recruitment platform with four behaviors: Browse, Click, Chat, Match
- Filtered out job seekers with fewer than ten Match interaction logs and jobs with fewer than five records
- No sensitive information, all IDs remapped to ensure they do not correspond to original identifiers
- Selected two subsets based on career clusters: technology and service
- Randomly split data into three parts for training, validation, and testing sets
Baseline Approaches:
- Four widely used recommendation methods as base models: MF, NGCF, LightGCN, DPGNN
- Incorporated three interaction modeling methods (NCF, AutoInt, FINAL) into base models to construct complete baselines
- Selected two state-of-the-art methods (SHPJF and ECF) for job recommendation and interpretable recommendation, respectively
Evaluation Protocols and Implementation Details:
- Employed three widely used metrics: Area Under the ROC Curve (AUC), Hit Ratio (HR@k), Normalized Discounted Cumulative Gain (NDCG@k)
- Set k to 5 and 10 for evaluation of job recommendation task
- Utilized random sampling of 25 jobs as negative instances for each positive instance
- Implemented all models in Python with PyTorch on a Linux server with eight NVIDIA A800 GPUs
- Ran each experiment five times and reported the average as the final result
- Used a t-test to identify significant differences between the performance of DISCO and the baselines
- Initialized network parameters with Xavier initialization, learning rate searched from {5e-5, 8e-5, 1e-4, 2e-4, 5e-4}
- Set coefficient λ of contrastive loss to 1e-3.
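The negative sampling step of the evaluation protocol can be sketched as below. Excluding jobs the seeker has already interacted with is an assumption (the text only says 25 jobs are sampled at random per positive), and the function name and all ids are made up.

```python
import random

def sample_negatives(interacted, all_jobs, n_neg, rng):
    """Draw n_neg distinct jobs the seeker has not interacted with (assumed rule)."""
    candidates = [j for j in all_jobs if j not in interacted]
    return rng.sample(candidates, n_neg)

rng = random.Random(42)
all_jobs = list(range(200))          # made-up job id space
interacted = {3, 17, 58}             # hypothetical jobs the seeker interacted with
positive = 17                        # held-out positive instance
negatives = sample_negatives(interacted, all_jobs, n_neg=25, rng=rng)
eval_list = [positive] + negatives   # 26 candidates to score and rank
```

The model scores all 26 candidates, and HR@k and NDCG@k are computed from the rank of the positive within that list.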
Performance Comparison: DISCO Framework vs Baselines
Observations:
- DISCO embedded in each of the four base models outperforms all baselines on the two recruitment datasets: AUC improves by an average of 0.65 and 0.64, respectively, and HR@5 and NDCG@5 improve by 2.96 to 3.62 on average.
- The gains on the recommendation metrics are larger than those on the classification metric, indicating a particular advantage for the job recommendation task.
- Modeling high-order user-item interactions, as FINAL and AutoInt do, is effective in enhancing performance.
- NGCF and LightGCN are more effective for job recommendation than the other model types, owing to the high connectivity between job seekers and jobs.
Experimental Results:
- Technology dataset: DISCO outperforms baselines in AUC, HR@5, NDCG@5, HR@10, NDCG@10 (refer to Table II).
- Edu-Rec dataset: DISCO retains significant advantages over the interaction modeling methods on educational data despite the larger data size, which suggests its applicability is not limited to the job recommendation domain.
Additional Comparisons:
- DISCO outperforms SHPJF model by 2.25% and 3.20% for HR@5 and NDCG@5, respectively.
- Significant relative improvement of DISCO over interpretable recommendation model ECF: 31.81% and 52.36% for HR@5 and NDCG@5, respectively.
RQ2 Ablation Experiment Results:
- Conducted to investigate effectiveness of each component in DISCO framework on Technology dataset using NGCF as base model
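- Variants: without hierarchical disentangling (w/o HD), self-attention (w/o SA), contrastive learning (w/o CL), and interaction diagnosis (w/o ID)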
Findings:
- Impact of Submodules: All variations perform worse than NGCF-DISCO, highlighting significance of designed submodules.
- Hierarchical Skill-Aware Disentangling Module (w/o HD): Elimination causes considerable drop in performance, validating importance and effectiveness of hierarchical disentangling idea.
- Sensitivity Analysis:
- Learning rate: see Figure 4
- Coefficient λ: see Figure 5
Parameter Sensitivity Analysis for RQ3:
- Explores hyper-parameter impacts, mainly focusing on learning rate and weight coefficient λ of contrastive loss
- Experiments conducted on Technology dataset using NGCF and DPGNN as base models
- Set learning rates to {5e-5, 8e-5, 1e-4, 2e-4, 5e-4} and λ values {1e-5, 5e-4, 1e-3, 5e-3, 1e-2}
- Optimal learning rates are 8e-5 for NGCF-DISCO and 1e-4 for DPGNN-DISCO; for both, performance first rises and then falls as the learning rate increases
- Best performance achieved when λ is set to 1e-3 for both models
- An intriguing observation: as λ increases, its effect on performance follows different trends across the three metrics
Case Study on Interpretability of DISCO Model
Purpose:
- Explore interpretability of DISCO model through a case study
- Analyze job seekers' abilities and difficulty of job skills
Methodology:
- Select pair of job seeker and position that achieved matching in the job search process
- Demonstrate interpretable content by outputting hierarchical skill-associated representations from the model
Findings:
- Candidate c's mastery of each skill at the second and third levels is shown as an example; proficiency at these finer levels influences the corresponding coarse-grained skill (e.g., s1)
- Job requirement values for each skill: compared to candidate c's proficiency level
- Compatibility between candidate c's proficiency level and job j's required level: explains the pair's matching
Benefits:
- Output from the model improves interpretability of job recommendations
- Provides deeper understanding of the job search process for both job seekers and recruiters.
DISCO Framework for Job Recommendations:
- Components: hierarchical representation disentangling module, level-aware association modeling (inter-level knowledge influence module & level-wise contrastive learning), interaction diagnosis module with neural diagnosis function.
- Hierarchical Representation Disentangling Module: mines skill-related factors in job and job seeker representations.
- Level-Aware Association Modeling: enhances communication and robust representation learning, includes inter-level knowledge influence module and level-wise contrastive learning.
- Interaction Diagnosis Module: integrates a neural diagnosis function for effective modeling of multi-level recruitment interaction process between job seekers and jobs.
- Cognitive Measurement Theory: incorporated in the interaction diagnosis module.
- Datasets: two real-world recruitment recommendation datasets, one educational recommendation dataset used for evaluation.
- Results: demonstrate effectiveness and interpretability of DISCO framework.