
Questions about my ProbLog usage #128

@Sleantt


Hi,

I'm currently working on a student project that aims to evaluate the relevance of ProbLog (and probabilistic programming in general) to machine learning. To do so, I'm trying to solve the Kaggle challenge "Titanic - Machine Learning from Disaster". I couldn't find a forum for ProbLog questions, so I thought I would give it a shot here.

I'm really not sure about my approach, and I ran into several issues while implementing my classifier.

First variant

I initially used only discrete features (Sex and Passenger Class) to implement a simpler version of the classifier. I use a Python script to generate the input files and to process ProbLog's output.

% sex(S) : Male (0) or Female (1)
t(_)::sex(0);t(_)::sex(1).

% pclass(P) : Class 1, 2 or 3
t(_)::pclass(1);t(_)::pclass(2);t(_)::pclass(3).

% person(ID, Sex, PassengerClass) : The passenger with the given id has the given sex and passenger class
person(X, S, PClass) :- sex(S), pclass(PClass).

% survived(+PassengerId, Survived) : the given passenger survived if Survived equals 1
t(_)::survived(X, 1); t(_)::survived(X, 0) :- person(X,S,P).

The input files are generated from a training set and are structured as follows:

evidence(person(1, 1, 1)). % Passenger with id 1 is a woman in first class
evidence(survived(1, 1)). % Passenger with id 1 survived
---
% More evidence
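For reference, a minimal sketch of that generation step in Python, assuming the Kaggle `train.csv` columns `PassengerId`, `Sex`, `Pclass` and `Survived` and the 0/1 sex encoding from the model above. The function name `write_lfi_examples` is hypothetical; each passenger becomes one example, and examples are separated by `---` lines as in the excerpt above.

```python
import csv
import io

# Sex encoding used by the model: male -> 0, female -> 1.
SEX_CODE = {"male": 0, "female": 1}


def write_lfi_examples(rows, out):
    """Write one LFI example per passenger to the text stream `out`.

    `rows` yields dicts with the Kaggle train.csv columns PassengerId,
    Sex, Pclass and Survived; examples are separated by '---' lines.
    """
    for i, row in enumerate(rows):
        if i > 0:
            out.write("---\n")
        pid = int(row["PassengerId"])
        sex = SEX_CODE[row["Sex"]]
        pclass = int(row["Pclass"])
        survived = int(row["Survived"])
        out.write(f"evidence(person({pid}, {sex}, {pclass})).\n")
        out.write(f"evidence(survived({pid}, {survived})).\n")


# Usage on a two-row toy CSV (in practice, open train.csv instead).
toy_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex\n"
    "1,1,1,female\n"
    "2,0,3,male\n"
)
examples = io.StringIO()
write_lfi_examples(csv.DictReader(toy_csv), examples)
print(examples.getvalue())
```

Non-survivors get `evidence(survived(Id, 0))`, which matches the second head of the `survived` annotated disjunction.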

I then use ProbLog's lfi modality to generate a learned model.

Using the model

I'm currently using a second file to classify the passengers of my test set, as follows:

:-consult('learned_model.pl'). % Load the learned model
person(863,1,1). % Add passenger with ID 863 who is a woman in first class to the persons list
query(survived(863, 1)). % Query whether the passenger survived
% More of the above, for each passenger of the test set
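The test-set file can be produced the same way. Below is a minimal sketch that assumes rows from Kaggle's `test.csv` (which has no `Survived` column) and the same hypothetical 0/1 sex encoding. `write_query_file` is a hypothetical helper name.

```python
import io

# Sex encoding used by the model: male -> 0, female -> 1.
SEX_CODE = {"male": 0, "female": 1}


def write_query_file(rows, out, model_path="learned_model.pl"):
    """Load the learned model, then add one person/3 fact and one
    survived/2 query per test passenger."""
    out.write(f":- consult('{model_path}').\n")
    for row in rows:
        pid = int(row["PassengerId"])
        sex = SEX_CODE[row["Sex"]]
        pclass = int(row["Pclass"])
        out.write(f"person({pid}, {sex}, {pclass}).\n")
        out.write(f"query(survived({pid}, 1)).\n")


# Usage with a single hand-written test row.
queries = io.StringIO()
write_query_file([{"PassengerId": "863", "Sex": "female", "Pclass": "1"}], queries)
print(queries.getvalue())
```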

I initially ran the model using ProbLog's sample modality, but I wasn't satisfied with the results. I couldn't find in the docs how learned models are supposed to be used, but I found that ProbLog's mpe modality gives good (and consistent) results. However, I'm not sure this is the intended use of learned models.
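One common way to turn probabilities into labels is to compute the marginal probability of each `survived(Id, 1)` query (ProbLog's default inference mode) and threshold it at 0.5. The sketch below assumes the output has one `atom: probability` line per query, e.g. `survived(863,1):  0.91`. Adjust the regular expression if your output differs. `classify` and `to_submission` are hypothetical helpers, and the `PassengerId,Survived` layout follows Kaggle's Titanic submission format.

```python
import re

# Assumed output line format: "survived(863,1):   0.91" (atom, colon, probability).
QUERY_LINE = re.compile(r"^\s*survived\((\d+),\s*1\)\s*:\s*([0-9.eE+-]+)\s*$")


def classify(output, threshold=0.5):
    """Map each passenger id to 1 if P(survived(Id, 1)) >= threshold, else 0."""
    predictions = {}
    for line in output.splitlines():
        match = QUERY_LINE.match(line)
        if match:
            probability = float(match.group(2))
            predictions[int(match.group(1))] = int(probability >= threshold)
    return predictions


def to_submission(predictions):
    """Render predictions in Kaggle's PassengerId,Survived CSV layout."""
    lines = ["PassengerId,Survived"]
    lines += [f"{pid},{label}" for pid, label in sorted(predictions.items())]
    return "\n".join(lines) + "\n"


# Usage on a hand-written output excerpt (not real results).
sample_output = "survived(863,1):\t0.8\nsurvived(864,1):\t0.25\n"
print(to_submission(classify(sample_output)))
```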

Second variant

While reading the tutorial a second time, I also found out that I could ground the learnable probabilities of the 'survived' predicate on each of the 'person' predicate's parameters, as follows:

% sex(S) : Male (0) or Female (1)
t(_)::sex(0);t(_)::sex(1).

% pclass(P) : Class 1, 2 or 3
t(_)::pclass(1);t(_)::pclass(2);t(_)::pclass(3).

% person(ID, Sex, PassengerClass) characterizes a Titanic passenger
% with their passenger id, sex and passenger class
person(_ID, Sex, PassengerClass) :- 
  sex(Sex), 
  pclass(PassengerClass).

% survived(+PassengerId, Survived) : the given passenger survived if Survived equals 1
t(_, Sex, PassengerClass)::survived(PassengerId, 1);
t(_, Sex, PassengerClass)::survived(PassengerId, 0) :-
  person(PassengerId, Sex, PassengerClass).

This variant gave me a more coherent learned model, one that actually makes use of the given features.
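As a sanity check, when every feature is observed, the maximum-likelihood value of a learnable probability is a relative frequency. Assuming `t(_)` without extra arguments ties all groundings to a single parameter, the first variant's `survived` parameter should come out close to the overall survival rate. Each `t(_, Sex, PassengerClass)` parameter of the second variant should come out close to its group's rate. The plain-Python sketch below computes both on toy data (not the real dataset). `survival_rates` is a hypothetical helper.

```python
from collections import defaultdict


def survival_rates(passengers):
    """Relative survival frequencies, overall and per (sex, pclass) group.

    `passengers` yields (sex, pclass, survived) tuples using the 0/1 codes
    of the model; the overall rate mirrors the single t(_) parameter of the
    first variant, the per-group rates the t(_, Sex, PassengerClass) ones.
    """
    count, survivors = 0, 0
    groups = defaultdict(lambda: [0, 0])  # (sex, pclass) -> [count, survivors]
    for sex, pclass, survived in passengers:
        count += 1
        survivors += survived
        groups[(sex, pclass)][0] += 1
        groups[(sex, pclass)][1] += survived
    overall = survivors / count
    per_group = {key: s / n for key, (n, s) in groups.items()}
    return overall, per_group


# Usage on toy data (not the real dataset).
toy = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (0, 3, 0), (0, 3, 0), (0, 3, 1)]
overall, per_group = survival_rates(toy)
print(overall, per_group)
```

If the learned probabilities differ a lot from these counts on the same training data, that may point to a modelling or evidence issue rather than to the learning itself.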

Third variant

While reading the documentation, I realized that ProbLog doesn't support continuous values, so I tried to work around this limitation for my use case. I had three options in mind:

  • Divide the passengers into age groups
  • "Discretize" the values by rounding them to the nearest integer
  • Use one of ProbLog's extensions (namely DC-ProbLog)
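The first two options can be sketched as follows. The bin edges (12, 30, 50) are a hypothetical choice that yields the four groups `age(0)` to `age(3)` used below. Rounding uses round-half-up rather than Python's `round()`, which breaks .5 ties toward the even integer. Some Titanic passengers have no recorded age, so both helpers return `None` for a missing value. For those passengers, no `age` evidence would be emitted.

```python
import math

# Hypothetical group boundaries: <12 -> 0, [12, 30) -> 1, [30, 50) -> 2, >=50 -> 3.
AGE_EDGES = (12, 30, 50)


def is_missing(age):
    """True for a missing age (None or NaN, as read from a CSV)."""
    return age is None or math.isnan(age)


def age_group(age, edges=AGE_EDGES):
    """Option 1: index of the age group (0..len(edges)), or None if missing."""
    if is_missing(age):
        return None
    return sum(1 for edge in edges if age >= edge)


def age_rounded(age):
    """Option 2: nearest integer, halves rounded up, or None if missing."""
    if is_missing(age):
        return None
    return math.floor(age + 0.5)


print([age_group(a) for a in (5, 12, 29.5, 30, 80)])  # [0, 1, 1, 2, 3]
print([age_rounded(a) for a in (0.42, 28.5, 71.0)])   # [0, 29, 71]
```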

When trying out the first and second options, I noticed that learning became considerably slower (from ~30 seconds to several minutes). I used annotated disjunctions like this:

% First option
t(_)::age(0);t(_)::age(1);t(_)::age(2);t(_)::age(3).

% Second option
t(_)::age(0); t(_)::age(1); t(_)::age(2); t(_)::age(3);
% ... more of the same, one head per integer age ...
t(_)::age(80).
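Rather than typing every head of the second option by hand, the disjunction can be generated from Python. A minimal sketch follows. `age_ad` is a hypothetical helper that joins one `t(_)::age(V)` head per value.

```python
def age_ad(values):
    """Build a learnable annotated disjunction with one age/1 head per value."""
    return "; ".join(f"t(_)::age({v})" for v in values) + "."


print(age_ad(range(4)))  # t(_)::age(0); t(_)::age(1); t(_)::age(2); t(_)::age(3).
ages = age_ad(range(81))  # second option: one head per integer age 0..80
print(ages.count("::age("))  # 81
```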

I also altered my initial model this way :

[...]
% person(ID, Sex, PassengerClass, Age) characterizes a Titanic passenger
% with their passenger id, sex, passenger class and age
person(_ID, Sex, PassengerClass, Age) :- 
  sex(Sex), 
  pclass(PassengerClass), 
  age(Age).

t(_, Sex, PassengerClass, Age)::survived(PassengerId, 1);
t(_, Sex, PassengerClass, Age)::survived(PassengerId, 0) :- 
  person(PassengerId, Sex, PassengerClass, Age).

However, the second option (one age value per year, up to 80) was too slow to use at all: no results after an hour.
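The following count does not settle the question, but it shows how fast the model grows. `t(_, Sex, PassengerClass, Age)` gets one set of learnable probabilities per combination of its arguments. So the number of parameters scales with 2 × 3 × (number of age values). So does the number of groundings that the body `person(PassengerId, Sex, PassengerClass, Age)` can enumerate for a single passenger. The figure of 891 passengers is the size of Kaggle's full `train.csv`; a train/test split of it will be smaller. The per-combination figures are therefore only rough averages.

```python
SEXES, PCLASSES = 2, 3
TRAIN_SIZE = 891  # rows in Kaggle's full train.csv; a split of it is smaller


def combinations(n_age_values):
    """Number of (Sex, PassengerClass, Age) argument combinations."""
    return SEXES * PCLASSES * n_age_values


for label, n_ages in (("second variant (no age)", 1),
                      ("first option (4 age groups)", 4),
                      ("second option (ages 0..80)", 81)):
    c = combinations(n_ages)
    print(f"{label}: {c} combinations, "
          f"{TRAIN_SIZE / c:.1f} training passengers each on average")
```

This prints 6, 24 and 486 combinations, with about 148.5, 37.1 and 1.8 passengers each. Under this count, the integer-age option leaves fewer than two training passengers per parameter on average. That likely makes the learned probabilities unreliable as well, independently of run time.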

My question is the following: have I reached a technical limitation, or am I using ProbLog's features in a suboptimal way?

Also, is there any documentation on DC-ProbLog besides the official thesis?
