Entity Ranking with Logistic Regression

This project is for Baidu's entity search competition:

- Rules are here
- Data are in the data/ directory.

We are the Chitu team! -- Link to project page.

Features

Error analysis interface (demo).
- View the queries on which the system performed worst
- Summarize performance on both the hold-out set and the cross-validation data

Easy to implement new feature extractors

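Write a function and decorate it with @Extractor. The function receives a (q_type, query) pair and an entity, and returns a list of (feature name, value) pairs: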

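@Extractor
def HasCommonCharacter((q_type, query), entity):
  # Tuple parameters like (q_type, query) are Python 2 only syntax.
  # The feature is 1 if the query and the entity share any character, else 0.
  return [("HAS_COMMON_CHAR", int(len(set(query).intersection(set(entity))) != 0))]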
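To illustrate how a decorator like this can work, here is a minimal Python 3 sketch. It is an assumption about the mechanism, not the code in extractors.py: the EXTRACTORS registry and the extract_features helper are hypothetical names, and the extractor takes the query pair as a single argument because Python 3 does not allow tuple parameters.

```python
# Hypothetical sketch: the real Extractor in extractors.py may differ.
EXTRACTORS = {}  # maps extractor name -> function


def Extractor(func):
    """Register func under its own name so it can be selected by name."""
    EXTRACTORS[func.__name__] = func
    return func


@Extractor
def HasCommonCharacter(typed_query, entity):
    q_type, query = typed_query  # unpack inside the body (Python 3)
    return [("HAS_COMMON_CHAR", int(len(set(query) & set(entity)) != 0))]


def extract_features(names, typed_query, entity):
    """Run the named extractors and merge their (name, value) pairs."""
    features = {}
    for name in names:
        features.update(EXTRACTORS[name](typed_query, entity))
    return features
```

With a registry like this, a comma-separated list of extractor names can be split and passed to extract_features to build the feature dictionary for one (query, entity) pair.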

Scripts to download data
- Multi-threaded downloader for Baidu Baike data (in small_scripts/)
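As a rough idea of what multi-threaded downloading looks like, here is a sketch built on the standard library's thread pool. It is not the script in small_scripts/; the function names, the worker count, and the choice to store exceptions in the result are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_fn=fetch, max_workers=8):
    """Fetch every URL on a thread pool; return {url: bytes or exception}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything first so the downloads overlap.
        futures = {url: pool.submit(fetch_fn, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one page fails
                results[url] = exc
    return results
```

Passing fetch_fn as a parameter makes it possible to try the pool logic with a stub function before pointing it at real pages.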

Usage

10-fold cross validation experiment

The following command runs 10-fold cross validation, then tests on the hold-out set.

python run_experiment.py --extractors=e1,e2...

Output:

Experiment ID: 101. Detailed report at reports/101.html. Model at models/101.model

Cross validation:
| task1      | task2     | ...
| 0.70+-0.01 | 0.80+-0.02| ...

Hold-out set:
| task1    | task2    | ...
| 0.70     | 0.80     | ...
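The cross-validation row reports a score and a spread per task. A minimal sketch of how such numbers can be produced is shown below. It assumes the spread is the standard deviation over folds and that the data is split at random; run_experiment.py may do either differently. The function names and the train_and_score callback are hypothetical.

```python
import random
import statistics


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k nearly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(items, train_and_score, k=10, seed=0):
    """Return (mean, std) of the score over k train/test splits."""
    folds = k_fold_indices(len(items), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then score on fold i.
        train = [items[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [items[j] for j in test_idx]
        scores.append(train_and_score(train, test))
    return statistics.mean(scores), statistics.pstdev(scores)
```

Here train_and_score stands for any function that fits a model on the training items and returns a single score on the test items.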

Listing Past Experiment Results

The following command prints an overview of past experiment results. For each feature set, only the most recent run is listed.

python experiments_summary.py

Output:

    ID  features                                                 MAP    Celebrity     Movie    Restaurant    tvShow  time
  ----  --------------------------------------------------  --------  -----------  --------  ------------  --------  --------------------------
     7  nchar,char,nsumchar,sumchar,cont_bigram,cont_match  0.460184     0.822061  0.379818      0.277135  0.361721  2016-04-06 17:14:28.113651
     6  nchar                                               0.276946     0.360595  0.331004      0.259918  0.156267  2016-04-06 16:34:30.566979
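For readers unfamiliar with the MAP column, the sketch below computes mean average precision in its textbook form. It assumes the column uses that standard definition; the competition's official scorer may differ in details such as how ties or missing entities are handled. The function names are hypothetical.

```python
def average_precision(ranked_entities, relevant):
    """Average precision of one ranked list against a set of relevant entities."""
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranked_entities, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(rankings):
    """rankings: list of (ranked_entities, relevant_set) pairs, one per query."""
    if not rankings:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in rankings) / len(rankings)
```

Sorting queries by their individual average precision, lowest first, is also one simple way to find the queries a system handles worst.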

Export Submission Output

python export_result.py --model_id=101 --test_dir=data/DEV\ SET --output_dir=output
