vdmitriyev/gdelt-data-to-sap-hana-loader

About

A simple Python script that loads data from the GDELT dataset into an SAP HANA DB table.

Dependencies Setup

GDELT Table Structure

  • The table structure is taken from the GDELT table definition - http://gdeltproject.org/data/lookups/SQL.tablecreate.txt
  • To create the table in SAP HANA, use the script 'gdelt_dailyupdates.hdbtable'
  • The directory "data" contains a Python script that fetches daily data updates (the interval can be specified) from the GDELT website, then stores and unzips them on your PC.
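
The unzip step described above can be sketched roughly as follows. This is an illustration only, not the code of the repository's download script: the function name `unzip_all` is hypothetical, and the directory arguments simply mirror the "../zipped" and "../unzipped" folders used in the commands in the next section.

```python
import zipfile
from pathlib import Path


def unzip_all(zipped_dir, unzipped_dir):
    """Extract every .zip archive found in zipped_dir into unzipped_dir.

    Hypothetical helper for illustration; returns the names of the
    extracted archive members.
    """
    out = Path(unzipped_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    extracted = []
    for archive in sorted(Path(zipped_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out)
            extracted.extend(zf.namelist())
    return extracted
```

Calling `unzip_all("../zipped", "../unzipped")` from the data subdirectory would unpack all downloaded archives in one pass.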

Load GDELT Daily Updates from PC

  • Change to the subdirectory 'data'
  • Run the batch file
'gdelt_download.bat'
  • or run from the command line (daily updates only)
python gdelt_download_daily.py fetch_from_date -d "../zipped" -U -du "../unzipped"
  • or run from the command line with a date range (from-date: option -F, to-date: option -T, both dates in the format 'YYYYMMDD')
python gdelt_download_daily.py fetch_from_date -d "../zipped" -U -du "../unzipped" -F 20140321
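
To make the 'YYYYMMDD' date range concrete, here is a minimal sketch of how the days between a from-date and a to-date could be enumerated. It is an assumption about the general approach, not the logic of gdelt_download_daily.py, and the function name `daily_dates` is hypothetical.

```python
from datetime import datetime, timedelta


def daily_dates(from_date, to_date):
    """Return every day from from_date to to_date (inclusive) as 'YYYYMMDD' strings.

    Hypothetical helper; raises ValueError if a date is not in 'YYYYMMDD' format.
    An empty list is returned when to_date is earlier than from_date.
    """
    start = datetime.strptime(from_date, "%Y%m%d").date()
    end = datetime.strptime(to_date, "%Y%m%d").date()
    days = (end - start).days
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days + 1)]


print(daily_dates("20140321", "20140323"))
```

Each resulting string corresponds to one daily update to fetch.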

Credentials for the SAP HANA DB

  • Create the file 'sap_hana_credentials.py'
  • Copy and paste the code below and insert your credentials
# Server 
SERVER = '<server>'
PORT = <port>

# User Credentials
USER = '<user>'
PASSWORD = '<password>'

The <port> should be 3<instance number>15. For example, use 30015 if the instance number is 00.
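
The port rule above can be written as a small helper. This is only a sketch of the 3<instance number>15 pattern stated here; the function name `hana_sql_port` is hypothetical and not part of the repository.

```python
def hana_sql_port(instance_number):
    """Build the port 3<instance number>15 from a two-digit instance number (0-99)."""
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 0 and 99")
    # Zero-pad the instance number to two digits, e.g. 0 -> "00"
    return int(f"3{instance_number:02d}15")


print(hana_sql_port(0))  # instance 00 -> 30015
```

The returned value is what would go into PORT in 'sap_hana_credentials.py'.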

Run on Windows

To run the main Python script on a Windows machine, you can use 'run.bat'. Note: all configuration steps above must be completed before the script can run properly.

run.bat

Known Problems and Drawbacks

  • [FIXED] Not all events from the daily updates are parsed properly (some shift in the data is possible);
  • [FIXED] All fields (if generated from the SAP HANA .hdbtable) are generated with the 'NVARCHAR' data type;

Credits
