Description
What are some general characteristics of this market (in particular the venue du)?
Given our non-trivial holding time, we should build our strategy around the key observations from our data.
We have observed:
Very few (model, size) pairs trade regularly.
Most pairs have a large spread, low volume and low liquidity. Missing data on one side is to be expected. It would be extremely risky to get into illiquid positions.
The strategy report from the most complete feed run suggests:
total (style_id, size) pairs 34887
total (style_id, size) pairs 8725 with data
total (style_id, size) pairs 8725 with fresh data
total (style_id, size) pairs 4753 with fresh transactions
total (style_id, size) pairs 1820 satisfying profit cutoff ratio (bid to last) of 0.01
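We can consider trading about 13.6% of the scraped space (the 4753 of 34887 pairs with fresh transactions). Only 1820 pairs, about 5.2%, also satisfy the profit cutoff.
High volatility and price spikes are to be expected, even in the most heavily traded pairs.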
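The shares implied by the report counts above can be checked with a short sketch. The labels and variable names are illustrative and not part of strategy.py; only the counts come from the report.

```python
# Counts copied from the strategy report above; labels are illustrative.
funnel = [
    ("scraped", 34887),
    ("with data", 8725),
    ("with fresh data", 8725),
    ("with fresh transactions", 4753),
    ("satisfying profit cutoff ratio 0.01", 1820),
]

total = funnel[0][1]

# Share of the whole scraped space that survives each filter.
share = {label: count / total for label, count in funnel}

for label, count in funnel:
    print(f"{label:<40} {count:>6} {share[label]:6.1%}")
```

Expressing every stage against the scraped total, rather than against the previous stage, keeps the numbers comparable with the 13.6% figure quoted above.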
If we sort the data files by size, which roughly tracks the number of transactions (apparently we have too many files, so a plain ls -Sl wouldn't do):
find . -name "*.json" -exec ls -l {} + | tr -s ' ' | cut -d' ' -f 5,9 | sort -s -n -k 1,1 | tail
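Where find and sort are not available, roughly the same listing can be sketched in Python. The function name largest_json_files is hypothetical and not part of the existing tooling; it assumes, like the pipeline above, that the data files end in .json.

```python
from pathlib import Path


def largest_json_files(root, n=10):
    """Return the n largest *.json files under root as (size_bytes, path).

    The result is sorted by ascending size, so the largest file comes last,
    matching the order of the `sort -n | tail` pipeline.
    """
    sized = [
        (path.stat().st_size, path)
        for path in Path(root).rglob("*.json")
        if path.is_file()
    ]
    sized.sort(key=lambda item: item[0])
    return sized[-n:]
```

Calling largest_json_files(".") from the data directory and printing each (size, path) pair gives the candidates for the most liquid pairs.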
And look at some of our most liquid pairs:
./du_analyzer.py --style_id 554723-051 --size 7.0 --mode plot
Around 20191014 the current strategy would reach a drastically different decision than at any other time.
I cannot yet explain the spikes.
Another, more extreme example (a 2017 Valentine's Day issue):
./du_analyzer.py --style_id 881426-009 --size 7.0 --mode plot
./du_analyzer.py --style_id 881426-009 --size 7.0 --mode stats
First Date: 2019-07-31T06:49:14.439100
Last Date: 2019-12-24T06:48:50.548423
Number of Sales: 160
Sales / Day: 1.10
High: 2699.00 CNY 385.80 USD
Low: 1819.00 CNY 260.01 USD
First: 1859.00 CNY 265.73 USD
Last: 2109.00 CNY 301.46 USD
Average: 2167.75 CNY 309.86 USD
Stdev: 178.92
This indicates that filtering and sorting by mid-to-last can be quite misleading.
We suspect our strategy is inherently biased towards riskier new releases:
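To put numbers on this, here is a minimal sketch using only the figures from the stats output above. It assumes "mid-to-last" means the ratio mid / last, and it holds the mid price fixed purely for illustration.

```python
# Figures copied from the --mode stats output for 881426-009, size 7.0 (CNY).
high, low, average, stdev = 2699.00, 1819.00, 2167.75, 178.92

# Dispersion of sale prices relative to the average price.
coefficient_of_variation = stdev / average
relative_range = (high - low) / average

# Assumption: mid-to-last is mid / last. For a fixed mid, the ratio is
# inversely proportional to last, so the value computed right after the
# lowest sale exceeds the one computed right after the highest sale by
# the factor high / low.
ratio_swing = high / low

print(f"stdev / average:        {coefficient_of_variation:.1%}")
print(f"(high - low) / average: {relative_range:.1%}")
print(f"mid/last swing factor:  {ratio_swing:.2f}")
```

The swing factor is far larger than the 0.01 profit cutoff ratio, so whichever sale happens to be last can dominate the ranking.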
./strategy.py --start_from ../feed/merged.20191225.csv | grep "Release date" | tr -s ' ' | cut -d' ' -f 3 | sort
2008-11-28
2017-01-28
2017-06-10
2017-08-05
2017-10-07
2017-10-07
2017-11-21
2018-09-05
2019-01-22
2019-06-10
2019-08-24
2019-10-25
2019-11-07
2019-11-30
2019-12-06
2019-12-06
2019-12-07
2019-12-07
2019-12-07
As a rough estimate, using the default cut-off ratios, more than half of the selected pairs (11 of 19) were released this year, and those are tilted towards issues that were released and started trading only recently.
Our belief is that new issues generally "stabilize" at a price lower than the trading price of the first few days, and that this stabilization period is shorter than our holding time, which makes capturing the difference on new issues tricky.
I'm bearish on automated bids because of the volatility.
Backtesting (sim) would require non-trivial implementation effort, and given the data scarcity I doubt we could derive meaningful conclusions from it.
Continuing to gather data wouldn't hurt, but I think it is too risky to automate bids: we aren't disciplined enough and don't have enough data.
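The tally can be reproduced from the release dates listed above. In this sketch the feed date is an assumption read off the file name merged.20191225.csv, and the 30-day window is an arbitrary illustrative threshold, not something the strategy uses.

```python
from collections import Counter
from datetime import date

# Release dates copied from the strategy.py output above.
release_dates = [
    "2008-11-28", "2017-01-28", "2017-06-10", "2017-08-05", "2017-10-07",
    "2017-10-07", "2017-11-21", "2018-09-05", "2019-01-22", "2019-06-10",
    "2019-08-24", "2019-10-25", "2019-11-07", "2019-11-30", "2019-12-06",
    "2019-12-06", "2019-12-07", "2019-12-07", "2019-12-07",
]

# Assumption: the feed date is taken from the file name merged.20191225.csv.
feed_date = date(2019, 12, 25)

parsed = [date.fromisoformat(text) for text in release_dates]

# Number of selected pairs per release year.
by_year = Counter(d.year for d in parsed)

# Pairs released within an (arbitrary) 30-day window before the feed date.
recent = [d for d in parsed if (feed_date - d).days <= 30]

for year in sorted(by_year):
    print(year, by_year[year])
print("released within 30 days of the feed:", len(recent), "of", len(parsed))
```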