That is a very good question.

In fact, when I wrote this article my concern was specifically to study how LSTMs perform on raw time series data. I wrote these articles about neural networks, XGBoost and time series to show a few examples of how to prepare data for analytics with neural networks and machine learning. What we get here is precisely the obvious stuff: «hey, this is trending up». Regarding the confusion matrix, what I usually do is simply adjust the classification threshold (in a binary setting it could be 0.5/0.5, 0.7/0.3, etc.). It depends on how much sensitivity and specificity you want.
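As a rough illustration of the threshold adjustment, here is a minimal sketch that counts the confusion matrix cells at two thresholds. The labels and probabilities are made up for the example, and the helper names `confusion_counts` and `sensitivity_specificity` are hypothetical, not taken from the article.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) after cutting probabilities at `threshold`."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    return tp, fp, tn, fn

def sensitivity_specificity(y_true, y_prob, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity

# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.6, 0.65, 0.4, 0.3, 0.55, 0.8, 0.1]

for t in (0.5, 0.7):
    sens, spec = sensitivity_specificity(y_true, y_prob, t)
    print(f"threshold={t}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.7 trades sensitivity for specificity: fewer "up" calls, but fewer false alarms.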

For financial analysis, there are other factors you need to take into account, such as open markets, news releases, spreads and commissions, and a bunch of other things like equity, leverage, risk and stop losses; one could go mad.

Besides, knowing with some accuracy whether a market will go up or down is not enough. What about trade management, which spans an uncertain number of future time steps in which there is no way to know what each next bar will look like?

From a machine learning perspective, the financial time series problem is very complex, and in my opinion we need to drastically reduce the complexity of the data upfront. So I think we need to think about financial data structures more than about particular events like open, high, low and close.

And then, predicting or classifying only describes a certain state. But, as I say, it does not give us an optimal way to act on that information. How much to buy, how much to sell, how long to stay in the trade…?

Coming back to your initial question, in trading the important thing is how much you win on average versus how much you lose. You can make a profit while winning only 30% of the time, and you can lose all your money in a week even with a 95% win rate.
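To make the arithmetic concrete, here is a small sketch of the expected profit per trade. The average win and loss sizes are invented numbers chosen only to illustrate the point, and `expectancy` is a hypothetical helper name.

```python
def expectancy(win_rate, avg_win, avg_loss):
    """Expected profit per trade: p * avg_win - (1 - p) * avg_loss."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss

# Assumed figures, for illustration only (in units of account currency).
# Wins are rare but three times the size of the losses.
low_win_rate = expectancy(win_rate=0.30, avg_win=3.0, avg_loss=1.0)

# Wins are frequent but small, and the rare loss is very large.
high_win_rate = expectancy(win_rate=0.95, avg_win=1.0, avg_loss=25.0)

print(f"30% win rate: {low_win_rate:+.2f} per trade")
print(f"95% win rate: {high_win_rate:+.2f} per trade")
```

With these assumed sizes, the 30% strategy has a positive expectancy and the 95% strategy a negative one, so accuracy alone says little about profitability.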

Regards.

]]>In fact, the values used to calculate the predictions for the predicted period are always shifted.

The XGB model is not using a linear regression function, but something closer to a logistic regression, as shown in the article:

y_predicted_binary = [1 if yp >=0.5 else 0 for yp in y_predicted]

That is because we are not looking for the best fit to a line but for the class with the highest probability.

]]>Just to clarify, has the current market price and volume for the particular time frame or day been used to predict the same day's Up or Down call?

Shouldn’t it be shift(1) and shift(2) to predict the future Up or Down call?

Is it possible to see the developed XGB model or regression for the related work in Python (e.g. Y = mX + C)?

Thank You!

Also, I’m fairly new to Python and am unable to appreciate the purpose of the pandas merge statements:

X = pd.merge(Predictors, target,left_index=True,right_index=True)[Predictors.columns]

y = pd.merge(Predictors, target,left_index=True,right_index=True)[target.columns]

After the merge, X seems to be a copy of the Predictors dataframe and y is a single column of the target dataframe. Does the merge do something more than simply using:

X=Predictors

y=target

Thank you very much for your kind advice and for sharing your expertise.
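For what it is worth, one possible difference can be sketched as follows, assuming the two dataframes do not cover exactly the same rows (for instance after shifting or dropping missing values). The toy data and the column names `close_lag1` and `up` are hypothetical. `pd.merge` defaults to an inner join, so with `left_index=True, right_index=True` only the index labels present in both frames survive.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=5, freq="D")

# Hypothetical predictors covering all five days.
Predictors = pd.DataFrame({"close_lag1": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Hypothetical target that is missing the first and last day.
target = pd.DataFrame({"up": [1, 0, 1]}, index=idx[1:4])

# Inner join on the index keeps only the dates found in both frames.
merged = pd.merge(Predictors, target, left_index=True, right_index=True)
X = merged[Predictors.columns]
y = merged[target.columns]

print(len(Predictors), len(X), len(y))
print(X.index.equals(y.index))
```

If both frames already share an identical index, the merge changes nothing and X=Predictors, y=target would give the same result; the merge only matters when the rows differ.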

]]>If you are comparing today's data with tomorrow's data using shift(1), you can change the direction by using shift(-1). The models I have prepared also use past data to predict future data.
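A minimal sketch of the two shift directions in pandas, on made-up closing prices. The column names `close_prev`, `close_next` and `up_next` are hypothetical and this is not necessarily how the article builds its features.

```python
import pandas as pd

# Made-up closing prices, for illustration only.
df = pd.DataFrame({"close": [10.0, 11.0, 10.5, 12.0]})

# shift(1) moves values down one row: each row sees the PREVIOUS close.
df["close_prev"] = df["close"].shift(1)

# shift(-1) moves values up one row: each row sees the NEXT close.
df["close_next"] = df["close"].shift(-1)

# The last row has no next close, so drop it before labelling.
df = df.dropna(subset=["close_next"]).copy()

# Label: 1 if the next close is higher than the current one, else 0.
df["up_next"] = (df["close_next"] > df["close"]).astype(int)

print(df)
```

Features built with shift(1) only use information available at prediction time, while a column built with shift(-1) is future data and should only ever serve as the target.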

]]>It is also important to understand that this is a machine learning exercise, not a trading model. This prediction has no application in real trading.

]]>Could you clarify whether the prediction is based on future data that would not be known in a real out-of-sample prediction? Thank you.

]]>