Building trading algorithms from scratch. Part III.
Trading is a risky activity. If you follow anything written in this article, you are responsible for the results. The result may be a decreased deposit and lost funds. By reading on and taking any action, you agree with our Disclaimer, Policies, and Risk Warning.
Hi everyone, once again. This time we continue with our freshly programmed algorithm, and we will add some new trading dimensions to our analysis. In my last article, I also said that we would search for the algorithm's flaws. Why do we add dimensions first? Because new dimensions may alter or fix some of the problems, so we would waste our time analyzing the flaws now. We will do it once we finish with our filtering procedures. OK, let's start!
Those who watched the Sharpedge courses know that there are several ways to approach the filtering stage: you can analyze charts, or you can do what I call "a table view." We will do the table view because I want to show you how I use my third and final tool, Python with a Jupyter notebook, and because it's faster.
Before we move on, I need to point out that I found a mistake in the last version of our code. Because of my own carelessness, the strategy didn't cancel the stop orders when it closed the positions. This error didn't affect the algorithm at the current stage, but it will once we move forward. In this code, I fixed the problem.
The table view - how do we approach it? First of all, what is it? A brief answer would be something like this:
A table view is an analysis of all the trading dimensions that can be used with an algorithm
In our case, we will analyze the following dimensions:
Trend-direction dimension, using SMA and Donchian Channel
Trend-strength dimension, using ADX
Momentum dimension, using smoothed Williams %R
Volatility dimension, using ATR and Standard Deviation
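To make these dimensions concrete, here is a minimal pandas sketch of how each of them can be computed from OHLC data. The column names, the 14-bar default period, and the smoothing choices are my own assumptions for illustration, not the exact settings used in the NinjaTrader strategy:

```python
import numpy as np
import pandas as pd

def add_dimensions(df, n=14):
    """Add the four trading dimensions to an OHLC DataFrame.
    Column names and the 14-bar period are illustrative assumptions."""
    out = df.copy()
    h, l, c = out["High"], out["Low"], out["Close"]

    # Trend direction: SMA and Donchian Channel
    out["SMA"] = c.rolling(n).mean()
    out["DonchianHigh"] = h.rolling(n).max()
    out["DonchianLow"] = l.rolling(n).min()

    # Trend strength: Wilder-style ADX
    up, down = h.diff(), -l.diff()
    plus_dm = pd.Series(np.where((up > down) & (up > 0), up, 0.0), index=out.index)
    minus_dm = pd.Series(np.where((down > up) & (down > 0), down, 0.0), index=out.index)
    tr = pd.concat([h - l, (h - c.shift()).abs(), (l - c.shift()).abs()], axis=1).max(axis=1)
    atr = tr.ewm(alpha=1 / n, adjust=False).mean()
    plus_di = 100 * plus_dm.ewm(alpha=1 / n, adjust=False).mean() / atr
    minus_di = 100 * minus_dm.ewm(alpha=1 / n, adjust=False).mean() / atr
    dx = 100 * (plus_di - minus_di).abs() / (plus_di + minus_di)
    out["ADX"] = dx.ewm(alpha=1 / n, adjust=False).mean()

    # Momentum: Williams %R smoothed with a 5-period SMA
    hh, ll = h.rolling(n).max(), l.rolling(n).min()
    out["WilliamsR"] = -100 * (hh - c) / (hh - ll)
    out["WilliamsR_smooth"] = out["WilliamsR"].rolling(5).mean()

    # Volatility: ATR and standard deviation of Close
    out["ATR"] = atr
    out["StdDev"] = c.rolling(n).std()
    return out
```

This is only a reference implementation of the textbook formulas; NinjaTrader's built-in indicators may differ slightly in their smoothing details.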
Those who watched the courses will notice that we ignore many dimensions here. Why is that? There are several reasons.
We don't use the sentiment, money flow, and volume dimensions because of our timeframe. The volume dimension can only be used on a daily timeframe or with regular trading sessions. Otherwise, your volume fluctuates during the day (volume on CME is much higher when the U.S. session is active), which makes it impossible to apply. Sentiment and money flow are extremely long-term dimensions and not that useful in fast trading. So, with the intention to keep complexity low, I will skip these inputs.
Besides, we also ignore market breadth. Breadth is mostly about stocks, and I see limited usage of it in futures trading.
Now that we know what we want to analyze, let's go to the first step - data preparation.
Every data science course starts with the data preparation stage. This stage lets you move faster afterward and makes life easier. Luckily for you, I spent a lot of time figuring out a good way to export all the necessary data from NinjaTrader when we backtest an algorithm. This code will automatically export all the data into a .txt file, which you can find in the "Documents -> NinjaTrader 8" core folder.
We have the data. Lovely! Let's now move to Jupyter. Jupyter is an interactive Python environment that works in your browser. I first started using Jupyter while taking the "Data Science Specialization" from the University of Michigan on Coursera, and I've never stopped since. I use Jupyter as part of the Anaconda package. You can see a screenshot of Jupyter below.
There is one important thing to note. Since my laptop's locale is Russian, the number format is different. In Russia, we use a comma instead of a period to separate the whole part from the fraction. This minor difference forces me to add one more step during data preparation. If you open the Python code, you'll see a comment stating: "Delete this part if your format is period-based."
Now that we have the data and the notebook running, how are we going to analyze the data? Well, I take the following steps:
I define how I want to manipulate the data and which aspects I want to analyze.
I divide the data into bins and analyze the bins with charts. Let's consider an example.
Below you see four charts of ADX bins. There are two for long trades and two for short trades - one for the last bar and one for the previous bar. The bars' heights show the aggregate profit of the trades in each bin, while the black numbers at the top of the bars are the number of trades in each group.
ADX analysis for VSH3_corrected
There are two questions to answer: why do I analyze aggregate profit here, and what do I see on the charts? Well, in terms of bins and the statistics you investigate, you can do whatever you want; it strongly depends on your preferences. You can analyze reliability if you are searching for an algorithm with a high percentage of winning trades, or an average profit factor if you are trying to optimize risk-return characteristics. I got used to profit analysis because I'm a risk-seeking person, and my studies show that in 80% of cases, whatever your statistic is, you end up with more or less the same filters. So I stick to my guns.
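As for the mechanics, the binning itself is short in pandas: cut the indicator into intervals, then aggregate profit and count per interval. A minimal sketch, where the "Profit" column name and the bin edges are my assumptions about the exported trade table:

```python
import pandas as pd

def profit_by_bins(trades, column, bins, stat="sum"):
    """Group trades into bins of an indicator value and aggregate profit.
    'stat' can be swapped for "mean", "count", etc. depending on which
    statistic you prefer to analyze."""
    groups = pd.cut(trades[column], bins=bins)
    return trades.groupby(groups, observed=False)["Profit"].agg([stat, "count"])
```

To reproduce a chart like the ADX one, you would call this separately for long and short trades and plot the result, e.g. `profit_by_bins(longs, "ADX", bins=range(0, 101, 10))["sum"].plot.bar()`.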
The second question is: what do I see on the chart? First of all, you need to understand that in 90% of cases you won't see anything. Not because you cannot spot the patterns, but because there is nothing to spot. Finding a valuable and credible pattern takes time. Sometimes you end up with no patterns at all. Shit happens. There are also some scientific limitations to your search. The process of searching for a pattern by looking at different dimensions is known as data mining. David Aronson, the author of the book "Evidence-Based Technical Analysis," argues that with each new attempt, the credibility of your finding decreases - it's just a matter of time until you find something that was lucky enough to work. He suggests using a hypothesis-testing process to verify the credibility of a new rule (if you haven't taken a statistics course, don't bother - it's nerdy stuff, and you don't need it). I suggest changing the process a little: first, come up with a logical explanation behind the filter, then try the filter. Using dimensions is one step in this direction - you don't try dozens of oscillator or momentum indicators, just one. Making logical data manipulations is the second step. This approach keeps you from shooting with your eyes closed.
So, can we answer the question of what to do with the goddamn chart? Yes, we can! First of all, we see that the ADX filter affects both sides in the same way - the higher the ADX on the entry bar, the worse the results. In other words, the filter is symmetric. Most filters, however, are asymmetric: you see profit for long trades on one side and profit for short trades on the other.
Another thing we can note is that the ADX filter doesn't add any value to our algorithm. Why? Well, it certainly can filter some negative trades on both sides. However, to do that, you need different conditions for the two sides. As you see, ADX above 60 generates a loss for short trades, while for long trades anything above 40 is already bad. So if you make two different rules, you slightly increase the performance but also increase complexity. On the other hand, if you used the same rule for both sides - for example, filter trades when ADX is above 40 - you would filter out more profitable short trades than losing long trades. As a result, your algorithm actually becomes worse. If you filter all the trades with ADX above 60, you basically end up with the same result but, once again, increased complexity.
To conclude - how do you use this ADX chart? You don't. Go to the next one. Finding out that a dimension doesn't work in a particular way is the most common thing you'll face. So let's check out some additional dimensions that I use and analyze in my daily job. You can see more details on the statistics, bins, and calculations in the Python code here.
5-period smoothed Williams %R (asymmetric filter)
Distance of Close from SMA of the last 10 bars (asymmetric filter)
10-bar Range (Highest minus Lowest of the last 10 bars) in ATR (symmetric filter)
Distance from the Highest High of the last 10 bars to Close in ATR (asymmetric filter)
Distance from the Lowest Low of the last 10 bars to Close in ATR (asymmetric filter)
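A sketch of how these five statistics can be computed per bar in pandas. The column names and the simple-moving-average flavor of ATR are my assumptions, not necessarily what the strategy code uses:

```python
import pandas as pd

def add_filter_stats(df, n=10, wr_n=5, atr_n=14):
    """Compute the five candidate filter statistics from OHLC data.
    Periods (10 bars, 5-bar smoothing, 14-bar ATR) follow the article's
    descriptions; the ATR smoothing method is an assumption."""
    out = df.copy()
    h, l, c = out["High"], out["Low"], out["Close"]
    tr = pd.concat([h - l, (h - c.shift()).abs(), (l - c.shift()).abs()],
                   axis=1).max(axis=1)
    atr = tr.rolling(atr_n).mean()
    hh, ll = h.rolling(n).max(), l.rolling(n).min()

    wr = -100 * (hh - c) / (hh - ll)
    out["WR_smooth"] = wr.rolling(wr_n).mean()               # smoothed Williams %R
    out["CloseToSMA_ATR"] = (c - c.rolling(n).mean()) / atr  # Close vs 10-bar SMA, in ATRs
    out["Range_ATR"] = (hh - ll) / atr                       # 10-bar range, in ATRs
    out["HighToClose_ATR"] = (hh - c) / atr                  # Highest High to Close, in ATRs
    out["LowToClose_ATR"] = (c - ll) / atr                   # Close to Lowest Low, in ATRs
    return out
```

Note that the symmetric range statistic is simply the sum of the two asymmetric distance statistics, since (HH - LL) = (HH - Close) + (Close - LL).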
So, how was that? Got any ideas on how to change the algorithm? Here are mine:
Smoothed Williams %R certainly looks interesting - above -50 for long trades and below -50 for short trades. An easy and potentially useful rule.
Distance between Close and SMA in ATRs - above 0 for long and below 0 for short.
Distance between Close and the Highest High for long trades, and between Close and the Lowest Low for short trades, in ATRs - in both cases the distance is below 0.6 ATR (an interesting note: here we make a symmetric rule from two asymmetric filters).
The last rule is the most questionable. Before we add the rules, there are two important things. First, when you modify entries, you need to add only one filter at a time. Filters often coincide and remove the same bad trades; as a result, you can introduce one additional rule instead of two and keep complexity low. Another reason for this approach is that when you add new rules, you change the algorithm's profile, and you need to repeat this data analysis procedure after every step to discover potentially new patterns in the updated profile.
The second important thing to note is the exits. When you increase the complexity of your entries, you should keep the exits as they were. Otherwise, you don't filter entries - you change the profile of the trades, making them harder both to enter and to exit. So be sure to separate filters from triggers, especially if reversed triggers are used to close positions (for example, a short trigger may close a long trade but not open a new short trade because of the filters).
All filters combined (just because I'm curious) - iterative steps with data analysis will be done anyway (code here)
Equity curve of VSH3_corrected on January 2010 - January 2018 on GC
Equity curve of VSH3Will on January 2010 - January 2018 on GC
Equity curve of VSH3SMA on January 2010 - January 2018 on GC
Equity curve of VSH3HighLow on January 2010 - January 2018 on GC
Equity curve of VSH3All3 on January 2010 - January 2018 on GC
Well, as you see, we are slowly moving forward and getting a better and better equity curve. Now I will repeat the same process several times, starting with the Williams %R filter. In the next article, we will start with the already-filtered steps, new code, and a new equity curve. See you soon!