Simulating an equity strategy is difficult. Much more so than simulating a futures strategy. There are far more moving parts to care about. Much more complexity. All too often, I see articles and books that just skip the difficult parts. Either the authors didn't understand them, or they hoped they wouldn't matter. They do.
When I set out to write Stocks on the Move, I wanted to make sure that anything I write about could be replicated by readers. There's nothing worse than books that show stellar simulation results, only to be vague about how it's done. Even worse are the books that ask you to buy some trading system for thousands of dollars to find out how the results in the book were made. I wanted to show all the details, just like I did for Following the Trend.
But showing all the details isn't enough. There's still the issue of cost. In the hedge fund space, we're lucky enough to play with quite decent budgets. We can spend quite a bit of money on market data, applications and developers to make sure we get what we need. That's not a luxury that most readers will have. So I had to find a way to get the strategy for the book done on the cheap.
I spent some time researching how to get the desired quality at a reasonable price. It was an interesting task, and hopefully my findings can help you improve your own simulation quality.
Stocks on the Move is about systematic equity momentum. The strategy picks the highest momentum stocks, constructs a portfolio and uses periodical rebalancing to ensure that we have the right stocks at the right weights. Simple concept, tricky implementation.
I decided from the start to use the S&P 500 stocks as the investment universe. It's simply easiest for a book targeted at hobby traders. US large caps are something everyone can trade. Trading costs are low. It's a single currency index. It seemed like a reasonable choice to limit the book to this one index.
And finding data for US large caps is easy. Right?
Finding data isn't easy. Forget about all the free data sources you were just thinking of. None are good enough for simulations. Let's take a look at our minimum requirements.
- Daily or higher frequency prices.
- Coverage of all current S&P 500 constituents.
- Coverage of all previous S&P 500 constituents.
- Coverage of graveyard, i.e. all delisted former members of the S&P 500.
- S&P 500 historical joiners and leavers table.
- Mapping of changed symbols, e.g. mergers, spin-offs etc.
- Corporate actions adjustment factors.
- Cash dividends adjustment factors.
That got a little more complicated than you had expected, didn’t it? Trust me, you need this.
A key point is that you have to make sure your simulation only looks at stocks that were part of the index on a given day. If you were to use the current index members and run a ten year back test on that, it would of course look great. It would also be utterly useless.
The stocks in the index right now are there for one reason only. They had great historical performance, and that performance made them valuable enough to be included in the index. To think that you would have traded the same stocks ten years ago is ridiculous. What is reasonable is to include only the members of the index as it looked on each given date.
For that we need to know when stocks joined and left the index. Many of those stocks are now delisted or merged or otherwise not present in the same shape as before. This adds complication.
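To make that concrete, here's a minimal sketch of a point-in-time membership check. It assumes a hypothetical index_membership table built from the joiners and leavers data, with a join_date and a leave_date per membership spell (leave_date left empty for current members). The table and its column names are my own invention for illustration, not anything your data vendor ships.

-- Hypothetical table: index_membership(ticker, join_date, leave_date)
-- Which stocks were actually in the S&P 500 on a given simulation day?
SELECT ticker
FROM index_membership
WHERE join_date <= '2005-06-30'
  AND (leave_date IS NULL OR leave_date > '2005-06-30');

Run something like that for every rebalancing day, and you'll only ever rank stocks that really were in the index at the time.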
Another vital point is cash dividends. While most data providers already adjust prices for corporate actions, such as splits, they normally don't adjust for cash dividends, nor do they provide factor data for them. Cash dividends have a massive impact over time, and ignoring them is not an option.
So, how do we find all this data at a reasonable price?
I investigated a few different solutions, and in the end I went with a company called QuantQuote, which I had never heard of before. I know your next question. Do I get paid for marketing QuantQuote? Is this article really just a thinly veiled plot to siphon off your hard earned cash?
Sadly no. I asked them if they were interested in a similar deal that I have with CSI, where my readers get a discount and I get a kickback. I mean, a referral fee… They didn’t seem too interested. Well, I don’t base my recommendations on who pays me, I base them on what I like. And QuantQuote got the job done. I believe I paid around $1,000 for an initial historical data dump at minute resolution, and then $100 per month for updates. Given the quality and coverage, I believe that to be quite reasonable.
Structuring the Data
Delivery formats vary between data providers. QuantQuote delivers flat files of three types via FTP: raw data files, factor files and mapping files. It's up to you to organize this data in a way that suits your purposes.
The raw data files follow a default layout, at minute resolution. One of these files is delivered on the FTP server every day for every stock. Now, this is raw data of course. We need to apply adjustment factors to get real prices.
Take the factor file for Apple as an example. The first field is the dividend adjustment factor, the second is the split factor. If the raw price in the data file is multiplied by the split factor, you get what you would normally see in most charts. Multiply by the dividend factor as well, and you get a total return series.
The mapping table lets us link changed symbols, merged companies and so on. The JPM mapping file, for example, shows that CMB became JPM in 2001.
Are you with me so far? Good.
All your database are belong to us
Since I never had much trust in flat files, better storage is needed. I'm not really in the mood for writing a data adapter for the simulation software that traverses hundreds of zip files, each with hundreds of flat files inside. No, we need to automate this whole thing and get some structure going.
Since the files are delivered on an FTP server, the first thing you need to do is automate a download. No, no, I’m not going to show you code for that. That’s what StackOverflow.com is for. It’s easy, trust me.
Putting everything into a real database makes a lot of sense. In a database, you’ll have much greater flexibility and speed when you want to access the data. My preferred database solution is MySql. It’s free, it’s fast, it can do everything you need. I’m not going to give you arguments for choosing MySql over MS SQL or similar. It really doesn’t matter. If you like another database, fine, go with that.
Now of course you have to figure out what you might need in your database, and how to structure it. Do you really need to learn about databases and all these technical things just to make a little equity simulation? Yes, you really do. If you want to make a simulation that's not a complete waste of time, you really do.
If you want to play in the kiddie league, get some cheap retail technical analysis software and load it up with data from Yahoo Finance. It might be fun, but don’t expect anything actually useful to come out of it. This article is for those who are ready to put in some work to get things right. If you want to be a professional, be a professional. Put in the work needed to learn and get it right, or you’ll never be able to play in the big boy league.
Back to the structure. You'll likely need at least four tables for this exercise. First, let's make a table with raw price history. I'll simplify it slightly and make it a daily resolution table, since this article has probably lost most readers already due to the complexity. Here's an example of what your equity_history table could look like.
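As a minimal sketch, assuming MySql and daily resolution, the price table could be defined something like this. The column names are just my choices for illustration; adapt them to whatever you prefer.

-- Raw, unadjusted daily prices, one row per ticker and trading day.
-- Adjustment factors are applied on the way out, never baked into the stored prices.
CREATE TABLE equity_history (
    ticker     VARCHAR(10)   NOT NULL,
    trade_date DATE          NOT NULL,
    open       DECIMAL(12,4),
    high       DECIMAL(12,4),
    low        DECIMAL(12,4),
    close      DECIMAL(12,4),
    volume     BIGINT,
    PRIMARY KEY (ticker, trade_date)
);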
Got it? Good, then we’ll make one more table for the factors, and one for the mappings.
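Again as a sketch, with my own illustrative column names: one table for the dividend and split factors, and one for the symbol mappings. Exactly how you populate map_date depends on how your vendor lays out its mapping files.

-- Dividend and split adjustment factors, one row per ticker and factor date.
CREATE TABLE factors (
    ticker       VARCHAR(10)   NOT NULL,
    factor_date  DATE          NOT NULL,
    div_factor   DECIMAL(16,8) NOT NULL DEFAULT 1,
    split_factor DECIMAL(16,8) NOT NULL DEFAULT 1,
    PRIMARY KEY (ticker, factor_date)
);

-- Symbol history: which local ticker a current symbol traded under, and until when.
CREATE TABLE mapping (
    map_from     VARCHAR(10) NOT NULL,  -- current symbol, e.g. 'jpm'
    map_date     DATE        NOT NULL,  -- last date the local ticker applies
    local_ticker VARCHAR(10) NOT NULL,  -- symbol used in the data files, e.g. 'cmb'
    PRIMARY KEY (map_from, map_date)
);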
There. Now we have what we need to get proper historical data. You just need to ask these three tables politely, and they’ll return a really nice properly adjusted and mapped data series to you. What, it’s not clear already? Fine, I’ll show you a sample SQL query.
select
    h.trade_date, tickers.localTicker, h.open, h.high, h.low, h.close, h.volume,
    -- dividend and split factors in effect on this date (1 if none found)
    ifnull((
        select div_factor from factors
        where ticker = 'aapl' and h.trade_date <= factor_date
        order by factor_date limit 1
    ), 1) AS divFactor,
    ifnull((
        select split_factor from factors
        where ticker = 'aapl' and h.trade_date <= factor_date
        order by factor_date limit 1
    ), 1) AS splitFactor
from equity_history h
-- map each trading day to the symbol the company traded under on that day
LEFT JOIN (
    select trade_date,
        (select local_ticker from mapping
         where map_from = 'aapl' and dates.trade_date <= map_date
         order by map_date limit 1) AS localTicker
    from (select distinct trade_date from equity_history) dates
    where trade_date >= (select min(map_date) from mapping where map_from = 'aapl')
) tickers ON h.ticker = tickers.localTicker
    AND h.trade_date = tickers.trade_date
where localTicker IS NOT NULL
order by h.trade_date;
Yes, yes. I know this query could be made prettier and shorter. I wrote it this way in hope that it would be easier to read and follow. End result is the same.
And what will this little query give us? One row per trading day, with the symbol valid on that day, the raw prices and volume, and the dividend and split factors alongside.
But the prices are still not adjusted, right? That's correct. This query is designed to give you all the info, in case you'd like access to unadjusted data, or data only adjusted for splits, for some reason. Don't worry, from here it's quite easy. If you multiply all the price data by the split factor, you get what you'd normally see in stock charts. That is, prices adjusted for splits and corporate actions. If you also multiply by the dividend factor, you'll get a total return series. Easy, isn't it? Oh, and of course, you'd need to divide the volume by the split factor in case you're using the volume field for anything.
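In SQL terms, and assuming you've wrapped the big query above as a view or derived table called raw_series (my name, pick your own), the adjustments boil down to this:

-- Apply the factors: chart-style prices, total return prices, adjusted volume.
SELECT
    trade_date,
    localTicker,
    close * splitFactor             AS close_adj,  -- what most charts show
    close * splitFactor * divFactor AS close_tr,   -- total return series
    volume / splitFactor            AS volume_adj  -- volume adjusts the other way
FROM raw_series;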
Are we there yet?
Not quite. So far we've only obtained proper data and pushed it into a database. All we've done is create a decent data structure. There's a lot more to go. To see how this exciting story plays out, tune in for the next episode.
I was planning on writing a single article about how to make proper equity simulations. Somewhere along the ride though, I realized that it would be a far too long article. In fact, I’m not sure if anyone is still reading at this point.
You’re still there, right guys? Right? Guys..?
In that case, I’ll take this opportunity to shamelessly plug our really neat subscriber services. Yes, we’ve got those here. There’s a unique futures analytics package, which gives you access to pro level data and information on global futures. The price for this was recently lowered, plus the new equity package is included as well. Or you can get the equity package alone, with momentum analytics access for the US large cap markets for a bargain.
Fine, I’m done.