Midas51: third time is the charm

It has been nearly three years since my last update on Midas51, but not for lack of activity.  The gulf between updates has been due to project work dominating most of the time that has remained after my day jobs have gotten their due.

An ensemble learning model discovered in Midas51’s third iteration is yielding a 32% compound annual growth rate (CAGR) with a maximum drawdown of -34% in simulation, and that warrants a progress update.
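For anyone unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed from an equity curve.  The function names and the sample numbers are illustrative assumptions and are not taken from the Midas51 code.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.32 means 32%)."""
    return (end_value / float(start_value)) ** (1.0 / years) - 1.0


def max_drawdown(equity_curve):
    """Largest peak-to-trough decline as a negative fraction (-0.34 means -34%)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                  # highest value seen so far
        worst = min(worst, value / float(peak) - 1.0)  # decline from that peak
    return worst
```

Under these definitions an account that grows from 100 to 132 in one year has a CAGR of 32%, and an equity curve that peaks at 130 and later falls to 85.8 has a maximum drawdown of -34%.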

I spent much of last summer reading up on data science, machine learning, statistical learning, and predictive modeling.  Interestingly, those topics turned out to overlap heavily in the details even though they differ in terminology, so I will refer to them collectively as data science.

My motivation for getting up to speed on data science came from my experience with the second iteration of Midas51.  There I was still using well-known financial market technical analysis and price chart pattern techniques, along with some custom techniques inspired by them.  I ran a lot of experiments, but none yielded results that were acceptable to me: the CAGR was too low, the maximum drawdown was too deep, and/or the result was most likely overfit and unlikely to be reproducible.

To ensure I wasn’t missing anything, in the spring of last year I read numerous books on financial market technical analysis and price chart patterns.  I hadn’t done anything with Elliott Waves (EW) previously, so that was new.  After paying to be able to watch pros do EW analysis, it was clear there is a subjective aspect to EW analysis that precludes using simple algorithms to automate it.  EW practitioners also have fairly inconsistent track records.  This makes it unclear whether EW analysis can’t work consistently or whether the human analysts aren’t applying it consistently enough.

This led to asking the questions:

  • Could the learning algorithms of data science be applied to do EW analysis in a sufficiently consistent way?
  • Or perhaps learning algorithms could do price chart pattern identification and then pattern recognition?

And thus began my journey into data science land.  After getting a solid introduction to data science and some inspiration from Mandelbrot’s The Misbehavior of Markets, I had some newly formed hypotheses that I believed to be worthy of a substantial investment in exploring.

So, with this combination of new knowledge and inspiration, I left my position at Tableau Software last September and fully immersed myself in the third iteration of Midas51.

And now back to work!

Posted in About Me, Coding

The Future of Software Development

I have my theory of what the future of software development should be.  Some years in the future I may write that down, but I’d rather write the code to make it happen.  In the meantime here are the most interesting pages on the subject I found in my wanderings today:

The Future of Software Development at aranya.com made use of a construction analogy, and the feedback I sent them on this was:

There are a number of flaws in using building construction as an analogy for business software development.  In that analogy the computers and operating systems that business software runs on are the equivalent of the land under a building.  While not all land will provide stable support for thousands of years, much of it will.  On the other hand, the computers and operating systems that business software runs on are unlikely to be around in a decade or two, and the operating system will be patched a number of times.  When a building is renovated or remodeled this usually doesn’t involve making additions or changes to the building’s foundation or superstructure, whereas changes to business software frequently involve making additions to its architecture, and changes to that architecture are not uncommon.

A better analogy may be constructing a ship and then expanding its capacity and upgrading its equipment while it is operating at sea without diverting around bad weather or mine fields.

Posted in Coding

++productivity == Python

JARGON ALERT:  this post contains jargon, jargon, and more jargon; you may want to skip directly to the last three paragraphs.  Otherwise enjoy some tasty jargon!  Mmmm.

I was surprised that almost none of the Python code I’ve written makes use of the Inversion of Control techniques of Dependency Injection or Service Locator.  I’ve used the Service Locator design pattern extensively in the C# and C++ code I’ve written over the last decade, and I’ve made extensive use of C++ templates to achieve at compile time what the Dependency Injection design pattern achieves at runtime.

Given how long and how often I have used those two IoC design patterns, I was taken aback that not only was I not using them in my Python code, but I hadn’t even noticed!

After a brief knee-jerk impulse to add an item to my TODO list to fix this oversight, I took some time for considered thought and came to the conclusion that this was generally fine.  C++ and C# are both statically typed languages that purposefully make it difficult to access attributes that have not been explicitly made accessible, whereas Python is dynamically typed and everything is accessible whether the developer wants it to be or not.

For example, in Python ‘private’ members are still accessible through a portable naming convention, so ‘private’ is really a mechanism for communicating “no guarantees are made that using this will work in the future” rather than an access prevention mechanism.  This means anything can be overridden, including imports of other modules, as imported modules are just attributes of the importing module and can be modified the same as an object’s attributes.  Additionally, the classes an object inherits from can be modified at runtime (i.e. Python supports mixins), so the Curiously Recurring Template Pattern is not needed.  Finally, the signature of functions and methods can be changed without needing to change all of the locations calling them.  While both C++ and C# (4.0+) do allow adding additional default parameters, neither allows removing parameters.  In Python removing parameters is possible in many cases.
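Here is a minimal sketch of the first two points, using only the standard library.  The names Account, seconds_since, and with_fake_clock are hypothetical and exist purely for illustration; the point is that a test can reach a ‘private’ attribute and swap out a module-level dependency without any Dependency Injection or Service Locator plumbing.

```python
import time


class Account(object):
    def __init__(self):
        # Double-underscore 'private' attribute; Python stores it under the
        # mangled name _Account__balance, which remains accessible.
        self.__balance = 10


def seconds_since(start):
    # Calls the imported module directly; no clock object is injected.
    return time.time() - start


def with_fake_clock(fake_now, func, *args):
    """Call func(*args) while time.time() is replaced by a fixed value."""
    real_time = time.time
    time.time = lambda: fake_now   # override the module attribute
    try:
        return func(*args)
    finally:
        time.time = real_time      # always restore the real function
```

With this in place, `with_fake_clock(100.0, seconds_since, 40.0)` evaluates `seconds_since` against a clock frozen at 100.0, and `Account()._Account__balance` reads the ‘private’ value directly.  Mocking libraries wrap the same idea with more safety, but the underlying mechanism is nothing more than attribute assignment.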

There are some exceptions to this, such as when ‘__slots__’ is used to reduce the memory footprint of the objects of a particular class, and there are certainly cases where the use of the design patterns is part of the primary usage scenario for a type, but these are the exceptions rather than the general case.

The key difference is that there are far fewer ‘things’ that will result in needing to refactor simple Python code than there are for simple C++ or C# code.  As a result, there is less need to utilize more complicated designs in Python to achieve testability or extensibility (aka future proofing).

Less time spent on designing for testability and extensibility means more time for producing functionality.

Thus:  ++productivity == Python

Posted in Coding

Vagrant rocks and Midas51 to be hosted on Linux

While I was surprised to discover that Windows Azure provided Linux VMs at a significantly cheaper rate than Windows VMs, I decided to roll with it since Python has excellent cross-platform compatibility.  After getting CentOS 6, Debian 7 and Ubuntu 12 VMs running on VirtualBox, a test run of the Midas51 code showed it to be running fine in both Python and Jython (I also test with IronPython on Windows).  I’m going to drop CentOS for the time being as I would like to be able to take a dependency on Python 2.7, CentOS 6 has Python 2.6.6, and installing Python 2.7 there isn’t just a simple matter of running a command to install a package.

To configure and manage the Linux VMs I used Vagrant with boxes provided by PuppetLabs.  Using Vagrant and prebuilt boxes provided an awesome experience in configuring, managing, and using Linux VMs from the command line while keeping the configuration under source control.  Put another way, Vagrant is a developer tool for *nix VM management.  One of Vagrant’s key features is making it trivially easy to configure file sharing between the host and guest filesystems.  When using a non-compiled language like Python this means you can make a change in your source code on your host, start your tests on your host, switch to your SSH session into your VM, and start your tests there without having to run commands to sync files.

I am also considering utilizing Fabric to create a tool to run all of my tests on my dev box and my VMs.  I tried the test running frameworks Nose and Py.Test yesterday.  Py.Test did not work with IronPython.  Nose imposed a requirement on directory structure that meant it either wouldn’t work with my segregated sub-project structure or that it wouldn’t fulfill the purpose of using a test runner (in my mind anyway) as it would need to be executed in each and every sub-project.

Posted in Coding

Midas51, hello and meet Grail42

Work on a proof of concept for software to aid with investment decisions began two months ago.  The concept has proven sufficiently promising to warrant continuing investment as Project Midas51.

The code hacked together for the proof of concept is throw-away code that provides a workflow only usable by software developers, as it is driven by command line apps, produces flat files, and requires tweaks in the code as part of the workflow.  While this was useful for what it was, Midas51 has a pristine new code repository in which to commit crafted code for the core software necessary for creating an accessible prototype that can move from prototype to alpha to beta to launch without imploding due to poor craftsmanship.

Grail42 is being restructured to facilitate significantly expanding the amount of Python code it contains.  When the structure of Grail42 was initially laid out it was a purely C++ project with a goal of having minimal dependencies.  Ultimately the productivity provided by including templates for C++ projects and header files warranted taking a dependency on Python; however, the Python code was shoe-horned into the existing C++-centric file structure.  It is now time to pay that debt and enable Grail42 to expand with Python code as well as C++ code, so that Grail42 can better serve Midas51.

Posted in Coding

New Project: Investment Decision Aid

My next side project will be an Azure app that aids people in decision making via interactive algorithms that analyze the data underlying the interactive controls, the data stored about the specific user, and the general data stored in the system that is relevant to the particular decision.

Examples of the types of decisions the app will aid people in making are:

  1. financial investment portfolio asset class allocations
  2. choosing between different employment offers
  3. choosing between different residences (including buy vs. rent)
  4. choosing between different vehicles
  5. choosing between different colleges

I plan to start developing the app with a focus on aiding in investment decisions.  In 2008 I created software for backtesting parametrized technical indicators commonly used in trading.  That software did not result in a strategy algorithm that produced annualized returns that were sufficiently high and consistent, but I did learn a few things from the experience that will help with the architecture, design, and implementation of this app.

  1. The backtesting software and the investment strategy being tested need to be high performance and parallelized, as the results never come out fast enough. This means the backtesting software needs to at least be multi-threaded and optimized native code.  It would be even better if it were distributed across multiple machines as well.
  2. While very tight coupling between the backtesting code and the strategy code provides significant opportunities for optimization, it does not allow trying out different investment strategies without also changing the backtesting software.  This means the backtesting software developed initially needs a well defined interface.  Creating more specialized backtesting software should be deferred to the future even if that means higher operating costs now.
  3. Software is needed for analyzing and comparing the results of the backtesting runs on investment strategies, but runtime performance isn’t a gating factor for this software.  This means this code can be written in Python or IronPython.
  4. Software is needed to manage the results data and tie it to the specific version of the code that produced it.  The results data is too large to commit to source code revision control systems.  This means a reliable and durable mechanism is needed to map strategy code and backtesting code to specific results and vice versa.
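To make the second lesson concrete, here is a minimal sketch of a backtester that only knows a strategy through one small interface.  It is written in Python for brevity even though the real backtester would be native code, and every name in it (Strategy, target_position, BuyAndHold, backtest) is a hypothetical chosen for illustration rather than anything from my 2008 software.

```python
class Strategy(object):
    """Hypothetical interface: all the backtester knows about a strategy."""

    def target_position(self, prices_so_far):
        """Return the number of units to hold, given the prices seen so far."""
        raise NotImplementedError


class BuyAndHold(Strategy):
    """Trivial example strategy: always hold one unit."""

    def target_position(self, prices_so_far):
        return 1


def backtest(strategy, prices):
    """Return the total profit/loss of following strategy over prices."""
    pnl = 0.0
    for i in range(1, len(prices)):
        # The strategy only sees prices before bar i, so it cannot look ahead.
        position = strategy.target_position(prices[:i])
        pnl += position * (prices[i] - prices[i - 1])
    return pnl
```

Trying a different strategy then means writing another Strategy subclass and passing it to `backtest`, with no change to the backtesting loop itself.  The cost is the one noted above: the loop cannot be optimized around the internals of any particular strategy.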

This app meets the goals I set for my next side project as it will be an Azure app and I’ll be able to make use of my recently created multi-threaded C++ unit testing framework in the development of the backtesting software.  I may even find it necessary to enhance the framework to support multi-processed applications if single-process multi-threaded performance proves underwhelming.

Now I just need to come up with a name for this project and get started!

Posted in Coding

Next project and new job

Two weeks ago I rejoined Microsoft as Principal SDE in the Windows Azure Active Directory (WAAD) group.  Microsoft’s moonlighting policy currently supports employees in creating commercial Windows Azure apps, so my next side project will definitely involve Windows Azure.  I have one project in mind that also needs a rich client and will allow me to make use of the multithreaded C++ unit testing framework I recently created.  I’ll be brainstorming some others as soon as I get more ramped up at work and can reclaim my weekends.

Posted in About Me