After hitting a wall on Project Fangorn, I tracked down numerous alternatives to Google App Engine and then looked at the most promising ones in more depth. Here is what I found.
Using GigaSpaces XAP would mean, at a minimum, creating Python bindings to their C++ API and rewriting my data model. Actually running on their EC2 value-added service costs a significant premium over what Amazon EC2 itself costs. While they do offer XAP free to startups, in that case you have to operate your own servers, and the list of alternatives in that space is quite large. I also couldn’t find any statement by GigaSpaces indicating that you could have no instances running and have one spun up when a web request comes in, so it looks like at least one small instance would have to be running constantly, putting the minimum cost of using their EC2 service at over $200 a month.
Using 10gen would currently mean a total rewrite, as it supports Jython rather than Python and the effort to get Pylons working on Jython is not yet complete. Once that effort is complete, 10gen becomes more promising, as I should then only need to rewrite my data model, and I might not even need to do that at some point, as the 10gen folks have indicated they are willing to take on the portability issue. In their alpha stage they are providing free hosting with quotas, and their FAQ states “but will be competitive with other offerings, and with a very low minimum bill size”.
For those using Django rather than Pylons, 10gen is working on putting the Django data model on top of their API.
Amazon EC2 & SimpleDB
Running Python is no problem on EC2, nor is accessing SimpleDB, since it is accessed via a REST API. So after rewriting my data model I would get the joy of managing my own instances and a minimum bill of over $70 a month. I could add third-party instance management, which would raise my monthly minimum to over $100.
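The monthly minimum comes from keeping one instance running around the clock. As a rough sketch of that arithmetic, assuming a small-instance rate of 10 cents an hour (an assumed figure, not one stated above) and a 30-day month, with `monthly_minimum_cents` being a made-up helper name:

```python
HOURS_PER_MONTH = 24 * 30  # one instance left running for a 30-day month

# Assumption for illustration only: hourly rate of a small instance, in cents.
ASSUMED_SMALL_INSTANCE_CENTS_PER_HOUR = 10


def monthly_minimum_cents(cents_per_hour, instances=1, hours=HOURS_PER_MONTH):
    """Lowest possible bill when `instances` machines never shut down."""
    return cents_per_hour * instances * hours


if __name__ == "__main__":
    cents = monthly_minimum_cents(ASSUMED_SMALL_INSTANCE_CENTS_PER_HOUR)
    print("minimum monthly bill: $%.2f" % (cents / 100))
```

Under those assumptions the floor is $72 a month before storage, bandwidth, or SimpleDB usage, which is why an always-on instance is the number to watch when comparing providers.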
UPDATE 10/23/2008: Amazon has announced a private beta for their own monitoring, load balancing, and automatic scaling service offering. There is no mention of price, and it seems likely to be simply included with the EC2 service.
to see who gets there first. Will it be:
- Google by raising their mcycle soft cap, excluding the mcycle consumption of puts from the calculation, or reducing the mcycle consumption of puts?
- GigaSpaces by lowering their EC2 premium, allowing an app to go offline until a web request comes in, and providing Python bindings?
- 10gen indirectly with the help of those working on getting Pylons working on Jython (or directly by providing the Django data model for those using Django)?
- Someone else?
If they all take too long?
Then there are numerous options; a few of them follow.
The one that might involve the fewest changes to my existing code would be continuing and expanding the work of this project to make the GAE SDK portable. While doing this work would also help other GAE refugees change providers without changing code, I don’t feel a great need to tie myself to the GAE API/object model.
An interesting project is Terracotta, which uses Aspect-Oriented Programming to distribute existing multi-threaded Java apps. It might be possible to reuse everything in the project except for the part doing the aspecting, which would need to be rewritten for Python. Such an effort could potentially be combined with the work of making the GAE SDK portable.
Another interesting project is OpenSSI, which basically does the reverse of what virtualization does. Rather than allowing multiple virtual machines to run on one physical machine, OpenSSI allows multiple physical machines (and virtual machines too, I’d imagine) to be treated as one machine from the perspective of a virtual process (and thus from the developer’s perspective). You still have to deal with synchronization, just as when writing a multi-threaded process, as well as inter-process communication between different virtual processes, but you don’t have to write distribution code. This may not be a great thing to use for Python because of its global interpreter lock (GIL). When I started getting into Python a couple of years ago, I made the decision to multiprocess rather than multithread because of the GIL. This has a couple of advantages anyway: one is robustness, and the other is that a multiprocess app is just a step away from being a networked distributed app. If something could be developed to allow the use of Terracotta with Python, then it’s likely the same could be done to take advantage of OpenSSI as well.
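To make the multiprocess-rather-than-multithread choice concrete, here is a minimal sketch using the standard library `multiprocessing` module. The function names and the sum-of-squares workload are placeholders for illustration, not code from Project Fangorn.

```python
import multiprocessing
import os


def cpu_bound(n):
    """Stand-in for CPU-heavy work: returns (worker pid, sum of squares below n)."""
    return os.getpid(), sum(i * i for i in range(n))


def run_in_processes(inputs, workers=2):
    """Spread cpu_bound over separate processes, each with its own interpreter and GIL."""
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(cpu_bound, inputs)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    for pid, total in run_in_processes([10, 100, 1000]):
        print("worker %d computed %d" % (pid, total))
```

Because the workers only exchange picklable arguments and results, nothing relies on shared memory, which is what keeps a design like this close to a networked one: the pool could in principle be swapped for processes on other machines.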
The PyPy project could be useful in combining Terracotta or OpenSSI with Python, since its goal is to let you create a custom interpreter for Python.
Another interesting project is Hadoop, which provides a distributed filesystem, along with its derivative HBase, which is a distributed database.
Durus could be a starting point for distributed transactional memory in Python. I say starting point because its home page states “Durus is best suited to collections of less than a million instances with relatively stable state”, so clearly it doesn’t scale as is.
This Wikipedia entry also lists numerous other software transactional memory implementations. Most are not distributed, but OpenSSI could be utilized to provide that.