At Google IO, a large part of the Tools track was dedicated to
AppEngine. Brett Slatkin gave a talk titled Building scalable Web
Applications with Google AppEngine which focused on optimizing the
server part of web apps. As other presenters, like Steve Souders in
his talk Even Faster Websites, demonstrated, optimizing the browser
part of web apps is not to be neglected either.
First of all, I must confess I am used to repeating that "premature
optimisation is the root of all evil" and "delay commitment until the
last responsible moment". But reading about AppEngine and listening to
the Google IO talks, it appears that the tools we have today call for
human intervention to reach web-scale performance, even when "we"
stands for "Google".
In order for web-scale applications to handle the kind of load they
are facing, they must be designed and implemented carefully: as
carefully as applications were designed before the exponential growth
of PC computing power let us move away from low-level implementation
details and made some inefficiencies acceptable, as long as
development time stayed short enough.
It all depends on the parameters of your cost function, but for
web-scale applications, it seems we do not have enough computer time
and cannot trade it for human time.
To get a better idea of the working constraints, one should know that
a disk seek takes about 10ms, which caps disk access at roughly 100
accesses per second. On the other hand, if we need consistent data as
opposed to transactional data (the latter implying that data is
fetched from disk each time it is asked for), data can be read from
disk once and then cached. Subsequent reads are served from memory at
a rate of about 4GB/sec, which means 4000 accesses per second if
entities are around 1MB in size. The result of this
back-of-the-envelope approximation is that one write costs roughly as
much as 40 reads.
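Put as a quick calculation with the approximate figures above (actual
hardware and entity sizes will of course vary):

    # Rough figures quoted above; real hardware and entities will differ.
    disk_seek_seconds = 0.010          # ~10 ms per seek
    memory_bandwidth = 4e9             # ~4 GB/s read from memory
    entity_size = 1e6                  # ~1 MB per entity

    writes_per_second = 1 / disk_seek_seconds                 # ~100
    cached_reads_per_second = memory_bandwidth / entity_size  # ~4000

    print(cached_reads_per_second / writes_per_second)        # ~40 reads per write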
It follows that, although the actual time depends on the size and shape
of data, writes are very expensive compared to reads and both are
better done in batches to optimise disk access.
The AppEngine Datastore was designed with these constraints in
mind. Entities are sets of property name/value pairs. Each entity may
have a parent. An entity without a parent is the root of a hierarchy
called an entity group.
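As a minimal sketch of what this looks like in the Python datastore
API, here is a hypothetical Comment entity placed in the entity group
rooted at its author (UserAccount and Comment are made-up kinds used
only for illustration):

    from google.appengine.ext import db

    class UserAccount(db.Model):
        nickname = db.StringProperty()

    class Comment(db.Model):
        body = db.TextProperty()

    alice = UserAccount(key_name='alice', nickname='Alice')
    alice.put()

    # Passing parent= makes the comment a child of alice, so it belongs
    # to the entity group rooted at that UserAccount entity.
    comment = Comment(parent=alice, body='First comment')
    comment.put()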
Entities of the same group are stored on disk close to each other, but
two distinct entity groups may be stored on different computers. Read
access to entities of the same group is thus faster than read access
to entities of different groups.
Write access is serialized per entity group. As opposed to a
traditional RDBMS that provides row locking, the datastore only
provides entity group locking. Writes to a single entity group will
always happen in sequence, even when the changes concern different
entities.
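One consequence is that a datastore transaction operates on a single
entity group, and concurrent transactions on that group run one after
the other. A sketch, reusing the hypothetical kinds from the previous
example:

    from google.appengine.ext import db

    def edit_comment(comment_key, new_body):
        # Everything done here is atomic and confined to one entity group;
        # concurrent transactions on the same group are serialized and may
        # be retried on contention.
        comment = db.get(comment_key)
        comment.body = new_body
        comment.put()

    key = db.Key.from_path('UserAccount', 'alice', 'Comment', 1)
    db.run_in_transaction(edit_comment, key, 'edited body')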
There is no limit to the number of entity groups or to the number of
entities per group, but because of the locking strategy, large entity
groups will cause high contention and a lot of failed
transactions. Since writes are expensive, not thinking about write
throughput is a very bad idea when designing an AppEngine application
if one wants it to scale.
On the other hand, the parallel nature of the datastore makes it scale
wide and there is no limit to the number of entity groups that can be
written to in parallel, nor to the number of reads that can be done in
parallel.
To understand this design in detail, you will have to read about GFS,
BigTable and other technologies developed by Google to implement
large-scale clustering.
Counters are a good example to address when discussing write
throughput, because the datastore locking strategy makes writing to
global data very expensive.
Let us assume that we want to display on the main page of a wiki
application the total number of comments posted.
A global counter would serialize all its updates. If 100 users were to
add comments at the same time, some of them would have to wait several
seconds for their action to complete: one write for the comment, one
write for the counter, at most 100 writes per second for the counter
and a lot of time lost due to failed transactions that need to be
restarted.
The solution to make the counter scale is to partition it among all
entity groups then sum these partial counters when the global value is
needed.
Since chances are low that a given user will write more than one
comment at a time, comment entities for a user can be grouped together
and a partial counter can be added to the same entity group. Creating
a new comment and increasing the partial counter will be done in the
same batch.
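A sketch of that idea with made-up kinds: a per-user counter shard
that lives in the user's entity group and is updated in the same
transaction as the new comment:

    from google.appengine.ext import db

    class Comment(db.Model):
        body = db.TextProperty()

    class CommentCounter(db.Model):
        count = db.IntegerProperty(default=0)

    def add_comment(user, body):
        def txn():
            # The shard lives in the user's entity group, so the new comment
            # and the counter update are one batched write to one group.
            counter = CommentCounter.get_by_key_name('shard', parent=user)
            if counter is None:
                counter = CommentCounter(key_name='shard', parent=user)
            counter.count += 1
            db.put([Comment(parent=user, body=body), counter])
        db.run_in_transaction(txn)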
When a new request for the main page is received, the counter total is
looked up in the cache. If it is not found, all partial counters are
fetched and summed up, then the cache is refreshed with a short
timeout, for example one minute.
During the next minute, the counter will be "consistent", read: not
too far off, and served extremely fast from the cache.
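A sketch of that read path with memcache, reusing the hypothetical
CommentCounter kind from the previous example (the one-minute timeout
is an arbitrary choice):

    from google.appengine.api import memcache
    from google.appengine.ext import db

    class CommentCounter(db.Model):   # same hypothetical shard model as above
        count = db.IntegerProperty(default=0)

    def comment_total():
        total = memcache.get('comment_total')
        if total is None:
            # Cache miss: fetch every counter shard, sum them, and cache
            # the result for one minute.
            total = sum(shard.count for shard in CommentCounter.all())
            memcache.add('comment_total', total, time=60)
        return total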
To sum things up, when implementing applications on top of AppEngine
with web-scale usage as a goal, everything that can be done to save
time should be considered, including the following:
- importing Python modules as late as possible will minimize the
Python runtime overhead
- retrieving data that is not going to be used is a waste
- repeated queries and queries returning large result sets must be
avoided
- when Get() is sufficient, do not spend time on Query() (see the
sketch after this list)
- landing pages are traffic-intensive and are better served by using
the same query for everyone
- entity groups have to be designed to match the load and aim at low
write contention
- caching must be used aggressively (it is no surprise that
memcache was the first improvement that followed within a month of
the AppEngine public release)
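To illustrate the Get() versus Query() point above: fetching by key
avoids the index work a query goes through (kinds and key names are
again hypothetical):

    from google.appengine.ext import db

    # Direct lookup by key: the cheapest way to fetch a known entity.
    key = db.Key.from_path('UserAccount', 'alice', 'Comment', 42)
    comment = db.get(key)

    # A query fetching the same entity costs more datastore work and is
    # better avoided when the key is already known:
    # comment = db.GqlQuery(
    #     "SELECT * FROM Comment WHERE __key__ = :1", key).get()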
As a conclusion, the interface AppEngine exposes today requires
optimizing early, but I would bet that in the years to come, new
languages and domain-specific compilers or database engines will take
part of that burden off the developers' hands.
Did not Yahoo and Google start developing PigLatin and Sawzall to make
it easier to write parallel data-processing programs? The same could
happen here: describe a data model in a high-level language and get a
tool that optimizes it for write contention and web-scale
applications.