Google Android contains j.u.c

Great news: Android.jar contains the package java.util.concurrent and its sub-packages atomic and locks.
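As a reminder of what these three packages offer, here is a minimal sketch written against the standard Java SE API in present-day Java syntax. It assumes the Android versions behave the same way, which I have not verified. The class and method names (JucSketch, countWithAtomic, countWithLock, runTasks) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from Android itself.
final class JucSketch {

    // java.util.concurrent.atomic: a lock-free counter shared by all tasks.
    static int countWithAtomic(int tasks, int incrementsPerTask) {
        AtomicInteger counter = new AtomicInteger();
        runTasks(tasks, () -> {
            for (int i = 0; i < incrementsPerTask; i++) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    // java.util.concurrent.locks: a plain int guarded by an explicit lock.
    static int countWithLock(int tasks, int incrementsPerTask) {
        ReentrantLock lock = new ReentrantLock();
        int[] counter = {0};
        runTasks(tasks, () -> {
            for (int i = 0; i < incrementsPerTask; i++) {
                lock.lock();
                try {
                    counter[0]++;
                } finally {
                    lock.unlock(); // always release, even if the body throws
                }
            }
        });
        return counter[0];
    }

    // java.util.concurrent: run the same body on a fixed pool and wait for all copies.
    static void runTasks(int tasks, Runnable body) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(body));
            }
            for (Future<?> f : futures) {
                f.get(); // waiting on each Future also makes the task's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(4, 1000));
        System.out.println(countWithLock(4, 1000));
    }
}
```

Both counters end up at tasks × incrementsPerTask; replacing either with an unguarded `int` would make the result unpredictable.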

It also contains an intriguing package, org.apache.http.util.concurrent, with the classes Executor.java and ThreadFactory.java.

It has of course many more packages. It seems to be closer to Java SE than to something between Java ME and Java SE. It looks like Google removed a few packages from SE and added their own. For example, they have removed AWT, Swing, and CORBA.

I hope the speech recognition API works.

I also hope that it has better APIs for accessing peripherals than current Java open source products. That shouldn't be too hard, since the JCP-issued APIs are so poor at it. Android seems to have a generic way for programs to communicate with the OS and with services running on a device, including peripherals. The Java world has badly needed a way to integrate with peripherals, and Sun et al. have been promising one since 1996, but little has been delivered. Soon we should know whether Google has delivered it, by trying our Android apps in an emulator. But the final proof will come when the devices are out and the carriers/telcos are supporting them, probably in 12 months or so, if we are lucky.

Will we have to move to a different country in order to be able to use some of the cool apps on Android, including our own?


Current Approaches As Of August 2007

  1. The conventional multithreading where the programmer writes all the controls for the threads and how they share and don't share data, in the hope of optimizing resources (e.g., threads, memory, communication).

  2. New multithreading techniques, some usable, some in dev, where the programmer does not have to write all the controls for the threads and how they share and don't share data, and yet resources are still optimized.

    1. The most promising technique is TM - not Transcendental Meditation - but Transactional Memory, where the CPU (instruction set), OS, and VM are designed to make all accesses to the RAM work somewhat like transactional access to a database.
      Sun will be including TM in its Rock processor due near the end of 2008.

    2. X10 - an extension of Java - in development - IBM is a main contributor.

  3. Grid-for-Concurrency frameworks that allow at least some sharing of data between threads:

    1. Fork-Join Frameworks:

      1. The most popular is Google's MapReduce.

      2. TODO - more fork-join examples

    2. The Appistry Grid where the programmer writes single-threaded logic and the underlying system takes care of managing the available resources (threads and memory).

    3. TODO other grid (for concurrency) examples.

  4. Java EE - A great multithreading platform, particularly the EJB containers. The extra seconds in http request processing are worth it when considering the difficulties in developing robust, mission-critical multithreaded apps and the reduction in costs that Java EE makes possible for such apps.

    UPDATE 2007.10.13: memory leaks occurring when using a ThreadLocal in certain situations may be reducing Java EE's advantage for safe multithreading.
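The update above does not say which situations cause the leaks. One commonly cited case, assumed here, is a ThreadLocal set on a pooled thread and never cleared: the pool keeps the thread alive, so the value stays reachable and is even visible to the next task on that thread. The sketch below reproduces that with a plain single-thread executor standing in for a container's thread pool. The names (ThreadLocalSketch, CONTEXT, valueSeenBySecondTask) are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo; a single-thread executor stands in for a container's thread pool.
final class ThreadLocalSketch {

    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs two tasks back to back on the same pooled thread and returns
    // what the second task finds in the ThreadLocal.
    static String valueSeenBySecondTask(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                CONTEXT.set("request-1 data");
                try {
                    // ... request processing would go here ...
                } finally {
                    if (cleanUp) {
                        CONTEXT.remove(); // drop the value before the thread returns to the pool
                    }
                }
            };
            Callable<String> second = CONTEXT::get;

            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenBySecondTask(false)); // stale value from the first task
        System.out.println(valueSeenBySecondTask(true));  // null
    }
}
```

Without the `remove()` call the second task sees the first task's data; with it, the second task sees null. Calling `remove()` in a `finally` block is the usual defence.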



Appistry is one way of doing it

Appistry 3.5 allows Java and .NET applications to scale out across a number of servers while running as though each were one application on one server. The Fabric makes the application think it's running on one computer, and the developer writes the application as though it will run on a single computer. But in execution, the application could be distributed over 100 or more computers.

So we seem to have at least 4 approaches:
  1. The conventional multithreading where the programmer writes all the controls for the threads and how they share and don't share data, in the hope of optimizing resources (e.g., threads, memory, communication).
  2. The future multithreading yet to be usable, where the programmer does not have to write all the controls for the threads and how they share and don't share data, yet resources are still optimized.
  3. The Google Grid technique called MapReduce.
  4. The Appistry Grid technique where the programmer writes single-threaded logic and the underlying system takes care of managing the available resources (threads and memory).
The Appistry approach seems to pose difficulties for applications that require multiple threads to share data. It allows applications to launch multiple threads but probably does not offer more help for controlling their interactions than what the language already offers. This remains to be verified. If it is the case, then for scaling, the programmer has to make difficult, non-obvious choices between the single-thread auto-scaling by Fabric and the conventional multithreading approach. I would think that if you have Fabric then you minimize your use of multithreading.

Gosling Agrees: The next big thing is multithreading

Dr James Gosling on c|net news.com, March 19, 2007: The next big innovation is multithreading.

What do you think will be the next big tech innovation that will affect enterprise IT?
Gosling: There's a lot of stuff going on around multithreading--for example, the way that Moore's Law is shifting from clock rate to number of cores, which means people have to be increasingly conscious of what it means to build multithreaded applications.


Erlang is top contender for distributed app today

The book on Erlang, by the creator of Erlang, Joe Armstrong.

Joe designed and implemented the first version of Erlang in 1986 and he currently works for Ericsson AB where Erlang is used to build highly fault-tolerant switching systems.

The above link has links to two large PDF extracts from the book. One is http://media.pragprog.com/titles/jaerlang/Concurrent.pdf

The web site on the open source version: http://www.erlang.org/

Here's a 2007.03.03 blog by Tim O'Reilly on Erlang and other concurrency issues.

Tim mentions that some have found Erlang to be too "old".

TODO learn Erlang

Erlang is called "a pure message passing language". Erlang processes do not share memory. It is excellent for distributed apps but can it be considered a standard multithreaded language? I do not yet know if Erlang has threads where memory is shared.

I recommend that we distinguish standard multithreaded languages where threads share memory from those where threads do not share memory.
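To make the distinction concrete, here is a minimal sketch of the message-passing style imitated in Java. It is only a discipline here, not a guarantee: Java threads always share a heap, whereas Erlang processes (per the description above) do not share memory at all. The worker owns its running total and other threads reach it only through queues. The names (MessagePassingSketch, sumViaMessages, STOP) are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo of message passing between two Java threads.
final class MessagePassingSketch {

    // Sentinel message telling the worker to stop; an assumption of this sketch.
    private static final int STOP = Integer.MIN_VALUE;

    // Sends each value to a worker thread as a message and returns the worker's reply.
    static int sumViaMessages(int[] values) {
        BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
        BlockingQueue<Integer> reply = new LinkedBlockingQueue<>();

        Thread worker = new Thread(() -> {
            int total = 0; // state private to the worker; no other thread touches it
            try {
                for (int msg = mailbox.take(); msg != STOP; msg = mailbox.take()) {
                    total += msg;
                }
                reply.put(total); // the result travels back as a message too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        try {
            for (int v : values) {
                mailbox.put(v);
            }
            mailbox.put(STOP);
            return reply.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumViaMessages(new int[] {1, 2, 3, 4}));
    }
}
```

In the shared-memory style, by contrast, both threads would update the same variable directly and would need a lock or an atomic to do so safely.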


Dave Patterson, et al., U.C. Berkeley

A web page from Dec. 18, 2006: The Landscape of Parallel Computing Research: A View from Berkeley - Introduces this pdf.

My summary of this paper (they call it a white paper):
• Goal = make it easy to write programs (that execute efficiently on highly parallel systems).
• 1000s of cores per chip.
• For benchmarking, use Dwarfs (a dwarf is a basic class of patterns of computation and communication, a primary type of computation); this team has defined 13 dwarfs currently. For example, MapReduce is a dwarf, Spectral Methods (a type of DSP that includes FFT) is another.
• Use Autotuners to select the most effective implementation of a library, of a dwarf or other pattern, at development or deploy time.
• Programming models to be more human-centric (to help developers with concurrency issues).
• Programming models to be independent of the number of processors where an application is deployed.
• Systems to include performance and energy counters adapted to multicore and for use by developers.

The blog - very useful, I recommend it.

The wiki. I don't know how complete this section is but it is a useful start: Parallel Programming Model Watch

The glossary is also useful, and necessary for reading this wiki.

A discussion on Slashdot. Entries show that many if not most developers have a weak grasp of the issues. There's a lot of work to be done for multicores to be effectively used.


More contenders for the solution

This post mentions MapReduce, used by Google. The post also mentions E, Erlang, Haskell, Hadoop, OCaml, Smalltalk (a precursor to Java).

About MapReduce, from the link above: "Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system."

It looks to me that this is nice for Google but maybe not a generic solution to the concurrency problem. MapReduce appears to be a way to work around the concurrency problem by making each thread run on its own low cost computer (i.e., Linux on Intel) - the old grid solution - not a bad one.
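To show the shape of the programming model, here is a toy word count in the map/reduce style. It is my own single-machine sketch, not Google's API: a thread pool stands in for the cluster, and nothing here handles partitioning across machines or machine failures, which is where the real run-time system earns its keep. All names (MapReduceSketch, mapChunk, reduce, wordCount) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical single-machine imitation of the map/reduce style.
final class MapReduceSketch {

    // Map phase: count the words of one input chunk, independently of all other chunks.
    static Map<String, Integer> mapChunk(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Reduce phase: merge the per-chunk counts into one total per word.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        return total;
    }

    // Driver: the programmer supplies only mapChunk and reduce; threading lives here.
    static Map<String, Integer> wordCount(List<String> chunks) {
        int threads = Math.max(1, Math.min(chunks.size(), 4));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (String chunk : chunks) {
                tasks.add(() -> mapChunk(chunk));
            }
            List<Map<String, Integer>> partials = new ArrayList<>();
            for (Future<Map<String, Integer>> f : pool.invokeAll(tasks)) {
                partials.add(f.get());
            }
            return reduce(partials);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> chunks = new ArrayList<>();
        chunks.add("the quick brown fox");
        chunks.add("the lazy dog");
        System.out.println(wordCount(chunks));
    }
}
```

The map tasks share no data, which is why they parallelize so easily; the approach does not help when the threads genuinely need to share state.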

Here's an interesting intro to MapReduce.

TODO evaluate.

OCaml - apparently far from the solution yet: "OCaml bytecode and native code programs can be written in a multithreaded style. However, because the garbage collector is not designed for concurrency, multiple OCaml threads in the same process cannot run concurrently".

Also see O'Haskell. A very recent and still academic language.

Don Stewart wrote:
"O'Haskell is an experimental object oriented extension to Haskell. It is not a concurrency extension.

Haskell is by default concurrent and parallel. Normal everyday Haskell, as compiled by GHC, supports both multithreaded concurrency and multicore parallelism.

For an overview of the playground for concurrency and parallelism that is modern Haskell, see this summary.

In particular, Haskell provides SMP threads (so you can use all your cores), along with software transactional memory, for non-blocking synchronisation, which is not available in any other widely used language."
Haskell looks like a contender, thanks to Don.

E is a p2p scripting language http://erights.org/

Hadoop is a beta Java framework from Apache supporting distributed applications running on large clusters. In this Jan. 23, 2007, blog post, "Threads considered harmful", Nat Torkington called Hadoop "the open source MapReduce implementation".

Here's a forum thread on the above blog post "Threads Considered Harmful" by Nat Torkington on O'Reilly's Radar blog.