Dave Patterson et al., U.C. Berkeley

A web page from Dec. 18, 2006, The Landscape of Parallel Computing Research: A View from Berkeley, introduces this PDF.

My summary of this paper (they call it a white paper):
• Goal = make it easy to write programs (that execute efficiently on highly parallel systems).
• 1000s of cores per chip.
• For benchmarking, use Dwarfs (a dwarf is a basic class of computation and communication patterns, a primary type of computation); the team has defined 13 dwarfs so far. For example, MapReduce is a dwarf; Spectral Methods (a type of DSP that includes the FFT) is another.
• Use autotuners to select the most effective implementation of a library, dwarf, or other pattern at development or deployment time.
• Programming models to be more human-centric (to help developers with concurrency issues).
• Programming models to be independent of the number of processors where an application is deployed.
• Systems to include performance and energy counters adapted to multicore chips and usable by developers.
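As a rough illustration of the autotuner point above, here is a Python sketch that times candidate implementations of one operation and picks the fastest; the candidate functions and the `autotune` helper are invented for illustration, not part of the Berkeley work:

```python
import timeit

# Two hypothetical candidate implementations of the same operation
# (summing a list); a real autotuner would benchmark, say, different
# FFT or matrix-multiply kernels for the machine at hand.
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    return sum(xs)

def autotune(candidates, sample, repeat=100):
    """Time each candidate on a sample input and return the fastest."""
    best, best_time = None, float("inf")
    for fn in candidates:
        t = timeit.timeit(lambda: fn(sample), number=repeat)
        if t < best_time:
            best, best_time = fn, t
    return best

data = list(range(10_000))
fastest = autotune([sum_loop, sum_builtin], data)
print(fastest.__name__)  # machine-dependent; the point is the selection
```

A real autotuner (ATLAS, FFTW) searches a much larger space of kernel variants, but the shape is the same: measure on the target machine, then bind the winner at development or deployment time.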

The blog - very useful, I recommend it.

The wiki. I don't know how complete this section is, but it is a useful start: Parallel Programming Model Watch

The glossary is also useful and necessary to read this wiki.

A discussion on Slashdot. Entries show that many if not most developers have a weak grasp of the issues. There's a lot of work to be done for multicores to be effectively used.


More contenders for the solution

This post mentions MapReduce, used by Google. The post also mentions E, Erlang, Haskell, Hadoop, OCaml, Smalltalk (a precursor to Java).

About MapReduce, from the link above: "Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system."

It looks to me like this is nice for Google but maybe not a generic solution to the concurrency problem. MapReduce appears to be a way to work around the concurrency problem by making each thread run on its own low-cost computer (i.e., Linux on Intel) - the old grid solution - not a bad one.
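The functional style described in the quote can be illustrated with a single-process word-count sketch in Python; the function names are mine, and a real MapReduce run distributes these phases across a cluster with the framework handling partitioning, scheduling, and failures:

```python
from collections import defaultdict

def map_phase(documents):
    """Emit (word, 1) pairs - the user-written map function."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Group values by key - done by the framework between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum counts per word - the user-written reduce function."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 3
```

The programmer writes only the map and reduce functions; everything between them is the framework's job, which is exactly the appeal for developers with no distributed-systems experience.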

Here's an interesting intro to MapReduce.

TODO evaluate.

OCaml - apparently far from the solution yet: "OCaml bytecode and native code programs can be written in a multithreaded style. However, because the garbage collector is not designed for concurrency, multiple OCaml threads in the same process cannot run concurrently."
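OCaml's restriction resembles Python's GIL, and the usual workaround is the same in both: run several processes instead of threads, each with its own runtime, and communicate by message passing. A minimal Python sketch of that workaround (the chunking scheme is invented for illustration):

```python
from multiprocessing import Pool

def partial_sum(bounds):
    """Work done in a separate process, unconstrained by the parent's runtime."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":  # guard required for process spawning on some platforms
    chunks = [(0, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]
    with Pool(4) as pool:  # four OS processes, so true hardware parallelism
        total = sum(pool.map(partial_sum, chunks))
    print(total == sum(range(1_000_000)))  # True
```

Processes sidestep the shared garbage collector entirely, at the cost of explicit data partitioning and inter-process communication.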

Also see O'Haskell. A very recent and still academic language.

Don Stewart wrote:
"O'Haskell is an experimental object oriented extension to Haskell. It is not a concurrency extension.

Haskell is by default concurrent and parallel. Normal everyday Haskell, as compiled by GHC, supports both multithreaded concurrency and multicore parallelism.

For an overview of the playground for concurrency and parallelism that is modern Haskell, see this summary.

In particular, Haskell provides SMP threads (so you can use all your cores), along with software transactional memory, for non-blocking synchronisation, which is not available in any other widely used language."
Haskell looks like a contender, thanks to Don.

E is a p2p scripting language: http://erights.org/

Hadoop is a beta Java framework from Apache supporting distributed applications running on large clusters. In his Jan. 23, 2007, blog post "Threads considered harmful", Nat Torkington called Hadoop "the open source MapReduce implementation".

Here's a forum thread on Nat Torkington's "Threads Considered Harmful" post on O'Reilly's Radar blog.


Ms Crawford sells IBM Roadrunner

Catherine Crawford, cait@alum.mit.edu, chief architect for next-generation systems software at IBM Systems Group's Quasar Design Center, writes here.

"Where's the software to take advantage of all these processors, cores and threads?"

"IDC's Earl Joseph concluded in a study on technical computing software that 'many ISV codes today scale only to 32 processors, and some of the most important ones for industry don't scale beyond four processors' (www.hpcwire.com/hpc/43036§0.html)."

The Roadrunner system from IBM is alleged to enable easier development of multithreaded software, but the article does not provide substantial information. The reader must then google "Roadrunner IBM" to get the marketing material on this new product line - exactly what IBM wants us to do, so that we may come to believe Roadrunner is a solution if we read enough about it.