2007-02-04

More contenders for the solution

This post mentions MapReduce, used by Google. It also mentions E, Erlang, Haskell, Hadoop, OCaml, and Smalltalk (an influence on Java).

About MapReduce, from the link above: "Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system."
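To make the "functional style" in that quote concrete, here is a minimal single-process sketch of a word count written as map, shuffle, and reduce steps. It is not Google's implementation or API; the names `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce` are made up for illustration. In a real system each call to the map and reduce functions could run on a different machine, and the run-time would handle the partitioning and grouping that are done inline here.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run the three phases sequentially; a cluster would run them in parallel."""
    # Each document is mapped independently, with no shared state.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    # Each key is reduced independently of every other key.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Because no map call depends on another and no reduce call depends on another key, the programmer never writes locks or threads; that independence is what lets a run-time spread the work over many machines.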

It seems to me that this is nice for Google but maybe not a generic solution to the concurrency problem. MapReduce appears to work around the concurrency problem by making each thread run on its own low-cost computer (i.e., Linux on Intel). That is the old grid solution, and not a bad one.

Here's an interesting intro to MapReduce.

TODO evaluate.

OCaml - apparently far from the solution yet: "OCaml bytecode and native code programs can be written in a multithreaded style. However, because the garbage collector is not designed for concurrency, multiple OCaml threads in the same process cannot run concurrently".

Also see O'Haskell, a very recent and still academic language.


Don Stewart wrote:
"O'Haskell is an experimental object oriented extension to Haskell. It is not a concurrency extension.

Haskell is by default concurrent and parallel. Normal everyday Haskell, as compiled by GHC, supports both multithreaded concurrency and multicore parallelism.

For an overview of the playground for concurrency and parallelism that is modern Haskell, see this summary.

In particular, Haskell provides SMP threads (so you can use all your cores), along with software transactional memory, for non-blocking synchronisation, which is not available in any other widely used language."
Haskell looks like a contender, thanks to Don.

E is a peer-to-peer (p2p) scripting language: http://erights.org/

Hadoop is a beta Java framework from Apache supporting distributed applications running on large clusters. In this Jan. 23, 2007 blog post, "Threads Considered Harmful", Nat Torkington called Hadoop "the open source MapReduce implementation".

Here's a forum thread on the above blog post, "Threads Considered Harmful" by Nat Torkington, on O'Reilly's Radar blog.

4 Comments:

Blogger Don Stewart said...

O'Haskell is an experimental object oriented extension to Haskell. It is not a concurrency extension.

Haskell is by default concurrent and parallel. Normal everyday Haskell, as compiled by GHC, supports both multithreaded concurrency and multicore parallelism.

For an overview of the playground for concurrency and parallelism that is modern Haskell, see this summary.

In particular, Haskell provides SMP threads (so you can use all your cores), along with software transactional memory, for non-blocking synchronisation, which is not available in any other widely used language.

Mon Feb 05, 04:25:00 AM EST  
Blogger serge said...

Thanks Don. I made the correction.

Mon Feb 05, 05:46:00 PM EST  
Blogger Bob Lozano said...

Always liked the map/reduce formulation from my old AI / Lisp days, and think the Google folks have a nice formulation for many large-scale data problems.

I tend to think that there won't be a single solution to large-scale concurrency / scale-out problems, so at Appistry we've taken a fairly general architectural approach that supports several of the better approaches. For example, the application fabric is a great natural platform for building and deploying map/reduce-style applications.

Thx.

- Bob

Mon Mar 26, 10:37:00 AM EDT  
Blogger Jon Harrop said...

Don's statements about Haskell are a triumph of hope over reality, I'm afraid. Haskell only has a rudimentary stop-the-world parallel GC that scales so poorly that it can actually degrade performance for >4 cores (according to the implementor's own benchmarks).

All non-mainstream language implementations on Linux share Haskell's fate. The reason is simply that creating a working scalable concurrent GC is incredibly hard. The OCaml team spent years on this and never even managed to create a concurrent GC for OCaml that worked, let alone an efficient one.

The only real functional contender is now F# because it inherited scalable concurrent GC and essential functional features (like tail calls) from the CLR.

The next best bets are JVM-based languages, but the JVM prohibits the creation of efficient and robust functional languages because it lacks basic features like tail calls. This can be worked around, but only at the cost of a crippling performance degradation.

Tue May 06, 02:11:00 AM EDT  
