With PyWren, AWS Lambda Finds an Unexpected Market in Scientific Computing

Feb 16th, 2017 10:26am by Joab Jackson

A University of California-Berkeley researcher has found a new, and somewhat unexpected, use case for the Lambda serverless computing service from Amazon Web Services: scientific computing.

At Continuum Analytics’ first user conference, AnacondaCON 2017, held last week in Austin, Texas, Eric Jonas, a postdoctoral researcher at the University of California-Berkeley’s AMPLab, talked about how a number of UC Berkeley researchers are starting to use the AWS Lambda serverless service for their investigations, with the help of PyWren, a software package Jonas created.

PyWren provides the ability to parcel out Python-based scientific workloads across many different Lambda instances, in effect creating a giant, if extremely temporary, computing cluster. PyWren can, by Jonas’ assertion, “get Lambda to scale shockingly well,” a point he illustrated with a benchmark of PyWren pulling 25 TFLOPS (25 trillion floating-point operations per second) from a fleet of AWS Lambdas.

“This is something that is just not possible with MATLAB frameworks,” Jonas asserted.

And it’s easy enough to be used even by a graduate student (albeit one versed in Python), who can simply import PyWren into a program and make function calls through its API.
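The call pattern described here, mapping one function over a list of inputs and collecting the results, can be sketched with Python’s standard library alone. This is a local stand-in rather than PyWren itself: threads play the part of Lambda invocations, and the function `simulate` and the list `scales` are made up for illustration. In PyWren, as far as its README describes, the executor would instead come from `pywren.default_executor()` and each result from a returned future, which requires AWS credentials and so is left out of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
import random
import statistics


def simulate(scale):
    """Toy stand-in for one scientific job: spread of 1,024 random draws."""
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    samples = [rng.gauss(0.0, scale) for _ in range(1024)]
    return statistics.pstdev(samples)


# Hypothetical inputs: one job per value.
scales = [0.5, 1.0, 2.0, 4.0]

# Local threads play the role of Lambda invocations in this sketch.
with ThreadPoolExecutor(max_workers=4) as executor:
    spreads = list(executor.map(simulate, scales))

for scale, spread in zip(scales, spreads):
    print(f"scale={scale}: spread={spread:.3f}")
```

The point of the pattern is that the user’s code stays a plain Python function; only the executor decides where it runs.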

In a Pickle

AMPLab supports research across a number of different disciplines, many of which involve examining data across multiple scales. One is solar flares, the study of which will help better predict the solar storms that can interfere with our power lines and satellite operations. The nearby Solar Dynamics Observatory throws off 1.5TB of observations a day.

(Image: Solar Dynamics Observatory)

From the very large to the very small: researchers are also looking at how large numbers of neurons combine to create behaviors, diseases and even cognition.

Most graduate students, Jonas noted, have never run a Spark or Hadoop job. Their investigations are carried out on laptops or workstations, which sadly limits the breadth of their testing.

One way to scale, obviously, is to use cloud services such as Amazon Web Services, which has given Berkeley a generous grant of usage. To the average researcher, however, getting AWS to do something useful is a formidable task. Better to just let the laptop churn a little longer.

Enter PyWren.

PyWren is perfect for jobs such as “parameter tuning.” This could be a job that, say, takes about 5 minutes to run, and it needs to run 1,000 times. The aim of PyWren would be to run that entire workload, all 1,000 instances, in 5 minutes.
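The arithmetic behind that goal is easy to check. The sketch below assumes an idealized model with no startup or network overhead, in which jobs run in "waves" of as many workers as are available; the function name `wall_clock_minutes` is invented for this illustration.

```python
import math


def wall_clock_minutes(n_runs, minutes_per_run, workers):
    """Idealized wall-clock time: jobs run in waves of `workers` at a time."""
    waves = math.ceil(n_runs / workers)
    return waves * minutes_per_run


# The article's example: a 5-minute job that must run 1,000 times.
serial_minutes = wall_clock_minutes(1000, 5, workers=1)
parallel_minutes = wall_clock_minutes(1000, 5, workers=1000)

print(f"one laptop core: {serial_minutes / 60:.1f} hours")
print(f"1,000 workers:   {parallel_minutes} minutes")
```

Under these assumptions the serial run takes 5,000 minutes, a bit more than 83 hours, against 5 minutes when every run gets its own worker.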

A stateless function-as-a-service offering, Lambda gives the user a single process per instance, which can run Python, JavaScript (Node) or Java code for up to 300 seconds, along with 512MB of temporary space and 1.5GB of RAM.

Massively distributed computing did not seem to be one of AWS’ original intents for Lambda, though AWS has acknowledged Jonas’ work.

The standard use case was one of a Lambda job being triggered by someone uploading an object of some sort to S3, Lambda performing a small function on the object, and delivering the results to a database. But there is no reason why you couldn’t map a single Python function across 2,000 Lambda instances, Jonas noted.
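For readers who have not seen that standard use case, a minimal sketch of an S3-triggered handler follows. The `(event, context)` signature and the nested `Records` layout follow the shape AWS uses for S3 notifications; everything else is assumed for illustration, including the `fake_table` dictionary standing in for a database, the size-based "small function," and the sample bucket and key names.

```python
# Hypothetical stand-in for a database table: (bucket, key) -> metadata.
fake_table = {}


def handler(event, context):
    """Entry point in the (event, context) shape Lambda uses for Python."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)
        # The "small function on the object": flag uploads over 1 MB.
        fake_table[(bucket, key)] = {"size_bytes": size, "large": size > 1_000_000}
        processed.append(key)
    return {"processed": processed}


# A trimmed, made-up event for local testing; real events carry more fields.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "scan-001.fits", "size": 2_500_000},
            }
        }
    ]
}

response = handler(sample_event, None)
print(response, fake_table)
```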

Behind the scenes, PyWren serializes the function with the data, using Python’s pickle serialization module and a bit of technology borrowed from the PySpark project. PyWren places the serialized data and function into S3, then invokes Lambda, along with a slimmed-down version of Anaconda, a packaged version of Python and supporting tools offered by Continuum Analytics. The results are delivered back to S3, then unpickled, and returned to the user.
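That round trip can be imitated locally to show the moving parts. The sketch below is not PyWren’s code: a dictionary named `fake_s3` stands in for an S3 bucket, the helpers `submit`, `worker` and `fetch` are invented, and the function body is shipped with the standard library’s `marshal` module as a rough substitute for the PySpark-derived serializer, since plain `pickle` stores functions only by reference to their module.

```python
import builtins
import marshal
import pickle
import types

# Hypothetical in-memory stand-in for an S3 bucket: key -> bytes.
fake_s3 = {}


def submit(func, arg, job_id):
    """Client side: serialize code and data, store them under a job key."""
    payload = {
        "code": marshal.dumps(func.__code__),  # function body, by value
        "name": func.__name__,
        "arg": pickle.dumps(arg),  # input data
    }
    fake_s3[f"{job_id}/input"] = pickle.dumps(payload)


def worker(job_id):
    """Worker side (the role Lambda plays): load, run, store the result."""
    payload = pickle.loads(fake_s3[f"{job_id}/input"])
    code = marshal.loads(payload["code"])
    func = types.FunctionType(code, {"__builtins__": builtins}, payload["name"])
    result = func(pickle.loads(payload["arg"]))
    fake_s3[f"{job_id}/output"] = pickle.dumps(result)


def fetch(job_id):
    """Client side again: unpickle the stored result."""
    return pickle.loads(fake_s3[f"{job_id}/output"])


def square_sum(values):
    return sum(v * v for v in values)


submit(square_sum, [1, 2, 3], "job-0")
worker("job-0")
roundtrip_result = fetch("job-0")
print(roundtrip_result)
```

The shortcut has limits that a real serializer has to handle, such as closures, default arguments and imported modules, which is presumably why PyWren leans on existing tooling rather than rolling its own.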

To be sure, there are some drawbacks, the engineering tradeoffs, if you will. Network overhead and possible throttling by AWS can slow the submission of jobs, the completion of which can then stagger in. Also, a good 20 percent of the run time of most jobs is taken up by setup, even if most jobs come nowhere near the 300-second max.

“There is some transactional overhead,” Jonas admitted, though he added, “Most of our users don’t care if their serial job takes twice as long as otherwise, because now they can run 3,000 of them at once.”

Jonas envisions that, over time, Lambda could take on even more complex “stateless” tasks, such as executing full MapReduce jobs, which split data analysis across many nodes (the “map” part) and then reassemble the pieces into a meaningful result (the “reduce” part). As such, it could simplify a lot of “big data”-styled analysis, he suggested (noting a 2015 Yahoo study that found the average Hadoop data set was 15GB, something that could easily fit on a laptop).
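For readers unfamiliar with the pattern, here is a minimal, single-machine sketch of the two stages, using a word count as the analysis. The chunked input and the helper names `map_chunk` and `reduce_counts` are made up for illustration; in a serverless setting each call to `map_chunk` would be a candidate for its own function invocation, while the reduce step is the part that has to gather results in one place.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map stage: count words within one independent chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce stage: merge two partial counts into one."""
    return left + right


# Hypothetical data set, already split into chunks.
chunks = [
    ["solar flare data", "solar storm"],
    ["neuron data", "solar data"],
]

partials = [map_chunk(chunk) for chunk in chunks]  # independent, parallelizable
totals = reduce(reduce_counts, partials, Counter())  # needs every partial result

print(totals.most_common(2))
```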

Today, PyWren can do the mapping portion of MapReduce jobs, but its reducing capability, needed for parameter tuning, is still experimental.

Jonas was not so sure, however, that this microservices-styled approach would work with the typical super large-scale scientific workloads carried out on today’s supercomputers. High-performance computing workloads, for tasks such as computational fluid dynamics, tend to have many more threads and larger data sets, and to rely on GPUs, which Lambda doesn’t yet support.

Although PyWren was designed to work with Lambda specifically, the idea could be applied to other serverless platforms, such as the just-launched Fission from Platform9. Jonas said that writing PyWren took him only about a weekend’s worth of time.

Joab Jackson is a senior editor for The New Stack, covering cloud native computing and system operations. He has reported on IT infrastructure and development for over 25 years, including stints at IDG and Government Computer News. Before that, he...