java - Apache Spark in-memory caching


Spark caches the working dataset in memory and then performs computations at memory speeds. Is there a way to control how long the working set resides in RAM?

I have a huge amount of data that is accessed through a job. It takes time to load the data into RAM for the job, and when the next job arrives it has to load the data into RAM all over again, which is time consuming. Is there a way to cache the data forever (or for a specified time) in RAM using Spark?

To uncache explicitly, you can use rdd.unpersist().
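As a minimal sketch of explicit caching and uncaching with Spark's Java API (the input path and app name here are placeholder assumptions, not from the original post):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class CacheExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cache-example");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // "hdfs:///data/huge-input" is a placeholder path
            JavaRDD<String> data = sc.textFile("hdfs:///data/huge-input");

            // Mark the RDD for in-memory caching; it is materialized on the
            // first action and stays in RAM until evicted or unpersisted
            data.persist(StorageLevel.MEMORY_ONLY());

            long total = data.count(); // first action: loads and caches
            long errors = data.filter(line -> line.contains("ERROR")).count(); // served from cache

            // Explicitly drop the cached blocks when you are done
            data.unpersist();

            sc.stop();
        }
    }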

If you want to share cached RDDs across multiple jobs, you can try the following:

  1. Cache the RDD using the same context and re-use that context for other jobs. That way you cache once and use it many times (see the sketch after this list).
  2. There are 'Spark Job Servers' that provide the above-mentioned functionality. Check out the Spark Job Server open sourced by Ooyala.
  3. Use an external caching solution such as Tachyon.
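A rough sketch of option 1, assuming a long-lived driver process (the class and method names are illustrative, not a real API): as long as the same JavaSparkContext stays alive, an RDD cached through it remains available, and each incoming "job" can just run actions against it.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Illustrative holder for a shared, cached RDD. Caching is scoped to the
    // context: once the context (driver) shuts down, the cache is gone.
    public class SharedRddHolder {
        private final JavaRDD<String> cached;

        public SharedRddHolder(JavaSparkContext sc, String path) {
            // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY())
            this.cached = sc.textFile(path).cache();
            this.cached.count(); // force materialization up front
        }

        // Each incoming "job" runs against the already-cached RDD
        public long runJob(String keyword) {
            return cached.filter(line -> line.contains(keyword)).count();
        }
    }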

I have been experimenting with caching options in Spark. You can read more here: http://sujee.net/understanding-spark-caching/

