java - Apache Spark in-memory caching


Spark caches the working dataset in memory and then performs computations at memory speeds. Is there a way to control how long the working set resides in RAM?

I have a huge amount of data that is accessed through a job. It takes time to load the data into RAM for the job, and when the next job arrives it has to load the data into RAM all over again, which is time consuming. Is there a way to cache the data forever (or for a specified time) in RAM using Spark?

To uncache explicitly, you can use rdd.unpersist().
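As a minimal sketch of explicit caching and uncaching with Spark's Java API (the input path and app name here are placeholder assumptions, not from the original post):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    public class CacheExample {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("cache-example");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // "hdfs:///data/huge-input" is a placeholder path
            JavaRDD<String> data = sc.textFile("hdfs:///data/huge-input");

            // Mark the RDD for in-memory caching; it is materialized on the
            // first action and stays in RAM until evicted or unpersisted
            data.persist(StorageLevel.MEMORY_ONLY());

            long total = data.count(); // first action: loads and caches
            long errors = data.filter(line -> line.contains("ERROR")).count(); // served from cache

            // Explicitly drop the cached blocks when you are done
            data.unpersist();

            sc.stop();
        }
    }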

If you want to share cached RDDs across multiple jobs, you can try the following:

  1. Cache the RDD using the same context and re-use that context for other jobs. That way you cache once and use it many times (see the sketch after this list).
  2. There are 'Spark Job Servers' that provide the above-mentioned functionality. Check out the Spark Job Server open sourced by Ooyala.
  3. Use an external caching solution such as Tachyon.
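A rough sketch of option 1, assuming a long-lived driver process (the class and method names are illustrative, not a real API): as long as the same JavaSparkContext stays alive, an RDD cached through it remains available, and each incoming "job" can just run actions against it.

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    // Illustrative holder for a shared, cached RDD. Caching is scoped to the
    // context: once the context (driver) shuts down, the cache is gone.
    public class SharedRddHolder {
        private final JavaRDD<String> cached;

        public SharedRddHolder(JavaSparkContext sc, String path) {
            // cache() is shorthand for persist(StorageLevel.MEMORY_ONLY())
            this.cached = sc.textFile(path).cache();
            this.cached.count(); // force materialization up front
        }

        // Each incoming "job" runs against the already-cached RDD
        public long runJob(String keyword) {
            return cached.filter(line -> line.contains(keyword)).count();
        }
    }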

I have been experimenting with caching options in Spark. You can read more here: http://sujee.net/understanding-spark-caching/

