Memory Management in R: A Few Tips and Tricks
2015-03-25 11:28
This post discusses a few strategies that I have used to manage memory in R.
Stack Overflow Tips
Stack Overflow has a thread on Memory Management Tricks. I tend to follow these suggestions:
.ls.objects(): There's a nice function (.ls.objects()) that lists the objects in the workspace using the most memory, along with their memory usage. It's good for flagging memory-hogging objects that can be deleted.
Use scripts: Hadley Wickham suggests recording all R actions in a script and rerunning that script to restore all objects. This discards the temporary objects created while the script was being written.
Import and Save: Josh Reich mentions the strategy of importing data and then saving the imported objects to disk (see the post for details).
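Below is a minimal sketch of the import-and-save idea using base R's saveRDS() and readRDS(). It is not necessarily the exact approach in Josh Reich's post. The small CSV is generated only so the example runs on its own, and the temporary file paths stand in for real data files:

```r
# Sketch of the "import once, save, reload" pattern using base R only.
# The CSV below is a small generated stand-in for a real raw data file (hypothetical).
raw_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9)),
          raw_csv, row.names = FALSE)

# Step 1 (run once): import the raw data and save the parsed object to disk.
imported <- read.csv(raw_csv)
cache_file <- tempfile(fileext = ".rds")
saveRDS(imported, cache_file)

# Step 2: drop the in-memory copy and run the garbage collector.
rm(imported)
invisible(gc())

# Step 3 (later sessions): reload the saved object instead of re-importing the raw file.
imported <- readRDS(cache_file)
str(imported)
```

Reloading an .rds file skips re-parsing the raw text. This also means the import code and its intermediate objects never need to be in memory during later analysis sessions.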
Additional Tricks that I use
Develop code on a subset of data:
I've recently been processing logs of key presses from an experiment on skill acquisition. There are a million records. To speed up testing and developing my code, I extract a subset of the data and write the code against that. Many people use the same approach when testing models, where fitting a model to the full dataset can take hours. The strategy is to build the model on a subset and then run it on the full dataset.
A tweaked version of .ls.objects():
I slightly tweaked the .ls.objects() function because I find it useful to see object sizes in megabytes. When I run into the issue of using too much memory, I run this function to see whether any of the objects using a lot of memory should be removed from the workspace (optionally saving them to disk first).
.ls.objects <- function(pos = 1, pattern, order.by = "Size",
                        decreasing = TRUE, head = TRUE, n = 10) {
  # based on postings by Petr Pikal and David Hinds to the r-help list in 2004
  # modified by: Dirk Eddelbuettel (http://stackoverflow.com/questions/1358003/tricks-to-manage-the-available-memory-in-an-r-session)
  # I then gave it a few tweaks (show size as megabytes and use defaults that I like)
  # returns a data frame of the objects and their associated storage needs

  # apply fn to the object behind each name
  napply <- function(names, fn) sapply(names, function(x) fn(get(x, pos = pos)))
  names <- ls(pos = pos, pattern = pattern)
  obj.class <- napply(names, function(x) as.character(class(x))[1])
  obj.mode <- napply(names, mode)
  obj.type <- ifelse(is.na(obj.class), obj.mode, obj.class)
  obj.size <- napply(names, object.size) / 10^6  # megabytes
  obj.dim <- t(napply(names, function(x) as.numeric(dim(x))[1:2]))
  # for non-function objects without dimensions, report their length as "Rows"
  vec <- is.na(obj.dim)[, 1] & (obj.type != "function")
  obj.dim[vec, 1] <- napply(names, length)[vec]
  out <- data.frame(obj.type, obj.size, obj.dim)
  names(out) <- c("Type", "Size", "Rows", "Columns")
  out <- out[order(out[[order.by]], decreasing = decreasing), ]
  if (head)
    out <- head(out, n)
  out
}
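The workflow described before the function is: spot the large objects, optionally save them to disk, then remove them. It can be sketched with base R alone. In this minimal sketch, `big_matrix` and `small_vec` are hypothetical example objects. Sizes are divided by 10^6 to match the megabyte convention used in .ls.objects():

```r
# Sketch of the "flag, save, remove" workflow with base R only.
# `big_matrix` and `small_vec` are hypothetical example objects.
big_matrix <- matrix(runif(2e6), ncol = 100)   # 2 million doubles, roughly 16 MB
small_vec  <- 1:10

# Size of every object in the global environment, in megabytes (10^6 bytes).
sizes_mb <- sapply(ls(envir = globalenv()), function(nm)
  as.numeric(object.size(get(nm, envir = globalenv()))) / 10^6)
sizes_mb <- sort(sizes_mb, decreasing = TRUE)
print(round(sizes_mb, 2))

# Save the largest object to disk before removing it from the workspace.
largest <- names(sizes_mb)[1]
backup  <- file.path(tempdir(), paste0(largest, ".rds"))
saveRDS(get(largest, envir = globalenv()), backup)
rm(list = largest, envir = globalenv())

# Run the garbage collector; whether freed memory goes back to the OS depends on the platform.
invisible(gc())
```

The saved copy can be restored later with readRDS(backup) if it turns out to be needed after all.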