Building Android ROMs on limited RAM: zRAM vs. zswap comparison

20 Nov, 2024

Building an Android ROM is a memory-intensive task. The RAM requirements to build AOSP (Android Open Source Project) have kept growing for the last 15 years:

For Android 4, 8 GB of RAM is enough.
From Android 6, 16 GB of RAM is recommended.
From Android 10, 32 GB of RAM is recommended, although it could work with some disk swap.
From Android 13, 64 GB of RAM (really?!) is recommended.

I’ll show you how I still build ROMs within a reasonable amount of time with 16 GB of RAM. Even less, actually, since I’m compiling in WSL (Windows Subsystem for Linux), which is basically a Linux virtual machine running on Windows.

Spoiler: 8 GB is tight to build Android, but 16 GB is plenty. Read below.

Why does an AOSP build consume so much memory?

From my experience, building a custom ROM based on AOSP indeed requires about 30-40GB of memory. It may be more for ROMs with more requirements, such as LineageOS-based ROMs.

Just lower the number of jobs!

Well, this doesn’t work. Why? The Android build process occurs in multiple stages:

Build dependency analysis: The build system (Soong) analyzes the build dependencies by parsing the .bp blueprint files. A dependency graph is generated to determine the build order, with relationships between thousands of modules. As far as I know, the number of jobs doesn’t matter here: everything is loaded in memory, so this step uses a huge, fixed amount of memory, about 30 GB or more with Android 14 (RAM and swap combined).
Compilation and Linking: these steps can be run in parallel, so the number of jobs will determine how much memory will be needed. From my experience, with 8 jobs, less than 30 GB are used.
Image generation: It is quite RAM-hungry, but I remember it consumes less than 10 GB on my build setup.

You’re telling me that if I have less than 32 GB of RAM, I’m screwed?

No, of course not. While you may theoretically need 30 GB, you can still compile AOSP smoothly with just 16 GB of RAM, or even less.

Virtually increase your system memory

If your system needs more RAM, adding more may not be an option: you are low on budget, you have maxed out the available RAM on your computer, or the RAM is soldered…

Yet, Linux provides two main techniques to virtually increase system memory: swapping memory to a physical drive, or using in-memory compression.

1. Use a SWAP partition on a physical drive

SWAP increases the available memory by extending it to the physical drives. For example, you can create a 30 GB SWAP partition on your SSD. I did that for months, and while it “works”, it’s painfully slow. Since everything must be loaded in memory and the SSD is so much slower than RAM, the slightest change needs 50 minutes on my machine for a rebuild.

→ You trade SSD space and IOPS to virtually increase the memory. The SSD becomes the bottleneck, and using a hard drive instead will be significantly slower.

2. Enable in-memory compression (zRAM / zswap)

Instead of swapping data out to a physical drive, Linux can also compress it and keep it directly in RAM. This is very interesting because you can usually achieve a 3:1 compression ratio, or even more depending on the data and the compression algorithm. Also, the swap performance is miles ahead of the SSD, so you might not even notice the performance drop. The caveat is that the CPU must compress and decompress the memory for the application, so there is more latency and less throughput than reading from uncompressed RAM directly. You can choose the compression algorithm to balance speed against compression ratio.

Currently, the most supported ways to enable memory compression in Linux are zRAM and zswap.

→ You trade CPU cycles to virtually increase the memory. The CPU becomes the bottleneck.

Cool. Which one should I use?

Here are the main recommendations I found on the Internet:

If you already have a SWAP partition on your SSD, use zswap.
Otherwise, use zRAM.
Android ROM building guides, such as the LineageOS one, say that enabling zRAM can be helpful. That’s also what other Android communities recommend.

Let’s compare to see if these recommendations hold.

Benchmark

Here’s the setup I’m using for benchmarking:

Hardware specs: Intel Core i7-4910MQ (2.9 GHz 4-core/8-thread CPU from 2014), 16 GB RAM, SATA SSD.
Kernel: Microsoft Kernel 6.6.36, rebuilt with zswap and zram support.
Compression algorithm: lz4 v1.9.4, zstd v1.5.2
Android ROM: LineageOS 21 (Android 14)
Build environment: WSL2 with Ubuntu 24.04.
Tests performed: Soong build step, measured 3 times for accuracy.

Fixed parameters:

Swap on SATA SSD: 40 GB
/proc/sys/vm/swappiness=60 (OS default, controls how aggressively Linux uses swap space relative to RAM)
/proc/sys/vm/page-cluster=0 (no swap read-ahead: pages are read back from swap one 4 KB page at a time, as a SATA SSD is fast enough with 4K reads and writes)
For zswap: zsmalloc allocator to fully benefit from lz4 and zstd compression ratio

Each result is the mean of three runs of the following script:

#!/bin/bash

NB_JOBS=8  # After verification, this has no impact on the Soong analysis stage.

# Log file to store memory usage (emptied first, so an earlier run cannot skew the maximum)
MEMORY_LOG_FILE="memory_use.log"
: > "$MEMORY_LOG_FILE"

# Monitor the device where swap is written
DEVICE="sdb"

# Return the total RAM + Swap use in MB
get_total_memory_use(){
    free | awk '/^Mem:/ {ram_used=$3} /^Swap:/ {swap_used=$3; total_used=(ram_used + swap_used) / 1024; print total_used}'
}

# Function to monitor memory usage
monitor_memory() {
    while true; do
        get_total_memory_use >> "$MEMORY_LOG_FILE"
        sleep 1
    done
}

source build/envsetup.sh
breakfast redfin  # redfin is Google Pixel 5. Could be any other device.

# Add a dummy comment to the Android.bp file to trigger a new Soong analysis
echo "// dummy comment" >> Android.bp

echo "System memory use before benchmark: $(get_total_memory_use) MB (RAM + Swap)"

# Start memory monitoring in the background
monitor_memory &

# Store the PID of the monitoring process
MONITOR_PID=$!

echo "$(date '+%Y-%m-%d %H:%M:%S'): Starting benchmark"

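# Get initial SSD swap partition write stats.
# Note: column 6 is the cumulative MB written (MB_wrtn) only on sysstat releases
# without discard statistics; if `iostat -m` prints an MB_dscd/s column, use $7.
initial_write=$(iostat -m | awk -v dev="$DEVICE" '$1 == dev {print $6}')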

# Run the build. The "nothing" target is to do the Soong step only.
/usr/bin/time -v bash -c "source build/envsetup.sh && m -j${NB_JOBS} nothing"

# Stop the memory monitoring
kill "$MONITOR_PID"
wait "$MONITOR_PID" 2>/dev/null

current_write=$(iostat -m | awk -v dev="$DEVICE" '$1 == dev {print $6}')
write_mb=$((current_write - initial_write))

max_mem_used=$(awk 'NR==1 {max=$1} $1>max {max=$1} END {print max}' "$MEMORY_LOG_FILE")
echo "Maximum memory usage during build (RAM + Swap): ${max_mem_used} MB"
echo "Total MB Written to $DEVICE: $write_mb MB"
# Plain echo does not interpret "\n" in bash, so print the blank line separately
echo "$(date '+%Y-%m-%d %H:%M:%S'): End of benchmark"
echo
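
Since each reported result is the mean of three runs, the averaging can be sketched as below. This is a minimal helper, not part of the original benchmark: the function names `to_seconds` and `mean_duration` are made up for illustration, and the three sample times are hypothetical.

```shell
# Convert "h:mm:ss" or "mm:ss" to a number of seconds.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Print the mean of the given durations as h:mm:ss (integer seconds).
mean_duration() {
    total=0
    for t in "$@"; do
        total=$((total + $(to_seconds "$t")))
    done
    mean=$((total / $#))
    printf '%d:%02d:%02d\n' $((mean / 3600)) $((mean % 3600 / 60)) $((mean % 60))
}

# Three hypothetical runs of the same configuration
mean_duration 15:01 15:10 14:52   # prints 0:15:01
```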

SSD Swap performance

Let’s start with SSD swap only. Here is the time (hh:mm:ss) to build the dependency graph (Soong step only) for LineageOS 21 (Android 14), followed by the amount of memory swapped out to the SSD during the build, if any:

Compression settings \ System RAM	7GB RAM	12GB RAM
No zRAM, no zswap	2:16:55 / 224 GB	2:00:37 / 216 GB

This is the time it took on my machine… for the dependency graph analysis only. Note how little difference there is between 7 GB and 12 GB of RAM! Most of the build time is spent swapping memory. Do this every day and your SSD will eventually die.
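
To put a rough number on the SSD wear, here is a back-of-the-envelope calculation. The 150 TBW (terabytes written) endurance rating is a hypothetical figure chosen for illustration, not the rating of the drive used here, and `builds_until_tbw` is a made-up helper name.

```shell
# Number of runs before swap writes alone reach an SSD's rated endurance.
# $1: endurance rating in TBW (assumed value), $2: GB written per run
builds_until_tbw() {
    echo $(( $1 * 1000 / $2 ))
}

# Hypothetical 150 TBW drive, 224 GB swapped out per Soong run (7 GB RAM result)
builds_until_tbw 150 224   # prints 669
```

At one run per day, that would be less than two years, and it only counts the Soong step.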

Only the memory swapped out to the SSD is measured: the other writes, such as the build output or other processes, are excluded from the results.

zRAM performance

Now, let’s try with zRAM. zRAM acts as a compressed swap partition stored directly in RAM, reducing or eliminating the need for an SSD swap partition. Still, I kept the SSD swap, but with a lower priority. So the system swaps out to the zRAM until either of these two conditions is met:

The zRAM size limit is reached: the system continues by swapping out to the SSD.
The zRAM size limit is not reached, but the RAM is actually full: the system is under memory pressure and the build will most probably fail.

#!/bin/sh

# Enable in-memory compressed swap with zRAM
echo lz4 | tee /sys/block/zram0/comp_algorithm
echo 40G | tee /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 5 /dev/zram0  # Higher swap priority than the physical disk

# Change page cluster from 3 to 0. Better, as long as there is no physical disk swapping
echo 0 | tee /proc/sys/vm/page-cluster
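
To confirm that the zRAM device really takes precedence over the SSD swap, you can list the swap areas by priority. The sketch below parses the `/proc/swaps` format; the function name `swap_by_priority` is an illustrative choice, and `swapon --show` gives similar information.

```shell
# List swap areas as "priority device", highest priority first
# (the kernel fills higher-priority areas before lower ones).
# $1: path to a file in /proc/swaps format (defaults to /proc/swaps)
swap_by_priority() {
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -rn
}
```

With the setup above, `/dev/zram0` (priority 5) should appear before the SSD partition.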

The main settings to adjust are the zRAM device size and the compression algorithm. Here are the results:

Compression settings \ System RAM	7GB RAM	12GB RAM
24 GB zRAM [`lz4`]	1:39:11 / 126 GB	18:49 / 8 GB
24 GB zRAM [`zstd`]	59:16 / 61 GB	17:21
40 GB zRAM [`lz4`]	29:35 (FAIL)	15:01
40 GB zRAM [`lz4hc`]	1:26:58 (FAIL)	34:46
40 GB zRAM [`zstd`]	34:57	19:08
40 GB zRAM [`deflate`]	Not tested	33:09

No doubt, you can achieve much faster builds with in-memory compression. Here is an explanation of the results:

24 GB of zRAM may be too low: the zRAM gets full, even if the compressed equivalent uses much less than 24 GB in RAM. So, if you set this value too low, some memory is swapped out to the SSD, making the build slower. It did work well with zstd and 12 GB of RAM though, but more zRAM with lz4 compression was more effective.
With 12 GB of RAM and more, lz4 is often the best choice due to its low CPU usage and high speed. While its compression ratio is inferior to zstd, it leaves enough free space in RAM, making it an excellent option for systems with enough memory.
With 8 GB of RAM, zstd should give the best results, with decent speed and a higher compression ratio that keeps everything in RAM, whereas lz4 won’t leave you enough space in RAM: either the build eventually fails, or you must use SSD swap as well. Other high-compression algorithms such as deflate are too CPU-intensive, so I didn’t bother testing deflate with 7 GB of RAM.
There is no point in setting the zRAM size to absurd values such as 500 GB: the limit will never be reached, and the zRAM driver will needlessly consume a bit more RAM. Just set the maximum needed to complete a build without swapping to disk.

You can check how much RAM your zRAM device is using at any time:

> watch -n 5 sudo zramctl
NAME       ALGORITHM DISKSIZE  DATA COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4            40G  6,5G  1,7G  1,7G       8 [SWAP]
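
The same ratio can be computed from the device's `mm_stat` file in sysfs, whose first and third fields are the uncompressed data size and the total memory used, in bytes. This is a small sketch; `zram_ratio` is a made-up name.

```shell
# Print the compression ratio of a zRAM device from its mm_stat file.
# Field 1: uncompressed data size, field 3: total memory used (bytes).
# $1: path to mm_stat (defaults to /sys/block/zram0/mm_stat)
zram_ratio() {
    awk '$3 > 0 { printf "%.1f:1\n", $1 / $3 }' "${1:-/sys/block/zram0/mm_stat}"
}
```

For the sample output above (6.5G of data stored in 1.7G), this gives about 3.8:1.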

During the Soong build, you can expect a compression ratio of approximately 4:1 to 5:1 when using lz4. With zstd, CPU load is higher but the ratio is about 10:1!
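
These ratios help explain the table. As a simplified model (my assumption, ignoring the kernel and other processes), a working set W fits in R GB of RAM if x GB stay uncompressed and the rest is compressed at ratio c, with x + (W - x) / c <= R. The sketch below solves for x, using the roughly 30 GB Soong working set mentioned earlier; `uncompressed_room` is a made-up name and c must be greater than 1.

```shell
# How much of a working set can stay uncompressed if the rest is compressed in RAM?
# $1: working set (GB), $2: RAM (GB), $3: compression ratio (e.g. 4 for 4:1, must be > 1)
uncompressed_room() {
    awk -v w="$1" -v r="$2" -v c="$3" 'BEGIN {
        x = (r - w / c) / (1 - 1 / c)
        if (x < 0)      print "does not fit"
        else if (x > w) print "fits without compression"
        else            printf "%.1f GB uncompressed\n", x
    }'
}

uncompressed_room 30 7 4     # lz4 at about 4:1, 7 GB RAM: does not fit
uncompressed_room 30 7 10    # zstd at about 10:1, 7 GB RAM: 4.4 GB uncompressed
uncompressed_room 30 12 4    # lz4 at about 4:1, 12 GB RAM: 6.0 GB uncompressed
```

Under these assumptions, lz4 cannot hold the whole working set in 7 GB, which is consistent with the failed builds, while zstd leaves a few GB of uncompressed room.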

Note: I didn’t enable zRAM writeback, since it is not automatic: it requires a custom service to monitor memory pressure and write data back on SSD when needed.

Zswap performance

Zswap works a bit differently from zRAM. It requires a swap partition (hard drive, SSD… or even zRAM!) and acts as a compressed cache in RAM. So, when the system starts to swap out, the data is compressed and kept in a dedicated pool in RAM. The pool size is configurable: for instance, you can dedicate up to 20% of your RAM to it (max_pool_percent=20).

#!/bin/sh

# Enable in-memory compressed swap cache with zswap (run as root)

# The zpool allocator must be set to zsmalloc to achieve the best compression ratios
echo zsmalloc | tee /sys/module/zswap/parameters/zpool
echo zstd | tee /sys/module/zswap/parameters/compressor

# Should be very high for specific cases when a lot of RAM is needed in a short time, e.g. AOSP building
echo 80 | tee /sys/module/zswap/parameters/max_pool_percent

# If the page is read back to RAM, keep it also compressed (like zRAM)
echo N | tee /sys/module/zswap/parameters/exclusive_loads

# Change page cluster from 3 to 0. Fast as long as there is no SSD swapping
echo 0 | tee /proc/sys/vm/page-cluster
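
After running such a script, it is worth checking that the kernel accepted the values, since a write can fail silently when an allocator or compressor is not built in. A minimal sketch (the function name `show_zswap_params` is illustrative):

```shell
# Print each zswap parameter as "name=value".
# $1: parameters directory (defaults to /sys/module/zswap/parameters)
show_zswap_params() {
    dir="${1:-/sys/module/zswap/parameters}"
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            echo "$(basename "$f")=$(cat "$f")"
        fi
    done
}
```

Calling `show_zswap_params` with no argument reads the live settings; the directory argument only exists to make the function easy to try on sample files.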

Let’s see the results:

Compression settings \ System RAM	7GB RAM	12GB RAM
zswap, max_pool_percent=20 [`lz4`]	59:59 / 105 GB	25:21 / 33 GB
zswap, max_pool_percent=20 [`zstd`]	51:55 / 73 GB	21:53 / 14 GB
zswap, max_pool_percent=60 [`lz4`]	44:52 / 26 GB	23:00
zswap, max_pool_percent=60 [`zstd`]	44:11 / 3 GB	17:24
zswap, max_pool_percent=80 [`lz4`]	52:26 / 8 GB	15:50
zswap, max_pool_percent=80 [`zstd`]	44:30	15:29

The results are close to the benchmark with zRAM:

A high percentage of RAM must be allocated to zswap to achieve good performance.
With 12 GB of RAM, lz4 works great. Surprisingly, zstd seems equivalent.
With 8 GB of RAM, zstd works better and is required to avoid swapping out to the SSD.
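
A rough calculation shows why a small pool falls short. It assumes, as a simplification of mine, that capacity is the RAM outside the pool plus what a full pool holds once compressed, and that zswap with zsmalloc reaches about the same 4:1 ratio as zRAM with lz4. `zswap_capacity` is a made-up name.

```shell
# Rough capacity (GB) with zswap: RAM outside the pool plus what a full pool holds.
# $1: RAM (GB), $2: max_pool_percent, $3: compression ratio (e.g. 4 for 4:1)
zswap_capacity() {
    awk -v r="$1" -v p="$2" -v c="$3" 'BEGIN {
        pool = r * p / 100
        printf "%.1f\n", (r - pool) + pool * c
    }'
}

zswap_capacity 12 20 4    # 12 GB RAM, 20% pool: prints 19.2
zswap_capacity 12 80 4    # 12 GB RAM, 80% pool: prints 40.8
```

With a 20% pool, the estimate stays well below the roughly 30 GB that Soong needs, so the remainder goes to the SSD; with 80%, everything can stay in RAM.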

Note: zswap seems to handle memory pressure better than zRAM: with 7 GB of RAM, 80% (maximum) allocated to zswap, and lz4 compression, it most probably swaps out to the SSD long before the allocated space is full. To achieve the same result with zRAM, you must monitor it and trigger memory recompression and/or write-back yourself, either manually or with a script or a service.

Conclusion: zRAM or zswap?

Features	zRAM	zswap
Can work without a swap partition	yes	no
High compression ratio	yes	yes (with zsmalloc as zpool allocator)
Multiple compression algorithms & recompression	yes (manual)	no
Write back to the SSD	yes (manual)	yes (automatic)
Compressed write back to the SSD	yes	no

Both technologies offer similar performance.

If you can’t have a swap partition, use zRAM. Many well-known devices shouldn’t swap to their physical storage: Raspberry Pi and Android devices use zRAM as swap because the Flash storage is too slow and not durable enough for repetitive write operations. Synology NAS devices also use zRAM.

If your system already comes with a swap partition on a hard drive or an SSD, both will work, but it might be easier to use zswap. However, your kernel must support zsmalloc as the zpool allocator; otherwise, you will end up with a much lower compression ratio than with zRAM.

On a recent machine with a fast NVMe SSD, I’d probably use zswap: it’s simpler to set up, and the NVMe is fast enough that swapping to disk isn’t much of a penalty. For machines with limited CPU and/or storage performance, I’d give a slight advantage to zRAM, for two reasons:

When the zRAM device is full, you can write back the compressed pages to the physical drive. Currently, zswap doesn’t support that. When it’s full, it just acts as if the system were swapping directly to the physical drive: all pages are swapped out uncompressed, and same-filled and zero-filled pages are duplicated as they were originally in RAM, thus lowering the swap performance.
zRAM allows multiple compression algorithms, which means you can use a fast one like lz4, and recompress pages with a higher-compression-ratio algorithm such as zstd when needed. Zswap doesn’t support that.

These two advantages come at a cost: write-back and recompression are not automatic, and you must tinker with the settings and write a dedicated service for your system to behave correctly. This goes beyond the scope of this article.
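
For orientation only, here are the sysfs writes that such a service would issue, as I understand them from the kernel's zRAM documentation. This is a sketch and not a tested service: it assumes a kernel built with CONFIG_ZRAM_MULTI_COMP and CONFIG_ZRAM_WRITEBACK, and a secondary algorithm (`recomp_algorithm`) and a backing device (`backing_dev`) registered before the disk size was set. The function names are made up.

```shell
# Each function takes the device's sysfs directory (default: /sys/block/zram0)
# and must run as root on a real device.

# Mark every page currently stored in zRAM as idle
zram_mark_idle()  { echo all > "${1:-/sys/block/zram0}/idle"; }

# Recompress the pages still marked idle with the secondary algorithm
zram_recompress() { echo "type=idle" > "${1:-/sys/block/zram0}/recompress"; }

# Write the pages still marked idle to the backing device
zram_writeback()  { echo idle > "${1:-/sys/block/zram0}/writeback"; }
```

A service would typically call `zram_mark_idle`, wait a while so that pages in active use lose the idle mark, and then call one of the other two depending on memory pressure.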

For the compression algorithm: lz4 will offer a good compression ratio with minimal performance cost. However, if memory constraints remain an issue, you can opt for zstd to free up more RAM, though it comes at the expense of higher CPU usage. It’s the trade-off between speed and memory efficiency.

Finally, not all kernels support both zRAM and zswap, so you may just end up using whatever is available on your system. Both technologies are in active development and keep converging in features, so keep an eye on their evolution.
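
To see what your kernel offers, you can look for the relevant options in its build configuration. The sketch below reads a kernel config from standard input; `check_kernel_opts` is a made-up name, and where the config lives depends on your distribution.

```shell
# Report whether zRAM, zswap and zsmalloc are enabled in a kernel config read from stdin.
check_kernel_opts() {
    config=$(cat)
    for opt in CONFIG_ZRAM CONFIG_ZSWAP CONFIG_ZSMALLOC; do
        if echo "$config" | grep -Eq "^${opt}=(y|m)$"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}
```

Typical usage would be `zcat /proc/config.gz | check_kernel_opts` (when the kernel exposes its config there) or `check_kernel_opts < "/boot/config-$(uname -r)"`.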

#Android #Linux