Our OpenResty® open source web platform is known for its high execution speed and small memory footprint. We have users running complex OpenResty applications inside embedded system devices like robots, and people have observed significant memory usage reduction when migrating applications from other technical stacks like Java, NodeJS, and PHP. Still, we may sometimes want to optimize the memory usage of a particular OpenResty application that consumes a lot of memory and/or leaks memory. Such applications may contain bugs or deficiencies in their Lua code, Nginx configurations, and/or 3rd-party Lua libraries and Nginx modules.
To effectively debug and optimize memory usage or leaking issues, it is always beneficial to have a good understanding of how OpenResty, Nginx, and LuaJIT allocate and manage memory under the hood. Our commercial product, OpenResty XRay, can also automatically analyze and troubleshoot a wide range of memory usage issues in any OpenResty application without any modifications or compromises, even in an online production environment. This post is the first of a series of articles explaining memory allocation and management in OpenResty, Nginx, and LuaJIT, with real-world sample data and graphs provided by OpenResty XRay.
We start with an introduction to the system-level memory usage breakdown for Nginx processes and then take a look at memory allocated and managed by various allocators on the application level.
On The System Level
In modern operating systems, processes request and utilize virtual memory at the highest level. The operating system manages the virtual memory for each process and maps the virtual memory pages that are actually used to physical memory pages backed by hardware (like DDR4 RAM sticks). It is important to note that a process may request a lot of virtual memory space but only use a small portion of it. For example, depending on the kernel's overcommit settings, a process can successfully claim, say, 2TB of virtual memory from the operating system even though the system may only have 8GB of RAM. This causes no problems as long as that process does not write to too many memory pages of this huge virtual memory space. It is this potentially small portion of the virtual memory space that actually gets mapped to physical memory, and thus it is what we really care about. So never panic when you see large virtual memory usage (named VIRT in top and VSZ in ps) in tools like ps and top.
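To see the gap between claimed and used virtual memory on your own machine, here is a minimal Python sketch, assuming a Linux system with /proc mounted. It reserves a 256 MiB anonymous mapping with Python's mmap module and then writes to only the first 16 MiB, reading VmSize (VIRT) and VmRSS from /proc/self/status along the way. The sizes and the helper name status_kb() are arbitrary choices for illustration; exact numbers will vary with the kernel and the interpreter.

```python
import mmap

def status_kb(field, pid="self"):
    """Return a memory field (in kB) from /proc/<pid>/status, e.g. VmSize or VmRSS."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])  # the kernel reports these values in kB
    return 0  # field not present on this kernel

SIZE = 256 * 1024 * 1024     # virtual address space to claim: 256 MiB
TOUCHED = 16 * 1024 * 1024   # portion we actually write to: 16 MiB

virt_before, rss_before = status_kb("VmSize"), status_kb("VmRSS")

# fileno=-1 requests an anonymous mapping; MAP_PRIVATE keeps it process-local.
region = mmap.mmap(-1, SIZE, flags=mmap.MAP_PRIVATE)
virt_mapped, rss_mapped = status_kb("VmSize"), status_kb("VmRSS")

# Write one byte into each page of the first 16 MiB so the kernel must back
# those pages with physical memory.
for offset in range(0, TOUCHED, mmap.PAGESIZE):
    region[offset] = 1
rss_touched = status_kb("VmRSS")

print(f"VIRT +{virt_mapped - virt_before} kB after mmap()")
print(f"RSS  +{rss_mapped - rss_before} kB after mmap()")
print(f"RSS  +{rss_touched - rss_mapped} kB after writing {TOUCHED // 1024} kB")
```

On a typical Linux box, VIRT should jump by about the full 256 MiB right after mmap(), while RSS should only grow by roughly the 16 MiB that was actually written.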
The small portion of the virtual memory that is actually used (meaning data has been written to it) is usually called RSS or resident memory. When the system is running out of physical memory and some part of the resident memory gets swapped out to disk[1], the swapped-out portion is no longer part of the resident memory and becomes the “swapped memory” (or “swap” for short).
Many tools can report the sizes of the virtual memory, resident memory, and swapped memory of any process (including OpenResty’s nginx worker processes). OpenResty XRay can render a pie graph like the one below for an nginx worker process.
In this graph, the full pie represents the whole virtual memory space claimed by the nginx worker process. The Resident Memory component indicates the resident memory usage, that is, the memory pages actually used. Finally, the Swap component (not shown here) represents the swapped-out portion.
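The pie's components can be approximated from the counters Linux exposes in /proc/<pid>/status. The following Python sketch is an assumption-laden approximation, not how OpenResty XRay computes its graph: it treats VmSize as the full pie, VmRSS as Resident Memory, VmSwap as Swap, and whatever remains as the unused portion. The function names are hypothetical.

```python
def parse_status(text):
    """Extract VmSize, VmRSS and VmSwap (all in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS", "VmSwap") and rest.split():
            fields[key] = int(rest.split()[0])
    return fields

def pie_breakdown(fields):
    """Split the claimed virtual memory (kB) into the pie's components."""
    virt = fields.get("VmSize", 0)
    rss = fields.get("VmRSS", 0)
    swap = fields.get("VmSwap", 0)  # may be absent on very old kernels
    unused = max(virt - rss - swap, 0)  # claimed but neither resident nor swapped
    return {"Resident Memory": rss, "Swap": swap, "Unused": unused}

def read_pie(pid="self"):
    """Read /proc/<pid>/status of a live process, e.g. an nginx worker."""
    with open(f"/proc/{pid}/status") as f:
        return pie_breakdown(parse_status(f.read()))

print(read_pie())
```

To look at an nginx worker rather than the Python process itself, pass the worker's PID (for example, one taken from ps output) to read_pie().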
As mentioned above, we usually care most about the Resident Memory component, which is usually the primary focus of any memory usage optimization. It is also quite alarming if the Swap component shows up, which means the physical memory is no longer enough and the operating system may get overloaded by swapping memory pages in and out. We should also pay attention to the unused portion of the virtual memory space, however. It may be that we allocate some Nginx shared memory zones way too big, and they may someday bite us badly when they actually get filled up with data (in which case they would become part of the Resident Memory component).
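How an oversized zone migrates from the unused portion into resident memory can be sketched in Python, assuming Linux 4.5 or later (for the RssShmem field in /proc/self/status). The sketch maps 128 MiB of shared anonymous memory, the kind of mapping Nginx typically uses for its shm zones on Linux, and then "fills" it in stages. The zone size, the fill amounts, and the fill() helper are hypothetical; real zones are populated by a slab allocator, not by touching one byte per page.

```python
import mmap

def status_kb(field):
    """Read a kB-valued field such as RssShmem from /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith(field + ":"):
                return int(line.split()[1])
    return 0  # field missing (RssShmem needs Linux 4.5+)

ZONE_SIZE = 128 * 1024 * 1024  # hypothetical, deliberately oversized "zone"

# fileno=-1 gives an anonymous mapping; Python's default flags are MAP_SHARED,
# so this is a shared anonymous mapping.
zone = mmap.mmap(-1, ZONE_SIZE)

def fill(zone, nbytes):
    """Simulate storing entries by writing one byte into each page of the first nbytes."""
    for offset in range(0, nbytes, mmap.PAGESIZE):
        zone[offset] = 0xAB

shm_empty = status_kb("RssShmem")
fill(zone, 8 * 1024 * 1024)
shm_8m = status_kb("RssShmem")
fill(zone, 64 * 1024 * 1024)
shm_64m = status_kb("RssShmem")

print(f"zone size: {ZONE_SIZE // 1024} kB")
print(f"resident shm after 8 MiB filled:  +{shm_8m - shm_empty} kB")
print(f"resident shm after 64 MiB filled: +{shm_64m - shm_empty} kB")
```

Only the filled part shows up as resident shared memory; the untouched remainder stays in the unused portion of the pie until data actually lands there.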
We will cover resident memory in more detail in another article. Next let’stake a look at memory usage on the application level.
On The Application Level
It is much more useful to inspect memory usage further on the application level. In particular, we care about how much of the memory currently in use is allocated by the LuaJIT memory allocator, how much by the Nginx core and its modules, and how much by Nginx’s shared memory zones.
Consider the following pie graph generated by OpenResty XRay when analyzing an unmodified nginx worker process of an OpenResty application.
The Glibc Allocator component in the pie graph indicates the total memory size allocated by Glibc, the GNU implementation of the standard C runtime library. This allocator is usually invoked at the C language level through function calls like malloc(), realloc(), and calloc(), and is also known as the system allocator. Note that the Nginx core and its modules also allocate memory via this system allocator (an important exception is Nginx’s shared memory zones, which we will discuss shortly). Some Lua libraries with C components or FFI calls may directly invoke the system allocator, but it is more common for them to utilize LuaJIT’s allocator. The OpenResty or Nginx build may also choose a different C runtime library than Glibc, like musl libc. We will extend our discussion of system allocators and Nginx’s allocator in a dedicated article.
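As a rough analogue of an FFI call that invokes the system allocator directly, here is a Python sketch using ctypes (Python's FFI) to call malloc() and free() from the process's C library. It also calls malloc_usable_size(), a GNU extension that musl provides as well, to show that the allocator may hand out more bytes than requested. The helper name usable_size() is hypothetical, and the exact rounding depends on the C library in use.

```python
import ctypes
import ctypes.util

# Load the C runtime library (Glibc on most Linux distributions, possibly musl).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]
# malloc_usable_size() is not ISO C; it is a GNU extension also offered by musl.
libc.malloc_usable_size.restype = ctypes.c_size_t
libc.malloc_usable_size.argtypes = [ctypes.c_void_p]

def usable_size(requested):
    """Allocate `requested` bytes via malloc() and report how many bytes are usable."""
    ptr = libc.malloc(requested)
    if not ptr:
        raise MemoryError(requested)
    try:
        return libc.malloc_usable_size(ptr)
    finally:
        libc.free(ptr)  # memory from the system allocator must be freed explicitly

for n in (1, 24, 100, 4096, 200000):
    print(n, "->", usable_size(n))
```

Unlike memory from LuaJIT's allocator, nothing here is garbage collected: every malloc() needs a matching free().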
Nginx Shared Memory
The pie’s Nginx Shm Loaded component represents the total size of the actually used portion of the shared memory (or “shm”) zones allocated by the Nginx core or its modules. Shared memory zones are allocated directly via the mmap() system call, thus bypassing the standard C library’s allocator completely. Nginx shared memory zones are shared among all its worker processes. Common examples are those defined by standard Nginx directives like ssl_session_cache, proxy_cache_path, limit_req_zone, limit_conn_zone, and upstream’s zone. The component also includes shared memory zones defined by Nginx’s 3rd-party modules like OpenResty’s core component ngx_http_lua_module. OpenResty applications usually define their own shared memory zones through this module’s lua_shared_dict directive in Nginx configuration files. We will cover memory issues in Nginx shm zones in great detail in another dedicated article.
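You can check out the related articles “How OpenResty and Nginx Shared Memory Zones Consume RAM” and “Memory Fragmentation in OpenResty and Nginx's Shared Memory Zones” for more detailed discussions on OpenResty and Nginx’s shared memory zones.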
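Why every worker sees the same zone can be sketched in Python, assuming a Unix system with os.fork(): a shared anonymous mapping created before forking is inherited by each child, much like the Nginx master process sets up shm zones before spawning its workers. The 64-byte slots, the four workers, and the helper names are hypothetical, and the sketch skips the locking and slab allocation that real Nginx zones rely on.

```python
import mmap
import os

ZONE_SIZE = 1024 * 1024  # hypothetical 1 MiB "zone"
SLOT = 64                # each worker writes into its own 64-byte slot

# Create the zone *before* forking. fileno=-1 requests an anonymous mapping,
# and the default flags are MAP_SHARED, so children share the same pages.
zone = mmap.mmap(-1, ZONE_SIZE)

def spawn_worker(worker_id):
    pid = os.fork()
    if pid == 0:  # child ("worker") process
        msg = f"hello from worker {worker_id}".encode()
        offset = worker_id * SLOT
        zone[offset:offset + len(msg)] = msg
        os._exit(0)  # leave immediately without running parent cleanup code
    return pid

def read_slot(worker_id):
    """Read a worker's message back from the shared zone in the parent."""
    raw = zone[worker_id * SLOT:(worker_id + 1) * SLOT]
    return raw.rstrip(b"\x00").decode()

pids = [spawn_worker(i) for i in range(4)]
for pid in pids:
    os.waitpid(pid, 0)

for i in range(4):
    print(read_slot(i))
```

Each child wrote only to its own slot, yet the parent can read all four messages, because the pages are shared rather than copied on write.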
The HTTP/Stream LuaJIT Allocator components in the pie represent the total memory size allocated and managed by LuaJIT’s builtin allocator: one for the LuaJIT virtual machine (VM) instance in the Nginx HTTP subsystem and one for the VM instance in the Nginx Stream subsystem. LuaJIT also has a build option[2] to use the system allocator instead, but it is mostly used only with special testing and debugging tools (like Valgrind and AddressSanitizer). Lua strings, tables, functions, cdata objects, userdata objects, upvalues, etc., are all allocated by this allocator. On the other hand, primitive Lua values like integers[3], numbers, light userdata values, and booleans do not require any dynamic allocations. C-level memory blocks allocated by LuaJIT’s ffi.new() calls in Lua land are also allocated by LuaJIT’s own allocator. All the memory blocks allocated by this allocator are managed by LuaJIT’s garbage collector (GC), so it is not the user’s responsibility to free those memory blocks when they are no longer needed[4]. Naturally, these memory objects are also called “GC objects”. We will elaborate on this in another article.
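Automatic collection only reclaims objects that are no longer reachable, so a lingering reference in a long-lived table still keeps memory alive. That can be illustrated in Python, with a caveat: Python's memory management (reference counting plus a cycle collector) differs from LuaJIT's incremental mark-and-sweep GC, so treat this as an analogy rather than a model of LuaJIT. The Obj class, the module-level cache (standing in for a long-lived Lua table), and handle_request() are all hypothetical; weakref lets us observe whether an object is still alive without keeping it alive.

```python
import gc
import weakref

class Obj:
    """Stand-in for a GC-managed object such as a Lua table or cdata block."""
    def __init__(self, payload):
        self.payload = payload

cache = {}  # long-lived container, like a module-level Lua table

def handle_request(key, leak):
    obj = Obj(bytearray(1024))
    if leak:
        cache[key] = obj  # a lingering reference keeps obj reachable
    return weakref.ref(obj)  # observe obj without keeping it alive

freed = handle_request("a", leak=False)
kept = handle_request("b", leak=True)
gc.collect()

print("freed:", freed() is None, "kept:", kept() is not None)
```

The object stored in cache survives every collection until its reference is removed, which is exactly how "leaks" in garbage-collected Lua code usually happen.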
Program Code Sections
The pie’s Text Segments component corresponds to the total size of the .text segments of both the executable and the dynamically loaded libraries mapped into the target process’s virtual memory space. The .text segments usually contain executable machine code.
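A rough way to approximate the Text Segments component yourself is to sum the executable mappings listed in /proc/<pid>/maps. The Python sketch below assumes Linux and counts every mapping whose permissions include x; that also picks up regions that are not .text segments proper, such as the [vdso] page and anonymous executable regions (typically where JIT compilers like LuaJIT's place generated machine code). The function names are hypothetical.

```python
def executable_mappings(maps_text):
    """Yield (size_in_bytes, pathname) for each executable mapping in maps text."""
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5 or "x" not in fields[1]:
            continue  # skip malformed lines and non-executable mappings
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        path = fields[5].strip() if len(fields) == 6 else "[anonymous]"
        yield end - start, path

def text_segment_total(pid="self"):
    """Approximate the Text Segments component (in bytes) for a live process."""
    with open(f"/proc/{pid}/maps") as f:
        return sum(size for size, _ in executable_mappings(f.read()))

print(f"executable mappings: {text_segment_total() // 1024} kB")
```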
System Stacks
Finally, the System Stacks component in the pie refers to the total allocated size of all the system stacks (or “C stacks”) in the target process. Each operating system (OS) thread has its own system stack, so multiple stacks only occur when multiple OS threads are used (note that OpenResty’s “light threads” created by the ngx.thread.spawn API function are completely different from such system-level threads). Nginx worker processes usually have only one system thread unless OS thread pools are configured (via the aio threads configuration directive).
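Since each OS thread owns its own system stack, counting threads is a quick proxy for how many stacks a process has. On Linux, every entry under /proc/<pid>/task is one OS thread. This Python sketch (the helper name is hypothetical) starts three OS threads and observes the count rise; pointed at an nginx worker's PID instead, it would typically report a single thread unless aio threads pools are configured.

```python
import os
import threading

def os_thread_count(pid="self"):
    """Count OS threads (and therefore system stacks) via /proc/<pid>/task."""
    return len(os.listdir(f"/proc/{pid}/task"))

baseline = os_thread_count()

# Start three real OS threads that block until released.
release = threading.Event()
workers = [threading.Thread(target=release.wait) for _ in range(3)]
for t in workers:
    t.start()
during = os_thread_count()  # each new thread brings its own stack

release.set()
for t in workers:
    t.join()

print(f"threads before: {baseline}, while running: {during}")
```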
Other System Allocators
Some users may hook up 3rd-party memory allocators with their OpenResty or Nginx processes. Common examples are tcmalloc and jemalloc, which aim to speed up the system allocator functions (like malloc()). They help speed up small and naive malloc() calls in some Nginx 3rd-party modules, Lua C modules, or some C libraries (OpenSSL included!), but they are not very useful for the parts of the software that already use a good allocator, like Nginx’s memory pools and LuaJIT’s builtin allocator. Using such “add-on” allocator libraries introduces new complexity and problems, which we will cover in great detail in a future post.
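To check whether a running process has such an add-on allocator loaded as a shared library (for example via LD_PRELOAD), one option is to scan its /proc/<pid>/maps for library names. This Python sketch assumes Linux and only matches file names containing "jemalloc" or "tcmalloc"; an allocator linked statically into the nginx binary would not show up this way. The function names are hypothetical.

```python
import os

ALLOCATOR_HINTS = ("jemalloc", "tcmalloc")  # substrings to look for in library names

def loaded_allocators(maps_text):
    """Return sorted names of mapped libraries that look like add-on allocators."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        name = os.path.basename(fields[5].strip())
        if any(hint in name for hint in ALLOCATOR_HINTS):
            found.add(name)
    return sorted(found)

def allocators_of(pid="self"):
    """Scan the live memory map of a process (defaults to this one)."""
    with open(f"/proc/{pid}/maps") as f:
        return loaded_allocators(f.read())

print(allocators_of())
```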
Used or Not Used
The application-level memory usage breakdown we just introduced is a little complicated when we analyze whether each component covers used or unused virtual memory pages. Only the Nginx Shm Loaded component in the pie graph covers just the virtual memory pages actually used; the other components include both kinds. Fortunately, memory allocated by the Glibc allocator and the LuaJIT allocator is usually used anyway, so it does not make a big difference most of the time.
For Traditional Nginx Servers
Traditional Nginx servers are just a strict subset of OpenResty applications. Their users still see system allocator memory and Nginx shared memory zone usage, among other things. OpenResty XRay can still be used to directly inspect and analyze such server processes, even in production. Of course, you will not see any Lua-related components if you have not compiled OpenResty’s Lua modules into your Nginx build.
Conclusion
This article is the first in a series of posts explaining how OpenResty and Nginx allocate and manage memory, in the hope of helping optimize the memory usage of applications based on them. This post gives an overview of how memory is used and allocated at the highest level. In subsequent articles in this series, we will take a closer look at each allocator and memory management facility, each in a dedicated article. Stay tuned!
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.
Translations
We provide a Chinese translation for this article on blog.openresty.com ourselves. We also welcome interested readers to contribute translations in other natural languages as long as the full article is translated without any omissions. We thank them in advance.
We are hiring
We always welcome talented and enthusiastic engineers to join our team at OpenResty Inc. to explore various open source software internals and build powerful analyzers and visualizers for real-world applications built atop open source software. If you are interested, please send your resume to talents@openresty.com. Thank you!
[1] Modern Android operating systems do support swapping memory pages out to memory, with those pages compressed, thus still saving physical memory space.
[2] This build option is -DLUAJIT_USE_SYSMALLOC. But never use it in production!
[3] By default, the LuaJIT runtime uses only a single number representation for both integers and numbers: double-precision floating-point numbers. Still, the user can pass the build option -DLUAJIT_NUMMODE=2 to enable the additional 32-bit integer representation at the same time.
[4] But it is still our responsibility to make sure all references to those useless objects are properly removed.
How OpenResty and Nginx Allocate and Manage Memory (OpenResty XRay, Jan 21, 2020)