An Example to Understand C Program Memory Allocation
Memory Allocation of a C Program
Historically, a C program has the following parts:
- Text segment.
- Initialized data segment.
- Uninitialized data segment.
- Stack
- Heap
The picture below shows the typical arrangement of these segments.
Note: The picture is from the APUE book (Advanced Programming in the UNIX Environment).
Here we go. Below is an example that illustrates this memory arrangement.
#include <stdio.h>
#include <stdlib.h>

int sum[100];      // uninitialized variable
int *globle;       // uninitialized variable
int max = 9999;    // initialized variable

int main()
{
    char *str = malloc(15);   // dynamic allocation
    int *autopoint;           // auto variable, deliberately left uninitialized
    int autoVal = 0;          // auto variable
    int autoVal2;             // auto variable

    printf("sum addr = %p, context = %d\n", (void *)sum, sum[0]);
    printf("globle addr = %p\n", (void *)globle);   // value of the pointer itself
    if (globle == NULL) {
        printf("total is NULL\n");
    }
    printf("max addr = %p\n", (void *)&max);
    printf("str addr = %p\n", (void *)str);
    // Prints the indeterminate value held in autopoint, not its address.
    printf("autopoint addr = %p\n", (void *)autopoint);
    if (autopoint != NULL) {
        printf("autopoint isn't NULL\n");
    }
    printf("autoVal addr = %p\n", (void *)&autoVal);
    printf("autoVal2 addr = %p\n", (void *)&autoVal2);

    free(str);
    return 0;
}
Output:
sum addr = 0x601080, context = 0
globle addr = (nil)
total is NULL
max addr = 0x601050
str addr = 0x18f0010
autopoint addr = 0x7fff14f699a0
autopoint isn't NULL
autoVal addr = 0x7fff14f698a0
autoVal2 addr = 0x7fff14f698a4
Explanation:
- max is assigned an initial value, so it is stored in the initialized data segment. As the output shows, it has the lowest address of all, matching the typical storage arrangement diagram above.
- Both sum and globle are uninitialized and declared outside any function, so they are stored in the uninitialized data segment. Indeed, the address printed for sum is higher than that of the initialized variable max, because the uninitialized segment sits above the initialized one. The system sets these variables to zero before main runs, so sum[0] is 0 and globle is NULL.
- str points to memory allocated dynamically with malloc, so that memory is on the heap, and its address is higher than those of the uninitialized variables.
- autopoint, autoVal, and autoVal2 are automatic variables. They are stored on the stack whether or not they are initialized, because they are declared inside a function and each function call gets its own stack frame. Their addresses are clearly different from the others: the values are very large, which corresponds to the high addresses in the figure above. Unlike the globals, autopoint is not zeroed, which is why it is not NULL; the value printed for it is whatever its stack slot happened to contain.
An Example of a C Program
The code is shown below:
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint8_t Type;
    uint16_t Offset;
    char **StringList;
} DATA;

char *offset1[] = { "hello", "world~~~", "I", "am", "amazing", "today", "here", "we", "are", "go" };
char *offset2[] = { "Other", "Unknown", "Safe", "Warning", "Critica", "Non-recoverable", "Error" };
char *offset3[] = { "we", "are", "None", "friend" };

DATA maping [] = {
    {3, 0x05, offset1},
    {3, 0x0B, offset2},
    {3, 0x0C, offset3}
};

int main()
{
    // offset2 has only 7 entries, so index 10 is deliberately out of bounds.
    printf("str = %s\n", maping[1].StringList[10]);
    return 0;
}
Output:
$ ./a.out
str = None
Result analysis:
First, I was surprised that there was no segmentation fault. The string arrays are stored next to each other in memory, so the out-of-range index simply reads a pointer that belongs to the following array.
Before investigating, I had no idea how Type, Offset, and StringList were laid out. The result shows that the program reads through the StringList of maping[1], but by my count index 10 should have been “friend”. Why is it “None”?
So I decided to find out how all of this memory is laid out. The code and the storage allocation diagram below are the result; I think they make it clear how structures, pointers, and arrays are allocated in a C program.
The main function below replaces the main function above, leaving everything else unchanged.
int main()
{
    printf("DATA = %zu\n", sizeof(DATA));
    printf("%zu\n", sizeof(offset1[0]));
    printf("maping Addr = %p\n", (void *)maping);

    printf("offset1.type address = %p,\t offset addr = %p,\t, offset1 addr = %p\n",
           (void *)&maping[0].Type, (void *)&maping[0].Offset, (void *)&maping[0].StringList);
    printf("str1 = %p, str2 = %p, str3 = %p\n",
           (void *)maping[0].StringList[0], (void *)maping[0].StringList[1], (void *)maping[0].StringList[2]);

    printf("offset1.type address = %p,\t offset addr = %p,\t, offset1 addr = %p\n",
           (void *)&maping[1].Type, (void *)&maping[1].Offset, (void *)&maping[1].StringList);
    printf("str1 = %p, str2 = %p, str3 = %p\n",
           (void *)maping[1].StringList[0], (void *)maping[1].StringList[1], (void *)maping[1].StringList[2]);

    printf("offset2.type address = %p,\t offset addr = %p,\t, offset1 addr = %p\n",
           (void *)&maping[2].Type, (void *)&maping[2].Offset, (void *)&maping[2].StringList);
    printf("str1 = %p, str2 = %p, str3 = %p\n",
           (void *)maping[2].StringList[0], (void *)maping[2].StringList[1], (void *)maping[2].StringList[2]);

    // Deliberately walks past the 10 entries of offset1 to show what follows it in memory.
    int i = 0;
    for (i = 0; i < 24; i++) {
        printf("ADDR = %p, str = %s\n", (void *)&maping[0].StringList[i], maping[0].StringList[i]);
    }
    return 0;
}
Output:
DATA = 16
8
maping Addr = 0x601120
offset1.type address = 0x601120, offset addr = 0x601122, , offset1 addr = 0x601128
str1 = 0x400758, str2 = 0x40075e, str3 = 0x400767
offset1.type address = 0x601130, offset addr = 0x601132, , offset1 addr = 0x601138
str1 = 0x400789, str2 = 0x40078f, str3 = 0x400797
offset2.type address = 0x601140, offset addr = 0x601142, , offset1 addr = 0x601148
str1 = 0x40078f, str2 = 0x400782, str3 = 0x4007c2
ADDR = 0x601060, str = hello
ADDR = 0x601068, str = world~~~
ADDR = 0x601070, str = I
ADDR = 0x601078, str = am
ADDR = 0x601080, str = amazing
ADDR = 0x601088, str = today
ADDR = 0x601090, str = here
ADDR = 0x601098, str = we
ADDR = 0x6010a0, str = are
ADDR = 0x6010a8, str = go
ADDR = 0x6010b0, str = (null)
ADDR = 0x6010b8, str = (null)
ADDR = 0x6010c0, str = Other
ADDR = 0x6010c8, str = Unknown
ADDR = 0x6010d0, str = Safe
ADDR = 0x6010d8, str = Warning
ADDR = 0x6010e0, str = Critica
ADDR = 0x6010e8, str = Non-recoverable
ADDR = 0x6010f0, str = Error
ADDR = 0x6010f8, str = (null)
ADDR = 0x601100, str = we
ADDR = 0x601108, str = are
ADDR = 0x601110, str = None
ADDR = 0x601118, str = friend
The storage allocation structure drawn from this result is as follows:
Above is the storage allocation diagram I drew with Visio. Now let’s start to analyze this result in detail.
Conclusion 1: On Ubuntu, padding is left between the arrays.
As the Ubuntu output shows, there is a NULL slot after str = Error instead of str = we following immediately, which is why index 10 gives None rather than friend.
StringList in the structure is just a pointer, and the arrays it points to are ordinary fixed-size arrays whose length comes from their initializers. The empty slots are consistent with alignment padding: every array in the output starts at an address that is a multiple of 32 (0x601060, 0x6010c0, 0x601100, 0x601120), so the gap after an array is whatever is needed to reach the next 32-byte boundary. In my own experiments, reducing the number of entries in an array sometimes made the NULL disappear, and adding another string sometimes brought one back or moved it to the end.
I have not included the detailed steps and results of those experiments; you can try them yourself. Note that this is not spare capacity for growth, as in a C++ vector: these arrays cannot grow, and the layout is chosen by the compiler and linker and is not guaranteed.
In fact, this already explains why the result is str = None. But there are other things we can verify from this output.
Conclusion 2: Storage space is allocated contiguously.
The highest address of the green region is immediately followed by the lowest address of the red region. String constants are also laid out consecutively.
Conclusion 3: The platform is a Little Endian platform.
Little Endian Byte Order: The least significant byte (the “little end”) of the data is placed at the byte with the lowest address.
The address space highlighted here in red is allocated to the array maping. Within the array, index 0 is stored at the lowest address, index 1 is higher, and index 2 is highest, and the members of each structure are likewise stored in declaration order. That ordering alone does not reveal the byte order, since C lays out array elements and structure members this way on every platform; endianness concerns the order of the bytes inside a single multi-byte value such as Offset. The Ubuntu system I’m running is x86_64, and x86 is a little-endian architecture.
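Conclusion 4: Each member is aligned to its own alignment requirement (here equal to its size), and the structure to that of its largest member.
The DATA structure occupies 16 bytes because the pointer is 8 bytes and must be 8-byte aligned. I have no doubt about that, but I had expected the storage layout to look like this:
Actually, the storage layout is like this:
Offset occupies 2 bytes, so it only needs 2-byte alignment: it sits at offset 2, right after Type and a single padding byte, and the remaining padding comes before the 8-byte pointer at offset 8. An interesting discovery.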
I hope this article helps you understand the memory allocation of a C program.
Have a nice day; any feedback is welcome.