Memory

Also, you should set up your shell so that you don't generate core files when doing this lecture. I.e., if it is not done in your .cshrc file, do:

UNIX> limit coredumpsize 0

As I have said previously, memory is like a huge array with (say) 0xffffffff elements. A pointer in C is an index to this array. Thus when a C pointer is 0xefffe034, it points to the 0xefffe035th element in the memory array (memory being indexed starting with zero).

Unfortunately, you cannot access all elements of memory. One example that we have seen a lot is element 0. If you try to dereference a pointer with a value of 0, you will get a segmentation violation. This is Unix's way of telling you that that memory location is illegal.

For example, the following code will generate a segmentation violation:

int main()
{
  char *s;
  char c;

  s = (char *) 0;
  c = *s;
}

In general, a program can legally access four regions of memory:

The code (or "text"): These are the instructions of your program.

The globals: These are your global variables (init data and bss).

The heap: This is memory that you get from malloc().

The stack: This contains your local variables and procedure arguments.

|--------------| 0
|              |
|     void     |
|              |
|--------------| 0x10000
|              |
|     code     |
|              |
|--------------|
|     void     |
|--------------| 0x20000
|              |
|   globals    |
|              |
|--------------|
|              |
|     heap     |
|              |
||||||||||||||||
|vvvvvvvvvvvvvv|
|              |
|              |
|     void     |
|              |
|              |
|^^^^^^^^^^^^^^|
||||||||||||||||
|              |
|     stack    |
|              | 0xefffffff
|--------------|

Paging

An operating system provides each process it loads with a memory address space that starts at 0x0 and goes up to 0xffffffff or 0x8fffffff, depending on what type of system you are on. These addresses are all virtual memory addresses. An analogy that might help is the assignment of a phone number to your house: the phone number is just logical and can easily be changed, while your street address cannot. Here, the operating system needs to map this virtual address space to its physical address space, i.e. locations on the chips that make up the memory banks. This is part of the operating system's memory management job, which also includes deciding how best to use a limited physical address space to meet the needs of a large number of processes. There are many ways to manage memory. Fortunately, all UNIX systems use a pretty standard approach, called paging. Let's use the hydra machines as an example.

On the hydra machines, memory is broken up into 8192-byte chunks. These are called pages. On some machines, pages are 4096 bytes -- this is something set by the hardware. Page sizes on different machines are mostly of the same order of magnitude.

The way memory works is as follows: The operating system allocates certain pages of memory for you. Whenever you try to read from or write to an address in memory, the hardware first checks with the operating system to see if that address belongs to a page that has been allocated for you. If so, then it goes ahead and performs the read/write. If not, you'll get a segmentation violation (note, there are many ways to get a segmentation violation, and this is only one of them).

This is what happens when you do:

s = (char *) 0; c = *s;

A page fault is generated when the OS detects that a process is trying to access a page in its virtual address space that is not currently in physical memory. The OS then stops the process until the requested page has been read in. In most cases, a page fault is not an error. A segmentation fault, by contrast, is almost always an error.

The exact mechanics of paging are covered in classes on Operating Systems. I won't go into it further here.

As it turns out, the first 8 pages on our hydra machines are void. This means that trying to read from or write to any address from 0 to 0xffff will result in a segmentation violation.

The next page (starting with address 0x10000) starts the code segment. This segment ends at the variable &etext, which I'll go over in a bit. The globals segment starts at address 0x20000. It goes until the variable &end. The heap starts immediately after &end, and goes up to sbrk(0), which I'll talk about still later. The stack ends with address 0xefffffff. Its beginning changes with the different procedure calls you make. We'll go over this more later in this lecture. Every page between the end of the heap and the beginning of the stack is void, and will generate a segmentation violation if you access it.

&etext, &edata and &end.

These are three external variables that are defined as follows:

extern etext; extern edata; extern end;

Look at the program testaddr1.c. This prints out the addresses of etext, edata and end. Then it prints out 6 values:

main is a pointer to the first instruction of the main() procedure. This is simply a location in the code segment, which should be familiar to you from the assembler lectures.

I is a global variable. Thus &I should be an address in the globals segment.

i is a local variable. Thus &i should be an address in the stack.

argc is an argument to main(). Thus, &argc should be an address in the stack.

ii is another local variable. Thus, &ii should be an address in the stack. However, ii is a pointer to memory that has been malloc'd. Thus, ii should be an address in the heap.

When we run testaddr1, we get something like the following:

UNIX> testaddr1
&etext = 0x108b8
&edata = 0x20a34
&end = 0x20a54
main = 0x10688
&I = 0x20a4c
&i = 0xffbef82c
&argc = 0xffbef884
&ii = 0xffbef828
ii = 0x20a68
UNIX>

The program run as testaddr2 below is the first really gross piece of C code that you'll see. It prints out &etext and &end, and then prompts the user for an address in hexadecimal. It puts that address into the pointer variable s. You should never do this unless you are writing code like this, which is testing memory. The first thing that it does with s is try to read from that memory location (c = *s). Then it tries to write to the memory location (*s = c). This is a way to see which memory locations are legal.

So, let's try it out with an illegal memory value of zero:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x0
Reading 0x0: Segmentation Fault
UNIX>

Memory locations 0x0 to 0xffff are illegal -- if we try any address in that range, we will get a segmentation violation:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0xffff
Reading 0xffff: Segmentation Fault
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x4abc
Reading 0x4abc: Segmentation Fault
UNIX>

Memory location 0x10000 is in the code segment. This should be a legal address:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x10000
Reading 0x10000: 127
Writing 127 back to 0x10000: Segmentation Fault
UNIX>

You'll note that we were able to read from 0x10000 -- it gave us the byte 127, which begins some instruction in the program. However, we got a seg fault when we wrote to 0x10000. This is by design: The code segment is read-only. You can read from it, but you can't write to it. This makes sense, because you can't change your program while it's running -- instead you have to recompile it, and rerun it.

Now, what if we try memory location 0x11fff? This is above &etext, so it should be outside of the code segment:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x11fff
Reading 0x11fff: -48
Writing -48 back to 0x11fff: Segmentation Fault
UNIX>

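The read succeeds even though 0x11fff is above &etext, because access is checked a page at a time, and 0x11fff is the last byte of the page that holds the code (0x10000 to 0x11fff). Pages 9 to 15 are unreadable again: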

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x12000
Reading 0x12000: Segmentation Fault
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x1f000
Reading 0x1f000: Segmentation Fault
UNIX>

The globals start at 0x20000, so we see that the 16th page is readable and writable:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x20000
Reading 0x20000: 127
Writing 127 back to 0x20000: ok
UNIX>

We can read from and write to any location (0x20000 to 0x21fff) in this page. The next page (starting at 0x22000) is unreachable:

UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x21dff
Reading 0x21dff: 0
Writing 0 back to 0x21dff: ok
UNIX> testaddr2
&etext = 0x1191b
&end = 0x21d90
Enter memory location in hex (start with 0x): 0x22000
Reading 0x22000: Segmentation Fault
UNIX>

What this tells us is that the globals go from 0x20000 to 0x21d90. The heap goes from 0x21d90 up to some higher address in the same page.

Sbrk(0)

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x21fff
Reading 0x21fff: 0
Writing 0 back to 0x21fff: ok
UNIX>

UNIX> testaddr3a
&etext = 0x119a3
&end = 0x21e28
sbrk(0)= 0x23e28
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x23fff
Reading 0x23fff: 0
Writing 0 back to 0x23fff: ok
UNIX> testaddr3a
&etext = 0x119a3
&end = 0x21e28
sbrk(0)= 0x23e28
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0x24000
Reading 0x24000: Segmentation Fault
UNIX>

The stack

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffb00000
Reading 0xffb00000: 0
Writing 0 back to 0xffb00000: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xff3f0000
Reading 0xff3f0000: 0
Writing 0 back to 0xff3f0000: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xff3effff
Reading 0xff3effff: Segmentation Fault
UNIX>

UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffbeffff
Reading 0xffbeffff: 0
Writing 0 back to 0xffbeffff: ok
UNIX> testaddr3
&etext = 0x11993
&end = 0x21e18
sbrk(0)= 0x21e18
&c = 0xffbee103
Enter memory location in hex (start with 0x): 0xffbf0000
Reading 0xffbf0000: Segmentation Fault
UNIX>

You can print out the default stack size, and change it using the limit command (read the man page):

UNIX> limit
...
stacksize 8192 kbytes
...

UNIX> testaddr4
argc = 1. &argc = 0xffbee15c, &argv = 0xffbee160, &i = 0xffbee104
argc = 0. &argc = 0xffbee0e4, &argv = 0xffbee0e8, &i = 0xffbee08c
UNIX> testaddr4 v
argc = 2. &argc = 0xffbee154, &argv = 0xffbee158, &i = 0xffbee0fc
argc = 1. &argc = 0xffbee0dc, &argv = 0xffbee0e0, &i = 0xffbee084
argc = 0. &argc = 0xffbee064, &argv = 0xffbee068, &i = 0xffbee00c
UNIX> testaddr4 v o l s
argc = 5. &argc = 0xffbee144, &argv = 0xffbee148, &i = 0xffbee0ec
argc = 4. &argc = 0xffbee0cc, &argv = 0xffbee0d0, &i = 0xffbee074
argc = 3. &argc = 0xffbee054, &argv = 0xffbee058, &i = 0xffbedffc
argc = 2. &argc = 0xffbedfdc, &argv = 0xffbedfe0, &i = 0xffbedf84
argc = 1. &argc = 0xffbedf64, &argv = 0xffbedf68, &i = 0xffbedf0c
argc = 0. &argc = 0xffbedeec, &argv = 0xffbedef0, &i = 0xffbede94
UNIX>

UNIX> breakstack1
...
&c = 0xff3fa347, iptr = 0xff3f7c30 ... ok
&c = 0xff3f7bbf, iptr = 0xff3f54a8 ... ok
&c = 0xff3f5437, iptr = 0xff3f2d20 ... ok
Segmentation Fault
UNIX>

The second way to break the stack is to simply allocate too much local memory. E.g. look at breakstack2.c. It tries to allocate 10M of memory in the stack. It segfaults in the procedure a() because it tries to reference memory addresses smaller than 0xff3f0000. Exactly where does the seg fault happen? Think about it -- answer below.

The segfault happens in a() when the code attempts to push iptr on the stack for the printf call. This is because the stack pointer is pointing into the void. Had we not referenced anything at the stack pointer, our program would have worked. For example, try breakstack3.c.

UNIX> breakstack3
Calling a. i = 1
After a is done. i = 5
UNIX>