[25] Built-in / intrinsic / primitive data types
(Part of C++ FAQ Lite, Copyright © 1991-2002, Marshall Cline, cline@parashift.com)




[25.1] Can sizeof(char) be 2 on some machines? For example, what about double-byte characters?

No, sizeof(char) is always 1. Always. It is never 2. Never, never, never.

Even if you think of a "character" as a multi-byte thingy, char is not. sizeof(char) is always exactly 1. No exceptions, ever.

Look, I know this is going to hurt your head, so please, please just read the next few FAQs in sequence and hopefully the pain will go away by sometime next week.



[25.2] What are the units of sizeof?

Bytes.

For example, if sizeof(Fred) is 8, the distance between two Fred objects in an array of Freds will be exactly 8 bytes.

As another example, this means sizeof(char) is one byte. That's right: one byte. One, one, one, exactly one byte, always one byte. Never two bytes. No exceptions.
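A minimal sketch of both claims, assuming a C++11 or later compiler (for static_assert). Fred here is a made-up struct, and bytesBetween() and checkStride() are hypothetical helpers, not anything from the FAQ. Your sizeof(Fred) may well not be 8; the point is only that the distance between neighbors equals whatever sizeof reports.

```cpp
#include <cassert>
#include <cstddef>

// Fred is a hypothetical type used only for illustration.
struct Fred {
    int    id;
    double weight;
};

// True on every conforming implementation, whatever the hardware looks like.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// An array of 4 Freds occupies exactly 4 * sizeof(Fred) bytes.
static_assert(sizeof(Fred[4]) == 4 * sizeof(Fred), "no gaps between elements");

// Distance from p to q, measured in bytes (that is, in units of char).
inline std::ptrdiff_t bytesBetween(const void* p, const void* q)
{
    return static_cast<const char*>(q) - static_cast<const char*>(p);
}

inline void checkStride()
{
    Fred freds[2] = {};
    // Adjacent elements are exactly sizeof(Fred) bytes apart.
    assert(bytesBetween(&freds[0], &freds[1]) ==
           static_cast<std::ptrdiff_t>(sizeof(Fred)));
}
```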



[25.3] Whoa, but what about machines or compilers that support multibyte characters? Are you saying that a "character" and a char might be different?!?

Yes that's right: the thing commonly referred to as a "character" might be different from the thing C++ calls a char.

I'm really sorry if that hurts, but believe me, it's better to get all the pain over with at once. Take a deep breath and repeat after me: "character and char might be different." There, doesn't that feel better? No? Well keep reading — it gets worse.
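Here is one way to see the difference, offered as a sketch rather than the last word. It assumes the text is encoded as UTF-8 (an assumption of this example, not something the FAQ prescribes) and a C++11 or later compiler. The bytes are spelled out with escapes so the result does not depend on how your source file is saved; kEAcute is a hypothetical name.

```cpp
// One "character": e with an acute accent (U+00E9), written as the two
// code units UTF-8 uses for it.
const char kEAcute[] = "\xC3\xA9";

// The array holds two chars for that one character, plus the terminating '\0'.
static_assert(sizeof(kEAcute) == 3, "one character, two chars, one terminator");

// Each element is still exactly one char, and therefore exactly one byte.
static_assert(sizeof(kEAcute[0]) == 1, "a char is always one byte");
```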



[25.4] But, but, but what about machines where a char has more than 8 bits? Surely you're not saying a C++ byte might have more than 8 bits, are you?!? UPDATED!

[Recently removed a superfluous statement thanks to Gennaro Prota (in 6/02).]

Yep, that's right: a C++ byte might have more than 8 bits.

The C++ language guarantees a byte must always have at least 8 bits. But there are implementations of C++ that have more than 8 bits per byte.
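If you want to know what your own implementation does, the standard header <climits> provides the CHAR_BIT macro, which is the number of bits in a byte. The sketch below assumes a C++11 or later compiler; bitsIn() is a hypothetical helper name.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>
#include <limits>

// Guaranteed by the language: never fewer than 8 bits in a byte.
static_assert(CHAR_BIT >= 8, "a C++ byte has at least 8 bits");

// Number of bits an object of type T occupies in memory.
template <typename T>
constexpr std::size_t bitsIn()
{
    return sizeof(T) * CHAR_BIT;
}

// sizeof(char) is 1, so a char has exactly CHAR_BIT bits: 8 on most desktop
// machines, possibly more elsewhere.
static_assert(bitsIn<char>() == CHAR_BIT, "a char is exactly one byte wide");

// unsigned char uses every one of those bits as a value bit.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char has no padding bits");
```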



[25.5] Okay, I could imagine a machine with 9-bit bytes. But surely not 16-bit bytes or 32-bit bytes, right?

Wrong.

I have heard of one implementation of C++ that has 64-bit "bytes." You read that right: a byte on that implementation has 64 bits. 64 bits per byte. 64. As in 8 times 8.

And yes, you're right, combining with the above would mean that a char on that implementation would have 64 bits.



[25.6] I'm sooooo confused. Would you please go over the rules about bytes, chars, and characters one more time? UPDATED!

[Recently rewrote; changed the example from the mythical FOO machine to a PDP-10, plus added words showing how to simulate pointers using software, much thanks to Andrew Koenig (in 6/02).]

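Here are the rules:

- C++ presents memory as a sequence of units it calls "bytes." sizeof is measured in these bytes, and sizeof(char) is always 1.
- Each byte has at least 8 bits, but might have more than 8 bits.
- A char* can address each individual byte.
- There are no bits between two adjacent bytes: every bit in memory is part of a byte, so code that walks through memory via a char* sees every bit.
- The number of bits per byte on your particular implementation is given by the CHAR_BIT macro in the header <climits>.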

Let's work an example to illustrate these rules. The PDP-10 has 36-bit words with no hardware facility to address anything within one of those words. That means a pointer can point only at things on a 36-bit boundary: it is not possible for a pointer to point 8 bits to the right of where some other pointer points.

One way to abide by all the above rules is for a PDP-10 C++ compiler to define a "byte" as 36 bits. Another valid approach would be to define a "byte" as 9 bits, and simulate a char* by two words of memory: the first could point to the 36-bit word, the second could be a bit-offset within that word. In that case, the C++ compiler would need to add extra instructions when compiling code using char* pointers. For example, the code generated for *p = 'x' might read the word into a register, then use bit-masks and bit-shifts to change the appropriate 9-bit byte within that word. An int* could still be implemented as a single hardware pointer, since C++ allows sizeof(char*) != sizeof(int*).
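To make the 9-bit approach concrete, here is a software simulation of what such a compiler might generate. It is only a sketch under stated assumptions: each 36-bit word is stored in the low 36 bits of a std::uint64_t, bytes are numbered from the left end of the word, and every name (Word, SimCharPtr, load, store, advance) is hypothetical. No real PDP-10 compiler is claimed to work exactly this way.

```cpp
#include <cassert>
#include <cstdint>

// Software simulation only: each 36-bit word lives in the low 36 bits of a
// 64-bit integer. Every name below is hypothetical.
typedef std::uint64_t Word;

const unsigned kBitsPerWord = 36;
const unsigned kBitsPerByte = 9;
const Word     kByteMask    = (Word(1) << kBitsPerByte) - 1;   // nine 1-bits

// The simulated char*: two pieces instead of one hardware pointer.
struct SimCharPtr {
    Word*    word;       // which 36-bit word
    unsigned bitOffset;  // where the byte starts, from the left: 0, 9, 18 or 27
};

// How far the selected byte sits from the right-hand end of its word.
inline unsigned shiftFor(const SimCharPtr& p)
{
    return kBitsPerWord - kBitsPerByte - p.bitOffset;
}

// Simulates reading *p: shift the byte down, then mask off its neighbors.
inline unsigned load(const SimCharPtr& p)
{
    return static_cast<unsigned>((*p.word >> shiftFor(p)) & kByteMask);
}

// Simulates *p = value: clear the old nine bits, then merge in the new ones.
inline void store(const SimCharPtr& p, unsigned value)
{
    assert(value <= kByteMask);   // must fit in nine bits
    const unsigned shift = shiftFor(p);
    *p.word = (*p.word & ~(kByteMask << shift)) | (Word(value) << shift);
}

// Simulates ++p: next byte in this word, or the first byte of the next word.
inline void advance(SimCharPtr& p)
{
    p.bitOffset += kBitsPerByte;
    if (p.bitOffset == kBitsPerWord) {
        p.bitOffset = 0;
        ++p.word;
    }
}
```

Storing through one simulated pointer leaves the other three bytes of the word untouched, and advancing four times lands on the first byte of the next word, so a walk with this "char*" visits every bit exactly once.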

Using the same logic, it would also be possible to define a PDP-10 C++ "byte" as 12-bits or 18-bits. However the above technique wouldn't allow us to define a PDP-10 C++ "byte" as 8-bits, since 8*4 is 32, meaning every 4th byte we would skip 4 bits. A more complicated approach could be used for those 4 bits, e.g., by packing nine bytes (of 8-bits each) into two adjacent 36-bit words. The important point here is that memcpy() has to be able to see every bit of memory: there can't be any bits between two adjacent bytes.

Note: one of the popular non-C/C++ approaches on the PDP-10 was to pack 5 bytes (of 7-bits each) into each 36-bit word. However this won't work in C or C++ since 5*7 = 35, meaning using char*s to walk through memory would "skip" a bit every fifth byte (and also because C++ requires bytes to have at least 8 bits).
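The arithmetic in the last few paragraphs can be checked at compile time. This is just a sketch, assuming a C++11 or later compiler; kWordBits and tilesExactly() are hypothetical names.

```cpp
// Width of one PDP-10 word, as described above.
constexpr unsigned kWordBits = 36;

// True when bytes of the given width fill a group of `words` adjacent words
// with no bits left over.
constexpr bool tilesExactly(unsigned byteBits, unsigned words = 1)
{
    return (kWordBits * words) % byteBits == 0;
}

static_assert(tilesExactly(9),     "4 * 9 = 36");
static_assert(tilesExactly(12),    "3 * 12 = 36");
static_assert(tilesExactly(18),    "2 * 18 = 36");
static_assert(tilesExactly(36),    "1 * 36 = 36");
static_assert(!tilesExactly(8),    "4 * 8 = 32 would skip 4 bits per word");
static_assert(tilesExactly(8, 2),  "9 * 8 = 72 fills two adjacent words");
static_assert(!tilesExactly(7),    "5 * 7 = 35 would skip 1 bit per word");
```

Note that passing this check is necessary but not sufficient: 7-bit bytes are ruled out in any case because a C++ byte needs at least 8 bits.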



[25.7] What is a "POD type"? UPDATED!

[Recently added note that POD types can contain non-virtual member functions thanks to Andrew Koenig (in 6/02).]

A type that consists of nothing but Plain Old Data.

A POD type is a C++ type that has an equivalent in C, and that uses the same rules as C uses for initialization, copying, layout, and addressing.

As an example, the C declaration "struct Fred x;" does not initialize the members of the Fred variable x. To make this same behavior happen in C++, Fred would need to not have any constructors. Similarly, to make the C++ version of copying the same as the C version, the C++ Fred must not have overloaded the assignment operator. To make sure the other rules match, the C++ version must not have virtual functions, base classes, non-static members that are private or protected, or a destructor. It can, however, have static data members, static member functions, and non-static non-virtual member functions.

The actual definition of a POD type is recursive and gets a little gnarly. Here's a slightly simplified definition of POD: a POD type's non-static data members must be public and can be of any of these types: bool, any numeric type including the various char variants, any enumeration type, any data-pointer type (that is, any type convertible to void*), any pointer-to-function type, or any POD type, including arrays of any of these. Note: data-pointers and pointers-to-function are okay, but pointers-to-member are not. Also note that references are not allowed. In addition, a POD type can't have constructors, virtual functions, base classes, or an overloaded assignment operator.
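A rough way to let the compiler check this for you, assuming a C++11 or later compiler. Big caveat: the sketch uses the std::is_trivial and std::is_standard_layout traits from <type_traits> as a stand-in for the rules above, and those traits do not match the simplified definition point for point (they differ on pointers-to-member and on access rules, for example). Fred, Barney, and LooksLikePod are hypothetical names.

```cpp
#include <type_traits>

// Public data plus non-virtual and static member functions: still plain data.
struct Fred {
    int         count;
    const char* name;

    int twice() const { return 2 * count; }   // non-static, non-virtual: fine
    static int  zero() { return 0; }          // static member function: fine
};

// A virtual function and a user-written destructor: not plain data.
struct Barney {
    virtual ~Barney() {}
    int count;
};

// Assumption: "trivial and standard-layout" approximates the POD rules.
template <typename T>
struct LooksLikePod
    : std::integral_constant<bool,
          std::is_trivial<T>::value && std::is_standard_layout<T>::value> {};

static_assert(LooksLikePod<Fred>::value,     "Fred behaves like a C struct");
static_assert(!LooksLikePod<Barney>::value,  "Barney does not");
static_assert(LooksLikePod<Fred[3]>::value,  "arrays of PODs qualify too");
```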



[25.8] When initializing non-static data members of built-in / intrinsic / primitive types, should I use the "initialization list" or assignment? UPDATED!

[Recently clarified the cross-reference (in 6/02).]

For symmetry, it is usually best to initialize all non-static data members in the constructor's "initialization list," even those that are of a built-in / intrinsic / primitive type. The FAQ shows you why and how.
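A minimal sketch of what that looks like. Fred and its members are hypothetical; the const member is there only to show a case where the initialization list is the sole option, which is part of why using it everywhere keeps things symmetric.

```cpp
#include <cassert>

// Fred is a hypothetical class used only for illustration.
class Fred {
public:
    Fred(int id, int count, double scale)
        : id_(id)         // a const member can only be initialized here
        , count_(count)   // built-in members use the same syntax, for symmetry
        , scale_(scale)
    {
        // Nothing left to assign: every member already has its value.
    }

    int    id()    const { return id_; }
    int    count() const { return count_; }
    double scale() const { return scale_; }

private:
    const int id_;
    int       count_;
    double    scale_;
};

inline void useFred()
{
    Fred f(7, 3, 1.5);
    assert(f.id() == 7 && f.count() == 3 && f.scale() == 1.5);
}
```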



[25.9] When initializing static data members of built-in / intrinsic / primitive types, should I worry about the "static initialization order fiasco"? UPDATED!

[Recently made it clearer that the FAQ provides a solution (in 6/02) and reworded thanks to Orjan Petersson (in 9/02).]

Yes, if you initialize your built-in / intrinsic / primitive variable by an expression that the compiler doesn't evaluate solely at compile-time. The FAQ provides several solutions for this (subtle!) problem.
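One commonly used remedy is to wrap the variable in a function so that it is initialized the first time it is used. The sketch below shows that idea only; it is not claimed to be the full set of solutions mentioned above. Fred, readStartValue(), startValue(), and limit are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for work the compiler cannot do at compile-time.
inline int readStartValue() { return 6 * 7; }

class Fred {
public:
    // Initialized by a constant expression: no ordering problem.
    static const int limit = 100;

    // Used instead of a data member such as "static int startValue_;".
    static int& startValue();
};

inline int& Fred::startValue()
{
    // A local static is initialized the first time control passes through
    // here, so callers never see it before it has its value.
    static int value = readStartValue();
    return value;
}

inline void useDuringStartup()
{
    assert(Fred::limit == 100);
    assert(Fred::startValue() == 42);
}
```

Callers write Fred::startValue() where they would have written the static data member; the returned reference lets them assign to it as well.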



[25.10] Can I define an operator overload that works with built-in / intrinsic / primitive types?

No, the C++ language requires that your operator overloads take at least one operand of a "class type" (or an enumeration type). The C++ language will not let you define an operator all of whose operands / parameters are of primitive types.

For example, you can't define an operator== that takes two char*s and uses string comparison. That's good news because if s1 and s2 are of type char*, the expression s1 == s2 already has a well defined meaning: it compares the two pointers, not the two strings pointed to by those pointers. You shouldn't use pointers anyway. Use std::string instead of char*.

If C++ let you redefine the meaning of operators on built-in types, you wouldn't ever know what 1 + 1 is: it would depend on which headers got included and whether one of those headers redefined addition to mean, for example, subtraction.
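The sketch below shows the difference in practice. CStr is a hypothetical wrapper invented for this example: because it is a class type, an operator== for it is allowed, whereas the commented-out declaration on raw pointers is not. In real code std::string already does the job.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical wrapper: a class type, so operators may be overloaded for it.
struct CStr {
    const char* text;
};

inline bool operator==(CStr a, CStr b)
{
    return std::strcmp(a.text, b.text) == 0;
}

// Not allowed, since both operands are built-in pointer types:
// bool operator==(const char* a, const char* b);

inline void compareStrings()
{
    char first[]  = "hello";
    char second[] = "hello";
    const char* s1 = first;
    const char* s2 = second;

    assert(s1 != s2);   // built-in comparison: two different addresses

    CStr a = { s1 };
    CStr b = { s2 };
    assert(a == b);     // overloaded comparison: same contents

    assert(std::string(s1) == std::string(s2));   // the recommended way
}
```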



[25.11] When I delete an array of some built-in / intrinsic / primitive type, why can't I just say delete a instead of delete[] a?

Because you can't.

Look, please don't write me an email asking me why C++ is what it is. It just is. If you really want a rationale, buy Bjarne Stroustrup's excellent book, "Design and Evolution of C++" (Addison-Wesley publishers). But if your real goal is to write some code, don't waste too much time figuring out why C++ has these rules, and instead just abide by its rules.

So here's the rule: if a points to an array of thingies that was allocated via new T[n], then you must, must, must delete it via delete[] a. Even if the elements in the array are built-in types. Even if they're of type char or int or void*. Even if you don't understand why.
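A short sketch of the matching pairs; sumOfSquares() and singleObject() are hypothetical functions written for this example.

```cpp
#include <cassert>
#include <cstddef>

inline int sumOfSquares(std::size_t n)
{
    int* a = new int[n];          // allocated with new T[n] ...
    int sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        a[i] = static_cast<int>(i * i);
        sum += a[i];
    }
    delete[] a;                   // ... so it is released with delete[]
    return sum;
}

inline int singleObject()
{
    int* p = new int(42);         // allocated with plain new ...
    int value = *p;
    delete p;                     // ... so it is released with plain delete
    return value;
}

inline void checkPairs()
{
    assert(sumOfSquares(4) == 14);   // 0 + 1 + 4 + 9
    assert(singleObject() == 42);
}
```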



[25.12] How can I tell if an integer is a power of two without looping? NEW!

[Recently created thanks to S.K. Mody (in 6/02).]

 // A power of two has exactly one bit set; i & (i - 1) clears the lowest set bit.
 inline bool isPowerOf2(int i)
 {
   return i > 0 && (i & (i - 1)) == 0;
 }
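A few worked cases, with the bit patterns in the comments, show what the expression does. The function is repeated so the example compiles on its own; checkIsPowerOf2() is a hypothetical helper.

```cpp
#include <cassert>

inline bool isPowerOf2(int i)
{
    return i > 0 && (i & (i - 1)) == 0;
}

inline void checkIsPowerOf2()
{
    assert( isPowerOf2(1));     // 0001 & 0000 == 0000  (1 is 2 to the power 0)
    assert( isPowerOf2(8));     // 1000 & 0111 == 0000
    assert(!isPowerOf2(12));    // 1100 & 1011 == 1000, not zero
    assert(!isPowerOf2(0));     // rejected by the i > 0 test
    assert(!isPowerOf2(-8));    // negative values are rejected as well
}
```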



Last updated: Jun 17, 2002
Polish version: 0.1h Oct 3, 2003