
Where exactly is the boundary between a preprocessor and a compiler?

https://www.devze.com 2023-03-30 08:42 Source: web

According to various sources (for example, the SE radio episode with Kevlin Henney, if I remember correctly), "C with classes" was implemented with preprocessor technology (with the output then being fed to a C compiler), whereas C++ has always been implemented with a compiler (that just happened to spit out C in the early days). This seems to cause some confusion, so I was wondering:

Where exactly is the boundary between a preprocessor and a compiler? When do you call a piece of software that implements a language "a preprocessor", and when do you call it "a compiler"?

By the way, is "a compiled language" an established term? If so, what exactly does it mean?


This is an interesting question. I don't know a definitive answer, but would say this, if pressed for one:

A preprocessor doesn't parse the code, but instead scans for embedded patterns and expands them

A compiler actually parses the code by building an AST (abstract syntax tree) and then transforms that into a different language


The language of the output of the preprocessor is a subset of the language of the input.

The language of the output of the compiler is (usually) very different (machine code) from the language of the input.


From a simplified, personal, point of view:

I consider the preprocessor to be any form of textual manipulation that has no concept of the underlying language (i.e. its semantics or constructs), and thus relies only on its own set of rules to perform its duties.

The compiler starts when rules and regulations are applied to what is being processed (yes, that makes 'my' preprocessor a compiler, but why not :P). This includes lexical and semantic checking, and the transforms from x (textual) to y (binary/intermediate form). As one of my professors would say: "it's a system with inputs, processes and outputs".


The C/C++ compiler cares about type-correctness while the preprocessor simply expands symbols.


A compiler consists of several processes (components). The preprocessor is only one of these, and a relatively simple one.

From the Wikipedia article, Division of compiler processes:

All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at which these two ends meet is open to debate.

The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).

The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.

The back end takes the output from the middle. It may perform more analysis, transformations and optimizations that are for a particular computer. Then, it generates code for a particular processor and OS.

Preprocessing is only a small part of the front end's job.

The first C++ compiler was made by attaching an additional process in front of the existing C compiler toolset, not because that was a good design but because of limited time and resources.

Nowadays, I don't think such a non-native C++ compiler could survive in the commercial field.

I dare say a cfront for C++11 would be impossible to make.


The answer is pretty simple. A preprocessor works on text as input and has text as output. Examples of that are the old Unix commands m4 and cpp (the C preprocessor), and also Unix programs like roff, nroff and troff, which were used (and still are) to format man pages (the Unix command "man") or to format text for printing or typesetting.

Preprocessors are very simple; they don't know anything about the "language of the text" they process. In other words, they usually process natural languages. The C preprocessor, despite its name, only recognizes #define, #include, #ifdef, #ifndef, #else etc., and if you use #define MACRO it tries to "expand" that macro everywhere it finds it. But that need not be C or C++ program text; it could just as well be a novel written in Italian or Greek.

Compilers whose output is another high-level language are usually called translators. So the old cfront "compiler" for C++, which emitted C code, was a C++ translator.

Preprocessors and, later, translators were used historically because old machines simply lacked the memory to do everything in one program; instead the work was done by specialized programs, from disk to disk. A typical C program would be compiled from various sources, and the build process would be managed with make. Nowadays the C preprocessor is usually built directly into the C/C++ compiler. A typical make run would call the CPP on the *.c files and write the output to a different directory; from there the C compiler CC would either compile it straight to machine code or, more commonly, output assembler code as text. (Note: that C compiler only checked syntax; it did not really care about type safety etc.) Then the assembler would take that assembler code and output a *.o file, which could later be linked with other *.o files and *.lib files into an executable program.
OTOH you likely had a make rule that would call not the C compiler but the lint command, the C language analyser, which looks for typical mistakes and errors (which the C compiler ignores). It is quite interesting to read up on lint, nroff, troff, m4 etc. on Wikipedia (or in your machine's terminal using man) ;D
