Computer programming language compiler
A computer programming language is an artificial language that can be used to control the behavior of a machine, particularly a computer. Like human languages, programming languages are defined by syntactic and semantic rules, which determine structure and meaning respectively. Programming languages are used to communicate precisely about the task of organizing and manipulating information, and to express algorithms (finite sets of well-defined instructions). Some writers restrict the term "programming language" to languages that can express all possible algorithms, while the term "computer language" is sometimes used for more limited artificial languages (The Encyclopedia of Computer Languages, Murdoch University, Australia).
A compiler is a program (or set of programs) that translates text written in one computer language (the source language) into another computer language (the target language). The input text is called the source code, and the output is called the object code. Most commonly, the output has a form suitable for processing by another program (for example, a linker), but it may also be a human-readable text file.
The most common reason for translating source code is to create an executable program. The term "compiler" is primarily used for programs that translate source code from a high-level language to a lower-level language (for instance, assembly language or machine language). A program that translates in the opposite direction, from a low-level language to a high-level one, is called a decompiler; one that translates between languages at the same level is called a language converter, source-to-source translator, or language translator. A compiler typically performs operations such as lexing, preprocessing, parsing, semantic analysis, code optimization, and code generation.
In the most comprehensive definition, a compiler generally processes its input in several distinct stages, listed below.
1. Preprocessing
2. Lexical analysis
3. Syntax analysis
4. Semantic analysis
5. Intermediate code generation
6. Code optimization
7. Code generation
Most of the above stages occur during a single pass, or reading, of the source files. For instance, the preprocessing stage usually reads only slightly ahead of the lexical analysis stage, which in turn is usually one token ahead of the syntax analysis stage (Eric Tolman, computer scientist).
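The single-pass arrangement described above can be sketched with lazy generators, where each stage pulls from the previous one and therefore stays only slightly ahead of it. The phase functions here are a minimal, hypothetical illustration, not from any real compiler:

```python
# Each stage is a generator: the preprocessor stays only slightly ahead
# of the lexer, and the lexer only one token ahead of the consumer.

def preprocess(lines):
    # Toy preprocessing: drop comment lines, yielding one line at a time.
    for line in lines:
        if not line.lstrip().startswith("#"):
            yield line

def lex(lines):
    # Toy lexing: split each line into whitespace-delimited tokens.
    for line in lines:
        for tok in line.split():
            yield tok

def parse(tokens):
    # Stand-in for syntax analysis: consume tokens lazily, one at a time.
    return list(tokens)

source = ["# a comment", "x = 1", "y = x + 2"]
print(parse(lex(preprocess(source))))
```

Because the stages are chained generators, no stage needs the whole source in memory before the next one starts, which is the point of the single-pass design.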
Compilers can be classified in different ways, according to their internal structure, their input and output, and their runtime behavior. The following examples give a brief description of this classification.
– A program that translates from a low-level language to a high-level language is a decompiler.
– A program that translates between high-level languages is called a source-to-source translator, language translator, language converter, or language rewriter.
Further, the output of some compilers may target hardware at a very low level, for instance a Field-Programmable Gate Array (FPGA). Such compilers are said to be hardware compilers, because the programs they compile effectively control the final configuration of the hardware and how it operates; there are no instructions executed in sequence.
A compiler for a relatively simple language, written by one person, may be a single, monolithic piece of software. However, when the source language is large and complex and high-quality output is required, the design may be split into a number of relatively independent phases, or passes. Having separate phases means that development can be parceled out, with different people responsible for different phases. It also makes it much easier to replace a single phase with an improved one, or to insert new phases later (for instance, additional optimizations). The Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon University first championed this division of the compilation process into phases, and introduced the terms front end, middle end (rarely heard today), and back end.
All but the smallest of compilers have more than two phases, and these phases are usually regarded as belonging to either the front end or the back end. Exactly where the two ends meet is open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower-level representation than source code. The back end takes this output and performs further analysis, transformation, and optimization that are specific to the target machine. Finally, it generates code for a particular processor and operating system.
This front-end/back-end approach makes it possible to combine front ends for different languages with back ends for different CPUs. Particular examples of this approach are the GNU Compiler Collection and the Amsterdam Compiler Kit, which have multiple front ends, a shared analysis stage, and multiple back ends.
Front End Compilation
The front end analyzes the source code to build an internal representation of the program, called the intermediate representation or IR. It also manages the symbol table, a data structure mapping each symbol in the source code to associated information such as its location, type, and scope. This work is done over several phases (Ravi Sethi and Alfred V. Aho), given below.
1. Line Reconstruction. Languages that strop their keywords (and allow arbitrary spaces within identifiers) need a phase before parsing that takes the input source and converts it into a canonical form ready for the parser. The top-down, recursive-descent, table-driven parsers used in the 1960s typically read the source a character at a time and did not require a separate tokenizing phase. Coral 66, Algol, Atlas Autocode, and Imp are examples of stropped languages whose compilers would have a line reconstruction phase.
2. Lexical Analysis. This phase breaks the source text into small pieces, typically called tokens. Each token is a single atomic unit of the language, for instance a keyword, identifier, or symbol name. The syntax of a token is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it. This phase is also called lexing or scanning, and the software that performs it is known as a lexical analyzer or scanner.
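A minimal sketch of this idea: each token class is described by a regular expression, and the regex engine's automaton does the recognition. The token names and the toy input are illustrative assumptions, not drawn from any particular compiler:

```python
import re

# Token syntax given as regular expressions; the compiled pattern's
# automaton recognizes one token at a time.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),          # whitespace is matched but discarded
]
PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(text):
    tokens = []
    for m in PATTERN.finditer(text):
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(tokenize("count = count + 1"))
```

A production scanner would also report positions and reject characters that match no token class; this sketch silently skips them.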
3. Preprocessing. Some languages, C for example, require a preprocessing phase to support things such as conditional compilation and macro substitution. In the case of C, the preprocessing phase includes lexical analysis.
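The two jobs named above, macro substitution and conditional compilation, can be sketched as follows. The directive syntax is modeled loosely on C's `#define`/`#ifdef`, but this is an invented toy, not a real C preprocessor:

```python
def preprocess(lines):
    macros, out, emitting = {}, [], True
    for line in lines:
        if line.startswith("#define "):
            # Record a macro definition: "#define NAME value"
            _, name, value = line.split(maxsplit=2)
            macros[name] = value
        elif line.startswith("#ifdef "):
            # Emit the following lines only if the macro is defined.
            emitting = line.split()[1] in macros
        elif line.startswith("#endif"):
            emitting = True
        elif emitting:
            # Macro substitution: naive textual replacement.
            for name, value in macros.items():
                line = line.replace(name, value)
            out.append(line)
    return out

src = ["#define MAX 10", "limit = MAX", "#ifdef DEBUG", "log()", "#endif"]
print(preprocess(src))
```

A real preprocessor substitutes on token boundaries rather than raw text and supports nesting; the naive string replacement here is only meant to show the shape of the phase.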
4. Syntax Analysis. This phase parses the token sequence to identify the syntactic structure of the program.
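As a minimal sketch, a recursive-descent parser for the toy grammar `expr := NUMBER ("+" NUMBER)*` turns a token sequence into a nested tuple standing in for a parse tree. The grammar and node shapes are assumptions for illustration:

```python
def parse(tokens):
    pos = 0

    def expect_number():
        nonlocal pos
        tok = tokens[pos]
        assert tok.isdigit(), f"expected number, got {tok!r}"
        pos += 1
        return ("num", int(tok))

    # expr := NUMBER ("+" NUMBER)*  -- left-associative
    tree = expect_number()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        tree = ("add", tree, expect_number())
    return tree

print(parse(["1", "+", "2", "+", "3"]))
```

Each grammar rule becomes one function; the nesting of calls mirrors the syntactic structure being recognized.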
5. Semantic Analysis. This pass adds semantic information to the parse tree and performs checks based on that information. It logically follows the parsing phase, in which the parse tree is generated, and logically precedes the code generation phase, in which executable code is produced. (In a compiler implementation, it may be possible to fold several phases into one pass.) Typical examples of semantic information that is added and checked are typing information (type checking) and the binding of variable and function names to their definitions (object binding). Filling out the entries of the symbol table is an important activity in this phase.
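A sketch of such a pass: it walks a parse tree, resolves variable names through a symbol table (object binding), and type-checks each node. The tree shape and type names are invented for illustration:

```python
def check(node, symbols):
    kind = node[0]
    if kind == "num":
        return "int"
    if kind == "var":
        name = node[1]
        if name not in symbols:
            raise NameError(f"undeclared variable {name!r}")
        return symbols[name]          # object binding via the symbol table
    if kind == "add":
        left = check(node[1], symbols)
        right = check(node[2], symbols)
        if left != right:             # type checking
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown node {kind!r}")

symbols = {"x": "int"}                # filled in from declarations
print(check(("add", ("var", "x"), ("num", 2)), symbols))
```

In a real compiler the checker would also annotate the tree with the inferred types rather than merely returning them.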
Back End Compilation
The term back end is sometimes confused with code generator because of their overlapping function of generating assembly code. Some researchers use middle end to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators. The work of the back end is done in multiple steps:
1. Compiler Analysis. This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are data-flow analysis to build use-define chains, alias analysis, dependence analysis, pointer analysis, escape analysis, and so on. Accurate analysis is the basis for any compiler optimization. The control flow graph and call graph are usually also built during this phase.
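One of the analyses named above, building use-define chains, can be sketched for straight-line code: for each use of a variable, record which earlier instruction defined it. The `(dest, op, args)` instruction format is an invented toy IR:

```python
def use_def_chains(instrs):
    """For each variable use, record (use index, variable, defining index)."""
    last_def, chains = {}, []
    for i, (dest, _op, args) in enumerate(instrs):
        for arg in args:
            if isinstance(arg, str):              # a variable use
                chains.append((i, arg, last_def.get(arg)))
        last_def[dest] = i                        # this instruction defines dest
    return chains

ir = [("a", "const", (1,)),
      ("b", "add", ("a", 2)),
      ("a", "add", ("a", "b"))]
print(use_def_chains(ir))
```

With branches, the same question requires a full data-flow analysis over the control flow graph, since several definitions may reach one use.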
2. Optimization. The intermediate representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are dead code elimination, inline expansion, constant propagation, loop transformation, register allocation, and even automatic parallelization.
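Two of the optimizations named above can be sketched on the same toy IR: constant propagation (with folding of additions whose operands are known) followed by dead code elimination (dropping definitions whose results are never used). The IR format and `live_out` set are illustrative assumptions:

```python
def optimize(instrs, live_out):
    consts, out = {}, []
    for dest, op, args in instrs:
        # Constant propagation: replace known-constant operands.
        args = tuple(consts.get(a, a) for a in args)
        if op == "const":
            consts[dest] = args[0]
        elif op == "add" and all(isinstance(a, int) for a in args):
            # Constant folding: evaluate at compile time.
            op, args = "const", (args[0] + args[1],)
            consts[dest] = args[0]
        out.append((dest, op, args))
    # Dead code elimination: walk backwards, keeping only live definitions.
    used, kept = set(live_out), []
    for dest, op, args in reversed(out):
        if dest in used:
            kept.append((dest, op, args))
            used |= {a for a in args if isinstance(a, str)}
    return kept[::-1]

ir = [("a", "const", (1,)),
      ("b", "const", (2,)),
      ("c", "add", ("a", "b")),
      ("d", "const", (9,))]          # dead: "d" is never used
print(optimize(ir, live_out={"c"}))
```

Note the ordering: propagation creates new dead code (the now-unused `a` and `b`), which the elimination pass then removes, so real compilers often iterate such passes to a fixed point.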
3. Code Generation. The transformed intermediate language is translated into the output language, commonly the native machine language of the system. This step involves resource and storage decisions, such as deciding which variables to keep in registers and which in memory, and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (Sethi-Ullman algorithm).
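The Sethi-Ullman numbering cited above labels each expression-tree node with the minimum number of registers needed to evaluate it without spilling, which guides instruction ordering. A minimal sketch, using the `("num", n)` / `("add", l, r)` tree shape assumed in the earlier examples:

```python
def regs_needed(node):
    """Sethi-Ullman label: minimum registers to evaluate the subtree."""
    if node[0] == "num":
        return 1                          # a leaf fits in one register
    left = regs_needed(node[1])
    right = regs_needed(node[2])
    # Evaluate the costlier side first and hold its result in one register
    # while the other side runs; equal costs need one extra register.
    return max(left, right) if left != right else left + 1

print(regs_needed(("add", ("num", 1), ("add", ("num", 2), ("num", 3)))))
```

The code generator then emits the subtree with the larger label first, so the whole expression is evaluated in the computed number of registers.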
In addition, the scope of compiler analysis and optimization varies greatly, from as small as a basic block to the procedure or function level, or even to the whole program (inter-procedural optimization). A compiler can potentially do a better job with a broader view, but that broad view is not free: large-scope analysis and optimization are costly in terms of compilation time and memory space, and this is especially true for inter-procedural analysis and optimization.
Inter-procedural analysis and optimization are common in modern commercial compilers from SGI, IBM, Intel, Microsoft, and Sun Microsystems. The open-source GCC was criticized for a long time for lacking powerful inter-procedural optimization, but it is changing in this respect. Another good open-source compiler with a full analysis and optimization infrastructure is Open64, which is used by many organizations for research and commercial purposes.
Dynamically typed, object-oriented languages please programmers, but their lack of static type information penalizes performance. One implementation technique extracts static type information from declaration-free programs. The system compiles several copies of a given procedure, each customized for one receiver type, so that the type of the receiver is bound at compile time. The compiler predicts types that are statically unknown but likely, and inserts runtime type tests to verify its predictions (Craig Chambers, University of Washington, Seattle, WA).
It splits calls, compiling a copy on each control path, optimized for the specific type on that path. Coupling these techniques with compile-time message lookup, aggressive procedure inlining, and traditional optimization has doubled the performance of dynamically typed object-oriented languages (David Ungar, Sun Microsystems Laboratories, Mountain View, CA).
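The customization idea described above can be sketched very loosely: keep one specialized version of a generic operation per receiver type, creating (here: caching) it on first use. All names and the two specializations are invented for illustration and greatly simplify the real technique:

```python
# One specialized version of a generic "add" operation per receiver type.
specialized = {}

def dispatch(receiver, arg):
    version = specialized.get(type(receiver))
    if version is None:
        # "Compile" a copy customized for this receiver type, so later
        # calls with the same type skip the generic dispatch logic.
        if isinstance(receiver, int):            # runtime type test
            version = lambda r, a: r + a         # int-specialized copy
        else:
            version = lambda r, a: str(r) + str(a)
        specialized[type(receiver)] = version
    return version(receiver, arg)

print(dispatch(1, 2), dispatch("a", "b"))
```

A real customizing compiler generates specialized machine code and inlines through the now-monomorphic calls; the per-type cache here only conveys the shape of the idea.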
References
Chambers, Craig and Ungar, David. "Customization: Optimizing Compiler Technology for SELF, a Dynamically-Typed Object-Oriented Programming Language." Volume 39, Issue 4 (April 2004), retrieved from the ACM Digital Library: http://portal.acm.org/
Aho, Alfred V., Sethi, Ravi, and Ullman, Jeffrey D. Compilers: Principles, Techniques and Tools. ISBN 0-201-10088-6.
Cooper, Keith D. and Torczon, Linda. Engineering a Compiler. Morgan Kaufmann, 2004. ISBN 1-55860-699-8.
Leverett, Cattell, Hobbs, Newcomer, Reiner, Schatz, and Wulf. "An Overview of the Production Quality Compiler-Compiler Project." Computer 13(8):38-49 (August 1980).
High Integrity Software Systems Assurance, by Roger S. Gina, Website: http://hissa.ncsl.nist.gov