Commit 4e5eb6b

Split readme into separate documentation files

1 parent 33d1f7f commit 4e5eb6b

8 files changed, +384 -382 lines changed

doc/automatic-translation-features.md

@@ -0,0 +1,183 @@

## Automatic Translation Features

#### Syntax

* C and FB are similar enough to allow most declarations to be converted 1:1 by
doing a pure syntax conversion, for example:

```
struct UDT {              =>  type UDT
    float f;              =>      f as single
};                        =>  end type
void f(struct UDT *p);    =>  declare sub f(byval p as UDT ptr)
```

* More complex C syntax cases:

```
Multiple declarations in a single statement (FB is less flexible than C here):
    extern int a, b, *c, d(void);
=>
    extern as long a, b
    extern as long ptr c
    declare function d() as long

struct/union/enum bodies declared as part of other declarations:
    typedef struct { ... } A, *PA;
=>
    struct temp1 { ... };
    typedef struct temp1 A;
    typedef struct temp1 *PA;
=>
    type temp1 : ... : end type
    type A as temp1
    type PA as temp1 ptr
=>
    type A : ... : end type
    type PA as A ptr

Nested named UDT bodies are moved outside of the parent UDT (not supported in FB):
    struct A {
        struct B { }; /* gcc warning: declaration does not declare anything */
    };
=>
    struct B { };
    struct A { };

Nested function pointers:
    extern int (*(*f)(int (*a)(void)))(int b);
=>
    extern f as function(byval a as function() as long) as function(byval b as long) as long
```

* Toplevel C assignment expressions are turned into FB assignment statements.
Inside a macro body they are additionally wrapped in a scope block, which forces
the macro to be used as a statement rather than an expression. Otherwise the
assignment could be misused as a comparison (in FB, `=` inside an expression is the equality operator).
* Toplevel comma operators are translated to a statement sequence:

```
Example of statement or macro body with comma operators:
    (a, b, c)
=>
    scope
        a
        b
        [return?] c
    end scope
```

* Unnecessary scope blocks (e.g. nested inside loop blocks) are solved out.
C if blocks nested in else blocks are converted to FB elseif blocks.
`while (0) ...` or `do ... while (0)` blocks are turned into FB scope blocks.
Sometimes this can clean up macros or inline functions; rarely useful though.

#### Data types and structures

* The normal C data types and some common typedefs such as `size_t`, `int32_t` or `intptr_t` are translated to normal FB data types.
There is special support for translating `char => byte`, `char* => zstring ptr` and `char[N] => zstring * N` (same for `wchar_t => wstring`).
char/wchar_t typedefs are expanded, because they may be used as strings in some places and as bytes in others.
The `-string`/`-nostring` options can be used to override the automatic conversion.
C's `long` and `long double` types are translated to `clong` and `clongdouble`.
#includes for `crt/long.bi`, `crt/longdouble.bi` or `crt/wchar.bi` are automatically added if needed.
* Named enum => `type enumname as long` + anonymous enum, because C enums/ints stay 32bit on 64bit,
so in FB we have to use the always-32bit LONG type instead of the default ENUM/INTEGER type.
* Function/array typedefs (not supported in FB) => solved out
* struct/union/enum tag names are solved out in favour of typedefs, if any.
Exact-alias typedefs (`typedef struct A A`) or case-alias typedefs (`typedef struct a A`) are solved out, since FB doesn't have the separate tag namespace and is case-insensitive anyway.
* Anonymous structs (not supported in FB) => named after the first typedef that uses them, or given an auto-generated name
* Forward references to tags/types are handled by auto-adding forward declarations,
but only if the referenced tag is actually declared in the API. This way,
tags/types from other headers like `FILE`/`jmp_buf`/`time_t` or `struct tm` won't be
forward-declared.

#### Macros and preprocessor directives

* Exact-alias #defines (`#define A A`) are removed (neither needed nor possible in FB).
Case-aliases (`#define a A`) are solved out since FB is case-insensitive anyway.
* #defines with a simple constant expression as their body => FB constants
* Alias-#defines for constants/types/functions/variables are converted to declarations,
using the ALIAS keyword where needed.

```
const A = ...
#define B A    =>  const B = A

type A as ...
#define B A    =>  type B as A

declare sub/function A
declare sub/function C alias "X"
#define B A    =>  declare sub/function B alias "A"
#define D C    =>  declare sub/function D alias "X"

extern A as ...
extern C alias "X" as ...
#define B A    =>  extern B alias "A" as ...
#define D C    =>  extern D alias "X" as ...
```

* Macro parameters that conflict with FB keywords, or (due to FB's case-insensitivity) with other
identifiers used in the macro body, are renamed automatically, because fbc can't report such conflicts as
"duplicate definition" errors the way it does for name conflicts between symbol declarations.
* #defines nested inside struct bodies => moved to toplevel (this helps when converting #defines to constants, because FB scopes constants declared inside UDTs)
* `#define m(a, ...) __VA_ARGS__` => `#define m(a, __VA_ARGS__...) __VA_ARGS__`
* `#pragma comment(lib, "foo.lib"|"libfoo.a")` => `#inclib "foo"`
* `#include` statements are generally preserved if not expanded; `.h` is replaced by `.bi`.

#### Variables, Functions, Parameters

* The most commonly used calling convention => Extern block. Other calling conventions (if the header uses several) are emitted on the individual procedures.
Extern blocks are also used to avoid the need for explicit case-preserving ALIASes on extern declarations.
* Array parameters (not supported in FB) => pointers (which is what they become in C behind the scenes anyway)
* Special case for the occasionally used `jmp_buf` parameters: they're explicitly converted to pointers,
because fbfrog usually doesn't see the CRT headers. In C, `jmp_buf` is an array type, i.e. it is passed as a pointer,
while in FB's CRT binding it is a UDT.
* Arrays/strings declared with unknown size => "..." ellipsis
* Unsized extern array variables aren't allowed in FB, and require some tricks to be translated.
However, if the array size is known, it's better to use `-setarraysize` to specify the exact array size and get a cleaner translation.

```
extern dtype array[];
extern char s[];
=>
extern array(0 to ...) as dtype
extern s as zstring * ...
=>
#define array(i) ((@__array)[i])
extern __array alias "array" as dtype
extern __s alias "s" as ubyte
#define s (*cptr(zstring ptr, @__s))
```

* Simple (inline) functions are converted to macros, because FB doesn't have "proper" inline functions.

#### Expressions

* Boolean operations result in `1|0` in C, but the equivalent FB operations result in `-1|0`.
If such a result is used in a math context, a negation is inserted so that FB's `-1` becomes C's `1`.
C's logical NOT `!x` becomes FB's `x = 0`, since FB's `not` is a bitwise operator.
* All 32bit unsigned int IIFs/BOPs/UOPs (conditional, binary and unary operations) are wrapped in `culng()`/`clng()` casts,
in order to make sure the result is truncated to 32bit properly in FB even on 64bit,
where FB will do 64bit arithmetic.

```
For example, in C:
    (0u - 100u)   => 0xFFFFFF9Cu
but in FB:
    (0ul - 100ul) => &hFFFFFF9Cu          (32bit)
    (0ul - 100ul) => &hFFFFFFFFFFFFFF9Cu  (64bit)
and &hFFFFFFFFFFFFFF9Cu <> &hFFFFFF9Cu (no sign extension due to unsigned).
In order to make sure we always get &hFFFFFF9Cu in FB, we have to truncate
the operation's result to 32bit explicitly.
```

* When type-casting string literals, an `@` (address-of) operator is inserted, because string literals
automatically decay to pointers in C, but not in FB.
* `(void)` casts, which are commonly used to ensure that a function call or other expression can only be used as a statement,
are removed automatically: translating them is probably not worth it,
even though it could be done by wrapping the statement in a scope block.

#### Error handling

Declarations which cannot be processed automatically (yet) will be embedded into
the `.bi` file in the form of a "TODO" comment, for example: `'' TODO: #define FOO ...`.
This affects complicated #defines ("arbitrary" token sequences),
function bodies (inline functions), plus some others such as language features
or compiler extensions not supported by fbfrog's parser (yet).

doc/bugs.md

@@ -0,0 +1,20 @@

## Bugs

* In FB, anonymous UDTs inherit their parent's FIELD alignment, which is not gcc-compatible.
fbfrog needs to generate FIELD=8 on anonymous UDTs if the parent has a FIELD but the anonymous UDT doesn't.
http://www.freebasic.net/forum/viewtopic.php?f=3&t=19514
* The C parser needs to verify #directives, since they can be inserted by "to c" replacements,
which aren't verified by the CPP.
* In winapi, there is a case where an auto-generated tag id conflicts
with a real typedef, which is erroneously renamed. Luckily, fbc detects this
problem easily (recursive UDT).

```
struct Foo {
    struct {
        HWND hwnd;
    } HWND;
};
```

* fbfrog produces `wstr("a") wstr("b")`, which isn't allowed in FB; fbfrog needs to insert `+` string concatenation operators.

doc/extra-features.md

@@ -0,0 +1,17 @@

## Extra features

fbfrog has various options for manual improvements, for example for:

* Renaming symbols, which is useful to resolve name conflicts.
This is a common problem due to FB's case-insensitivity and different namespacing (e.g. #defines collide with functions).
When using the renaming options, an auto-generated list of renamed symbols is added to the top of the affected header files.
* Translating #define bodies as token sequences, instead of trying to parse them as expressions (which is useful in specific cases).
* Removing declarations by name or type.
* Expanding any typedef by name
* Specifying the size of unsized arrays
* Specifying hints about which identifiers are typedefs, which helps the C parser
parse type cast expressions when not all typedefs have been declared yet. Sometimes C headers
use typedefs from other headers, so fbfrog may not get to see the typedef declarations.
Or a typedef may be used in a #define body before being declared.
* etc.; run `./fbfrog` to see the list of options

In general, this covers things that can't be decided automatically and require human intervention.

doc/future.md

@@ -0,0 +1,9 @@

## Future improvements

Nowadays, I think it would be best to work on adding a C parser to fbc (i.e. the ability
to #include .h files into FB programs), making fbfrog unnecessary.

* Advantage for users: no more outdated/incompatible/missing bindings, no more binding maintenance.
* Advantage with regard to binding generation: fbc only has to deal with one target system or library version at a time; no more parsing 20 times and slow merging.
* fbc could allow specifying translation hints to handle TODOs if needed. It could come with a set of hints for common libraries. This is the same idea as with fbfrog options.
* Only C-to-FB, no FB-to-C interaction, except maybe trivial #defines to give access to user-configurable parts of the C headers.

doc/overview.md

@@ -0,0 +1,44 @@

## fbfrog Overview

* fbfrog acts a lot like a C compiler, and by default it pretends to be gcc
compiling for the various FB targets (win32, win64, linux-x86, linux-x86_64, etc.).
fbfrog uses mostly the same pre-#defines and data types as gcc. Most of these
can be found in `include/fbfrog/default.h`; the rest is hard-coded in the C parser.
* fbfrog parses the headers top-down like a C compiler, so it's only necessary
to pass the "entry points", i.e. the header(s) that would be #included in a C program.
fbfrog expands all #includes it can find.
The generated binding will cover the API that would become available by
#including those headers in the given order. Separate headers that aren't
intended to be #included together shouldn't be passed to fbfrog together, but
in separate invocations.
* If fbfrog can't find #included header files, you can use the `-incdir <path>`
option to help it. Sometimes this is needed to allow the main header to be parsed successfully.
(For example, a macro can only be expanded if its #define statement was seen.)
* Standard system headers (e.g. from the C runtime or POSIX) will often be reported
as "not found", which is ok. Usually they are not needed when generating a
binding for some library. Standard header search directories like `/usr/include`
should not be used for binding generation, because they are system-specific,
while FB bindings usually are supposed to be portable.
* fbfrog preprocesses and parses the input headers multiple times: once for each
supported target (DOS/Linux/Windows/etc, x86/x86_64/arm/aarch64) and merges
all these APIs together into the final binding. If you need to override this
(for example if your .h files don't support DOS and have an #error statement
for this case), then use `-target` and specify the needed targets manually.
* fbfrog has a custom C preprocessor and parser with some unusual features:
  * it preserves #defines and parses their bodies
  * it can expand macros and #includes selectively/optionally
  * pre-#defines are fully configurable, no hard-coded target/compiler-specifics
  * modifiable AST to make it FB-friendly and insert FB-specific constructs
* fbfrog needs a C preprocessor, because most C library headers make extensive
use of macros, usually for specifying the calling convention or other attributes
on function declarations, but in some cases the whole function declaration is
hidden behind multiple layers of macros, such that it would be nearly impossible
to determine the API without doing macro expansion. With the need for macro expansion
comes the need for parsing #defines and evaluating #if blocks. This leads to multiple
target-specific interpretations of a C header, which is exactly how it would be seen by a
C compiler when compiling for different targets.
* In practice, many C libraries have target-specific headers that are generated
during the compilation of the library. This makes it hard to produce an FB binding
that covers all targets: in general it's necessary to get the proper C headers for each target,
and feed them into fbfrog such that it can combine them into one. Sometimes it's possible
to work around the problem by using preprocessor tricks or some hand-written fake .h files.

doc/script-options.md

@@ -0,0 +1,69 @@

## About the `-declare*`/`-select`/`-ifdef` options

fbfrog is able to read multiple headers or multiple versions of the same
header (preprocessed differently) and merge them into a single binding.

1. This is used to support multiple targets (DOS/Linux/Win32, x86/x86_64): Instead of looking for #ifs in the input headers and possibly trying to preserve those, fbfrog preprocesses and parses the input header files multiple times (using different predefines each time), and then merges the resulting target-specific APIs into one final binding, by (re-)inserting #ifs (such as `#ifdef __FB_WIN32__`) where needed.
2. By using the `-declare*` command line options you can combine pretty much any APIs, for example version 1.0 and 2.0 of a library, or the ANSI and UNICODE versions of a Win32-specific header. Of course, this only makes sense if the APIs belong together. Sometimes the merging algorithm produces a rather ugly result though, especially if the differences between the APIs are too big, so it's not always useful.

Assuming we have the header files foo1.h and foo2.h, let's use the following
fbfrog options:

```
-declareversions __LIBFOO_VERSION 1 2
-selectversion
-case 1
foo1.h
-case 2
foo2.h
-endselect
```

Save those options into a foo.fbfrog helper file (because it's too much to
type at the command line), and pass it to fbfrog:

```
./fbfrog foo.fbfrog
```

The created binding will allow the user to #define `__LIBFOO_VERSION` to 1 or
2 in order to select that specific API version:

```
[...declarations that existed in both foo1.h and foo2.h...]
#if __LIBFOO_VERSION = 1
[...declarations that existed only in foo1.h...]
#else
[...declarations that existed only in foo2.h...]
#endif
[...etc...]
```

You can combine `-declare*` options as needed to support multiple APIs in one binding:

```
-declareversions <symbol> <numbers...>
    Useful to allow selecting an API by version. This will produce #if
    blocks such as #if <symbol> = <number>.

-declarebool <symbol>
    Useful to allow API selection based on whether a certain symbol is
    defined or not. For example, this could be used to distinguish
    between the UNICODE and ANSI versions of a binding
    (-declarebool UNICODE -> #ifdef UNICODE), or between the shared
    library/DLL version and the static library version, etc.
```

If multiple `-declare*` options are given, they multiply. For example, `-declarebool A -declarebool B` produces these APIs:

```
defined(A) and defined(B)
defined(A) and (not defined(B))
(not defined(A)) and defined(B)
(not defined(A)) and (not defined(B))
```

You can use the `-select`/`-ifdef` logic options to create different "code paths"
where some options will only be used for some APIs (instead of applying to
all APIs). This also works with `-declare*` options, allowing you to build
even complex API condition trees.

doc/todo.md

@@ -0,0 +1,33 @@

## Old to-do list

* `-1to1` option which automatically adds `-emit` options for each input .h such
that each .h is emitted into its own .bi, in the directory given with `-o`. Strip only the common prefix, preserve the remaining directory structure (if any).
* Define2Decl shouldn't move all alias defines; it's typically unnecessary for procs/vars/typedefs at least.
* Define2Decl shouldn't count #undefs as declarations (preventing affected symbols from being handled by the pass)
* Define2Decl should count multiple, equal declarations as one declaration
* Only add things to renamelist if they have a RENAMED flag (not everything with an alias was renamed)
* Add `-printcconstruct <pattern>` option for dumping C constructs as seen by fbfrog
to make writing replacements easier. (TODOs aren't enough, because sometimes we
want to do a replacement even though it's not a TODO)
* Don't build VERAND conditions at frogEvaluateScript() time, but rather do it
later when generating the #if conditions. `-declareversions`/`-declarebool` should store version number/flags in ApiInfo,
and frogEvaluateScript() should build ApiInfo objects directly, then copy them for recursive invocations, no more separate loadOptions().
* The LCS algorithm is the main performance bottleneck (especially for the Windows API binding); can it be optimized?
* Turn more inline functions into macros: also void functions whose body can
just be used as a macro'd scope block and doesn't contain any RETURNs.
* It would be nice if fbfrog could preserve comments for documentation purposes.
* Flatten the AST data structure such that statements nested inside struct/proc bodies
can be merged separately from the compound block (i.e. use TYPEBEGIN/TYPEEND/PROCBEGIN/PROCEND nodes).
Interesting for merging the fields if a UDT's FIELD=N value changes between targets.
* Don't expand macro constants outside CPP expressions, to keep them as array size etc.
* Solve out tag ids if there is an alias typedef, unless the tag id is used elsewhere
* Add pattern-based renames, e.g. `-renamedefine '%' 'FOO_%'`,
or at least `--rename-define-add-prefix '*' FOO_` (add prefix FOO_ to matching defines).
* Auto-convert C's [] array indexing into FB's (): track which vars/fields are
arrays (or pointers) and then compare indexing BOPs against that.
* Add support for `#pragma pack` with named stack entries (`#pragma pack({push|pop}, <identifier> [, N])`).
Popping by name means popping everything until that node is popped. If not found, nothing is popped.
(MinGW-w64 CRT headers use this)
* Continue adding support for parsing function bodies: `++` and `--` operators, for loops, continue/break, goto/labels/switch/case.
* Add some C++ support
* ...
